# Kubernetes Homelab: From Zero to GitOps in a Weekend

## Why Build a Homelab?
I've been running production Kubernetes for years. But there's a gap between "knowing K8s" and "deeply understanding K8s." Production clusters have:
- Change approval processes
- Limited experimentation budget
- Actual users who get grumpy when you break things
A homelab removes those constraints. You can:
- Learn K8s internals without cloud bills
- Test infrastructure-as-code before production
- Dogfood your own deployments
- Break things at 3am and only disappoint yourself
## The Case for Physical Hardware

Cloud alternatives exist: kind clusters, local Docker Compose, cloud sandbox accounts. They're fine for learning the basics. But they miss:
| Feature | Cloud Sandbox | Physical Homelab |
|---|---|---|
| Persistent storage | Ephemeral or expensive | Cheap NVMe drives |
| Network troubleshooting | Abstracted away | Real NICs, real problems |
| Multi-node networking | Single node typically | Actual cross-node traffic |
| Long-running workloads | Time/resource limited | Run for months |
| Cost over 6 months | $200-500+ | $0 (after hardware) |
I went with a mini PC cluster. Four nodes, 64GB total RAM, silent, low power.
## Hardware Build

### The Node Strategy

I didn't want a rack server. Too loud, too hot, too ugly for a home office. Mini PCs are the sweet spot:
| Component | Per Node | x4 Total Cost |
|---|---|---|
| Intel N100 Mini PC | $150 | $600 |
| 16GB DDR5 SODIMM | $35 | $140 |
| 256GB NVMe SSD | $40 | $160 |
| **Total** | $225 | $900 |
Power draw: ~10W idle per node. That's $12/month in electricity. The cloud equivalent would be $300+/month.
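Sanity-checking that electricity number is quick arithmetic. The ~$0.40/kWh rate here is my assumption to match the $12 figure; plug in your local tariff:

```python
# Back-of-envelope electricity cost for the 4-node cluster.
# The $/kWh rate is an assumption -- substitute your local tariff.
nodes = 4
watts_per_node = 10               # ~10W idle per mini PC
rate_per_kwh = 0.40               # assumed electricity price, USD/kWh
hours_per_month = 24 * 30

kwh_per_month = nodes * watts_per_node * hours_per_month / 1000
monthly_cost = kwh_per_month * rate_per_kwh
print(f"{kwh_per_month:.1f} kWh/month -> ${monthly_cost:.2f}/month")
# -> 28.8 kWh/month -> $11.52/month
```

At cheaper power ($0.15/kWh) the idle cost drops under $5/month; transcoding load pushes it back up.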
### Storage Architecture

Two approaches for persistent storage:

**Option 1: Centralized NFS**

- One node exports an NFS share
- Other nodes mount it
- Pros: simple, one backup target
- Cons: single point of failure

**Option 2: Distributed (Longhorn)**

- Each node contributes storage
- Replicated across nodes
- Pros: fault tolerant, K8s-native
- Cons: more complex, network overhead

I use both: NFS for media files (movies don't need replication), Longhorn for databases (which need HA).
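In practice the split shows up as two StorageClasses that workloads pick between. A sketch of the two PVC shapes — the `nfs-client` class name assumes the stock NFS subdir provisioner defaults, so yours may differ:

```yaml
# Media: NFS-backed, no replication needed
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media
spec:
  storageClassName: nfs-client   # assumed NFS provisioner class name
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 500Gi
---
# Database: Longhorn-backed, replicated across nodes
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  storageClassName: longhorn
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```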
## Software Stack

### The Foundation: K3s

K3s is Kubernetes without the bloat: a single binary with minimal dependencies, perfect for a homelab.
```bash
# On the control plane node
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --disable traefik \
  --write-kubeconfig-mode 644

# Get the join token for workers
cat /var/lib/rancher/k3s/server/node-token
```

```bash
# On each worker node
curl -sfL https://get.k3s.io | sh -s - agent \
  --server https://CONTROL_PLANE_IP:6443 \
  --token YOUR_TOKEN
```

Five minutes later, you have a cluster.
### GitOps with ArgoCD

This is the game-changer. ArgoCD watches your Git repo and syncs cluster state to match it.
```yaml
# bootstrap/argocd.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: homelab-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/YOUR_USERNAME/homelab
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
```

Deploy ArgoCD:

```bash
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl apply -f bootstrap/argocd.yaml
```

Now everything in the `apps/` directory in Git gets deployed automatically.
### Ingress with Traefik

Traefik is a modern ingress controller that auto-discovers services.

```yaml
# infrastructure/traefik/values.yaml
ports:
  web:
    redirectTo: websecure
  websecure:
    tls:
      enabled: true
certificatesResolvers:
  cloudflare:
    acme:
      email: your@email.com
      dnsChallenge:
        provider: cloudflare
```

SSL with Let's Encrypt via DNS challenge: no port forwarding, no manual cert management.
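Individual services then opt into TLS by referencing the resolver from an IngressRoute. A sketch, where the `grafana` Service name and `grafana.example.com` hostname are placeholders:

```yaml
# Hypothetical example: expose a `grafana` Service over TLS
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
  - websecure
  routes:
  - match: Host(`grafana.example.com`)
    kind: Rule
    services:
    - name: grafana
      port: 80
  tls:
    certResolver: cloudflare   # the resolver configured in values.yaml
```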
### Monitoring Stack

You can't run K8s blind. Prometheus + Grafana are essential.

```yaml
# monitoring/kube-prometheus-stack.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 45.x
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
```

This gives you:
- Prometheus (metrics collection)
- Grafana (visualization)
- AlertManager (alerting)
- Node Exporter (node metrics)
- Kube-State-Metrics (K8s metrics)
## Real Workloads

What actually runs on this thing?

### Media Stack

```yaml
# apps/media/plex.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: plex
spec:
  selector:
    matchLabels:
      app: plex
  template:
    metadata:
      labels:
        app: plex
    spec:
      containers:
      - name: plex
        image: plexinc/pms-docker:latest
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
        volumeMounts:
        - name: media
          mountPath: /media
        - name: config
          mountPath: /config
      volumes:
      - name: media
        persistentVolumeClaim:
          claimName: plex-media    # example name: NFS-backed PVC
      - name: config
        persistentVolumeClaim:
          claimName: plex-config   # example name: Longhorn-backed PVC
```

Plex with GPU passthrough for transcoding. Sonarr, Radarr, and Prowlarr handle the automation.
### Home Automation

```yaml
# apps/home-assistant.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: home-assistant
spec:
  selector:
    matchLabels:
      app: home-assistant
  template:
    metadata:
      labels:
        app: home-assistant
    spec:
      hostNetwork: true   # pod-level field, required for mDNS discovery
      containers:
      - name: home-assistant
        image: homeassistant/home-assistant:stable
        volumeMounts:
        - name: config
          mountPath: /config
      volumes:
      - name: config
        persistentVolumeClaim:
          claimName: home-assistant-config   # example PVC name
```

Home Assistant with host networking for mDNS discovery. Note that `hostNetwork` is a pod-level field, not a container field.
### Password Management

```yaml
# apps/vaultwarden.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vaultwarden
spec:
  selector:
    matchLabels:
      app: vaultwarden
  template:
    metadata:
      labels:
        app: vaultwarden
    spec:
      containers:
      - name: vaultwarden
        image: vaultwarden/server:latest
        env:
        - name: SIGNUPS_ALLOWED
          value: "false"
```

A Bitwarden-compatible server. My passwords, my infrastructure.
### DNS Ad Blocking

```yaml
# apps/pihole.yaml
apiVersion: v1
kind: Service
metadata:
  name: pihole
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.1.200
  selector:
    app: pihole
  ports:
  - name: dns-udp
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
```

Pi-hole as the network DNS. Every device gets ad-blocking without any per-device configuration.
## GitOps Workflow

The workflow is stupid simple:

```bash
# Make a change
vim apps/media/plex.yaml

# Commit and push
git add . && git commit -m "increase plex memory" && git push

# Wait ~30 seconds; ArgoCD syncs automatically
```

No `kubectl apply`. No manual cluster changes. Git is the single source of truth.
## Handling Secrets

Never commit plain secrets. Use Sealed Secrets:

```bash
# Generate the secret manifest locally (don't apply it to the cluster)
kubectl create secret generic db-password \
  --from-literal=password=hunter2 \
  --dry-run=client -o yaml > db-password-secret.yaml

# Seal it (encrypt with the cluster's public key)
kubeseal --format=yaml < db-password-secret.yaml > sealed-secret.yaml

# Now safe to commit
git add sealed-secret.yaml && git commit -m "add db secret"
```

The sealed secret can only be decrypted by the controller running in your cluster.
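What lands in Git is a SealedSecret resource whose data is ciphertext. A sketch of its shape (the `encryptedData` value shown is a placeholder, not real output):

```yaml
# sealed-secret.yaml (sketch)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-password
  namespace: default
spec:
  encryptedData:
    password: AgB4...   # opaque ciphertext, safe to publish
```

When applied, the controller decrypts it into a regular `Secret` named `db-password` that pods reference as usual.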
## Lessons from Production Incidents

### Incident 1: The Time I Deleted Everything

**What happened:** I ran `kubectl delete namespace default --grace-period=0` thinking I was on a test cluster. It was the production cluster.

**Impact:** Every workload gone. 50+ services.

**Recovery time:** 5 minutes.

**How:** ArgoCD noticed the drift and recreated everything from Git. The `selfHeal: true` setting saved my ass.
```yaml
syncPolicy:
  automated:
    prune: true
    selfHeal: true  # <-- THIS
```

**Lesson:** GitOps doesn't just make deployment easier. It makes recovery instant.
### Incident 2: The Mysterious Network Latency

**Symptoms:** 2-second delays between services. Intermittent. Frustrating.

**Without monitoring:** I'd have been guessing for hours.

**With Prometheus:** A Grafana dashboard showed elevated TCP retransmits on one node.

**Root cause:** One mini PC had a failing NIC. Not dead, just flaky.

**Fix:** Replaced the node. Prometheus confirmed latency dropped.

**Lesson:** Monitor before you need it. The metrics you ignore today are the debugging data you'll wish you had tomorrow.
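For reference, a PromQL query along these lines surfaces that kind of flaky-NIC behavior per node, assuming node-exporter's netstat collector is enabled (the default in kube-prometheus-stack):

```promql
# TCP segments retransmitted per second, broken out by node
rate(node_netstat_Tcp_RetransSegs[5m])
```

Graph it by instance and a failing NIC stands out as one line sitting well above the rest.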
### Incident 3: Certificate Expiration

**What happened:** Let's Encrypt certs expired. Internal services became unreachable.

**Why:** cert-manager wasn't auto-renewing because of a misconfigured DNS challenge.

**How I found it:** Alerts from Prometheus via AlertManager to Slack.

**Fix:** Corrected the Cloudflare API token permissions. Certs renewed within minutes.

**Lesson:** Alert on everything. Cert expiration at 3am should wake you up.
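If cert-manager is exporting metrics to Prometheus, that alert is a few lines of PrometheusRule. A sketch; the rule name and 14-day threshold are my choices:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-expiry
  namespace: monitoring
spec:
  groups:
  - name: certificates
    rules:
    - alert: CertificateExpiringSoon
      # Fires when any cert-manager certificate expires within 14 days
      expr: (certmanager_certificate_expiration_timestamp_seconds - time()) < 14 * 24 * 3600
      for: 1h
      labels:
        severity: critical
      annotations:
        summary: "Certificate {{ $labels.name }} expires in under 14 days"
```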
## Cost Comparison: Cloud vs Homelab
| Resource | Cloud (Monthly) | Homelab |
|---|---|---|
| 4 nodes, 64GB RAM | $350 (EKS + EC2) | $0 (paid upfront) |
| 500GB block storage | $50 (EBS) | $0 (local NVMe) |
| Load balancer | $18 (ALB) | $0 (Traefik) |
| 1TB data transfer | $90 | $0 (home internet) |
| Managed DNS | $0.50 (Route53) | $0 (Cloudflare free) |
| SSL certificates | $0 (ACM) | $0 (Let's Encrypt) |
| **Total monthly** | ~$500 | $12 (electricity) |
- **Break-even:** ~2 months
- **Year 1 savings:** ~$5,000 after the $900 hardware spend (~$5,900 in avoided cloud costs)
- **Year 2+ savings:** ~$6,000/year
## What's Next
The homelab evolves. Current experiments:
- GPU scheduling: Adding an RTX 3060 for ML inference workloads
- Multi-cluster: Setting up a staging cluster for pre-production testing
- Disaster recovery: Velero for cross-cluster backups
- Service mesh: Linkerd for zero-trust internal networking
## Conclusion
A homelab isn't about being cheap. It's about having a sandbox where:
- Mistakes are learning opportunities, not career-limiting incidents
- You control the entire stack
- Skills transfer directly to production environments
The Kubernetes knowledge that took me from "can deploy a pod" to "can architect multi-cluster GitOps platforms" came from hours of breaking and fixing my own cluster. That's education you can't buy.
Questions about building your own homelab? Hit me up on Twitter or check out my homelab repo.