Building a 3-Node Proxmox Cluster
Hardware selection, networking, storage, and lessons learned from building a production homelab.
Starting point
I started with a single Beelink SER7 running Proxmox. It handled development VMs and a few Docker containers, but as I moved business services onto the homelab, I needed redundancy and more compute.
The goal: a 3-node cluster that could survive a single node failure without losing any critical services.
Hardware selection
| Node | Hardware | Role |
|------|----------|------|
| pve-1 | Beelink SER7 (Ryzen 7 7840HS, 64GB) | General compute, Coolify |
| pve-2 | Beelink SER7 (Ryzen 7 7840HS, 64GB) | Databases, monitoring |
| pve-3 | Custom Ryzen build (Ryzen 9, 128GB, RTX 4070) | AI/ML workloads, GPU passthrough |
The SER7s are excellent value for homelab use — quiet, low power draw, and the 7840HS has enough cores for 15-20 containers each. The custom build handles anything GPU-bound: LLM inference, embedding generation, video transcoding.
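Forming the cluster itself takes only a few commands. A minimal sketch, assuming the cluster name `homelab` and a management IP of `192.168.10.11` for the first node (both placeholders):

```shell
# On pve-1: create the cluster (name is an assumption).
pvecm create homelab

# On pve-2 and pve-3: join via the first node's management IP.
# You'll be prompted for pve-1's root password.
pvecm add 192.168.10.11

# From any node: verify quorum and membership.
pvecm status
```

With three nodes, the cluster keeps quorum (2 of 3 votes) through any single node failure, which is exactly the failure mode this build is designed for.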
Networking
TP-Link Omada runs the network. Managed switches with VLANs separate traffic:
- VLAN 10: Management (Proxmox web UI, IPMI)
- VLAN 20: Production services
- VLAN 30: Development/staging
- VLAN 40: IoT and cameras (isolated)
Inter-VLAN routing happens at the gateway with strict firewall rules. Production can't reach management. IoT can't reach anything except its own VLAN and the internet.
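On the Proxmox side, a VLAN-aware bridge carries the trunk from the Omada switch. A sketch of `/etc/network/interfaces`, assuming `enp1s0` as the physical NIC and example management addresses:

```
# Trunk bridge: VMs attach here and tag their own VLAN per-NIC.
auto vmbr0
iface vmbr0 inet manual
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 10 20 30 40

# Host management interface on VLAN 10 (addresses are assumptions).
auto vmbr0.10
iface vmbr0.10 inet static
    address 192.168.10.11/24
    gateway 192.168.10.1
```

Each VM or container then gets its VLAN tag set on its virtual NIC in the Proxmox UI, so the guest never needs to know about tagging.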
Storage
Each node has local NVMe with ZFS mirrors. For shared storage, a TrueNAS instance exports NFS shares that Proxmox mounts for ISOs, backups, and container templates.
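Registering the TrueNAS export as cluster-wide storage is one `pvesm` call. A sketch, assuming the storage name, server IP, and export path below (all placeholders):

```shell
# Mount a TrueNAS NFS export on every node in the cluster,
# restricted to ISOs, backups, and container templates.
pvesm add nfs truenas-share \
    --server 192.168.10.20 \
    --export /mnt/tank/proxmox \
    --content iso,backup,vztmpl
```

Because the storage definition is cluster-wide, all three nodes see the same share without per-node configuration.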
I considered Ceph but decided against it — three nodes is the bare minimum Ceph will tolerate, which leaves no headroom to rebalance after a node failure, and the operational overhead isn't worth it at this scale. ZFS replication between nodes handles the redundancy I need.
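Proxmox has scheduled ZFS replication built in via `pvesr`. A sketch, assuming VM 101 should be mirrored to pve-2 every fifteen minutes (VM ID and schedule are placeholders):

```shell
# Replicate VM 101's ZFS datasets to pve-2 every 15 minutes.
# Job IDs take the form <vmid>-<number>.
pvesr create-local-job 101-0 pve-2 --schedule '*/15'

# Check replication state and last sync time.
pvesr status
```

After the initial full send, each run only transfers the incremental snapshot delta, so a 15-minute cadence is cheap even for large disks.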
Key lessons
Start with networking. VLANs and proper segmentation save hours of debugging later. Do it before you deploy a single VM.
ZFS wants RAM. Budget 1GB of RAM per 1TB of storage for ZFS ARC. The SER7s have 64GB each, which is more than enough for their local storage pools.
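The rule of thumb above translates directly into a `zfs_arc_max` value. A sketch using an assumed 2 TiB pool:

```shell
#!/bin/sh
# ~1 GiB of ARC per 1 TiB of pool storage (pool size is an assumed example).
pool_tib=2
arc_max_bytes=$(( pool_tib * 1024 * 1024 * 1024 ))
echo "$arc_max_bytes"
# To persist this cap on Proxmox, you'd write something like:
#   echo "options zfs zfs_arc_max=$arc_max_bytes" > /etc/modprobe.d/zfs.conf
#   update-initramfs -u
```

Capping the ARC matters on hypervisor nodes: without a limit, ZFS will happily consume half of RAM that you'd rather hand to VMs.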
GPU passthrough is finicky. IOMMU groups, vfio-pci binding, and ROM bar issues cost me a weekend. Document your working config and don't touch it.
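For the record, the working passthrough config boils down to a few files. A sketch — the PCI vendor:device IDs below are assumptions, so look up your own with `lspci -nn | grep -i nvidia`:

```shell
# 1. Kernel cmdline (GRUB): add "amd_iommu=on iommu=pt", then run update-grub.

# 2. Bind the GPU and its HDMI audio function to vfio-pci
#    (IDs are placeholders for an RTX 4070).
echo "options vfio-pci ids=10de:2786,10de:22bc" > /etc/modprobe.d/vfio.conf
echo "vfio-pci" >> /etc/modules
update-initramfs -u

# 3. After a reboot, confirm vfio-pci owns the card
#    (bus address 01:00.0 is also an assumption).
lspci -nnk -s 01:00.0 | grep 'Kernel driver in use'
```

Keeping this in version control alongside the IOMMU group listing is what "document your working config" looks like in practice.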
Monitoring from day one. Uptime Kuma watches every service. Better Stack provides external monitoring and alerting. If something goes down at 3 AM, I get a Telegram notification, not a surprise the next morning.
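Uptime Kuma is a one-container deployment. A sketch, assuming Docker and default port 3001 (volume name is a placeholder):

```shell
# Internal monitoring: Uptime Kuma with persistent data.
docker run -d --restart=always \
    -p 3001:3001 \
    -v uptime-kuma:/app/data \
    --name uptime-kuma \
    louislam/uptime-kuma:1
```

Pairing it with an external checker like Better Stack covers the blind spot: internal monitoring can't tell you when the whole site, or the monitor itself, is down.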
What I'd do differently
I'd buy all identical hardware. Mixing node specs makes resource scheduling awkward — you can't live-migrate a GPU-dependent VM to a node without a GPU. Homogeneous clusters are simpler.
I'd also set up Proxmox Backup Server from the start instead of retrofitting it. Backups are the most important thing in any infrastructure, and I was running for three months with only ZFS snapshots before adding proper offsite backups.
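Attaching a PBS datastore to the cluster is also a single `pvesm` call. A sketch — the server hostname, datastore name, and fingerprint below are all placeholders:

```shell
# Register a Proxmox Backup Server datastore as cluster storage.
pvesm add pbs pbs-offsite \
    --server pbs.example.lan \
    --datastore homelab \
    --username backup@pbs \
    --fingerprint '<server certificate fingerprint>'
```

From there, scheduled backup jobs target `pbs-offsite` like any other storage, with deduplication and incremental transfers handled by PBS rather than full-image dumps.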