High Availability K3s Cluster Setup
This guide walks you through setting up a highly available K3s cluster that uses embedded etcd for a true HA configuration. We'll use Keepalived and HAProxy with a floating IP address to load balance the API server and eliminate single points of failure.
Architecture Overview
Our HA K3s setup consists of:
- 3 control plane nodes running K3s server with embedded etcd
- 3 worker nodes for running workloads
- 2 dedicated load balancer nodes running Keepalived and HAProxy
- Keepalived for floating IP management and failover
- HAProxy for load balancing K3s API server traffic
- Floating IP (`192.168.1.10`) to eliminate single points of failure
Network Architecture Diagram
```
Clients (kubectl, k9s, apps)
            │
            ▼
Floating IP (VIP)  192.168.1.10:6443  (managed by Keepalived)
            │
            ▼
Load Balancer Layer
  ├─ lb-01  192.168.1.8   Keepalived MASTER, HAProxy instance
  └─ lb-02  192.168.1.9   Keepalived BACKUP, HAProxy instance
            │
            ▼
K3s Control Plane
  ├─ k3s-server-01  192.168.1.11  control plane + etcd
  ├─ k3s-server-02  192.168.1.12  control plane + etcd
  └─ k3s-server-03  192.168.1.13  control plane + etcd
            │
            ▼
Embedded etcd Cluster (distributed across servers 01, 02, 03)
            │
            ▼
K3s Worker Nodes
  ├─ k3s-worker-01  192.168.1.21  agent node, workload execution
  ├─ k3s-worker-02  192.168.1.22  agent node, workload execution
  └─ k3s-worker-03  192.168.1.23  agent node, workload execution
```
Component Details
| Component | Purpose | IP Address | Port | High Availability |
| --- | --- | --- | --- | --- |
| Floating IP | Single entry point | 192.168.1.10 | 6443 | Managed by Keepalived |
| lb-01 | Load balancer | 192.168.1.8 | 6443 | Keepalived MASTER |
| lb-02 | Load balancer | 192.168.1.9 | 6443 | Keepalived BACKUP |
| k3s-server-01 | Control plane + etcd | 192.168.1.11 | 6443 | Active member |
| k3s-server-02 | Control plane + etcd | 192.168.1.12 | 6443 | Active member |
| k3s-server-03 | Control plane + etcd | 192.168.1.13 | 6443 | Active member |
| k3s-worker-01 | Agent node | 192.168.1.21 | 10250 | Workload distribution |
| k3s-worker-02 | Agent node | 192.168.1.22 | 10250 | Workload distribution |
| k3s-worker-03 | Agent node | 192.168.1.23 | 10250 | Workload distribution |
| Keepalived | IP failover | lb-01, lb-02 | VRRP | 2-node redundancy |
| HAProxy | Load balancing | lb-01, lb-02 | 6443 | 2-instance redundancy |
Traffic Flow
- Client Connection: Clients connect to the floating IP `192.168.1.10:6443`
- Keepalived: Routes traffic to the active load balancer (normally lb-01)
- HAProxy: Load balances incoming API requests across all three K3s control plane servers
- K3s API: Control plane processes requests and maintains cluster state via embedded etcd
- Worker Communication: Worker nodes connect to the floating IP for resilient control plane access
- Workload Distribution: Pods and services are scheduled across the worker nodes for actual workloads
- Failover: If the active load balancer fails, Keepalived automatically promotes the backup load balancer
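Once the cluster is built (Steps 3 through 5 below), you can see this flow end to end by probing the API server through the VIP rather than through any individual node. A rough sketch; the exact HTTP response depends on your cluster's authentication settings:

```bash
# An HTTP reply (even 401 Unauthorized) shows that Keepalived and HAProxy
# forwarded the request to a live control plane node
curl -k https://192.168.1.10:6443/version
```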
Prerequisites
- Proxmox VE environment with VM template (see Proxmox Ubuntu VM Template Setup Guide)
- 8 Ubuntu VMs total:
  - 2 for load balancers (1 CPU core, 512 MB RAM minimum)
  - 3 for K3s servers (2 CPU cores, 4 GB RAM minimum)
  - 3 for K3s agents (2 CPU cores, 8 GB RAM minimum)
- Network access between all nodes
- Basic understanding of Kubernetes concepts
Step 1: Prepare Load Balancer VMs
- **Clone the VM template for the load balancers**

  Create 2 VMs from your Ubuntu template for the dedicated load balancers:

  - VM IDs: 201, 202
  - Names: `lb-01`, `lb-02`
  - Clone Mode: Full clone

- **Configure load balancer hardware resources**

  Navigate to the Hardware tab for each VM and adjust:

  - Memory: 512 MB (should be sufficient for HAProxy and Keepalived)
  - Processors: 1 core
  - Disk: 10 GB

- **Configure load balancer network settings**

  In the Cloud-Init tab, set static IP addresses:

  - lb-01: `192.168.1.8/24`
  - lb-02: `192.168.1.9/24`
  - Gateway: `192.168.1.1`

- **Start and verify load balancer VMs**

  Start both VMs and verify connectivity, as shown in the check below.
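A quick connectivity check from your workstation might look like the following (assuming Cloud-Init configured root SSH access, as in the later verification steps):

```bash
# Confirm both load balancer VMs answer on their static IPs
ping -c 3 192.168.1.8
ping -c 3 192.168.1.9

# Confirm SSH access to each node
ssh root@192.168.1.8 hostname
ssh root@192.168.1.9 hostname
```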
Step 2: Prepare K3s Server VMs
- **Clone the VM template**

  Create 3 VMs from your Ubuntu template, ideally distributed across Proxmox nodes for hardware redundancy:

  - VM IDs: 301, 302, 303
  - Names: `k3s-server-01`, `k3s-server-02`, `k3s-server-03`
  - Clone Mode: Full clone

- **Configure hardware resources**

  Navigate to the Hardware tab for each VM and adjust:

  - Memory: 4096 MB (4 GB)
  - Processors: 2 cores
  - Disk: Ensure adequate space (minimum 20 GB recommended)

- **Configure network settings**

  In the Cloud-Init tab, set static IP addresses:

  - k3s-server-01: `192.168.1.11/24`
  - k3s-server-02: `192.168.1.12/24`
  - k3s-server-03: `192.168.1.13/24`
  - Gateway: `192.168.1.1`

- **Start and verify VMs**

  Start all VMs and verify:

  - Correct IP addresses are assigned
  - SSH access is working
  - Internet connectivity is available

  ```bash
  # Test connectivity to each node
  ssh root@192.168.1.11
  ssh root@192.168.1.12
  ssh root@192.168.1.13
  ```
Step 3: Install and Configure Keepalived
Keepalived manages our floating IP address and provides automatic failover between the dedicated load balancer nodes.
- **Install Keepalived**

  Run on both `lb-01` and `lb-02`:

  ```bash
  sudo apt update
  sudo apt install keepalived -y
  ```

- **Create the Keepalived configuration**

  Create the configuration file on both VMs:

  ```bash
  sudo nano /etc/keepalived/keepalived.conf
  ```

- **Configure the master node (lb-01)**

  ```
  vrrp_instance VI_1 {
      state MASTER
      interface eth0
      virtual_router_id 55
      priority 255
      advert_int 1
      authentication {
          auth_type PASS
          auth_pass SuPeRsEcReT
      }
      virtual_ipaddress {
          192.168.1.10/24
      }
  }
  ```

- **Configure the backup node (lb-02)**

  ```
  vrrp_instance VI_1 {
      state BACKUP
      interface eth0
      virtual_router_id 55
      priority 150
      advert_int 1
      authentication {
          auth_type PASS
          auth_pass SuPeRsEcReT
      }
      virtual_ipaddress {
          192.168.1.10/24
      }
  }
  ```

- **Start and enable Keepalived**

  On both load balancer nodes:

  ```bash
  sudo systemctl enable --now keepalived.service
  sudo systemctl status keepalived.service
  ```

- **Test failover functionality**

  ```bash
  # Test basic connectivity to the load balancer nodes
  ping -c 3 192.168.1.8
  ping -c 3 192.168.1.9

  # Test the floating IP
  ping -c 3 192.168.1.10
  ```

  Test failover by stopping Keepalived on the master:

  ```bash
  # On lb-01 (master)
  sudo systemctl stop keepalived.service

  # The floating IP should still respond
  ping -c 3 192.168.1.10

  # Check which node took over on lb-02
  sudo systemctl status keepalived.service

  # Restart the service on lb-01
  sudo systemctl start keepalived.service
  ```
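If you want to confirm directly which node holds the VIP during the failover test, you can check the interface addresses and follow the VRRP state changes (a minimal sketch, assuming the NIC is `eth0` as in the configuration above):

```bash
# The node that currently owns the VIP lists 192.168.1.10 on its interface
ip addr show eth0 | grep 192.168.1.10

# Follow the VRRP state transitions (MASTER/BACKUP) in the logs
journalctl -u keepalived -f
```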
Step 4: Install and Configure HAProxy
HAProxy will load balance the K3s API server traffic across all three K3s server nodes.
- **Install HAProxy**

  Run on both `lb-01` and `lb-02`:

  ```bash
  sudo apt update
  sudo apt install haproxy -y
  ```

- **Configure HAProxy**

  Edit the HAProxy configuration on both load balancer VMs:

  ```bash
  sudo nano /etc/haproxy/haproxy.cfg
  ```

  Add the following configuration at the end of the file:

  ```
  # K3s API Server Load Balancer
  frontend k3s_frontend
      bind *:6443
      mode tcp
      default_backend k3s_backend

  backend k3s_backend
      mode tcp
      option tcp-check
      balance roundrobin
      server k3s-server-01 192.168.1.11:6443 check
      server k3s-server-02 192.168.1.12:6443 check
      server k3s-server-03 192.168.1.13:6443 check
  ```

- **Restart and enable HAProxy**

  Run on both load balancer VMs:

  ```bash
  sudo systemctl restart haproxy
  sudo systemctl enable haproxy
  sudo systemctl status haproxy
  ```

- **Verify the HAProxy configuration**

  Check for configuration errors:

  ```bash
  sudo haproxy -f /etc/haproxy/haproxy.cfg -c
  ```
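As an extra sanity check, you can confirm that HAProxy is listening on the API port and that the VIP forwards TCP connections. A rough sketch (`nc` assumes netcat is installed; HAProxy listens on the port even before the K3s servers exist, its health checks simply mark the backends down):

```bash
# Confirm HAProxy is listening on port 6443 on each load balancer
sudo ss -tlnp | grep 6443

# From any machine on the network, confirm the VIP accepts connections on 6443
nc -zv 192.168.1.10 6443
```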
Step 5: Install K3s Cluster
Now we’ll install K3s with embedded etcd to create our highly available cluster.
- **Initialize the first control plane node**

  On `k3s-server-01`, run:

  ```bash
  curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server \
      --cluster-init \
      --tls-san=192.168.1.10
  ```

- **Wait for the first node to be ready**

  Verify the first node is running:

  ```bash
  sudo systemctl status k3s
  sudo kubectl get nodes
  ```

- **Join the additional control plane nodes**

  On `k3s-server-02` and `k3s-server-03`, run:

  ```bash
  curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server \
      --server https://192.168.1.11:6443 \
      --tls-san=192.168.1.10
  ```

- **Verify cluster status**

  Check that all nodes have joined:

  ```bash
  sudo kubectl get nodes -o wide
  sudo kubectl get pods -A
  ```
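Because every server was started with `--tls-san=192.168.1.10`, the API server certificate is also valid for the floating IP, so you can manage the cluster from a workstation through the VIP. A minimal sketch, assuming `kubectl` is installed locally and root SSH access is available:

```bash
# Copy the kubeconfig that K3s generated on the first server
scp root@192.168.1.11:/etc/rancher/k3s/k3s.yaml ~/.kube/config

# Point it at the floating IP instead of the node-local address
sed -i 's/127.0.0.1/192.168.1.10/' ~/.kube/config

# kubectl now reaches the API through Keepalived and HAProxy
kubectl get nodes
```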
Step 6: Configure Agent (Worker) Nodes
Worker nodes are responsible for running your actual workloads (pods, containers). They join the cluster as agent nodes and communicate with the control plane via the floating IP for high availability.
- **Clone the VM template for the worker nodes**

  Create 3 worker VMs from the Ubuntu template, distributed across Proxmox nodes:

  - VM IDs: 311, 312, 313
  - Names: `k3s-worker-01`, `k3s-worker-02`, `k3s-worker-03`
  - Clone Mode: Full clone
  - Distribution: Place one worker on each Proxmox node alongside the control plane nodes

- **Configure worker hardware resources**

  Navigate to the Hardware tab for each worker VM and adjust:

  - Memory: 24576 MB (24 GB; adjust based on your workload requirements)
  - Processors: 2 cores (adjust based on your workload requirements)
  - Disk: 32 GB (consider more storage if using Longhorn or another persistent storage solution)

- **Configure worker network settings**

  In the Cloud-Init tab, set static IP addresses:

  - k3s-worker-01: `192.168.1.21/24`
  - k3s-worker-02: `192.168.1.22/24`
  - k3s-worker-03: `192.168.1.23/24`
  - Gateway: `192.168.1.1`
- **Start and verify worker VMs**

  Start all worker VMs and verify connectivity:

  ```bash
  # Test connectivity to each worker node
  ssh root@192.168.1.21
  ssh root@192.168.1.22
  ssh root@192.168.1.23
  ```

- **Install the K3s agent on the worker nodes**

  On each worker node (`k3s-worker-01`, `k3s-worker-02`, `k3s-worker-03`), run:

  ```bash
  curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - agent \
      --server https://192.168.1.10:6443
  ```

- **Verify the cluster with worker nodes**

  From any control plane node, verify all nodes have joined:

  ```bash
  # Check that all nodes are Ready
  sudo kubectl get nodes -o wide

  # Should show something like:
  # NAME            STATUS   ROLES                       AGE   VERSION
  # k3s-server-01   Ready    control-plane,etcd,master   45m   v1.28.x+k3s1
  # k3s-server-02   Ready    control-plane,etcd,master   40m   v1.28.x+k3s1
  # k3s-server-03   Ready    control-plane,etcd,master   35m   v1.28.x+k3s1
  # k3s-worker-01   Ready    <none>                      5m    v1.28.x+k3s1
  # k3s-worker-02   Ready    <none>                      4m    v1.28.x+k3s1
  # k3s-worker-03   Ready    <none>                      3m    v1.28.x+k3s1

  # Check cluster info
  sudo kubectl cluster-info

  # Verify system pods are running
  sudo kubectl get pods -A
  ```

- **Test workload scheduling**

  Deploy a test application to verify the workers can run pods:

  ```bash
  # Create a test deployment
  sudo kubectl create deployment nginx-test --image=nginx --replicas=3

  # Check pod distribution across the worker nodes
  sudo kubectl get pods -o wide

  # Clean up the test deployment
  sudo kubectl delete deployment nginx-test
  ```
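Finally, it is worth confirming that the control plane stays reachable when a load balancer drops out, since that is the failure mode this whole setup guards against. A minimal sketch, assuming your workstation's kubeconfig points at the floating IP as described after Step 5:

```bash
# On lb-01: take the active load balancer out of service
sudo systemctl stop keepalived haproxy

# From your workstation: the API should still answer via lb-02 and the VIP
kubectl get nodes

# On lb-01: restore the services afterwards
sudo systemctl start haproxy keepalived
```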
Next Steps
- Configure persistent storage with Longhorn for high availability data persistence
- Set up monitoring and logging
- Implement backup strategies for your cluster data
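If you continue with Longhorn for persistent storage, the install is typically a single Helm release. A minimal sketch, assuming Helm is available on your workstation, your kubeconfig points at the cluster, and `open-iscsi` is installed on the worker nodes (a Longhorn requirement):

```bash
# Add the official Longhorn chart repository and install into its own namespace
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace

# Watch the Longhorn components come up
kubectl -n longhorn-system get pods --watch
```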