
Setting up a bare-metal cluster - part 2


The story so far

In the first part (part 1) we set up and provisioned three servers to serve as nodes in the cluster. We have a virtual IP managed by Keepalived and an HA etcd cluster that we will use as the external datastore for the k3s cluster. A few other things were installed and configured with Ansible as well; among them Docker, though in principle we could have simply let k3s use its default containerd runtime.

Let's first have a look at the nodes and services after having run Ansible, to make sure everything is as expected. Then we'll install k3s to turn these servers into a Kubernetes cluster.

Keepalived - virtual IP

Since we defined Helena as the master, with the other two nodes as backups, we expect the virtual IP (VIP) to be attached to Helena initially. Logging into the hosts, we can check this:

helena:~$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.85/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 597020sec preferred_lft 597020sec
    inet 10.0.0.84/24 scope global secondary eno1
       valid_lft forever preferred_lft forever

pollux:/var/log$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.86/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 828252sec preferred_lft 828252sec

castor:~$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.87/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 852089sec preferred_lft 852089sec

Now, if anything particularly bad happens to host Helena, the VIP will be taken over by one of the other nodes. We can check this by manually stopping the Keepalived daemon on Helena:

helena:~$ sudo service keepalived stop

pollux:/var/log$ sudo service keepalived status
...
Dec 08 19:00:17 pollux Keepalived_vrrp[10939]: (APIServerVIP) Backup received priority 0 advertisement
Dec 08 19:00:18 pollux Keepalived_vrrp[10939]: (APIServerVIP) Entering MASTER STATE

pollux:/var/log$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.86/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 828156sec preferred_lft 828156sec
    inet 10.0.0.84/24 scope global secondary eno1
       valid_lft forever preferred_lft forever

helena:~$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.85/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 595801sec preferred_lft 595801sec

To fail back, we simply start the daemon on Helena again:

helena:~$ sudo service keepalived start

Dec 08 19:03:41 pollux Keepalived_vrrp[10939]: (APIServerVIP) Master received advert from 10.0.0.85 with higher priority 100, ours 99
Dec 08 19:03:41 pollux Keepalived_vrrp[10939]: (APIServerVIP) Entering BACKUP STATE

helena:~$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.85/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 595691sec preferred_lft 595691sec
    inet 10.0.0.84/24 scope global secondary eno1
       valid_lft forever preferred_lft forever
pollux:/var/log$ ip -4 addr show dev eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.86/24 brd 10.0.0.255 scope global dynamic eno1
       valid_lft 827867sec preferred_lft 827867sec

Pollux realises that Helena is back up and yields to the higher priority. In general, failover is fast, typically a matter of a few seconds (a few missed VRRP advertisements).
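
The priorities and the instance name come from the Keepalived configuration that Ansible rendered in part 1. As a reminder, a minimal vrrp_instance for Helena could look roughly like this; the router ID and authentication values below are placeholders, only the interface, VIP and priorities match what we saw above:

#/etc/keepalived/keepalived.conf (sketch)
vrrp_instance APIServerVIP {
    state MASTER              # BACKUP on Pollux and Castor
    interface eno1
    virtual_router_id 51      # placeholder; must be identical on all three nodes
    priority 100              # e.g. 99 on Pollux, 98 on Castor
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass somesecret  # placeholder
    }
    virtual_ipaddress {
        10.0.0.84/24
    }
}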

Pros and cons

Using a virtual IP address in front of our cluster has some pros and cons. A more widely used solution would be to run an external load balancer such as HAProxy or Nginx. An external load balancer has the big advantage of not only providing redundancy, but also spreading incoming requests across the nodes, that is, doing actual load balancing. For bare-metal clusters there is also a great project called MetalLB (Metal Load Balancer). If you're in the cloud, the provider will offer a perfectly fine external load balancer.

Using a virtual IP address gives us redundancy only, no load balancing, so all traffic will always go through the current master. For our purposes, a local home lab / development setup, this is fine. It does have the advantage of making the cluster self-contained, as it doesn't depend on external infrastructure. In the same way, using the local etcd cluster rather than an external database for the Kubernetes install also keeps the cluster more self-contained.

 

Etcd cluster

Since etcd uses certificates for access, we need to provide the server certificate and key as well as the certificate authority's certificate when accessing the cluster. To make life simpler, we can add a small convenience script on each host (made executable with chmod +x):

helena:~$ cat etcdctl.sh
/opt/etcd/bin/etcdctl --cacert /etc/etcd/ssl/ca.crt --cert /etc/etcd/ssl/server.crt --key /etc/etcd/ssl/server.key "$@"

Now we can check to make sure our cluster is up and healthy:

helena:~/scripts$ sudo ./etcdctl.sh --write-out=table member list
+------------------+---------+--------+------------------------+------------------------+------------+
|        ID        | STATUS  |  NAME  |       PEER ADDRS       |      CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+------------------------+------------------------+------------+
| d13d4f984ec3fcf7 | started | helena | https://10.0.0.85:2380 | https://10.0.0.85:2379 |      false |
| e491571ced5af241 | started | pollux | https://10.0.0.86:2380 | https://10.0.0.86:2379 |      false |
| fcbb30413deea276 | started | castor | https://10.0.0.87:2380 | https://10.0.0.87:2379 |      false |
+------------------+---------+--------+------------------------+------------------------+------------+
helena:~/scripts$ sudo ./etcdctl.sh --write-out=table endpoint health --cluster
+------------------------+--------+-------------+-------+
|        ENDPOINT        | HEALTH |    TOOK     | ERROR |
+------------------------+--------+-------------+-------+
| https://10.0.0.85:2379 |   true | 41.540932ms |       |
| https://10.0.0.86:2379 |   true | 47.588864ms |       |
| https://10.0.0.87:2379 |   true | 47.968092ms |       |
+------------------------+--------+-------------+-------+
helena:~/scripts$ sudo ./etcdctl.sh --write-out=table --endpoints=https://10.0.0.86:2379 endpoint status
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.0.0.86:2379 | e491571ced5af241 |  3.4.13 |   93 MB |     false |      false |      2109 |   22953058 |           22953058 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Looking good so far.

While we will be using the cluster mainly as the k3s backend, we can play around with it a bit first to verify that it works.

#Put foo into the cluster:
helena:~/scripts$ sudo ./etcdctl.sh put foo '{id:1, name: "Bob", occupation: "President"}'
OK

#Retrieve foo from the cluster on another host
pollux:~/scripts$ sudo ./etcdctl.sh get foo
foo
{id:1, name: "Bob", occupation: "President"}

#We can watch for changes (here interactively):
pollux:~/scripts$ sudo ./etcdctl.sh watch -i
watch foo
PUT
foo
{id:1, name: "Bob", occupation: "King"}
^C

#While watching on Pollux, we updated foo on Helena:
helena:~/scripts$ sudo ./etcdctl.sh put foo '{id:1, name: "Bob", occupation: "King"}'
OK

#Keys can be linked to a lease
helena:~/scripts$ sudo ./etcdctl.sh lease grant 120
lease 7cf775c3dec6257a granted with TTL(120s)
helena:~/scripts$ sudo ./etcdctl.sh put footime "{still here}" --lease 7cf775c3dec6257a
OK
helena:~/scripts$ sudo ./etcdctl.sh get footime
footime
{still here}

#Watching for footime on Pollux; when the 120s lease expires, etcd deletes the key:
pollux:~/scripts$ sudo ./etcdctl.sh watch -i
watch footime
PUT
footime
{still here}
DELETE
footime

helena:~/scripts$ sudo ./etcdctl.sh get footime
#(no result)
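
Had we wanted the key to stay around, we could have refreshed the lease before it expired; lease keep-alive blocks and keeps renewing the TTL until interrupted (using the lease ID granted above):

#Keep the lease, and any keys attached to it, alive
helena:~/scripts$ sudo ./etcdctl.sh lease keep-alive 7cf775c3dec6257a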

By itself, etcd is a great piece of technology; you can read more on the official site: etcd.io

 

Installing k3s in high availability mode

Everything is now ready to install k3s on the cluster. We'll use the installation script as documented on Rancher's website, setting a couple of environment variables beforehand to customize our install.

#Download the installation script and make it executable
helena:~/k3s$ curl -sfL https://get.k3s.io > install.sh
helena:~/k3s$ chmod +x install.sh

helena:~/k3s$ cat settings.sh
export K3S_DATASTORE_ENDPOINT='https://10.0.0.85:2379,https://10.0.0.86:2379,https://10.0.0.87:2379'
export K3S_DATASTORE_CAFILE='/etc/etcd/ssl/ca.crt'
export K3S_DATASTORE_CERTFILE='/etc/etcd/ssl/server.crt'
export K3S_DATASTORE_KEYFILE='/etc/etcd/ssl/server.key'
export INSTALL_K3S_EXEC='--docker --tls-san 10.0.0.84'

#source the settings file
helena:~/k3s$ . settings.sh

First of all, we set the K3S_DATASTORE_* variables that define our etcd cluster, specifying the endpoints and the certificates to use.

Next, we provide the install options. We will use Docker as the container engine (optional; we could leave this out and use the default containerd). We also specify that the virtual IP (10.0.0.84) should be included in the k3s certificate (--tls-san 10.0.0.84). This is necessary to be able to connect to the API server using the VIP.

Onward with the install:

marcel@helena:~/k3s$ ./install.sh


#After a little while:
marcel@helena:~/k3s$ sudo kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
helena   Ready    master   2m6s   v1.19.3+k3s2

#After running the install on all three nodes:
marcel@helena:~$ kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE     VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION    CONTAINER-RUNTIME
castor   Ready    master   5m14s   v1.19.3+k3s2   10.0.0.87     <none>        Debian GNU/Linux 10 (buster)   4.19.0-12-amd64   docker://19.3.13
helena   Ready    master   13m     v1.19.3+k3s2   10.0.0.85     <none>        Debian GNU/Linux 10 (buster)   4.19.0-12-amd64   docker://19.3.13
pollux   Ready    master   8m11s   v1.19.3+k3s2   10.0.0.86     <none>        Debian GNU/Linux 10 (buster)   4.19.0-12-amd64   docker://19.3.13

Success!

We can quickly verify that k3s is actually using the etcd store by retrieving some of the keys under the pod registry:

helena:~/scripts$ sudo ./etcdctl.sh get /registry/pods/ --prefix=true -w json |jq
#Several pages of json showing pods
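
We can also check that the virtual IP really ended up in the API server's certificate, which is what the --tls-san option was for. One way to do this is with openssl; the exact SAN list will differ per install, but 10.0.0.84 should be in it:

helena:~$ echo | openssl s_client -connect 10.0.0.84:6443 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
#The output should include IP Address:10.0.0.84 among the SANs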

Kubectl

K3s writes the cluster's kubeconfig to /etc/rancher/k3s/k3s.yaml, so if we want to connect from our devbox or another management tool, we'll need the connection info and certificates from there. We'll also update the server address to use our virtual IP. After copying the file, either merge it into your ~/.kube/config, or use it directly by setting KUBECONFIG.

devbox1:~$ vi k3s.yaml
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <KEY>
    server: https://10.0.0.84:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    client-certificate-data: <KEY>
    client-key-data: <KEY>
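
As a concrete example of the second option, we can fetch the file and point kubectl at it directly. The hostnames and paths below are just an example; note that the original on the node is only readable by root:

#On the node, make a copy of the kubeconfig that our own user can read
helena:~$ sudo cp /etc/rancher/k3s/k3s.yaml ~/k3s.yaml && sudo chown $USER ~/k3s.yaml
#Fetch it to the devbox and point kubectl at it for this shell session
devbox1:~$ scp marcel@10.0.0.85:k3s.yaml .
devbox1:~$ export KUBECONFIG=$PWD/k3s.yaml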

Now we can access our cluster from the outside:

devbox1:~$ kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
castor   Ready    master   26d   v1.19.3+k3s2
helena   Ready    master   26d   v1.19.3+k3s2
pollux   Ready    master   26d   v1.19.3+k3s2

Finalizing and further steps

First of all, we should now be able to set vip_check_kubernetes: True and run the provisioning again with Ansible. The only change is that the Keepalived daemon will now use the health of the k3s API on port 6443 as its health check.
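
As a rough sketch of what such a check could look like in the Keepalived configuration (the role in the playbook may implement it differently, and the unauthenticated /healthz probe assumes anonymous auth is left at its default):

vrrp_script chk_k3s_apiserver {
    script "/usr/bin/curl -sfk --max-time 2 https://localhost:6443/healthz -o /dev/null"
    interval 5
    fall 2     # mark the API as down after two failed checks
    rise 2
}
#...and referenced from the vrrp_instance with:
#    track_script { chk_k3s_apiserver }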

GitHub

I've added the Ansible playbook to github.com/pmvrolijk/k3s-cluster, with an additional role that also installs the k3s masters, so using it will get you to the end of this part in one go.

In the next part, we'll install Rancher (the cluster management tool) itself on the cluster and set up some storage. Stay tuned.  
