In the previous post we had a quick overview of the history of web development and of how and why virtualization became such a huge success, all of this without a single line of code.
Impressive, the power of the mumbo jumbo we can deliver...
In this post we will explore in a bit more detail the first steps of a long journey: Kubernetes on a cluster of Raspberry Pis.
Hardware, here we go..
So, evidently, the first step was to buy the hardware. There isn't much to say here. It was basically just buying a set of 5 Raspberry Pi 3 computers and connecting them over a Local Area Network.
In the end this was the deal:
- 5 Raspberry Pi 3 computers
- 5 32 GB SD cards
- 5 RJ45 Cat5e cables
- Cisco switch with 16 Ethernet ports
A little bit about packets
I should tell you that the switch is a key piece of this. You know, routers do their magical work at layer 3 (the network layer), because routers need to route (you don't say). To route traffic, routers keep several tables: routing tables and, on a typical home router, NAT tables that track every open connection. In a nutshell, routers do a lot of work per packet, and you will notice any extra work you put them under.
A switch is a much dumber beast and, because of that, much faster. Switches don't care about routing; they delegate it to routers. Instead they just switch frames based on the media access control (MAC) address. A switch operates at layer 2 (the data link layer).
So why the f*?k is this important?
You see, by being a lot dumber than routers, switches also do a lot less work per packet, so the trade-off is that they can process a lot more packets for the same amount of CPU. The sad part is that a switch doesn't route traffic. The good part is that LAN-connected devices don't need routing to talk to each other.
But a question remains.
If the switch doesn't route the traffic of the devices, how the hell will the nodes connect to the outside?
Well, it turns out that one of the devices plugged into the switch can be a router! That is the trick. It is also the reason why you see a router/switch combination in most network implementations.
Below you can see the network topology.
Bring me the juice
Now that we have all the hardware, it's time for the most boring of tasks: setting up a Linux distribution on all five SD cards. But which Linux distribution?
Actually, any Linux distribution would be viable, but it turns out there is a community project that already does a lot of the work for us when it comes to provisioning Kubernetes on ARM-based processors like the Raspberry Pi: HypriotOS. They also have a very nice set of tutorials that explain how to set up a Kubernetes cluster.
The only downside is that this is very tedious work, and I would need to repeat it at least five times. Moreover, if I need to add a new node to the cluster in the future, I would have to repeat all of the steps again. If I forget just one step, I could end up with a heterogeneous setup that creates nasty, hard-to-spot bugs.
I need a better idea. Yes, you guessed it: Ansible.
This is the current folder structure of the Ansible automation code:
    ├── config.yml
    ├── hosts
    ├── kubeconfig
    │   └── kuber5.local
    │       └── etc
    │           └── kubernetes
    │               └── admin.conf
    ├── master.yml
    ├── nodes.yml
    ├── postsetup.yml
    ├── roles
    │   ├── applications
    │   │   ├── files
    │   │   │   ├── dashboard
    │   │   │   │   └── dashboard.yaml
    │   │   │   ├── flannel
    │   │   │   │   ├── kube-flannel-rbac.yaml
    │   │   │   │   └── kube-flannel.yaml
    │   │   │   ├── heapster
    │   │   │   │   ├── heapster-rbac.yaml
    │   │   │   │   └── heapster.yaml
    │   │   │   ├── influx
    │   │   │   │   ├── grafana.yaml
    │   │   │   │   └── influxdb.yaml
    │   │   │   └── traefik
    │   │   │       └── traefik-with-rbac.yaml
    │   │   └── tasks
    │   │       └── main.yml
    │   ├── base
    │   │   ├── defaults
    │   │   │   └── main.yml
    │   │   ├── files
    │   │   │   ├── copy_hosts
    │   │   │   └── docker_profile.sh
    │   │   ├── tasks
    │   │   │   ├── apt.yml
    │   │   │   ├── main.yml
    │   │   │   ├── mountpoints.yml
    │   │   │   ├── swap.yml
    │   │   │   ├── system.yml
    │   │   │   ├── user.yml
    │   │   │   └── wifi.yml
    │   │   └── templates
    │   │       ├── hosts
    │   │       └── wpa_supplicant.conf
    │   ├── master
    │   │   ├── files
    │   │   │   └── set-configs.sh
    │   │   └── tasks
    │   │       └── main.yml
    │   ├── node
    │   │   ├── files
    │   │   │   └── join.sh
    │   │   └── tasks
    │   │       └── main.yml
    │   ├── presteps
    │   │   └── tasks
    │   │       └── main.yml
    │   └── update
    │       └── tasks
    │           └── main.yml
    ├── setup.yml
    └── update.yml
What the heck!? That's a lot of creepy stuff!
Well, yeah. But let's break it down, piece by piece.
Don't forget the simple idea behind it: we need to automate the several tasks involved in provisioning Kubernetes on all of the nodes.
We have a config.yml file that holds the general configuration we would rather not have scattered over several scripts.
    # Your timezone
    timezone: "Europe/Lisbon"
    arch: arm
    token: <cluster_id>
    traefik_node: rasp1
    shared_folder: /media/wd5t
    shared_folder_pattern: \/media\/wd5t
    master: 192.168.1.15:6443
    advertise_master: 0.0.0.0
    docker_registry: 10.108.252.69:5000
    cidr: 10.244.0.0/16
    reset: true
    nocows: 0
We also have a file called hosts. This file is the inventory of available machines and has the following content:
    [pis]
    rasp5 name=rasp5 ansible_ssh_host=192.168.1.15
    rasp4 name=rasp4 ansible_ssh_host=192.168.1.14
    rasp3 name=rasp3 ansible_ssh_host=192.168.1.13
    rasp2 name=rasp2 ansible_ssh_host=192.168.1.12
    rasp1 name=rasp1 ansible_ssh_host=192.168.1.11

    [master]
    rasp5

    [nodes]
    rasp1
    rasp2
    rasp3
    rasp4
The provisioning is defined in a very small file called setup.yml. Let's look at it:
    - name: Setup Raspberry Pi Base system
      hosts: newslave
      gather_facts: yes
      remote_user: pirate
      become: true
      become_method: sudo
      vars_files:
        - config.yml
      roles:
        - presteps
        - base
Yes, this doesn't say much. It feels like a lie, right? Well, the idea here is to make the provisioning as modular as possible. So we extracted the hard work into two different roles, namely presteps and base. The rest of the playbook sets some required properties:
- The set of hosts the roles will be applied to, hosts: newslave. This loads the hosts file and selects the group of hosts named newslave.
- The variables defined in the config.yml we saw before, injected through vars_files.
If you look at the folder tree above, you'll notice that the presteps role has a tasks folder. By convention, Ansible loads tasks/main.yml from this folder, and that is where we do all the heavy lifting.
Ansible provides us with a set of modules that have some very interesting properties, one of them being idempotency. This is a very handy feature to have. Idempotency means that running a task twice leaves the system in the same state as running it once. This is very important when we need to create folders, users and the like, which are cumbersome to script by hand precisely because of the idempotency requirement. Most Ansible modules give us this for free; raw shell and command tasks are the exception, since Ansible cannot know what they do.
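To make the idea concrete, here is a minimal sketch that contrasts a raw shell task with module-based tasks. The directory path and the user name are hypothetical, picked only for illustration; they are not part of the cluster setup.

```yaml
# Not idempotent: the second run fails because the directory already exists.
- name: Create a data directory with a raw shell command
  shell: mkdir /opt/example-data      # hypothetical path

# Idempotent: the module checks the current state first and reports
# "ok" instead of "changed" when the directory is already there.
- name: Ensure the data directory exists
  file:
    path: /opt/example-data           # hypothetical path
    state: directory
    mode: "0755"

# Idempotent: the user is created only if it is missing.
- name: Ensure a deploy user exists
  user:
    name: deploy                      # hypothetical user name
    shell: /bin/bash
    state: present
```

Run the playbook twice and the module-based tasks simply report that nothing changed, while the shell task breaks on the second run.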
A look at the prestep tasks
Let's look at the presteps/tasks/main.yml file:
    - name: Disable swap
      shell: swapoff -a

    - name: Adding iptables rules
      shell: iptables -A FORWARD -i cni0 -j ACCEPT; iptables -A FORWARD -o cni0 -j ACCEPT

    - name: System daemon reload
      shell: systemctl daemon-reload
These three presteps are needed because the version of Kubernetes we use requires that:
- Swap is disabled
- Packets arriving on or leaving through the cni0 network interface are forwarded, so that Kubernetes can route network packets
- systemd reloads its unit files so that these changes are in place
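One caveat with the presteps as written: iptables -A appends a rule on every run, so re-running the playbook stacks duplicate rules. A possible variant is sketched below. It is not what the repository currently does, and it assumes an Ansible version that ships the iptables module and allows systemd's daemon_reload without a unit name (2.4 or later).

```yaml
- name: Disable swap
  command: swapoff -a
  # swapoff does not tell us whether anything changed, so the task is
  # marked as never changing to keep the play output quiet.
  changed_when: false

# The iptables module checks whether the rule already exists
# before appending it, so repeated runs do not duplicate it.
- name: Accept forwarded packets coming in on cni0
  iptables:
    chain: FORWARD
    in_interface: cni0
    jump: ACCEPT

- name: Accept forwarded packets going out on cni0
  iptables:
    chain: FORWARD
    out_interface: cni0
    jump: ACCEPT

- name: Reload systemd unit files
  systemd:
    daemon_reload: yes
```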
Automated provisioning with the Ansible base role
The base provisioning is not so basic after all. If you look into the base role you'll see lots of files. We could put everything in one big file. The problem is that a single file stuffed with Ansible tasks is not a very modular approach: as requirements pile up, the big fat file becomes difficult to manage.
So instead we divide the base provisioning into several task files:
- apt.yml installs the main Debian packages we need. It also adds the Kubernetes and HypriotOS repositories.
- swap.yml holds the swap configuration. In the current setup swap is not supported by the Kubernetes daemon, but if a future release lets us use swap we just need to include this task file.
- system.yml is a hacky task file where we put all the system tweaks needed to integrate Kubernetes, Docker and the operating system.
- user.yml sets up the users. By doing this we ensure that all nodes have the same user with the same permissions.
- mountpoints.yml sets up the NFS mountpoints on all the Raspberry Pis.
- main.yml groups the previous task files. Notice that the swap and wifi configurations are disabled in main.yml: the current setup doesn't support swap, and we have a cabled network, so we don't need the wifi power management configuration. That one would only be needed for a wireless cluster.
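As a rough idea of what such a grouping file can look like, here is a sketch of roles/base/tasks/main.yml. The order of the includes is my assumption, and include_tasks requires Ansible 2.4 or later (older versions use include); the real file may differ.

```yaml
# roles/base/tasks/main.yml (sketch, not the actual file)
- include_tasks: apt.yml
- include_tasks: system.yml
- include_tasks: user.yml
- include_tasks: mountpoints.yml

# Disabled in the current setup: Kubernetes does not support swap
# and the cluster is wired, so no wifi configuration is needed.
# - include_tasks: swap.yml
# - include_tasks: wifi.yml
```

Re-enabling swap or wifi later is then just a matter of uncommenting one line.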
Mountpoints and NFS
Of all the previous configurations, one deserves a little more attention: the mountpoints.yml task file. It is, indeed, one of the most important pieces of the setup. Being a cluster of Raspberry Pi nodes, we have very limited storage available. Currently we end up with just 32 GB (the SD card) per node. We need space for:
- The operating system
- The Kubernetes environment
- The docker environment
- The docker images
- The containers
- The state managed by the containers.
If we only had the 32 GB SD cards we would quickly run out of space. For instance, imagine a container running some database that grows by a few hundred megabytes every day. At, say, 300 MB a day, 32 GB would be gone in about three and a half months, even ignoring everything else on the card. Scaling the SD cards vertically is not an option: swapping 32 GB for 128 GB only postpones the same problem.
However, this is not the only problem, and not even the biggest one.
We would be mixing the space available on the nodes, whose primary responsibility is to run ephemeral containers, with long-term data storage. You see, the nodes in Kubernetes should be seen as replaceable. If you start mixing the space containers need with long-term data storage, you complicate things. So how do we solve this?
Well, the simplest and most effective approach I saw was to separate container state from long-term persistent state.
So we assume that all of the 32 GB available on the cards is ephemeral in nature. We just don't care about managing the data there (apart from periodically removing Docker images, so the cards don't fill up with unused ones).
All the important state should be managed differently. For that we have NFS mountpoints. We use NFS volumes and mount them at runtime, binding them to the containers.
This approach has the following main advantages.
- We decouple ephemeral data storage from long term data.
- We enable data management policies on just the important data.
- We simplify the process because we just don't care about what state is being managed in the containers.
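On the node side, the NFS mount described above could be expressed roughly like this. It is only a sketch of what a task file such as mountpoints.yml might contain: shared_folder comes from config.yml, while the NFS server address, the export path and the mount options are hypothetical placeholders.

```yaml
- name: Ensure the mount point directory exists
  file:
    path: "{{ shared_folder }}"         # /media/wd5t in config.yml
    state: directory

- name: Mount the NFS export and record it in /etc/fstab
  mount:
    path: "{{ shared_folder }}"
    src: "192.168.1.100:/export/data"   # hypothetical NFS server and export
    fstype: nfs
    opts: rw,noatime                    # assumed mount options
    state: mounted
```

With state: mounted the module both mounts the share right away and persists it across reboots, and it does nothing if the share is already mounted with the same settings.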
There is a trade-off, though.
The only downside is that we end up with a little more complexity in the container configuration. Without this separation the container configuration is pretty simple, because we don't bother to separate transient data from long-term data and just let the container manage it all. The problem is that if we lose the container we also lose important information. And, as we saw, we would still have the data storage scaling problem.
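To give an idea of that extra complexity, here is a sketch of a pod definition that keeps transient data and long-term data in separate volumes. The pod name, the image, the mount paths and the subfolder under /media/wd5t are all hypothetical, and binding the NFS-backed folder through hostPath is just one possible way of doing it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-db                      # hypothetical name
spec:
  containers:
    - name: db
      image: example/database:latest    # hypothetical image
      volumeMounts:
        - name: scratch
          mountPath: /tmp               # transient data, lost with the pod
        - name: data
          mountPath: /var/lib/data      # long-term data, kept on NFS
  volumes:
    - name: scratch
      emptyDir: {}
    - name: data
      hostPath:
        path: /media/wd5t/example-db    # subfolder of shared_folder (assumed layout)
```

Because every node mounts the same share at the same path, the pod finds its data again no matter which node it is rescheduled on.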
We don't, yet, have a cluster
It seems that we already have the Kubernetes cluster ready. Well, not so fast. We have all the dependencies resolved, but an important step is missing. In the previous post we saw that Kubernetes has a master/slave architecture, so we should expect different configurations for the master and for the slaves.
Indeed, these are the two steps we are missing: master and slave configuration.
For that, you guessed it, we have a set of two new roles: master, which deals with the master configuration, and node, which deals with the slaves' setup.
There's not much happening here. We basically took all the steps described in the Kubernetes cluster setup tutorial and wrote them down as Ansible tasks.
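As an illustration only, the core of those two roles could boil down to something like the sketch below. It assumes a kubeadm-based setup and reuses the token, master, advertise_master and cidr variables from config.yml; the exact flags depend on the kubeadm version, and the real roles (which also ship set-configs.sh and join.sh) may well do this differently.

```yaml
# roles/master/tasks/main.yml (sketch)
- name: Initialise the control plane
  command: >
    kubeadm init
    --token {{ token }}
    --apiserver-advertise-address {{ advertise_master }}
    --pod-network-cidr {{ cidr }}
  args:
    creates: /etc/kubernetes/admin.conf     # skip if already initialised

# roles/node/tasks/main.yml (sketch)
- name: Join the node to the cluster
  command: >
    kubeadm join {{ master }}
    --token {{ token }}
    --discovery-token-unsafe-skip-ca-verification
  args:
    creates: /etc/kubernetes/kubelet.conf   # skip if already joined
```

The creates argument keeps both commands from running again on a machine that is already part of the cluster. Skipping CA verification on join is a convenience for a home lab; pinning the CA certificate hash is the safer choice.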
And so it is...
We basically did it. We managed to provision the Kubernetes Raspberry Pi cluster in an automated way. Along the way we solved the data storage scalability problem, and now we just need to add services to the cluster. At this point we have a functional Kubernetes cluster. There is a catch, though: we have nothing running on it, so it performs no function at all. The next post will walk through the service configuration and how to get real value out of the cluster.