Introduction
When you host multiple mission-critical line-of-business applications on a single AKS cluster, you want the cluster to run as stably as possible.
We can and should apply multiple best practices, such as limiting resource usage of pods and using resource quotas on namespaces.
Another best practice I'd recommend is separating critical system pods from application pods with dedicated node pools. This separation helps to protect system components from rogue application workloads that might negatively affect the stability of the cluster.
This article will demonstrate how to create dedicated node pools and prevent any user workload from being scheduled on the critical system node pool.
How can we achieve that goal?
This can be achieved with a Kubernetes feature called taints and tolerations. Taints are the opposite of node affinity: instead of attracting pods to nodes, they let a node repel a set of pods. Tolerations work together with taints and allow a pod to be scheduled on a node that carries a matching taint.
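To make the pairing concrete, here is a hedged sketch of a pod manifest that tolerates the taint this article applies to the system node pool later on (CriticalAddonsOnly=true:NoSchedule). The file name toleration-demo.yaml, the pod name and the nginx image are hypothetical, chosen only for illustration:

```shell
# Write an illustrative pod manifest that tolerates the CriticalAddonsOnly taint.
# File name, pod name and image are hypothetical placeholders.
cat > toleration-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
    - name: app
      image: nginx
  tolerations:
    # Matches a node taint with this key, value and effect,
    # so the scheduler is allowed to place the pod on that node.
    - key: CriticalAddonsOnly
      operator: Equal
      value: "true"
      effect: NoSchedule
EOF
```

A pod without this entry is kept off any node tainted with CriticalAddonsOnly=true:NoSchedule.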
So what we're going to do is configure a taint on an existing (or a new) system node pool that keeps away every pod that doesn't set a matching toleration.
Let's walk through the process step by step.
Step by step
Create a user node pool for the application workload
It's important to have a user node pool in place before we set the taint on the system node pool; otherwise, new application pods would have no node left to be scheduled on.
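A minimal sketch of such a command, reusing the cluster and resource group names from this article. The pool name app is inferred from the node name aks-app-… shown later; the single node and the Standard_DS2_v2 VM size are hypothetical choices:

```shell
# Names taken from this article; pool name "app" inferred from the node names shown later.
RESOURCE_GROUP="rg-kubernetes"
CLUSTER_NAME="aks-azureblue"
USER_POOL_NAME="app"

# --mode User creates a user node pool intended for application workloads.
# Node count and VM size are hypothetical demo values, not a sizing recommendation.
az aks nodepool add \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$CLUSTER_NAME" \
  --name "$USER_POOL_NAME" \
  --mode User \
  --node-count 1 \
  --node-vm-size Standard_DS2_v2
```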
💡 Obviously, the command from above creates a very limited user node pool for demonstration purposes only!
Now that the user node pool is up and running, we can add a taint to the system node pool.
Updating an existing system node pool
az aks nodepool update \
--cluster-name aks-azureblue \
--nodepool-name system \
--resource-group rg-kubernetes \
--node-taints "CriticalAddonsOnly=true:NoSchedule"
💡 It is important to note that the taint key, CriticalAddonsOnly, cannot be chosen freely. The system pods managed by Microsoft come with a default toleration that matches this taint. With any other key, pods such as CoreDNS won't get scheduled anymore!
Let's double-check that the correct taint got created.
az aks nodepool show \
--cluster-name aks-azureblue \
--resource-group rg-kubernetes \
--nodepool-name system \
--query nodeTaints
This should return the following:
[
"CriticalAddonsOnly=true:NoSchedule"
]
Alternatively, you can use some kubectl voodoo (or dig through kubectl describe/get nodes ...):
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints --no-headers
Verify the setup
Let's schedule a very basic pod in the default namespace by running kubectl apply -f pod.yaml.
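For reference, a minimal sketch of what such a pod.yaml could contain. The pod name pod-a matches the output further down; the nginx image is a hypothetical choice. Note that the manifest deliberately declares no tolerations:

```shell
# Write a very basic pod manifest with no tolerations, so the tainted
# system node pool repels it and it should land on the user node pool.
cat > pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
    - name: nginx
      image: nginx
EOF
```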
By running kubectl get pod -o wide, we can verify that it does get scheduled on the user node pool. Note that the node name contains the name of the user node pool (app).
$ kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE                          NOMINATED NODE   READINESS GATES
pod-a   1/1     Running   0          5s    10.244.1.4   aks-app-95917181-vmss000000   <none>           <none>
Taking it one step further
There might be situations where you want to host other critical pods on the system node pool as well, pods that aren't really application workloads. An example could be ingress-nginx.
This is, again, where tolerations come into play. But instead of adjusting each and every Helm chart you might have in use, we can apply default tolerations at the namespace level. The relevant annotation is scheduler.alpha.kubernetes.io/defaultTolerations.
Quoting the documentation: "[...] This annotation key allows assigning tolerations to a namespace and any new pods created in this namespace would get these tolerations added."
Furthermore, a toleration only allows a pod onto the tainted nodes; it doesn't force it there. So we also need to explicitly bind every pod in this namespace to the system node pool by using the scheduler.alpha.kubernetes.io/node-selector annotation key.
This is what the final namespace object would look like. Every pod belonging to this namespace would get scheduled on the system node pool.
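A minimal sketch of such a namespace manifest, assuming a namespace named ingress-nginx and assuming the system pool's nodes carry the AKS node label agentpool=system. Both annotations are only honored when the API server runs the PodTolerationRestriction and PodNodeSelector admission plugins, so verify that on your cluster:

```shell
# Write a namespace manifest whose pods inherit the CriticalAddonsOnly toleration
# and are pinned to the system node pool. Namespace name and node label are assumptions.
cat > namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  annotations:
    # Added to every new pod in this namespace (PodTolerationRestriction plugin).
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"key": "CriticalAddonsOnly", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]'
    # Merged into every new pod's nodeSelector (PodNodeSelector plugin).
    scheduler.alpha.kubernetes.io/node-selector: agentpool=system
EOF
```

The toleration lets the pods past the taint, while the node selector keeps them off the user node pool.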
Conclusion
Let me summarize the most important key takeaways.
- Separating application workloads from system pods helps to increase AKS cluster stability.
- We can use taints on nodes to repel a set of pods, and tolerations to add exceptions.
- We are required to use the key name CriticalAddonsOnly for the taint and cannot choose it freely.
- We can inherit tolerations at the namespace level by using the scheduler.alpha.kubernetes.io/defaultTolerations annotation.
That's it for today. I hope you enjoyed reading it! 😎