Cheat Sheet - Azure Kubernetes Service (AKS)

Identity Management

Quickly switch between identities when using AKS with Azure AD integration (assuming kubelogin is used with the devicecode login method).

$ kubectl auth whoami 
Username     [email protected]
Groups       [... system:authenticated]
Extra: oid   [...]

$ kubelogin remove-tokens

$ kubectl auth whoami 
To sign in, use a web browser to open the page and enter the code XXXXX to authenticate.

> Select a different account... 

Username     [email protected] 
Groups       [... system:authenticated]
Extra: oid   [...]

Node Pool Management

Reboot a node

kubectl get nodes
NAME                                STATUS     ROLES   AGE   VERSION
aks-agentpool-29989922-vmss000003   Ready      agent   2d    v1.24.6
aks-apps-11756085-vmss000000        Ready      agent   25m   v1.24.6
aks-apps-11756085-vmss000001        Ready      agent   25m   v1.24.6
aks-apps-11756085-vmss000002        Ready      agent   25m   v1.24.6

# Mark node as unschedulable 
kubectl cordon aks-apps-11756085-vmss000002

# Drain the node 
kubectl drain --ignore-daemonsets --delete-emptydir-data aks-apps-11756085-vmss000002

# Restart it 
az vmss restart -g MC_rg-demo_aks-azureblue_switzerlandnorth -n aks-apps-11756085-vmss --instance-ids 2

# Mark again as schedulable 
kubectl uncordon aks-apps-11756085-vmss000002

As an alternative, use kured (Kubernetes Reboot Daemon) to automate node reboots.
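The scale set name and `--instance-ids` value in the restart command can be derived from the node name. The following is a minimal sketch, assuming the node name is the scale set name followed by a six-character suffix that encodes the instance ID in base 36 (so a suffix of `00000a` would be instance 10). The function names `vmss_name` and `vmss_instance_id` are hypothetical helpers, not part of any CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: split an AKS node name into VMSS name and instance ID.
# Assumption: the last six characters are the instance ID in base 36.

vmss_name() {
  local node="$1"
  echo "${node%??????}"        # strip the six-character suffix
}

vmss_instance_id() {
  local node="$1"
  local suffix="${node: -6}"   # last six characters, e.g. 000002
  echo "$((36#${suffix}))"     # bash base-36 arithmetic, printed as decimal
}
```

With these, the restart step could be written as `az vmss restart -g <node-resource-group> -n "$(vmss_name "$node")" --instance-ids "$(vmss_instance_id "$node")"`.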

Create an interactive shell connection to a Linux node

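# Run debug container on node (kubectl debug needs --image; any image with a shell works)
kubectl debug node/aks-nodepool1-12345678-vmss000000 -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0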

# Interact with node session 
chroot /host /bin/bash

I'd also suggest creating a function in your .bashrc so it is always at hand; you can then call it as kdebug <node>

kdebug() {
  kubectl debug node/"$1" -it --namespace kube-system --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
}

Manually scale a nodepool

This only works when the cluster autoscaler is disabled for the node pool.

az aks nodepool scale --cluster-name aks-azureblue --name agentpool --resource-group rg-demo --node-count 3
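The autoscaler restriction above can be checked before scaling. This is a minimal sketch, assuming `enableAutoScaling` prints `true` with `--output tsv` when the autoscaler is on; `scale_pool` is a hypothetical helper name.

```bash
# Hypothetical helper: scale a node pool only if its autoscaler is off.
# Usage: scale_pool <resource-group> <cluster> <pool> <count>
scale_pool() {
  local rg="$1" cluster="$2" pool="$3" count="$4"
  local autoscaling
  # Read the current scale method of the pool
  autoscaling="$(az aks nodepool show --resource-group "$rg" --cluster-name "$cluster" \
    --name "$pool" --query enableAutoScaling --output tsv)" || return 1
  if [ "$autoscaling" = "true" ]; then
    echo "autoscaler is enabled on $pool; disable it before scaling manually" >&2
    return 1
  fi
  az aks nodepool scale --resource-group "$rg" --cluster-name "$cluster" \
    --name "$pool" --node-count "$count"
}
```

Example call: `scale_pool rg-demo aks-azureblue agentpool 3`.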

Get current node count

az aks nodepool show --cluster-name aks-azureblue --name agentpool --resource-group rg-demo --query count

Change Scale Method

# Get current setup
az aks nodepool show --cluster-name aks-azureblue --name agentpool --resource-group rg-demo --query enableAutoScaling

# Switch to manual 
az aks nodepool update --cluster-name aks-azureblue --name agentpool --resource-group rg-demo --disable-cluster-autoscaler

# Switch to autoscale 
az aks nodepool update --cluster-name aks-azureblue --name agentpool --resource-group rg-demo --enable-cluster-autoscaler --min-count 1 --max-count 3

Show node pools

az aks nodepool list --cluster-name <cluster> -g <group> -o table

Check if node pool is on the latest node image

# Find latest node image version available 
az aks nodepool get-upgrades \
    --nodepool-name mynodepool \
    --cluster-name myAKSCluster \
    --resource-group myResourceGroup
# Get current node image version
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --query nodeImageVersion
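
The two queries above can be combined into a single check. This is a sketch under the assumption that the `get-upgrades` output exposes the newest image as `latestNodeImageVersion`; `node_image_status` is a hypothetical helper name.

```bash
# Hypothetical helper: report whether a node pool runs the latest node image.
# Usage: node_image_status <resource-group> <cluster> <pool>
node_image_status() {
  local rg="$1" cluster="$2" pool="$3" current latest
  # Node image version the pool is running right now
  current="$(az aks nodepool show --resource-group "$rg" --cluster-name "$cluster" \
    --name "$pool" --query nodeImageVersion --output tsv)" || return 1
  # Latest node image version available for the pool
  latest="$(az aks nodepool get-upgrades --resource-group "$rg" --cluster-name "$cluster" \
    --nodepool-name "$pool" --query latestNodeImageVersion --output tsv)" || return 1
  if [ "$current" = "$latest" ]; then
    echo "up to date: $current"
  else
    echo "upgrade available: $current -> $latest"
  fi
}
```

Example call: `node_image_status myResourceGroup myAKSCluster mynodepool`.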

Upgrade all nodes in all node pools

az aks upgrade \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-image-only

Upgrade specific node pool

az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-image-only

Upgrade node images with node surge

az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --max-surge 33%

Find cluster-autoscaler warnings and errors

This query assumes that diagnostic settings are enabled for the cluster-autoscaler category and that the logs are sent to a Log Analytics workspace.

AzureDiagnostics
| where Category == 'cluster-autoscaler' and Resource =~ 'aks-azureblue' and log_s matches regex '(W[0-9][0-9][0-9][0-9].*)|(E[0-9][0-9][0-9][0-9].*)'
| order by TimeGenerated
| project TimeGenerated, log_s
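
The regex in the query relies on the klog line format, where each line begins with a severity letter (I, W, E or F) followed by the month and day as four digits. As a rough illustration, the same filter can be applied to exported log text with grep. This sketch assumes one log entry per line with the severity prefix at the start of the line; `klog_warnings_errors` is a hypothetical helper name.

```bash
# Hypothetical filter: keep only klog warning (W) and error (E) lines from stdin.
# Assumption: each line starts with the severity letter and a four-digit date (mmdd).
klog_warnings_errors() {
  grep -E '^[WE][0-9]{4} '
}
```

Example call: `klog_warnings_errors < exported-autoscaler.log`, where the file name is a placeholder.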

Find existing authorized IP ranges (api-server-authorized-ip-ranges feature)

az aks show \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --query apiServerAccessProfile.authorizedIpRanges

Use labels to perform actions on pods

kubectl delete pod -l app=my-killer-app 
kubectl get pods -l app=my-killer-app

Restarting pods (rollout)

kubectl get deployments -n <namespace>
kubectl rollout restart deployment <deployment> -n <namespace>

Restarting pods (scaling)

kubectl get deployments -n <namespace>
kubectl scale deployment --replicas=0 <deployment> -n <namespace>
kubectl scale deployment --replicas=<count> <deployment> -n <namespace>
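
When restarting by scaling, the original replica count has to be restored afterwards. This is a minimal sketch that reads the count first and scales back to it; `bounce_deployment` is a hypothetical helper name.

```bash
# Hypothetical helper: restart a deployment by scaling to zero and back.
# Usage: bounce_deployment <deployment> <namespace>
bounce_deployment() {
  local deploy="$1" ns="$2" replicas
  # Remember the desired replica count before scaling down
  replicas="$(kubectl get deployment "$deploy" -n "$ns" -o jsonpath='{.spec.replicas}')" || return 1
  kubectl scale deployment "$deploy" -n "$ns" --replicas=0 || return 1
  # Scale back to the count recorded above
  kubectl scale deployment "$deploy" -n "$ns" --replicas="$replicas"
}
```

Unlike a rollout restart, this takes all pods of the deployment down at once, so expect a short outage.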

Available AKS addons

Create a spot node pool

az aks nodepool add --resource-group ResourceGroup --cluster-name AKSCluster --name spotnodepool --priority Spot --eviction-policy Delete --spot-max-price 1 --enable-cluster-autoscaler --min-count 1 --max-count 3 --no-wait

Enable HTTP Application Routing

In case you forgot to enable it while deploying the AKS cluster:

az aks enable-addons --addons http_application_routing -n <aks-cluster> -g <resource-group>

Connect to an AKS cluster

az aks get-credentials -g <resource-group> -n <aks-cluster>
kubectl get nodes 

Attach an ACR to an AKS cluster

az aks update -n <aks-cluster> -g <resource-group> --attach-acr <acr-name>

Get running Kubernetes version

az aks show -n <cluster> -g <resource-group> --query kubernetesVersion