Kubernetes is a rapidly evolving technology, and AKS periodically releases updates to stay in sync with newly launched Kubernetes versions. Upgrading AKS brings security fixes, stability improvements, and new functionality, but the process has its share of complications. This step-by-step guide walks you through upgrading your AKS cluster to the latest version, covering best practices, key considerations, and common issues people often encounter.
Understanding AKS Versioning and Support Policy
Before you start the upgrade process, it helps to understand how AKS versioning works:
- Kubernetes Version Lifecycle: Every Kubernetes version has a support cycle. AKS supports three minor versions at a time: the most recent GA release and the two previous ones. Stay on supported versions to keep receiving security updates and patches.
- Scheduled Retirements: Azure periodically retires obsolete versions. Upgrade your cluster ahead of time to avoid running an unsupported version.
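To see which Kubernetes versions AKS currently supports in your region, you can run the following (assuming the Azure CLI is installed and you are logged in):
az aks get-versions --location <Region> --output table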
Prerequisites for Upgrading AKS
Before upgrading, the following conditions must be met:
i. Check Cluster Health: Your AKS cluster must be healthy. Run the following command to view node status: kubectl get nodes
All nodes should report a status of "Ready."
ii. Check the AKS Version: Find the cluster's current Kubernetes version by running:
az aks show --resource-group <ResourceGroupName> --name <ClusterName> --query kubernetesVersion
iii. Data and Configuration Backup: Snapshot important data and back up your workloads, especially Persistent Volumes.
iv. Check Azure Service Limits: Review your subscription quotas to make sure you have enough available resource capacity (CPU, memory, IP addresses, and so on) to cover the temporary nodes created during upgrades and rolling updates.
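As a rough check of compute quota headroom, you can list VM usage for your region (a sketch; substitute your own region):
az vm list-usage --location <Region> --output table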
Step-by-Step AKS Upgrade Process
Step 1: Check Available Upgrades
To identify which Kubernetes versions your AKS cluster can upgrade to, run:
az aks get-upgrades --resource-group <ResourceGroupName> --name <ClusterName>
This command will provide the versions of Kubernetes that are available for upgrading.
Step 2: Upgrade the AKS Control Plane
Upgrading the control plane (sometimes called the master nodes) is the first step. This does not disrupt your running workloads, but it is essential for managing the state and behavior of your cluster.
Run:
az aks upgrade --resource-group <ResourceGroupName> --name <ClusterName> --kubernetes-version <NewVersion> --control-plane-only
Don't forget to include the target version of Kubernetes.
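One way to confirm the control plane upgrade has completed is to check the cluster's provisioning state, which should report "Succeeded" once the operation finishes:
az aks show --resource-group <ResourceGroupName> --name <ClusterName> --query provisioningState --output tsv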
Step 3: Upgrade Node Pools
Now that the control plane has been upgraded, you need to update the node pools. If you have more than one, each needs to be updated.
For a single node pool:
az aks nodepool upgrade --resource-group <ResourceGroupName> --cluster-name <ClusterName> --name <NodePoolName> --kubernetes-version <NewVersion>
For a cluster with multiple node pools, simply repeat the command above for each remaining node pool.
Azure does a rolling update, draining and upgrading each node in turn.
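If you are unsure which node pools exist or which versions they currently run, you can list them first:
az aks nodepool list --resource-group <ResourceGroupName> --cluster-name <ClusterName> --output table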
Step 4: Validate the Upgrade
After upgrading, it's time to validate the functionality of the cluster by:
i. Confirming that all nodes are running the new Kubernetes version with the command
kubectl get nodes
ii. Confirming that all workloads are running as expected with the command
kubectl get pods --all-namespaces
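As an optional extra check, you can filter for pods that are not in the Running phase (note that completed Job pods will also appear in this output):
kubectl get pods --all-namespaces --field-selector=status.phase!=Running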
Best Practices for AKS Upgrades
- Test in a Non-Production Environment: Always test the upgrade process in a non-production or staging environment. This way, you can catch errors before making any changes to your live cluster.
- Read Kubernetes Release Notes: Read up on what is changing in the new Kubernetes version. The release notes are a great place to start; they document deprecations, API changes, and other differences that can impact your cluster.
- Leverage Maintenance Windows: Upgrade during low-traffic periods or maintenance windows to keep interruptions to a minimum.
- Automate Upgrades: For large-scale environments, you can use IaC scripts with CI/CD pipelines to automate the upgrade. You can also configure AKS auto-upgrade channels so the cluster stays up to date without administrator intervention (see the example after this list).
- Monitor Resource Consumption: You should always monitor the consumption of resources in your cluster after an upgrade. New features in Kubernetes may mean increased use of certain resources, and workloads may behave slightly differently under the new version.
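A minimal sketch of enabling an auto-upgrade channel (here the patch channel, which keeps the cluster on the latest patch release of its current minor version):
az aks update --resource-group <ResourceGroupName> --name <ClusterName> --auto-upgrade-channel patch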
Key Considerations for AKS Upgrades
- Upgrade Increments: Kubernetes does not support skipping minor versions, so avoid trying to jump several versions at once. Upgrade incrementally, one minor version at a time, to reduce the risk of breaking changes and deprecated APIs.
- Managed vs. User-Managed Node Pools: Decide whether to use managed node pools, which handle scaling and updates automatically, or user-managed node pools, which give you more control over the environment.
- Application Compatibility: Confirm that your applications are compatible with the new Kubernetes version. Check whether API versions, network policies, or security configurations change as part of the update.
- Downtime Consideration: AKS upgrades are mostly non-disruptive, but single-replica workloads will experience short-lived downtime. Plan accordingly for services that must maintain high uptime, such as mission-critical services.
Common Issues During AKS Upgrades
- Node Not Upgrading: On occasion, nodes fail to upgrade, typically due to a lack of resource availability. Ensure your cluster has enough available resources (for example, CPU and memory) before attempting an upgrade.
- Application Downtime: Because AKS performs rolling upgrades, single-pod workloads or workloads without enough replicas will likely see downtime. Consider scaling up your application before upgrading (see the example after this list).
- API Deprecations: Upgrading across multiple Kubernetes versions raises the chance that APIs your manifests rely on have been deprecated or removed. Read the release notes to find out which APIs are affected so you can update your manifests accordingly.
- Node Drain Failures: During a node upgrade, the process drains the node's workloads. If workloads cannot be rescheduled, the upgrade may hang; check for misconfigured pods, PodDisruptionBudgets, or PVCs.
- Failed Upgrades: Azure does not support rolling back to a previous Kubernetes version. If something goes wrong during an upgrade, your fallback plan is to recreate the cluster or recover workloads from backups.
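A minimal sketch of scaling up a workload before the upgrade, assuming a Deployment named <DeploymentName> in namespace <Namespace>:
kubectl scale deployment <DeploymentName> --namespace <Namespace> --replicas=3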
The most common problem for clusters with insufficient network planning is 'InsufficientSubnetSize', with error messages such as 'The number of free IP addresses in the subnet is less than xx' or 'Pre-allocated IPs xx exceeds IPs available yy in Subnet Cidr, Subnet Name.'
Cause: The subnet used by the cluster has no more free IP addresses within its CIDR address space to allocate to new resources.
Solution: As a quick, minimal workaround, scale down the cluster nodes to free up IP addresses for the upgrade.
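For example, scaling a node pool down might look like this (a sketch; pick a node count your workloads can tolerate):
az aks nodepool scale --resource-group <ResourceGroupName> --cluster-name <ClusterName> --name <NodePoolName> --node-count <SmallerCount>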
However, if scaling down isn't an option, follow the procedure below.
AKS does not currently allow updating the CIDR address space of the subnet used by an existing node pool. To migrate your workloads to a node pool in a larger subnet, do the following:
i. Create a subnet under the cluster virtual network with a CIDR address range larger than the existing subnet's.
ii. Deploy a node pool on the new subnet: az aks nodepool add --resource-group <ResourceGroupName> --cluster-name <ClusterName> --name <NewNodePoolName> --vnet-subnet-id <NewSubnetResourceId>
iii. Move your workloads to the new node pool by cordoning and draining the nodes in the old node pool (see the drain commands sketched after this list).
iv. Delete the original node pool: az aks nodepool delete --resource-group <ResourceGroupName> --cluster-name <ClusterName> --name <OldNodePoolName>
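A minimal sketch of step iii, assuming an old node named <OldNodeName> (repeat for each node in the old pool):
kubectl cordon <OldNodeName>
kubectl drain <OldNodeName> --ignore-daemonsets --delete-emptydir-data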
Post-Upgrade Tasks
- Cluster Audit: After the upgrade, review the cluster configuration. New Kubernetes versions can introduce changes, and some pods may end up in a Pending state.
- Review PodDisruptionBudgets (PDBs): Ensure they are properly configured so that rolling upgrades do not disrupt your applications (a quick check is shown after this list).
- Update Helm Charts and Manifests: If you deploy applications with Helm, make sure your charts are compatible with the new Kubernetes version, and update your manifests to apply any deployment changes.
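A couple of quick post-upgrade checks, assuming kubectl and Helm are installed:
kubectl get pdb --all-namespaces
helm list --all-namespaces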
Conclusion
Upgrading your AKS cluster to the latest version keeps the system secure, stable, and performant. Preparing for common pitfalls along the way and following best practices reduce the odds of disrupting the process. Test your upgrade strategy in a non-production environment, automate wherever possible, and keep up with Kubernetes release changes to avoid surprises.