Future-Proof Your AKS Cluster with Strategic IP Address Planning

Scaling Kubernetes efficiently is an art that blends strategic planning with technical expertise. In Azure Kubernetes Service (AKS), mastering the allocation of IP addresses is crucial for ensuring seamless scalability and optimal performance. When configuring an AKS cluster with default Azure CNI networking, it's essential to account for how Pods, Nodes, and internal LoadBalancer-type Services consume IP addresses as they scale.

This blog delves into the process of IP address planning, focusing specifically on scaling Pods, Nodes, and Services in an AKS cluster. Discover how thoughtful IP management can prevent bottlenecks, optimize resource utilization, and pave the way for a robust production environment.

Image Source: https://kubernetes.io/docs/images/kubernetes-cluster-network.svg

We'll talk about two approaches to IP address planning in AKS: static IP allocation and dynamic IP allocation. Here is a brief overview of each before we examine them in depth.

Static IP Allocation: Suitable for predictable and stable environments where the maximum number of pods per node is known and does not change frequently. This approach can lead to IP wastage if the number of pods varies.

Dynamic IP Allocation: Better for environments with varying workloads and dynamic scaling needs. It helps reduce IP wastage by allocating additional IPs only when necessary, but it requires careful subnet planning with a buffer to accommodate dynamic allocations.

Static IP Allocation

The default configuration in AKS uses static IP allocation based on the "maximum number of pods per node" setting, which counts all pods created by Deployments, DaemonSets, StatefulSets, or Jobs. IP addresses for Pods, Nodes, and internal LoadBalancer-type Services are assigned from the private subnet CIDR specified during node pool configuration.

It is essential to decide the “maximum pods per node” configuration and to have a dedicated subnet CIDR range before creating a node pool. AKS reserves the same number of IP addresses as the “maximum pods per node” setting per node, regardless of the actual number of allocated/running pods.
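
For illustration, here is a minimal Azure CLI sketch of creating a cluster under this static model; the resource group, cluster name, and subnet resource ID are placeholders, not values from this article:

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --vnet-subnet-id <node-subnet-resource-id> \
  --max-pods 20 \
  --node-count 10 \
  --generate-ssh-keys

With this configuration, each node reserves 20 pod IPs from the subnet up front, which matches the example below.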

Let’s understand this with an example:

Suppose we have ’10 nodes’ and have set ‘20’ as the maximum number of pods per node:

  • IP addresses for nodes = 10
  • IP addresses reserved for the pods on 10 nodes = 10 * 20 = 200

In this case, 210 IP addresses are reserved upfront when 10 nodes are launched. If fewer than 20 pods run on any node, the remaining unused/reserved IP addresses are wasted until more pods are launched on that node. Additionally, when more nodes are added to the cluster, IP addresses for nodes are assigned from the same subnet.
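
To make the sizing arithmetic concrete, here is a small shell sketch using the same illustrative values:

NODES=10
MAX_PODS=20
echo $(( NODES + NODES * MAX_PODS ))   # 210 IPs reserved as soon as all 10 nodes launch

Keep in mind that Azure reserves 5 IP addresses in every subnet, so a /24 (251 usable addresses) covers this example but leaves little headroom for scale-out or for surge nodes during upgrades.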

NOTE: By default, AKS runs approximately 8 to 12 DaemonSets in the kube-system namespace (the exact number varies by AKS version). These pods use the host network and do not consume separate IP addresses from the pool, as they share the node's network interface. However, they still count toward the "maximum number of pods" setting. In other words, the total number of pods on a node, including all system DaemonSets, does not necessarily equal the total number of IP addresses used.
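
One way to see which kube-system pods run on the host network (and therefore consume no pod IPs) is a jsonpath filter such as:

kubectl get pods -n kube-system -o jsonpath='{range .items[?(@.spec.hostNetwork==true)]}{.metadata.name}{"\n"}{end}'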

Pros:

  1. Predictability: IP addresses are reserved upfront based on the maximum number of pods per node, leading to predictable IP usage.

  2. Simple Configuration: It is easier to configure and manage, as the IP allocation is static and predefined.

  3. No Surprise Scaling Issues: Since IPs are reserved upfront, there is less chance of running out of IP addresses during sudden scale-ups.

Cons:

  1. IP Wastage: If the number of pods running on a node is less than the maximum number set, the unused IP addresses are wasted.

  2. Rigidity: It lacks flexibility for dynamic scaling. Any change to the maximum pods per node requires creating a new node pool (see the sketch after this list).

  3. Over-Reservation: The system may over-reserve IPs, leading to inefficient use of the IP address space.

  4. Complicated CIDR Planning: Requires careful planning of CIDR ranges to avoid IP shortages or over-reservation, especially in large clusters.
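
To illustrate the rigidity point: because "max pods per node" is fixed for an existing node pool, raising it means adding a new pool. A hedged sketch with placeholder names:

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name newpool \
  --max-pods 40 \
  --node-count 10

Workloads then have to be moved onto the new pool (for example, by cordoning and draining the old nodes) before the old pool can be deleted.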

Dynamic IP Allocation

Another approach is to leverage dynamic pod IP allocation, which enables nodes and pods to scale independently. Since they reside in different subnets, you can plan their address spaces separately.

It's important to note that pods receive IPs from the pod subnet, while internal LoadBalancer-type Services and worker nodes obtain IPs from the node subnet.
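
Dynamic allocation is enabled by giving the cluster (or an individual node pool) a dedicated pod subnet. A minimal sketch, with both subnet resource IDs as placeholders:

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --vnet-subnet-id <node-subnet-resource-id> \
  --pod-subnet-id <pod-subnet-resource-id> \
  --node-count 10 \
  --generate-ssh-keys

The same --pod-subnet-id flag is also available on az aks nodepool add, so different node pools can draw pod IPs from different subnets.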

With dynamic allocation, pod IPs are assigned to each node in batches of 16, so it's recommended that the pod subnet be planned with a minimum of 16 IPs per node. AKS assigns 16 IPs to each node at startup and requests another batch of 16 whenever fewer than 8 unallocated IPs remain on that node. This design can result in some IP wastage.

Let's delve into a scenario where we plan a node pool with 10 nodes, accommodating 20 pods each, including 10 system DaemonSet pods. These DaemonSet pods use the host network and don't require an IP from the pod subnet, so only 10 pods per node actually consume pod-subnet IPs.

However, dynamic scaling scenarios may arise, such as horizontal pod autoscaling or new features that add pods to existing deployments, pushing a node beyond 20 pods.

In a dynamic scenario like this, the advantage of dynamic IP allocation becomes apparent. With static IP allocation, where max pods per node is fixed, you would have to create a new node pool with a higher "max pods per node" setting. With dynamic allocation, AKS simply requests an additional batch of 16 IP addresses for the existing nodes, bringing the reservation to 32 IP addresses per node in such a scenario.

A naive calculation suggests the pod subnet needs 10 * 20 = 200 IPs. In practice, each node only needs IPs for its 10 non-host-network pods, but AKS starts every node with a batch of 16 IPs; with 10 of them allocated, only 6 remain unallocated, which is below the threshold of 8, so the node requests another batch of 16, for a total reservation of 32 IPs per node. Across 10 nodes, that is 10 * 32 = 320 IPs. This complexity highlights the challenge of calculating the exact number of IP addresses a subnet requires. Thus, it is prudent to plan the pod subnet with a generous buffer, accepting that dynamic IP allocation can lead to some IP wastage.

Understanding IP Address Wastage in Different Scenarios

Let’s explore two scenarios to understand IP address wastage in detail:

Scenario 1:

In this case, the total number of pods running on the node (excluding AKS DaemonSets) is 8. The total of unallocated IPs is (16 - 8) = 8. Since at least 8 IPs remain unallocated, Azure CNI, which reserves IPs in batches of 16, does not request another batch, resulting in a wastage of (16 - 8) = 8 IPs per node.

Scenario 2:

Moving on to the second scenario, the total number of pods running on the node (excluding AKS DaemonSets) is 9. The total of unallocated IPs is (16 - 9) = 7. Since fewer than 8 IPs remain unallocated, AKS requests another batch of 16 IP addresses, bringing the total reservation to 32 IPs per node. The IP wastage in this scenario is (32 - 9) = 23 IPs per node.

These extreme scenarios illustrate IP wastage, ranging from a minimum of 8 IPs to a maximum of 23 IPs per node. It’s important to note that these examples assume an equal distribution of pods across all nodes, and the values for the number of running pods can vary in different scenarios. The underlying logic, however, remains consistent.
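
The batching rule can be captured in a small shell helper. This is a sketch of the behavior described above (batches of 16, refill when fewer than 8 IPs remain unallocated), not an official Azure formula:

reserved_ips() {
  local pods=$1 batch=16 threshold=8
  # Smallest multiple of 16 that leaves at least 8 IPs unallocated
  echo $(( ( (pods + threshold + batch - 1) / batch ) * batch ))
}
reserved_ips 8    # prints 16 -> wastage 8 (Scenario 1)
reserved_ips 9    # prints 32 -> wastage 23 (Scenario 2)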

You can run the following command to see the IPs allocated per node (nnc refers to the NodeNetworkConfig custom resources that Azure CNI maintains per node):

kubectl get nnc -n kube-system

To get the total number of reserved IP addresses across all nodes (the awk expression sums the second column of the output):

kubectl -n kube-system get nnc | awk '{sum+=$2;}END{print sum;}'

Pros:

  1. Flexibility: IP addresses are allocated dynamically as needed, allowing pods to scale flexibly without a fixed per-node IP reservation.

  2. Efficient IP Usage: Reduces IP wastage by allocating additional IPs only when necessary.

  3. Separate Subnets: Different subnets for pods and nodes enable independent scaling, optimizing IP address space usage.

  4. Scalability: Better suited for environments with unpredictable or dynamic workloads.

Cons:

  1. Complex Planning: Requires careful subnet planning with a generous buffer to accommodate dynamic IP allocations.

  2. Potential IP Wastage: Although reduced, dynamic IP allocation can still lead to some IP wastage, especially when pod counts cross the threshold that triggers an additional batch of IPs.

  3. Monitoring Overhead: Requires continuous monitoring and management of the IP address space to ensure sufficient availability during scale-ups.

  4. Initial Configuration Complexity: Setting up and understanding the dynamic allocation mechanism can be more complex than static allocation.

Effective IP address planning is critical for managing scalability in AKS clusters. The static IP allocation approach is suitable for predictable and stable environments but can lead to IP wastage if pod counts vary. On the other hand, dynamic IP allocation is ideal for environments with fluctuating workloads, allowing for more efficient use of IP addresses by allocating them as needed. However, handling dynamic allocations requires careful subnet planning with a buffer.

The recommendation is to adopt dynamic pod IP address allocation with a buffer for the pod subnet to address potential scaling needs. This proactive approach ensures your cluster can seamlessly scale up without predefining the exact number of pods per node. If private IP addresses become a constraint at large scales, consider using static IP allocation with meticulous planning to optimize resource usage. This balanced strategy will help maintain efficient and flexible IP management, catering to current and future scalability requirements.
