Autoscaling

MetaKube allows you to scale the size of MachineDeployments automatically to adjust to your dynamic workloads.

Enable autoscaling

You can enable autoscaling for a MachineDeployment using different clients.

MetaKube Terraform Provider

Configure autoscaling by specifying the min_replicas and max_replicas fields:

resource "metakube_node_deployment" "nodes" {
  spec {
    min_replicas = 0
    max_replicas = 5
  }
}

kubectl

Alternatively, you can add the following annotations to the MachineDeployment directly:

metadata:
  annotations:
    cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: "15"
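If you manage the MachineDeployment with kubectl, both annotations can be set in one step. This assumes, as in the commands below, that the MachineDeployment lives in the kube-system namespace and that $name holds its name:

```shell
kubectl -n kube-system annotate machinedeployment $name \
  cluster.k8s.io/cluster-api-autoscaler-node-group-min-size="1" \
  cluster.k8s.io/cluster-api-autoscaler-node-group-max-size="15"
```

Add --overwrite to change annotations that are already set.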

Behavior

Scaling up

If autoscaling is enabled, the autoscaler will scale up the MachineDeployment if all the following conditions are met:

  • The current number of replicas is lower than the maximum number of replicas

    1. Get current replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.spec.replicas}'
      
    2. Get configured max replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.metadata.annotations.cluster\.k8s\.io/cluster-api-autoscaler-node-group-max-size}'
      
  • Pods cannot be scheduled and remain in the Pending state due to insufficient resources

    Get Pods in Pending state:

    kubectl -n $namespace get pod --field-selector spec.nodeName==""
    
  • The Nodes managed by the MachineDeployment allow the Pods to be scheduled

    Inspect the MachineDeployment's taints and labels and check whether the Pod's tolerations, node selectors, and affinity rules match them.

  • Adding a new Node will create enough free capacity to accommodate the Pods

    Inspect the Pod's containers' requests:

    kubectl -n $namespace get pod $pod -o jsonpath='{..requests}'
    

    Their sum must be smaller than the node's allocatable resources.
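The capacity check in the last condition can be sketched with hypothetical numbers (all values below are made-up examples, in CPU millicores):

```shell
# Sum of the Pending Pods' container CPU requests (hypothetical values).
requests_m=$((500 + 250 + 250))
# Allocatable CPU of a new Node, e.g. 4 vCPUs minus system
# reservations (hypothetical value).
allocatable_m=3800
# The autoscaler only scales up if the Pods actually fit on a new Node.
if [ "$requests_m" -lt "$allocatable_m" ]; then
  echo "fits"
else
  echo "does not fit"
fi
```

With these numbers the Pods fit, so a scale-up would be triggered; the same comparison applies per resource (CPU, memory).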

The autoscaler calculates the number of required Nodes to schedule all Pending Pods and will update the replica count of the MachineDeployment accordingly.
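The replica calculation can be illustrated as integer arithmetic with hypothetical values; the real autoscaler simulates scheduling rather than dividing raw request sums:

```shell
pending_m=5200    # total CPU requests of Pending Pods (hypothetical)
per_node_m=3800   # allocatable CPU per new Node (hypothetical)
current=3         # current replica count
max_size=15       # from the max-size annotation
# Ceiling division: additional Nodes needed to fit the pending requests.
needed=$(( (pending_m + per_node_m - 1) / per_node_m ))
target=$(( current + needed ))
# The autoscaler never exceeds the configured maximum.
if [ "$target" -gt "$max_size" ]; then target=$max_size; fi
echo "$target"   # prints 5: two additional Nodes on top of three
```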

Scaling down

If autoscaling is enabled, the autoscaler will scale down the MachineDeployment if all the following conditions are met:

  • The current number of replicas is higher than the specified minimum number of replicas

    1. Get current replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.spec.replicas}'
      
    2. Get configured min replica count

      kubectl -n kube-system get machinedeployment $name -o jsonpath='{.metadata.annotations.cluster\.k8s\.io/cluster-api-autoscaler-node-group-min-size}'
      
  • The last scale-up hasn't happened within the last 2 minutes

  • The Nodes' utilization is below the 50% threshold

    To check utilization of the Nodes:

    kubectl top nodes
    
  • All Pods can be scheduled on fewer Nodes

    Note that some Pods may not be evictable, for example because a PodDisruptionBudget would be violated.
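To check for PodDisruptionBudgets that might block eviction ($namespace is a placeholder, as in the commands above):

```shell
kubectl -n $namespace get poddisruptionbudgets
```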

The autoscaler will simulate moving existing Pods to other Nodes and calculate candidates for removal. It will remove underutilized Nodes one at a time.
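The 50% utilization check can be sketched against sample output in the format printed by kubectl top nodes (node names and values below are made up). Note that the upstream cluster autoscaler computes utilization from resource requests versus allocatable capacity, so the live usage shown by kubectl top is only an approximation:

```shell
# Columns: NAME, CPU(cores), CPU% (hypothetical sample data).
printf 'node-a 420m 21%%\nnode-b 1900m 95%%\n' | awk '{
  pct = $3
  sub(/%/, "", pct)   # strip the percent sign
  print $1, ((pct + 0 < 50) ? "scale-down candidate" : "utilized")
}'
```

Here node-a would be considered for removal while node-b would be kept.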

Scaling up from zero

MetaKube supports scaling MachineDeployments down to zero and up from zero. This also works for MachineDeployments that use taints or rely on node labels.

Configuration

MetaKube runs the cluster autoscaler with the generic Cluster API provider plugin. The autoscaler's version always matches the cluster's Kubernetes minor version.

We use the following additional configuration flags:

--scan-interval=1m
--scale-down-delay-after-add=2m
--scale-down-unneeded-time=2m
--scale-down-unready-time=2m
--skip-nodes-with-local-storage=false
--enforce-node-group-min-size=true

Info

We currently provide no way to change this configuration. If you encounter issues or have special requirements, please contact us.

Local storage

Autoscaling is not suitable for workloads that use host-local storage. Because of the flag --skip-nodes-with-local-storage=false, Nodes running Pods that use, for example, hostPath volumes may still be considered candidates for removal when scaling down. This is a deliberate decision: in our experience, the alternative very often leads to false positives and unnecessarily blocks scaling down.

Troubleshooting

If your MachineDeployment isn't scaling up or down, carefully examine the conditions required for scaling up or down respectively.
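The cluster autoscaler records events on Pods it has considered, so inspecting a Pending Pod's events often shows why a scale-up was not triggered ($namespace and $pod are placeholders, as above):

```shell
kubectl -n $namespace describe pod $pod
```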

Info

If you still can't find a reason why the MachineDeployment isn't scaled up or down, please contact our support and include the output of the above steps in your inquiry.
