We will deploy a sample nginx application as a Deployment with a single replica (one Pod).
cat <<EoF > ~/environment/cluster-autoscaler/nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
EoF
kubectl apply -f ~/environment/cluster-autoscaler/nginx.yaml
kubectl get deployment/nginx-to-scaleout
Let's scale the Deployment out to 10 replicas:
kubectl scale --replicas=10 deployment/nginx-to-scaleout
Some pods will be stuck in the Pending state because the cluster does not have enough free capacity to schedule them; this is what triggers the cluster-autoscaler to scale out the EC2 fleet. You can confirm this from one of the Pending pods' events, as shown in the describe example after the output below.
kubectl get pods -l app=nginx -o wide --watch
NAME READY STATUS RESTARTS AGE
nginx-to-scaleout-7cb554c7d5-2d4gp 0/1 Pending 0 11s
nginx-to-scaleout-7cb554c7d5-2nh69 0/1 Pending 0 12s
nginx-to-scaleout-7cb554c7d5-45mqz 0/1 Pending 0 12s
nginx-to-scaleout-7cb554c7d5-4qvzl 0/1 Pending 0 12s
nginx-to-scaleout-7cb554c7d5-5jddd 1/1 Running 0 34s
nginx-to-scaleout-7cb554c7d5-5sx4h 0/1 Pending 0 12s
nginx-to-scaleout-7cb554c7d5-5xbjp 0/1 Pending 0 11s
nginx-to-scaleout-7cb554c7d5-6l84p 0/1 Pending 0 11s
nginx-to-scaleout-7cb554c7d5-7vp7l 0/1 Pending 0 12s
nginx-to-scaleout-7cb554c7d5-86pr6 0/1 Pending 0 12s
nginx-to-scaleout-7cb554c7d5-88ttw 0/1 Pending 0 12s
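To confirm that the Pending pods triggered the scale-up, describe one of them and check its Events section; cluster-autoscaler records a TriggeredScaleUp event on pods it is acting on. The pod name below is taken from the sample output above, so substitute one of your own:
kubectl describe pod nginx-to-scaleout-7cb554c7d5-2d4gp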
View the cluster-autoscaler logs
kubectl -n kube-system logs -f deployment/cluster-autoscaler
You will notice Cluster Autoscaler events similar to the ones below:
![Cluster Autoscaler scale-up events](https://www.eksworkshop.com/images/scaling-asg-up2.png)
Check the EC2 AWS Management Console to confirm that the Auto Scaling groups are scaling up to meet demand. This may take a few minutes. You can also follow along with the pod deployment from the command line; you should see the pods transition from Pending to Running as nodes are scaled up. Or check the nodes with kubectl:
kubectl get nodes
Output:
NAME                                           STATUS   ROLES    AGE    VERSION
ip-192-168-12-114.us-east-2.compute.internal   Ready    <none>   3d6h   v1.17.7-eks-bffbac
ip-192-168-29-155.us-east-2.compute.internal   Ready    <none>   63s    v1.17.7-eks-bffbac
ip-192-168-55-187.us-east-2.compute.internal   Ready    <none>   3d6h   v1.17.7-eks-bffbac
ip-192-168-82-113.us-east-2.compute.internal   Ready    <none>   8h     v1.17.7-eks-bffbac
Scale-down process: when a node's resource utilization is low, the Pods running on it are evicted onto other nodes and the node is removed to save cost.
CA might scale down non-empty nodes whose utilization is below a threshold (configurable with the --scale-down-utilization-threshold flag). To prevent this behavior, set the utilization threshold to 0.
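For example, to keep a specific node from being considered for scale-down regardless of its utilization, you can annotate it (the node name is taken from the node listing above; use one of your own nodes):
kubectl annotate node ip-192-168-12-114.us-east-2.compute.internal cluster-autoscaler.kubernetes.io/scale-down-disabled=true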
CA can be used with multiple ASGs (node groups) at the same time.
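As a sketch (the ASG and cluster names below are placeholders), node groups are registered with the autoscaler either explicitly, by repeating the --nodes=<min>:<max>:<asg-name> flag on the cluster-autoscaler command:
--nodes=1:10:my-spot-asg-4vcpu-16gb
--nodes=1:10:my-spot-asg-8vcpu-32gb
or automatically, via ASG tags with auto-discovery:
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster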
The Cluster Autoscaler has a concept of Expanders, which provide different strategies for selecting which node group to scale. The strategy --expander=least-waste is a good general-purpose default, and if you're going to use multiple node groups for Spot Instance diversification, it can help further cost-optimize the node groups by scaling the group that would be best utilized after the scaling activity.
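A minimal sketch of setting this on the workshop's deployment (the neighboring arguments are illustrative; your deployment's existing flags will differ): edit the deployment and add the flag to the container command.
kubectl -n kube-system edit deployment cluster-autoscaler
      containers:
      - command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --expander=least-waste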
How does CA decide which ASG to expand?
When Cluster Autoscaler identifies that it needs to scale up a cluster due to unschedulable pods, it increases the number of nodes in some node group. When there is one node group, this strategy is trivial. When there is more than one node group, it has to decide which to expand.
Expanders provide different strategies for selecting the node group to which new nodes will be added.
Expanders can be selected by passing the name to the --expander flag, i.e. ./cluster-autoscaler --expander=random.
Currently Cluster Autoscaler has 5 expanders:
random - this is the default expander, and should be used when you don't have a particular need for the node groups to scale differently.
most-pods - selects the node group that would be able to schedule the most pods when scaling up. This is useful when you are using nodeSelector to make sure certain pods land on certain nodes. Note that this won't cause the autoscaler to select bigger nodes vs. smaller, as it can add multiple smaller nodes at once.
least-waste - selects the node group that will have the least idle CPU (if tied, unused memory) after scale-up. This is useful when you have different classes of nodes, for example, high-CPU or high-memory nodes, and only want to expand those when there are pending pods that need a lot of those resources.
price - selects the node group that will cost the least and, at the same time, whose machines would match the cluster size. This expander is described in more detail HERE. Currently it works only for GCE, GKE and Equinix Metal (patches welcome).
priority - selects the node group that has the highest priority assigned by the user. Its configuration is described in more detail here.
From 1.23.0 onwards, multiple expanders may be passed, i.e. ./cluster-autoscaler --expander=priority,least-waste. This will cause the least-waste expander to be used as a fallback in the event that the priority expander selects multiple node groups. In general, a list of expanders can be used, where the output of one is passed to the next and the final decision is made by randomly selecting one of the remaining node groups. An expander must not appear in the list more than once.
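As an illustration of the priority expander (the node-group name patterns below are placeholders), it reads its priorities from a ConfigMap named cluster-autoscaler-priority-expander in the autoscaler's namespace; higher numbers win, and each value is a list of regular expressions matched against node group names:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*
    50:
      - .*spot.*
With --expander=priority,least-waste, node groups whose names match .*spot.* would be tried first, and least-waste would break ties if several groups share the highest priority.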