EKS + Spot

EKS managed node groups support Spot capacity natively. The default allocation strategy is Capacity Optimized with Capacity Rebalancing enabled, and every node created this way gets the eks.amazonaws.com/capacityType: SPOT label.

Create Spot capacity

First, create a managed node group that declares Spot instances:

cat << EOF > add-mng-spot.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eksworkshop-eksctl   # workshop cluster name
  region: us-west-2          # region used throughout this workshop

managedNodeGroups:
- name: mng-spot-4vcpu-8gb
  desiredCapacity: 2
  minSize: 1
  maxSize: 4
  spot: true
  instanceTypes:
  - c5.xlarge
  - c5a.xlarge
  - c5ad.xlarge
  - c5d.xlarge
  - c6a.xlarge
  taints:
    - key: spotInstance
      value: "true"
      effect: NoSchedule
  labels:
    intent: apps
    managed-by: mng-spot
EOF


Now create the new EKS managed node group:

eksctl create nodegroup --config-file=add-mng-spot.yml

Creation of node groups will take 3-4 minutes.

There are a few things to note in the configuration that we just used to create these node groups.

  • Node group configurations are set under the managedNodeGroups section, which indicates that the node groups are managed by EKS.
  • A diversified set of instance types has been requested, based on the output of ec2-instance-selector.
  • The configuration spot: true indicates that the node group being created is an EKS managed node group with Spot capacity.
  • We applied a taint spotInstance: "true:NoSchedule". The NoSchedule effect means pods will not be scheduled onto these Spot Instances unless they carry a matching toleration.
  • The nodes are labeled with a particular intent, allowing you to deploy stateless applications on nodes labeled with the value apps.
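The diversified instance list used above can be reproduced with the amazon-ec2-instance-selector CLI. The flags below are an assumption about how the list was generated (4 vCPUs, 8 GiB of memory, x86_64, no GPUs), not the exact command from this workshop:

```shell
# Hypothetical amazon-ec2-instance-selector invocation to produce a
# diversified 4 vCPU / 8 GiB instance list (flags are an assumption;
# requires AWS credentials):
ec2-instance-selector --vcpus 4 --memory 8 --cpu-architecture x86_64 --gpus 0
```

Requesting multiple interchangeable instance types spreads the node group across more Spot capacity pools, which reduces the chance of interruption.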

If you are wondering at this stage, "Where is the Spot bidding price?", you have missed some of the changes EC2 Spot Instances have undergone since 2017. Since November 2017, the EC2 Spot price changes infrequently, based on long-term supply and demand of spare capacity in each pool independently. You can still set a maxPrice in scenarios where you want to cap your budget. By default, maxPrice is set to the On-Demand price; regardless of the maxPrice value, Spot Instances are still charged at the current Spot market price.
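If you want to see the current Spot market price for one of the pools you selected, you can query it with the AWS CLI. The instance type and region below are examples taken from this workshop:

```shell
# Query recent Spot price history for one of the selected instance types
# (example type/region; requires AWS credentials):
aws ec2 describe-spot-price-history \
  --instance-types c5.xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --region us-west-2 \
  --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
  --output table
```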

Confirm that the new nodes joined the cluster correctly:

kubectl get nodes -l intent=apps -L eks.amazonaws.com/capacityType,eks.amazonaws.com/nodegroup

The output will show all of the nodes we have provisioned to run our applications:

NAME                                           STATUS   ROLES    AGE     VERSION                CAPACITYTYPE   NODEGROUP
ip-192-168-19-58.us-west-2.compute.internal    Ready    <none>   118m    v1.22.12-eks-ba74326   SPOT           mng-spot-4vcpu-8gb
ip-192-168-43-230.us-west-2.compute.internal   Ready    <none>   5d15h   v1.22.12-eks-ba74326   ON_DEMAND      mng-od-4vcpu-8gb
ip-192-168-70-31.us-west-2.compute.internal    Ready    <none>   5d15h   v1.22.12-eks-ba74326   ON_DEMAND      mng-od-4vcpu-8gb
ip-192-168-95-234.us-west-2.compute.internal   Ready    <none>   118m    v1.22.12-eks-ba74326   SPOT           mng-spot-4vcpu-8gb

Spot best practices

Use the AWS Management Console to inspect the managed node groups deployed in your Kubernetes cluster.

  • Go to Elastic Kubernetes Service » click on Clusters » select eksworkshop-eksctl cluster » select Configuration tab » go to Compute tab in the bottom pane.
  • You can see three node groups created: two On-Demand node groups and one Spot node group.
  • Click on the mng-spot-4vcpu-8gb node group to see the instance types we selected in the earlier section.
  • Click on the Auto Scaling Group name in the tab. Scroll to the Purchase options and instance types settings. Note how Spot best practices are applied out of the box:
    • Capacity Optimized allocation strategy, which launches Spot Instances from the most-available spare capacity pools. This helps minimize Spot interruptions.
    • Capacity Rebalance helps EKS managed node groups manage the lifecycle of Spot Instances by proactively replacing instances that are at higher risk of being interrupted. Node groups use the Auto Scaling Group's Capacity Rebalance feature to launch replacement nodes in response to a Rebalance Recommendation notice, thus proactively maintaining the desired node capacity.
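The same Spot settings can be verified from the CLI instead of the console. The query below is a sketch that assumes the Auto Scaling Group name contains the node group name:

```shell
# Confirm Capacity Rebalance is enabled on the Spot node group's ASG
# (name filter is an assumption; requires AWS credentials):
aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[?contains(AutoScalingGroupName, 'mng-spot-4vcpu-8gb')].{Name:AutoScalingGroupName,CapacityRebalance:CapacityRebalance}" \
  --output table
```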

Spot Best Practices

Interruption Handling

Handling Spot interruptions does not require installing third-party tools (such as the AWS Node Termination Handler); a managed node group handles interruptions as follows:

Spot Capacity Rebalancing is enabled by default, keeping the impact on production applications as short as possible (see: https://compute.kpingfan.com/02-asg/11.capacity-rebalacing/ ). When a rebalance recommendation is received and the replacement Spot node reaches the Ready state, EKS first cordons the original Spot node (marking it unschedulable), then drains it, evicting the pods running on it onto other nodes. The whole flow is shown below:

Spot Rebalance Recommendation
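The cordon-and-drain flow described above can be expressed as the equivalent kubectl operations. Managed node groups perform this automatically; the commands below are only an illustration, using an example node name from the output earlier:

```shell
# Illustrative equivalent of what EKS does on a rebalance recommendation.
NODE=ip-192-168-19-58.us-west-2.compute.internal   # example Spot node

# 1. Cordon: mark the node unschedulable so no new pods land on it.
kubectl cordon "$NODE"

# 2. Drain: evict the running pods so they reschedule onto other nodes.
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
```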

Spot workload

Now let's deploy a new workload that leverages the Spot capacity we just added to the EKS cluster. The manifest below uses a nodeSelector to ensure that it will only use nodes that offer Spot capacity:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: team5
  labels:
    app.kubernetes.io/created-by: eks-finhack
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload5
  namespace: team5
  labels:
    app.kubernetes.io/created-by: eks-finhack
spec:
  replicas: 3
  selector:
    matchLabels:
      app: workload5
  template:
    metadata:
      labels:
        app: workload5
    spec:
      nodeSelector:
        intent: apps
        eks.amazonaws.com/capacityType: SPOT
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: "250m"
              memory: 1Gi
      tolerations:
      - key: "spotInstance"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
EOF

Can you check which nodes these pods were scheduled on? You can use either kubectl or kube-ops-view.
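With kubectl, for example, you can list the pods together with the node each one landed on:

```shell
# Show which node each workload5 pod was scheduled on; the NODE column
# should only list nodes from the Spot node group.
kubectl get pods -n team5 -o wide
```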


There is one more thing that we’ve accomplished!

  • Log into the EC2 Spot Request page in the Console.
  • Click on the Savings Summary button.

EC2 Spot Savings

We have achieved a significant cost saving over On-Demand prices, applied in a controlled way and at scale. We hope these savings help you try new experiments or build other cool projects. Now Go Build!