In this section, we create the Dockerfile for the container image, with all required packages installed.
Create the ECR repository:
export ECR_REPO_NAME_CPI=eks-cacl-pi-$(uuidgen --random | cut -d'-' -f1)
echo "export ECR_REPO_NAME_CPI=${ECR_REPO_NAME_CPI}" >> ~/.bash_profile
aws ecr create-repository \
--repository-name ${ECR_REPO_NAME_CPI} \
--image-scanning-configuration scanOnPush=true \
--region ${AWS_REGION}
Save the value of repositoryUri to a Bash environment variable and to your Bash profile:
export ECR_REPO_CPI="${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${ECR_REPO_NAME_CPI}"
echo "export ECR_REPO_CPI=${ECR_REPO_CPI}" >> ~/.bash_profile
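The repository URI saved above is assembled from three parts: the account ID, the region, and the repository name. As a minimal sketch of that pattern, the hypothetical helper below (ecr_repository_uri is not part of any AWS SDK) builds the same string and fails early if a part is empty, which is what happens silently in the shell when ACCOUNT_ID or AWS_REGION is unset. It assumes the standard amazonaws.com domain, and the repository suffix in the example is made up.

```python
def ecr_repository_uri(account_id: str, region: str, repo_name: str) -> str:
    """Build a private ECR repository URI from its three parts."""
    parts = (("account_id", account_id), ("region", region), ("repo_name", repo_name))
    for label, value in parts:
        # An unset shell variable expands to an empty string, so check for that case.
        if not value:
            raise ValueError(f"{label} must not be empty")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}"


# Illustrative values only; the repository name suffix is hypothetical.
print(ecr_repository_uri("111111111111", "us-east-1", "eks-cacl-pi-403f8225"))
```

If the printed URI differs from the value of ECR_REPO_CPI in your shell, one of the input variables was probably not set when the export ran.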
Create a project directory to hold the Dockerfile:
mkdir ~/environment/calc-pi
cd ~/environment/calc-pi
Create a script that calculates the value of Pi in Python. To make the job run for more than 5 seconds, the calculation is repeated 100 times:
cat > calc-pi.py <<EOF
#!/usr/local/bin/python
s = 0
for x in range(100):
    # Initialize denominator
    k = 1
    # Initialize sum
    s = 0
    for i in range(1000000):
        # even index elements are positive
        if i % 2 == 0:
            s += 4/k
        else:
            # odd index elements are negative
            s -= 4/k
        # denominator is odd
        k += 2
print(s)
EOF
Make the script executable:
chmod +x calc-pi.py
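The script sums the alternating series 4/1 - 4/3 + 4/5 - 4/7 + ..., known as the Leibniz series, which converges slowly toward Pi. The sketch below is an illustration rather than part of the workshop: it wraps the same inner loop in a hypothetical function, leibniz_pi, and prints how the error shrinks as the number of terms grows.

```python
import math


def leibniz_pi(terms: int) -> float:
    """Approximate Pi by summing the first `terms` terms of the Leibniz series."""
    total = 0.0
    denominator = 1
    for i in range(terms):
        # Even-indexed terms are added, odd-indexed terms are subtracted.
        if i % 2 == 0:
            total += 4 / denominator
        else:
            total -= 4 / denominator
        # Denominators run over the odd numbers 1, 3, 5, ...
        denominator += 2
    return total


for terms in (10, 1_000, 1_000_000):
    estimate = leibniz_pi(terms)
    error = abs(estimate - math.pi)
    print(f"{terms:>9} terms: {estimate:.10f} (error {error:.2e})")
```

The slow convergence is what makes the series a convenient CPU-bound workload for this exercise: each job does a large, predictable amount of arithmetic.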
Create the Dockerfile for the container:
cat > Dockerfile <<EOF
FROM public.ecr.aws/docker/library/python:latest
COPY calc-pi.py /usr/local/bin/calc-pi.py
EOF
You should now have a directory that looks like this:
calc-pi/
├── calc-pi.py
└── Dockerfile
Build the container image and push it to ECR:
echo "Logging in to ECR..."
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
echo "Build started on `date`"
echo "Building the Docker image..."
docker build -t $ECR_REPO_NAME_CPI:latest .
docker tag $ECR_REPO_NAME_CPI:latest $ECR_REPO_CPI:latest
echo "Build completed on `date`"
echo "Pushing the Docker image..."
docker push $ECR_REPO_CPI:latest
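In this section, we use IRSA (IAM Roles for Service Accounts) to create a service role so that pods can access ECR later.
Download the provided policy document for accessing private ECR repositories: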
curl 'https://pingfan.s3.amazonaws.com/files/ecr-access-iam-policy.json' --output ecr-access-iam-policy.json
Create the policy and store its ARN in an environment variable:
export BATCH_EKS_ECR_POLICY_ARN=$(aws iam create-policy --policy-name "ecr-access-iam-policy" --policy-document file://./ecr-access-iam-policy.json --output text --query Policy.Arn)
Create an IAM role and use eksctl to associate it with a Kubernetes service account:
export BATCH_EKS_ECR_SA_NAME="${BATCH_EKS_CLUSTER_NAME}-ecr-sa"
export BATCH_EKS_ECR_ROLE_NAME="${BATCH_EKS_CLUSTER_NAME}-ecr-role"
eksctl create iamserviceaccount --name ${BATCH_EKS_ECR_SA_NAME} \
--namespace ${BATCH_EKS_NAMESPACE} \
--cluster ${BATCH_EKS_CLUSTER_NAME} \
--role-name ${BATCH_EKS_ECR_ROLE_NAME} \
--attach-policy-arn ${BATCH_EKS_ECR_POLICY_ARN} \
--approve
Save these values in your Bash profile for later use:
echo "export BATCH_EKS_ECR_SA_NAME=${BATCH_EKS_ECR_SA_NAME}" >> ~/.bash_profile
echo "export BATCH_EKS_ECR_ROLE_NAME=${BATCH_EKS_ECR_ROLE_NAME}" >> ~/.bash_profile
echo "export BATCH_EKS_ECR_POLICY_ARN=${BATCH_EKS_ECR_POLICY_ARN}" >> ~/.bash_profile
When the commands finish, the service account and its IAM role have been created.
Create the job definition. Note the use of job definition parameters and of the pod's Kubernetes service account to enable access to private ECR and S3 resources:
JD_CPI_NAME="b4eks-calc-pi-$(uuidgen --random | cut -d'-' -f1)"
echo "export JD_CPI_NAME=${JD_CPI_NAME}" >> ~/.bash_profile
cat > ${JD_CPI_NAME}.json <<EOF
{
  "jobDefinitionName": "${JD_CPI_NAME}",
  "type": "container",
  "parameters": {
    "Source": "NULL",
    "Destination": "NULL"
  },
  "eksProperties": {
    "podProperties": {
      "serviceAccountName": "${BATCH_EKS_ECR_SA_NAME}",
      "hostNetwork": true,
      "containers": [
        {
          "image": "${ECR_REPO_CPI}:latest",
          "command": [
            "/usr/local/bin/calc-pi.py"
          ],
          "resources": {
            "limits": {
              "cpu": "1",
              "memory": "1024Mi"
            }
          }
        }
      ]
    }
  }
}
EOF
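Because the heredoc expands shell variables, an unset variable leaves an empty string in the generated JSON file without any warning. The sketch below shows one way to catch that before registering the job definition. The helper find_problems is hypothetical and its checks are illustrative assumptions; they are not the validation that AWS Batch itself performs.

```python
import json


def find_problems(job_definition: dict) -> list:
    """Return a list of human-readable problems found in a job definition."""
    problems = []
    for key in ("jobDefinitionName", "type"):
        if not job_definition.get(key):
            problems.append(f"missing or empty field: {key}")
    pod = job_definition.get("eksProperties", {}).get("podProperties", {})
    if not pod.get("serviceAccountName"):
        problems.append("missing or empty field: eksProperties.podProperties.serviceAccountName")
    containers = pod.get("containers", [])
    if not containers:
        problems.append("no containers defined")
    for index, container in enumerate(containers):
        image = container.get("image", "")
        # An unset ECR_REPO_CPI would leave the image as just ":latest".
        if not image or image.startswith(":"):
            problems.append(f"container {index} has an empty image repository")
    return problems


# Sample mimicking a file generated while two of the variables were unset.
sample = json.loads("""
{
  "jobDefinitionName": "b4eks-calc-pi-403f8225",
  "type": "container",
  "eksProperties": {
    "podProperties": {
      "serviceAccountName": "",
      "containers": [{"image": ":latest"}]
    }
  }
}
""")
for problem in find_problems(sample):
    print(problem)
```

To check the real file, load it with json.load and pass the result to find_problems. An empty list means none of these particular checks failed.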
Register the AWS Batch job definition with the AWS CLI:
aws batch register-job-definition --cli-input-json file://./${JD_CPI_NAME}.json
You should see output similar to the following:
{
"jobDefinitionName": "b4eks-calc-pi-403f8225",
"jobDefinitionArn": "arn:aws:batch:us-east-1:111111111111:job-definition/b4eks-calc-pi-403f8225:1",
"revision": 1
}
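Now we will stress test the cluster by submitting a series of jobs. We will see nodes being added to the cluster and, as the jobs start, the Grafana dashboard will begin to show pods and cluster resources scaling out.
Run the following in a terminal to submit an array job with 100 child jobs: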
aws batch submit-job --job-name 'batch4eks-calc-pi' \
--job-queue ${BATCH_EKS_JQ_NAME} \
--job-definition "${JD_CPI_NAME}:1" \
--array-properties size=100
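You can follow the job status in the AWS Batch console.
After a few minutes, you should see nodes being added to the EKS cluster in the EKS management console. You can also use the following command to watch the desired vCPU value of the AWS Batch compute environment increase: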
aws batch describe-compute-environments --compute-environments $BATCH_EKS_CE_NAME --query "computeEnvironments[].computeResources.[minvCpus, desiredvCpus, maxvCpus]"
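With the --query expression above, the CLI prints a JSON array containing one [minvCpus, desiredvCpus, maxvCpus] triple per compute environment. As a rough sketch, the hypothetical helper below turns that output into labelled rows and flags whether the environment has scaled out beyond its minimum. The sample numbers are made up for illustration and were not captured from a real environment.

```python
import json


def summarize_vcpus(cli_output: str) -> list:
    """Parse the JSON printed by the describe-compute-environments query."""
    rows = json.loads(cli_output)
    summary = []
    for min_vcpus, desired_vcpus, max_vcpus in rows:
        summary.append({
            "min": min_vcpus,
            "desired": desired_vcpus,
            "max": max_vcpus,
            # True once AWS Batch has requested more capacity than the minimum.
            "scaled_out": desired_vcpus > min_vcpus,
        })
    return summary


# Made-up sample output for a single compute environment.
sample_output = "[[0, 16, 32]]"
for row in summarize_vcpus(sample_output):
    print(row)
```

Running the CLI command repeatedly and feeding each result through a helper like this makes it easier to see the desired value rise while jobs are queued and fall back once they finish.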
As jobs start and complete, you should also see the Grafana dashboard metrics rise and then fall over time.