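Before running Batch jobs, prepare the environment.
Store the EKS cluster name in an environment variable and persist it to ~/.bash_profile: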
export BATCH_EKS_CLUSTER_NAME=<YOUR EKS CLUSTER NAME>
echo "export BATCH_EKS_CLUSTER_NAME=${BATCH_EKS_CLUSTER_NAME}" >> ~/.bash_profile
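The commands in the rest of this section also read AWS_REGION and ACCOUNT_ID, which are not set here and are presumably exported in an earlier step. As a minimal sketch, a guard like the one below could catch a missing variable before any AWS call is made. The helper name require_vars is hypothetical, and it only sees exported variables because it relies on printenv.

```shell
# Sketch: fail early if an exported variable that later commands rely on is missing or empty.
# require_vars is a hypothetical helper, not part of the AWS CLI or eksctl.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # printenv prints nothing (and fails) when the variable is not exported
    if [ -z "$(printenv "${name}")" ]; then
      echo "missing or empty environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example call (commented out so that sourcing this block has no side effects):
# require_vars AWS_REGION ACCOUNT_ID BATCH_EKS_CLUSTER_NAME || echo "fix the variables above first"
```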
Use the AWS CLI to look up the EKS cluster's subnet IDs, service role, and security group ID:
# get the subnet IDs
export SUBNET_IDS=($(aws --region ${AWS_REGION} eks describe-cluster --name ${BATCH_EKS_CLUSTER_NAME} --output text --query "cluster.resourcesVpcConfig.subnetIds"))
# export the service role
export EKS_SERVICE_ROLE_ARN=$(aws --region ${AWS_REGION} eks describe-cluster --name ${BATCH_EKS_CLUSTER_NAME} --output text --query "cluster.roleArn" )
export EKS_SERVICE_ROLE_NAME=$(echo $EKS_SERVICE_ROLE_ARN | cut -f2 -d/ )
# export the security group ID for nodes
export SECURITY_GROUP_ID=$(aws --region ${AWS_REGION} eks describe-cluster --name ${BATCH_EKS_CLUSTER_NAME} --output text --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" )
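With --output text, the AWS CLI prints the literal string None when a --query expression resolves to null, and the variable ends up empty when the command itself fails (for example, because of a wrong cluster name). The sketch below shows one way to check a captured value before using it; is_resolved is a hypothetical helper name.

```shell
# Sketch: succeed only when VALUE is non-empty and is not the literal "None"
# that the AWS CLI text output uses for null query results.
is_resolved() {
  [ -n "${1:-}" ] && [ "$1" != "None" ]
}

# Example calls (commented out so that sourcing this block has no side effects):
# is_resolved "${EKS_SERVICE_ROLE_ARN}" || echo "EKS_SERVICE_ROLE_ARN was not resolved" >&2
# is_resolved "${SECURITY_GROUP_ID}"    || echo "SECURITY_GROUP_ID was not resolved" >&2
```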
Use the AWS CLI to look up the name and instance role of the EKS cluster's node group:
# get the nodegroup information
export NODEGROUP_NAME=$(aws --region ${AWS_REGION} eks list-nodegroups --cluster-name ${BATCH_EKS_CLUSTER_NAME} --output text --query "nodegroups[0]")
# get the node IAM role
export EKS_NODE_ROLE_ARN=$(aws --region ${AWS_REGION} eks describe-nodegroup --cluster-name ${BATCH_EKS_CLUSTER_NAME} --nodegroup-name ${NODEGROUP_NAME} --output text --query "nodegroup.nodeRole" )
export EKS_NODE_ROLE_NAME=$(echo $EKS_NODE_ROLE_ARN | cut -f2 -d/ )
# get the IAM instance profile ARN
export EKS_NODE_IAM_PROFILE_ARN=$(aws iam list-instance-profiles-for-role --role-name ${EKS_NODE_ROLE_NAME} --query "InstanceProfiles[].Arn" --output text)
# get the EKS cluster ARN
export EKS_CLUSTER_ARN="arn:aws:eks:${AWS_REGION}:${ACCOUNT_ID}:cluster/${BATCH_EKS_CLUSTER_NAME}"
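The role names above are taken from the ARNs with cut -f2 -d/, which returns the second "/"-separated field. That works for ARNs of the form arn:aws:iam::<account>:role/<name>, but for a role created with an IAM path it returns the first path segment rather than the role name. A sketch of an alternative that keeps everything after the last "/" follows; role_name_from_arn is a hypothetical helper and the ARN in the example is made up.

```shell
# Sketch: extract the role name as the text after the last "/" in the ARN,
# so that roles with an IAM path (role/<path>/<name>) are handled too.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Example calls (commented out so that sourcing this block has no side effects):
# role_name_from_arn "arn:aws:iam::111122223333:role/service-role/my-node-role"
# export EKS_NODE_ROLE_NAME=$(role_name_from_arn "${EKS_NODE_ROLE_ARN}")
```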
Create additional environment variables for use in the Batch resource definition files:
export BATCH_EKS_NAMESPACE="${BATCH_EKS_CLUSTER_NAME}-batch-nodes"
export BATCH_EKS_CE_NAME="${BATCH_EKS_CLUSTER_NAME}-CE1"
export BATCH_EKS_JQ_NAME="${BATCH_EKS_CLUSTER_NAME}-JQ1"
export BATCH_EKS_S3ECR_SA_NAME="${BATCH_EKS_CLUSTER_NAME}-s3ecr-sa"
export BATCH_EKS_S3ECR_ROLE_NAME="${BATCH_EKS_CLUSTER_NAME}-s3ecr-role"
Save the environment variables to your Bash profile in case the session is interrupted:
echo "export EKS_CLUSTER_ARN=${EKS_CLUSTER_ARN}" >> ~/.bash_profile
echo "export EKS_NODE_ROLE_NAME=${EKS_NODE_ROLE_NAME}" >> ~/.bash_profile
echo "export EKS_SERVICE_ROLE_NAME=${EKS_SERVICE_ROLE_NAME}" >> ~/.bash_profile
echo "export EKS_NODE_ROLE_ARN=${EKS_NODE_ROLE_ARN}" >> ~/.bash_profile
echo "export EKS_SERVICE_ROLE_ARN=${EKS_SERVICE_ROLE_ARN}" >> ~/.bash_profile
echo "export SECURITY_GROUP_ID=${SECURITY_GROUP_ID}" >> ~/.bash_profile
echo "export SUBNET_IDS=(${SUBNET_IDS[@]})" >> ~/.bash_profile
echo "export EKS_NODE_IAM_PROFILE_ARN=${EKS_NODE_IAM_PROFILE_ARN}" >> ~/.bash_profile
echo "export BATCH_EKS_NAMESPACE=${BATCH_EKS_NAMESPACE}" >> ~/.bash_profile
echo "export BATCH_EKS_CE_NAME=${BATCH_EKS_CE_NAME}" >> ~/.bash_profile
echo "export BATCH_EKS_JQ_NAME=${BATCH_EKS_JQ_NAME}" >> ~/.bash_profile
echo "export BATCH_EKS_S3ECR_SA_NAME=\"${BATCH_EKS_CLUSTER_NAME}-s3ecr-sa\"" >> ~/.bash_profile
echo "export BATCH_EKS_S3ECR_ROLE_NAME=\"${BATCH_EKS_CLUSTER_NAME}-s3ecr-role\"" >> ~/.bash_profile
After the commands finish, you can inspect the ~/.bash_profile file to confirm that the variables were saved.
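Because each echo above appends with >>, running this step again adds duplicate lines to ~/.bash_profile. If that matters, a small helper along these lines could append a line only when it is not already present. This is a sketch, and append_once is a hypothetical name.

```shell
# Sketch: append LINE to FILE unless an identical line already exists there.
append_once() {
  local line="$1" file="$2"
  touch "${file}"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -qxF -- "${line}" "${file}" || printf '%s\n' "${line}" >> "${file}"
}

# Example call (commented out so that sourcing this block has no side effects):
# append_once "export BATCH_EKS_CE_NAME=${BATCH_EKS_CE_NAME}" ~/.bash_profile
```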
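To collect application logs, a log collection component such as Fluent Bit, Fluentd, or CloudWatch Container Insights must be installed in the EKS cluster.
We will use Fluent Bit, an open source log processor and forwarder, to send application-level logs to CloudWatch Logs.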
Attach the CloudWatchAgentServerPolicy managed policy to the node instance role:
aws iam attach-role-policy --role-name ${EKS_NODE_ROLE_NAME} --policy-arn 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy'
If a namespace named amazon-cloudwatch does not exist yet, create it first:
kubectl create ns amazon-cloudwatch
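kubectl create ns returns an error when the namespace already exists. The "create only if missing" condition can be sketched as a small function; ensure_namespace is a hypothetical helper name.

```shell
# Sketch: create the namespace only when "kubectl get" cannot find it.
ensure_namespace() {
  local ns="$1"
  kubectl get namespace "${ns}" >/dev/null 2>&1 || kubectl create namespace "${ns}"
}

# Example call (commented out so that sourcing this block has no side effects):
# ensure_namespace amazon-cloudwatch
```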
Run the following commands to create a ConfigMap named fluent-bit-cluster-info that holds the cluster name and the Region to send logs to:
ClusterName=${BATCH_EKS_CLUSTER_NAME}
RegionName=${AWS_REGION}
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
kubectl create configmap fluent-bit-cluster-info \
    --from-literal=cluster.name=${ClusterName} \
    --from-literal=http.server=${FluentBitHttpServer} \
    --from-literal=http.port=${FluentBitHttpPort} \
    --from-literal=read.head=${FluentBitReadFromHead} \
    --from-literal=read.tail=${FluentBitReadFromTail} \
    --from-literal=logs.region=${RegionName} -n amazon-cloudwatch
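The two [[ ... ]] && ... || ... lines derive read.tail as the opposite of read.head, and turn the HTTP server on only when a port is set. The same logic, written with explicit if/else as a sketch, may be easier to follow; derive_fluent_bit_flags is a hypothetical helper name.

```shell
# Sketch: derive the two dependent settings from the HTTP port and the read-from-head flag.
derive_fluent_bit_flags() {
  local http_port="$1" read_from_head="$2"
  local read_from_tail http_server
  # read.tail is the inverse of read.head
  if [ "${read_from_head}" = "On" ]; then read_from_tail="Off"; else read_from_tail="On"; fi
  # the HTTP server is enabled only when a port is configured
  if [ -z "${http_port}" ]; then http_server="Off"; else http_server="On"; fi
  printf 'http.server=%s read.tail=%s\n' "${http_server}" "${read_from_tail}"
}

# Example call (commented out so that sourcing this block has no side effects):
# derive_fluent_bit_flags "2020" "Off"
```

With the values used above (port 2020, read from head Off), both derived settings are On.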
Download the Fluent Bit DaemonSet manifest and deploy it to the cluster by running the following command:
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml
Verify the deployment with the following command. Each node should have one Pod named fluent-bit-*:
kubectl get pods -n amazon-cloudwatch
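Instead of polling the Pod list by hand, kubectl rollout status can block until the DaemonSet is ready. The sketch below assumes that the manifest names the DaemonSet fluent-bit; the function name and the 120-second timeout are illustrative choices.

```shell
# Sketch: wait until every Fluent Bit Pod of the DaemonSet reports ready.
# Assumes the DaemonSet is named "fluent-bit" in the amazon-cloudwatch namespace.
wait_for_fluent_bit() {
  kubectl rollout status daemonset/fluent-bit -n amazon-cloudwatch --timeout=120s
}

# Example call (commented out so that sourcing this block has no side effects):
# wait_for_fluent_bit
```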
The log groups should also appear in the CloudWatch section of the AWS Management Console.
Next, configure the EKS cluster to work with AWS Batch.
Create a Kubernetes namespace for AWS Batch:
# use kubectl to create the namespace
cat - <<EOF | kubectl create -f -
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "${BATCH_EKS_NAMESPACE}",
    "labels": {
      "name": "${BATCH_EKS_NAMESPACE}"
    }
  }
}
EOF
# namespace/eks-cilium-batch-nodes created
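Create the Kubernetes ClusterRole that gives AWS Batch read-only access to namespaces, nodes, and Pods, and bind it to the aws-batch user: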
cat - <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aws-batch-cluster-role
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aws-batch-cluster-role-binding
subjects:
  - kind: User
    name: aws-batch
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: aws-batch-cluster-role
  apiGroup: rbac.authorization.k8s.io
EOF
Create the Kubernetes Role that lets AWS Batch manage Pods in the namespace:
cat - <<EOF | kubectl apply -f - --namespace "${BATCH_EKS_NAMESPACE}"
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-batch-compute-environment-role
  namespace: ${BATCH_EKS_NAMESPACE}
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-batch-compute-environment-role-binding
  namespace: ${BATCH_EKS_NAMESPACE}
subjects:
  - kind: User
    name: aws-batch
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: aws-batch-compute-environment-role
  apiGroup: rbac.authorization.k8s.io
EOF
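One way to confirm that the bindings took effect is to ask the API server what the aws-batch user may do, using kubectl auth can-i with impersonation. This is a sketch: batch_can is a hypothetical helper, and the caller needs permission to impersonate users (cluster administrators normally have it).

```shell
# Sketch: check whether the aws-batch user may perform VERB on RESOURCE in NAMESPACE.
# kubectl prints "yes" or "no".
batch_can() {
  local verb="$1" resource="$2" namespace="$3"
  kubectl auth can-i "${verb}" "${resource}" --as aws-batch --namespace "${namespace}"
}

# Example calls (commented out so that sourcing this block has no side effects):
# batch_can create pods "${BATCH_EKS_NAMESPACE}"
# batch_can delete pods "${BATCH_EKS_NAMESPACE}"
```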
After creating the roles and applying the policies, use the eksctl CLI to create the identity mapping:
eksctl create iamidentitymapping \
    --cluster ${BATCH_EKS_CLUSTER_NAME} \
    --arn "arn:aws:iam::${ACCOUNT_ID}:role/AWSServiceRoleForBatch" \
    --username aws-batch
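To check that the mapping was recorded, the existing mappings can be listed with eksctl get iamidentitymapping and searched for the aws-batch username. The sketch below wraps that check in a function; batch_mapping_exists is a hypothetical helper name.

```shell
# Sketch: succeed when the cluster's identity mappings contain the aws-batch username.
batch_mapping_exists() {
  local cluster="$1"
  eksctl get iamidentitymapping --cluster "${cluster}" | grep -q 'aws-batch'
}

# Example call (commented out so that sourcing this block has no side effects):
# batch_mapping_exists "${BATCH_EKS_CLUSTER_NAME}" && echo "aws-batch mapping found"
```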