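Environment Preparation

Before running Batch jobs, prepare the environment:

  1. Set the required environment variables
  2. Install Fluent Bit to collect logs
  3. Create Kubernetes permissions for AWS Batch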

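Set the required environment variables

Store the EKS cluster name in an environment variable (replace the placeholder with your own cluster name):

export BATCH_EKS_CLUSTER_NAME="<YOUR EKS CLUSTER NAME>"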
echo "export BATCH_EKS_CLUSTER_NAME=${BATCH_EKS_CLUSTER_NAME}" >> ~/.bash_profile
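
The commands in the rest of this section reference ${AWS_REGION} and ${ACCOUNT_ID}, which this section never sets. If your shell does not already define them, a minimal sketch like the one below can fill them in. The helper name ensure_aws_context is hypothetical, and the sketch assumes the AWS CLI has a default region configured and valid credentials.

```shell
# Hypothetical helper: populate AWS_REGION and ACCOUNT_ID only if they are unset.
ensure_aws_context() {
  if [ -z "${AWS_REGION:-}" ]; then
    # Fall back to the default region of the active AWS CLI profile.
    AWS_REGION="$(aws configure get region)"
  fi
  if [ -z "${ACCOUNT_ID:-}" ]; then
    # Ask STS which account owns the current credentials.
    ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
  fi
  if [ -z "${AWS_REGION}" ] || [ -z "${ACCOUNT_ID}" ]; then
    echo "AWS_REGION or ACCOUNT_ID could not be determined" >&2
    return 1
  fi
  export AWS_REGION ACCOUNT_ID
}
```

Calling ensure_aws_context once before the queries that follow leaves existing values untouched and only looks up the ones that are missing.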

Use the AWS CLI to look up the EKS cluster's subnet IDs, service role, and security group ID:

# get the subnet IDs
export SUBNET_IDS=($(aws --region ${AWS_REGION} eks describe-cluster  --name ${BATCH_EKS_CLUSTER_NAME}  --output text --query  "cluster.resourcesVpcConfig.subnetIds"))

# export the service role
export EKS_SERVICE_ROLE_ARN=$(aws --region ${AWS_REGION} eks describe-cluster --name ${BATCH_EKS_CLUSTER_NAME}  --output text --query  "cluster.roleArn" )
export EKS_SERVICE_ROLE_NAME=$(echo $EKS_SERVICE_ROLE_ARN | cut -f2 -d/  )


# export the security group ID for nodes
export SECURITY_GROUP_ID=$(aws --region ${AWS_REGION} eks describe-cluster --name ${BATCH_EKS_CLUSTER_NAME}  --output text --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" )

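Use the AWS CLI to look up the node group name and instance role of the EKS cluster (the query takes the first node group returned):

# get the nodegroup information
export NODEGROUP_NAME=$(aws --region ${AWS_REGION} eks list-nodegroups --cluster-name ${BATCH_EKS_CLUSTER_NAME} --output text --query "nodegroups[0]")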

# get the node IAM role
export EKS_NODE_ROLE_ARN=$(aws --region ${AWS_REGION} eks describe-nodegroup --cluster-name ${BATCH_EKS_CLUSTER_NAME}  --nodegroup-name ${NODEGROUP_NAME} --output text --query "nodegroup.nodeRole" )
export EKS_NODE_ROLE_NAME=$(echo $EKS_NODE_ROLE_ARN | cut -f2 -d/ )

# get the IAM instance profile ARN
export EKS_NODE_IAM_PROFILE_ARN=$(aws iam list-instance-profiles-for-role --role-name ${EKS_NODE_ROLE_NAME} --query "InstanceProfiles[].Arn" --output text)

# get the EKS cluster ARN
export EKS_CLUSTER_ARN="arn:aws:eks:${AWS_REGION}:${ACCOUNT_ID}:cluster/${BATCH_EKS_CLUSTER_NAME}"
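
The cut -f2 -d/ calls above return the second /-separated field of the ARN, which is the role name only when the role has no IAM path. For an ARN such as arn:aws:iam::111122223333:role/team/eks-node-role they would return team instead. A minimal sketch of a path-tolerant alternative using shell parameter expansion follows; the function name role_name_from_arn and the sample ARNs are illustrative, not part of the original walkthrough.

```shell
# Hypothetical helper: print everything after the last "/" of an IAM role ARN.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Both calls print: eks-node-role
role_name_from_arn "arn:aws:iam::111122223333:role/eks-node-role"
role_name_from_arn "arn:aws:iam::111122223333:role/team/eks-node-role"
```

It could be used as export EKS_NODE_ROLE_NAME=$(role_name_from_arn "$EKS_NODE_ROLE_ARN") if your roles carry a path.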

Create additional environment variables for use in the Batch resource definition files:

export BATCH_EKS_NAMESPACE="${BATCH_EKS_CLUSTER_NAME}-batch-nodes"
export BATCH_EKS_CE_NAME="${BATCH_EKS_CLUSTER_NAME}-CE1"
export BATCH_EKS_JQ_NAME="${BATCH_EKS_CLUSTER_NAME}-JQ1"
export BATCH_EKS_S3ECR_SA_NAME="${BATCH_EKS_CLUSTER_NAME}-s3ecr-sa"
export BATCH_EKS_S3ECR_ROLE_NAME="${BATCH_EKS_CLUSTER_NAME}-s3ecr-role"

Save the environment variables to your Bash profile so that they survive an interrupted session:

echo "export EKS_CLUSTER_ARN=${EKS_CLUSTER_ARN}" >> ~/.bash_profile
echo "export EKS_NODE_ROLE_NAME=${EKS_NODE_ROLE_NAME}" >> ~/.bash_profile
echo "export EKS_SERVICE_ROLE_NAME=${EKS_SERVICE_ROLE_NAME}" >> ~/.bash_profile
echo "export EKS_NODE_ROLE_ARN=${EKS_NODE_ROLE_ARN}" >> ~/.bash_profile
echo "export EKS_SERVICE_ROLE_ARN=${EKS_SERVICE_ROLE_ARN}" >> ~/.bash_profile
echo "export SECURITY_GROUP_ID=${SECURITY_GROUP_ID}" >> ~/.bash_profile
echo "export SUBNET_IDS=(${SUBNET_IDS[@]})" >> ~/.bash_profile
echo "export EKS_NODE_IAM_PROFILE_ARN=${EKS_NODE_IAM_PROFILE_ARN}" >> ~/.bash_profile
echo "export BATCH_EKS_NAMESPACE=${BATCH_EKS_NAMESPACE}" >> ~/.bash_profile
echo "export BATCH_EKS_CE_NAME=${BATCH_EKS_CE_NAME}" >> ~/.bash_profile
echo "export BATCH_EKS_JQ_NAME=${BATCH_EKS_JQ_NAME}" >> ~/.bash_profile
echo "export BATCH_EKS_S3ECR_SA_NAME=\"${BATCH_EKS_CLUSTER_NAME}-s3ecr-sa\"" >> ~/.bash_profile
echo "export BATCH_EKS_S3ECR_ROLE_NAME=\"${BATCH_EKS_CLUSTER_NAME}-s3ecr-role\"" >> ~/.bash_profile

When the commands finish, you can inspect ~/.bash_profile to confirm that the variables were written.
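
Beyond reading the file by eye, a small check can confirm that every variable is actually set in the current shell. The sketch below is one way to do that; the helper name require_vars is hypothetical.

```shell
# Hypothetical helper: report any of the named variables that are unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_vars EKS_CLUSTER_ARN EKS_NODE_ROLE_NAME SECURITY_GROUP_ID BATCH_EKS_NAMESPACE returns a non-zero status and names each variable that still needs to be exported.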

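Install Fluent Bit to collect logs

To collect application logs, you must install a log collection component in the EKS cluster, such as Fluent Bit, Fluentd, or CloudWatch Container Insights.

We will use Fluent Bit, an open-source log processor and forwarder, to send application-level logs to CloudWatch Logs.

Attach the CloudWatchAgentServerPolicy managed policy to the node instance role: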

aws iam attach-role-policy --role-name ${EKS_NODE_ROLE_NAME} --policy-arn 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy'

If the amazon-cloudwatch namespace does not exist yet, create it:

kubectl create ns amazon-cloudwatch
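
Running kubectl create ns against a namespace that already exists fails with an AlreadyExists error. If you expect to re-run this walkthrough, a guard like the following sketch checks first; the helper name ensure_namespace is hypothetical.

```shell
# Hypothetical helper: create a namespace only when it does not exist yet.
ensure_namespace() {
  if kubectl get namespace "$1" >/dev/null 2>&1; then
    echo "namespace $1 already exists"
  else
    kubectl create namespace "$1"
  fi
}
```

With this in place, ensure_namespace amazon-cloudwatch is safe to run repeatedly.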

Run the following commands to create a ConfigMap named fluent-bit-cluster-info that holds the cluster name and the Region to send logs to:

ClusterName=${BATCH_EKS_CLUSTER_NAME}
RegionName=${AWS_REGION}
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
kubectl create configmap fluent-bit-cluster-info \
--from-literal=cluster.name=${ClusterName} \
--from-literal=http.server=${FluentBitHttpServer} \
--from-literal=http.port=${FluentBitHttpPort} \
--from-literal=read.head=${FluentBitReadFromHead} \
--from-literal=read.tail=${FluentBitReadFromTail} \
--from-literal=logs.region=${RegionName} -n amazon-cloudwatch

Download the Fluent Bit DaemonSet manifest and deploy it to the cluster with the following command:

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml

Verify the deployment with the following command. Each node should run one Pod named fluent-bit-*:

kubectl get pods -n amazon-cloudwatch
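
To compare the Pod count against the node count without reading the listing by eye, the output can be filtered with awk. This is a sketch only: the helper name count_running_fluent_bit is hypothetical, and it assumes the default kubectl get pods column order (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: count Running fluent-bit Pods.
# Reads `kubectl get pods --no-headers` output on stdin.
count_running_fluent_bit() {
  awk '$1 ~ /^fluent-bit-/ && $3 == "Running" { n++ } END { print n + 0 }'
}
```

Piping kubectl get pods -n amazon-cloudwatch --no-headers into count_running_fluent_bit gives a number that can be compared with the output of kubectl get nodes --no-headers | wc -l. The two may differ if some nodes carry taints that the DaemonSet does not tolerate.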

You should also see the corresponding log groups in the AWS CloudWatch console.
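
The same check can be made from the command line. The Container Insights Fluent Bit configuration writes to log groups under /aws/containerinsights/<cluster name>/, so a sketch like the following lists the names to look for. The helper name expected_log_groups is hypothetical, and the groups appear only after the corresponding logs have started flowing.

```shell
# Hypothetical helper: print the log group names expected for a given cluster name.
expected_log_groups() {
  for suffix in application dataplane host; do
    printf '/aws/containerinsights/%s/%s\n' "$1" "$suffix"
  done
}
```

The output of expected_log_groups "${BATCH_EKS_CLUSTER_NAME}" can be compared with aws logs describe-log-groups --region ${AWS_REGION} --log-group-name-prefix "/aws/containerinsights/${BATCH_EKS_CLUSTER_NAME}/" --query "logGroups[].logGroupName" --output text.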

Create Kubernetes permissions for AWS Batch

Next, configure the EKS cluster to work with AWS Batch.

Create a Kubernetes namespace for AWS Batch:

# use kubectl to create the namespace
cat - <<EOF | kubectl create -f -
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "${BATCH_EKS_NAMESPACE}",
    "labels": {
      "name": "${BATCH_EKS_NAMESPACE}"
    }
  }
}
EOF

# namespace/eks-cilium-batch-nodes created

Create a read-only ClusterRole for AWS Batch and bind it to the aws-batch user:

cat - <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aws-batch-cluster-role
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aws-batch-cluster-role-binding
subjects:
- kind: User
  name: aws-batch
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: aws-batch-cluster-role
  apiGroup: rbac.authorization.k8s.io
EOF

Create a namespace-scoped Role that lets AWS Batch manage Pods, and bind it to the aws-batch user:

cat - <<EOF | kubectl apply -f - --namespace "${BATCH_EKS_NAMESPACE}"
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aws-batch-compute-environment-role
  namespace: ${BATCH_EKS_NAMESPACE}
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aws-batch-compute-environment-role-binding
  namespace: ${BATCH_EKS_NAMESPACE}
subjects:
- kind: User
  name: aws-batch
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: aws-batch-compute-environment-role
  apiGroup: rbac.authorization.k8s.io
EOF
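
One way to confirm that the Role and RoleBinding took effect is to ask the API server what the aws-batch user may do. The sketch below loops over the verbs granted above with kubectl auth can-i; the helper name check_batch_pod_access is hypothetical, and the --as flag assumes your own identity is allowed to impersonate users (cluster administrators normally are).

```shell
# Hypothetical helper: ask whether user aws-batch may perform each verb on pods in a namespace.
check_batch_pod_access() {
  ns="$1"
  for verb in create get list watch delete; do
    # `kubectl auth can-i` prints yes/no and exits non-zero on "no"; keep going either way.
    answer="$(kubectl auth can-i "$verb" pods --as aws-batch --namespace "$ns")" || true
    echo "$verb pods: $answer"
  done
}
```

Running check_batch_pod_access "${BATCH_EKS_NAMESPACE}" should report yes for every verb if the manifests above were applied to that namespace.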

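After creating the roles and bindings, use the eksctl CLI to map the AWS Batch service-linked role to the aws-batch Kubernetes user: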

eksctl create iamidentitymapping \
    --cluster ${BATCH_EKS_CLUSTER_NAME} \
    --arn "arn:aws:iam::${ACCOUNT_ID}:role/AWSServiceRoleForBatch" \
    --username aws-batch
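
Note that the ARN passed to eksctl omits the aws-service-role/batch.amazonaws.com/ path that the full ARN of the service-linked role carries; the identity mapping expects the role ARN without its path. If you start from the full ARN, a sketch like the following strips the path. The helper name strip_role_path and the sample account ID are illustrative.

```shell
# Hypothetical helper: drop any IAM path from a role ARN, keeping only the role name.
strip_role_path() {
  printf '%s:role/%s\n' "${1%%:role/*}" "${1##*/}"
}

# Prints: arn:aws:iam::111122223333:role/AWSServiceRoleForBatch
strip_role_path "arn:aws:iam::111122223333:role/aws-service-role/batch.amazonaws.com/AWSServiceRoleForBatch"
```

To confirm the mapping afterwards, eksctl get iamidentitymapping --cluster ${BATCH_EKS_CLUSTER_NAME} lists the entries, and the aws-batch username should appear next to the role ARN.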