Stress-Testing the Cluster with Pi Calculation - I

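In this section, we use AWS Batch to send a large number of compute-intensive jobs to the EKS cluster. We will use Prometheus and Grafana to monitor and chart the load on the cluster as nodes start up and jobs are placed: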

Figure: Grafana dashboard showing cluster load over time

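We will:

Install Prometheus and Grafana on the EKS cluster using Helm.
Write a Dockerfile for the script that computes Pi.
Create an ECR repository to hold the custom container image.
Build the container image and push it to the ECR repository.
Create an AWS Batch job definition that runs the task.
Submit the jobs as an array job.
View the results plotted on the Grafana dashboard.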

Set up Prometheus and Grafana to monitor the cluster

Prometheus and Grafana are installed with Helm. First, add the Helm repositories for both monitoring tools:

# add prometheus Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# add grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts

Deploy Prometheus

Install Prometheus:

kubectl create namespace prometheus

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm install prometheus prometheus-community/prometheus \
    --namespace prometheus \
    --set alertmanager.persistentVolume.storageClass="gp2" \
    --set server.persistentVolume.storageClass="gp2"

Note the Prometheus endpoint in the Helm output (you will need it later). It should look similar to the following:

The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-server.prometheus.svc.cluster.local
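
The DNS name in that output follows the standard Kubernetes pattern for in-cluster Service names: service name, then namespace, then svc, then the cluster domain. The sketch below rebuilds the endpoint from those parts. The helper name service_dns is hypothetical, and cluster.local is assumed to be the cluster domain (the Kubernetes default, and the one shown in the output above).

```shell
# Hypothetical helper: build the in-cluster DNS name of a Kubernetes Service.
# Usage: service_dns <service> <namespace> [cluster-domain]
service_dns() {
  svc="$1"
  ns="$2"
  domain="${3:-cluster.local}"   # assumed default cluster domain
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

# The Prometheus server Service installed above, in the prometheus namespace
PROMETHEUS_URL="http://$(service_dns prometheus-server prometheus)"
echo "$PROMETHEUS_URL"
```

The resulting URL is the value used for the Grafana data source later in this section.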

Check that the Prometheus components were deployed as expected:

kubectl get all -n prometheus

You should see the Prometheus components listed in the response.

Once all of the Pods are shown as deployed, continue to the next step.
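
Instead of reading the Pod table by eye, the READY and STATUS columns can be checked in a script. The following is a minimal sketch of that idea. The function name all_pods_ready is hypothetical, and it assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, and so on).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod is Running with
# all of its containers ready (READY column such as 2/2).
all_pods_ready() {
  awk '
    NF >= 3 {
      total++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) not_ready++
    }
    END { if (total > 0 && not_ready == 0) exit 0; exit 1 }
  '
}

# Example with sample input (the pod names are made up for illustration):
printf '%s\n' \
  'sample-server-0         2/2   Running   0   3m' \
  'sample-alertmanager-0   0/1   Pending   0   3m' \
  | all_pods_ready || echo "not all pods are ready yet"
```

Against the live cluster, this would be used as kubectl get pods -n prometheus --no-headers | all_pods_ready. As a built-in alternative, kubectl wait --for=condition=Ready pod --all -n prometheus --timeout=300s blocks until the Pods report Ready or the timeout expires.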

Deploy Grafana

Create a YAML file named grafana.yaml with the following commands:

mkdir -p ${HOME}/environment/grafana

cat << EoF > ${HOME}/environment/grafana/grafana.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.prometheus.svc.cluster.local
      access: proxy
      isDefault: true
EoF
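
The url field in this file must match the Prometheus endpoint noted earlier, otherwise Grafana cannot query Prometheus. The sketch below shows one way to check that from the shell. The helper name datasource_url is hypothetical, and the demonstration runs against a temporary copy of the data source block so that it does not depend on the real file.

```shell
# Hypothetical helper: print the first `url:` value found in a values file.
datasource_url() {
  awk '$1 == "url:" { print $2; exit }' "$1"
}

# Self-contained demonstration on a temporary copy of the data source block
tmpfile="$(mktemp)"
cat << EoF > "$tmpfile"
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.prometheus.svc.cluster.local
      access: proxy
      isDefault: true
EoF

url="$(datasource_url "$tmpfile")"
expected="http://prometheus-server.prometheus.svc.cluster.local"
if [ "$url" = "$expected" ]; then
  echo "data source URL matches the Prometheus endpoint"
else
  echo "unexpected data source URL: $url"
fi
rm -f "$tmpfile"
```

For the real file, the same helper can be called as datasource_url ${HOME}/environment/grafana/grafana.yaml.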

Deploy Grafana:

kubectl create namespace grafana

helm install grafana grafana/grafana \
    --namespace grafana \
    --set persistence.storageClassName="gp2" \
    --set persistence.enabled=true \
    --set adminPassword='EKS!sAWSome' \
    --values ${HOME}/environment/grafana/grafana.yaml \
    --set service.type=LoadBalancer

Run the following command to check that Grafana was deployed correctly:

kubectl get all -n grafana

You should see the Grafana resources listed in the output.

Because the service.type=LoadBalancer parameter now creates an internal Network Load Balancer, Grafana cannot be reached from the public internet. We therefore use port forwarding to access Grafana instead.

Run:

kubectl port-forward service/grafana 8080:80 --namespace=grafana

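Then open port 8080 on your local machine in a browser (http://localhost:8080).

To log in, use the username admin and retrieve the password by running the following command: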

kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
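
Kubernetes stores Secret values base64-encoded, which is why the command pipes the jsonpath result through base64 --decode. The round trip below illustrates this locally, without a cluster, using the admin password set in the helm install command above. The variable names are illustrative only.

```shell
# Encode the password the way it is stored in the Secret's data field...
encoded="$(printf '%s' 'EKS!sAWSome' | base64)"
echo "stored form:  $encoded"

# ...and decode it again, as the kubectl pipeline above does.
decoded="$(printf '%s' "$encoded" | base64 --decode)"
echo "decoded form: $decoded"
```

Note that base64 is an encoding, not encryption: anyone who can read the Secret can recover the password.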

You are now logged in to Grafana in the browser.

Log in to Grafana and view the cluster monitoring dashboard

Select Import dashboard.

Enter 3119, then select Load.

In the prometheus data source drop-down list, select Prometheus, then select Import:

Figure: Grafana dashboard import UI, showing how to import a public dashboard

This displays the monitoring dashboard for all cluster nodes.