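Overview of steps

  1. Update the cluster domain parameter in kubelet

  2. Update the domain name in the CoreDNS ConfigMap

The detailed procedure follows.

Update the kubelet parameter

In a new node group, pass the kubelet cluster-domain setting through overrideBootstrapCommand: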

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-lab
  region: ap-northeast-1

managedNodeGroups:
  - name: managed-ng-3
    instanceType: m5.large
    ami: ami-0f7fede2f39ef763d  # replace with your own AMI ID
    minSize: 2
    maxSize: 4
    desiredCapacity: 2
    volumeSize: 20
    labels:
      auto-delete: 'no'
      
    # Change the cluster domain from cluster.local to staging.local; replace eks-lab with your own cluster name
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh eks-lab --kubelet-extra-args '--cluster-domain=staging.local'
      
    iam:
      withAddonPolicies:
        externalDNS: true
        certManager: true
  

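Apply this YAML to create the new node group: eksctl create nodegroup -f xxx.yaml

Notes on the YAML

If you specify overrideBootstrapCommand, you must also specify an AMI explicitly through the ami field:

Reference: https://repost.aws/zh-Hans/knowledge-center/eks-troubleshoot-eksctl-cluster-node

Without the ami field, eksctl reports the following error:

(screenshot: image-20230521093040385)

Rather than building a custom AMI, this walkthrough uses the default EKS-optimized AMI ID directly (replace the Kubernetes version and region to match your cluster):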

aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.23/amazon-linux-2/recommended/image_id --region ap-southeast-1 --query "Parameter.Value" --output text
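
The lookup above can be wrapped in a small helper so that the version and region become arguments instead of being edited by hand. This is only a sketch: the function names eks_ami_param and eks_ami_id are hypothetical, it assumes the AWS CLI is installed and configured, and the parameter path is the same one used in the command above.

```shell
#!/usr/bin/env bash

# Build the SSM parameter name for the EKS-optimized Amazon Linux 2 AMI.
# $1: Kubernetes minor version, e.g. 1.23
eks_ami_param() {
  printf '/aws/service/eks/optimized-ami/%s/amazon-linux-2/recommended/image_id\n' "$1"
}

# Look up the AMI ID for a version and region (needs AWS credentials).
# $1: Kubernetes version, $2: AWS region
eks_ami_id() {
  aws ssm get-parameter \
    --name "$(eks_ami_param "$1")" \
    --region "$2" \
    --query "Parameter.Value" \
    --output text
}
```

For the ClusterConfig in this article, the call would be eks_ami_id 1.23 ap-northeast-1, and the printed ID goes into the ami field.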

Update the domain name in the CoreDNS ConfigMap

Create a new nginx service:

kind: Service
apiVersion: v1
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP

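Save the content above as nginx.yaml and run kubectl apply -f nginx.yaml to create the nginx service.

Now check /etc/resolv.conf inside the container. The search domains have changed to staging.local (this assumes the pod was scheduled onto a node in the new node group, because the search domains come from that node's kubelet setting):

kubectl exec -it nginx-deployment-857b7d78bb-wsrxl -- cat /etc/resolv.conf  # replace with the pod name from kubectl get po

(screenshot: image-20230521104423009)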
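
The same check can be scripted rather than read off by eye. The sketch below is an illustration, not part of the original procedure: has_cluster_domain is a hypothetical helper that reads resolv.conf text on stdin and succeeds only if the search line lists svc.<domain>.

```shell
#!/usr/bin/env bash

# Succeed if the resolv.conf text on stdin has a "search" line
# that contains svc.<domain> for the given cluster domain.
# $1: cluster domain, e.g. staging.local
has_cluster_domain() {
  awk -v want="svc.$1" '
    $1 == "search" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

One possible use is kubectl exec deploy/nginx -- cat /etc/resolv.conf | has_cluster_domain staging.local && echo "search domain updated".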

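Create a curl pod:

kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh

Inside it, run curl against the nginx service. The request fails because the name cannot be resolved:

(screenshot: image-20230521105332867)

This happens because the CoreDNS configuration still needs to be updated:

kubectl edit cm -n kube-system coredns

On the kubernetes plugin line, add the staging.local domain: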


    kubernetes staging.local cluster.local in-addr.arpa  ip6.arpa  

Then save. The updated content:

(screenshot: image-20230521105656372)
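
For a repeatable, non-interactive version of this edit, the zone can be inserted with a small text filter. This is a sketch under assumptions: add_zone is a hypothetical helper, and it only rewrites lines whose first word is kubernetes, so it expects the Corefile to appear as plain multi-line text.

```shell
#!/usr/bin/env bash

# Insert a zone right after the "kubernetes" keyword on the plugin line
# of a Corefile read from stdin. Lines that already list the zone are
# printed unchanged, so running the filter twice is harmless.
# $1: zone to add, e.g. staging.local
add_zone() {
  awk -v zone="$1" '
    $1 == "kubernetes" && index($0, " " zone " ") == 0 {
      sub(/kubernetes /, "kubernetes " zone " ")
    }
    { print }
  '
}
```

It could be tried against a saved copy of the Corefile first, for example add_zone staging.local < Corefile, before touching the live ConfigMap.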

Restart the CoreDNS pods to apply the ConfigMap update:

$ kubectl rollout restart deployment coredns -n kube-system                                                                                       
deployment.apps/coredns restarted

Test resolution

Run the following commands in the curl pod created above:

curl nginx
curl nginx.default
curl nginx.default.svc.staging.local

All three requests succeed:

(screenshot: image-20230521105743511)

(screenshot: image-20230521105856227)
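
The short names work because the resolver appends each search domain from /etc/resolv.conf in turn. The sketch below only illustrates that expansion: expand_name is a hypothetical helper, and the search list is the standard Kubernetes layout for a pod in the default namespace with cluster domain staging.local.

```shell
#!/usr/bin/env bash

# Print the fully qualified names a resolver would try for a short name,
# one per search domain, in the order given.
# $1: short name, remaining arguments: search domains
expand_name() {
  name="$1"
  shift
  for suffix in "$@"; do
    printf '%s.%s\n' "$name" "$suffix"
  done
}
```

With the search list default.svc.staging.local svc.staging.local staging.local, the first candidate for nginx and the second candidate for nginx.default are both nginx.default.svc.staging.local, which is the record CoreDNS serves once staging.local is listed on the kubernetes plugin line.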