Reference: https://medium.com/codex/establish-cilium-clustermesh-whelm-chart-11b08b0c995c

ClusterMesh, Cilium's multi-cluster solution, provides a number of benefits for cross-cluster communication and networking, such as:

1. Pod IP routing across multiple Kubernetes clusters at native performance via tunneling or direct-routing without requiring any gateways or proxies.

2. Transparent service discovery with standard Kubernetes services and coredns/kube-dns.

3. Network policy enforcement spanning multiple clusters. Policies can be specified as Kubernetes NetworkPolicy resource or the extended CiliumNetworkPolicy CRD.

4. Transparent encryption for all communication between nodes in the local cluster as well as across cluster boundaries.


The most attractive points for us are 3) and 4).

For point 3), it is pretty useful for multi-tenant network isolation in multi-cluster environments. We want to apply Pod network policy rules against targets located in other Kubernetes clusters. Cilium ClusterMesh would be a perfect solution here.
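As an illustration, such a cross-cluster policy could look like the sketch below. The policy name and app labels are hypothetical; the io.cilium.k8s.policy.cluster label is what Cilium matches to restrict peers to a specific remote cluster.

❯ cat <<'EOF' | kubectl --context $CLUSTER1 apply -f -
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-from-cluster2    # hypothetical policy name
spec:
  endpointSelector:
    matchLabels:
      app: backend                      # hypothetical workload in cluster1
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend               # hypothetical workload in cluster2
            io.cilium.k8s.policy.cluster: cluster2
EOF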

For point 4), it improves the security of service communication between the clusters. (Learn more details of Cilium ClusterMesh here.)
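As a rough sketch of how point 4) can be turned on with Helm (assuming a Helm release named cilium in kube-system; whether WireGuard or IPsec is the right encryption type depends on your Cilium version and datapath, so treat the values below as an assumption and apply the same settings in both clusters):

❯ helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set encryption.enabled=true \
    --set encryption.type=wireguard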

Therefore, we want to build Cilium ClusterMesh in our multi-cluster environments, and this post shows the details of how I set up ClusterMesh on AWS EKS clusters, in particular with the Helm chart instead of the Cilium CLI. Helm is what we mostly use in our IaC (Terraform), and it is easy to deploy through our pipeline.

Prerequisites

There are some requirements for setting up Cilium ClusterMesh across two Kubernetes clusters, such as IP connectivity between the clusters: Node IPs (and Pod IPs, if using native-routing mode) must be able to reach each other across clusters.

The entire list of prerequisites can be found in the Cilium documentation.

Connect two Kubernetes (EKS) Clusters

First, connect the two EKS clusters with VPC peering or a Transit Gateway (TGW) so that Pods and Nodes in one cluster can reach those in the other; a quick check is sketched after the list below. In addition, the network CIDRs of the two EKS clusters must not overlap, for example:

  • Cluster1: 10.103.0.0/16
  • Cluster2: 10.104.0.0/16
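A quick way to sanity-check the connectivity (a sketch, assuming the security groups allow ICMP between the VPCs) is to note a node IP in one cluster and ping it from a throwaway pod in the other:

❯ kubectl --context $CLUSTER2 get nodes -o wide    # note an internal node IP, e.g. 10.104.28.22
❯ kubectl --context $CLUSTER1 run net-test --rm -it --image=busybox --restart=Never -- ping -c 3 10.104.28.22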

Set up ClusterMesh with the Cilium CLI

Enable ClusterMesh

The following command enables ClusterMesh and deploys the clustermesh-apiserver into the cluster:

❯ cilium clustermesh enable --context $CLUSTER1
🔮 Auto-exposing service within AWS VPC (service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0)
🔑 Found CA in secret cilium-ca
🔑 Generating certificates for ClusterMesh...
✨ Deploying clustermesh-apiserver from quay.io/cilium/clustermesh-apiserver:v1.12.4...
✅ ClusterMesh enabled!

Because clustermesh-apiserver is deployed as a Service of type LoadBalancer, an AWS internal ELB is created:

❯ cilium clustermesh status --context $CLUSTER1
Hostname based ingress detected, trying to resolve it
Hostname resolved, using the found ip(s)
✅ Cluster access information is available:
  - 10.103.27.163:2379
  - 10.103.98.190:2379
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
🔌 Cluster Connections:
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
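To see the internal ELB behind it, you can also query the Service directly (assuming the default kube-system namespace); the EXTERNAL-IP column shows the ELB's DNS name:

❯ kubectl --context $CLUSTER1 -n kube-system get svc clustermesh-apiserver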

Run the same enable and status commands against the second cluster, for example:
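❯ cilium clustermesh enable --context $CLUSTER2
❯ cilium clustermesh status --context $CLUSTER2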

Connect the Two Clusters

Run the following command:

❯ cilium clustermesh connect --context $CLUSTER1 --destination-context $CLUSTER2
✨ Extracting access information of cluster cluster2...
🔑 Extracting secrets from cluster cluster2...
Hostname based ingress detected, trying to resolve it
Hostname resolved, using the found ip(s)
ℹ️  Found ClusterMesh service IPs: [10.104.119.68 10.104.48.130]
✨ Extracting access information of cluster cluster1...
🔑 Extracting secrets from cluster cluster1...
Hostname based ingress detected, trying to resolve it
Hostname resolved, using the found ip(s)
ℹ️  Found ClusterMesh service IPs: [10.103.98.190 10.103.27.163]
✨ Connecting cluster arn:aws:eks:us-east-2:1234567890:cluster/cluster1 -> arn:aws:eks:us-east-2:1234567890:cluster/cluster2...
🔑 Secret cilium-clustermesh does not exist yet, creating it...
🔑 Patching existing secret cilium-clustermesh...
✨ Patching DaemonSet with IP aliases cilium-clustermesh...
✨ Connecting cluster arn:aws:eks:us-east-2:1234567890:cluster/cluster2 -> arn:aws:eks:us-east-2:1234567890:cluster/cluster1...
🔑 Secret cilium-clustermesh does not exist yet, creating it...
🔑 Patching existing secret cilium-clustermesh...
✨ Patching DaemonSet with IP aliases cilium-clustermesh...
✅ Connected cluster arn:aws:eks:us-east-2:1234567890:cluster/cluster1 and arn:aws:eks:us-east-2:1234567890:cluster/cluster2!

Run the clustermesh status command again:

❯ cilium clustermesh status --context $CLUSTER1
Hostname based ingress detected, trying to resolve it
Hostname resolved, using the found ip(s)
✅ Cluster access information is available:
  - 10.103.27.163:2379
  - 10.103.98.190:2379
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ All 4 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- cluster2: 4/4 configured, 4/4 connected
🔀 Global services: [ min:13 / avg:13.0 / max:13 ]

A new cluster2 connection now shows up. Running the same command against cluster2 gives similar output:

❯ cilium clustermesh status --context $CLUSTER2
Hostname based ingress detected, trying to resolve it
Hostname resolved, using the found ip(s)
✅ Cluster access information is available:
  - 10.104.119.68:2379
  - 10.104.48.130:2379
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- cluster1: 2/2 configured, 2/2 connected
🔀 Global services: [ min:14 / avg:14.0 / max:14 ]
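The "Global services" counter counts services that are shared across the mesh: a Service becomes global when a Service with the same name and namespace exists in both clusters and carries the service.cilium.io/global annotation. A minimal sketch (my-backend is a hypothetical Service name):

❯ kubectl --context $CLUSTER1 annotate service my-backend service.cilium.io/global="true"
❯ kubectl --context $CLUSTER2 annotate service my-backend service.cilium.io/global="true"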

Verify from Cilium Nodes

Another verification can be done from the Cilium DaemonSet pods. Open a shell into a cilium pod and run cilium status --verbose to display a more detailed status.
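For example (assuming the agent DaemonSet is named cilium and lives in kube-system; kubectl picks one of its pods):

❯ kubectl --context $CLUSTER1 -n kube-system exec -it ds/cilium -- cilium status --verbose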

There is a section for cluster health, and you should see all the nodes from both clusters if everything is set up properly.

Cluster health:                                                              6/6 reachable   
  Name                                                                       IP              Node        Endpoints
  cluster1/ip-10-103-77-148.us-east-2.compute.internal (localhost)   10.103.77.148   reachable   reachable
  cluster1/ip-10-103-0-33.us-east-2.compute.internal                 10.103.0.33     reachable   reachable
  cluster1/ip-10-103-48-26.us-east-2.compute.internal                10.103.48.26    reachable   reachable
  cluster1/ip-10-103-90-232.us-east-2.compute.internal               10.103.90.232   reachable   reachable
  cluster2/ip-10-104-108-60.us-east-2.compute.internal               10.104.108.60   reachable   reachable
  cluster2/ip-10-104-28-22.us-east-2.compute.internal                10.104.28.22    reachable   reachable

Another command we can run is cilium-health status, which shows the health status of each node from both clusters.

root@ip-10-103-77-148:/home/cilium# cilium-health status
Nodes:
  cluster1/ip-10-103-77-148.us-east-2.compute.internal (localhost):
    Host connectivity to 10.103.77.148:
      ICMP to stack:   OK, RTT=537.702µs
      HTTP to agent:   OK, RTT=101.574µs
    Endpoint connectivity to 10.103.77.254:
      ICMP to stack:   OK, RTT=530.279µs
      HTTP to agent:   OK, RTT=206.954µs
  cluster1/ip-10-103-0-33.us-east-2.compute.internal:
    Host connectivity to 10.103.0.33:
      ICMP to stack:   OK, RTT=1.083102ms
      HTTP to agent:   OK, RTT=1.089925ms
    Endpoint connectivity to 10.103.6.167:
      ICMP to stack:   OK, RTT=1.111456ms
      HTTP to agent:   OK, RTT=1.268708ms
  cluster1/ip-10-103-48-26.us-east-2.compute.internal:
    Host connectivity to 10.103.48.26:
      ICMP to stack:   OK, RTT=1.21498ms
      HTTP to agent:   OK, RTT=991.185µs
    Endpoint connectivity to 10.103.40.104:
      ICMP to stack:   OK, RTT=1.220342ms
      HTTP to agent:   OK, RTT=1.265738ms
  cluster1/ip-10-103-90-232.us-east-2.compute.internal:
    Host connectivity to 10.103.90.232:
      ICMP to stack:   OK, RTT=1.007351ms
      HTTP to agent:   OK, RTT=475.152µs
    Endpoint connectivity to 10.103.67.2:
      ICMP to stack:   OK, RTT=550.526µs
      HTTP to agent:   OK, RTT=444.213µs
  cluster2/ip-10-104-108-60.us-east-2.compute.internal:
    Host connectivity to 10.104.108.60:
      ICMP to stack:   OK, RTT=737.763µs
      HTTP to agent:   OK, RTT=617.232µs
    Endpoint connectivity to 10.104.103.40:
      ICMP to stack:   OK, RTT=936.179µs
      HTTP to agent:   OK, RTT=549.053µs
  cluster2/ip-10-104-28-22.us-east-2.compute.internal:
    Host connectivity to 10.104.28.22:
      ICMP to stack:   OK, RTT=1.936628ms
      HTTP to agent:   OK, RTT=1.056997ms
    Endpoint connectivity to 10.104.138.202:
      ICMP to stack:   OK, RTT=2.024877ms
      HTTP to agent:   OK, RTT=1.398727ms
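Beyond these health checks, the Cilium CLI also provides a connectivity test that can exercise cross-cluster paths; it deploys test workloads into both clusters, so it is best run outside production (a sketch, assuming a recent cilium-cli):

❯ cilium connectivity test --context $CLUSTER1 --multi-cluster $CLUSTER2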

Any Other Options?

Since the DNS name of an ELB can be used, how about a domain in your own org or company, such as cluster1.mesh.my-company.com?

With external-dns, this is as simple as adding an annotation to the clustermesh-apiserver Service.

external-dns automatically creates the DNS record specified in the annotation in Route 53, pointing at the ELB, so the Cilium agents can use it to reach the remote clustermesh-apiserver as well.
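As a sketch (assuming the Cilium Helm release is named cilium in kube-system, that the chart version in use exposes clustermesh.apiserver.service.annotations, and that cluster1.mesh.my-company.com is just an example hostname):

❯ cat > clustermesh-dns-values.yaml <<'EOF'
clustermesh:
  apiserver:
    service:
      annotations:
        external-dns.alpha.kubernetes.io/hostname: cluster1.mesh.my-company.com
EOF
❯ helm upgrade cilium cilium/cilium -n kube-system --reuse-values -f clustermesh-dns-values.yaml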