Deploy Cilium with policy enforcement

Cilium has an option to configure its policy enforcement mode.

The default mode is not restrictive enough for our purposes: we want to block all traffic unless it is explicitly allowed.
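As a rough sketch, assuming Cilium is installed with Helm from the official chart, the enforcement mode can be switched to always (deny unless explicitly allowed) with a value along these lines; the exact value name may vary between chart versions:

# Assumption: the cilium Helm repo has been added (helm repo add cilium https://helm.cilium.io/).
# "always" enables policy enforcement on every endpoint, dropping traffic
# that no policy explicitly allows.
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set policyEnforcementMode=always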

With Cilium installed, the cluster network should now be ready.

However, as we don’t have any network policy configured yet, pods that need to communicate with other pods will be in error.

Observe pod errors

After a while, cluster nodes should become Ready:

$ kubectl get node
NAME                  STATUS   ROLES                  AGE     VERSION
kind-control-plane    Ready    control-plane,master   9m20s   v1.23.3
kind-control-plane2   Ready    control-plane,master   9m8s    v1.23.3
kind-control-plane3   Ready    control-plane,master   8m16s   v1.23.3
kind-worker           Ready    <none>                 7m59s   v1.23.3
kind-worker2          Ready    <none>                 7m59s   v1.23.3
kind-worker3          Ready    <none>                 7m59s   v1.23.3

A couple of critical pods should exhibit some errors though:

$ kubectl get pod -A -o wide
NAMESPACE            NAME                                          READY   STATUS    RESTARTS      AGE     IP             NODE                  NOMINATED NODE   READINESS GATES
.....
.....
kube-system          coredns-64897985d-s78wv                       0/1     Running   0             11m     10.244.5.77    kind-worker3          <none>           <none>
kube-system          coredns-64897985d-vrz8c                       0/1     Running   0             11m     10.244.5.70    kind-worker3          <none>           <none>
.....
.....
local-path-storage   local-path-provisioner-5ddd94ff66-prh4p       0/1     Error     2 (47s ago)   11m     10.244.5.166   kind-worker3          <none>           <none>

From the list above, the core-dns pods are not becoming Ready and the local-path-provisioner pod is in Error.

This is because those pods need to talk with the api server, but there is no network policy that allows such communication.

Basically, all pods running with hostNetwork will be fine, but those without it will be in trouble if they need to communicate with another pod and no network policy allows the traffic.
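A quick way to see which pods run in the host network namespace (plain kubectl, nothing Cilium-specific):

# Pods showing "true" run with hostNetwork and are unaffected by the
# default-deny; the others need explicit policies to communicate.
kubectl get pod -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork'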

Fix core-dns

To resolve the core-dns issue, we will add a CiliumNetworkPolicy allowing the core-dns pods to talk to the api server:

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: core-dns
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: coredns
  egress:
  - toEntities:
    - kube-apiserver
EOF

With this policy, core-dns pods should reach the Ready state.
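To confirm, we can check the core-dns pods again (assuming the standard k8s-app=kube-dns label carried by most CoreDNS deployments):

kubectl -n kube-system get pod -l k8s-app=kube-dns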

There are two interesting things to note about the policy above:

  • core-dns pods are matched using their service account. We could have used pod labels too (see the alternative selector sketch after this list), but matching on a service account is a great Cilium feature: a service account represents an identity, regardless of the pods that use it, and expressing policies in terms of identities makes them easier to write and understand.
  • toEntities in the egress rule uses predefined targets managed directly by Cilium. This makes it easy to whitelist well-known targets like the api server.
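For reference, here is a minimal sketch of the same policy selecting core-dns pods by label instead of by service account (assuming the usual k8s-app=kube-dns label carried by CoreDNS pods; the name core-dns-by-label is hypothetical):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: core-dns-by-label
  namespace: kube-system
spec:
  # Alternative selector: match on pod labels rather than the service account.
  endpointSelector:
    matchLabels:
      k8s-app: kube-dns
  egress:
  - toEntities:
    - kube-apiserver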

Fix local-path-provisioner

In the same spirit as core-dns pods, local-path-provisioner pods need to talk with the api server.

We can apply almost the same policy to fix the issue:

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: local-path-provisioner
  namespace: local-path-storage
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: local-path-provisioner-service-account
  egress:
  - toEntities:
    - kube-apiserver
EOF

Once the policy above is applied, local-path-provisioner pods should run without errors.
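A quick check of the namespace should show the provisioner pod running again:

kubectl -n local-path-storage get pod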

Anything else to fix?

Unfortunately, yes. We have other pods not running in hostNetwork that require communication: Hubble Relay and Hubble UI.

Fix Hubble Relay

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: hubble-relay
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: hubble-relay
  egress:
  - toEntities:
    - host
    - remote-node
EOF

This time, we allow Hubble Relay to talk with the cluster nodes: the Cilium agents run in hostNetwork, so Hubble Relay needs to be allowed to reach the host and remote-node entities.
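To quickly check that the relay is healthy, we can query it from our machine (a sketch, assuming the cilium and hubble CLIs are installed locally):

# Forward the Hubble Relay port to localhost, then ask for its status.
cilium hubble port-forward &
hubble status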

Fix Hubble UI

For this one, we will try to connect to Hubble UI with our browser.

First, run a port-forward command to the Hubble UI service:

kubectl port-forward -n kube-system svc/hubble-ui 8888:80

Then browse http://localhost:8888/ and see what happens:

[Screenshot: Hubble UI loads but the namespace list is empty]

We are able to communicate with Hubble UI but no namespaces are visible in the list.

This makes sense as Hubble UI needs to talk to the api server to fetch the namespace list.

Let’s allow Hubble UI pods to talk with the api server:

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: hubble-ui
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: hubble-ui
  egress:
  - toEntities:
    - kube-apiserver
EOF

This looks better. We now have the namespace list working:

[Screenshot: Hubble UI showing the namespace list]

Unfortunately, this is not enough: digging into a namespace will not work because Hubble UI is not allowed to communicate with Hubble Relay and therefore cannot retrieve flows.

[Screenshot: Hubble UI unable to retrieve flows for a namespace]

We need to allow communication between Hubble UI and Hubble Relay:

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: hubble-ui
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: hubble-ui
  egress:
  - toEntities:
    - kube-apiserver
  - toEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.serviceaccount: hubble-relay
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: hubble-relay
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: hubble-relay
  ingress:
  - fromEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.serviceaccount: hubble-ui
  egress:
  - toEntities:
    - host
    - remote-node
EOF

Note that we need to modify two policies here: the hubble-ui policy gains an egress rule towards Hubble Relay, and the hubble-relay policy gains an ingress rule from Hubble UI.

That is, the communication has to be authorized on both sides.

Unfortunately, this still doesn’t work. Looking at the Hubble UI pod logs will reveal that it cannot resolve the Hubble Relay service:

level=error msg="hubble status checker: failed to connect to hubble-relay: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup hubble-relay on 10.96.0.10:53: read udp 10.244.5.4:55406->10.96.0.10:53: i/o timeout\"\n" subsys=ui-backend

Once again, this makes sense: the Hubble UI pods need to resolve the IP address of the Hubble Relay service, and that DNS lookup is itself communication. Since it was not whitelisted, the error is completely expected.
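Now that Hubble Relay itself works, the drop can also be observed from the command line (a sketch, assuming the hubble CLI and a port-forward to the relay as shown earlier):

# Dropped flows in kube-system; the DNS requests from hubble-ui towards
# core-dns should show up with a DROPPED verdict.
hubble observe --namespace kube-system --verdict DROPPED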

To fix this error, we need to update two policies once more: the hubble-ui policy to allow egress to the core-dns pods, and the core-dns policy to allow ingress from the Hubble UI pods:

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: hubble-ui
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: hubble-ui
  egress:
  - toEntities:
    - kube-apiserver
  - toEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.serviceaccount: hubble-relay
    - matchLabels:
        io.cilium.k8s.policy.serviceaccount: coredns
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: core-dns
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: coredns
  ingress:
  - fromEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.serviceaccount: hubble-ui
  egress:
  - toEntities:
    - kube-apiserver
EOF

Finally, Hubble UI should now display flows correctly 🎉

[Screenshot: Hubble UI displaying flows]

One last fix

Now that Hubble UI works, we can observe that some communication is still blocked: traffic from the core-dns pods to the world entity is not allowed.

We can easily fix this by adding the world entity to the egress whitelist of core-dns pods:

kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: core-dns
  namespace: kube-system
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.policy.serviceaccount: coredns
  ingress:
  - fromEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.serviceaccount: hubble-ui
  egress:
  - toEntities:
    - kube-apiserver
    - world
EOF

We should no longer observe unauthorized communications, as all legitimate traffic has now been whitelisted.

[Screenshot: Hubble UI with no remaining dropped flows]

Wrapping it up

In the end, configuring network policies in a cluster is HARD. In this article we did it for a small number of workloads, so doing it at a larger scale definitely takes some work.

Luckily, CiliumNetworkPolicy simplifies a couple of things: entities and service-account-based rules certainly help. It stays complex by nature, though.

On the other hand, it is worth the effort, as it vastly improves security in the cluster.

Although it’s a good start, the rules could be made even more restrictive.

This leaves room for other articles 😏

Please note that Cilium network policies support more options for configuring ingress and egress rules than were covered in this article.