
Set up Cilium Clustermesh

This guide explains how to set up Cilium Clustermesh on top of MetaKube clusters that use the Cilium CNI.

Prerequisites

This tutorial requires:

  • Access to two MetaKube clusters using IPv4.
  • PodCIDR/NodeCIDR ranges must be unique and non-overlapping across all clusters and all nodes.
  • Nodes in all clusters must have IP connectivity to each other via each node's configured InternalIP.
  • Access to OpenStack with valid and sufficient credentials.
  • An OpenStack subnet.

Assumptions

  • The kubeconfig files are called kubeconfig-cluster-1 and kubeconfig-cluster-2

Create two MetaKube clusters with non-overlapping pod/node CIDRs

In the examples below, each cluster gets its own node subnet (192.168.0.0/24 and 192.168.1.0/24) and pod CIDR (172.0.0.0/16 and 172.1.0.0/16). ipv4NativeRoutingCIDR is set to 172.0.0.0/15, which covers both pod CIDRs, so traffic between the clusters' pods is routed natively instead of masqueraded. proxyMode none disables kube-proxy in favor of Cilium's kube-proxy replacement (visible later as KubeProxyReplacement: True in the status output).

Cluster-1

...
   "spec": {
        "cloud": {
            "openstack":{
                "subnetCIDR":"192.168.0.0/24",
...
            }
        }
        "cniPlugin": {
            "type": "cilium",
            "cilium": {
                "clustermesh": {
                    "enable": true,
                    "clusterID": 1,
                    "ipv4NativeRoutingCIDR": "172.0.0.0/15"
                }
            }
        },
        "clusterNetwork": {
          "proxyMode":"none",
          "pods": {
            "cidrBlocks":[
              "172.0.0.0/16"
            ]
          },
        }
...
    }

Cluster-2

...
   "spec": {
        "cloud": {
            "openstack":{
                "subnetCIDR":"192.168.1.0/24",
...
            }
        }
        "cniPlugin": {
            "type": "cilium",
            "cilium": {
                "clustermesh": {
                    "enable": true,
                    "clusterID": 2,
                    "ipv4NativeRoutingCIDR": "172.0.0.0/15"
                }
            }
        },
        "clusterNetwork": {
          "proxyMode":"none",
          "pods": {
            "cidrBlocks":[
              "172.1.0.0/16"
            ]
          },
        }
...
    }

Enable clustermesh on an existing cluster

Info

If you change the cluster ID and/or cluster name in a cluster with running workloads, you will need to restart all workloads. The cluster ID is used when generating each workload's security identity, and those identities must be re-created to establish access across clusters.
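
A minimal restart sketch, assuming all workloads are managed by Deployments, DaemonSets or StatefulSets (other controllers need equivalent treatment):

# Hedged sketch: restart every Deployment, DaemonSet and StatefulSet in all namespaces.
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n "$ns" get deployments,daemonsets,statefulsets -o name \
    | xargs -r -n1 kubectl -n "$ns" rollout restart
done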

Example:

METAKUBE_CLUSTER_ID=clusterID-1
METAKUBE_PROJECT_ID=projectID
curl -X PATCH https://metakube.syseleven.de/api/v2/projects/${METAKUBE_PROJECT_ID}/clusters/${METAKUBE_CLUSTER_ID} \
  -H "accept: application/json" \
  -H "Authorization: Bearer <TOKEN>" \
  -d "$(jq -n '{spec: {cniPlugin: {cilium: {clustermesh: {enable: true, clusterID: 1, ipv4NativeRoutingCIDR: "172.0.0.0/15"}}}}}')"
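
To confirm the change was applied, read the cluster spec back through the same API (a hedged check; the jq path mirrors the PATCH body above):

curl -s https://metakube.syseleven.de/api/v2/projects/${METAKUBE_PROJECT_ID}/clusters/${METAKUBE_CLUSTER_ID} \
  -H "accept: application/json" \
  -H "Authorization: Bearer <TOKEN>" \
  | jq '.spec.cniPlugin.cilium.clustermesh'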

Connect to the clusters

Copy cilium-ca

  1. Copy the cilium-ca secret over to the second cluster

    kubectl --kubeconfig kubeconfig-cluster-1 get secret -n kube-system cilium-ca -o yaml | \
    kubectl neat | \
    kubectl --kubeconfig kubeconfig-cluster-2 apply -f -
    
  2. Restart the clustermesh-apiserver in the peer cluster (a check that the CAs match follows this list)

    kubectl --kubeconfig kubeconfig-cluster-2 -n kube-system rollout restart deployment clustermesh-apiserver
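
To verify both clusters now share the same CA, compare the certificate hashes; a minimal check, assuming the kubeconfig file names above:

for kc in kubeconfig-cluster-1 kubeconfig-cluster-2; do
  kubectl --kubeconfig "$kc" -n kube-system get secret cilium-ca \
    -o jsonpath='{.data.ca\.crt}' | sha256sum
done

Both commands must print the same hash.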
    

Connect the cluster networks

Clusters in the same region (for different regions, see below)

openstack network create interconn
openstack subnet create --network interconn --subnet-range 10.0.0.0/30 --gateway none interconn

cluster-1

METAKUBE_CLUSTER_ID=clusterID-1
openstack port create --network interconn --fixed-ip subnet=interconn,ip-address=10.0.0.1 ic-${METAKUBE_CLUSTER_ID}
openstack router add port metakube-${METAKUBE_CLUSTER_ID} ic-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=192.168.1.0/24,gateway=10.0.0.2 metakube-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=172.1.0.0/16,gateway=10.0.0.2 metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '192.168.1.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}

cluster-2

METAKUBE_CLUSTER_ID=clusterID-2
openstack port create --network interconn --fixed-ip subnet=interconn,ip-address=10.0.0.2 ic-${METAKUBE_CLUSTER_ID}
openstack router add port metakube-${METAKUBE_CLUSTER_ID} ic-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=192.168.0.0/24,gateway=10.0.0.1 metakube-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=172.0.0.0/16,gateway=10.0.0.1 metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '192.168.0.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
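
To confirm the interconnect on either side, inspect the router's static routes and the interconnect port; a hedged check:

openstack router show metakube-${METAKUBE_CLUSTER_ID} -f value -c routes
openstack port show ic-${METAKUBE_CLUSTER_ID} -f value -c fixed_ips -c status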

Clusters in different regions

For clusters in different regions, the networks are connected with an IPsec VPN (OpenStack VPNaaS). A stub subnet and port covering the pod CIDR are added to each router so the pod range can be included in the local endpoint group.

cluster-1

METAKUBE_CLUSTER_ID=clusterID-1
openstack security group rule create --ingress --remote-ip '192.168.1.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
openstack subnet create --network metakube-${METAKUBE_CLUSTER_ID} --subnet-range 172.0.0.0/16 --gateway none metakube-${METAKUBE_CLUSTER_ID}-pod
openstack port create --network metakube-${METAKUBE_CLUSTER_ID} --fixed-ip subnet=metakube-${METAKUBE_CLUSTER_ID}-pod,ip-address=172.0.255.254 metakube-${METAKUBE_CLUSTER_ID}-pod
openstack router add port metakube-${METAKUBE_CLUSTER_ID} metakube-${METAKUBE_CLUSTER_ID}-pod

openstack vpn ike policy create metakube-${METAKUBE_CLUSTER_ID} --ike-version v2 --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn ipsec policy create metakube-${METAKUBE_CLUSTER_ID} --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn service create metakube-${METAKUBE_CLUSTER_ID} --router metakube-${METAKUBE_CLUSTER_ID}

openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-local-epg --type subnet --value metakube-${METAKUBE_CLUSTER_ID} --value metakube-${METAKUBE_CLUSTER_ID}-pod
openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-peer-epg --type cidr --value 192.168.1.0/24 --value 172.1.0.0/16

cluster-2

METAKUBE_CLUSTER_ID=clusterID-2
openstack security group rule create --ingress --remote-ip '192.168.0.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
openstack subnet create --network metakube-${METAKUBE_CLUSTER_ID} --subnet-range 172.1.0.0/16 --gateway none metakube-${METAKUBE_CLUSTER_ID}-pod
openstack port create --network metakube-${METAKUBE_CLUSTER_ID} --fixed-ip subnet=metakube-${METAKUBE_CLUSTER_ID}-pod,ip-address=172.1.255.254 metakube-${METAKUBE_CLUSTER_ID}-pod
openstack router add port metakube-${METAKUBE_CLUSTER_ID} metakube-${METAKUBE_CLUSTER_ID}-pod

openstack vpn ike policy create metakube-${METAKUBE_CLUSTER_ID} --ike-version v2 --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn ipsec policy create metakube-${METAKUBE_CLUSTER_ID} --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn service create metakube-${METAKUBE_CLUSTER_ID} --router metakube-${METAKUBE_CLUSTER_ID}

openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-local-epg --type subnet --value metakube-${METAKUBE_CLUSTER_ID} --value metakube-${METAKUBE_CLUSTER_ID}-pod
openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-peer-epg --type cidr --value 192.168.0.0/24 --value 172.0.0.0/16

cluster-1

Grab the public IP of the OpenStack VPN service of cluster-2 (run against cluster-2's OpenStack region):

PEER_METAKUBE_CLUSTER_ID=clusterID-2
PEER_ADDRESS=$(openstack vpn service show metakube-${PEER_METAKUBE_CLUSTER_ID} -f value -c external_v4_ip)

and create the VPN site connection from the cluster-1 side:

METAKUBE_CLUSTER_ID=clusterID-1
openstack vpn ipsec site connection create metakube-${PEER_METAKUBE_CLUSTER_ID} \
  --vpnservice metakube-${METAKUBE_CLUSTER_ID} \
  --ikepolicy metakube-${METAKUBE_CLUSTER_ID} \
  --ipsecpolicy metakube-${METAKUBE_CLUSTER_ID} \
  --local-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-local-epg \
  --local-id left-peer.domain.example \
  --peer-address ${PEER_ADDRESS} \
  --peer-id right-peer.domain.example \
  --peer-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-peer-epg \
  --psk secret

cluster-2

Grab the public IP of the OpenStack VPN service of cluster-1 (run against cluster-1's OpenStack region):

PEER_METAKUBE_CLUSTER_ID=clusterID-1
PEER_ADDRESS=$(openstack vpn service show metakube-${PEER_METAKUBE_CLUSTER_ID} -f value -c external_v4_ip)

and create the VPN site connection from the cluster-2 side:

METAKUBE_CLUSTER_ID=clusterID-2
openstack vpn ipsec site connection create metakube-${PEER_METAKUBE_CLUSTER_ID} \
  --vpnservice metakube-${METAKUBE_CLUSTER_ID} \
  --ikepolicy metakube-${METAKUBE_CLUSTER_ID} \
  --ipsecpolicy metakube-${METAKUBE_CLUSTER_ID} \
  --local-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-local-epg \
  --local-id right-peer.domain.example \
  --peer-address ${PEER_ADDRESS} \
  --peer-id left-peer.domain.example \
  --peer-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-peer-epg \
  --psk secret
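
Note that the --psk value must match on both sides; replace secret with a strong pre-shared key of your own. Once both site connections exist, they should negotiate and report ACTIVE; a hedged check from either region:

openstack vpn ipsec site connection show metakube-${PEER_METAKUBE_CLUSTER_ID} -f value -c status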

Create ExternalName services

cluster-1

PEER_METAKUBE_CLUSTER_ID=clusterID-2
LB_IP=$(kubectl --kubeconfig kubeconfig-cluster-2 get -n kube-system service clustermesh-apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl -n kube-system create service externalname cluster-${PEER_METAKUBE_CLUSTER_ID} --external-name ${LB_IP}.nip.io

cluster-2

PEER_METAKUBE_CLUSTER_ID=clusterID-1
LB_IP=$(kubectl --kubeconfig kubeconfig-cluster-1 get -n kube-system service clustermesh-apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl -n kube-system create service externalname cluster-${PEER_METAKUBE_CLUSTER_ID} --external-name ${LB_IP}.nip.io
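
nip.io resolves ${LB_IP}.nip.io back to the load balancer IP, which gives the remote clustermesh-apiserver a stable DNS name inside the local cluster. A hedged resolution check (the busybox image choice is illustrative):

kubectl -n kube-system run --rm -i dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup cluster-${PEER_METAKUBE_CLUSTER_ID}.kube-system.svc.cluster.local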

Connect the clusters
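
The next steps use helm template to render only the clustermesh secrets from the Cilium chart. If the chart repository is not configured yet, add it first:

helm repo add cilium https://helm.cilium.io/
helm repo update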

cluster-1

METAKUBE_CLUSTER_ID=clusterID-1
PEER_METAKUBE_CLUSTER_ID=clusterID-2
helm template cilium cilium/cilium \
  --version v1.18.7 \
  --namespace kube-system \
  --set cluster.name=${METAKUBE_CLUSTER_ID} \
  --set cluster.id=1 \
  --set clustermesh.useAPIServer=true \
  --set clustermesh.config.enabled=true \
  --set clustermesh.apiserver.tls.authMode=cluster \
  --set clustermesh.apiserver.kvstoremesh.enabled=true \
  --set clustermesh.apiserver.service.type=LoadBalancer \
  --set 'clustermesh.config.clusters[0].name='${PEER_METAKUBE_CLUSTER_ID} \
  --set 'clustermesh.config.clusters[0].port=2379' \
  --set 'clustermesh.config.clusters[0].address=cluster-'${PEER_METAKUBE_CLUSTER_ID}'.kube-system' |\
yq -cr -y 'select((.kind == "Secret") and (.metadata.name | test("cilium-(clustermesh|kvstoremesh)")))' | \
kubectl apply -f -

cluster-2

METAKUBE_CLUSTER_ID=clusterID-2
PEER_METAKUBE_CLUSTER_ID=clusterID-1
helm template cilium cilium/cilium \
  --version v1.18.7 \
  --namespace kube-system \
  --set cluster.name=${METAKUBE_CLUSTER_ID} \
  --set cluster.id=2 \
  --set clustermesh.useAPIServer=true \
  --set clustermesh.config.enabled=true \
  --set clustermesh.apiserver.tls.authMode=cluster \
  --set clustermesh.apiserver.kvstoremesh.enabled=true \
  --set clustermesh.apiserver.service.type=LoadBalancer \
  --set 'clustermesh.config.clusters[0].name='${PEER_METAKUBE_CLUSTER_ID} \
  --set 'clustermesh.config.clusters[0].port=2379' \
  --set 'clustermesh.config.clusters[0].address=cluster-'${PEER_METAKUBE_CLUSTER_ID}'.kube-system' |\
yq -cr -y 'select((.kind == "Secret") and (.metadata.name | test("cilium-(clustermesh|kvstoremesh)")))' | \
kubectl apply -f -
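
The troubleshooting examples below reference a demo workload named http-echo deployed as a global service in the default namespace of both clusters. A minimal sketch of such a deployment; the manifest and image choice are illustrative, only the service.cilium.io/global annotation is required to share the service across the mesh:

# Hedged sketch: deploy the same workload and global service to both clusters.
for kc in kubeconfig-cluster-1 kubeconfig-cluster-2; do
kubectl --kubeconfig "$kc" apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-echo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-echo
  template:
    metadata:
      labels:
        app: http-echo
    spec:
      containers:
      - name: http-echo
        image: hashicorp/http-echo   # illustrative image
        args: ["-listen=:8080", "-text=hello"]
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: http-echo
  annotations:
    service.cilium.io/global: "true"   # share this service across the Cluster Mesh
spec:
  selector:
    app: http-echo
  ports:
  - port: 8080
EOF
done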

Troubleshooting

  1. cilium endpoint

    kubectl -n kube-system exec -it daemonset/cilium -- cilium-dbg endpoint list | grep app=http-echo
    622        Disabled           Disabled          66711      k8s:app=http-echo
    

    If you don't see the expected endpoint, try restarting the cilium DaemonSet:

    kubectl -n kube-system rollout restart daemonset cilium
    

  2. clustermesh service

    kubectl -n kube-system exec -ti deployment/clustermesh-apiserver -c apiserver -- clustermesh-apiserver shell kvstore/list | grep -E "# cilium/(cache|state)/services/"
    # cilium/cache/services/v1/clusterID-2/default/http-echo
    # cilium/state/services/v1/clusterID-1/default/http-echo
    
  3. cilium agent connecting to the local kvstoremesh

    kubectl -n kube-system exec -it daemonset/cilium -- cilium-dbg troubleshoot clustermesh
        Found 1 cluster configurations
    
        Cluster "clusterID-2":
        📄 Configuration path: /var/lib/cilium/clustermesh/clusterID-2
    
        🔌 Endpoints:
        - https://clustermesh-apiserver.kube-system.svc:2379
             Hostname resolved to: 10.240.22.125
             TCP connection successfully established to 10.240.22.125:2379
             TLS connection successfully established to 10.240.22.125:2379
            ℹ️  Negotiated TLS version: TLS 1.3, ciphersuite TLS_AES_128_GCM_SHA256
            ℹ️  Etcd server version: 3.6.6
    
        🔑 Digital certificates:
         TLS Root CA certificates:
            - Serial number:       fb:62:5c:15:b9:20:12:e7:70:3b:6d:12:ee:7d:cc:14
                Subject:             CN=Cilium CA
                Issuer:              CN=Cilium CA
                Validity:
                Not before:  2026-01-12 15:25:04 +0000 UTC
                Not after:   2029-01-11 15:25:04 +0000 UTC
         TLS client certificates:
            - Serial number:       2c:41:b7:b3:0d:94:29:67:6a:c1:80:9b:2a:ea:e7:14
                Subject:             CN=local-clusterID-1
                Issuer:              CN=Cilium CA
                Validity:
                Not before:  2026-01-12 16:03:35 +0000 UTC
                Not after:   2029-01-11 16:03:35 +0000 UTC
    
        ⚙️ Etcd client:
         Etcd connection successfully established
        ℹ️  Etcd cluster ID: f159592cd2603408
    
  4. kvstoremesh connecting to peer cluster

    kubectl -n kube-system exec -ti deployment/clustermesh-apiserver -c kvstoremesh -- clustermesh-apiserver kvstoremesh-dbg troubleshoot
        Found 1 cluster configurations
    
        Cluster "clusterID-2":
        📄 Configuration path: /var/lib/cilium/clustermesh/clusterID-2
    
        🔌 Endpoints:
        - https://clusterID-2.mesh.cilium.io:2379
             Hostname resolved to: 109.68.229.69
             TCP connection successfully established to 109.68.229.69:2379
             TLS connection successfully established to 109.68.229.69:2379
            ℹ️  Negotiated TLS version: TLS 1.3, ciphersuite TLS_AES_128_GCM_SHA256
            ℹ️  Etcd server version: 3.6.6
    
        🔑 Digital certificates:
         TLS Root CA certificates:
            - Serial number:       fb:62:5c:15:b9:20:12:e7:70:3b:6d:12:ee:7d:cc:14
                Subject:             CN=Cilium CA
                Issuer:              CN=Cilium CA
                Validity:
                Not before:  2026-01-12 15:25:04 +0000 UTC
                Not after:   2029-01-11 15:25:04 +0000 UTC
         TLS client certificates:
            - Serial number:       d2:77:54:8b:e6:b4:91:3d:cd:b5:3f:fe:ad:b3:d2:84
                Subject:             CN=remote
                Issuer:              CN=Cilium CA
                Validity:
                Not before:  2026-01-12 16:03:35 +0000 UTC
                Not after:   2029-01-11 16:03:35 +0000 UTC
    
        ⚙️ Etcd client:
         Etcd connection successfully established
        ℹ️  Etcd cluster ID: 7711876c9ebb329e
    
  5. overall cluster health

    kubectl -n kube-system exec -it daemonset/cilium -- cilium-dbg status --all-clusters
        KVStore:                Disabled
        Kubernetes:             Ok         1.34 (v1.34.2) [linux/amd64]
        Kubernetes APIs:        ["EndpointSliceOrEndpoint", "cilium/v2::CiliumCIDRGroup", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Pods", "networking.k8s.io/v1::NetworkPolicy"]
        KubeProxyReplacement:   True   [ens3   192.168.0.10 fe80::f816:3eff:fe26:8da8 (Direct Routing)]
        Host firewall:          Disabled
        SRv6:                   Disabled
        CNI Chaining:           none
        CNI Config file:        successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
        Cilium:                 Ok   1.18.4 (v1.18.4-afda2aa9)
        NodeMonitor:            Listening for events on 2 CPUs with 64x4096 of shared memory
        Cilium health daemon:   Ok
        IPAM:                   IPv4: 9/254 allocated from 172.0.1.0/24,
        ClusterMesh:            1/1 remote clusters ready, 1 global-services
        clusterID-2: ready, 2 nodes, 19 endpoints, 0 identities, 1 services, 0 MCS-API service exports, 0 reconnections (last: never)
          etcd: 1/1 connected, leases=0, lock leases=0, has-quorum=true: endpoint status checks are disabled, ID: f159592cd2603408
          remote configuration: expected=true, retrieved=true, cluster-id=2, kvstoremesh=true, sync-canaries=true, service-exports=disabled
          synchronization status: nodes=true, endpoints=true, identities=true, services=true
        IPv4 BIG TCP:            Disabled
        IPv6 BIG TCP:            Disabled
        BandwidthManager:        Disabled
        Routing:                 Network: Native   Host: BPF
        Attach Mode:             TCX
        Device Mode:             veth
        Masquerading:            BPF   [ens3]   172.0.0.0/16  [IPv4: Enabled, IPv6: Disabled]
        Controller Status:       59/59 healthy
        Proxy Status:            OK, ip 172.0.1.149, 0 redirects active on ports 10000-20000, Envoy: external
        Global Identity Range:   min 65536, max 131071
        Hubble:                  Ok              Current/Max Flows: 4095/4095 (100.00%), Flows/s: 36.02   Metrics: Ok
        Encryption:              Disabled
        Cluster health:          4/4 reachable   (2026-01-13T09:02:48Z)   (Probe interval: 2m11.83347464s)
        Name                     IP              Node                     Endpoints
        Modules Health:          Stopped(16) Degraded(3) OK(84)
    
  6. try etcdctl

    PEER_CLUSTER=clusterID-2
    etcdctl \
      --endpoints=$(kubectl -n kube-system get secret cilium-kvstoremesh -o json | jq -cr '.data | with_entries(select(.key | test("etcd") | not)) | ."'${PEER_CLUSTER}'" | @base64d' | yq -c ".endpoints?[]") \
      --cacert=<(kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o json | jq -cr '.data."ca.crt" | @base64d') \
      --cert=<(kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o json | jq -cr '.data."tls.crt" | @base64d') \
      --key=<(kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o json | jq -cr '.data."tls.key" | @base64d') \
      get --prefix cilium/state/