Set Up Cilium Clustermesh
This guide explains how to set up Cilium Clustermesh on top of MetaKube clusters that use the Cilium CNI.
Prerequisites
This tutorial requires:
- Access to two MetaKube clusters using IPv4.
- Pod CIDR and node CIDR ranges must be unique and must not overlap across any of the clusters or nodes.
- Nodes in all clusters must have IP connectivity to each other via each node's configured InternalIP.
- Access to OpenStack with valid and sufficient credentials.
- An OpenStack subnet.
Assumptions
- The kubeconfig files are called kubeconfig-cluster-1 and kubeconfig-cluster-2.
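Before creating anything, it can help to verify the CIDR and connectivity prerequisites. A minimal sketch, assuming the kubeconfig file names above:
for CFG in kubeconfig-cluster-1 kubeconfig-cluster-2; do
  echo "== ${CFG} =="
  # InternalIP of every node (INTERNAL-IP column)
  kubectl --kubeconfig ${CFG} get nodes -o wide
  # pod CIDR assigned to every node; ranges must not overlap across clusters
  kubectl --kubeconfig ${CFG} get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
done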
Create two MetaKube clusters with non-overlapping pod/node CIDRs
In the examples below, the pod CIDRs 172.0.0.0/16 and 172.1.0.0/16 both fall inside the shared ipv4NativeRoutingCIDR 172.0.0.0/15, so pod traffic can be routed natively between the clusters.
Cluster-1
...
"spec": {
"cloud": {
"openstack":{
"subnetCIDR":"192.168.0.0/24",
...
}
}
"cniPlugin": {
"type": "cilium",
"cilium": {
"clustermesh": {
"enable": true,
"clusterID": 1,
"ipv4NativeRoutingCIDR": "172.0.0.0/15"
}
}
},
"clusterNetwork": {
"proxyMode":"none",
"pods": {
"cidrBlocks":[
"172.0.0.0/16"
]
},
}
...
}
Cluster-2
...
"spec": {
"cloud": {
"openstack":{
"subnetCIDR":"192.168.1.0/24",
...
}
}
"cniPlugin": {
"type": "cilium",
"cilium": {
"clustermesh": {
"enable": true,
"clusterID": 2,
"ipv4NativeRoutingCIDR": "172.0.0.0/15"
}
}
},
"clusterNetwork": {
"proxyMode":"none",
"pods": {
"cidrBlocks":[
"172.1.0.0/16"
]
},
}
...
}
Enable clustermesh on an existing cluster
Info
If you change the cluster ID and/or cluster name of a cluster with running workloads, you will need to restart all workloads. The cluster ID is used to generate the security identities, and these need to be re-created in order to establish access across clusters.
METAKUBE_CLUSTER_ID=clusterID-1
METAKUBE_PROJECT_ID=projectID
curl -X PATCH "https://metakube.syseleven.de/api/v2/projects/${METAKUBE_PROJECT_ID}/clusters/${METAKUBE_CLUSTER_ID}" \
  -H "accept: application/json" \
  -H "Authorization: Bearer <TOKEN>" \
  -d "$(jq -n '{spec: {cniPlugin: {cilium: {clustermesh: {enable: true, clusterID: 1, ipv4NativeRoutingCIDR: "172.0.0.0/15"}}}}}')"
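As the info box above notes, enabling clustermesh on a cluster that already runs workloads requires restarting them so their security identities are re-created. A minimal sketch (my-namespace is a hypothetical placeholder for each of your workload namespaces):
# restart the Cilium agents, then the workloads in each application namespace
kubectl -n kube-system rollout restart daemonset/cilium
kubectl -n my-namespace rollout restart deployment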
Connect to the clusters
Copy cilium-ca
- copy cilium-ca over to the second cluster
- restart clustermesh-apiserver in the peer cluster (see the sketch below)
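A minimal sketch of both steps, assuming the kubeconfig file names from above and the default secret name cilium-ca in kube-system:
# copy the CA secret from cluster-1 to cluster-2, stripping instance-specific
# metadata so it can be applied over the existing secret
kubectl --kubeconfig kubeconfig-cluster-1 -n kube-system get secret cilium-ca -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp)' \
  | kubectl --kubeconfig kubeconfig-cluster-2 -n kube-system apply -f -
# restart the clustermesh-apiserver in the peer cluster so it picks up the new CA
kubectl --kubeconfig kubeconfig-cluster-2 -n kube-system rollout restart deployment/clustermesh-apiserver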
Connect the cluster networks
Clusters in the same region (for different regions, see below)
openstack network create interconn
openstack subnet create --network interconn --subnet-range 10.0.0.0/30 --gateway none interconn
cluster-1
METAKUBE_CLUSTER_ID=clusterID-1
openstack port create --network interconn --fixed-ip subnet=interconn,ip-address=10.0.0.1 ic-${METAKUBE_CLUSTER_ID}
openstack router add port metakube-${METAKUBE_CLUSTER_ID} ic-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=192.168.1.0/24,gateway=10.0.0.2 metakube-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=172.1.0.0/16,gateway=10.0.0.2 metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '192.168.1.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
cluster-2
METAKUBE_CLUSTER_ID=clusterID-2
openstack port create --network interconn --fixed-ip subnet=interconn,ip-address=10.0.0.2 ic-${METAKUBE_CLUSTER_ID}
openstack router add port metakube-${METAKUBE_CLUSTER_ID} ic-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=192.168.0.0/24,gateway=10.0.0.1 metakube-${METAKUBE_CLUSTER_ID}
openstack router add route --route destination=172.0.0.0/16,gateway=10.0.0.1 metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '192.168.0.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
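Both routers should now route the peer's subnet and pod CIDR over the interconnect. A quick, hedged way to verify the static routes:
# show the static routes configured on the router
openstack router show metakube-${METAKUBE_CLUSTER_ID} -c routes -f yaml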
Clusters in different regions
cluster-1
METAKUBE_CLUSTER_ID=clusterID-1
openstack security group rule create --ingress --remote-ip '192.168.1.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
openstack subnet create --network metakube-${METAKUBE_CLUSTER_ID} --subnet-range 172.0.0.0/16 --gateway none metakube-${METAKUBE_CLUSTER_ID}-pod
openstack port create --network metakube-${METAKUBE_CLUSTER_ID} --fixed-ip subnet=metakube-${METAKUBE_CLUSTER_ID}-pod,ip-address=172.0.255.254 metakube-${METAKUBE_CLUSTER_ID}-pod
openstack router add port metakube-${METAKUBE_CLUSTER_ID} metakube-${METAKUBE_CLUSTER_ID}-pod
openstack vpn ike policy create metakube-${METAKUBE_CLUSTER_ID} --ike-version v2 --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn ipsec policy create metakube-${METAKUBE_CLUSTER_ID} --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn service create metakube-${METAKUBE_CLUSTER_ID} --router metakube-${METAKUBE_CLUSTER_ID}
openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-local-epg --type subnet --value metakube-${METAKUBE_CLUSTER_ID} --value metakube-${METAKUBE_CLUSTER_ID}-pod
openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-peer-epg --type cidr --value 192.168.1.0/24 --value 172.1.0.0/16
cluster-2
METAKUBE_CLUSTER_ID=clusterID-2
openstack security group rule create --ingress --remote-ip '192.168.0.0/24' metakube-${METAKUBE_CLUSTER_ID}
openstack security group rule create --ingress --remote-ip '172.0.0.0/15' metakube-${METAKUBE_CLUSTER_ID}
openstack subnet create --network metakube-${METAKUBE_CLUSTER_ID} --subnet-range 172.1.0.0/16 --gateway none metakube-${METAKUBE_CLUSTER_ID}-pod
openstack port create --network metakube-${METAKUBE_CLUSTER_ID} --fixed-ip subnet=metakube-${METAKUBE_CLUSTER_ID}-pod,ip-address=172.1.255.254 metakube-${METAKUBE_CLUSTER_ID}-pod
openstack router add port metakube-${METAKUBE_CLUSTER_ID} metakube-${METAKUBE_CLUSTER_ID}-pod
openstack vpn ike policy create metakube-${METAKUBE_CLUSTER_ID} --ike-version v2 --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn ipsec policy create metakube-${METAKUBE_CLUSTER_ID} --auth-algorithm sha256 --encryption-algorithm aes-256 --pfs group14
openstack vpn service create metakube-${METAKUBE_CLUSTER_ID} --router metakube-${METAKUBE_CLUSTER_ID}
openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-local-epg --type subnet --value metakube-${METAKUBE_CLUSTER_ID} --value metakube-${METAKUBE_CLUSTER_ID}-pod
openstack vpn endpoint group create metakube-${METAKUBE_CLUSTER_ID}-peer-epg --type cidr --value 192.168.0.0/24 --value 172.0.0.0/16
cluster-1
Grab the public IP of the OpenStack VPN service of cluster-2 (querying the OpenStack region that hosts cluster-2):
PEER_METAKUBE_CLUSTER_ID=clusterID-2
PEER_ADDRESS=$(openstack vpn service show metakube-${PEER_METAKUBE_CLUSTER_ID} -f value -c external_v4_ip)
and create the VPN site connection from the cluster-1 side:
METAKUBE_CLUSTER_ID=clusterID-1
openstack vpn ipsec site connection create metakube-${PEER_METAKUBE_CLUSTER_ID} \
--vpnservice metakube-${METAKUBE_CLUSTER_ID} \
--ikepolicy metakube-${METAKUBE_CLUSTER_ID} \
--ipsecpolicy metakube-${METAKUBE_CLUSTER_ID} \
--local-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-local-epg \
--local-id left-peer.domain.example \
--peer-address ${PEER_ADDRESS} \
--peer-id right-peer.domain.example \
--peer-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-peer-epg \
--psk secret
cluster-2
Grab the public IP of the OpenStack VPN service of cluster-1 (querying the OpenStack region that hosts cluster-1):
PEER_METAKUBE_CLUSTER_ID=clusterID-1
PEER_ADDRESS=$(openstack vpn service show metakube-${PEER_METAKUBE_CLUSTER_ID} -f value -c external_v4_ip)
and create the VPN site connection from the cluster-2 side:
METAKUBE_CLUSTER_ID=clusterID-2
openstack vpn ipsec site connection create metakube-${PEER_METAKUBE_CLUSTER_ID} \
--vpnservice metakube-${METAKUBE_CLUSTER_ID} \
--ikepolicy metakube-${METAKUBE_CLUSTER_ID} \
--ipsecpolicy metakube-${METAKUBE_CLUSTER_ID} \
--local-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-local-epg \
--local-id right-peer.domain.example \
--peer-address ${PEER_ADDRESS} \
--peer-id left-peer.domain.example \
--peer-endpoint-group metakube-${METAKUBE_CLUSTER_ID}-peer-epg \
--psk secret
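Replace the secret placeholder with a strong pre-shared key; it has to be identical on both sides. Once both site connections exist, they should negotiate and become ACTIVE. A hedged way to check:
openstack vpn ipsec site connection list
openstack vpn ipsec site connection show metakube-${PEER_METAKUBE_CLUSTER_ID} -c status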
Create ExternalName services
These give each cluster a stable in-cluster DNS name (cluster-<peer-cluster-ID>.kube-system) that resolves, via nip.io, to the peer's clustermesh-apiserver load balancer IP.
cluster-1
PEER_METAKUBE_CLUSTER_ID=clusterID-2
LB_IP=$(kubectl --kubeconfig kubeconfig-cluster-2 get -n kube-system service clustermesh-apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl --kubeconfig kubeconfig-cluster-1 -n kube-system create service externalname cluster-${PEER_METAKUBE_CLUSTER_ID} --external-name ${LB_IP}.nip.io
cluster-2
PEER_METAKUBE_CLUSTER_ID=clusterID-1
LB_IP=$(kubectl --kubeconfig kubeconfig-cluster-1 get -n kube-system service clustermesh-apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl --kubeconfig kubeconfig-cluster-2 -n kube-system create service externalname cluster-${PEER_METAKUBE_CLUSTER_ID} --external-name ${LB_IP}.nip.io
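A hedged check that the name resolves from inside the cluster (nip.io simply maps <ip>.nip.io back to <ip>); repeat against the other kubeconfig:
kubectl --kubeconfig kubeconfig-cluster-1 run dnstest -it --rm --restart=Never \
  --image=busybox:1.36 -- nslookup cluster-${PEER_METAKUBE_CLUSTER_ID}.kube-system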
Connect the clusters
cluster-1
METAKUBE_CLUSTER_ID=clusterID-1
PEER_METAKUBE_CLUSTER_ID=clusterID-2
# once per machine: make the Cilium chart repo available
helm repo add cilium https://helm.cilium.io/
helm repo update
helm template cilium cilium/cilium \
--version v1.18.7 \
--namespace kube-system \
--set cluster.name=${METAKUBE_CLUSTER_ID} \
--set cluster.id=1 \
--set clustermesh.useAPIServer=true \
--set clustermesh.config.enabled=true \
--set clustermesh.apiserver.tls.authMode=cluster \
--set clustermesh.apiserver.kvstoremesh.enabled=true \
--set clustermesh.apiserver.service.type=LoadBalancer \
--set 'clustermesh.config.clusters[0].name='${PEER_METAKUBE_CLUSTER_ID} \
--set 'clustermesh.config.clusters[0].port=2379' \
--set 'clustermesh.config.clusters[0].address=cluster-'${PEER_METAKUBE_CLUSTER_ID}'.kube-system' |\
yq -cr -y 'select((.kind == "Secret") and (.metadata.name | test("cilium-(clustermesh|kvstoremesh)")))' | \
kubectl --kubeconfig kubeconfig-cluster-1 apply -f -
cluster-2
METAKUBE_CLUSTER_ID=clusterID-2
PEER_METAKUBE_CLUSTER_ID=clusterID-1
helm template cilium cilium/cilium \
--version v1.18.7 \
--namespace kube-system \
--set cluster.name=${METAKUBE_CLUSTER_ID} \
--set cluster.id=2 \
--set clustermesh.useAPIServer=true \
--set clustermesh.config.enabled=true \
--set clustermesh.apiserver.tls.authMode=cluster \
--set clustermesh.apiserver.kvstoremesh.enabled=true \
--set clustermesh.apiserver.service.type=LoadBalancer \
--set 'clustermesh.config.clusters[0].name='${PEER_METAKUBE_CLUSTER_ID} \
--set 'clustermesh.config.clusters[0].port=2379' \
--set 'clustermesh.config.clusters[0].address=cluster-'${PEER_METAKUBE_CLUSTER_ID}'.kube-system' |\
yq -cr -y 'select((.kind == "Secret") and (.metadata.name | test("cilium-(clustermesh|kvstoremesh)")))' | \
kubectl --kubeconfig kubeconfig-cluster-2 apply -f -
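The rendered manifests only (re)create the cilium-clustermesh and cilium-kvstoremesh secrets. A hedged sketch to make the running components pick them up, run against each cluster:
kubectl -n kube-system rollout restart daemonset/cilium deployment/clustermesh-apiserver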
Troubleshooting
- cilium endpoint
kubectl -n kube-system exec -it daemonset/cilium -- cilium-dbg endpoint list | grep app=http-echo
622    Disabled    Disabled    66711    k8s:app=http-echo
If you don't see the expected endpoint (here an example http-echo workload), try rolling over the cilium DaemonSet.
- clustermesh service (checked in two hops below)
- cilium agent connecting to the local kvstoremesh
kubectl -n kube-system exec -it daemonset/cilium -- cilium-dbg troubleshoot clustermesh
Found 1 cluster configurations
Cluster "clusterID-2":
📄 Configuration path: /var/lib/cilium/clustermesh/clusterID-2
🔌 Endpoints:
- https://clustermesh-apiserver.kube-system.svc:2379
  ✅ Hostname resolved to: 10.240.22.125
  ✅ TCP connection successfully established to 10.240.22.125:2379
  ✅ TLS connection successfully established to 10.240.22.125:2379
  ℹ️ Negotiated TLS version: TLS 1.3, ciphersuite TLS_AES_128_GCM_SHA256
  ℹ️ Etcd server version: 3.6.6
🔑 Digital certificates:
✅ TLS Root CA certificates:
- Serial number: fb:62:5c:15:b9:20:12:e7:70:3b:6d:12:ee:7d:cc:14
  Subject: CN=Cilium CA
  Issuer: CN=Cilium CA
  Validity:
    Not before: 2026-01-12 15:25:04 +0000 UTC
    Not after: 2029-01-11 15:25:04 +0000 UTC
✅ TLS client certificates:
- Serial number: 2c:41:b7:b3:0d:94:29:67:6a:c1:80:9b:2a:ea:e7:14
  Subject: CN=local-clusterID-1
  Issuer: CN=Cilium CA
  Validity:
    Not before: 2026-01-12 16:03:35 +0000 UTC
    Not after: 2029-01-11 16:03:35 +0000 UTC
⚙️ Etcd client:
✅ Etcd connection successfully established
ℹ️ Etcd cluster ID: f159592cd2603408
- kvstoremesh connecting to peer cluster
kubectl -n kube-system exec -ti deployment/clustermesh-apiserver -c kvstoremesh -- clustermesh-apiserver kvstoremesh-dbg troubleshoot
Found 1 cluster configurations
Cluster "clusterID-2":
📄 Configuration path: /var/lib/cilium/clustermesh/clusterID-2
🔌 Endpoints:
- https://clusterID-2.mesh.cilium.io:2379
  ✅ Hostname resolved to: 109.68.229.69
  ✅ TCP connection successfully established to 109.68.229.69:2379
  ✅ TLS connection successfully established to 109.68.229.69:2379
  ℹ️ Negotiated TLS version: TLS 1.3, ciphersuite TLS_AES_128_GCM_SHA256
  ℹ️ Etcd server version: 3.6.6
🔑 Digital certificates:
✅ TLS Root CA certificates:
- Serial number: fb:62:5c:15:b9:20:12:e7:70:3b:6d:12:ee:7d:cc:14
  Subject: CN=Cilium CA
  Issuer: CN=Cilium CA
  Validity:
    Not before: 2026-01-12 15:25:04 +0000 UTC
    Not after: 2029-01-11 15:25:04 +0000 UTC
✅ TLS client certificates:
- Serial number: d2:77:54:8b:e6:b4:91:3d:cd:b5:3f:fe:ad:b3:d2:84
  Subject: CN=remote
  Issuer: CN=Cilium CA
  Validity:
    Not before: 2026-01-12 16:03:35 +0000 UTC
    Not after: 2029-01-11 16:03:35 +0000 UTC
⚙️ Etcd client:
✅ Etcd connection successfully established
ℹ️ Etcd cluster ID: 7711876c9ebb329e
- overall cluster health
kubectl -n kube-system exec -it daemonset/cilium -- cilium-dbg status --all-clusters
KVStore:                 Disabled
Kubernetes:              Ok   1.34 (v1.34.2) [linux/amd64]
Kubernetes APIs:         ["EndpointSliceOrEndpoint", "cilium/v2::CiliumCIDRGroup", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Pods", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    True   [ens3   192.168.0.10 fe80::f816:3eff:fe26:8da8 (Direct Routing)]
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
CNI Config file:         successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                  Ok   1.18.4 (v1.18.4-afda2aa9)
NodeMonitor:             Listening for events on 2 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok
IPAM:                    IPv4: 9/254 allocated from 172.0.1.0/24,
ClusterMesh:             1/1 remote clusters ready, 1 global-services
   clusterID-2: ready, 2 nodes, 19 endpoints, 0 identities, 1 services, 0 MCS-API service exports, 0 reconnections (last: never)
   └  etcd: 1/1 connected, leases=0, lock leases=0, has-quorum=true: endpoint status checks are disabled, ID: f159592cd2603408
   └  remote configuration: expected=true, retrieved=true, cluster-id=2, kvstoremesh=true, sync-canaries=true, service-exports=disabled
   └  synchronization status: nodes=true, endpoints=true, identities=true, services=true
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Routing:                 Network: Native   Host: BPF
Attach Mode:             TCX
Device Mode:             veth
Masquerading:            BPF   [ens3]   172.0.0.0/16 [IPv4: Enabled, IPv6: Disabled]
Controller Status:       59/59 healthy
Proxy Status:            OK, ip 172.0.1.149, 0 redirects active on ports 10000-20000, Envoy: external
Global Identity Range:   min 65536, max 131071
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 36.02   Metrics: Ok
Encryption:              Disabled
Cluster health:          4/4 reachable   (2026-01-13T09:02:48Z)   (Probe interval: 2m11.83347464s)
Name                     IP   Node   Endpoints
Modules Health:          Stopped(16) Degraded(3) OK(84)
- try etcdctl
export PEER_CLUSTER=clusterID-2
etcdctl \
  --endpoints=$(kubectl -n kube-system get secret cilium-kvstoremesh -o json | jq -cr '.data | with_entries(select(.key | test("etcd") | not)) | ."'${PEER_CLUSTER}'" | @base64d' | yq -c ".endpoints?[]") \
  --cacert=<(kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o json | jq -cr '.data."ca.crt" | @base64d') \
  --cert=<(kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o json | jq -cr '.data."tls.crt" | @base64d') \
  --key=<(kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o json | jq -cr '.data."tls.key" | @base64d') \
  get --prefix cilium/state/
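If the VPN (or load balancer) path and the copied CA are correct, this prints the keys the peer cluster synchronizes, e.g. node entries under cilium/state/nodes/v1/ (a hedged expectation based on Cilium's default kvstore key layout); an empty result or a TLS error points back at the site connections or the cilium-ca copy step.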