The api-syncagent is a Kubernetes agent responsible for integrating external Kubernetes clusters into kcp. It runs on a Kubernetes cluster, is configured with credentials to a kcp instance and will then synchronize data out of kcp (i.e. out of kcp workspaces) onto the local cluster, and vice versa.
The intended use case follows roughly these steps:
1. A user in kcp with sufficient permissions creates an `APIExport` object and provides appropriate credentials for the Sync Agent (e.g. by creating a Kubernetes Secret with a preconfigured kubeconfig in it); see the sketch after this list.
2. A service owner will now take these credentials and the configured API group (the `APIExport`'s name) and use them to set up the Sync Agent. It is assumed that the service owner (i.e. the cluster-admin in a service cluster) wants to make some resources (usually CRDs) available to use inside of kcp.
3. The service owner uses the Sync Agent Helm chart (or a similar deployment technique) to install the Sync Agent in their cluster.
4. To actually make resources available in kcp, the service owner now has to create a set of `PublishedResource` objects. The configuration happens from their point of view, meaning they define how to publish a CRD to kcp, defining renaming rules and other projection settings.
5. Once a `PublishedResource` is created in the service cluster, the Sync Agent will pick it up, find the referenced CRD, convert/project this CRD into an `APIResourceSchema` (ARS) for kcp and then create the ARS in the org workspace.
6. Finally the Sync Agent will take all `PublishedResources` and bundle them into the pre-existing `APIExport` in the org workspace. This `APIExport` can then be bound in the org workspace itself (or later in any other workspace, depending on permissions) and be used there.
7. kcp automatically provides a virtual workspace for the `APIExport`, and this is what the Sync Agent then uses to watch all objects for the relevant resources in kcp (i.e. in all workspaces).
8. The Sync Agent will now begin to synchronize objects back and forth between the service cluster and kcp.
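For step 1, the kcp side could look roughly like the sketch below. The `APIExport` and Secret names are made-up examples, and the `apis.kcp.io/v1alpha1` API version matches current kcp releases but may differ in yours:

```yaml
# Created by the kcp user inside their workspace (step 1).
apiVersion: apis.kcp.io/v1alpha1
kind: APIExport
metadata:
  name: databases.examplecorp # hypothetical service API group
spec: {}
---
# Handed to the service owner, e.g. as a Secret on the service cluster,
# containing a kubeconfig that grants the Sync Agent access to kcp.
apiVersion: v1
kind: Secret
metadata:
  name: kcp-kubeconfig # hypothetical name
stringData:
  kubeconfig: |
    # ... preconfigured kubeconfig pointing at the kcp workspace ...
```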
It might be a bit confusing at first: The `PublishedResource` CRD describes the world from the standpoint of a service owner, i.e. a person or team that owns a Kubernetes cluster and is tasked with making their CRDs available in kcp (i.e. "publish" them).

However, the actual data flow later works in the opposite direction: objects created by users inside their kcp workspaces serve as the source of truth. From there they are synced down to the service cluster, which applies the projection of the `PublishedResource` in reverse.
Of course additional, auxiliary (related) objects could originate on the service cluster. For example, if you create a Certificate object in a kcp workspace and it's synced down, cert-manager will then acquire the certificate and create a Kubernetes `Secret`, which has to be synced back up (into the kcp workspace where the certificate originated). So for auxiliary resources, the source of truth can also be on the service cluster.
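For illustration, assuming the service owner has published cert-manager's `Certificate` resource, a user could create the following object in their kcp workspace; cert-manager on the service cluster then fulfills it, and the resulting `Secret` (named via `spec.secretName`) flows back up:

```yaml
# Created by a user in a kcp workspace; synced down to the service cluster.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: www-example-com
spec:
  secretName: www-example-com-tls # the auxiliary Secret that is synced back up
  dnsNames:
    - www.example.com
  issuerRef:
    name: letsencrypt # hypothetical issuer on the service cluster
    kind: ClusterIssuer
```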
Each Sync Agent must have a name, like "nora" or "oskar". The fully-qualified name of a Sync Agent is `<agentname>.<apigroup>`, so if the user in kcp had created a new `APIExport` named `databases.examplecorp`, the name of the Sync Agent that serves this service could be `nora.databases.examplecorp`.
A single `APIExport` in kcp must only be processed by exactly one Sync Agent. There is currently no mechanism planned to subdivide an `APIExport` into shards, where multiple service clusters (and therefore multiple Sync Agents) would each process one shard.
Later the Sync Agent might be extended with label selectors; alternatively, it might "claim" objects by annotating them in the kcp workspace. These details are not yet worked out, so for now this 1:1 restriction applies.
Sync Agents make use of leader election, so it's perfectly fine to run multiple Sync Agent replicas, as long as only one of them is the leader and actually doing work.
controller-runtime can be used in a "kcp-aware" mode, where the cache, clients, mappers etc. are aware of the workspace information. However, this mode is not well tested upstream, and the code would require shard-admin permissions to work like this with regular kcp workspaces. The controller-runtime fork's kcp-awareness is really more geared towards working in virtual workspaces.
Because of this, the Sync Agent needs to get a kubeconfig for kcp that already points to the `APIExport`'s workspace (i.e. the `server` URL already contains a `/clusters/root:myorg` path). The basic controllers in the Sync Agent then treat this as a plain ol', regular Kubernetes cluster (no kcp-awareness).
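Such a kubeconfig could look roughly like this sketch (server URL and credentials are placeholders):

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: kcp
    cluster:
      # the /clusters/... path already selects the APIExport's workspace
      server: https://kcp.example.com:6443/clusters/root:myorg
contexts:
  - name: kcp
    context:
      cluster: kcp
      user: kcp-user
current-context: kcp
users:
  - name: kcp-user
    user:
      token: <token-with-access-to-the-workspace>
```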
To this end, the Sync Agent will, upon startup, try to access the `cluster` object in the target workspace. This is to resolve the cluster name (e.g. `root:myorg`) into a logicalcluster name (e.g. `gibd3r1sh`). The Sync Agent has to know which logicalcluster the target workspace represents in order to query resources properly.
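In current kcp versions this is the singleton `LogicalCluster` object named `cluster`; a sketch of what the agent reads (the API version and annotation key may vary between kcp releases):

```yaml
apiVersion: core.kcp.io/v1alpha1
kind: LogicalCluster
metadata:
  name: cluster
  annotations:
    # the logicalcluster name behind root:myorg
    kcp.io/cluster: gibd3r1sh
```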
Only the controllers that are later responsible for interacting with the virtual workspace are kcp-aware. They have to be, in order to know what workspace a resource is living in.
A `PublishedResource` describes which CRD should be made available inside kcp. The CRD name can be projected (i.e. renamed), so a `kubermatic.k8c.io/v1 Cluster` can become a `cloud.examplecorp/v1 KubernetesCluster`.
In addition to projecting (mapping) the GVK, the `PublishedResource` also contains optional naming rules, which influence how the local objects that the Sync Agent creates are named.
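A sketch of such a `PublishedResource`, using the projection example from above; the field names follow the `syncagent.kcp.io/v1alpha1` API but may differ in your Sync Agent version, and the naming placeholders shown are illustrative:

```yaml
apiVersion: syncagent.kcp.io/v1alpha1
kind: PublishedResource
metadata:
  name: publish-clusters # hypothetical name
spec:
  resource:
    apiGroup: kubermatic.k8c.io
    version: v1
    kind: Cluster
  projection:
    kind: KubernetesCluster # how the resource appears inside kcp
  naming:
    # optional rules for naming the local copies on the service cluster;
    # $-placeholders are resolved per synced object
    namespace: cluster-$remoteClusterName
    name: $remoteNamespaceHash-$remoteNameHash
```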
As a single Sync Agent serves a single service, the API group used in kcp is the same for all `PublishedResources`. It's the API group configured in the `APIExport` inside kcp (created in step 1 in the overview above).
To prevent chaos, `PublishedResources` are immutable: handling the case that a `PublishedResource` first wants to publish `kubermatic.k8c.io/v1 Cluster` and then suddenly `kubermatic.k8c.io/v1 User` resources would mean re-syncing and cleaning up everything in all affected kcp workspaces. The Sync Agent would need to be able to delete and recreate objects to follow this GVK change, which is a level of complexity we simply do not want to deal with at this point in time. Also, `APIResourceSchemas` are themselves immutable.
More information is available in the [Publishing Resources](publish-resources.md) guide.
An `APIExport` in kcp combines multiple `APIResourceSchemas` (ARS). Each ARS is created based on a `PublishedResource` in the service cluster.
To prevent data loss, ARS are never removed from an `APIExport`. We simply do not have enough experience to really know what happens when an ARS suddenly becomes unavailable. To prevent damage and confusion, the Sync Agent will only ever add new ARS to the one `APIExport` it manages.
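The aggregation is reflected in the `APIExport`'s spec, roughly like this sketch (the ARS naming scheme shown is illustrative):

```yaml
apiVersion: apis.kcp.io/v1alpha1
kind: APIExport
metadata:
  name: databases.examplecorp
spec:
  latestResourceSchemas:
    # entries are added over time by the Sync Agent, never removed
    - v1.kubernetesclusters.databases.examplecorp
```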
The Sync Agent consists of a number of independent controllers.
The `apiexport` controller aggregates the `PublishedResources` and manages a single `APIExport` in kcp.
The `apiresourceschema` controller takes `PublishedResources`, projects and converts them, and creates `APIResourceSchemas` in kcp.
The `syncmanager` controller watches the `APIExport` and waits for the virtual workspace to become available. It also watches all `PublishedResources` (PRs) and reconciles whenever any of them changes (they are immutable, but the controller still reacts to any events on them).
The controller will then set up a controller-runtime `Cluster` abstraction for the virtual workspace and start many `sync` controllers (one for each `PublishedResource`). Whenever PRs change, the `syncmanager` will make sure that the correct set of `sync` controllers is running.
This is where the meat and potatoes happen. The `sync` controller is started for a single `PublishedResource` and is responsible for synchronizing all objects for that resource between the local service cluster and kcp.
The `sync` controller was written to handle a single `PublishedResource` so that it does not have to deal with dynamically registering/stopping watches on its own. Instead, the `sync` controller can be written as a more or less "normal" controller-runtime controller.