# Configuration Updates

How changes to the `openshift/release` repository are propagated to the CI clusters.

Various long-running services deployed in the CI clusters operate on the
configuration files in the `openshift/release` repository. This document
describes how that information is made available to those services and updated
when changes are made. This information can be used as a guide for writing
services that consume those files. It also describes the problems with previous
strategies and the solutions adopted.
## ConfigMap mounts

The primary mechanism used to give services access to the contents of the
repository is Kubernetes volumes, specifically `ConfigMap` volume mounts.
The update process for these mounts involves several Kubernetes and test-infra
components and is divided into the following steps:
- A pull request is merged in `openshift/release` in GitHub.
- The `updateconfig` Prow plugin is triggered by the merge event, delivered via
  a web hook. It updates the `ConfigMap`s in the cluster with the new contents
  of the files according to its configuration.
- The `kubelet` in the node where the service instances are deployed sees that
  the `ConfigMap`s have been updated and recomputes the contents of the mount
  directory.
- The `AtomicWriter` component of the `kubelet` updates the contents of the
  directory to match the new contents of the mount.
- The service watches the mount directory (e.g. using the test-infra
  configuration agent package) and responds to those changes.
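The last step hinges on the structure the `kubelet` gives these mounts: all
files live under a hidden, timestamped directory reached through a `..data`
symbolic link. A minimal sketch of change detection and consistent reading
(the real test-infra agent uses file system notifications; the helper names
here are hypothetical, and the "mount" below is simulated):

```python
import os, tempfile

def current_snapshot(volume_dir):
    """Resolve the ..data symlink once, so every file read afterwards comes
    from a single atomically-published revision of the mount."""
    return os.path.join(volume_dir, os.readlink(os.path.join(volume_dir, "..data")))

def changed(volume_dir, last_target):
    """The mount was updated iff ..data points at a new directory."""
    target = os.readlink(os.path.join(volume_dir, "..data"))
    return target != last_target, target

# Simulate a ConfigMap volume mount: the kubelet publishes files under a
# timestamped directory and points ..data at it.
vol = tempfile.mkdtemp()
os.mkdir(os.path.join(vol, "..2024_01_01"))
os.symlink("..2024_01_01", os.path.join(vol, "..data"))

updated, target = changed(vol, None)
snapshot = current_snapshot(vol)
```

Reading every file through one resolved snapshot path (instead of through
`..data` each time) ensures all files come from the same revision.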
## updateconfig

This plugin is configured by files under the Prow configuration directory,
`core-services/prow/02_config`. The `openshift/release/_pluginconfig.yaml`
file enables it for the repository, while `_plugins.yaml` configures it via
the top-level `config_updater` key. It is configured to populate several
`ConfigMap`s in the clusters from the contents of the repository.
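As a sketch of what such an entry looks like (the glob, `ConfigMap` name, and
cluster/namespace names here are illustrative, not the real configuration):

```yaml
config_updater:
  maps:
    # Files matching this glob are aggregated into one ConfigMap.
    ci-operator/config/**/*.yaml:
      name: ci-operator-configs   # name of the ConfigMap in the cluster
      clusters:
        app.ci:                   # cluster -> list of target namespaces
        - ci
```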
The process by which `ConfigMap`s in the cluster are reconciled with the PR
changes is:

- calculate the list of changes made by the PR
- determine whether changes were made to files listed in the configuration
- for each `ConfigMap` whose input files were changed:
  1. fetch the existing content from the cluster, if it already exists
  2. merge the content of the changed files with the existing content
  3. update the `ConfigMap` in the cluster
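The merge step can be sketched as follows (a simplification of the plugin's
actual logic; the function name and the convention of mapping deleted files to
`None` are hypothetical):

```python
def reconcile_configmap(existing, changed_files):
    """Start from the ConfigMap's current data (if any) and overlay the
    files the PR changed; files deleted in the PR map to None and are
    removed. Untouched keys are preserved."""
    data = dict(existing or {})
    for filename, content in changed_files.items():
        if content is None:
            data.pop(filename, None)
        else:
            data[filename] = content
    return data

merged = reconcile_configmap(
    {"a.yaml": "old", "b.yaml": "keep"},
    {"a.yaml": "new", "c.yaml": "added"},
)
```

Note that only keys touched by the PR are rewritten, which is why the existing
content must be fetched first.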
## config-bootstrapper

A second process, the `openshift-release-master-config-bootstrapper` periodic
Prow job, also performs this procedure every hour using the
`config-bootstrapper` program, which shares most of its code with the plugin.
The job is not triggered by a PR, so all configured files are loaded as if the
repository had just been created (hence its name). It is meant to continually
ensure that the content in `openshift/release` can be used to recreate the
clusters from nothing.
Note that there are race conditions inherent to how the `updateconfig` plugin
works and interacts with other executions of itself and with the periodic job.
However, they have not been observed in production so far, in part because of
how Tide generally operates. Details are documented in this Jira issue and its
associated links.
## kubelet

The `kubelet` is the Kubernetes process, present in each physical node,
responsible for creating, monitoring, and managing containers according to the
`Pod` specifications in the cluster. It is the intermediary between the
Kubernetes core and the container runtime in each node.
Its general mode of operation is to monitor `Pod` resources in the cluster
(and its own static `Pod`s) and constantly reconcile the containers in the
node to reflect the specification in `etcd` received from the API server. One
aspect of this responsibility is to configure volume mounts according to the
configuration in the specification and the latest contents of their inputs.
Several types of volumes are available to be mounted in a container. Volume
types are implemented as plugins in Kubernetes and the `kubelet`:
- https://github.com/kubernetes/kubernetes/blob/v1.23.4/pkg/volume/plugins.go#L141
- https://github.com/kubernetes/kubernetes/blob/v1.23.4/pkg/volume/volume.go#L30
- https://github.com/kubernetes/kubernetes/blob/v1.23.4/pkg/volume/configmap/configmap.go#L44
Beyond the initial volume mount setup, the `kubelet` also keeps dynamic volume
mounts updated. These include `ConfigMap`, `Secret`, projected, and other
types of mounts, which are all implemented similarly. These updates happen at
a predefined frequency specified in the `kubelet` configuration
(`syncFrequency`). The default, used in all of our clusters, is `1m`.
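The relevant setting is the `syncFrequency` field; a minimal
`KubeletConfiguration` fragment (illustrative, not our clusters' actual
configuration) would look like:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Maximum interval between reconciliations of dynamic volume mounts
# (ConfigMap, Secret, projected, ...).
syncFrequency: 1m0s
```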
Note: the `kubelet` configuration for cluster nodes can be inspected via the
API server's node proxy endpoint, e.g.
`oc get --raw "/api/v1/nodes/<node>/proxy/configz"`.
That is, at regular intervals the `kubelet` looks at its view of the cluster
resources and decides whether volume mounts reflect the desired state or have
to be updated. It uses its own local cache to make this decision; the cache's
update frequency is also configurable, but cache updates always happen
asynchronously with respect to this process.
## AtomicWriter

Eventually, all plugins which expose volumes as a directory make use of the
`AtomicWriter` component, which propagates the changes in an atomic manner
(for some definition of "atomic") to the container's file system. The plugins
fetch the information required and assemble it in the form of a directory,
passing it to the writer for the final file system update.
The update algorithm is described in detail in the source code:

- https://github.com/kubernetes/kubernetes/blob/v1.23.4/pkg/volume/util/atomic_writer.go
Processes interested in updates to the volume mount can watch the `..data`
symbolic link to be notified when the directory is updated. The update to that
link is done using the `rename(2)` system call, which guarantees the atomicity
of the update process (this is what is referred to as "atomic" in the
documentation).
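The swap can be demonstrated with a minimal sketch, assuming a POSIX file
system (directory and file names are illustrative, and the real
`AtomicWriter` does considerably more bookkeeping):

```python
import os, tempfile

def atomic_update(volume_dir, files):
    """Publish `files` under a fresh hidden directory, then atomically
    repoint the ..data symlink at it, mimicking the kubelet's AtomicWriter."""
    new_dir = tempfile.mkdtemp(prefix="..", dir=volume_dir)
    for name, content in files.items():
        with open(os.path.join(new_dir, name), "w") as f:
            f.write(content)
    tmp_link = os.path.join(volume_dir, "..data_tmp")
    os.symlink(os.path.basename(new_dir), tmp_link)
    # rename(2) replaces ..data atomically: readers always see either the
    # old directory or the new one, never a partial mix of both.
    os.rename(tmp_link, os.path.join(volume_dir, "..data"))

vol = tempfile.mkdtemp()
atomic_update(vol, {"config.yaml": "a: 1"})
atomic_update(vol, {"config.yaml": "a: 2"})
with open(os.path.join(vol, "..data", "config.yaml")) as f:
    content = f.read()
```

Note that the atomicity applies only to the symlink swap itself, which is
exactly the limitation discussed next.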
One implicit assumption in this scheme is that the application responding to
updates will be able to process the contents of the new directory in time. If
a `ConfigMap` is updated in rapid succession, the mount may be updated while
the old contents are still being read (this can happen even to well-behaved
programs, e.g. if there is sufficient load on the node where they are
executed).
There is no provision to guarantee that the contents of the mount survive long
enough for an application to process them before a new update removes the
files. Worse, this grace period during which the process can read the mount is
a configurable parameter of the `kubelet`, as described previously, so it
cannot in general be relied upon. It is not difficult (and has happened in the
past) to make innocent changes to the code which loads and processes these
configuration files and inadvertently increase its runtime by an order of
magnitude. This may even happen gradually (as has also happened), without
notice, as the size of the input grows with the number of repositories
supported.
Note: the problem of concurrent updates, and resource reclamation in
particular, is a known "hard" computer science problem, and solving it
requires two cooperating processes (not unlike advisory file locking in Unix
systems). See this article on Wikipedia for a general discussion and this LWN
article for an example of how this type of problem can be solved.

## Projected volumes

An additional problem is present when multiple `ConfigMap`s are assembled into
a single mount, as is done for `ci-operator-configresolver`, stemming from the
fact that Kubernetes in general operates under an eventually consistent
concurrency model.
This is because there is no guarantee of the order in which the updates to
each of the constituents of the mount will be perceived. The `kubelet` update
loop, dictated by its configured update frequency, establishes a point in time
at which the external state is collected and propagated to the volume mounts.
It may decide to do so between updates to the various objects used to assemble
the mount. Furthermore, there is no guarantee that updates will be seen in the
same order they were originally made.
## git-sync

More recently, `git-sync` has been used for configuration updates. It is a
collocated container inside the main service `Pod` which maintains a local
`git` repository clone synchronized with a remote, and its mode of operation
is very similar to the `kubelet`/`AtomicWriter` process described above.
However, it has significant advantages over `ConfigMap`-based updates:
- The entirety of the local contents of the repository is updated atomically,
  eliminating the problems caused by trying to aggregate data from multiple
  `ConfigMap`s.
- It bypasses the size limitation of `ConfigMap`s, eliminating the need to
  fetch data from multiple sources in the first place.
File system updates are done using a variation of the `AtomicWriter` protocol:

- The remote history for the selected refs is checked for updates using
  `git ls-remote`.
- New revisions are pulled using `git fetch`.
- A work tree directory based on the latest revision is created using
  `git worktree add`.
- The primary path (a symlink) is replaced using the same process used by
  `AtomicWriter`: a temporary link is created and moved into place atomically
  using `rename(2)`. Services can monitor changes to this link in the same
  manner.
- The previous work tree is removed.
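One such cycle can be reproduced with plain `git` commands. A minimal sketch,
assuming a POSIX system with `git` installed (the stand-in remote and paths
are illustrative, and unlike real `git-sync` it does not prune the previous
work tree):

```python
import os, subprocess, tempfile

def git(*args, cwd):
    """Run a git command, returning its stripped stdout."""
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

# A stand-in "remote" repository with a single commit.
remote = tempfile.mkdtemp()
git("init", "-q", cwd=remote)
git("-c", "user.email=ci@example.com", "-c", "user.name=ci",
    "commit", "-q", "--allow-empty", "-m", "initial", cwd=remote)

# One update cycle over a local clone.
root = tempfile.mkdtemp()
git("clone", "-q", remote, "repo", cwd=root)
repo = os.path.join(root, "repo")
git("fetch", "-q", "origin", cwd=repo)           # pull new revisions
rev = git("rev-parse", "FETCH_HEAD", cwd=repo)   # latest fetched revision
worktree = os.path.join(root, rev)
git("worktree", "add", "--detach", worktree, rev, cwd=repo)

# Swap the primary symlink into place atomically with rename(2).
tmp = os.path.join(root, "current.tmp")
os.symlink(rev, tmp)
os.rename(tmp, os.path.join(root, "current"))
```

Services then read the repository through the `current` link, exactly as they
would read a `ConfigMap` mount through `..data`.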
The interval between each update cycle is controlled by the `--wait`
parameter, which is analogous to the `kubelet`'s `syncFrequency`
configuration. Because of this, it suffers from the same directory
reclamation problem.