CI Internals

A detailed view of some aspects of the implementation of CI services, intended for developers working on their source code.

This section describes in detail some aspects of the implementation of ci-operator and associated programs, with the intent of serving as an auxiliary guide for developers working with the openshift/ci-tools repository.

ci-operator has a number of foundational principles:

  • don’t do work twice
  • do the least amount of work needed
  • hide complexity from the end user

To achieve this, ci-operator requires configuration to understand the build process for each component as well as every output container image. This document overviews the workflow that ci-operator uses to build components and structure tests.

Every invocation of ci-operator creates a workspace to isolate test execution, seed it with build inputs and the published component images from the OpenShift release if the component under test is a part of one, then schedule test workflows as Kubernetes and OpenShift objects.

Execution graph

ci-operator is at its core a task scheduling program. The input configuration is processed and used to build a task graph, which is then executed until completion, failure, or interruption. Thus, the execution flow of ci-operator can be divided in these major phases:

  1. input processing
  2. task graph creation
  3. task graph execution
  4. cleanup

In the code base, these phases correspond to the following modules in the pkg directory:

  • api: Go types used by all phases
  • load: I/O operations for types
  • registry: configuration resolution using the step registry
  • config: input configuration processing
  • validation: input configuration validation
  • defaults: mapping from inputs to tasks
  • steps: definitions for each task type, task execution

Input Resolution

To avoid repeating work, ci-operator needs to determine when work can be re-used. The tool identifies a build of any specific job with a hash of:

  • job metadata (refs to test, clone configuration)
  • configuration (YAML configuration for refs under test)
  • other inputs (levels of input tagged images)

With such an identifier, ci-operator can determine if two builds are using the same configuration on the same inputs and can therefore re-use common work. This identifier is used to create the Kubernetes Namespace in which the test workloads will run and is furthermore available to tests via the NAMESPACE environment variable.

Input resolution can be identified in the ci-operator output by all of the steps that precede the creation of the test Namespace:

INFO[2022-07-11T17:03:22Z] ci-operator version v20220708-ca6de370c
INFO[2022-07-11T17:03:22Z] Loading configuration from https://config.ci.openshift.org for openshift/ci-tools@master
INFO[2022-07-11T17:03:22Z] Resolved source https://github.com/openshift/ci-tools to master@ca6de370, merging: #2883 3530bc01 @smg247
INFO[2022-07-11T17:03:23Z] Using namespace https://console-openshift-console.apps.build04.34d2.p2.openshiftapps.com/k8s/cluster/projects/ci-op-022dqmlr

Most log messages during input resolution have debug priority, so the log file is more informative in this case:

{"level":"info","msg":"ci-operator version v20220708-ca6de370c","time":"2022-07-11T17:03:22Z"}
{"level":"info","msg":"Loading configuration from https://config.ci.openshift.org for openshift/ci-tools@master","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"performing request method GET url https://config.ci.openshift.org/config?branch=master&org=openshift&repo=ci-tools","time":"2022-07-11T17:03:22Z"}
{"level":"info","msg":"Resolved source https://github.com/openshift/ci-tools to master@ca6de370, merging: #2883 3530bc01 @smg247","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Determining if build cache build-cache/openshift-ci-tools:master can be used in place of root ci/ci-tools-build-root:1.18","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Resolved build cache build-cache/openshift-ci-tools:master to sha256:de666c2027b84ff2b1ca1a4cafb08959aadcd5e8ba9e6a40c2767bbda87d1599","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Build cache build-cache/openshift-ci-tools:master is based on root image at sha256:f8c36a557d17e88976fea1349279a656e546b299d034e985b4ae43309003153d","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Resolved root image ci/ci-tools-build-root:1.18 to sha256:f8c36a557d17e88976fea1349279a656e546b299d034e985b4ae43309003153d","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Using build cache build-cache/openshift-ci-tools:master as root image.","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Resolved build-cache/openshift-ci-tools:master (root) to sha256:de666c2027b84ff2b1ca1a4cafb08959aadcd5e8ba9e6a40c2767bbda87d1599.","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Resolved origin/centos:stream8 (base_image: os) to sha256:ad7d81f622a590e73c34ec20b4ae6a0ff162b1e7306d121d3d634949bfae6b45.","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Resolved ocp/4.10:cli (base_image: cli) to sha256:954f2ea4c53cfdc6b439ba691180b6a3fcba64cc7d71d49bf81f269066fe4af6.","time":"2022-07-11T17:03:22Z"}
{"level":"debug","msg":"Resolved ci/golangci-lint:v1.45.2 (base_image: golangci-lint) to sha256:50d12acf8ef7c41545ae8e1fe14cfc22e690baca7d2bb41d40da8feb8a78cabe.","time":"2022-07-11T17:03:23Z"}
{"level":"trace","msg":"Using binary as hash: /usr/bin/ci-operator 1657322166 69194954","time":"2022-07-11T17:03:23Z"}
{"level":"info","msg":"Using namespace https://console-openshift-console.apps.build04.34d2.p2.openshiftapps.com/k8s/cluster/projects/ci-op-022dqmlr","time":"2022-07-11T17:03:23Z"}

Namespace Initialization

The hash created from input resolution is used to create a Namespace as an isolated workspace for the test; the Namespace is subsequently initialized for use by the test workloads.

All input images for the tests that are described in the configuration YAML are tagged in, as are all images that form the larger release that the test is a part of. Images that are used for the build graph, like those identified with the base_images, base_rpm_images, and build_root stanzas, have ImageStreamTags created for them in the pipeline ImageStream in the test Namespace. Images that are part of the release that the test exists within, as specified with the optional releases stanza, are mirrored to ImageStreamTags in the stable ImageStream within the test Namespace.

In order to ensure that resources from tests do not leak on the cluster the tests are executed on, both hard and soft TTLs are set on the Namespace and the ci-ns-ttl-controller is used to enforce the TTLs and reap namespaces when TTLs have expired. Both a hard and a soft TTL are set on the namespaces; the hard TTL (cleanupDuration, currently 24 hours) describes how much time can pass after the creation of the Namespace before it is reaped, the soft TTL (idleCleanupDuration, currently 1 hour) describes how much time can pass without any active Pods in the Namespace before it is reaped. Whichever TTL is reached first triggers reaping.

Build Graph Traversal

A configuration file for ci-operator defines build steps, test targets and output images for a component git repository. A graph of build dependencies is built from this configuration in order to determine what concrete actions need to occur for any specific target. Each invocation of ci-operator specifies one or more --targets to execute; for each target, the build graph is traversed to execute dependent steps first.

The ci-operator configuration file creates some implicit build steps:

Output ImageStreamTagAction
pipeline:srcclones the refs under test
pipeline:binruns the binary_build_commands
pipeline:test-binruns the test_binary_build_commands
pipeline:rpmsruns the rpm_build_commands

Container image builds – whether from the implicit pipeline steps above or from explicit image build configurations in the images stanza, are executed using OpenShift Builds. Test targets in the tests stanza are executed using Kubernetes Pods. As all of the test workflow execution objects are created in a Namespace shared for all jobs with the same input, re-use is achieved by deterministic naming. For instance, the src Build that creates the pipeline:src ImageStreamTag will be created only once in a given Namespace; other builds of jobs that require this build step will see the Build running and wait for it to complete or see the ImageStreamTag existing and consider the build step finished.

Image Pipeline

Below is a graph showing the various pipeline images:

ci-operator image pipeline

Solid boxes are images, solid lines are dependencies. The dashed stable box represents the “internal” promotion to the stable stream prior to the execution of tests. Dashed lines represent edges not fully depicted since they are optional and can be added to any image in the pipeline:

  • Each entry in operator.substitutions makes src-bundle depend on that image.
  • The operator.base_index entry, if specified, makes all index generator images depend on that image.

Accessing CI component logs

Using AWS’s CloudWatch to query application logs for: Prow, CI services, Cluster Bot, Release Controller, etc..

Artifact gathering in Prow job and `ci-operator` tests

A description of the components and processes responsible for uploading test results and artifacts to long-term storage.

Config Resolver

A description of the ci-operator-configresolver service.

Configuration Updates

The process by which changes to files in the openshift/release repository are propagated to the CI clusters.

Dynamic Scheduling of Prowjobs

How Prowjob scheduler and Prowjob dispatcher work together to provide dynamic scheduling of prowjobs

Images in CI

Debugging the issues about the images in CI.

Observer Pods

Description of how observer pods work

Steps

A description of the various ci-operator task types.

Testing

Executing ci-operator outside of CI jobs.

Timeouts and Interruptions

A description of the implementation details of job and test timeouts and interruptions.

Last modified November 14, 2023: nit: remove duplicit entry (473b723)