Multi-Stage Tests and the Test Step Registry
The multistage test style in the ci-operator is a modular test design that allows users to create new tests by combining smaller, individual test steps.
These individual steps can be put into a shared registry that other tests can access. This results in test workflows that are easier to maintain and
upgrade as multiple test workflows can share steps and don’t have to each be updated individually to fix bugs or add new features. It also reduces the
chances of a mistake when copying a feature from one test workflow to another.
The current step registry is available for browsing here.
To understand how the multistage tests and registry work, we must first talk about the three components of the test registry and how to use those components to create a test:
- Step: A step is the lowest level component in the test step registry. It describes an individual test step.
- Chain: A chain is a registry component that specifies multiple steps to be run. Any item of the chain can be either a step or another chain.
- Workflow: A workflow is the highest level component of the step registry. It contains three chains: pre, test, post.
Step
A step is the lowest level component in the test registry. A step defines a base container image, the filename of the shell script to run inside the container, the resource requests and limits for the container, and documentation for the step. Example of a step:
A step may be referred to in chains, workflows, and ci-operator configs.
Configuring the Container Image For a Step
The container image used to run a test step can be configured in one of two ways: by referencing an image tag otherwise present in the configuration or by explicitly referencing an image tag present on the build farm.
Referencing Another Configured Image
A step may execute in a container image already present in the ci-operator configuration file by identifying the tag with the from configuration field. Steps should use this mechanism to determine the container image they run in when that image will vary with the code under test. For example, the container image could have contents from the code under test (like src); similarly, the image may need to contain a component matching the version of OpenShift used in the test (like installer). When using this configuration option, ensure that the tag is already present in one of the following places:
- a pipeline image
- an external image
- an image built by ci-operator
- an image imported from a release ImageStream
Note
Static validation for this field is limited because the set of images originating from the release ImageStream is only known at runtime.
Referencing a Literal Image
A step may also be configured to use an available ImageStreamTag on the build farm where the test is executed by specifying the details for the tag with the from_image configuration field. A step should use this option when the version of the container image to be used does not vary with the code under test or the version of OpenShift being tested. Using the from_image field is equivalent to importing the image as a base_image and referencing the tag with the from field, but allows the step definition to be entirely self-contained. The following example of a step configuration uses this option:
Commands
The commands file must contain a shell script written in a shell language supported by the shellcheck program used to validate the commands. However, regardless of the shell language used for the commands, the web UI will syntax highlight all commands as bash.
Note: the shell script file must follow the naming convention described later in this help page.
Resources
A step accepts resource requests and limits for its container’s configuration.
There is more information about resources here.
Warning
The resources for the pod running a test step might be overwritten by CI automation based on the historical data of the job and the limited hardware capability.
Configuring a Custom shm-size
If it is necessary to increase the shared memory size (the default is 64m) of a Multi-Stage test, then the resources can be modified to add the ci-operator.openshift.io/shm resource size. Note that this will not be propagated to the container itself, but will simply resize the dshm volume.
Note
The limits and requests must be set to the same amount.
Step Execution Environment
While a step simply defines a set of commands to run in a container image, by virtue of executing within a ci-operator workflow, the commands have a number of special considerations for their execution environment. The commands can expect a set of environment variables to exist that inform them of the context in which they run. Commands in steps can communicate to other steps via a shared directory in their filesystem.
Available Environment Variables
The following environment variables will be available to commands in a step:
Variable | Definition | When is it Present? |
---|---|---|
${OPENSHIFT_CI} | Set to "true"; should be used to detect that a script is running in a ci-operator environment. | Always. |
${SHARED_DIR} | Directory on the step’s filesystem where files shared between steps can be read and written. | Always. |
${ARTIFACT_DIR} | Directory on the step’s filesystem where files should be placed to persist them in the job’s artifacts. | Always. |
${CLUSTER_PROFILE_DIR} | Directory on the step’s filesystem where credentials and configuration from the cluster profile are stored. | When the test as defined in a ci-operator configuration file sets a cluster_profile. |
${KUBECONFIG} | Path to system:admin credentials for the ephemeral OpenShift cluster under test. | After an ephemeral cluster has been installed. |
${KUBEADMIN_PASSWORD_FILE} | Path to the kubeadmin password file. | After an ephemeral cluster has been installed. |
${RELEASE_IMAGE_INITIAL} | Image pull specification for the initial release payload snapshot when the test began to run. | When the test imports or builds an initial release. See the docs. |
${RELEASE_IMAGE_LATEST} | Image pull specification for the ephemeral release payload used to install the ephemeral OpenShift cluster. | When the test imports or builds a latest release. See the docs. |
${LEASED_RESOURCE} | The name of the resource leased to grant access to cloud quota. See below. | When the test requires a lease. |
${IMAGE_FORMAT} | The registry location from which images built or imported for this test may be pulled. | Always, except when claiming a cluster. Deprecated; use dependencies to provide tests with fully resolved pull specifications of images. |
In addition to these variables, commands will also have a number of other environment variables available to them from ci-operator through leases, parameters and dependencies.
A further set of environment variables is made available by Prow; if a job is using these variables, however, it may be an indication that some level of encapsulation has been broken and that a more straightforward approach exists to achieve the same outcome.
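As an illustration of how a commands script might use these variables, the following sketch checks ${OPENSHIFT_CI} and falls back to a local directory when the script is run outside ci-operator. The helper names running_in_ci and artifact_dir and the local-artifacts fallback path are assumptions made for this example; they are not part of ci-operator.

```shell
#!/bin/sh
# Succeeds only when the script runs under ci-operator,
# which sets OPENSHIFT_CI to "true".
running_in_ci() {
    [ "${OPENSHIFT_CI:-}" = "true" ]
}

# Hypothetical helper: prints the directory where artifacts should be written.
# Outside CI it falls back to an assumed local path under TMPDIR.
artifact_dir() {
    if running_in_ci; then
        printf '%s\n' "${ARTIFACT_DIR}"
    else
        printf '%s\n' "${TMPDIR:-/tmp}/local-artifacts"
    fi
}
```

A script could then write its output with something like mkdir -p "$(artifact_dir)" and behave the same way on a developer machine and in CI.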
Sharing Data Between Steps
Steps can communicate with each other by using a shared directory on their filesystems. This directory is available for test processes via ${SHARED_DIR}. When the process finishes executing, the contents of that directory will be copied and will be available to following steps. New data will overwrite previous data, and absent files will be removed. The underlying mechanism for this uses Kubernetes concepts; therefore, the total amount of data that can be shared is capped at 1MB and only a flat file structure is permissible: no sub-directories are supported.
Steps are more commonly expected to communicate with each other by using state in the OpenShift cluster under test. For instance, if a step installs some components or changes configuration, a later step could check for that as a pre-condition by using oc or the API to view the cluster’s configuration.
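A minimal sketch of exchanging a value through ${SHARED_DIR} follows. The helper names share_value and read_shared are hypothetical; the check for a / in the name reflects the flat-file restriction described above.

```shell
#!/bin/sh
# share_value NAME VALUE: writes VALUE to a flat file in SHARED_DIR.
# Names containing "/" are rejected because sub-directories are not supported.
share_value() {
    case "$1" in
        */*)
            echo "refusing to write nested path: $1" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$2" > "${SHARED_DIR}/$1"
}

# read_shared NAME: prints a value written by an earlier step.
read_shared() {
    cat "${SHARED_DIR}/$1"
}
```

An earlier step could call share_value cluster-name demo, and a later step could retrieve the value with read_shared cluster-name.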
Note
The ${SHARED_DIR} may only contain files. No directories or nested structures are supported.
A Note on $KUBECONFIG
In the default execution environment, commands run in steps will be given the $KUBECONFIG environment variable to allow them to interact with the ephemeral cluster that was created for testing. Any step that executes a cluster installation must publish the resulting configuration file to $SHARED_DIR/kubeconfig to allow the ci-operator to correctly propagate this configuration to subsequent steps.
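For example, the end of an installation step's commands could publish the generated file as sketched below. The function name publish_kubeconfig and the source path passed to it are assumptions for illustration; only the destination $SHARED_DIR/kubeconfig comes from the requirement above.

```shell
#!/bin/sh
# publish_kubeconfig SOURCE: copies the kubeconfig produced by the installer
# to the location from which it is propagated to subsequent steps.
publish_kubeconfig() {
    if [ ! -s "$1" ]; then
        echo "kubeconfig not found or empty: $1" >&2
        return 1
    fi
    cp "$1" "${SHARED_DIR}/kubeconfig"
}
```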
Exposing Artifacts
Steps can commit artifacts to the output of a job by placing files at the ${ARTIFACT_DIR}. These artifacts will be available for a job under artifacts/job-name/step-name/. The logs of each container in a step will also be present at that location.
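One way to use this directory is to capture a command's output as an artifact, sketched here. The capture helper and the .log suffix are hypothetical choices for the example.

```shell
#!/bin/sh
# capture NAME COMMAND [ARGS...]: runs COMMAND and stores its combined
# stdout and stderr in ARTIFACT_DIR/NAME.log, returning COMMAND's exit status.
capture() {
    name=$1
    shift
    "$@" > "${ARTIFACT_DIR}/${name}.log" 2>&1
}
```

Because the helper returns the exit status of the wrapped command, a failing command still fails the step while its output is preserved.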
Injecting Custom Credentials
Steps can inject custom credentials by adding configuration that identifies which secrets hold the credentials and where the data should be mounted in the step. For instance, to mount the my-data secret into the step’s filesystem at /var/run/my-data, a step could be configured in a literal ci-operator configuration, or in the step’s configuration in the registry in the following manner:
Registry step configuration:
Warning
Access to read these secrets from the namespace configured must be granted separately from the configuration being added to a step. By default, only secrets in the test-credentials namespace will be available for mounting into test steps. Please follow the secret-management documentation to set up a custom secret in that namespace.
Injecting the oc CLI
Steps can make the oc CLI available to their commands by adding the cli configuration item to the test step, specifying which OpenShift release the CLI should be sourced from. The ci-operator configuration must use releases to configure which release payloads the oc CLI may be injected from. For example, the following configuration pulls in a CentOS image, configures a custom OCP release using releases, and runs a test where the release’s CLI is injected into the test step.
Opting Out of ServiceAccount Credentials
By default, the Pod in which a step runs will have ServiceAccount credentials mounted in order to update the $SHARED_DIR and expose the $KUBECONFIG. If your test does not use either of these features, and the presence of in-cluster configuration is not desired, this may be turned off:
Chain
A chain is a registry component that specifies multiple registry components to be run. Components are run in the order that they are written. Components specified by a chain can be either steps or other chains. Example of a chain:
Workflow
A workflow is the highest level component of the step registry. Its syntax is almost identical to that of the ci-operator configuration for multistage tests, and it defines an entire test from start to finish. It has four basic components: a cluster_profile string (e.g. aws, azure4, gcp), and three chains: pre, test, and post. The pre chain is intended to be used to set up a testing environment (such as creating a test cluster), the test chain is intended to contain all tests that a job wants to run, and the post chain is intended to be used to clean up any resources created or used by the test. If a step in pre or test fails, all pending pre and test steps are skipped and all post steps are run to ensure that resources are properly cleaned up. This is an example of a workflow configuration:
ci-operator Test Configuration
The ci-operator test configuration syntax for multistage tests is very similar to the registry workflow syntax. The main differences are that the ci-operator configuration does not have a documentation field, and the ci-operator configuration can specify a workflow to use. Also, the cluster_profile, pre, test, and post fields are under a steps field instead of workflow. Here is an example of the tests section of a ci-operator configuration using the multistage test design:
In this example, the ci-operator configuration simply specifies the desired cluster profile and the origin-e2e workflow shown in the example for the Workflow section above.
Since the ci-operator configuration and workflows share the same fields, it is possible to override fields specified in a workflow. In cases where both the workflow and a ci-operator configuration specify the same field, the ci-operator configuration’s field has priority (i.e. the value from the ci-operator configuration is used). List and mapping fields have a few special rules, described in the hierarchical propagation section.
Example of a ci-operator configuration that overrides a workflow field:
The configuration can also override a workflow field with a full literal step (not only a reference to a shared step):
Options to Change Control Flow
Skipping post Steps On Success
ci-operator can be configured to skip some or all post steps when all test steps pass. Skipping a post step when all tests have passed may be useful to skip gathering artifacts and save some time at the end of the multistage test. In order to allow steps to be skipped in a test, the allow_skip_on_success field must be set in the steps configuration. Individual post steps opt into being skipped by setting the optional_on_success field. This is an example:
Marking post Steps Best-Effort
ci-operator can be configured to run post steps in best-effort mode, meaning that failures in these steps will not cause the overall test to fail. Running a post step in best-effort mode may be useful when the step gathers debugging information or is otherwise useful but should not cause the job to fail if it does not complete correctly. In order to run post steps in best-effort mode, the best_effort field must be set to true in the configuration for an individual step and the allow_best_effort_post_steps setting must be set at the workflow or job level. For example:
Registry Layout and Naming Convention
To prevent naming collisions between all the registry components, the step registry has a very strict naming scheme and directory layout. First, all components have a prefix determined by the directory structure, similar to how the ci-operator configs do. The prefix is the relative directory path with all / characters changed to -. For example, a file under the ipi/install/conf directory would have a prefix of ipi-install-conf. If there is a workflow, chain, or step in that directory, the as field for that component would need to be the same as the prefix. Further, only one of step, chain, or workflow can be in a subdirectory (otherwise there would be a name conflict).
After the prefix, we apply a suffix based on what the file is defining. These are the suffixes for the four file types that exist in the registry:
- Step: -ref.yaml
- Step command script: -commands.sh
- Chain: -chain.yaml
- Workflow: -workflow.yaml
Continuing the example above, a step in the ipi/install/conf subdirectory would have a filename of ipi-install-conf-ref.yaml and the command would be ipi-install-conf-commands.sh.
Other files that are allowed in the step registry but are not used for testing are OWNERS files and files that end in .md.
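The naming scheme can be expressed as a short script. This is a sketch of the convention as described in this section, not a tool shipped with the registry; the function names registry_prefix and registry_filename and the kind keywords (step, commands, chain, workflow) are assumptions made for the example.

```shell
#!/bin/sh
# registry_prefix DIR: the relative directory path with "/" changed to "-".
registry_prefix() {
    printf '%s\n' "$1" | tr '/' '-'
}

# registry_filename DIR KIND: the expected file name for a component of the
# given kind located in DIR.
registry_filename() {
    prefix=$(registry_prefix "$1")
    case "$2" in
        step)     suffix=-ref.yaml ;;
        commands) suffix=-commands.sh ;;
        chain)    suffix=-chain.yaml ;;
        workflow) suffix=-workflow.yaml ;;
        *)
            echo "unknown component kind: $2" >&2
            return 1
            ;;
    esac
    printf '%s%s\n' "$prefix" "$suffix"
}

registry_filename ipi/install/conf step      # prints ipi-install-conf-ref.yaml
registry_filename ipi/install/conf commands  # prints ipi-install-conf-commands.sh
```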
Parameters
Steps and chains can declare parameters they expect to consume in their env section. Tests can then set these parameters to different values, which makes it easy to generate tests that have small variations between them. More complex combinations are encouraged to use separate steps instead.
Note
Parameters are meant to be used to create different test variations. If a simple environment variable that is uniform across all tests is all that is required, it can be declared directly in the test script.
In the context of the step registry, parameters are used in two distinct scenarios, described in the following sections: declared by the test step author as inputs, or set by the test author.
Declaring step parameters
Each parameter declaration in the env section consists of the following fields:
- name: environment variable name
- default (optional): the value assigned if none is provided
- documentation (optional): a textual description of the parameter. Markdown supported.
Parameters are declared in the env section (note that the placement of this section varies depending on the component type, see common mistakes). The simplest form of declaration in a step is:
TEST_SUITE is declared as an input parameter to the step and will be available at runtime as an environment variable. Different tests can set the parameter to different values to create test variations.
Omitting a default value makes TEST_SUITE a required parameter. A test that wishes to use this step must give the parameter a value in its corresponding env section; failing to do so will result in a validation error.
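Inside the step's commands script, a declared parameter is read like any other environment variable. The sketch below adds a defensive check that is an assumption of this example (the validation described above happens before the script runs); the function name run_suite is hypothetical.

```shell
#!/bin/sh
# run_suite: consumes the TEST_SUITE parameter declared in the step's env section.
# The ${VAR:?message} expansion aborts the shell with the message when
# TEST_SUITE is unset or empty.
run_suite() {
    : "${TEST_SUITE:?TEST_SUITE must be set}"
    echo "running suite ${TEST_SUITE}"
}
```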
If a parameter has a sensible default value, it can be declared in the step:
Using this step with the default value no longer requires an env section in the test, but one can be used to override it:
Workflows can similarly set parameter values. The format of their env section is the same as that of a test:
Tests can then use the workflow instead and dispense with the env section:
For more advanced uses of parameters and overrides, see the hierarchical propagation section.
Setting parameter values
Once a registry component exists that declares one or more parameters, it can be used by other components and tests. Components and their parameters can be found either directly in the step registry directory in openshift/release or via the step registry web page. The latter shows what parameters are available for each type (follow the links for examples):
- Steps list their input parameters, along with the default values if they exist.
- Chains and workflows list their parameters as well as all parameters that are declared in their child components (other chains and steps).
Assuming a preexisting step declared as:
Note
These examples are simplified versions for illustrative purposes of the components present in the registry. See their original definitions for the full contents and their documentation for intended usage.
A test can use this step directly:
The default value declared in the step will be used for the TEST_SUITE parameter. If desired, it can be overridden with an env section in the test declaration:
For more advanced uses of parameters and overrides, see the hierarchical propagation section.
Common mistakes
Step/chain/workflow/test does not accept env field
Verify that the env field is placed correctly. Note that it is a top-level field in steps and chains, alongside the as field, while it is placed in the steps field in tests and workflows. The strict YAML validation used to parse these files will generate an error, but this is still a common source of confusion.
Parameter is not set
Parameters must be declared in the env section of every step that requires them. Setting values in parent components is not sufficient. Basic compliance with this rule is enforced and simple cases of unused values will result in validation errors, but not all can be detected, resulting in parameter values not being set.
In this case, TEST_SUITE will be set in openshift-e2e-test since it is declared in its env section, but will not be set in step-with-no-parameters. If that is desired, a similar env section should be added to that step as well. Note that this case evades the unused parameter validation since at least one step declares that it uses the relevant parameter.
Leases
Tests can acquire leases for cloud quota (described in this page) in two different ways:
Implicit Lease Configuration with cluster_profile
A test that declares a cluster_profile implicitly adds a requirement for a lease. The type of lease is pre-configured and determined automatically based on the cluster profile.
Explicit Lease Configuration
Tests that have more complex requirements can configure lease acquisition explicitly with a leases section. Each entry should have the following fields:
- resource_type: one of the resource types declared in the boskos configuration.
- env: name of the environment variable through which the name of the leased resource will be exposed to the test. If a count is specified, the variable will contain multiple names separated by spaces.
- count: an optional number of resources of the specified type to lease. Defaults to 1.
Every step in the test will have access to the AWS_LEASED_RESOURCE and GCP_LEASED_RESOURCES environment variables, which will contain the name of the resource(s) acquired. AWS_LEASED_RESOURCE will contain a single resource name, while GCP_LEASED_RESOURCES will contain the names of the three resources separated by spaces, as described above.
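Because a variable backed by a lease with a count holds several names separated by spaces, a step usually needs to split it. A minimal sketch follows; the helper name list_leased is hypothetical.

```shell
#!/bin/sh
# list_leased VALUE: prints each space-separated resource name on its own line.
# The unquoted expansion is intentional so that the shell splits on whitespace;
# globbing is disabled in the subshell so names are never treated as patterns.
list_leased() (
    set -f
    for name in $1; do
        printf '%s\n' "$name"
    done
)
```

Calling list_leased "${GCP_LEASED_RESOURCES}" would then yield one resource name per line, which can be fed to a while read loop.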
Leases can be configured in references and chains. Contrary to parameters, lease configuration applies to the test as a whole: all declared leases will be acquired before the execution of the steps, and are held throughout its entirety. The environment variable name in each lease configuration entry must be unique for the entire test.
Hierarchical Propagation
Some fields of individual steps can be changed by the chains, workflows, and test definitions that include them. Those are: parameters, dependencies, and leases.
Values set in parent elements will be propagated down the hierarchy. That is: a variable in the env section of a chain will propagate to all of its sub-chains and sub-steps, a variable in the env section of a workflow or test will propagate to all of its stages. The same applies for dependencies and leases.
Warning
As described in their individual sections, parameters and dependencies must be declared in all steps that use them: setting values in parent components is not sufficient. Basic compliance with this rule is enforced and simple cases of unused values will result in validation errors, but not all can be detected, resulting in parameter values not being set.
One special rule applies to list and mapping fields that are specified both in a test and its workflow. Instead of completely overriding the workflow value, as is the case for scalar values, the two sections are merged according to the following rules:
- Parameters and dependencies declared in the test override those in the workflow if they target the same environment variable. Otherwise, the resulting parameter list is the combination of both sections.
- Leases declared in the test must not target an environment variable already present in the workflow. Otherwise, the resulting lease list is the combination of both sections.
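The parameter merge rule can be illustrated with a small simulation. This is only a sketch of the rule as stated above, not ci-operator's implementation; the merge_env helper and its NAME=VALUE line format are assumptions made for the example.

```shell
#!/bin/sh
# merge_env WORKFLOW_ENV TEST_ENV: each argument holds NAME=VALUE lines.
# Entries from the test override workflow entries with the same name;
# all other entries from both sections are kept, in first-seen order.
merge_env() {
    printf '%s\n%s\n' "$1" "$2" | awk -F= '
        NF {
            if (!($1 in value)) order[++n] = $1
            value[$1] = substr($0, length($1) + 2)
        }
        END {
            for (i = 1; i <= n; i++) print order[i] "=" value[order[i]]
        }'
}
```

Given a workflow section containing A=1 and B=2 and a test section containing B=3 and C=4, the merged result is A=1, B=3, and C=4.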
Examples
This section contains more exotic examples not present in Parameters or elsewhere.
Tests and workflows
Starting from a step that declares a parameter:
As has already been described, a test or workflow can include the step. Without any additional env sections, the default value declared in the step (the lowest level of the hierarchy, but the highest that declares a value) will be used.
In all these examples, the TEST_SUITE parameter will be set to the default value declared in the step.
Test/workflow override
If a value different from the parameter’s default is desired, it can be declared in either the workflow or the test.
Here, the previously declared workflow is used and the value is set in the test. Alternatively, it could be set in the workflow.
Including a step and giving its parameters values are independent actions, so yet another option would be:
Test overrides workflow
Following the propagation rules described previously in this section, even if a workflow defines a parameter value, the test can still choose to override it. The value in the test’s env section will be used, as it is at a higher level in the hierarchy.
This would be safe even if the workflow declared more variables: the two sections are merged as expected.
Chains
Chains introduce additional levels of propagation. They can also declare parameters, dependencies, and leases, which override those declared in their steps. Because they can be arbitrarily nested, more complex overriding patterns can be constructed.
The examples in test/workflow override could be rewritten using chains.
VPN connection
For platforms that need access to restricted environments, ci-operator supports adding a dedicated VPN connection to each test step. Since this is a requirement of specific platforms, it is enabled when a cluster profile for one of those platforms is used. This process is transparent to the test command: when a VPN connection is requested at the test level, it is set up automatically by the test platform, which ensures the connection is available throughout the execution of each step. No changes are required to individual tests.
More details about the interaction between the test steps and the VPN client can be found in the cluster profile documentation.