Note! In the examples below we use the Backend Disruption tests, but the same holds true for the alert durations.
To measure our ability to provide upgrades to OCP clusters with minimal downtime, the Disruption Testing framework monitors select backends and records disruptions in backend service availability. This document gives an overview of the framework used for disruption testing and explains how to configure new disruption tests when needed.
Matcher Code Implementation
Now that we have a better understanding of how the disruption test data is generated and updated, let’s discuss how the code makes use of it.
The origin/pkg/synthetictests/allowedbackenddisruption/query_results.json file that we updated previously is embedded into the openshift-tests binary. At runtime, we ingest the raw data and create a historicaldata.NewMatcher() object which implements the
The core logic of the current best matcher first checks whether the historical data contains an exact match, meaning an entry with the same Backend Name and JobType. When there is no exact match, it makes a best-guess effort by fuzzy matching the data we don't have. Fuzzy matching iterates through the nextBestGuessers, stops at the first guess that fits our criteria, and checks whether that guess is contained in the data set.
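The lookup flow described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the code in origin: the JSON row shape (including the P99Seconds field), the simplified JobType with only Release, FromRelease and Platform, the newMatcher and bestMatch names, the microUpgrade guesser, and the example-backend name are all hypothetical. The sketch ingests raw JSON rows into a map keyed by backend name and JobType, tries an exact match, and otherwise returns the first guessed JobType that is present in the data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawRow is a hypothetical shape for one row of historical data; the real
// query_results.json schema may use different fields.
type rawRow struct {
	BackendName string
	Release     string
	FromRelease string
	Platform    string
	P99Seconds  float64 // hypothetical allowed-disruption value in seconds
}

// JobType is a simplified, hypothetical stand-in for the job identity.
type JobType struct {
	Release, FromRelease, Platform string
}

// key identifies one historical entry: backend name plus job type.
type key struct {
	BackendName string
	JobType     JobType
}

// NextBestGuesser proposes a substitute JobType, or false if it does not apply.
type NextBestGuesser func(JobType) (JobType, bool)

type matcher struct {
	historical       map[key]float64
	nextBestGuessers []NextBestGuesser
}

// newMatcher ingests the raw JSON rows into a lookup map.
func newMatcher(raw []byte, guessers []NextBestGuesser) (matcher, error) {
	var rows []rawRow
	if err := json.Unmarshal(raw, &rows); err != nil {
		return matcher{}, err
	}
	m := matcher{historical: map[key]float64{}, nextBestGuessers: guessers}
	for _, r := range rows {
		jt := JobType{r.Release, r.FromRelease, r.Platform}
		m.historical[key{r.BackendName, jt}] = r.P99Seconds
	}
	return m, nil
}

// bestMatch tries an exact match first; otherwise it walks the guessers in
// order, skips those that do not apply, and returns the first guessed
// JobType that is present in the data.
func (m matcher) bestMatch(backend string, jt JobType) (float64, bool) {
	if v, ok := m.historical[key{backend, jt}]; ok {
		return v, true // exact match
	}
	for _, guess := range m.nextBestGuessers {
		guessed, applies := guess(jt)
		if !applies {
			continue
		}
		if v, ok := m.historical[key{backend, guessed}]; ok {
			return v, true // fuzzy match
		}
	}
	return 0, false
}

// microUpgrade is a hypothetical guesser: treat a minor upgrade such as
// 4.10 -> 4.11 as the micro upgrade 4.11 -> 4.11.
func microUpgrade(in JobType) (JobType, bool) {
	if in.FromRelease == "" || in.FromRelease == in.Release {
		return JobType{}, false
	}
	in.FromRelease = in.Release
	return in, true
}

// sample stands in for the embedded query_results.json contents.
const sample = `[{"BackendName":"example-backend","Release":"4.11","FromRelease":"4.11","Platform":"aws","P99Seconds":3}]`

func main() {
	m, err := newMatcher([]byte(sample), []NextBestGuesser{microUpgrade})
	if err != nil {
		panic(err)
	}
	// No exact 4.10 -> 4.11 entry exists, so the guesser supplies 4.11 -> 4.11.
	fmt.Println(m.bestMatch("example-backend", JobType{"4.11", "4.10", "aws"}))
}
```

Here the 4.10 to 4.11 upgrade job on aws has no exact entry, so the guesser rewrites it to a 4.11 to 4.11 upgrade, which does have data. In the real binary the JSON comes from the embedded query_results.json rather than a string constant.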
Default Next Best Guessers
Next Best Guessers are functions that can be chained together; each one returns either true or false depending on whether the current JobType matches the desired logic. For example, we check whether MicroReleaseUpgrade matches the current JobType; if it returns false, we continue down the list. The combine helper function lets you chain and compose more sophisticated checks. For instance, when PreviousReleaseUpgrade is combined with MicroReleaseUpgrade, the result of PreviousReleaseUpgrade is fed into MicroReleaseUpgrade. If no function in the chain returns false, we have successfully fuzzy matched and can check whether the historical data has information for this match.
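To make the chaining concrete, here is a Go sketch of two guessers and a combine helper. The guesser signature (returning a guessed JobType plus a bool), the two-field JobType, the "major.minor" version parsing, and the exact rewrite each guesser performs are assumptions for illustration. They follow the behavior described above but are not taken from origin.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// JobType is a hypothetical, simplified job identity; FromRelease is empty
// for jobs that do not upgrade.
type JobType struct {
	Release     string // e.g. "4.11"
	FromRelease string // e.g. "4.10"
}

// NextBestGuesser returns a guessed JobType, and false when the guess does not apply.
type NextBestGuesser func(in JobType) (JobType, bool)

// parse splits a "major.minor" version string and reports false on malformed input.
func parse(v string) (int, int, bool) {
	parts := strings.Split(v, ".")
	if len(parts) != 2 {
		return 0, 0, false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	return major, minor, err1 == nil && err2 == nil
}

// MicroReleaseUpgrade (sketch): turn a minor upgrade such as 4.10 -> 4.11
// into the micro upgrade 4.11 -> 4.11. It does not apply to non-upgrade
// jobs or to jobs that are already micro upgrades.
func MicroReleaseUpgrade(in JobType) (JobType, bool) {
	if in.FromRelease == "" || in.FromRelease == in.Release {
		return JobType{}, false
	}
	out := in
	out.FromRelease = in.Release
	return out, true
}

// PreviousReleaseUpgrade (sketch): step both releases back one minor
// version, e.g. 4.11 -> 4.11 becomes 4.10 -> 4.10.
func PreviousReleaseUpgrade(in JobType) (JobType, bool) {
	major, minor, ok := parse(in.Release)
	if !ok || minor == 0 {
		return JobType{}, false
	}
	out := in
	out.Release = fmt.Sprintf("%d.%d", major, minor-1)
	if in.FromRelease != "" {
		fMajor, fMinor, ok := parse(in.FromRelease)
		if !ok || fMinor == 0 {
			return JobType{}, false
		}
		out.FromRelease = fmt.Sprintf("%d.%d", fMajor, fMinor-1)
	}
	return out, true
}

// combine chains guessers: each one receives the previous one's output, and
// the whole chain fails as soon as any guesser returns false.
func combine(guessers ...NextBestGuesser) NextBestGuesser {
	return func(in JobType) (JobType, bool) {
		curr := in
		for _, g := range guessers {
			next, ok := g(curr)
			if !ok {
				return JobType{}, false
			}
			curr = next
		}
		return curr, true
	}
}

func main() {
	guess := combine(PreviousReleaseUpgrade, MicroReleaseUpgrade)
	fmt.Println(guess(JobType{Release: "4.11", FromRelease: "4.10"}))
}
```

For a 4.10 to 4.11 upgrade, PreviousReleaseUpgrade first produces 4.9 to 4.10, and MicroReleaseUpgrade then turns that into 4.10 to 4.10, which is the JobType that would be looked up in the historical data. If any step returns false, the combined guesser returns false.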
Adding new disruption tests
Currently, disruption tests focus on disruptions that occur during upgrades. To add a new backend to monitor during the upgrade test, add a new backendDisruptionTest via NewBackendDisruptionTest to the e2e upgrade AllTests.
If this is a completely new backend, query_results data must be added. Alternatively, if preferable, use NewBackendDisruptionTestWithFixedAllowedDisruption instead of NewBackendDisruptionTest and hardcode the allowable disruption.
Updating test data
Disruption test framework overview
To check for disruptions while upgrading OCP clusters:
- The tests are defined by AllTests
- The disruption is defined by clusterUpgrade
- These are passed into disruption.Run, which creates a new Chaosmonkey and executes the disruption monitoring tests and the disruption itself
- The backendDisruptionTest is responsible for:
  - Creating the event broadcaster, recorder, and monitor
  - Querying the backend and timing out after the max interval (typically 1 second)
  - Analyzing the disruption events for disruptions that exceed allowable values
- When the disruption is complete, the disruption tests are validated via Matches / BestMatcher to find periods that exceed allowable thresholds