Testing Backends For Availability

This is an overview of how backends are queried for their availability status.

Overview Diagram

This diagram shows how backends are queried to determine their availability:

[Diagram: Query Backends]

	backendSampler, err := createKubeAPIMonitoringWithNewConnections(clusterConfig)
	disruptionSampler := newDisruptionSampler(b)
	go disruptionSampler.produceSamples(producerContext, interval)
	go disruptionSampler.consumeSamples(consumerContext, interval, monitorRecorder, eventRecorder)
  • (3) The produceSamples function produces the disruptionSamples. It is built around a Ticker that fires every 1 second. On each iteration, checkConnection sends an HTTP GET to the backend and waits for a response.
    func (b *disruptionSampler) produceSamples(ctx context.Context, interval time.Duration) {
    	ticker := time.NewTicker(interval)
    	defer ticker.Stop()
    	for {
    		// the sampleFn may take a significant period of time to run.  In such a case, we want our start interval
    		// for when a failure started to be the time when the request was first made, not the time when the call
    		// returned.  Imagine a timeout set on a DNS lookup of 30s: when the GET finally fails and returns, the outage
    		// was actually 30s before.
    		currDisruptionSample := b.newSample(ctx)
    		go func() {
    			sampleErr := b.backendSampler.checkConnection(ctx)
    			currDisruptionSample.setSampleError(sampleErr)
    			close(currDisruptionSample.finished)
    		}()

    		select {
    		case <-ticker.C:
    		case <-ctx.Done():
    			return
    		}
    	}
    }
  • (4) Each checkConnection call is paired with a disruptionSample, which records the startTime of the HTTP GET and a sampleErr that tracks whether the GET succeeded (sampleErr is nil) or failed (the error is saved). The disruptionSamples are stored in a slice referenced by the disruptionSampler.
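
The pairing of a start time with the result of an HTTP GET can be sketched as below. This is a minimal illustration, not the project's implementation: the `sample` type, `takeSample`, `probeTestServer`, the 2xx success rule, and the 2-second client timeout are all assumptions made for the example. The real checkConnection is a method on the backend sampler and may judge success differently.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// sample is a hypothetical stand-in for a disruptionSample. It records when
// the request started and the error (nil on success) that the check returned.
type sample struct {
	startTime time.Time
	sampleErr error
	finished  chan struct{} // closed once sampleErr has been set
}

// checkConnection is a hypothetical probe: it sends an HTTP GET to url and
// returns nil for a 2xx response, or an error otherwise (assumed rule).
func checkConnection(ctx context.Context, client *http.Client, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

// takeSample stamps startTime before the request is sent, so a slow failure
// is attributed to the moment the request began, not the moment it returned.
func takeSample(ctx context.Context, client *http.Client, url string) *sample {
	s := &sample{startTime: time.Now(), finished: make(chan struct{})}
	go func() {
		s.sampleErr = checkConnection(ctx, client, url)
		close(s.finished)
	}()
	return s
}

// probeTestServer starts a throwaway local server that always answers with
// the given status code, takes one sample against it, and waits for it.
func probeTestServer(status int) *sample {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	defer srv.Close()
	client := &http.Client{Timeout: 2 * time.Second}
	s := takeSample(context.Background(), client, srv.URL)
	<-s.finished
	return s
}

func main() {
	for _, status := range []int{http.StatusOK, http.StatusServiceUnavailable} {
		s := probeTestServer(status)
		fmt.Printf("status %d: available=%t err=%v\n", status, s.sampleErr == nil, s.sampleErr)
	}
}
```

Closing the `finished` channel after setting the error lets a consumer wait for a sample to complete without polling.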

  • (5) The consumeSamples function takes the disruptionSamples and determines when disruption started and stopped. It then records Events on the eventRecorder and Intervals/Conditions on the monitorRecorder.

    func (b *disruptionSampler) consumeSamples(ctx context.Context, interval time.Duration, monitorRecorder Recorder, eventRecorder events.EventRecorder) {
  • (6) Intervals on the monitorRecorder are used by the synthetic tests.
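
To illustrate how a consumer could turn a sequence of samples into disruption start and stop times, here is a hedged sketch. The `sample` and `interval` types and the `disruptionIntervals` and `demoIntervals` functions are hypothetical names invented for this example. The real consumeSamples also records Events and Conditions, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sample is a hypothetical, simplified disruptionSample.
type sample struct {
	startTime time.Time
	sampleErr error // nil means the backend answered
}

// interval is a hypothetical record of one continuous disruption.
type interval struct {
	from, to time.Time
	message  string
}

// duration reports how long the disruption lasted.
func (i interval) duration() time.Duration { return i.to.Sub(i.from) }

// disruptionIntervals walks the samples in order and emits one interval per
// run of consecutive failures. A disruption starts at the first failing
// sample's startTime and ends at the startTime of the next successful sample.
func disruptionIntervals(samples []sample) []interval {
	var out []interval
	var open *interval
	for _, s := range samples {
		switch {
		case s.sampleErr != nil && open == nil:
			// First failure after a healthy period: disruption begins.
			open = &interval{from: s.startTime, message: s.sampleErr.Error()}
		case s.sampleErr == nil && open != nil:
			// First success after failures: disruption ends.
			open.to = s.startTime
			out = append(out, *open)
			open = nil
		}
	}
	if open != nil {
		// Still failing at the end of the data: close at the last sample.
		open.to = samples[len(samples)-1].startTime
		out = append(out, *open)
	}
	return out
}

// demoIntervals builds five samples one second apart (ok, fail, fail, ok, ok)
// and returns the disruption intervals derived from them.
func demoIntervals() []interval {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	refused := errors.New("connection refused")
	samples := []sample{
		{startTime: t0},
		{startTime: t0.Add(1 * time.Second), sampleErr: refused},
		{startTime: t0.Add(2 * time.Second), sampleErr: refused},
		{startTime: t0.Add(3 * time.Second)},
		{startTime: t0.Add(4 * time.Second)},
	}
	return disruptionIntervals(samples)
}

func main() {
	for _, i := range demoIntervals() {
		fmt.Printf("disruption for %v: %s\n", i.duration(), i.message)
	}
}
```

Because each sample carries the time its request started, the interval boundaries reflect when the outage began rather than when the failing request finally returned.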