Circuit Breakers
3 minute read
Envoy circuit breakers can be used to fail quickly and apply back-pressure in response to upstream service degradation.
Envoy Gateway supports the following circuit breaker thresholds:
- Concurrent Connections: limit the connections that Envoy can establish to the upstream service. When this threshold is met, new connections will not be established, and some requests will be queued until an existing connection becomes available.
- Concurrent Requests: limit on concurrent requests in-flight from Envoy to the upstream service. When this threshold is met, requests will be queued.
- Pending Requests: limit the pending request queue size. When this threshold is met, overflowing requests will be terminated with a
503
status code.
Envoy’s circuit breakers are distributed: counters are not synchronized across different Envoy processes. The default Envoy and Envoy Gateway circuit breaker threshold values (1024) may be too strict for high-throughput systems.
Envoy Gateway introduces a new CRD called BackendTrafficPolicy that allows the user to describe their desired circuit breaker thresholds. This instantiated resource can be linked to a Gateway, HTTPRoute or GRPCRoute resource.
Note: There are distinct circuit breaker counters for each BackendReference
in an xRoute
rule. Even if a BackendTrafficPolicy
targets a Gateway
, each BackendReference
in that gateway still has separate circuit breaker counter.
Prerequisites
Install Envoy Gateway
- Follow the installation step from the Quickstart to install Envoy Gateway and sample resources.
Install the hey load testing tool
- The
hey
CLI will be used to generate load and measure response times. Follow the installation instruction from the Hey project docs.
Test and customize circuit breaker settings
This example will simulate a degraded backend that responds within 10 seconds by adding the ?delay=10s
query parameter to API calls. The hey tool will be used to generate 100 concurrent requests.
hey -n 100 -c 100 -host "www.example.com" http://${GATEWAY_HOST}/?delay=10s
Summary:
Total: 10.3426 secs
Slowest: 10.3420 secs
Fastest: 10.0664 secs
Average: 10.2145 secs
Requests/sec: 9.6687
Total data: 36600 bytes
Size/request: 366 bytes
Response time histogram:
10.066 [1] |■■■■
10.094 [4] |■■■■■■■■■■■■■■■
10.122 [9] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.149 [10] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.177 [10] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.204 [11] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.232 [11] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.259 [11] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.287 [11] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.314 [11] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
10.342 [11] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
The default circuit breaker threshold (1024) is not met. As a result, requests do not overflow: all requests are proxied upstream and both Envoy and clients wait for 10s.
In order to fail fast, apply a BackendTrafficPolicy
that limits concurrent requests to 10 and pending requests to 0.
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: circuitbreaker-for-route
spec:
targetRef:
group: gateway.networking.k8s.io
kind: HTTPRoute
name: backend
namespace: default
circuitBreaker:
maxPendingRequests: 0
maxParallelRequests: 10
EOF
Execute the load simulation again.
hey -n 100 -c 100 -host "www.example.com" http://${GATEWAY_HOST}/?delay=10s
Summary:
Total: 10.1230 secs
Slowest: 10.1224 secs
Fastest: 0.0529 secs
Average: 1.0677 secs
Requests/sec: 9.8785
Total data: 10940 bytes
Size/request: 109 bytes
Response time histogram:
0.053 [1] |
1.060 [89] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
2.067 [0] |
3.074 [0] |
4.081 [0] |
5.088 [0] |
6.095 [0] |
7.102 [0] |
8.109 [0] |
9.115 [0] |
10.122 [10] |■■■■
With the new circuit breaker settings, and due to the slowness of the backend, only the first 10 concurrent requests were proxied, while the other 90 overflowed.
- Overflowing Requests failed fast, reducing proxy resource consumption.
- Upstream traffic was limited, alleviating the pressure on the degraded service.
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.