This is the "latest" release of Envoy Gateway, which contains the most recent commits from the main branch.
This release might not be stable.
Please refer to the /docs documentation for the most current information.
Backend Utilization Load Balancing
BackendUtilization load balancing uses Open Request Cost Aggregation (ORCA) load metrics reported by the backend to dynamically weight endpoints. Under the hood it is implemented as Envoy’s client-side weighted round-robin policy: each endpoint’s weight is derived from the utilization metrics it emits, so instances running hot receive proportionally less traffic than those with headroom.
If no ORCA metrics are received from an endpoint, that endpoint is treated as evenly weighted.
See the Load Balancing concepts page for a deeper explanation of ORCA metric formats.
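To make the weighting concrete: per Envoy's client_side_weighted_round_robin documentation, each endpoint's weight is roughly qps / (utilization + eps/qps × error penalty), which with equal request rates and no errors reduces to 1/utilization. A quick sketch of the ratio this produces for the two utilization values used later in this guide:

```shell
# With equal qps and no errors, weight is proportional to 1/utilization,
# so a backend reporting cpu_utilization=0.1 outweighs one reporting 0.9:
awk 'BEGIN {
  low  = 1 / 0.1            # weight for the idle endpoint
  high = 1 / 0.9            # weight for the hot endpoint
  printf "low:high weight ratio = %.1f:1\n", low / high
}'
# prints: low:high weight ratio = 9.0:1
```

This 9:1 ratio is what the Testing section below aims to demonstrate end to end.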
Prerequisites
- Your backend (or a sidecar in front of it) must emit ORCA load metrics as response headers or trailers. See Backend instrumentation below.
Follow the steps below to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.
Install the Gateway API CRDs and Envoy Gateway using Helm:
helm install eg oci://docker.io/envoyproxy/gateway-helm --version v0.0.0-latest -n envoy-gateway-system --create-namespace

Install the GatewayClass, Gateway, HTTPRoute and example app:

kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/latest/quickstart.yaml -n default

Verify connectivity. Get the external IP of the Gateway:

export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')

Curl the example app through the Envoy proxy:

curl --verbose --header "Host: www.example.com" http://$GATEWAY_HOST/get

The above command should succeed with status code 200.

Get the name of the Envoy service created by the example Gateway:

export ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')

Get the deployment of the Envoy service created by the example Gateway:

export ENVOY_DEPLOYMENT=$(kubectl get deploy -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')

Port-forward to the Envoy service:

kubectl -n envoy-gateway-system port-forward service/${ENVOY_SERVICE} 8888:80 &

Curl the example app through the Envoy proxy:

curl --verbose --header "Host: www.example.com" http://localhost:8888/get

The above command should succeed with status code 200.
Build and Deploy the Example Backend
The Envoy Gateway repository includes a small HTTP server under examples/backend-utilization/ that emits a fixed ORCA cpu_utilization value (set via the ORCA_CPU_UTILIZATION environment variable) on every response. The example manifest deploys two sets of pods — one reporting 0.1 (idle) and one reporting 0.9 (hot) — behind a single Service. This lets you observe the weighting effect without wiring real load into a backend.
Note: The envoyproxy/gateway-backend-utilization image is not published to a public registry — you need to build it locally from a checkout of the Envoy Gateway repository.
Build the example backend image
make -C examples/backend-utilization docker-buildx

Make the image available to your cluster. For a kind cluster:

kind load docker-image --name envoy-gateway envoyproxy/gateway-backend-utilization:latest

Or tag and push the image to your own registry:

docker tag envoyproxy/gateway-backend-utilization:latest $YOUR_DOCKER_REPO/gateway-backend-utilization:latest
docker push $YOUR_DOCKER_REPO/gateway-backend-utilization:latest

If you push to your own registry, update the image: field in examples/kubernetes/backend-utilization.yaml to match before applying.

Apply the example manifest (Service, two Deployments, HTTPRoute):
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/backend-utilization.yaml -n default
Verify the two Deployments are ready:
kubectl get deployment/backend-utilization-low deployment/backend-utilization-high -n default
Configure BackendUtilization
Apply a BackendTrafficPolicy with loadBalancer.type: BackendUtilization:
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-utilization
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend-utilization
  loadBalancer:
    type: BackendUtilization
    backendUtilization:
      blackoutPeriod: 1s # shorten so the demo shifts traffic quickly
      weightUpdatePeriod: 500ms
EOF
Leaving backendUtilization: {} empty accepts the defaults, but the default 10s blackoutPeriod means traffic will appear evenly split for the first 10 seconds of the test; the shorter values above make the weighting visible immediately. The backendUtilization field itself is required when type: BackendUtilization; omitting it fails CEL validation.
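For comparison, a minimal policy fragment that accepts every default would look like this (a sketch; it uses the same spec fields as the policy above):

```yaml
loadBalancer:
  type: BackendUtilization
  backendUtilization: {}   # field is required; an empty object accepts all defaults
```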
Configuration Fields
All fields on backendUtilization are optional.
| Field | Default | Purpose |
|---|---|---|
| blackoutPeriod | 10s | How long an endpoint must report metrics before its reported weight is trusted. Prevents traffic from shifting based on a single noisy sample. |
| weightExpirationPeriod | 3m | If an endpoint stops reporting for this long, its reported weight is discarded and it reverts to the default weight. |
| weightUpdatePeriod | 1s | How often Envoy recomputes the weight table. Values below 100ms are raised to 100ms. |
| errorUtilizationPenaltyPercent | 0 | Penalty multiplier, expressed as a percentage, applied to an endpoint’s effective utilization based on its error rate (eps/qps): 100 = 1.0×, 150 = 1.5×, 200 = 2.0×. Higher values push errant endpoints out of rotation faster. |
| metricNamesForComputingUtilization | unset | Custom ORCA metric keys to feed into the weight formula when application_utilization isn’t reported. Use named_metrics.<key> for keys inside the ORCA proto’s named_metrics map. |
| keepResponseHeaders | false | By default Envoy strips the ORCA headers/trailers before forwarding the response. Set to true to let downstream clients see them (useful for chained load balancers or debugging). |
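To illustrate how errorUtilizationPenaltyPercent feeds into endpoint weight, here is a hedged sketch of Envoy's documented weight formula, qps / (utilization + eps/qps × penalty), with illustrative numbers: 50 qps, 5 errors per second, 0.4 utilization, and a penalty of 150 (1.5×):

```shell
awk 'BEGIN {
  qps = 50; eps = 5; util = 0.4
  penalty = 1.5                     # errorUtilizationPenaltyPercent: 150
  # the error rate (eps/qps) inflates the effective utilization:
  printf "weight = %.1f\n", qps / (util + (eps / qps) * penalty)
}'
# prints: weight = 90.9
```

The same endpoint with zero errors would weigh 50 / 0.4 = 125.0, so the penalty cuts its share of traffic by roughly a quarter.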
Example: Tuned for a Bursty Backend
loadBalancer:
  type: BackendUtilization
  backendUtilization:
    blackoutPeriod: 30s # ignore reports during slow-start
    weightExpirationPeriod: 1m # shorter memory, react faster to silent endpoints
    weightUpdatePeriod: 500ms # faster reweighting
    errorUtilizationPenaltyPercent: 150 # 1.5x penalty for errant endpoints
Example: Application-Defined Utilization
If your backend reports a custom metric (for example, queue depth) instead of CPU utilization, wire it in through metricNamesForComputingUtilization:
loadBalancer:
  type: BackendUtilization
  backendUtilization:
    metricNamesForComputingUtilization:
      - named_metrics.queue_depth
The backend would then emit:
endpoint-load-metrics: TEXT named_metrics.queue_depth=0.42
Backend Instrumentation
Your backend must emit ORCA load metrics. Envoy accepts metrics in three formats on response headers or trailers:
| Format | Header | Payload |
|---|---|---|
| Binary | endpoint-load-metrics-bin | Base64-encoded serialized OrcaLoadReport proto |
| JSON | endpoint-load-metrics | JSON {"cpu_utilization": 0.3, "mem_utilization": 0.8} |
| TEXT | endpoint-load-metrics | TEXT cpu=0.3,mem=0.8,named_metrics.queue_depth=0.42 |
For gRPC backends, the xDS ORCA libraries can attach per-request OrcaLoadReport data to responses automatically. For HTTP backends, add a response middleware that measures and serializes your CPU/memory/custom metrics on each response.
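For the JSON format, for instance, a middleware only needs to serialize its measurements into the header value. A minimal shell sketch (the metric values here are illustrative):

```shell
cpu=0.3; mem=0.8
# Build the JSON payload, then prefix it with the format tag in the header:
payload=$(jq -cn --argjson cpu "$cpu" --argjson mem "$mem" \
  '{cpu_utilization: $cpu, mem_utilization: $mem}')
echo "endpoint-load-metrics: JSON $payload"
# prints: endpoint-load-metrics: JSON {"cpu_utilization":0.3,"mem_utilization":0.8}
```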
Combining With Zone-Aware Routing
BackendUtilization composes with weightedZones to produce locality-aware weighted round-robin (Envoy’s wrr_locality policy). See the WeightedZones example on the zone-aware routing page.
preferLocal is not supported with BackendUtilization.
Testing
Ensure the GATEWAY_HOST environment variable from the Quickstart is set. If not, follow the Quickstart instructions to set the variable.
Give Envoy a few seconds after applying the policy to collect ORCA samples and compute endpoint weights — until then, traffic will appear roughly even. Then send 200 requests and tally which deployment handled each. Because backend-utilization-low reports cpu_utilization=0.1 and backend-utilization-high reports 0.9, Envoy should weight the low pods roughly 9× more heavily.
for i in $(seq 1 200); do
curl -s -H "Host: www.example.com" "http://${GATEWAY_HOST}/backend-utilization" | jq -r '.pod'
done | sort | uniq -c
Expected output (exact counts will vary, but low should dominate ~9:1):
90 backend-utilization-low-6b9cf46b59-l7df7
87 backend-utilization-low-6b9cf46b59-xxrw2
12 backend-utilization-high-5fdb65cb87-mctlp
11 backend-utilization-high-5fdb65cb87-rrdvq
If you instead see a roughly even split, the weights may not have stabilized yet — wait a few seconds and retry. You can verify the per-endpoint weights directly through the Envoy admin interface:
ENVOY_POD=$(kubectl get pods -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')
kubectl -n envoy-gateway-system port-forward pod/${ENVOY_POD} 19000:19000 &
curl -s localhost:19000/clusters | grep "backend-utilization" | grep weight
You should see weights roughly 10000 for the low pods and 1111 for the high pods (the inverse of the reported utilization).
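Those numbers follow directly from inverse-utilization weighting; a quick sketch of the arithmetic (the scale factor that lands the low pods at 10000 is an observation from this demo, not part of the API):

```shell
awk 'BEGIN {
  scale = 1000   # weight = scale / utilization; 0.1 utilization -> 10000
  printf "low=%.0f high=%.0f\n", scale / 0.1, scale / 0.9
}'
# prints: low=10000 high=1111
```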
Clean-Up
kubectl delete backendtrafficpolicy/backend-utilization
kubectl delete -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/backend-utilization.yaml -n default