Retry

A retry setting specifies the maximum number of times an Envoy proxy attempts to connect to a service if the initial call fails. Retries can enhance service availability and application performance by making sure that calls don’t fail permanently because of transient problems such as a temporarily overloaded service or network. The interval between retries prevents the called service from being overwhelmed with requests.

Envoy Gateway supports the following retry settings:

  • NumRetries: is the number of retries to be attempted. Defaults to 2.
  • RetryOn: specifies the retry trigger condition.
  • PerRetryPolicy: is the retry policy to be applied per retry attempt.

Envoy Gateway introduces a new CRD called BackendTrafficPolicy that allows the user to describe their desired retry settings. This instantiated resource can be linked to a Gateway, HTTPRoute or GRPCRoute resource.

Note: There are distinct circuit breaker counters for each BackendReference in an xRoute rule. Even if a BackendTrafficPolicy targets a Gateway, each BackendReference in that gateway still has separate circuit breaker counter.

Prerequisites

Follow the steps below to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Expand for instructions
  1. Install the Gateway API CRDs and Envoy Gateway using Helm:

    helm install eg oci://docker.io/envoyproxy/gateway-helm --version v1.3.0 -n envoy-gateway-system --create-namespace
    
  2. Install the GatewayClass, Gateway, HTTPRoute and example app:

    kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/v1.3.0/quickstart.yaml -n default
    
  3. Verify Connectivity:

    You can also test the same functionality by sending traffic to the External IP. To get the external IP of the Envoy service, run:

    export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')
    

    Note: In certain environments, the load balancer may be exposed using a hostname, instead of an IP address. If so, replace ip in the above command with hostname.

    Curl the example app through Envoy proxy:

    curl --verbose --header "Host: www.example.com" http://$GATEWAY_HOST/get
    

    Get the name of the Envoy service created by the example Gateway:

    export ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')
    

    Port forward to the Envoy service:

    kubectl -n envoy-gateway-system port-forward service/${ENVOY_SERVICE} 8888:80 &
    

    Curl the example app through Envoy proxy:

    curl --verbose --header "Host: www.example.com" http://localhost:8888/get
    

Test and customize retry settings

Before applying a BackendTrafficPolicy with retry setting to a route, let’s test the default retry settings.

curl -v -H "Host: www.example.com" "http://${GATEWAY_HOST}/status/500"

It will return 500 response immediately.

*   Trying 172.18.255.200:80...
* Connected to 172.18.255.200 (172.18.255.200) port 80
> GET /status/500 HTTP/1.1
> Host: www.example.com
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< date: Fri, 01 Mar 2024 15:12:55 GMT
< content-length: 0
<
* Connection #0 to host 172.18.255.200 left intact

Let’s create a BackendTrafficPolicy with a retry setting.

The request will be retried 5 times with a 100ms base interval and a 10s maximum interval.

cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: retry-for-route
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend
  retry:
    numRetries: 5
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 10s
      timeout: 250ms
    retryOn:
      httpStatusCodes:
        - 500
      triggers:
        - connect-failure
        - retriable-status-codes
EOF

Save and apply the following resource to your cluster:

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: retry-for-route
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend
  retry:
    numRetries: 5
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 10s
      timeout: 250ms
    retryOn:
      httpStatusCodes:
        - 500
      triggers:
        - connect-failure
        - retriable-status-codes

Execute the test again.

curl -v -H "Host: www.example.com" "http://${GATEWAY_HOST}/status/500"

It will return 500 response after a few while.

*   Trying 172.18.255.200:80...
* Connected to 172.18.255.200 (172.18.255.200) port 80
> GET /status/500 HTTP/1.1
> Host: www.example.com
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< date: Fri, 01 Mar 2024 15:15:53 GMT
< content-length: 0
<
* Connection #0 to host 172.18.255.200 left intact

Let’s check the stats to see the retry behavior.

egctl x stats envoy-proxy -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=eg,gateway.envoyproxy.io/owning-gateway-namespace=default | grep "envoy_cluster_upstream_rq_retry{envoy_cluster_name=\"httproute/default/backend/rule/0\"}"

You will expect to see the stats.

envoy_cluster_upstream_rq_retry{envoy_cluster_name="httproute/default/backend/rule/0"} 5

Last modified February 21, 2025: feat: support HPA in helm chart (#5127) (57c7e64)