Retry

4 minute read

Starting from v1.3, Envoy Gateway supports [HTTPRoute Retries(GEP-1731)][], this setting in the core Gateway API takes precedence over the BackendTrafficPolicy configuration.

A retry setting specifies the maximum number of times an Envoy proxy attempts to connect to a service if the initial call fails. Retries can enhance service availability and application performance by making sure that calls don’t fail permanently because of transient problems such as a temporarily overloaded service or network. The interval between retries prevents the called service from being overwhelmed with requests.

Envoy Gateway supports the following retry settings:

NumRetries: is the number of retries to be attempted. Defaults to 2.
RetryOn: specifies the retry trigger condition.
PerRetryPolicy: is the retry policy to be applied per retry attempt.

Envoy Gateway introduces a new CRD called BackendTrafficPolicy that allows the user to describe their desired retry settings. This instantiated resource can be linked to a Gateway, HTTPRoute or GRPCRoute resource.

Note: There are distinct circuit breaker counters for each BackendReference in an xRoute rule. Even if a BackendTrafficPolicy targets a Gateway, each BackendReference in that gateway still has separate circuit breaker counter.

Prerequisites

Follow the steps below to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Expand for instructions

Install the Gateway API CRDs and Envoy Gateway using Helm:

helm install eg oci://docker.io/envoyproxy/gateway-helm --version v1.7.0 -n envoy-gateway-system --create-namespace

Install the GatewayClass, Gateway, HTTPRoute and example app:

kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/v1.7.0/quickstart.yaml -n default

Verify Connectivity:

Get the External IP of the Gateway:

export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')

Curl the example app through Envoy proxy:

curl --verbose --header "Host: www.example.com" http://$GATEWAY_HOST/get

The above command should succeed with status code 200.

Get the name of the Envoy service created the by the example Gateway:

export ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')

Get the deployment of the Envoy service created the by the example Gateway:

export ENVOY_DEPLOYMENT=$(kubectl get deploy -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')

Port forward to the Envoy service:

kubectl -n envoy-gateway-system port-forward service/${ENVOY_SERVICE} 8888:80 &

Curl the example app through Envoy proxy:

curl --verbose --header "Host: www.example.com" http://localhost:8888/get

The above command should succeed with status code 200.

Test and customize retry settings

Before applying a BackendTrafficPolicy with retry setting to a route, let’s test the default retry settings.

curl -v -H "Host: www.example.com" "http://${GATEWAY_HOST}/status/500"

It will return 500 response immediately.

*   Trying 172.18.255.200:80...
* Connected to 172.18.255.200 (172.18.255.200) port 80
> GET /status/500 HTTP/1.1
> Host: www.example.com
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< date: Fri, 01 Mar 2024 15:12:55 GMT
< content-length: 0
<
* Connection #0 to host 172.18.255.200 left intact

Let’s create a BackendTrafficPolicy with a retry setting.

The request will be retried 5 times with a 100ms base interval and a 10s maximum interval.

cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: retry-for-route
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend
  retry:
    numRetries: 5
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 10s
      timeout: 250ms
    retryOn:
      httpStatusCodes:
        - 500
      triggers:
        - connect-failure
        - retriable-status-codes
EOF

Save and apply the following resource to your cluster:

---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: retry-for-route
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend
  retry:
    numRetries: 5
    perRetry:
      backOff:
        baseInterval: 100ms
        maxInterval: 10s
      timeout: 250ms
    retryOn:
      httpStatusCodes:
        - 500
      triggers:
        - connect-failure
        - retriable-status-codes

Execute the test again.

curl -v -H "Host: www.example.com" "http://${GATEWAY_HOST}/status/500"

It will return 500 response after a few while.

*   Trying 172.18.255.200:80...
* Connected to 172.18.255.200 (172.18.255.200) port 80
> GET /status/500 HTTP/1.1
> Host: www.example.com
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< date: Fri, 01 Mar 2024 15:15:53 GMT
< content-length: 0
<
* Connection #0 to host 172.18.255.200 left intact

Let’s check the stats to see the retry behavior.

egctl x stats envoy-proxy -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=eg,gateway.envoyproxy.io/owning-gateway-namespace=default | grep "envoy_cluster_upstream_rq_retry{envoy_cluster_name=\"httproute/default/backend/rule/0\"}"

You will expect to see the stats.

envoy_cluster_upstream_rq_retry{envoy_cluster_name="httproute/default/backend/rule/0"} 5

Feedback

Was this page helpful?

Glad to hear it! Please tell us how we can improve.

Sorry to hear that. Please tell us how we can improve.