1 - Gateway API Metrics

Resource metrics for Gateway API objects are available using the Gateway API State Metrics project. The project also provides example dashboard for visualising the metrics using Grafana, and example alerts using Prometheus & Alertmanager.

Prerequisites

Follow the steps from the Quickstart task to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Verify the Gateway status:

kubectl get gateway/eg -o yaml
egctl x status gateway -v

Run the following commands to install the metrics stack, with the Gateway API State Metrics configuration, on your kubernetes cluster:

kubectl apply --server-side -f https://raw.githubusercontent.com/Kuadrant/gateway-api-state-metrics/main/config/examples/kube-prometheus/bundle_crd.yaml
kubectl apply -f https://raw.githubusercontent.com/Kuadrant/gateway-api-state-metrics/main/config/examples/kube-prometheus/bundle.yaml

Metrics and Alerts

To access the Prometheus UI, wait for the statefulset to be ready, then use the port-forward command:

# This first command may fail if the statefulset has not been created yet.
# In that case, try again until you get a message like 'Waiting for 2 pods to be ready...'
# or 'statefulset rolling update complete 2 pods...'
kubectl -n monitoring rollout status --watch --timeout=5m statefulset/prometheus-k8s
kubectl -n monitoring port-forward service/prometheus-k8s 9090:9090 > /dev/null &

Navigate to http://localhost:9090. Metrics can be queried from the ‘Graph’ tab e.g. gatewayapi_gateway_created See the Gateway API State Metrics README for the full list of Gateway API metrics available.

Alerts can be seen in the ‘Alerts’ tab. Gateway API specific alerts will be grouped under the ‘gateway-api.rules’ heading.

Note: Alerts are defined in a PrometheusRules custom resource in the ‘monitoring’ namespace. You can modify the alert rules by updating this resource.

Dashboards

To view the dashboards in Grafana, wait for the deployment to be ready, then use the port-forward command:

kubectl -n monitoring wait --timeout=5m deployment/grafana --for=condition=Available
kubectl -n monitoring port-forward service/grafana 3000:3000 > /dev/null &

Navigate to http://localhost:3000 and sign in with admin/admin. The Gateway API State dashboards will be available in the ‘Default’ folder and tagged with ‘gateway-api’. See the Gateway API State Metrics README for further information on available dashboards.

Note: Dashboards are loaded from configmaps. You can modify the dashboards in the Grafana UI, however you will need to export them from the UI and update the json in the configmaps to persist changes.

2 - Gateway Exported Metrics

The Envoy Gateway provides a collection of self-monitoring metrics in Prometheus format.

These metrics allow monitoring of the behavior of Envoy Gateway itself (as distinct from that of the EnvoyProxy it managed).

Watching Components

The Resource Provider, xDS Translator and Infra Manager etc. are key components that made up of Envoy Gateway, they all follow the design of Watching Components.

Envoy Gateway collects the following metrics in Watching Components:

NameDescription
watchable_depthCurrent depth of watchable map.
watchable_subscribe_duration_secondsHow long in seconds a subscribed watchable queue is handled.
watchable_subscribe_totalTotal number of subscribed watchable queue.
watchable_panics_recovered_totalTotal recovered panics in the watchable infrastructure.

Each metric includes the runner label to identify the corresponding components, the relationship between label values and components is as follows:

ValueComponents
gateway-apiGateway API Translator
infrastructureInfrastructure Manager
xds-serverxDS Server
xds-translatorxDS Translator
global-ratelimitGlobal RateLimit xDS Translator

Metrics may include one or more additional labels, such as message, status and reason etc.

Status Updater

Envoy Gateway monitors the status updates of various resources (like GatewayClass, Gateway and HTTPRoute etc.) through Status Updater.

Envoy Gateway collects the following metrics in Status Updater:

NameDescription
status_update_totalTotal number of status update by object kind.
status_update_duration_secondsHow long a status update takes to finish.

Each metric includes kind label to identify the corresponding resources.

xDS Server

Envoy Gateway monitors the cache and xDS connection status in xDS Server.

Envoy Gateway collects the following metrics in xDS Server:

NameDescription
xds_snapshot_create_totalTotal number of xds snapshot cache creates.
xds_snapshot_update_totalTotal number of xds snapshot cache updates by node id.
xds_stream_duration_secondsHow long a xds stream takes to finish.
  • For xDS snapshot cache update and xDS stream connection status, each metric includes nodeID label to identify the connection peer.
  • For xDS stream connection status, each metric also includes streamID label to identify the connection stream, and isDeltaStream label to identify the delta connection stream.

Infrastructure Manager

Envoy Gateway monitors the apply (create or update) and delete operations in Infrastructure Manager.

Envoy Gateway collects the following metrics in Infrastructure Manager:

NameDescription
resource_apply_totalTotal number of applied resources.
resource_apply_duration_secondsHow long in seconds a resource be applied successfully.
resource_delete_totalTotal number of deleted resources.
resource_delete_duration_secondsHow long in seconds a resource be deleted successfully.

Each metric includes the kind label to identify the corresponding resources being applied or deleted by Infrastructure Manager.

Metrics may also include name and namespace label to identify the name and namespace of corresponding Infrastructure Manager.

Wasm

Envoy Gateway monitors the status of Wasm remote fetch cache.

NameDescription
wasm_cache_entriesNumber of Wasm remote fetch cache entries.
wasm_cache_lookup_totalTotal number of Wasm remote fetch cache lookups.
wasm_remote_fetch_totalTotal number of Wasm remote fetches and results.

For metric wasm_cache_lookup_total, we are using hit label (boolean) to indicate whether the Wasm cache has been hit.

3 - Gateway Observability

Envoy Gateway provides observability for the ControlPlane and the underlying EnvoyProxy instances. This task show you how to config gateway control-plane observability, includes metrics.

Prerequisites

Follow the steps from the Quickstart to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Envoy Gateway provides an add-ons Helm Chart, which includes all the needing components for observability. By default, the OpenTelemetry Collector is disabled.

Install the add-ons Helm Chart:

helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version v0.0.0-latest --set opentelemetry-collector.enabled=true -n monitoring --create-namespace

Metrics

The default installation of Envoy Gateway installs a default EnvoyGateway configuration and attaches it using a ConfigMap. In this section, we will update this resource to enable various ways to retrieve metrics from Envoy Gateway.

Retrieve Prometheus Metrics from Envoy Gateway

By default, prometheus metric is enabled. You can directly retrieve metrics from Envoy Gateway:

export ENVOY_POD_NAME=$(kubectl get pod -n envoy-gateway-system --selector=control-plane=envoy-gateway,app.kubernetes.io/instance=eg -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$ENVOY_POD_NAME -n envoy-gateway-system 19001:19001

# check metrics 
curl localhost:19001/metrics

The following is an example to disable prometheus metric for Envoy Gateway.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    telemetry:
      metrics:
        prometheus:
          disable: true
EOF

Save and apply the following resource to your cluster:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    telemetry:
      metrics:
        prometheus:
          disable: true    

After updating the ConfigMap, you will need to wait the configuration kicks in.
You can force the configuration to be reloaded by restarting the envoy-gateway deployment.

kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system

Enable Open Telemetry sink in Envoy Gateway

The following is an example to send metric via Open Telemetry sink to OTEL gRPC Collector.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    telemetry:
      metrics:
        sinks:
          - type: OpenTelemetry
            openTelemetry:
              host: otel-collector.monitoring.svc.cluster.local
              port: 4317
              protocol: grpc
EOF

Save and apply the following resource to your cluster:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    telemetry:
      metrics:
        sinks:
          - type: OpenTelemetry
            openTelemetry:
              host: otel-collector.monitoring.svc.cluster.local
              port: 4317
              protocol: grpc    

After updating the ConfigMap, you will need to wait the configuration kicks in.
You can force the configuration to be reloaded by restarting the envoy-gateway deployment.

kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system

Verify OTel-Collector metrics:

export OTEL_POD_NAME=$(kubectl get pod -n monitoring --selector=app.kubernetes.io/name=opentelemetry-collector -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$OTEL_POD_NAME -n monitoring 19001:19001

# check metrics 
curl localhost:19001/metrics

4 - Proxy Access Logs

Envoy Gateway provides observability for the ControlPlane and the underlying EnvoyProxy instances. This task show you how to config proxy access logs.

Prerequisites

Follow the steps from the Quickstart to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Envoy Gateway provides an add-ons Helm Chart, which includes all the needing components for observability. By default, the OpenTelemetry Collector is disabled.

Install the add-ons Helm Chart:

helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version v0.0.0-latest --set opentelemetry-collector.enabled=true -n monitoring --create-namespace

By default, the Service type of loki is ClusterIP, you can change it to LoadBalancer type for further usage:

kubectl patch service loki -n monitoring -p '{"spec": {"type": "LoadBalancer"}}'

Expose endpoints:

LOKI_IP=$(kubectl get svc loki -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Default Access Log

If custom format string is not specified, Envoy Gateway uses the following default format:

{
  "start_time": "%START_TIME%",
  "method": "%REQ(:METHOD)%",
  "x-envoy-origin-path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
  "protocol": "%PROTOCOL%",
  "response_code": "%RESPONSE_CODE%",
  "response_flags": "%RESPONSE_FLAGS%",
  "response_code_details": "%RESPONSE_CODE_DETAILS%",
  "connection_termination_details": "%CONNECTION_TERMINATION_DETAILS%",
  "upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
  "bytes_received": "%BYTES_RECEIVED%",
  "bytes_sent": "%BYTES_SENT%",
  "duration": "%DURATION%",
  "x-envoy-upstream-service-time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
  "x-forwarded-for": "%REQ(X-FORWARDED-FOR)%",
  "user-agent": "%REQ(USER-AGENT)%",
  "x-request-id": "%REQ(X-REQUEST-ID)%",
  ":authority": "%REQ(:AUTHORITY)%",
  "upstream_host": "%UPSTREAM_HOST%",
  "upstream_cluster": "%UPSTREAM_CLUSTER%",
  "upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
  "downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
  "downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
  "requested_server_name": "%REQUESTED_SERVER_NAME%",
  "route_name": "%ROUTE_NAME%"
}

Note: Envoy Gateway disable envoy headers by default, you can enable it by setting EnableEnvoyHeaders to true in the ClientTrafficPolicy CRD.

Verify logs from loki:

curl -s "http://$LOKI_IP:3100/loki/api/v1/query_range" --data-urlencode "query={job=\"fluentbit\"}" | jq '.data.result[0].values'

Disable Access Log

If you want to disable it, set the telemetry.accesslog.disable to true in the EnvoyProxy CRD.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: disable-accesslog
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: disable-accesslog
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      disable: true
EOF

OpenTelemetry Sink

Envoy Gateway can send logs to OpenTelemetry Sink.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: otel-access-logging
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"
EOF

Verify logs from loki:

curl -s "http://$LOKI_IP:3100/loki/api/v1/query_range" --data-urlencode "query={exporter=\"OTLP\"}" | jq '.data.result[0].values'

gGRPC Access Log Service(ALS) Sink

Envoy Gateway can send logs to a backend implemented gRPC access log service proto. There’s an example service here, which simply count the log and export to prometheus endpoint.

kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/main/examples/kubernetes/envoy-als.yaml -n monitoring

The following configuration sends logs to the gRPC access log service:

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: als
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: als
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - sinks:
            - type: ALS
              als:
                backendRefs:
                  - name: envoy-als
                    namespace: monitoring
                    port: 8080
                type: HTTP
EOF

Verify logs from envoy-als:

curl -s "http://$(kubectl get svc envoy-als -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}'):19001/metrics" | grep log_count

CEL Expressions

Envoy Gateway provides CEL expressions to filter access log .

For example, you can use the expression 'x-envoy-logged' in request.headers to filter logs that contain the x-envoy-logged header.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: otel-access-logging
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          matches:
            - "'x-envoy-logged' in request.headers"
          sinks:
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"
EOF

Verify logs from loki:

curl -s "http://$LOKI_IP:3100/loki/api/v1/query_range" --data-urlencode "query={exporter=\"OTLP\"}" | jq '.data.result[0].values'

Additional Metadata

Envoy Gateway provides additional metadata about the K8s resources that were translated to certain envoy resources. For example, details about the HTTPRoute and GRPCRoute (kind, group, name, namespace and annotations) are available for access log formatter using the METADATA operator. To enrich logs, users can add log operator such as: %METADATA(ROUTE:envoy-gateway:resources)% to their access log format.

Access Log Types

By default, Access Log settings would apply to:

  • All Routes
  • If traffic is not matched by any Route known to Envoy, the Listener would emit the access log instead

Users may wish to customize this behavior:

  • Emit Access Logs by all Listeners for all traffic with specific settings
  • Do not emit Route-oriented access logs when a route is not matched.

To achieve this, users can select if Access Log settings follow the default behavior or apply specifically to Routes or Listeners by specifying the setting’s type.

Note: When users define their own Access Log settings (with or without a type), the default Envoy Gateway file access log is no longer configured. It can be re-enabled explicitly by adding empty settings for the desired components.

In the following example:

  • Route Access logs would use the default Envoy Gateway format and sink
  • Listener Access logs are customized to report transport-level failures and connection attributes
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: otel-access-logging
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - type: Route # re-enable default access log for route
        - type: Listener # configure specific access log for listeners
          format:
            type: Text
            text: |
              [%START_TIME%] %DOWNSTREAM_REMOTE_ADDRESS% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DOWNSTREAM_TRANSPORT_FAILURE_REASON%
          sinks:
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"
EOF

5 - Proxy Metrics

Envoy Gateway provides observability for the ControlPlane and the underlying EnvoyProxy instances. This task show you how to config proxy metrics.

Prerequisites

Follow the steps from the Quickstart to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Envoy Gateway provides an add-ons Helm Chart, which includes all the needing components for observability. By default, the OpenTelemetry Collector is disabled.

Install the add-ons Helm Chart:

helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version v0.0.0-latest --set opentelemetry-collector.enabled=true -n monitoring --create-namespace

Metrics

By default, Envoy Gateway expose metrics with prometheus endpoint.

Verify metrics:

export ENVOY_POD_NAME=$(kubectl get pod -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$ENVOY_POD_NAME -n envoy-gateway-system 19001:19001

# check metrics 
curl localhost:19001/stats/prometheus  | grep "default/backend/rule/0"

You can disable metrics by setting the telemetry.metrics.prometheus.disable to true in the EnvoyProxy CRD.

kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/metric/disable-prometheus.yaml

Envoy Gateway can send metrics to OpenTelemetry Sink. Send metrics to OTel-Collector:

kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/metric/otel-sink.yaml

Verify OTel-Collector metrics:

export OTEL_POD_NAME=$(kubectl get pod -n monitoring --selector=app.kubernetes.io/name=opentelemetry-collector -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$OTEL_POD_NAME -n monitoring 19001:19001

# check metrics 
curl localhost:19001/metrics  | grep "default/backend/rule/0"

6 - Proxy Tracing

Envoy Gateway provides observability for the ControlPlane and the underlying EnvoyProxy instances. This task show you how to config proxy tracing.

Prerequisites

Follow the steps from the Quickstart to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Envoy Gateway provides an add-ons Helm Chart, which includes all the needing components for observability. By default, the OpenTelemetry Collector is disabled.

Install the add-ons Helm Chart:

helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version v0.0.0-latest --set opentelemetry-collector.enabled=true -n monitoring --create-namespace

Expose Tempo endpoints:

TEMPO_IP=$(kubectl get svc tempo -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Traces

By default, Envoy Gateway doesn’t send traces to any sink. You can enable traces by setting the telemetry.tracing in the EnvoyProxy CRD. Currently, Envoy Gateway support OpenTelemetry, Zipkin and Datadog tracer.

Tracing Provider

The following configurations show how to apply proxy with different providers:

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: otel
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # sample 100% of requests
      samplingRate: 100
      provider:
        backendRefs:
        - name: otel-collector
          namespace: monitoring
          port: 4317
        type: OpenTelemetry
      customTags:
        # This is an example of using a literal as a tag value
        provider:
          type: Literal
          literal:
            value: "otel"
        "k8s.pod.name":
          type: Environment
          environment:
            name: ENVOY_POD_NAME
            defaultValue: "-"
        "k8s.namespace.name":
          type: Environment
          environment:
            name: ENVOY_GATEWAY_NAMESPACE
            defaultValue: "envoy-gateway-system"
        # This is an example of using a header value as a tag value
        header1:
          type: RequestHeader
          requestHeader:
            name: X-Header-1
            defaultValue: "-"
EOF

Verify OpenTelemetry traces from tempo:

curl -s "http://$TEMPO_IP:3100/api/search?tags=component%3Dproxy+provider%3Dotel" | jq .traces
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: zipkin
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: zipkin
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # sample 100% of requests
      samplingRate: 100
      provider:
        backendRefs:
        - name: otel-collector
          namespace: monitoring
          port: 9411
        type: Zipkin
        zipkin:
          enable128BitTraceId: true
      customTags:
        # This is an example of using a literal as a tag value
        provider:
          type: Literal
          literal:
            value: "zipkin"
        "k8s.pod.name":
          type: Environment
          environment:
            name: ENVOY_POD_NAME
            defaultValue: "-"
        "k8s.namespace.name":
          type: Environment
          environment:
            name: ENVOY_GATEWAY_NAMESPACE
            defaultValue: "envoy-gateway-system"
        # This is an example of using a header value as a tag value
        header1:
          type: RequestHeader
          requestHeader:
            name: X-Header-1
            defaultValue: "-"
EOF

Verify zipkin traces from tempo:

curl -s "http://$TEMPO_IP:3100/api/search?tags=component%3Dproxy+provider%3Dzipkin" | jq .traces
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: datadog
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: datadog
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # sample 100% of requests
      samplingRate: 100
      provider:
        backendRefs:
        - name: datadog-agent
          namespace: monitoring
          port: 8126
        type: Datadog
      customTags:
        # This is an example of using a literal as a tag value
        provider:
          type: Literal
          literal:
            value: "datadog"
        "k8s.pod.name":
          type: Environment
          environment:
            name: ENVOY_POD_NAME
            defaultValue: "-"
        "k8s.namespace.name":
          type: Environment
          environment:
            name: ENVOY_GATEWAY_NAMESPACE
            defaultValue: "envoy-gateway-system"
        # This is an example of using a header value as a tag value
        header1:
          type: RequestHeader
          requestHeader:
            name: X-Header-1
            defaultValue: "-"
EOF

Verify Datadog traces in Datadog APM

Query trace by trace id:

curl -s "http://$TEMPO_IP:3100/api/traces/<trace_id>" | jq

Sampling Rate

Envoy Gateway use 100% sample rate, which means all requests will be traced. This may cause performance issues when traffic is very high, you can adjust the sample rate by setting the telemetry.tracing.samplingRate in the EnvoyProxy CRD.

The following configurations show how to apply proxy with 1% sample rates:

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: otel
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # sample 1% of requests
      samplingRate: 1
      provider:
        backendRefs:
        - name: otel-collector
          namespace: monitoring
          port: 4317
        type: OpenTelemetry
      customTags:
        # This is an example of using a literal as a tag value
        provider:
          type: Literal
          literal:
            value: "otel"
        "k8s.pod.name":
          type: Environment
          environment:
            name: ENVOY_POD_NAME
            defaultValue: "-"
        "k8s.namespace.name":
          type: Environment
          environment:
            name: ENVOY_GATEWAY_NAMESPACE
            defaultValue: "envoy-gateway-system"
        # This is an example of using a header value as a tag value
        header1:
          type: RequestHeader
          requestHeader:
            name: X-Header-1
            defaultValue: "-"
EOF

7 - RateLimit Observability

Envoy Gateway provides observability for the RateLimit instances. This guide show you how to config RateLimit observability, includes traces.

Prerequisites

Follow the steps from the Quickstart to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Envoy Gateway provides an add-ons Helm Chart, which includes all the needing components for observability. By default, the OpenTelemetry Collector is disabled.

Install the add-ons Helm Chart:

helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version v0.0.0-latest --set opentelemetry-collector.enabled=true -n monitoring --create-namespace

Follow the steps from the Global Rate Limit to install RateLimit.

Traces

By default, the Envoy Gateway does not configure RateLimit to send traces to the OpenTelemetry Sink. You can configure the collector in the rateLimit.telemetry.tracing of the EnvoyGatewayCRD.

RateLimit uses the OpenTelemetry Exporter to export traces to the collector. You can configure a collector that supports the OTLP protocol, which includes but is not limited to: OpenTelemetry Collector, Jaeger, Zipkin, and so on.

Note:

  • By default, the Envoy Gateway configures a 100% sampling rate for RateLimit, which may lead to performance issues.

Assuming the OpenTelemetry Collector is running in the observability namespace, and it has a service named otel-svc, we only want to sample 50% of the trace data. We would configure it as follows:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    rateLimit:
      backend:
        type: Redis
        redis:
          url: redis-service.default.svc.cluster.local:6379
      telemetry:
        tracing:
          sampleRate: 50
          provider:
            url: otel-svc.observability.svc.cluster.local:4318
EOF

Save and apply the following resource to your cluster:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-gateway-config
  namespace: envoy-gateway-system
data:
  envoy-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    provider:
      type: Kubernetes
    gateway:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    rateLimit:
      backend:
        type: Redis
        redis:
          url: redis-service.default.svc.cluster.local:6379
      telemetry:
        tracing:
          sampleRate: 50
          provider:
            url: otel-svc.observability.svc.cluster.local:4318    

After updating the ConfigMap, you will need to wait the configuration kicks in.
You can force the configuration to be reloaded by restarting the envoy-gateway deployment.

kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system

8 - Visualising metrics using Grafana

Envoy Gateway provides support for exposing Envoy Gateway and Envoy Proxy metrics to a Prometheus instance. This task shows you how to visualise the metrics exposed to Prometheus using Grafana.

Prerequisites

Follow the steps from the Quickstart to install Envoy Gateway and the example manifest. Before proceeding, you should be able to query the example backend using HTTP.

Envoy Gateway provides an add-ons Helm Chart, which includes all the needing components for observability. By default, the OpenTelemetry Collector is disabled.

Install the add-ons Helm Chart:

helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version v0.0.0-latest --set opentelemetry-collector.enabled=true -n monitoring --create-namespace

Follow the steps from the Gateway Observability and Proxy Metrics to enable Prometheus metrics for both Envoy Gateway (Control Plane) and Envoy Proxy (Data Plane).

Expose endpoints:

GRAFANA_IP=$(kubectl get svc grafana -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Connecting Grafana with Prometheus datasource

To visualise metrics from Prometheus, we have to connect Grafana with Prometheus. If you installed Grafana follow the command from prerequisites sections, the Prometheus datasource should be already configured.

You can also add the datasource manually by following the instructions from Grafana Docs.

Accessing Grafana

You can access the Grafana instance by visiting http://{GRAFANA_IP}, derived in prerequisites.

To log in to Grafana, use the credentials admin:admin.

Envoy Gateway has examples of dashboard for you to get started, you can check them out under Dashboards/envoy-gateway.

If you’d like import Grafana dashboards on your own, please refer to Grafana docs for importing dashboards.

Envoy Proxy Global

This dashboard example shows the overall downstream and upstream stats for each Envoy Proxy instance.

Envoy Proxy Global

Envoy Clusters

This dashboard example shows the overall stats for each cluster from Envoy Proxy fleet.

Envoy Clusters

Envoy Gateway Global

This dashboard example shows the overall stats exported by Envoy Gateway fleet.

Envoy Gateway Global: Watching Components

Envoy Gateway Global: Status Updater

Envoy Gateway Global: xDS Server

Envoy Gateway Global: Infrastructure Manager

Resources Monitor

This dashboard example shows the overall resources stats for both Envoy Gateway and Envoy Proxy fleet.

Envoy Gateway Resources

Update Dashboards

All dashboards of Envoy Gateway are maintained under charts/gateway-addons-helm/dashboards, feel free to make contributions.

Grafonnet

Newer dashboards are generated with Jsonnet with the Grafonnet. This is the preferred method for any new dashboards.

You can run make helm-generate.gateway-addons-helm to generate new version of dashboards. All the generated dashboards have a .gen.json suffix.

Legacy Dashboards

Many of our older dashboards are manually created in the UI and exported as JSON and checked in.

These example dashboards cannot be updated in-place by default, if you are trying to make some changes to the older dashboards, you can save them directly as a JSON file and then re-import.