This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Design

This section includes Designs of Envoy Gateway.

1: Goals
2: System Design
3: Watching Components Design
4: Gateway API Translator Design
5: Control Plane Observability: Metrics
6: Backend
7: BackendTrafficPolicy
8: Bootstrap Design
9: ClientTrafficPolicy
10: Configuration API Design
11: Data Plane Observability: Accesslog
12: Data Plane Observability: Metrics
13: Data Plane Observability: Tracing
14: Debug support in Envoy Gateway
15: egctl Design
16: Envoy Gateway Extensions Design
17: EnvoyExtensionPolicy
18: EnvoyPatchPolicy
19: Fuzzing
20: Metadata in XDS resources
21: Rate Limit Design
22: Running Envoy Gateway locally
23: SecurityPolicy
24: TCP and UDP Proxy Design
25: Wasm OCI Image Support

1 - Goals

The high-level goal of the Envoy Gateway project is to attract more users to Envoy by lowering barriers to adoption through expressive, extensible, role-oriented APIs that support a multitude of ingress and L7/L4 traffic routing use cases; and provide a common foundation for vendors to build value-added products without having to re-engineer fundamental interactions.

Objectives

Expressive API

The Envoy Gateway project will expose a simple and expressive API, with defaults set for many capabilities.

The API will be the Kubernetes-native Gateway API, plus Envoy-specific extensions and extension points. This expressive and familiar API will make Envoy accessible to more users, especially application developers, and make Envoy a stronger option for “getting started” as compared to other proxies. Application developers will use the API out of the box without needing to understand in-depth concepts of Envoy Proxy or use OSS wrappers. The API will use familiar nouns that users understand.

The core full-featured Envoy xDS APIs will remain available for those who need more capability and for those who add functionality on top of Envoy Gateway, such as commercial API gateway products.

This expressive API will not be implemented by Envoy Proxy, but rather an officially supported translation layer on top.

Batteries included

Envoy Gateway will simplify how Envoy is deployed and managed, allowing application developers to focus on delivering core business value.

The project plans to include additional infrastructure components required by users to fulfill their Ingress and API gateway needs: It will handle Envoy infrastructure provisioning (e.g. Kubernetes Service, Deployment, et cetera), and possibly infrastructure provisioning of related sidecar services. It will include sensible defaults with the ability to override. It will include channels for improving ops by exposing status through API conditions and Kubernetes status sub-resources.

Making an application accessible needs to be a trivial task for any developer. Similarly, infrastructure administrators will enjoy a simplified management model that doesn’t require extensive knowledge of the solution’s architecture to operate.

All environments

Envoy Gateway will support running natively in Kubernetes environments as well as non-Kubernetes deployments.

Initially, Kubernetes will receive the most focus, with the aim of having Envoy Gateway become the de facto standard for Kubernetes ingress supporting the Gateway API. Additional goals include multi-cluster support and various runtime environments.

Extensibility

Vendors will have the ability to provide value-added products built on the Envoy Gateway foundation.

It will remain easy for end-users to leverage common Envoy Proxy extension points such as providing an implementation for authentication methods and rate-limiting. For advanced use cases, users will have the ability to use the full power of xDS.

Since a general-purpose API cannot address all use cases, Envoy Gateway will provide additional extension points for flexibility. As such, Envoy Gateway will form the base of vendor-provided managed control plane solutions, allowing vendors to shift to a higher management plane layer.

Non-objectives

Cannibalize vendor models

Vendors need to have the ability to drive commercial value, so the goal is not to cannibalize any existing vendor monetization model, though some vendors may be affected by it.

Disrupt current Envoy usage patterns

Envoy Gateway is purely an additive convenience layer and is not meant to disrupt any usage pattern of any user with Envoy Proxy, xDS, or go-control-plane.

Personas

In order of priority

1. Application developer

The application developer spends the majority of their time developing business logic code. They require the ability to manage access to their application.

2. Infrastructure administrators

The infrastructure administrators are responsible for the installation, maintenance, and operation of API gateways appliances in infrastructure, such as CRDs, roles, service accounts, certificates, etc. Infrastructure administrators support the needs of application developers by managing instances of Envoy Gateway.

2 - System Design

Goals

Define the system components needed to satisfy the requirements of Envoy Gateway.

Non-Goals

Create a detailed design and interface specification for each system component.

Terminology

Control Plane- A collection of inter-related software components for providing application gateway and routing functionality. The control plane is implemented by Envoy Gateway and provides services for managing the data plane. These services are detailed in the components section.
Data Plane- Provides intelligent application-level traffic routing and is implemented as one or more Envoy proxies.

Architecture

Configuration

Envoy Gateway is configured statically at startup and the managed data plane is configured dynamically through Kubernetes resources, primarily Gateway API objects.

Static Configuration

Static configuration is used to configure Envoy Gateway at startup, i.e. change the GatewayClass controllerName, configure a Provider, etc. Currently, Envoy Gateway only supports configuration through a configuration file. If the configuration file is not provided, Envoy Gateway starts-up with default configuration parameters.

Dynamic Configuration

Dynamic configuration is based on the concept of a declaring the desired state of the data plane and using reconciliation loops to drive the actual state toward the desired state. The desired state of the data plane is defined as Kubernetes resources that provide the following services:

Infrastructure Management- Manage the data plane infrastructure, i.e. deploy, upgrade, etc. This configuration is expressed through GatewayClass and Gateway resources. The EnvoyProxy Custom Resource can be referenced by gatewayclass.spec.parametersRef to modify data plane infrastructure default parameters, e.g. expose Envoy network endpoints using a ClusterIP service instead of a LoadBalancer service.
Traffic Routing- Define how to handle application-level requests to backend services. For example, route all HTTP requests for “www.example.com” to a backend service running a web server. This configuration is expressed through HTTPRoute and TLSRoute resources that match, filter, and route traffic to a backend. Although a backend can be any valid Kubernetes Group/Kind resource, Envoy Gateway only supports a Service reference.

Components

Envoy Gateway is made up of several components that communicate in-process; how this communication happens is described in the Watching Components Design.

Provider

A Provider is an infrastructure component that Envoy Gateway calls to establish its runtime configuration, resolve services, persist data, etc. As of v0.2, Kubernetes is the only implemented provider. A file provider is on the roadmap via Issue #37. Other providers can be added in the future as Envoy Gateway use cases are better understood. A provider is configured at start up through Envoy Gateway’s static configuration.

Kubernetes Provider

Uses Kubernetes-style controllers to reconcile Kubernetes resources that comprise the dynamic configuration.
Manages the data plane through Kubernetes API CRUD operations.
Uses Kubernetes for Service discovery.
Uses etcd (via Kubernetes API) to persist data.

File Provider

Uses a file watcher to watch files in a directory that define the data plane configuration.
Manages the data plane by calling internal APIs, e.g. CreateDataPlane().
Uses the host’s DNS for Service discovery.
If needed, the local filesystem is used to persist data.

Resource Watcher

The Resource Watcher watches resources used to establish and maintain Envoy Gateway’s dynamic configuration. The mechanics for watching resources is provider-specific, e.g. informers, caches, etc. are used for the Kubernetes provider. The Resource Watcher uses the configured provider for input and provides resources to the Resource Translator as output.

Resource Translator

The Resource Translator translates external resources, e.g. GatewayClass, from the Resource Watcher to the Intermediate Representation (IR). It is responsible for:

Translating infrastructure-specific resources/fields from the Resource Watcher to the Infra IR.
Translating proxy configuration resources/fields from the Resource Watcher to the xDS IR.

Note: The Resource Translator is implemented as the Translator API type in the gatewayapi package.

Intermediate Representation (IR)

The Intermediate Representation defines internal data models that external resources are translated into. This allows Envoy Gateway to be decoupled from the external resources used for dynamic configuration. The IR consists of an Infra IR used as input for the Infra Manager and an xDS IR used as input for the xDS Translator.

Infra IR- Used as the internal definition of the managed data plane infrastructure.
xDS IR- Used as the internal definition of the managed data plane xDS configuration.

xDS Translator

The xDS Translator translates the xDS IR into xDS Resources that are consumed by the xDS server.

xDS Server

The xDS Server is a xDS gRPC Server based on Go Control Plane. Go Control Plane implements the Delta xDS Server Protocol and is responsible for using xDS to configure the data plane.

Infra Manager

The Infra Manager is a provider-specific component responsible for managing the following infrastructure:

Data Plane - Manages all the infrastructure required to run the managed Envoy proxies. For example, CRUD Deployment, Service, etc. resources to run Envoy in a Kubernetes cluster.
Auxiliary Control Planes - Optional infrastructure needed to implement application Gateway features that require external integrations with the managed Envoy proxies. For example, Global Rate Limiting requires provisioning and configuring the Envoy Rate Limit Service and the Rate Limit filter. Such features are exposed to users through the Custom Route Filters extension.

The Infra Manager consumes the Infra IR as input to manage the data plane infrastructure.

Design Decisions

Envoy Gateway can consume multiple GatewayClass by comparing its configured controller name with spec.controllerName of a GatewayClass. gatewayclass.spec.parametersRef refers to the EnvoyProxy custom resource for configuring the managed proxy infrastructure. If unspecified, default configuration parameters are used for the managed proxy infrastructure.
Envoy Gateway manages Gateways that reference its GatewayClass.
- A Gateway resource causes Envoy Gateway to provision managed Envoy proxy infrastructure.
- Envoy Gateway groups Listeners by Port and collapses each group of Listeners into a single Listener if the Listeners in the group are compatible. Envoy Gateway considers Listeners to be compatible if all the following conditions are met:
  - Either each Listener within the group specifies the “HTTP” Protocol or each Listener within the group specifies either the “HTTPS” or “TLS” Protocol.
  - Each Listener within the group specifies a unique “Hostname”.
  - As a special case, one Listener within a group may omit “Hostname”, in which case this Listener matches when no other Listener matches.
- Envoy Gateway does not merge listeners across multiple Gateways.
Envoy Gateway follows Gateway API guidelines to resolve any conflicts.
- A Gateway listener corresponds to an Envoy proxy Listener.
An HTTPRoute resource corresponds to an Envoy proxy Route.
- Each backendRef corresponds to an Envoy proxy Cluster.
The goal is to make Envoy Gateway components extensible in the future. See the roadmap for additional details.

The draft for this document is here.

3 - Watching Components Design

Envoy Gateway is made up of several components that communicate in-process. Some of them (namely Providers) watch external resources, and “publish” what they see for other components to consume; others watch what another publishes and act on it (such as the resource translator watches what the providers publish, and then publishes its own results that are watched by another component). Some of these internally published results are consumed by multiple components.

To facilitate this communication use the watchable library. The watchable.Map type is very similar to the standard library’s sync.Map type, but supports a .Subscribe (and .SubscribeSubset) method that promotes a pub/sub pattern.

Pub

Many of the things we communicate around are naturally named, either by a bare “name” string or by a “name”/“namespace” tuple. And because watchable.Map is typed, it makes sense to have one map for each type of thing (very similar to if we were using native Go maps). For example, a struct that might be written to by the Kubernetes provider, and read by the IR translator:

type ResourceTable struct {
    // gateway classes are cluster-scoped; no namespace
    GatewayClasses watchable.Map[string, *gwapiv1.GatewayClass]

    // gateways are namespace-scoped, so use a k8s.io/apimachinery/pkg/types.NamespacedName as the map key.
    Gateways watchable.Map[types.NamespacedName, *gwapiv1.Gateway]

    HTTPRoutes watchable.Map[types.NamespacedName, *gwapiv1.HTTPRoute]
}

The Kubernetes provider updates the table by calling table.Thing.Store(name, val) and table.Thing.Delete(name); updating a map key with a value that is deep-equal (usually reflect.DeepEqual, but you can implement your own .Equal method) the current value is a no-op; it won’t trigger an event for subscribers. This is handy so that the publisher doesn’t have as much state to keep track of; it doesn’t need to know “did I already publish this thing”, it can just .Store its data and watchable will do the right thing.

Sub

Meanwhile, the translator and other interested components subscribe to it with table.Thing.Subscribe (or table.Thing.SubscribeSubset if they only care about a few “Thing"s). So the translator goroutine might look like:

func(ctx context.Context) error {
    for snapshot := range k8sTable.HTTPRoutes.Subscribe(ctx) {
        fullState := irInput{
           GatewayClasses: k8sTable.GatewayClasses.LoadAll(),
           Gateways:       k8sTable.Gateways.LoadAll(),
           HTTPRoutes:     snapshot.State,
        }
        translate(irInput)
    }
}

Or, to watch multiple maps in the same loop:

func worker(ctx context.Context) error {
    classCh := k8sTable.GatewayClasses.Subscribe(ctx)
    gwCh := k8sTable.Gateways.Subscribe(ctx)
    routeCh := k8sTable.HTTPRoutes.Subscribe(ctx)
    for ctx.Err() == nil {
        var arg irInput
        select {
        case snapshot := <-classCh:
            arg.GatewayClasses = snapshot.State
        case snapshot := <-gwCh:
            arg.Gateways = snapshot.State
        case snapshot := <-routeCh:
            arg.Routes = snapshot.State
        }
        if arg.GateWayClasses == nil {
            arg.GatewayClasses = k8sTable.GateWayClasses.LoadAll()
        }
        if arg.GateWays == nil {
            arg.Gateways = k8sTable.GateWays.LoadAll()
        }
        if arg.HTTPRoutes == nil {
            arg.HTTPRoutes = k8sTable.HTTPRoutes.LoadAll()
        }
        translate(irInput)
    }
}

From the updates it gets from .Subscribe, it can get a full view of the map being subscribed to via snapshot.State; but it must read the other maps explicitly. Like sync.Map, watchable.Maps are thread-safe; while .Subscribe is a handy way to know when to run, .Load and friends can be used without subscribing.

There can be any number of subscribers. For that matter, there can be any number of publishers .Storeing things, but it’s probably wise to just have one publisher for each map.

The channel returned from .Subscribe is immediately readable with a snapshot of the map as it existed when .Subscribe was called; and becomes readable again whenever .Store or .Delete mutates the map. If multiple mutations happen between reads (or if mutations happen between .Subscribe and the first read), they are coalesced in to one snapshot to be read; the snapshot.State is the most-recent full state, and snapshot.Updates is a listing of each of the mutations that cause this snapshot to be different than the last-read one. This way subscribers don’t need to worry about a backlog accumulating if they can’t keep up with the rate of changes from the publisher.

If the map contains anything before .Subscribe is called, that very first read won’t include snapshot.Updates entries for those pre-existing items; if you are working with snapshot.Update instead of snapshot.State, then you must add special handling for your first read. We have a utility function ./internal/message.HandleSubscription to help with this.

Other Notes

The common pattern will likely be that the entrypoint that launches the goroutines for each component instantiates the map, and passes them to the appropriate publishers and subscribers; same as if they were communicating via a dumb chan.

A limitation of watchable.Map is that in order to ensure safety between goroutines, it does require that value types be deep-copiable; either by having a DeepCopy method, being a proto.Message, or by containing no reference types and so can be deep-copied by naive assignment. Fortunately, we’re using controller-gen anyway, and controller-gen can generate DeepCopy methods for us: just stick a // +k8s:deepcopy-gen=true on the types that you want it to generate methods for.

4 - Gateway API Translator Design

The Gateway API translates external resources, e.g. GatewayClass, from the configured Provider to the Intermediate Representation (IR).

Assumptions

Initially target core conformance features only, to be followed by extended conformance features.

Inputs and Outputs

The main inputs to the Gateway API translator are:

GatewayClass, Gateway, HTTPRoute, TLSRoute, Service, ReferenceGrant, Namespace, and Secret resources.

Note: ReferenceGrant is not fully implemented as of v0.2.

The outputs of the Gateway API translator are:

Xds and Infra Internal Representations (IRs).
Status updates for GatewayClass, Gateways, HTTPRoutes

Listener Compatibility

Envoy Gateway follows Gateway API listener compatibility spec:

Each listener in a Gateway must have a unique combination of Hostname, Port, and Protocol. An implementation MAY group Listeners by Port and then collapse each group of Listeners into a single Listener if the implementation determines that the Listeners in the group are “compatible”.

Note: Envoy Gateway does not collapse listeners across multiple Gateways.

Listener Compatibility Examples

Example 1: Gateway with compatible Listeners (same port & protocol, different hostnames)

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: "*.envoygateway.io"
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: whales.envoygateway.io

Example 2: Gateway with compatible Listeners (same port & protocol, one hostname specified, one not)

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: "*.envoygateway.io"
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All

Example 3: Gateway with incompatible Listeners (same port, protocol and hostname)

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: whales.envoygateway.io
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: whales.envoygateway.io

Example 4: Gateway with incompatible Listeners (neither specify a hostname)

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All

Computing Status

Gateway API specifies a rich set of status fields & conditions for each resource. To achieve conformance, Envoy Gateway must compute the appropriate status fields and conditions for managed resources.

Status is computed and set for:

The managed GatewayClass (gatewayclass.status.conditions).
Each managed Gateway, based on its Listeners’ status (gateway.status.conditions). For the Kubernetes provider, the Envoy Deployment and Service status are also included to calculate Gateway status.
Listeners for each Gateway (gateway.status.listeners).
The ParentRef for each Route (route.status.parents).

The Gateway API translator is responsible for calculating status conditions while translating Gateway API resources to the IR and publishing status over the message bus. The Status Manager subscribes to these status messages and updates the resource status using the configured provider. For example, the Status Manager uses a Kubernetes client to update resource status on the Kubernetes API server.

Outline

The following roughly outlines the translation process. Each step may produce (1) IR; and (2) status updates on Gateway API resources.

Process Gateway Listeners
- Validate unique hostnames, ports, and protocols.
- Validate and compute supported kinds.
- Validate allowed namespaces (validate selector if specified).
- Validate TLS fields if specified, including resolving referenced Secrets.
Process HTTPRoutes
- foreach route rule:
  - compute matches
    - [core] path exact, path prefix
    - [core] header exact
    - [extended] query param exact
    - [extended] HTTP method
  - compute filters
    - [core] request header modifier (set/add/remove)
    - [core] request redirect (hostname, statuscode)
    - [extended] request mirror
  - compute backends
    - [core] Kubernetes services
- foreach route parent ref:
  - get matching listeners (check Gateway, section name, listener validation status, listener allowed routes, hostname intersection)
  - foreach matching listener:
    - foreach hostname intersection with route:
      - add each computed route rule to host

Context Structs

To help store, access and manipulate information as it’s processed during the translation process, a set of context structs are used. These structs wrap a given Gateway API type, and add additional fields and methods to support processing.

GatewayContext wraps a Gateway and provides helper methods for setting conditions, accessing Listeners, etc.

type GatewayContext struct {
	// The managed Gateway
	*v1beta1.Gateway

	// A list of Gateway ListenerContexts.
	listeners []*ListenerContext
}

ListenerContext wraps a Listener and provides helper methods for setting conditions and other status information on the associated Gateway.

type ListenerContext struct {
    // The Gateway listener.
	*v1beta1.Listener

	// The Gateway this Listener belongs to.
	gateway           *v1beta1.Gateway

	// An index used for managing this listener in the list of Gateway listeners.
	listenerStatusIdx int

	// Only Routes in namespaces selected by the selector may be attached
	// to the Gateway this listener belongs to.
	namespaceSelector labels.Selector

	// The TLS Secret for this Listener, if applicable.
	tlsSecret         *v1.Secret
}

RouteContext represents a generic Route object (HTTPRoute, TLSRoute, etc.) that can reference Gateway objects.

type RouteContext interface {
	client.Object

	// GetRouteType returns the Kind of the Route object, HTTPRoute,
	// TLSRoute, TCPRoute, UDPRoute etc.
	GetRouteType() string

	// GetHostnames returns the hosts targeted by the Route object.
	GetHostnames() []string

	// GetParentReferences returns the ParentReference of the Route object.
	GetParentReferences() []v1beta1.ParentReference

	// GetRouteParentContext returns RouteParentContext by using the Route
	// objects' ParentReference.
	GetRouteParentContext(forParentRef v1beta1.ParentReference) *RouteParentContext
}

5 - Control Plane Observability: Metrics

This document aims to cover all aspects of envoy gateway control plane metrics observability.

Note

Data plane observability (while important) is outside of scope for this document. For data plane observability, refer to here.

Current State

At present, the Envoy Gateway control plane provides logs and controller-runtime metrics, without traces. Logs are managed through our proprietary library (internal/logging, a shim to zap) and are written to /dev/stdout.

Goals

Our objectives include:

Supporting PULL mode for Prometheus metrics and exposing these metrics on the admin address.
Supporting PUSH mode for Prometheus metrics, thereby sending metrics to the Open Telemetry Stats sink via gRPC or HTTP.

Non-Goals

Our non-goals include:

Supporting other stats sinks.

Use-Cases

The use-cases include:

Exposing Prometheus metrics in the Envoy Gateway Control Plane.
Pushing Envoy Gateway Control Plane metrics via the Open Telemetry Sink.

Design

Standards

Our metrics, will be built upon the OpenTelemetry standards. All metrics will be configured via the OpenTelemetry SDK, which offers neutral libraries that can be connected to various backends.

This approach allows the Envoy Gateway code to concentrate on the crucial aspect - generating the metrics - and delegate all other tasks to systems designed for telemetry ingestion.

Attributes

OpenTelemetry defines a set of Semantic Conventions, including Kubernetes specific ones.

These attributes can be expressed in logs (as keys of structured logs), traces (as attributes), and metrics (as labels).

We aim to use attributes consistently where applicable. Where possible, these should adhere to codified Semantic Conventions; when not possible, they should maintain consistency across the project.

Extensibility

Envoy Gateway supports both PULL/PUSH mode metrics, with Metrics exported via Prometheus by default.

Additionally, Envoy Gateway can export metrics using both the OTEL gRPC metrics exporter and OTEL HTTP metrics exporter, which pushes metrics by grpc/http to a remote OTEL collector.

Users can extend these in two ways:

Downstream Collection

Based on the exported data, other tools can collect, process, and export telemetry as needed. Some examples include:

Metrics in PULL mode: The OTEL collector can scrape Prometheus and export to X.
Metrics in PUSH mode: The OTEL collector can receive OTEL gRPC/HTTP exporter metrics and export to X.

While the examples above involve OTEL collectors, there are numerous other systems available.

Vendor extensions

The OTEL libraries allow for the registration of Providers/Handlers. While we will offer the default ones (PULL via Prometheus, PUSH via OTEL HTTP metrics exporter) mentioned in Envoy Gateway’s extensibility, we can easily allow custom builds of Envoy Gateway to plug in alternatives if the default options don’t meet their needs.

For instance, users may prefer to write metrics over the OTLP gRPC metrics exporter instead of the HTTP metrics exporter. This is perfectly acceptable – and almost impossible to prevent. The OTEL has ways to register their providers/exporters, and Envoy Gateway can ensure its usage is such that it’s not overly difficult to swap out a different provider/exporter.

Stability

Observability is, in essence, a user-facing API. Its primary purpose is to be consumed - by both humans and tooling. Therefore, having well-defined guarantees around their formats is crucial.

Please note that this refers only to the contents of the telemetry - what we emit, the names of things, semantics, etc. Other settings like Prometheus vs OTLP, JSON vs plaintext, logging levels, etc., are not considered.

I propose the following:

Metrics

Metrics offer the greatest potential for providing guarantees. They often directly influence alerts and dashboards, making changes highly impactful. This contrasts with traces and logs, which are often used for ad-hoc analysis, where minor changes to information can be easily understood by a human.

Moreover, there is precedent for this: Kubernetes Metrics Lifecycle has well-defined processes, and Envoy Gateway’s dataplane (Envoy Proxy) metrics are de facto stable.

Currently, all Envoy Gateway metrics lack defined stability. I suggest we categorize all existing metrics as either:

Deprecated: a metric that is intended to be phased out.
Experimental: a metric that is off by default.
Alpha: a metric that is on by default.

We should aim to promote a core set of metrics to Stable within a few releases.

Envoy Gateway API Types

New APIs will be added to Envoy Gateway config, which are used to manage Control Plane Telemetry bootstrap configs.

EnvoyGatewayTelemetry

// EnvoyGatewayTelemetry defines telemetry configurations for envoy gateway control plane.
// Control plane will focus on metrics observability telemetry and tracing telemetry later.
type EnvoyGatewayTelemetry struct {
	// Metrics defines metrics configuration for envoy gateway.
	Metrics *EnvoyGatewayMetrics `json:"metrics,omitempty"`
}

EnvoyGatewayMetrics

Prometheus will be exposed on 0.0.0.0:19001, which is not supported to be configured yet.

// EnvoyGatewayMetrics defines control plane push/pull metrics configurations.
type EnvoyGatewayMetrics struct {
	// Sinks defines the metric sinks where metrics are sent to.
	Sinks []EnvoyGatewayMetricSink `json:"sinks,omitempty"`
	// Prometheus defines the configuration for prometheus endpoint.
	Prometheus *EnvoyGatewayPrometheusProvider `json:"prometheus,omitempty"`
}

// EnvoyGatewayMetricSink defines control plane
// metric sinks where metrics are sent to.
type EnvoyGatewayMetricSink struct {
	// Type defines the metric sink type.
	// EG control plane currently supports OpenTelemetry.
	// +kubebuilder:validation:Enum=OpenTelemetry
	// +kubebuilder:default=OpenTelemetry
	Type MetricSinkType `json:"type"`
	// OpenTelemetry defines the configuration for OpenTelemetry sink.
	// It's required if the sink type is OpenTelemetry.
	OpenTelemetry *EnvoyGatewayOpenTelemetrySink `json:"openTelemetry,omitempty"`
}

type EnvoyGatewayOpenTelemetrySink struct {
	// Host define the sink service hostname.
	Host string `json:"host"`
	// Protocol define the sink service protocol.
	// +kubebuilder:validation:Enum=grpc;http
	Protocol string `json:"protocol"`
	// Port defines the port the sink service is exposed on.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=4317
	Port int32 `json:"port,omitempty"`
}

// EnvoyGatewayPrometheusProvider will expose prometheus endpoint in pull mode.
type EnvoyGatewayPrometheusProvider struct {
	// Disable defines if disables the prometheus metrics in pull mode.
	//
	Disable bool `json:"disable,omitempty"`
}

Example

The following is an example to disable prometheus metric.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
logging:
  level: null
  default: info
provider:
  type: Kubernetes
telemetry:
  metrics:
    prometheus:
      disable: true

The following is an example to send metric via Open Telemetry sink to OTEL gRPC Collector.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
logging:
  level: null
  default: info
provider:
  type: Kubernetes
telemetry:
  metrics:
    sinks:
      - type: OpenTelemetry
        openTelemetry:
          host: otel-collector.monitoring.svc.cluster.local
          port: 4317
          protocol: grpc

The following is an example to disable prometheus metric and send metric via Open Telemetry sink to OTEL HTTP Collector at the same time.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
logging:
  level: null
  default: info
provider:
  type: Kubernetes
telemetry:
  metrics:
    prometheus:
      disable: false
    sinks:
      - type: OpenTelemetry
        openTelemetry:
          host: otel-collector.monitoring.svc.cluster.local
          port: 4318
          protocol: http

6 - Backend

Overview

This design document introduces the Backend API allowing system administrators to represent backends without the use of a K8s Service resource.

Common use cases for non-Service backends in the K8s and Envoy ecosystem include:

Cluster-external endpoints, which are currently second-class citizens in Gateway-API (supported using Services and FQDN endpoints).
Host-local endpoints, such as sidecars or daemons that listen on unix domain sockets or envoy internal listeners, that cannot be represented by a K8s service at all.

Several projects currently support backends that are not registered in the infrastructure-specific service registry.

K8s Ingress: Resource Backends
Istio: Service Entry
Gloo Edge: Upstream
Consul: External Services

Goals

Add an API definition to hold settings for configuring Unix Domain Socket, FQDN and IP.
Determine which resources may reference the new backend resource.
Determine which existing Gateway-API and Envoy Gateway policies may attach to the new backend resource.

Non Goals

Support specific backend types, such as S3 Bucket, Redis, AMQP, InfluxDB, etc.

Implementation

The Backend resource is an implementation-specific Gateway-API BackendObjectReference Extension.

Example

Here is an example highlighting how a user can configure a route that forwards traffic to both a K8s Service and a Backend that has both unix domain socket and ip endpoints. A BackendTLSPolicy is attached to the backend resource, enabling TLS.

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  ports:
    - name: http
      port: 3000
      targetPort: 3000
  selector:
    app: backend
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: backend-mixed-ip-uds
spec:
  appProtocols: 
    - gateway.envoyproxy.io/h2c
  endpoints:
    - unix:
        path: /var/run/backend.sock   
    - ip:
        address: 10.244.0.28
        port: 3000
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: backend-mixed-ip-uds
          weight: 1
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1          
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: BackendTLSPolicy
metadata:
  name: policy-btls
spec:
  targetRef:
    group: gateway.envoyproxy.io
    kind: Backend
    name: backend-mixed-ip-uds
  tls:
    caCertRefs:
      - name: backend-tls-checks-certificate
        group: ''
        kind: ConfigMap
    hostname: example.com

Design Decisions

All instances of BackendObjectReference in Envoy Gateway MAY support referencing the Backend kind.
For security reasons, Envoy Gateway MUST reject references to a Backend in xRoute resources. For example, UDS and localhost references will not be supported for xRoutes.
All attributes of the Envoy Gateway extended BackendRef resource MUST be implemented for the Backend resource.
A Backend resource referenced by BackendObjectReference will be translated to Envoy Gateway’s IR DestinationSetting. As such, all BackendAdresses are treated as equivalent endpoints with identical weights, TLS settings, etc.
Gateway-API and Envoy Gateway policies that attach to Services (BackendTLSPolicy, BackendLBPolicy) MUST support attachment to the Backend resource in Envoy Gateway.
Policy attachment to a named section of the Backend resource is not supported at this time. Currently, BackendObjectReference can only select ports, and not generic section names. Hence, a named section of Backend cannot be referenced by routes, and so attachment of policies to named sections will create translation ambiguity. Users that wish to attach policies to some of the BackendAddresses in a Backend resource can use multiple Backend resources and pluralized BackendRefs instead.
The Backend API SHOULD support other Gateway-API backend features, such as Backend Protocol Selection. Translation of explicit upstream application protocol setting SHOULD be consistent with the existing implementation for Service resources.
The Backend upstream transport protocol (TCP, UDP) is inferred from the xRoute kind: TCP is inferred for all routes except for UDPRoute which is resolved to UDP.
This API resource MUST be part of same namespace as the targetRef resource. The Backend API MUST be subject to the same cross-namespace reference restriction as referenced Service resources.
The Backend resource translation MUST NOT modify Infrastructure. Any change to infrastructure that is required to achieve connectivity to a backend (mounting a socket, adding a sidecar container, modifying a network policy, …) MUST be implemented with an appropriate infrastructure patch in the EnvoyProxy API.
To limit the overall maintenance effort related to supporting of non-Service backends, the Backend API SHOULD support multiple generic address types (UDS, FQDN, IPv4, IPv6), and MUST NOT support vendor-specific backend types.
Both Backend and Service resources may appear in the same BackendRefs list.
The Optional Port field SHOULD NOT be evaluated when referencing a Backend.
Referenced Backend resources MUST be translated to envoy endpoints, similar to the current Service translation.
Certain combinations of Backend and Service are incompatible. For example, a Unix Domain Socket and a FQDN service require different cluster service discovery types (Static/EDS and Strict-DNS accordingly).
If a Backend that is referenced by a route cannot be translated, the Route resource will have an Accepted=False condition with a UnsupportedValue reason.
This API needs to be explicitly enabled using the EnvoyGateway API

Alternatives

The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
Users can leverage the existing Envoy Patch Policy or Envoy Extension Manager to inject custom envoy clusters and route configuration. However, these features require a high level of envoy expertise, investment and maintenance.

7 - BackendTrafficPolicy

Overview

This design document introduces the BackendTrafficPolicy API allowing users to configure the behavior for how the Envoy Proxy server communicates with upstream backend services/endpoints.

Goals

Add an API definition to hold settings for configuring behavior of the connection between the backend services and Envoy Proxy listener.

Non Goals

Define the API configuration fields in this API.

Implementation

BackendTrafficPolicy is an implied hierarchy type API that can be used to extend Gateway API. It can target either a Gateway, or an xRoute (HTTPRoute/GRPCRoute/etc.). When targeting a Gateway, it will apply the configured settings within ght BackendTrafficPolicy to all children xRoute resources of that Gateway. If a BackendTrafficPolicy targets an xRoute and a different BackendTrafficPolicy targets the Gateway that route belongs to, then the configuration from the policy that is targeting the xRoute resource will win in a conflict.

Example

Here is an example highlighting how a user can configure this API.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ipv4-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.foo.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: ipv4-service
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ipv6-route
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.bar.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: ipv6-service
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: default-ipv-policy
  namespace: default
spec:
  protocols:
    enableIPv6: false
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ipv6-support-policy
  namespace: default
spec:
  protocols:
    enableIPv6: true
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: ipv6-route

Features / API Fields

Here is a list of some features that can be included in this API. Note that this list is not exhaustive.

Protocol configuration
Circuit breaking
Retries
Keep alive probes
Health checking
Load balancing
Rate limit

Design Decisions

This API will only support a single targetRef and can bind to only a Gateway or xRoute (HTTPRoute/GRPCRoute/etc.) resource.
This API resource MUST be part of same namespace as the resource it targets.
There can be only be ONE policy resource attached to a specific Listener (section) within a Gateway
If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
If multiple polices target the same resource, the oldest resource (based on creation timestamp) will attach to the Gateway Listeners, the others will not.
If Policy A has a targetRef that includes a sectionName i.e. it targets a specific Listener within a Gateway and Policy B has a targetRef that targets the same entire Gateway then
- Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName
- Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.

Alternatives

The project can indefintely wait for these configuration parameters to be part of the Gateway API.

8 - Bootstrap Design

Overview

Issue 31 specifies the need for allowing advanced users to specify their custom Envoy Bootstrap configuration rather than using the default Bootstrap configuration defined in Envoy Gateway. This allows advanced users to extend Envoy Gateway and support their custom use cases such setting up tracing and stats configuration that is not supported by Envoy Gateway.

Goals

Define an API field to allow a user to specify a custom Bootstrap
Provide tooling to allow the user to generate the default Bootstrap configuration as well as validate their custom Bootstrap.

Non Goals

Allow user to configure only a section of the Bootstrap

API

Leverage the existing EnvoyProxy resource which can be attached to the GatewayClass using the parametersRef field, and define a Bootstrap field within the resource. If this field is set, the value is used as the Bootstrap configuration for all managed Envoy Proxies created by Envoy Gateway.

// EnvoyProxySpec defines the desired state of EnvoyProxy.
type EnvoyProxySpec struct {
    ......
	// Bootstrap defines the Envoy Bootstrap as a YAML string.
	// Visit https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-bootstrap
	// to learn more about the syntax.
	// If set, this is the Bootstrap configuration used for the managed Envoy Proxy fleet instead of the default Bootstrap configuration
	// set by Envoy Gateway.
	// Some fields within the Bootstrap that are required to communicate with the xDS Server (Envoy Gateway) and receive xDS resources
	// from it are not configurable and will result in the `EnvoyProxy` resource being rejected.
	// Backward compatibility across minor versions is not guaranteed.
	// We strongly recommend using `egctl x translate` to generate a `EnvoyProxy` resource with the `Bootstrap` field set to the default
	// Bootstrap configuration used. You can edit this configuration, and rerun `egctl x translate` to ensure there are no validation errors.
	//
	// +optional
	Bootstrap *string `json:"bootstrap,omitempty"`
}

Tooling

A CLI tool egctl x translate will be provided to the user to help generate a working Bootstrap configuration. Here is an example where a user inputs a GatewayClass and the CLI generates the EnvoyProxy resource with the Bootstrap field populated.

cat <<EOF | egctl x translate --from gateway-api --to gateway-api -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---

EOF

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    name: with-bootstrap-config
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: with-bootstrap-config
spec:
  bootstrap: |
    admin:
      access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /dev/null
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 19000
    dynamic_resources:
      cds_config:
        resource_api_version: V3
        api_config_source:
          api_type: DELTA_GRPC
          transport_api_version: V3
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
          set_node_on_first_message_only: true
      lds_config:
        resource_api_version: V3
        api_config_source:
          api_type: DELTA_GRPC
          transport_api_version: V3
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
          set_node_on_first_message_only: true
    static_resources:
      clusters:
      - connect_timeout: 1s
        load_assignment:
          cluster_name: xds_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-gateway
                    port_value: 18000
        typed_extension_protocol_options:
          "envoy.extensions.upstreams.http.v3.HttpProtocolOptions":
             "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions"
             "explicit_http_config":
               "http2_protocol_options": {}
        name: xds_cluster
        type: STRICT_DNS
        transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
            common_tls_context:
              tls_params:
                tls_maximum_protocol_version: TLSv1_3
              tls_certificate_sds_secret_configs:
              - name: xds_certificate
                sds_config:
                  path_config_source:
                    path: "/sds/xds-certificate.json"
                  resource_api_version: V3
              validation_context_sds_secret_config:
                name: xds_trusted_ca
                sds_config:
                  path_config_source:
                    path: "/sds/xds-trusted-ca.json"
                  resource_api_version: V3
    layered_runtime:
      layers:
        - name: runtime-0
          rtds_layer:
            rtds_config:
              resource_api_version: V3
              api_config_source:
                transport_api_version: V3
                api_type: DELTA_GRPC
                grpc_services:
                  envoy_grpc:
                    cluster_name: xds_cluster
            name: runtime-0

The user can now modify the output, for their use case. Lets say for this example, the user wants to change the admin server port from 19000 to 18000, they can do so by editing the previous output and running egctl x translate again to see if there any validation errors. Validation errors should be surfaced in the Status subresource. The internal validator will ensure that the Bootstrap string can be unmarshalled into the Bootstrap object as well as ensure the user can override certain fields within the Bootstrap configuration such as the address and tls context within the xds_cluster which are essential for xDS communication between Envoy Gateway and Envoy Proxy.

cat <<EOF | egctl x translate --from gateway-api --to gateway-api -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    name: with-bootstrap-config
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: with-bootstrap-config
spec:
  bootstrap: |
    admin:
      access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /dev/null
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 18000
    dynamic_resources:
      cds_config:
        resource_api_version: V3
        api_config_source:
          api_type: DELTA_GRPC
          transport_api_version: V3
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
          set_node_on_first_message_only: true
      lds_config:
        resource_api_version: V3
        api_config_source:
          api_type: DELTA_GRPC
          transport_api_version: V3
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
          set_node_on_first_message_only: true
    static_resources:
      clusters:
      - connect_timeout: 1s
        load_assignment:
          cluster_name: xds_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-gateway
                    port_value: 18000
        typed_extension_protocol_options:
          "envoy.extensions.upstreams.http.v3.HttpProtocolOptions":
             "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions"
             "explicit_http_config":
               "http2_protocol_options": {}
        name: xds_cluster
        type: STRICT_DNS
        transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
            common_tls_context:
              tls_params:
                tls_maximum_protocol_version: TLSv1_3
              tls_certificate_sds_secret_configs:
              - name: xds_certificate
                sds_config:
                  path_config_source:
                    path: "/sds/xds-certificate.json"
                  resource_api_version: V3
              validation_context_sds_secret_config:
                name: xds_trusted_ca
                sds_config:
                  path_config_source:
                    path: "/sds/xds-trusted-ca.json"
                  resource_api_version: V3
    layered_runtime:
      layers:
        - name: runtime-0
          rtds_layer:
            rtds_config:
              resource_api_version: V3
              api_config_source:
                transport_api_version: V3
                api_type: DELTA_GRPC
                grpc_services:
                  envoy_grpc:
                    cluster_name: xds_cluster
            name: runtime-0

EOF

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    name: with-bootstrap-config
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: with-bootstrap-config
spec:
  bootstrap: |
    admin:
      access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /dev/null
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 18000
    dynamic_resources:
      cds_config:
        resource_api_version: V3
        api_config_source:
          api_type: DELTA_GRPC
          transport_api_version: V3
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
          set_node_on_first_message_only: true
      lds_config:
        resource_api_version: V3
        api_config_source:
          api_type: DELTA_GRPC
          transport_api_version: V3
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster
          set_node_on_first_message_only: true
    static_resources:
      clusters:
      - connect_timeout: 1s
        load_assignment:
          cluster_name: xds_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-gateway
                    port_value: 18000
        typed_extension_protocol_options:
          "envoy.extensions.upstreams.http.v3.HttpProtocolOptions":
             "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions"
             "explicit_http_config":
               "http2_protocol_options": {}
        name: xds_cluster
        type: STRICT_DNS
        transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
            common_tls_context:
              tls_params:
                tls_maximum_protocol_version: TLSv1_3
              tls_certificate_sds_secret_configs:
              - name: xds_certificate
                sds_config:
                  path_config_source:
                    path: "/sds/xds-certificate.json"
                  resource_api_version: V3
              validation_context_sds_secret_config:
                name: xds_trusted_ca
                sds_config:
                  path_config_source:
                    path: "/sds/xds-trusted-ca.json"
                  resource_api_version: V3
    layered_runtime:
      layers:
        - name: runtime-0
          rtds_layer:
            rtds_config:
              resource_api_version: V3
              api_config_source:
                transport_api_version: V3
                api_type: DELTA_GRPC
                grpc_services:
                  envoy_grpc:
                    cluster_name: xds_cluster
            name: runtime-0

9 - ClientTrafficPolicy

Overview

This design document introduces the ClientTrafficPolicy API allowing system administrators to configure the behavior for how the Envoy Proxy server behaves with downstream clients.

Goals

Add an API definition to hold settings for configuring behavior of the connection between the downstream client and Envoy Proxy listener.

Non Goals

Define the API configuration fields in this API.

Implementation

ClientTrafficPolicy is a Direct Policy Attachment type API that can be used to extend Gateway API to define configuration that affect the connection between the downstream client and Envoy Proxy listener.

Example

Here is an example highlighting how a user can configure this API.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: enable-proxy-protocol-policy
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default
  enableProxyProtocol: true

Features / API Fields

Here is a list of features that can be included in this API

Downstream ProxyProtocol
Downstream Keep Alives
IP Blocking
Downstream HTTP3

Design Decisions

This API will only support a single targetRef and can bind to only a Gateway resource.
This API resource MUST be part of same namespace as the Gateway resource
There can be only be ONE policy resource attached to a specific Listener (section) within a Gateway
If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
If multiple polices target the same resource, the oldest resource (based on creation timestamp) will attach to the Gateway Listeners, the others will not.
If Policy A has a targetRef that includes a sectionName i.e. it targets a specific Listener within a Gateway and Policy B has a targetRef that targets the same entire Gateway then
- Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName
- Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.

Alternatives

The project can indefintely wait for these configuration parameters to be part of the Gateway API.

10 - Configuration API Design

Motivation

Issue 51 specifies the need to design an API for configuring Envoy Gateway. The control plane is configured statically at startup and the data plane is configured dynamically through Kubernetes resources, primarily Gateway API objects. Refer to the Envoy Gateway design doc for additional details regarding Envoy Gateway terminology and configuration.

Goals

Define an initial API to configure Envoy Gateway at startup.
Define an initial API for configuring the managed data plane, e.g. Envoy proxies.

Non-Goals

Implementation of the configuration APIs.
Define the status subresource of the configuration APIs.
Define a complete set of APIs for configuring Envoy Gateway. As stated in the Goals, this document defines the initial configuration APIs.
Define an API for deploying/provisioning/operating Envoy Gateway. If needed, a future Envoy Gateway operator would be responsible for designing and implementing this type of API.
Specify tooling for managing the API, e.g. generate protos, CRDs, controller RBAC, etc.

Control Plane API

The EnvoyGateway API defines the control plane configuration, e.g. Envoy Gateway. Key points of this API are:

It will define Envoy Gateway’s startup configuration file. If the file does not exist, Envoy Gateway will start up with default configuration parameters.
EnvoyGateway inlines the TypeMeta API. This allows EnvoyGateway to be versioned and managed as a GroupVersionKind scheme.
EnvoyGateway does not contain a metadata field since it’s currently represented as a static configuration file instead of a Kubernetes resource.
Since EnvoyGateway does not surface status, EnvoyGatewaySpec is inlined.
If data plane static configuration is required in the future, Envoy Gateway will use a separate file for this purpose.

The v1alpha1 version and gateway.envoyproxy.io API group get generated:

// gateway/api/config/v1alpha1/doc.go

// Package v1alpha1 contains API Schema definitions for the gateway.envoyproxy.io API group.
//
// +groupName=gateway.envoyproxy.io
package v1alpha1

The initial EnvoyGateway API:

// gateway/api/config/v1alpha1/envoygateway.go

package valpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EnvoyGateway is the Schema for the envoygateways API
type EnvoyGateway struct {
	metav1.TypeMeta `json:",inline"`

	// EnvoyGatewaySpec defines the desired state of Envoy Gateway.
	EnvoyGatewaySpec `json:",inline"`
}

// EnvoyGatewaySpec defines the desired state of Envoy Gateway configuration.
type EnvoyGatewaySpec struct {
	// Gateway defines Gateway-API specific configuration. If unset, default
	// configuration parameters will apply.
	//
	// +optional
	Gateway *Gateway `json:"gateway,omitempty"`

	// Provider defines the desired provider configuration. If unspecified,
	// the Kubernetes provider is used with default parameters.
	//
	// +optional
	Provider *EnvoyGatewayProvider `json:"provider,omitempty"`
}

// Gateway defines desired Gateway API configuration of Envoy Gateway.
type Gateway struct {
	// ControllerName defines the name of the Gateway API controller. If unspecified,
	// defaults to "gateway.envoyproxy.io/gatewayclass-controller". See the following
	// for additional details:
	//
	// https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.GatewayClass
	//
	// +optional
	ControllerName string `json:"controllerName,omitempty"`
}

// EnvoyGatewayProvider defines the desired configuration of a provider.
// +union
type EnvoyGatewayProvider struct {
	// Type is the type of provider to use. If unset, the Kubernetes provider is used.
	//
	// +unionDiscriminator
	Type ProviderType `json:"type,omitempty"`
	// Kubernetes defines the configuration of the Kubernetes provider. Kubernetes
	// provides runtime configuration via the Kubernetes API.
	//
	// +optional
	Kubernetes *EnvoyGatewayKubernetesProvider `json:"kubernetes,omitempty"`

	// File defines the configuration of the File provider. File provides runtime
	// configuration defined by one or more files.
	//
	// +optional
	File *EnvoyGatewayFileProvider `json:"file,omitempty"`
}

// ProviderType defines the types of providers supported by Envoy Gateway.
type ProviderType string

const (
	// KubernetesProviderType defines the "Kubernetes" provider.
	KubernetesProviderType ProviderType = "Kubernetes"

	// FileProviderType defines the "File" provider.
	FileProviderType ProviderType = "File"
)

// EnvoyGatewayKubernetesProvider defines configuration for the Kubernetes provider.
type EnvoyGatewayKubernetesProvider struct {
	// TODO: Add config as use cases are better understood.
}

// EnvoyGatewayFileProvider defines configuration for the File provider.
type EnvoyGatewayFileProvider struct {
	// TODO: Add config as use cases are better understood.
}

Note: Provider-specific configuration is defined in the {$PROVIDER_NAME}Provider API.

Gateway

Gateway defines desired configuration of Gateway API controllers that reconcile and translate Gateway API resources into the Intermediate Representation (IR). Refer to the Envoy Gateway design doc for additional details.

Provider

Provider defines the desired configuration of an Envoy Gateway provider. A provider is an infrastructure component that Envoy Gateway calls to establish its runtime configuration. Provider is a union type. Therefore, Envoy Gateway can be configured with only one provider based on the type discriminator field. Refer to the Envoy Gateway design doc for additional details.

Control Plane Configuration

The configuration file is defined by the EnvoyGateway API type. At startup, Envoy Gateway searches for the configuration at “/etc/envoy-gateway/config.yaml”.

Start Envoy Gateway:

$ ./envoy-gateway

Since the configuration file does not exist, Envoy Gateway will start with default configuration parameters.

The Kubernetes provider can be configured explicitly using provider.kubernetes:

$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
  type: Kubernetes
  kubernetes: {}
EOF

This configuration will cause Envoy Gateway to use the Kubernetes provider with default configuration parameters.

The Kubernetes provider can be configured using the provider field. For example, the foo field can be set to “bar”:

$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
  type: Kubernetes
  kubernetes:
    foo: bar
EOF

Note: The Provider API from the Kubernetes package is currently undefined and foo: bar is provided for illustration purposes only.

The same API structure is followed for each supported provider. The following example causes Envoy Gateway to use the File provider:

$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
  type: File
  file:
    foo: bar
EOF

Note: The Provider API from the File package is currently undefined and foo: bar is provided for illustration purposes only.

Gateway API-related configuration is expressed through the gateway field. If unspecified, Envoy Gateway will use default configuration parameters for gateway. The following example causes the GatewayClass controller to manage GatewayClasses with controllerName foo instead of the default gateway.envoyproxy.io/gatewayclass-controller:

$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: foo
EOF

With any of the above configuration examples, Envoy Gateway can be started without any additional arguments:

$ ./envoy-gateway

Data Plane API

The data plane is configured dynamically through Kubernetes resources, primarily Gateway API objects. Optionally, the data plane infrastructure can be configured by referencing a custom resource (CR) through spec.parametersRef of the managed GatewayClass. The EnvoyProxy API defines the data plane infrastructure configuration and is represented as the CR referenced by the managed GatewayClass. Key points of this API are:

If unreferenced by gatewayclass.spec.parametersRef, default parameters will be used to configure the data plane infrastructure, e.g. expose Envoy network endpoints using a LoadBalancer service.
Envoy Gateway will follow Gateway API recommendations regarding updates to the EnvoyProxy CR:
It is recommended that this resource be used as a template for Gateways. This means that a Gateway is based on the state of the GatewayClass at the time it was created and changes to the GatewayClass or associated parameters are not propagated down to existing Gateways.

The initial EnvoyProxy API:

// gateway/api/config/v1alpha1/envoyproxy.go

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EnvoyProxy is the Schema for the envoyproxies API.
type EnvoyProxy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   EnvoyProxySpec   `json:"spec,omitempty"`
	Status EnvoyProxyStatus `json:"status,omitempty"`
}

// EnvoyProxySpec defines the desired state of Envoy Proxy infrastructure
// configuration.
type EnvoyProxySpec struct {
	// Undefined by this design spec.
}

// EnvoyProxyStatus defines the observed state of EnvoyProxy.
type EnvoyProxyStatus struct {
	// Undefined by this design spec.
}

The EnvoyProxySpec and EnvoyProxyStatus fields will be defined in the future as proxy infrastructure configuration use cases are better understood.

Data Plane Configuration

GatewayClass and Gateway resources define the data plane infrastructure. Note that all examples assume Envoy Gateway is running with the Kubernetes provider.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80

Since the GatewayClass does not define spec.parametersRef, the data plane is provisioned using default configuration parameters. The Envoy proxies will be configured with a http listener and a Kubernetes LoadBalancer service listening on port 80.

The following example will configure the data plane to use a ClusterIP service instead of the default LoadBalancer service:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    name: example-config
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: example-config
spec:
  networkPublishing:
    type: ClusterIPService

Note: The NetworkPublishing API is currently undefined and is provided here for illustration purposes only.

11 - Data Plane Observability: Accesslog

Overview

Envoy supports extensible accesslog to different sinks, File, gRPC etc.

Envoy supports customizable access log formats using predefined fields as well as arbitrary HTTP request and response headers.

Envoy supports several built-in access log filters and extension filters that are registered at runtime.

Envoy Gateway leverages Gateway API for configuring managed Envoy proxies. Gateway API defines core, extended, and implementation-specific API support levels for implementers such as Envoy Gateway to expose features. Since accesslog is not covered by Core or Extended APIs, EG should provide an easy to config access log formats and sinks per EnvoyProxy.

Goals

Support send accesslog to File or OpenTelemetry backend
TODO: Support access log filters base on CEL expression

Non-Goals

Support non-CEL filters, e.g. status_code_filter, response_flag_filter
Support HttpGrpcAccessLogConfig or TcpGrpcAccessLogConfig

Use-Cases

Configure accesslog for a EnvoyProxy to File
Configure accesslog for a EnvoyProxy to OpenTelemetry backend
Configure multi accesslog providers for a EnvoyProxy

ProxyAccessLog API Type

type ProxyAccessLog struct {
	// Disable disables access logging for managed proxies if set to true.
	Disable bool `json:"disable,omitempty"`
	// Settings defines accesslog settings for managed proxies.
	// If unspecified, will send default format to stdout.
	// +optional
	Settings []ProxyAccessLogSetting `json:"settings,omitempty"`
}

type ProxyAccessLogSetting struct {
	// Format defines the format of accesslog.
	Format ProxyAccessLogFormat `json:"format"`
	// Sinks defines the sinks of accesslog.
	// +kubebuilder:validation:MinItems=1
	Sinks []ProxyAccessLogSink `json:"sinks"`
}

type ProxyAccessLogFormatType string

const (
	// ProxyAccessLogFormatTypeText defines the text accesslog format.
	ProxyAccessLogFormatTypeText ProxyAccessLogFormatType = "Text"
	// ProxyAccessLogFormatTypeJSON defines the JSON accesslog format.
	ProxyAccessLogFormatTypeJSON ProxyAccessLogFormatType = "JSON"
	// TODO: support format type "mix" in the future.
)

// ProxyAccessLogFormat defines the format of accesslog.
// +union
type ProxyAccessLogFormat struct {
	// Type defines the type of accesslog format.
	// +kubebuilder:validation:Enum=Text;JSON
	// +unionDiscriminator
	Type ProxyAccessLogFormatType `json:"type,omitempty"`
	// Text defines the text accesslog format, following Envoy accesslog formatting,
	// empty value results in proxy's default access log format.
	// It's required when the format type is "Text".
	// Envoy [command operators](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#command-operators) may be used in the format.
	// The [format string documentation](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#config-access-log-format-strings) provides more information.
	// +optional
	Text *string `json:"text,omitempty"`
	// JSON is additional attributes that describe the specific event occurrence.
	// Structured format for the envoy access logs. Envoy [command operators](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#command-operators)
	// can be used as values for fields within the Struct.
	// It's required when the format type is "JSON".
	// +optional
	JSON map[string]string `json:"json,omitempty"`
}

type ProxyAccessLogSinkType string

const (
	// ProxyAccessLogSinkTypeFile defines the file accesslog sink.
	ProxyAccessLogSinkTypeFile ProxyAccessLogSinkType = "File"
	// ProxyAccessLogSinkTypeOpenTelemetry defines the OpenTelemetry accesslog sink.
	ProxyAccessLogSinkTypeOpenTelemetry ProxyAccessLogSinkType = "OpenTelemetry"
)

type ProxyAccessLogSink struct {
	// Type defines the type of accesslog sink.
	// +kubebuilder:validation:Enum=File;OpenTelemetry
	Type ProxyAccessLogSinkType `json:"type,omitempty"`
	// File defines the file accesslog sink.
	// +optional
	File *FileEnvoyProxyAccessLog `json:"file,omitempty"`
	// OpenTelemetry defines the OpenTelemetry accesslog sink.
	// +optional
	OpenTelemetry *OpenTelemetryEnvoyProxyAccessLog `json:"openTelemetry,omitempty"`
}

type FileEnvoyProxyAccessLog struct {
	// Path defines the file path used to expose envoy access log(e.g. /dev/stdout).
	// Empty value disables accesslog.
	Path string `json:"path,omitempty"`
}

// TODO: consider reuse ExtensionService?
type OpenTelemetryEnvoyProxyAccessLog struct {
	// Host define the extension service hostname.
	Host string `json:"host"`
	// Port defines the port the extension service is exposed on.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=4317
	Port int32 `json:"port,omitempty"`
	// Resources is a set of labels that describe the source of a log entry, including envoy node info.
	// It's recommended to follow [semantic conventions](https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/).
	// +optional
	Resources map[string]string `json:"resources,omitempty"`

	// TODO: support more OpenTelemetry accesslog options(e.g. TLS, auth etc.) in the future.
}

Example

The following is an example to disable access log.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: disable-accesslog
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      disable: true

The following is an example with text format access log.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: text-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: File
              file:
                path: /dev/stdout

The following is an example with json format access log.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: json-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
          type: JSON
          json:
            status: "%RESPONSE_CODE%"
            message: "%LOCAL_REPLY_BODY%"
      sinks:
        - type: File
          file:
            path: /dev/stdout

The following is an example with OpenTelemetry format access log.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"

The following is an example of sending same format to different sinks.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: multi-sinks
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: File
              file:
                path: /dev/stdout
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"

12 - Data Plane Observability: Metrics

This document aims to cover all aspects of envoy gateway data plane metrics observability.

Note

Control plane observability (while important) is outside of scope for this document. For control plane observability, refer to here.

Overview

Envoy provide robust platform for metrics, Envoy support three different kinds of stats: counter, gauges, histograms.

Envoy enables prometheus format output via the /stats/prometheus admin endpoint.

Envoy support different kinds of sinks, but EG will only support Open Telemetry sink.

Envoy Gateway leverages Gateway API for configuring managed Envoy proxies. Gateway API defines core, extended, and implementation-specific API support levels for implementers such as Envoy Gateway to expose features. Since metrics is not covered by Core or Extended APIs, EG should provide an easy to config metrics per EnvoyProxy.

Goals

Support expose metrics in prometheus way(reuse probe port).
Support Open Telemetry stats sink.

Non-Goals

Support other stats sink.

Use-Cases

Enable prometheus metric by default
Disable prometheus metric
Push metrics via Open Telemetry Sink
TODO: Customize histogram buckets of target metric
TODO: Support stats matcher

ProxyMetric API Type

type ProxyMetrics struct {
	// Prometheus defines the configuration for Admin endpoint `/stats/prometheus`.
	Prometheus *PrometheusProvider `json:"prometheus,omitempty"`
	// Sinks defines the metric sinks where metrics are sent to.
	Sinks []MetricSink `json:"sinks,omitempty"`
}

type MetricSinkType string

const (
	MetricSinkTypeOpenTelemetry MetricSinkType = "OpenTelemetry"
)

type MetricSink struct {
	// Type defines the metric sink type.
	// EG currently only supports OpenTelemetry.
	// +kubebuilder:validation:Enum=OpenTelemetry
	// +kubebuilder:default=OpenTelemetry
	Type MetricSinkType `json:"type"`
	// OpenTelemetry defines the configuration for OpenTelemetry sink.
	// It's required if the sink type is OpenTelemetry.
	OpenTelemetry *OpenTelemetrySink `json:"openTelemetry,omitempty"`
}

type OpenTelemetrySink struct {
	// Host define the service hostname.
	Host string `json:"host"`
	// Port defines the port the service is exposed on.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:default=4317
	Port int32 `json:"port,omitempty"`

	// TODO: add support for customizing OpenTelemetry sink in https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/stat_sinks/open_telemetry/v3/open_telemetry.proto#envoy-v3-api-msg-extensions-stat-sinks-open-telemetry-v3-sinkconfig
}

type PrometheusProvider struct {
	// Disable the Prometheus endpoint.
	Disable bool `json:"disable,omitempty"`
}

Example

The following is an example to disable prometheus metric.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: prometheus
  namespace: envoy-gateway-system
spec:
  telemetry:
    metrics:
      prometheus:
        disable: true

The following is an example to send metric via Open Telemetry sink.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-sink
  namespace: envoy-gateway-system
spec:
  telemetry:
    metrics:
      sinks:
        - type: OpenTelemetry
          openTelemetry:
            host: otel-collector.monitoring.svc.cluster.local
            port: 4317

13 - Data Plane Observability: Tracing

Overview

Envoy supports extensible tracing to different sinks, Zipkin, OpenTelemetry etc. Overview of Envoy tracing can be found here.

Envoy Gateway leverages Gateway API for configuring managed Envoy proxies. Gateway API defines core, extended, and implementation-specific API support levels for implementers such as Envoy Gateway to expose features. Since tracing is not covered by Core or Extended APIs, EG should provide an easy to config tracing per EnvoyProxy.

Only OpenTelemetry sink can be configured currently, you can use OpenTelemetry Collector to export to other tracing backends.

Goals

Support send tracing to OpenTelemetry backend
Support configurable sampling rate
Support propagate tag from Literal, Environment and Request Header

Non-Goals

Support other tracing backend, e.g. Zipkin, Jaeger

Use-Cases

Configure accesslog for a EnvoyProxy to File

ProxyAccessLog API Type

type ProxyTracing struct {
	// SamplingRate controls the rate at which traffic will be
	// selected for tracing if no prior sampling decision has been made.
	// Defaults to 100, valid values [0-100]. 100 indicates 100% sampling.
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=100
	// +kubebuilder:default=100
	// +optional
	SamplingRate *uint32 `json:"samplingRate,omitempty"`
	// CustomTags defines the custom tags to add to each span.
	// If provider is kubernetes, pod name and namespace are added by default.
	CustomTags map[string]CustomTag `json:"customTags,omitempty"`
	// Provider defines the tracing provider.
	// Only OpenTelemetry is supported currently.
	Provider TracingProvider `json:"provider"`
}

type TracingProviderType string

const (
	TracingProviderTypeOpenTelemetry TracingProviderType = "OpenTelemetry"
)

type TracingProvider struct {
	// Type defines the tracing provider type.
	// EG currently only supports OpenTelemetry.
	// +kubebuilder:validation:Enum=OpenTelemetry
	// +kubebuilder:default=OpenTelemetry
	Type TracingProviderType `json:"type"`
	// Host define the provider service hostname.
	Host string `json:"host"`
	// Port defines the port the provider service is exposed on.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=4317
	Port int32 `json:"port,omitempty"`
}

type CustomTagType string

const (
	// CustomTagTypeLiteral adds hard-coded value to each span.
	CustomTagTypeLiteral CustomTagType = "Literal"
	// CustomTagTypeEnvironment adds value from environment variable to each span.
	CustomTagTypeEnvironment CustomTagType = "Environment"
	// CustomTagTypeRequestHeader adds value from request header to each span.
	CustomTagTypeRequestHeader CustomTagType = "RequestHeader"
)

type CustomTag struct {
	// Type defines the type of custom tag.
	// +kubebuilder:validation:Enum=Literal;Environment;RequestHeader
	// +unionDiscriminator
	// +kubebuilder:default=Literal
	Type CustomTagType `json:"type"`
	// Literal adds hard-coded value to each span.
	// It's required when the type is "Literal".
	Literal *LiteralCustomTag `json:"literal,omitempty"`
	// Environment adds value from environment variable to each span.
	// It's required when the type is "Environment".
	Environment *EnvironmentCustomTag `json:"environment,omitempty"`
	// RequestHeader adds value from request header to each span.
	// It's required when the type is "RequestHeader".
	RequestHeader *RequestHeaderCustomTag `json:"requestHeader,omitempty"`

	// TODO: add support for Metadata tags in the future.
	// EG currently doesn't support metadata for route or cluster.
}

// LiteralCustomTag adds hard-coded value to each span.
type LiteralCustomTag struct {
	// Value defines the hard-coded value to add to each span.
	Value string `json:"value"`
}

// EnvironmentCustomTag adds value from environment variable to each span.
type EnvironmentCustomTag struct {
	// Name defines the name of the environment variable which to extract the value from.
	Name string `json:"name"`
	// DefaultValue defines the default value to use if the environment variable is not set.
	// +optional
	DefaultValue *string `json:"defaultValue,omitempty"`
}

// RequestHeaderCustomTag adds value from request header to each span.
type RequestHeaderCustomTag struct {
	// Name defines the name of the request header which to extract the value from.
	Name string `json:"name"`
	// DefaultValue defines the default value to use if the request header is not set.
	// +optional
	DefaultValue *string `json:"defaultValue,omitempty"`
}

Example

The following is an example to config tracing.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: tracing
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # sample 100% of requests
      samplingRate: 100
      provider:
        host: otel-collector.monitoring.svc.cluster.local
        port: 4317
      customTags:
        # This is an example of using a literal as a tag value
        key1:
          type: Literal
          literal:
            value: "val1"
        # This is an example of using an environment variable as a tag value
        env1:
          type: Environment
          environment:
            name: ENV1
            defaultValue: "-"
        # This is an example of using a header value as a tag value
        header1:
          type: RequestHeader
          requestHeader:
            name: X-Header-1
            defaultValue: "-"

14 - Debug support in Envoy Gateway

Overview

Envoy Gateway exposes endpoints at localhost:19000/debug/pprof to run Golang profiles to aid in live debugging.

The endpoints are equivalent to those found in the http/pprof package. /debug/pprof/ returns an HTML page listing the available profiles.

Goals

Add admin server to Envoy Gateway control plane, separated with admin server.
Add pprof support to Envoy Gateway control plane.
Define an API to allow Envoy Gateway to custom admin server configuration.
Define an API to allow Envoy Gateway to open envoy gateway config dump in logs.

The following are the different types of profiles end-user can run:

PROFILE	FUNCTION
/debug/pprof/allocs	Returns a sampling of all past memory allocations.
/debug/pprof/block	Returns stack traces of goroutines that led to blocking on synchronization primitives.
/debug/pprof/cmdline	Returns the command line that was invoked by the current program.
/debug/pprof/goroutine	Returns stack traces of all current goroutines.
/debug/pprof/heap	Returns a sampling of memory allocations of live objects.
/debug/pprof/mutex	Returns stack traces of goroutines holding contended mutexes.
/debug/pprof/profile	Returns pprof-formatted cpu profile. You can specify the duration using the seconds GET parameter. The default duration is 30 seconds.
/debug/pprof/symbol	Returns the program counters listed in the request.
/debug/pprof/threadcreate	Returns stack traces that led to creation of new OS threads.
/debug/pprof/trace	Returns the execution trace in binary form. You can specify the duration using the seconds GET parameter. The default duration is 1 second.

Non Goals

API

Add admin field in EnvoyGateway config.
Add address field under admin field.
Add port and host under address field.
Add enableDumpConfig field under `admin field.
Add enablePprof field under `admin field.

Here is an example configuration to open admin server and enable Pprof:

apiVersion: gateway.envoyproxy.io/v1alpha1
gateway:
    controllerName: "gateway.envoyproxy.io/gatewayclass-controller"
kind: EnvoyGateway
provider:
    type: "Kubernetes"
admin:
  enablePprof: true
  address:
    host: 127.0.0.1
    port: 19000

Here is an example configuration to open envoy gateway config dump in logs:

apiVersion: gateway.envoyproxy.io/v1alpha1
gateway:
    controllerName: "gateway.envoyproxy.io/gatewayclass-controller"
kind: EnvoyGateway
provider:
    type: "Kubernetes"
admin:
   enableDumpConfig: true

15 - egctl Design

Motivation

EG should provide a command line tool with following capabilities:

Collect configuration from envoy proxy and gateway
Analyse system configuration to diagnose any issues in envoy gateway

This tool is named egctl.

Syntax

Use the following syntax to run egctl commands from your terminal window:

egctl [command] [entity] [name] [flags]

where command, name, and flags are:

command: Specifies the operation that you want to perform on one or more resources, for example config, version.
entity: Specifies the entity the operation is being performed on such as envoy-proxy or envoy-gateway.
name: Specifies the name of the specified instance.
flags: Specifies optional flags. For example, you can use the -c or --config flags to specify the values for installing.

If you need help, run egctl help from the terminal window.

Operation

The following table includes short descriptions and the general syntax for all the egctl operations:

Operation	Syntax	Description
`version`	`egctl version`	Prints out build version information.
`config`	`egctl config ENTITY`	Retrieve information about proxy configuration from envoy proxy and gateway
`analyze`	`egctl analyze`	Analyze EG configuration and print validation messages
`experimental`	`egctl experimental`	Subcommand for experimental features. These do not guarantee backwards compatibility

Examples

Use the following set of examples to help you familiarize yourself with running the commonly used egctl operations:

# Retrieve all information about proxy configuration from envoy
egctl config envoy-proxy all <instance_name>

# Retrieve listener information about proxy configuration from envoy 
egctl config envoy-proxy listener <instance_name>

# Retrieve the relevant rate limit configuration from the Rate Limit instance
egctl config envoy-ratelimit

16 - Envoy Gateway Extensions Design

As outlined in the official goals for the Envoy Gateway project, one of the main goals is to “provide a common foundation for vendors to build value-added products without having to re-engineer fundamental interactions”. Development of the Envoy Gateway project has been focused on developing the core features for the project and Kubernetes Gateway API conformance. This system focuses on the “common foundation for vendors” component by introducing a way for vendors to extend Envoy Gateway.

To meaningfully extend Envoy Gateway and provide additional features, Extensions need to be able to introduce their own custom resources and have a high level of control over the configuration generated by Envoy Gateway. Simply applying some static xDS configuration patches or relying on the existing Gateway API resources are both insufficient on their own as means to add larger features that require dynamic user-configuration.

As an example, an extension developer may wish to provide their own out-of-the-box authentication filters that require configuration from the end-user. This is a scenario where the ability to introduce custom resources and attach them to HTTPRoutes as an ExtensionRef is necessary. Providing the same feature through a series of xDS patch resources would be too cumbersome for many end-users that want to avoid that level of complexity when managing their clusters.

Goals

Provide a foundation for extending the Envoy Gateway control plane
Allow Extension Developers to introduce their own custom resources for extending the Gateway-API via ExtensionRefs, policyAttachments (future) and backendRefs (future).
Extension developers should NOT have to maintain a custom fork of Envoy Gateway
Provide a system for extending Envoy Gateway which allows extension projects to ship updates independent of Envoy Gateway’s release schedule
Modify the generated Envoy xDS config
Setup a foundation for the initial iteration of Extending Envoy Gateway
Allow an Extension to hook into the infra manager pipeline (future)

Non-Goals

The initial design does not capture every hook that Envoy Gateway will eventually support.
Extend Gateway API Policy Attachments. At some point, these will be addressed using this extension system, but the initial implementation omits these.
Support multiple extensions at the same time. Due to the fact that extensions will be modifying xDS resources after they are generated, handling the order of extension execution for each individual hook point is a challenge. Additionally, there is no real way to prevent one extension from overwriting or breaking modifications to xDS resources that were made by another extension that was executed first.

Overview

Envoy Gateway can be extended by vendors by means of an extension server developed by the vendor and deployed alongside Envoy Gateway. An extension server can make use of one or more pre/post hooks inside Envoy Gateway before and after its major components (translator, etc.) to allow the extension to modify the data going into or coming out of these components. An extension can be created external to Envoy Gateway as its own Kubernetes deployment or loaded as a sidecar. gRPC is used for the calls between Envoy Gateway and an extension. In the hook call, Envoy Gateway sends data as well as context information to the extension and expects a reply with a modified version of the data that was sent to the extension. Since extensions fundamentally alter the logic and data that Envoy Gateway provides, Extension projects assume responsibility for any bugs and issues they create as a direct result of their modification of Envoy Gateway.

Diagram

Architecture

Registering Extensions in Envoy Gateway

Information about the extension that Envoy Gateway needs to load is configured in the Envoy Gateway config.

An example configuration:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
extensionManager:
  policyResources:
  - group: example.myextension.io
    version: v1alpha1
    kind: ListenerPolicyKind
  resources:
  - group: example.myextension.io
    version: v2
    kind: OAuth2Filter
  hooks:
    xdsTranslator:
      post:
      - Route
      - VirtualHost
      - HTTPListener
      - Translation
  service:
    fqdn:
      hostname: my-extension.example
      port: 443
    tls:
      certificateRef:
        name: my-secret
        namespace: default

An extension must supply connection information in the extension.service field so that Envoy Gateway can communicate with the extension. The tls configuration is optional. Envoy Gateway supports connecting to an extension server either with TCP or with Unix Domain Sockets as a transport layer.

If the extension wants Envoy Gateway to watch resources for it then the extension must configure the optional extension.resources field and supply a list of:

group: the API group of the resource
version: the API version of the resource
kind: the Kind of resource

If the extension wants Envoy Gateway to watch for policy resources then it must configure the optional extensions.policyResources field and supply a list of

group: the API group of the resource
version: the API version of the resource
kind: the Kind of resource

Policy resources, like all Gateway-API policies, must contain targetRef or targetRefs fields in the spec which allow Envoy Gateway to identify which resources are targeted by the policy. Policies can currently only target Gateway resources, and are provided as context to calls to the HTTPListener hook.

The extension can configure the extensionManager.hooks field to specify which hook points it would like to support. If a given hook is not listed here then it will not be executed even if the extension is configured properly. This allows extension developers to only opt-in to the hook points they want to make use of.

This configuration is required to be provided at bootstrap and modifying the registered extension during runtime is not currently supported. Envoy Gateway will keep track of the registered extension and its API groups and kinds when processing Gateway API resources.

Extending Gateway API and the Data Plane

Envoy Gateway manages Envoy deployments, which act as the data plane that handles actual user traffic. Users configure the data plane using the K8s Gateway API resources which Envoy Gateway converts into Envoy specific configuration (xDS) to send over to Envoy.

Gateway API offers ExtensionRef filters and Policy Attachments as extension points for implementers to use. Envoy Gateway extends the Gateway API using these extension points to provide support for rate limiting and authentication native to the project. The initial design of Envoy Gateway extensions will primarily focus on ExtensionRef filters so that extension developers can reference their own resources as HTTP Filters in the same way that Envoy Gateway has native support for rate limiting and authentication filters, as well as policy resources which can target Gateways.

When Envoy Gateway encounters an HTTPRoute or GRPCRoute that has an ExtensionRef filter with a group and kind that Envoy Gateway does not support, it will first check the registered extension to determine if it supports the referenced object before considering it a configuration error.

This allows users to be able to reference additional filters provided by their Envoy Gateway Extension, in their HTTPRoutes / GRPCRoutes:

apiVersion: example.myextension.io/v1alpha1
kind: OAuth2Filter
metadata:
  name: oauth2-filter
spec:
  ...

---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - clientSelectors:
    - path:
        type: PathPrefix
        value: /
    filters:
    - type: ExtensionRef
      extensionRef:
        group: example.myextension.io
        kind: OAuth2Filter
        name: oauth2-filter
    backendRefs:
    - name: backend
      port: 3000

In order to enable the usage of new resources introduced by an extension for translation and xDS modification, Envoy Gateway provides hook points within the translation pipeline, where it calls out to the extension service registered in the EnvoyGateway config if they specify an group that matches the group of an ExtensionRef filter. The extension will then be able to modify the xDS that Envoy Gateway generated and send back the modified configuration. If an extension is not registered or if the registered extension does not specify support for the group of an ExtensionRef filter then Envoy Gateway will treat it as an unknown resource and provide an error to the user.

Note: Currently (as of v1) Gateway API does not provide a means to specify the namespace or version of an object referenced as an ExtensionRef. The extension mechanism will assume that the namespace of any ExtensionRef is the same as the namespace of the HTTPRoute or GRPCRoute it is attached to rather than treating the name field of an ExtensionRef as a name.namespace string. If Gateway API adds support for these fields then the design of the Envoy Gateway extensions will be updated to support them.

Similarly, any registered policy resource that targets an HTTPListener will be sent to the HTTPListener hook as context.

Watching New Resources

Envoy Gateway will dynamically create new watches on resources introduced by the registered Extension. It does so by using the controller-runtime to create new watches on Unstructured resources that match the versions, groups, and kinds that the registered extension configured. When communicating with an extension, Envoy Gateway sends these Unstructured resources over to the extension. This eliminates the need for the extension to create its own watches which would have a strong chance of creating race conditions and reconciliation loops when resources change. When an extension receives the Unstructured resources from Envoy Gateway it can perform its own type validation on them. Currently we make the simplifying assumption that the registered extension’s Kinds are filters referenced by extensionRef in HTTPRouteFilters . Policy attachments which target Gateway resources work in the same way.

xDS Hooks API

Envoy Gateway supports the following hooks as the initial foundation of the Extension system. Additional hooks can be developed using this extension system at a later point as new use-cases and needs are discovered. The primary iteration of the extension hooks focuses solely on the modification of xDS resources.

Route Modification Hook

The Route level Hook provides a way for extensions to modify a route generated by Envoy Gateway before it is finalized. Doing so allows extensions to configure/modify route fields configured by Envoy Gateway and also to configure the Route’s TypedPerFilterConfig which may be desirable to do things such as pass settings and information to ext_authz filters. The Post Route Modify hook also passes a list of Unstructured data for the externalRefs owned by the extension on the HTTPRoute that created this xDS route This hook is always executed when an extension is loaded that has added Route to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post, and only on Routes which were generated from an HTTPRoute that uses extension resources as externalRef filters.

// PostRouteModifyRequest sends a Route that was generated by Envoy Gateway along with context information to an extension so that the Route can be modified
message PostRouteModifyRequest {
    envoy.config.route.v3.Route route = 1;
    PostRouteExtensionContext post_route_context = 2;
}

// RouteExtensionContext provides resources introduced by an extension and watched by Envoy Gateway
// additional context information can be added to this message as more use-cases are discovered
message PostRouteExtensionContext {
    // Resources introduced by the extension that were used as extensionRefs in an HTTPRoute/GRPCRoute
    repeated ExtensionResource extension_resources = 1;

    // hostnames are the fully qualified domain names attached to the HTTPRoute
    repeated string hostnames = 2;
}

// ExtensionResource stores the data for a K8s API object referenced in an HTTPRouteFilter
// extensionRef. It is constructed from an unstructured.Unstructured marshalled to JSON. An extension
// can marshal the bytes from this resource back into an unstructured.Unstructured and then 
// perform type checking to obtain the resource.
message ExtensionResource {
    bytes unstructured_bytes = 1;
}

// PostRouteModifyResponse is the expected response from an extension and contains a modified version of the Route that was sent
// If an extension returns a nil Route then it will not be modified
message PostRouteModifyResponse {
    envoy.config.route.v3.Route route = 1;
}

VirtualHost Modification Hook

The VirtualHost Hook provides a way for extensions to modify a VirtualHost generated by Envoy Gateway before it is finalized. An extension can also make use of this hook to generate and insert entirely new Routes not generated by Envoy Gateway. This hook is always executed when an extension is loaded that has added VirtualHost to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post. An extension may return nil to not make any changes to the VirtualHost.

// PostVirtualHostModifyRequest sends a VirtualHost that was generated by Envoy Gateway along with context information to an extension so that the VirtualHost can be modified
message PostVirtualHostModifyRequest {
    envoy.config.route.v3.VirtualHost virtual_host = 1;
    PostVirtualHostExtensionContext post_virtual_host_context = 2;
}

// Empty for now but we can add fields to the context as use-cases are discovered without
// breaking any clients that use the API
// additional context information can be added to this message as more use-cases are discovered
message PostVirtualHostExtensionContext {}

// PostVirtualHostModifyResponse is the expected response from an extension and contains a modified version of the VirtualHost that was sent
// If an extension returns a nil Virtual Host then it will not be modified
message PostVirtualHostModifyResponse {
    envoy.config.route.v3.VirtualHost virtual_host = 1;
}

HTTP Listener Modification Hook

The HTTP Listener modification hook is the broadest xDS modification Hook available and allows an extension to make changes to a Listener generated by Envoy Gateway before it is finalized. This hook is always executed when an extension is loaded that has added HTTPListener to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post. An extension may return nil in order to not make any changes to the Listener.

// PostVirtualHostModifyRequest sends a Listener that was generated by Envoy Gateway along with context information to an extension so that the Listener can be modified
message PostHTTPListenerModifyRequest {
    envoy.config.listener.v3.Listener listener = 1;
    PostHTTPListenerExtensionContext post_listener_context = 2;
}

// Empty for now but we can add fields to the context as use-cases are discovered without
// breaking any clients that use the API
// additional context information can be added to this message as more use-cases are discovered
message PostHTTPListenerExtensionContext {
    // Resources introduced by the extension that were used as extension server
    // policies targeting the listener
    repeated ExtensionResource extension_resources = 1;
}

// PostHTTPListenerModifyResponse is the expected response from an extension and contains a modified version of the Listener that was sent
// If an extension returns a nil Listener then it will not be modified
message PostHTTPListenerModifyResponse {
    envoy.config.listener.v3.Listener listener = 1;
}

Post xDS Translation Modify Hook

The Post Translate Modify hook allows an extension to modify the clusters and secrets in the xDS config. This allows for inserting clusters that may change along with extension specific configuration to be dynamically created rather than using custom bootstrap config which would be sufficient for clusters that are static and not prone to have their configurations changed. An example of how this may be used is to inject a cluster that will be used by an ext_authz http filter created by the extension. The list of clusters and secrets returned by the extension are used as the final list of all clusters and secrets This hook is always executed when an extension is loaded that has added Translation to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post.

// PostTranslateModifyRequest currently sends only clusters and secrets to an extension.
// The extension is free to add/modify/remove the resources it received.
message PostTranslateModifyRequest {
    PostTranslateExtensionContext post_translate_context = 1;
    repeated envoy.config.cluster.v3.Cluster clusters = 2;
    repeated envoy.extensions.transport_sockets.tls.v3.Secret secrets = 3;
}

// PostTranslateModifyResponse is the expected response from an extension and contains
// the full list of xDS clusters and secrets to be used for the xDS config.
message PostTranslateModifyResponse {
    repeated envoy.config.cluster.v3.Cluster clusters = 1;
    repeated envoy.extensions.transport_sockets.tls.v3.Secret secrets = 2;
}

Extension Service

Currently, an extension must implement all of the following hooks although it may return the input(s) it received if no modification of the resource is desired. A future expansion of the extension hooks will allow an Extension to specify with config which Hooks it would like to “subscribe” to and which Hooks it does not wish to support. These specific Hooks were chosen in order to provide extensions with the ability to have both broad and specific control over xDS resources and to minimize the amount of data being sent.

service EnvoyGatewayExtension {
    rpc PostRouteModify (PostRouteModifyRequest) returns (PostRouteModifyResponse) {};
    rpc PostVirtualHostModify(PostVirtualHostModifyRequest) returns (PostVirtualHostModifyResponse) {};
    rpc PostHTTPListenerModify(PostHTTPListenerModifyRequest) returns (PostHTTPListenerModifyResponse) {};
    rpc PostTranslateModify(PostTranslateModifyRequest) returns (PostTranslateModifyResponse) {};
}

Design Decisions

Envoy Gateway watches new custom resources introduced by a loaded extension and passes the resources back to the extension when they are used.
- This decision was made to solve the problem about how resources introduced by an extension get watched. If an extension server watches its own resources then it would need some way to trigger an Envoy Gateway reconfigure when a resource that Envoy Gateway is not watching gets updated. Having Envoy Gateway watch all resources removes any concern about creating race conditions or reconcile loops that would result from Envoy Gateway and the extension server both having so much separate state that needs to be synchronized.
The Extension Server takes ownership of producing the correct xDS configuration in the hook responses
The Extension Server will be responsible for ensuring the performance of the hook processing time
The Post xDS level gRPC hooks all currently send a context field even though it contains nothing for several hooks. These fields exist so that they can be updated in the future to pass additional information to extensions as new use-cases and needs are discovered.
The initial design supplies the scaffolding for both “pre xDS” and “post xDS” hooks. Only the post hooks are currently implemented which operate on xDS resources after they have been generated. The pre hooks will be implemented at a later date along with one or more hooks in the infra manager. The infra manager level hook(s) will exist to power use-cases such as dynamically creating Deployments/Services for the extension the whenever Envoy Gateway creates an instance of Envoy Proxy. An extension developer might want to take advantage of this functionality to inject a new authorization service as a sidecar on the Envoy Proxy deployment for reduced latency.
Multiple extensions are not be supported at the same time. Preventing conflict between multiple extensions that are mangling xDS resources is too difficult to ensure compatibility with and is likely to only generate issues.

Known Challenges

Extending Envoy Gateway by using an external extension server which makes use of hook points in Envoy Gateway does comes with a few trade-offs. One known trade-off is the impact of the time that it takes for the hook calls to be executed. Since an extension would make use of hook points in Envoy Gateway that use gRPC for communication, the time it takes to perform these requests could become a concern for some extension developers. One way to minimize the request time of the hook calls is to load the extension server as a sidecar to Envoy Gateway using the Unix Local Domain transport to minimize the impact of networking on the hook calls.

17 - EnvoyExtensionPolicy

Overview

This design document introduces the EnvoyExtensionPolicy API allowing system administrators to configure traffic processing extensibility policies, based on existing Network and HTTP Envoy proxy extension points.

Envoy Gateway already provides two methods of control plane extensibility that can be used to achieve this functionality:

Envoy Patch Policy can be used to patch Listener filters and HTTP Connection Manager filters.
Envoy Extension Manager can be used to programmatically mutate Listener filters and HTTP Connection Manager filters.

These approaches require a high level of Envoy and Envoy Gateway expertise and may create a significant operational burden for users (see Alternatives for more details). For this reason, this document proposes to support Envoy data plane extensibility options as first class citizens of Envoy Gateway.

Goals

Add an API definition to hold settings for configuring extensibility rules on the traffic entering the gateway.

Non Goals

Define the API configuration fields in this API.
Define the API for the following extension options:
- Native Envoy extensions: custom C++ extensions that must be compiled into the Envoy binary.
- Non-filter extensions: services, matchers, tracers, private key providers, resource monitors, etc.

Implementation

EnvoyExtensionPolicy is a Policy Attachment type API that can be used to extend Gateway API to define traffic extension rules.

BackendTrafficPolicy is enhanced to allow users to provide per-route config for Extensions.

Example

Here is an example highlighting how a user can configure this API for the External Processing extension.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ext-proc-policy
  namespace: default
spec:
  priority: 10
  extProc:
  - service:
      backendRef:
        group: ""
        kind: Service
        name: myExtProc
        port: 3000
    processingMode:
      request:
        headers: SEND
        body: BUFFERED
      response:
        headers: SKIP
        body: STREAMED
    messageTimeout: 5s
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default

Features / API Fields

Here is a list of features that can be included in this API

Network Filters:
- Wasm
- Golang
HTTP Filters:
- External Processing
- Lua
- Wasm
- Golang

Design Decisions

This API will only support a single targetRef and can bind to a Gateway resource or a HTTPRoute or GRPCRoute or TCPRoute.
Extensions that support both Network and HTTP filter variants (e.g. Wasm, Golang) will be translated to the appropriate filter type according to the sort of route that they attach to.
Extensions that only support HTTP extensibility (Ext-Proc, LUA) can only be attached to HTTP/GRPC Routes.
A user-defined extension that is added to the request processing flow can have a significant impact on security, resilience and performance of the proxy. Gateway Operators can restrict access to the extensibility policy using K8s RBAC.
Users may need to customize the order of extension and built-in filters. This will be addressed in a separate issue.
Gateway operators may need to include multiple extensions (e.g. Wasm modules developed by different teams and distributed separately). This API will support attachment of multiple policies. Extension will execute in an order defined by the priority field.
This API resource MUST be part of same namespace as the targetRef resource
If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
If Policy A has a targetRef that includes a sectionName i.e. it targets a specific Listener within a Gateway and Policy B has a targetRef that targets the same entire Gateway then
- Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName
- Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.
A Policy targeting the most specific scope wins over a policy targeting a lesser specific scope. i.e. A Policy targeting a Listener overrides a Policy targeting the Gateway the listener/section is a part of.

Alternatives

The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
The project can implement support for HTTP traffic extensions using vendor-specific Gateway API Route Filters instead of policies. However, this option will is less convenient for definition of gateway-level extensions.
Users can leverage the existing Envoy Patch Policy to inject extension filters. However, Envoy Gateway strives to provide a simple abstraction for common use cases and easy operations. Envoy patches require a high level of end-user Envoy expertise, and knowledge of how Envoy Gateway generates XDS. Such patches may be too difficult and fragile for some users to maintain.
Users can leverage the existing Envoy Extension Manager to inject extension filters. However, this requires a significant investment by users to build and operate an extension manager alongside Envoy Gateway.

18 - EnvoyPatchPolicy

Overview

This design introduces the EnvoyPatchPolicy API allowing users to modify the generated Envoy xDS Configuration that Envoy Gateway generates before sending it to Envoy Proxy.

Envoy Gateway allows users to configure networking and security intent using the upstream Gateway API as well as implementation specific Extension APIs defined in this project to provide a more batteries included experience for application developers.

These APIs are an abstracted version of the underlying Envoy xDS API to provide a better user experience for the application developer, exposing and setting only a subset of the fields for a specific feature, sometimes in a opinionated way (e.g RateLimit)
These APIs do not expose all the features capabilities that Envoy has either because these features are desired but the API is not defined yet or the project cannot support such an extensive list of features. To alleviate this problem, and provide an interim solution for a small section of advanced users who are well versed in Envoy xDS API and its capabilities, this API is being introduced.

Goals

Add an API allowing users to modify the generated xDS Configuration

Non Goals

Support multiple patch mechanisims

Implementation

EnvoyPatchPolicy is a Direct Policy Attachment type API that can be used to extend Gateway API Modifications to the generated xDS configuration can be provided as a JSON Patch which is defined in RFC 6902. This patching mechanism has been adopted in Kubernetes as well as Kustomize to update resource objects.

Example

Here is an example highlighting how a user can configure global ratelimiting using an external rate limit service using this API.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: ratelimit-patch-policy
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default
  type: JSONPatch
  jsonPatches:
    - type: "type.googleapis.com/envoy.config.listener.v3.Listener"
      # The listener name is of the form <GatewayNamespace>/<GatewayName>/<GatewayListenerName>
      name: default/eg/http
      operation:
        op: add
        path: "/default_filter_chain/filters/0/typed_config/http_filters/0"
        value:
          name: "envoy.filters.http.ratelimit"
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit"
            domain: "eag-ratelimit"
            failure_mode_deny: true
            timeout: 1s
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate-limit-cluster
              transport_api_version: V3
    - type: "type.googleapis.com/envoy.config.route.v3.RouteConfiguration"
      # The route name is of the form <GatewayNamespace>/<GatewayName>/<GatewayListenerName>
      name: default/eg/http
      operation:
        op: add
        path: "/virtual_hosts/0/rate_limits"
        value:
          - actions:
              - remote_address: {}
    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
      name: rate-limit-cluster
      operation:
        op: add
        path: ""
        value:
          name: rate-limit-cluster
          type: STRICT_DNS
          connect_timeout: 10s
          lb_policy: ROUND_ROBIN
          http2_protocol_options: {}
          load_assignment:
            cluster_name: rate-limit-cluster
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: ratelimit.svc.cluster.local
                          port_value: 8081

Verification

Offline - Leverage egctl x translate to ensure that the EnvoyPatchPolicy can be successfully applied and the desired output xDS is created.
Runtime - Use the Status field within EnvoyPatchPolicy to highlight whether the patch was applied successfully or not.

State of the World

Istio - Supports the EnvoyFilter API which allows users to customize the output xDS using patches and proto based merge semantics.

Design Decisions

This API will only support a single targetRef and can bind to only a Gateway or GatewayClass resource. This simplifies reasoning of how patches will work.
This API will always be an experimental API and cannot be graduated into a stable API because Envoy Gateway cannot guarantee
- that the naming scheme for the generated resources names will not change across releases
- that the underlying Envoy Proxy API will not change across releases
This API needs to be explicitly enabled using the EnvoyGateway API

Open Questions

Should the value only support JSON or YAML as well (which is a JSON superset) ?

Alternatives

Users can customize the Envoy Bootstrap configuration using EnvoyProxy API and provide static xDS configuration.
Users can extend functionality by Extending the Control Plane and adding gRPC hooks to modify the generated xDS configuration.

19 - Fuzzing

Overview

This design document introduces Fuzz Testing in Envoy Gateway. Its goal is to detect unexpected crashes, memory leaks, and undefined behaviors in the translation of Gateway API resources and configuration parsing, areas that may not be fully covered by unit tests.

Additionally, OSS-Fuzz provides a continuous fuzzing infrastructure for popular open-source projects. By writing fuzz tests, we can leverage OSS-Fuzz to continuously fuzz Envoy Gateway against a wide range of inputs, improving its resilience and reliability.

Note: This work is sponsored by the Linux Foundation Mentorship program.

Goals

Identify fuzz targets considering the tradeoff between realism, efficiency, and maintenance effort.
Develop a set of initial fuzzers.
Integrate with OSS-Fuzz for continuous, automated fuzz testing.
Iteratively refine, update, and enhance fuzzers based on findings and evolving requirements.

Non Goals

Introducing new features to Envoy Gateway.
Achieving 100% code coverage.

Implementation

As the fuzzers will be integrated with OSS-Fuzz, the implementation will follow best practices outlined in OSS-Fuzz Ideal Integration page. Fuzzers will be developed native Go fuzzing library go-fuzz.

Example

Here is an example of a simple fuzzer that tests the translation of Gateway API resource to Intermediate representation (IR).

func FuzzGatewayAPIToIRWithGatewayClass(f *testing.F) {
	f.Add("valid-name", "valid-namespace", "gateway.envoyproxy.io/gatewayclass-controller")
	f.Fuzz(func(t *testing.T, name, namespace, controllerName string) {
		resources := &resource.Resources{
			GatewayClass: &gwapiv1.GatewayClass{
				ObjectMeta: metav1.ObjectMeta{
					Name:      name,
					Namespace: namespace,
				},
				Spec: gwapiv1.GatewayClassSpec{
					ControllerName: gwapiv1.GatewayController(controllerName),
				},
			},
		}

		result, err := translateGatewayAPIToIR(resources)
		require.NoError(t, err, "Error should be nil")
		require.NotNil(t, result, "Result should not be nil")
		require.NotNil(t, result.Resources, "Resources should not be nil")
		require.NotNil(t, result.XdsIR, "XdsIR should not be nil")
		require.NotNil(t, result.InfraIR, "InfraIR should not be nil")
	})
}

Design Decisions

Fuzzing library: go-fuzz.
Fuzzers directory: Fuzzers will be stored in the test/fuzz directory.

Conclusion

Fuzz testing is an ongoing process. Once the initial fuzzers are developed, crashes reported by OSS-Fuzz will be continuously monitored, and the fuzzers will be iteratively refined. Additionally, future efforts will be directed toward exploring the integration of fuzz testing into the CI pipeline using GitHub Actions.

20 - Metadata in XDS resources

Overview

In Envoy, static metadata can be configured on various resources: listener, virtual host, route and cluster.

Static metadata can be used for various purposes:

Observability: enrichment of access logs and traces with metadata formatters and custom tags.
Processing: provide configuration context to filters in a certain scope (e.g. vhost, route, etc.).

This document describes how Envoy Gateway manages static metadata for various XDS resource such as listeners, virtual hosts, routes, clusters and endpoints.

Configuration

Envoy Gateway propagates certain attributes of Gateway-API resources to XDS resources. Attributes include:

Metadata: Kind, Group/Version, Name, Namespace and Annotations (belonging to the metadata.gateway.envoyproxy.io namespace)
Spec: SectionName (Listener Name, RouteRule Name, Port Name), in-spec annotations (e.g. Gateway Annotations)

Future enhancements may include:

Additional attribute propagation
Supporting section-specific metadata, e.g. HTTPRoute Metadata annotations that are propagated only to a specific route rule XDS metadata.
Supporting additional XDS resource, e.g. endpoints and filter chains.

Translation

Envoy Gateway uses the following namespace for envoy resource metadata: gateway.envoyproxy.io/. For example, an envoy route resource may have the following metadata structure:

Kubernetes resource:

kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  annotations:
    gateway.envoyproxy.io/foo: bar
  name: myroute
  namespace: gateway-conformance-infra
spec:
  rules:
    matches:
    - path:
        type: PathPrefix
        value: /mypath

Metadata structure:

name: httproute/gateway-conformance-infra/myroute/rule/0/match/0/*
match:
  path_separated_prefix: "/mypath"
route:
  cluster: httproute/gateway-conformance-infra/myroute/rule/0
metadata:
  filter_metadata:
    envoy-gateway:
      resources:
        - namespace: gateway-conformance-infra
          groupVersion: gateway.networking.k8s.io/v1
          kind: HTTPRoute
          annotations:
            foo: bar
          name: myroute

Envoy Gateway translates Gateway-API in the following manner:

Gateway metadata is propagated to envoy listener metadata. If merge-gateways is enabled, Gateway Class is used instead.
Gateway metadata and Listener Section name are propagated to envoy virtual host metadata
HTTPRoute and GRPCRoute metadata is propagated to envoy route metadata. When Gateway-API adds support named route rules, the route rule name
TCP/UDPRoute and TLSRoute resource attributes are not propagated. These resources are translated to envoy filter chains, which do not currently support static metadata.
Service, ServiceImport and Backend metadata and port name are propagated to envoy cluster metadata.

Usage

Users can consume metadata in various ways:

Adding metadata to access logs using the metadata operator, e.g. %METADATA(ROUTE:envoy-gateway:resources)
Accessing metadata in CEL expressions through the xds.*_metadata attribute

21 - Rate Limit Design

Overview

Rate limit is a feature that allows the user to limit the number of incoming requests to a predefined value based on attributes within the traffic flow.

Here are some reasons why a user may want to implement Rate limits

To prevent malicious activity such as DDoS attacks.
To prevent applications and its resources (such as a database) from getting overloaded.
To create API limits based on user entitlements.

Scope Types

The rate limit type here describes the scope of rate limits.

Global - In this case, the rate limit is common across all the instances of Envoy proxies where its applied i.e. if the data plane has 2 replicas of Envoy running, and the rate limit is 10 requests/second, this limit is common and will be hit if 5 requests pass through the first replica and 5 requests pass through the second replica within the same second.
Local - In this case, the rate limits are specific to each instance/replica of Envoy running. Note - This is not part of the initial design and will be added as a future enhancement.

Match Types

Rate limit a specific traffic flow

Here is an example of a ratelimit implemented by the application developer to limit a specific user by matching on a custom x-user-id header with a value set to one

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-specific-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - name: x-user-id
            value: one
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-specific-user
    backendRefs:
    - name: backend
      port: 3000

Rate limit all traffic flows

Here is an example of a rate limit implemented by the application developer that limits the total requests made to a specific route to safeguard health of internal application components. In this case, no specific headers match is specified, and the rate limit is applied to all traffic flows accepted by this HTTPRoute.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-all-requests
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 1000
          unit: Second
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-all-requests
    backendRefs:
    - name: backend
      port: 3000

Rate limit per distinct value

Here is an example of a rate limit implemented by the application developer to limit any unique user by matching on a custom x-user-id header. Here, user A (recognised from the traffic flow using the header x-user-id and value a) will be rate limited at 10 requests/hour and so will user B (recognised from the traffic flow using the header x-user-id and value b).

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-per-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-per-user 
    backendRefs:
    - name: backend
      port: 3000

Rate limit per source IP

Here is an example of a rate limit implemented by the application developer that limits the total requests made to a specific route by matching on source IP. In this case, requests from x.x.x.x will be rate limited at 10 requests/hour.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-per-ip
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - sourceIP: x.x.x.x/32
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-per-user 
    backendRefs:
    - name: backend
      port: 3000

Rate limit based on JWT claims

Here is an example of rate limit implemented by the application developer that limits the total requests made to a specific route by matching on the jwt claim. In this case, requests with jwt claim information of {"name":"John Doe"} will be rate limited at 10 requests/hour.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: jwt-example
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  jwt:
    providers:
      - name: example
        remoteJWKS:
          uri: https://raw.githubusercontent.com/envoyproxy/gateway/main/examples/kubernetes/jwt/jwks.json
        claimToHeaders:
        - claim: name
          header: custom-request-header
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-specific-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - name: custom-request-header
            value: John Doe
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - "www.example.com"
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 3000
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /foo

Multiple RateLimitFilters, rules and clientSelectors

Users can create multiple RateLimitFilters and apply it to the same HTTPRoute. In such a case each RateLimitFilter will be applied to the route and matched (and limited) in a mutually exclusive way, independent of each other.
Rate limits are applied for each RateLimitFilter rule when ALL the conditions under clientSelectors hold true.

Here’s an example highlighting this -

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-all-safeguard-app 
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 100
          unit: Hour
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-per-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
        limit:
          requests: 100
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-per-user
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-all-safeguard-app
    backendRefs:
    - name: backend
      port: 3000

The user has created two RateLimitFilters and has attached it to a HTTPRoute - one(ratelimit-all-safeguard-app) to ensure that the backend does not get overwhelmed with requests, any excess requests are rate limited irrespective of the attributes within the traffic flow, and another(ratelimit-per-user) to rate limit each distinct user client who can be differentiated using the x-user-id header, to ensure that each client does not make exessive requests to the backend.
If user baz (identified with the header and value of x-user-id: baz) sends 90 requests within the first second, and user bar sends 11 more requests during that same interval of 1 second, and user bar sends the 101th request within that second, the rule defined in ratelimit-all-safeguard-app gets activated and Envoy Gateway will ratelimit the request sent by bar (and any other request sent within that 1 second). After 1 second, the rate limit counter associated with the ratelimit-all-safeguard-app rule is reset and again evaluated.
If user bar also ends up sending 90 more requests within the hour, summing up bar’s total request count to 101, the rate limit rule defined within ratelimit-per-user will get activated, and bar’s requests will be rate limited again until the hour interval ends.
Within the same above hour, if baz sends 991 more requests, summing up baz’s total request count to 1001, the rate limit rule defined within ratelimit-per-user will get activated for baz, and baz’s requests will also be rate limited until the hour interval ends.

Design Decisions

The initial design uses an Extension filter to apply the Rate Limit functionality on a specific HTTPRoute. This was preferred over the PolicyAttachment extension mechanism, because it is unclear whether Rate Limit will be required to be enforced or overridden by the platform administrator or not.
The RateLimitFilter can only be applied as a filter to a HTTPRouteRule, applying it across all backends within a HTTPRoute and cannot be applied a filter within a HTTPBackendRef for a specific backend.
The HTTPRoute API has a matches field within each rule to select a specific traffic flow to be routed to the destination backend. The RateLimitFilter API that can be attached to an HTTPRoute via an extensionRef filter, also has a clientSelectors field within each rule to select attributes within the traffic flow to rate limit specific clients. The two levels of selectors/matches allow for flexibility and aim to hold match information specific to its use, allowing the author/owner of each configuration to be different. It also allows the clientSelectors field within the RateLimitFilter to be enhanced with other matchable attribute such as IP subnet in the future that are not relevant in the HTTPRoute API.

Implementation Details

Global Rate limiting

Global rate limiting in Envoy Proxy can be achieved using the following -
- Actions can be configured per xDS Route.
- If the match criteria defined within these actions is met for a specific HTTP Request, a set of key value pairs called descriptors defined within the above actions is sent to a remote rate limit service, whose configuration (such as the URL for the rate limit service) is defined using a rate limit filter.
- Based on information received by the rate limit service and its programmed configuration, a decision is computed, whether to rate limit the HTTP Request or not, and is sent back to Envoy, which enforces this decision on the data plane.
Envoy Gateway will leverage this Envoy Proxy feature by -
- Translating the user facing RateLimitFilter API into Rate limit Actions as well as Rate limit service configuration to implement the desired API intent.
- Envoy Gateway will use the existing reference implementation of the rate limit service.
  - The Infrastructure administrator will need to enable the rate limit service using new settings that will be defined in the EnvoyGateway config API.
- The xDS IR will be enhanced to hold the user facing rate limit intent.
- The xDS Translator will be enhanced to translate the rate limit field within the xDS IR into Rate limit Actions as well as instantiate the rate limit filter.
- A new runner called rate-limit will be added that subscribes to the xDS IR messages and translates it into a new Rate Limit Infra IR which contains the rate limit service configuration as well as other information needed to deploy the rate limit service.
- The infrastructure service will be enhanced to subscribe to the Rate Limit Infra IR and deploy a provider specific rate limit service runnable entity.
- A Status field within the RateLimitFilter API will be added to reflect whether the specific configuration was programmed correctly in these multiple locations or not.

22 - Running Envoy Gateway locally

Overview

Today, Envoy Gateway runs only on Kubernetes. This is an ideal solution when the applications are running in Kubernetes. However there might be cases when the applications are running on the host which would require Envoy Gateway to run locally.

Goals

Define an API to allow Envoy Gateway to retrieve configuration while running locally.
Define an API to allow Envoy Gateway to deploy the managed Envoy Proxy fleet on the host machine.

Non Goals

Support multiple ways to retrieve configuration while running locally.
Support multiple ways to deploy the Envoy Proxy fleet locally on the host.

API

The provider field within the EnvoyGateway configuration only supports Kubernetes today which provides two features - the ability to retrieve resources from the Kubernetes API Server as well as deploy the managed Envoy Proxy fleet on Kubernetes.
This document proposes adding a new top level provider type called Custom with two fields called resource and infrastructure to allow the user to configure the sub providers for providing resource configuration and an infrastructure to deploy the Envoy Proxy data plane in.
A File resource provider will be introduced to enable retrieving configuration locally by reading from the configuration from a file.
A Host infrastructure provider will be introduced to allow Envoy Gateway to spawn a Envoy Proxy child process on the host.

Here is an example configuration

provider:
  type: Custom
  custom:
    resource:
      type: File
      file:
        paths: 
        - "config.yaml"
    infrastructure:
      type: Host
      host: {}

23 - SecurityPolicy

Overview

This design document introduces the SecurityPolicy API allowing system administrators to configure authentication and authorization policies to the traffic entering the gateway.

Goals

Add an API definition to hold settings for configuring authentication and authorization rules on the traffic entering the gateway.

Non Goals

Define the API configuration fields in this API.

Implementation

SecurityPolicy is a Policy Attachment type API that can be used to extend Gateway API to define authentication and authorization rules.

Example

Here is an example highlighting how a user can configure this API.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: jwt-authn-policy
  namespace: default
spec:
  jwt:
    providers:
    - name: example
      remoteJWKS:
        uri: https://raw.githubusercontent.com/envoyproxy/gateway/main/examples/kubernetes/jwt/jwks.json
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default

Features / API Fields

Here is a list of features that can be included in this API

JWT based authentication
OIDC Authentication
External Authorization
Basic Auth
API Key Auth
CORS

Design Decisions

This API will only support a single targetRef and can bind to a Gateway resource or a HTTPRoute or GRPCRoute.
This API resource MUST be part of same namespace as the targetRef resource
There can be only be ONE policy resource attached to a specific targetRef e.g. a Listener (section) within a Gateway
If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
If multiple polices target the same resource, the oldest resource (based on creation timestamp) will attach to the Gateway Listeners, the others will not.
If Policy A has a targetRef that includes a sectionName i.e. it targets a specific Listener within a Gateway and Policy B has a targetRef that targets the same entire Gateway then
- Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName
- Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.
A Policy targeting the most specific scope wins over a policy targeting a lesser specific scope. i.e. A Policy targeting a xRoute (HTTPRoute or GRPCRoute) overrides a Policy targeting a Listener that is this route’s parentRef which in turn overrides a Policy targeting the Gateway the listener/section is a part of.

Alternatives

The project can indefinitely wait for these configuration parameters to be part of the Gateway API.

24 - TCP and UDP Proxy Design

Even though most of the use cases for Envoy Gateway are at Layer-7, Envoy Gateway can also work at Layer-4 to proxy TCP and UDP traffic. This document will explore the options we have when operating Envoy Gateway at Layer-4 and explain the design decision.

Envoy can work as a non-transparent proxy or a transparent proxy for both TCP and UDP , so ideally, Envoy Gateway should also be able to work in these two modes:

Non-transparent Proxy Mode

For TCP, Envoy terminates the downstream connection, connects the upstream with its own IP address, and proxies the TCP traffic from the downstream to the upstream.

For UDP, Envoy receives UDP datagrams from the downstream, and uses its own IP address as the sender IP address when proxying the UDP datagrams to the upstream.

In this mode, the upstream will see Envoy’s IP address and port.

Transparent Proxy Mode

For TCP, Envoy terminates the downstream connection, connects the upstream with the downstream IP address, and proxies the TCP traffic from the downstream to the upstream.

For UDP, Envoy receives UDP datagrams from the downstream, and uses the downstream IP address as the sender IP address when proxying the UDP datagrams to the upstream.

In this mode, the upstream will see the original downstream IP address and Envoy’s mac address.

Note: Even in transparent mode, the upstream can’t see the port number of the downstream because Envoy doesn’t forward the port number.

The Implications of Transparent Proxy Mode

Escalated Privilege

Envoy needs to bind to the downstream IP when connecting to the upstream, which means Envoy requires escalated CAP_NET_ADMIN privileges. This is often considered as a bad security practice and not allowed in some sensitive deployments.

Routing

The upstream can see the original source IP, but the original port number won’t be passed, so the return traffic from the upstream must be routed back to Envoy because only Envoy knows how to send the return traffic back to the right port number of the downstream, which requires routing at the upstream side to be set up. In a Kubernetes cluster, Envoy Gateway will have to carefully cooperate with CNI plugins to get the routing right.

The Design Decision (For Now)

The implementation will only support proxying in non-transparent mode i.e. the backend will see the source IP and port of the deployed Envoy instance instead of the client.

25 - Wasm OCI Image Support

Motivation

Envoy Gateway (EG) should support Wasm OCI image as a remote wasm code source. This feature will allow users to deploy Wasm modules from OCI registries, such as Docker Hub, Google Container Registry, and Amazon Elastic Container Registry, to Envoy proxies managed by EG. Deploying Wasm modules from OCI registries has several benefits:

Versioning: Users can use the tag feature of the OCI image to manage the version of the Wasm module.
Security: Users can use private registries to store the Wasm module.
Distribution: Users can use the existing distribution mechanism of the OCI registry to distribute the Wasm module.

Goals

Define the system components needed to support Wasm OCI images as remote Wasm code sources.

Architecture

Control Plane Wasm File Cache

Envoy lacks native OCI image support, therefore, EG needs to download Wasm modules from their original OIC registries, cache them locally in the file system, and serve them to Envoy over HTTP.

HTTP Code Source

For HTTP code source, we have two options: serve Wasm modules directly from their original HTTP URLs, or cache them in EG (as with OCI images). Caching both the HTTP Wasm modules and OCI images inside EG can make UI consistent, for example, sha256sum can be calculated on the EG side and made optional in the API. This will also make the Envoy proxy side more efficient as it won’t have to download the Wasm module from the original URL every time, which can be slow.

Resource Consumption

Memory: Since we cache Wasm modules in the file system, we can optimize the memory usage of the cache and the HTTP server to avoid introducing too much memory consumption by this feature. For example, when receiving a Wasm pulling request form the Envoy, we can open the Wasm file and write the file content directly to response, and then close the file. There won’t be significant memory consumption involved.

Disk: Though it’s possible to mount a volume to the container for the Wasm file cache, the current implementation just stores the Wasm files in the EG container’s file system. The disk space consumed by the cache is limited to 1GB by default, it can be made configurable in the future.

Caching Mechanism

Cached files will be evicted based on LRU（Last recently used）algorithm. If the image’s tag is latest, then they will be updated periodically. The cache clean and update periods will be configurable.

Restrict Access to Private Images

Client Authn with mTLS: To prevent unauthorized proxies from accessing the Wasm modules, the communication between the Envoy and EG will be secured using mTLS.
User Authn with Registry Credentials: To prevent unauthorized users from accessing the Wasm modules, the user who creates the EEP must have the appropriate permissions to access the OCI registry. For example, if two users create EEPs in different namespaces (ns1, ns2) accessing the same OCI image, each must also create a unique secret with registry credentials (secret1 for user1 in ns1, secret2 for user2 in ns2) and provide it in the EEP configuration. EG will validate the provided secret against the OCI registry before serving the Wasm module to the target HTTPRoute/Gateway of that EEP.
Unguessable Download URLs: It’s possible that users who have no permission to access a private OCI image could create an EnvoyPatchPolicy to bypass the EG permission check. For example, a user could create an EPP, inject a Wasm filter, and put the download URL of a private OCI image in the Wasm filter configuration. To prevent this, we need to make the download URL unguessable. The download URL will be generated by EG and will be a random string that is impossible to guess. If a user can get the config dump the Envoy Proxy, they can still get the download URL. However, they won’t be able to do so without the permission to access the config dump of Envoy Proxy, which is a more restricted permission (usually Admin role).

Alternative Considered

Inline Bytes

EG downloads the Wasm modules, caches them in memory, and pushes the Wasm code through xDS as inline bytes. This could inflate the xDS, potentially causing memory issues for EG and Envoy.

Data Plane Agent

Mount Wasm modules in the local file system of the Envoy container. We’ll need an agent deployed in the same pod as the Envoy for this. It would be too expensive to implement this as we’ll need to intercept the xDS at the agent.

Standalone Wasm HTTP Server

Deploying the Wasm HTTP server as a standalone service. While this has no obvious benefits, it increases operational costs.

Wait for Envoy OCI image support

We could wait indefinitely for Envoy to support OCI imag as a remote wasm code source.