Design
- 1: Goals
- 2: System Design
- 3: Watching Components Design
- 4: Gateway API Translator Design
- 5: Control Plane Observability: Metrics
- 6: Backend
- 7: BackendTrafficPolicy
- 8: Bootstrap Design
- 9: ClientTrafficPolicy
- 10: Configuration API Design
- 11: Data Plane Observability: Accesslog
- 12: Data Plane Observability: Metrics
- 13: Data Plane Observability: Tracing
- 14: Debug support in Envoy Gateway
- 15: egctl Design
- 16: Envoy Gateway Extensions Design
- 17: EnvoyExtensionPolicy
- 18: EnvoyPatchPolicy
- 19: Metadata in XDS resources
- 20: Rate Limit Design
- 21: Running Envoy Gateway locally
- 22: SecurityPolicy
- 23: TCP and UDP Proxy Design
- 24: Wasm OCI Image Support
1 - Goals
The high-level goal of the Envoy Gateway project is to attract more users to Envoy by lowering barriers to adoption through expressive, extensible, role-oriented APIs that support a multitude of ingress and L7/L4 traffic routing use cases; and provide a common foundation for vendors to build value-added products without having to re-engineer fundamental interactions.
Objectives
Expressive API
The Envoy Gateway project will expose a simple and expressive API, with defaults set for many capabilities.
The API will be the Kubernetes-native Gateway API, plus Envoy-specific extensions and extension points. This expressive and familiar API will make Envoy accessible to more users, especially application developers, and make Envoy a stronger option for “getting started” as compared to other proxies. Application developers will use the API out of the box without needing to understand in-depth concepts of Envoy Proxy or use OSS wrappers. The API will use familiar nouns that users understand.
The core full-featured Envoy xDS APIs will remain available for those who need more capability and for those who add functionality on top of Envoy Gateway, such as commercial API gateway products.
This expressive API will not be implemented by Envoy Proxy, but rather an officially supported translation layer on top.
Batteries included
Envoy Gateway will simplify how Envoy is deployed and managed, allowing application developers to focus on delivering core business value.
The project plans to include additional infrastructure components required by users to fulfill their Ingress and API gateway needs: It will handle Envoy infrastructure provisioning (e.g. Kubernetes Service, Deployment, et cetera), and possibly infrastructure provisioning of related sidecar services. It will include sensible defaults with the ability to override. It will include channels for improving ops by exposing status through API conditions and Kubernetes status sub-resources.
Making an application accessible needs to be a trivial task for any developer. Similarly, infrastructure administrators will enjoy a simplified management model that doesn’t require extensive knowledge of the solution’s architecture to operate.
All environments
Envoy Gateway will support running natively in Kubernetes environments as well as non-Kubernetes deployments.
Initially, Kubernetes will receive the most focus, with the aim of having Envoy Gateway become the de facto standard for Kubernetes ingress supporting the Gateway API. Additional goals include multi-cluster support and various runtime environments.
Extensibility
Vendors will have the ability to provide value-added products built on the Envoy Gateway foundation.
It will remain easy for end-users to leverage common Envoy Proxy extension points such as providing an implementation for authentication methods and rate-limiting. For advanced use cases, users will have the ability to use the full power of xDS.
Since a general-purpose API cannot address all use cases, Envoy Gateway will provide additional extension points for flexibility. As such, Envoy Gateway will form the base of vendor-provided managed control plane solutions, allowing vendors to shift to a higher management plane layer.
Non-objectives
Cannibalize vendor models
Vendors need to have the ability to drive commercial value, so the goal is not to cannibalize any existing vendor monetization model, though some vendors may be affected by it.
Disrupt current Envoy usage patterns
Envoy Gateway is purely an additive convenience layer and is not meant to disrupt any usage pattern of any user with Envoy Proxy, xDS, or go-control-plane.
Personas
In order of priority
1. Application developer
The application developer spends the majority of their time developing business logic code. They require the ability to manage access to their application.
2. Infrastructure administrators
The infrastructure administrators are responsible for the installation, maintenance, and operation of API gateway appliances in infrastructure, such as CRDs, roles, service accounts, certificates, etc. Infrastructure administrators support the needs of application developers by managing instances of Envoy Gateway.
2 - System Design
Goals
- Define the system components needed to satisfy the requirements of Envoy Gateway.
Non-Goals
- Create a detailed design and interface specification for each system component.
Terminology
- Control Plane- A collection of inter-related software components for providing application gateway and routing functionality. The control plane is implemented by Envoy Gateway and provides services for managing the data plane. These services are detailed in the components section.
- Data Plane- Provides intelligent application-level traffic routing and is implemented as one or more Envoy proxies.
Architecture
Configuration
Envoy Gateway is configured statically at startup and the managed data plane is configured dynamically through Kubernetes resources, primarily Gateway API objects.
Static Configuration
Static configuration is used to configure Envoy Gateway at startup, e.g. change the GatewayClass controllerName, configure a Provider, etc. Currently, Envoy Gateway only supports configuration through a configuration file. If the configuration file is not provided, Envoy Gateway starts up with default configuration parameters.
Dynamic Configuration
Dynamic configuration is based on the concept of declaring the desired state of the data plane and using reconciliation loops to drive the actual state toward the desired state. The desired state of the data plane is defined as Kubernetes resources that provide the following services:
- Infrastructure Management- Manage the data plane infrastructure, i.e. deploy, upgrade, etc. This configuration is expressed through GatewayClass and Gateway resources. The EnvoyProxy Custom Resource can be referenced by gatewayclass.spec.parametersRef to modify data plane infrastructure default parameters, e.g. expose Envoy network endpoints using a ClusterIP service instead of a LoadBalancer service.
- Traffic Routing- Define how to handle application-level requests to backend services. For example, route all HTTP requests for “www.example.com” to a backend service running a web server. This configuration is expressed through HTTPRoute and TLSRoute resources that match, filter, and route traffic to a backend. Although a backend can be any valid Kubernetes Group/Kind resource, Envoy Gateway only supports a Service reference.
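The reconciliation idea behind dynamic configuration can be illustrated with a minimal sketch. The resource names and the flat name-to-spec maps are hypothetical simplifications, not Envoy Gateway types; the point is only that each pass compares desired state with actual state and derives the actions needed to converge.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares the desired and actual state (hypothetical name -> spec
// maps) and returns the actions needed to drive actual toward desired.
func reconcile(desired, actual map[string]string) []string {
	var actions []string
	for name, spec := range desired {
		current, exists := actual[name]
		switch {
		case !exists:
			actions = append(actions, "create "+name)
		case current != spec:
			actions = append(actions, "update "+name)
		}
	}
	// Anything that exists but is no longer desired is removed.
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			actions = append(actions, "delete "+name)
		}
	}
	sort.Strings(actions) // deterministic order for display
	return actions
}

func main() {
	desired := map[string]string{"envoy-deployment": "replicas=2", "envoy-service": "type=ClusterIP"}
	actual := map[string]string{"envoy-deployment": "replicas=1", "stale-configmap": "v1"}
	for _, action := range reconcile(desired, actual) {
		fmt.Println(action)
	}
}
```

A real controller would re-run this comparison whenever a watched resource changes, rather than once.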
Components
Envoy Gateway is made up of several components that communicate in-process; how this communication happens is described in the Watching Components Design.
Provider
A Provider is an infrastructure component that Envoy Gateway calls to establish its runtime configuration, resolve services, persist data, etc. As of v0.2, Kubernetes is the only implemented provider. A file provider is on the roadmap via Issue #37. Other providers can be added in the future as Envoy Gateway use cases are better understood. A provider is configured at start up through Envoy Gateway’s static configuration.
Kubernetes Provider
- Uses Kubernetes-style controllers to reconcile Kubernetes resources that comprise the dynamic configuration.
- Manages the data plane through Kubernetes API CRUD operations.
- Uses Kubernetes for Service discovery.
- Uses etcd (via Kubernetes API) to persist data.
File Provider
- Uses a file watcher to watch files in a directory that define the data plane configuration.
- Manages the data plane by calling internal APIs, e.g. CreateDataPlane().
- Uses the host’s DNS for Service discovery.
- If needed, the local filesystem is used to persist data.
Resource Watcher
The Resource Watcher watches resources used to establish and maintain Envoy Gateway’s dynamic configuration. The mechanics for watching resources are provider-specific, e.g. informers, caches, etc. are used for the Kubernetes provider. The Resource Watcher uses the configured provider for input and provides resources to the Resource Translator as output.
Resource Translator
The Resource Translator translates external resources, e.g. GatewayClass, from the Resource Watcher to the Intermediate Representation (IR). It is responsible for:
- Translating infrastructure-specific resources/fields from the Resource Watcher to the Infra IR.
- Translating proxy configuration resources/fields from the Resource Watcher to the xDS IR.
Note: The Resource Translator is implemented as the Translator API type in the gatewayapi package.
Intermediate Representation (IR)
The Intermediate Representation defines internal data models that external resources are translated into. This allows Envoy Gateway to be decoupled from the external resources used for dynamic configuration. The IR consists of an Infra IR used as input for the Infra Manager and an xDS IR used as input for the xDS Translator.
- Infra IR- Used as the internal definition of the managed data plane infrastructure.
- xDS IR- Used as the internal definition of the managed data plane xDS configuration.
xDS Translator
The xDS Translator translates the xDS IR into xDS Resources that are consumed by the xDS server.
xDS Server
The xDS Server is an xDS gRPC Server based on Go Control Plane. Go Control Plane implements the Delta xDS Server Protocol and is responsible for using xDS to configure the data plane.
Infra Manager
The Infra Manager is a provider-specific component responsible for managing the following infrastructure:
- Data Plane - Manages all the infrastructure required to run the managed Envoy proxies. For example, CRUD Deployment, Service, etc. resources to run Envoy in a Kubernetes cluster.
- Auxiliary Control Planes - Optional infrastructure needed to implement application Gateway features that require external integrations with the managed Envoy proxies. For example, Global Rate Limiting requires provisioning and configuring the Envoy Rate Limit Service and the Rate Limit filter. Such features are exposed to users through the Custom Route Filters extension.
The Infra Manager consumes the Infra IR as input to manage the data plane infrastructure.
Design Decisions
- Envoy Gateway can consume multiple GatewayClass resources by comparing its configured controller name with spec.controllerName of a GatewayClass. gatewayclass.spec.parametersRef refers to the EnvoyProxy custom resource for configuring the managed proxy infrastructure. If unspecified, default configuration parameters are used for the managed proxy infrastructure.
- Envoy Gateway manages Gateways that reference its GatewayClass.
- A Gateway resource causes Envoy Gateway to provision managed Envoy proxy infrastructure.
- Envoy Gateway groups Listeners by Port and collapses each group of Listeners into a single Listener if the Listeners
in the group are compatible. Envoy Gateway considers Listeners to be compatible if all the following conditions are
met:
- Either each Listener within the group specifies the “HTTP” Protocol or each Listener within the group specifies either the “HTTPS” or “TLS” Protocol.
- Each Listener within the group specifies a unique “Hostname”.
- As a special case, one Listener within a group may omit “Hostname”, in which case this Listener matches when no other Listener matches.
- Envoy Gateway does not merge listeners across multiple Gateways.
- Envoy Gateway follows Gateway API guidelines to resolve any conflicts.
- A Gateway listener corresponds to an Envoy proxy Listener.
- An HTTPRoute resource corresponds to an Envoy proxy Route.
- Each backendRef corresponds to an Envoy proxy Cluster.
- The goal is to make Envoy Gateway components extensible in the future. See the roadmap for additional details.
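As an illustration of the first two design decisions above, the following sketch selects the Gateways a controller would manage by matching spec.controllerName. The GatewayClass and Gateway structs are trimmed-down, hypothetical stand-ins for the Gateway API types, and managedGateways is an illustrative helper rather than an Envoy Gateway function.

```go
package main

import "fmt"

// GatewayClass and Gateway are simplified, hypothetical stand-ins for the
// Gateway API types; only the fields needed for selection are kept.
type GatewayClass struct {
	Name           string
	ControllerName string
}

type Gateway struct {
	Name      string
	ClassName string
}

// managedGateways returns the names of the Gateways whose GatewayClass
// names this controller in its controllerName field.
func managedGateways(controllerName string, classes []GatewayClass, gateways []Gateway) []string {
	owned := make(map[string]bool)
	for _, class := range classes {
		if class.ControllerName == controllerName {
			owned[class.Name] = true
		}
	}
	var names []string
	for _, gw := range gateways {
		if owned[gw.ClassName] {
			names = append(names, gw.Name)
		}
	}
	return names
}

func main() {
	classes := []GatewayClass{
		{Name: "eg", ControllerName: "gateway.envoyproxy.io/gatewayclass-controller"},
		{Name: "other", ControllerName: "example.com/other-controller"},
	}
	gateways := []Gateway{{Name: "gw-1", ClassName: "eg"}, {Name: "gw-2", ClassName: "other"}}
	fmt.Println(managedGateways("gateway.envoyproxy.io/gatewayclass-controller", classes, gateways))
}
```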
The draft for this document is here.
3 - Watching Components Design
Envoy Gateway is made up of several components that communicate in-process. Some of them (namely Providers) watch external resources and “publish” what they see for other components to consume. Others watch what another component publishes and act on it; for example, the resource translator watches what the providers publish and then publishes its own results, which are in turn watched by another component. Some of these internally published results are consumed by multiple components.
To facilitate this communication, Envoy Gateway uses the watchable library. The watchable.Map type is very similar to the standard library’s sync.Map type, but supports a .Subscribe (and .SubscribeSubset) method that promotes a pub/sub pattern.
Pub
Many of the things we communicate around are naturally named, either by a bare “name” string or by a “name”/“namespace” tuple. And because watchable.Map is typed, it makes sense to have one map for each type of thing (very similar to if we were using native Go maps). For example, a struct that might be written to by the Kubernetes provider, and read by the IR translator:
type ResourceTable struct {
	// gateway classes are cluster-scoped; no namespace
	GatewayClasses watchable.Map[string, *gwapiv1.GatewayClass]

	// gateways are namespace-scoped, so use a k8s.io/apimachinery/pkg/types.NamespacedName as the map key.
	Gateways watchable.Map[types.NamespacedName, *gwapiv1.Gateway]

	HTTPRoutes watchable.Map[types.NamespacedName, *gwapiv1.HTTPRoute]
}
The Kubernetes provider updates the table by calling table.Thing.Store(name, val) and table.Thing.Delete(name). Updating a map key with a value that is deep-equal to the current value (usually per reflect.DeepEqual, but you can implement your own .Equal method) is a no-op; it won’t trigger an event for subscribers. This is handy because the publisher doesn’t have as much state to keep track of: it doesn’t need to know “did I already publish this thing”, it can just .Store its data and watchable will do the right thing.
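The no-op behavior on deep-equal stores can be illustrated with a toy map built only on the standard library. This is not the watchable implementation or its API; dedupMap and its events counter are hypothetical names used to show why a publisher can call Store unconditionally.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// dedupMap is a toy stand-in for the publishing side of a watchable-style
// map: a Store of a deep-equal value is skipped and raises no event.
type dedupMap[K comparable, V any] struct {
	mu     sync.Mutex
	data   map[K]V
	events int // number of stores that actually changed the map
}

// Store writes val under key and reports whether the map changed.
func (m *dedupMap[K, V]) Store(key K, val V) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.data == nil {
		m.data = make(map[K]V)
	}
	if old, ok := m.data[key]; ok && reflect.DeepEqual(old, val) {
		return false // unchanged: subscribers would not be notified
	}
	m.data[key] = val
	m.events++
	return true
}

func main() {
	var routes dedupMap[string, []string]
	fmt.Println(routes.Store("default/web", []string{"a.example.com"}))                  // first write changes the map
	fmt.Println(routes.Store("default/web", []string{"a.example.com"}))                  // deep-equal write is a no-op
	fmt.Println(routes.Store("default/web", []string{"a.example.com", "b.example.com"})) // changed value is stored
	fmt.Println(routes.events)
}
```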
Sub
Meanwhile, the translator and other interested components subscribe to it with table.Thing.Subscribe (or table.Thing.SubscribeSubset if they only care about a few “Thing”s). So the translator goroutine might look like:
func(ctx context.Context) error {
	for snapshot := range k8sTable.HTTPRoutes.Subscribe(ctx) {
		fullState := irInput{
			GatewayClasses: k8sTable.GatewayClasses.LoadAll(),
			Gateways:       k8sTable.Gateways.LoadAll(),
			HTTPRoutes:     snapshot.State,
		}
		// Translate the combined view, not the irInput type itself.
		translate(fullState)
	}
	return nil
}
Or, to watch multiple maps in the same loop:
func worker(ctx context.Context) error {
	classCh := k8sTable.GatewayClasses.Subscribe(ctx)
	gwCh := k8sTable.Gateways.Subscribe(ctx)
	routeCh := k8sTable.HTTPRoutes.Subscribe(ctx)
	for ctx.Err() == nil {
		var arg irInput
		// Take the full state of whichever map changed from its snapshot.
		select {
		case snapshot := <-classCh:
			arg.GatewayClasses = snapshot.State
		case snapshot := <-gwCh:
			arg.Gateways = snapshot.State
		case snapshot := <-routeCh:
			arg.HTTPRoutes = snapshot.State
		}
		// Read the maps that did not trigger this iteration explicitly.
		if arg.GatewayClasses == nil {
			arg.GatewayClasses = k8sTable.GatewayClasses.LoadAll()
		}
		if arg.Gateways == nil {
			arg.Gateways = k8sTable.Gateways.LoadAll()
		}
		if arg.HTTPRoutes == nil {
			arg.HTTPRoutes = k8sTable.HTTPRoutes.LoadAll()
		}
		translate(arg)
	}
	return nil
}
From the updates it gets from .Subscribe, it can get a full view of the map being subscribed to via snapshot.State; but it must read the other maps explicitly. Like sync.Map, watchable.Maps are thread-safe; while .Subscribe is a handy way to know when to run, .Load and friends can be used without subscribing.
There can be any number of subscribers. For that matter, there can be any number of publishers .Store-ing things, but it’s probably wise to just have one publisher for each map.
The channel returned from .Subscribe is immediately readable with a snapshot of the map as it existed when .Subscribe was called; and becomes readable again whenever .Store or .Delete mutates the map. If multiple mutations happen between reads (or if mutations happen between .Subscribe and the first read), they are coalesced into one snapshot to be read; the snapshot.State is the most-recent full state, and snapshot.Updates is a listing of each of the mutations that cause this snapshot to be different than the last-read one. This way subscribers don’t need to worry about a backlog accumulating if they can’t keep up with the rate of changes from the publisher.
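The coalescing behavior can be sketched with a plain channel of capacity one. This is only an illustration of the idea and not how watchable is implemented: the publish helper is hypothetical, it assumes a single publisher, and it tracks only the latest full state (the real snapshot also carries the list of updates).

```go
package main

import "fmt"

// publish delivers the newest state on a channel of capacity 1, replacing
// any snapshot the subscriber has not read yet. It assumes one publisher.
func publish(ch chan []string, state []string) {
	select {
	case <-ch: // drop the stale, unread snapshot
	default: // nothing pending
	}
	ch <- state // never blocks: the buffer slot is free at this point
}

func main() {
	ch := make(chan []string, 1)

	// Three mutations happen before the subscriber gets a chance to read.
	publish(ch, []string{"route-a"})
	publish(ch, []string{"route-a", "route-b"})
	publish(ch, []string{"route-b"})

	fmt.Println(<-ch)    // only the most recent state is delivered
	fmt.Println(len(ch)) // no backlog remains
}
```

Because stale snapshots are replaced rather than queued, a slow subscriber always resumes from the current state.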
If the map contains anything before .Subscribe is called, that very first read won’t include snapshot.Updates entries for those pre-existing items; if you are working with snapshot.Updates instead of snapshot.State, then you must add special handling for your first read. We have a utility function ./internal/message.HandleSubscription to help with this.
Other Notes
The common pattern will likely be that the entrypoint that launches the goroutines for each component instantiates the maps and passes them to the appropriate publishers and subscribers, the same as if they were communicating via a dumb chan.
A limitation of watchable.Map is that in order to ensure safety between goroutines, it does require that value types be deep-copiable; either by having a DeepCopy method, being a proto.Message, or by containing no reference types and so can be deep-copied by naive assignment. Fortunately, we’re using controller-gen anyway, and controller-gen can generate DeepCopy methods for us: just stick a // +k8s:deepcopy-gen=true on the types that you want it to generate methods for.
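To show why deep-copiability matters, here is a small sketch with a hand-written DeepCopy method. RouteInfo is a hypothetical type; in practice controller-gen would generate an equivalent method. A plain struct assignment still shares the underlying slice, while the deep copy does not.

```go
package main

import "fmt"

// RouteInfo is a hypothetical value type that contains a reference type
// (a slice), so naive assignment does not produce an independent copy.
type RouteInfo struct {
	Name      string
	Hostnames []string
}

// DeepCopy returns a copy that shares no memory with the receiver.
func (r *RouteInfo) DeepCopy() *RouteInfo {
	if r == nil {
		return nil
	}
	out := &RouteInfo{Name: r.Name}
	if r.Hostnames != nil {
		out.Hostnames = make([]string, len(r.Hostnames))
		copy(out.Hostnames, r.Hostnames)
	}
	return out
}

func main() {
	orig := &RouteInfo{Name: "web", Hostnames: []string{"a.example.com"}}
	shallow := *orig        // copies the slice header, not the elements
	deep := orig.DeepCopy() // copies the elements too

	orig.Hostnames[0] = "changed.example.com"

	fmt.Println(shallow.Hostnames[0]) // sees the mutation
	fmt.Println(deep.Hostnames[0])    // unaffected by the mutation
}
```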
4 - Gateway API Translator Design
The Gateway API translator translates external resources, e.g. GatewayClass, from the configured Provider to the Intermediate Representation (IR).
Assumptions
Initially target core conformance features only, to be followed by extended conformance features.
Inputs and Outputs
The main inputs to the Gateway API translator are:
- GatewayClass, Gateway, HTTPRoute, TLSRoute, Service, ReferenceGrant, Namespace, and Secret resources.
Note: ReferenceGrant is not fully implemented as of v0.2.
The outputs of the Gateway API translator are:
- xDS and Infra Intermediate Representations (IRs).
- Status updates for GatewayClass, Gateway, and HTTPRoute resources.
Listener Compatibility
Envoy Gateway follows the Gateway API listener compatibility spec:
Each listener in a Gateway must have a unique combination of Hostname, Port, and Protocol. An implementation MAY group Listeners by Port and then collapse each group of Listeners into a single Listener if the implementation determines that the Listeners in the group are “compatible”.
Note: Envoy Gateway does not collapse listeners across multiple Gateways.
Listener Compatibility Examples
Example 1: Gateway with compatible Listeners (same port & protocol, different hostnames)
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: "*.envoygateway.io"
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: whales.envoygateway.io
Example 2: Gateway with compatible Listeners (same port & protocol, one hostname specified, one not)
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: "*.envoygateway.io"
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
Example 3: Gateway with incompatible Listeners (same port, protocol and hostname)
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: whales.envoygateway.io
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
      hostname: whales.envoygateway.io
Example 4: Gateway with incompatible Listeners (neither specify a hostname)
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: gateway-1
  namespace: envoy-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
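The four examples above can be checked against a minimal sketch of the compatibility rules from the System Design section (same protocol family, unique hostnames, at most one listener without a hostname). The Listener struct and the compatible function are hypothetical simplifications, not the translator's actual code, and only the HTTP, HTTPS, and TLS protocols named in those rules are handled.

```go
package main

import "fmt"

// Listener is a simplified, hypothetical stand-in for a Gateway listener.
// All listeners passed to compatible are assumed to share the same port.
type Listener struct {
	Protocol string // "HTTP", "HTTPS", or "TLS"
	Hostname string // empty means no hostname was specified
}

// protocolFamily groups protocols that may share a port: HTTP on its own,
// HTTPS and TLS together. Other protocols are out of scope for this sketch.
func protocolFamily(protocol string) string {
	switch protocol {
	case "HTTP":
		return "plaintext"
	case "HTTPS", "TLS":
		return "tls"
	}
	return ""
}

// compatible reports whether a same-port group of listeners can be
// collapsed into a single listener.
func compatible(group []Listener) bool {
	if len(group) == 0 {
		return true
	}
	family := protocolFamily(group[0].Protocol)
	if family == "" {
		return false
	}
	seen := make(map[string]bool)
	for _, l := range group {
		if protocolFamily(l.Protocol) != family {
			return false
		}
		// A repeated hostname, including a second empty one, is a conflict.
		if seen[l.Hostname] {
			return false
		}
		seen[l.Hostname] = true
	}
	return true
}

func main() {
	examples := [][]Listener{
		{{"HTTP", "*.envoygateway.io"}, {"HTTP", "whales.envoygateway.io"}},
		{{"HTTP", "*.envoygateway.io"}, {"HTTP", ""}},
		{{"HTTP", "whales.envoygateway.io"}, {"HTTP", "whales.envoygateway.io"}},
		{{"HTTP", ""}, {"HTTP", ""}},
	}
	for i, group := range examples {
		fmt.Printf("Example %d compatible: %v\n", i+1, compatible(group))
	}
}
```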
Computing Status
Gateway API specifies a rich set of status fields & conditions for each resource. To achieve conformance, Envoy Gateway must compute the appropriate status fields and conditions for managed resources.
Status is computed and set for:
- The managed GatewayClass (gatewayclass.status.conditions).
- Each managed Gateway, based on its Listeners’ status (gateway.status.conditions). For the Kubernetes provider, the Envoy Deployment and Service status are also included to calculate Gateway status.
- Listeners for each Gateway (gateway.status.listeners).
- The ParentRef for each Route (route.status.parents).
The Gateway API translator is responsible for calculating status conditions while translating Gateway API resources to the IR and publishing status over the message bus. The Status Manager subscribes to these status messages and updates the resource status using the configured provider. For example, the Status Manager uses a Kubernetes client to update resource status on the Kubernetes API server.
Outline
The following roughly outlines the translation process. Each step may produce (1) IR; and (2) status updates on Gateway API resources.
Process Gateway Listeners
- Validate unique hostnames, ports, and protocols.
- Validate and compute supported kinds.
- Validate allowed namespaces (validate selector if specified).
- Validate TLS fields if specified, including resolving referenced Secrets.
Process HTTPRoutes
- foreach route rule:
  - compute matches
    - [core] path exact, path prefix
    - [core] header exact
    - [extended] query param exact
    - [extended] HTTP method
  - compute filters
    - [core] request header modifier (set/add/remove)
    - [core] request redirect (hostname, statuscode)
    - [extended] request mirror
  - compute backends
    - [core] Kubernetes services
- foreach route parent ref:
  - get matching listeners (check Gateway, section name, listener validation status, listener allowed routes, hostname intersection)
  - foreach matching listener:
    - foreach hostname intersection with route:
      - add each computed route rule to host
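The "hostname intersection" step in the outline can be sketched as follows. This is a simplified illustration under stated assumptions, not the translator's actual code: intersect and wildcardMatches are hypothetical helpers, hostnames are assumed to be lowercase and already validated, and a listener without a hostname is treated as matching everything.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardMatches reports whether host falls under a wildcard pattern such
// as "*.example.com". The bare parent domain ("example.com") does not match.
func wildcardMatches(pattern, host string) bool {
	suffix := strings.TrimPrefix(pattern, "*") // e.g. ".example.com"
	return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
}

// intersect returns the hostnames that both the listener and the route
// accept. An empty listener hostname matches every route hostname.
func intersect(listenerHost string, routeHosts []string) []string {
	if len(routeHosts) == 0 {
		if listenerHost == "" {
			return []string{"*"} // neither side restricts the hostname
		}
		return []string{listenerHost}
	}
	var out []string
	for _, h := range routeHosts {
		switch {
		case listenerHost == "" || listenerHost == h:
			out = append(out, h)
		case strings.HasPrefix(listenerHost, "*.") && wildcardMatches(listenerHost, h):
			out = append(out, h) // the route hostname is the more specific one
		case strings.HasPrefix(h, "*.") && wildcardMatches(h, listenerHost):
			out = append(out, listenerHost) // the listener is the more specific one
		}
	}
	return out
}

func main() {
	fmt.Println(intersect("*.envoygateway.io", []string{"whales.envoygateway.io", "other.example.com"}))
	fmt.Println(intersect("whales.envoygateway.io", []string{"*.envoygateway.io"}))
	fmt.Println(intersect("", []string{"a.example.com"}))
}
```

Each hostname returned would then receive the route rules computed in the first loop of the outline.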
Context Structs
To help store, access and manipulate information as it’s processed during the translation process, a set of context structs are used. These structs wrap a given Gateway API type, and add additional fields and methods to support processing.
GatewayContext wraps a Gateway and provides helper methods for setting conditions, accessing Listeners, etc.
type GatewayContext struct {
	// The managed Gateway
	*v1beta1.Gateway

	// A list of Gateway ListenerContexts.
	listeners []*ListenerContext
}
ListenerContext wraps a Listener and provides helper methods for setting conditions and other status information on the associated Gateway.
type ListenerContext struct {
	// The Gateway listener.
	*v1beta1.Listener

	// The Gateway this Listener belongs to.
	gateway *v1beta1.Gateway

	// An index used for managing this listener in the list of Gateway listeners.
	listenerStatusIdx int

	// Only Routes in namespaces selected by the selector may be attached
	// to the Gateway this listener belongs to.
	namespaceSelector labels.Selector

	// The TLS Secret for this Listener, if applicable.
	tlsSecret *v1.Secret
}
RouteContext represents a generic Route object (HTTPRoute, TLSRoute, etc.) that can reference Gateway objects.
type RouteContext interface {
	client.Object

	// GetRouteType returns the Kind of the Route object, HTTPRoute,
	// TLSRoute, TCPRoute, UDPRoute etc.
	GetRouteType() string

	// GetHostnames returns the hosts targeted by the Route object.
	GetHostnames() []string

	// GetParentReferences returns the ParentReference of the Route object.
	GetParentReferences() []v1beta1.ParentReference

	// GetRouteParentContext returns RouteParentContext by using the Route
	// objects' ParentReference.
	GetRouteParentContext(forParentRef v1beta1.ParentReference) *RouteParentContext
}
5 - Control Plane Observability: Metrics
This document aims to cover all aspects of Envoy Gateway control plane metrics observability.
Note
Data plane observability (while important) is outside the scope of this document. For data plane observability, refer to here.
Current State
At present, the Envoy Gateway control plane provides logs and controller-runtime metrics, without traces. Logs are managed through our proprietary library (internal/logging, a shim to zap) and are written to /dev/stdout.
Goals
Our objectives include:
- Supporting PULL mode for Prometheus metrics and exposing these metrics on the admin address.
- Supporting PUSH mode for Prometheus metrics, thereby sending metrics to the Open Telemetry Stats sink via gRPC or HTTP.
Non-Goals
Our non-goals include:
- Supporting other stats sinks.
Use-Cases
The use-cases include:
- Exposing Prometheus metrics in the Envoy Gateway Control Plane.
- Pushing Envoy Gateway Control Plane metrics via the Open Telemetry Sink.
Design
Standards
Our metrics will be built upon the OpenTelemetry standards. All metrics will be configured via the OpenTelemetry SDK, which offers neutral libraries that can be connected to various backends.
This approach allows the Envoy Gateway code to concentrate on the crucial aspect - generating the metrics - and delegate all other tasks to systems designed for telemetry ingestion.
Attributes
OpenTelemetry defines a set of Semantic Conventions, including Kubernetes specific ones.
These attributes can be expressed in logs (as keys of structured logs), traces (as attributes), and metrics (as labels).
We aim to use attributes consistently where applicable. Where possible, these should adhere to codified Semantic Conventions; when not possible, they should maintain consistency across the project.
Extensibility
Envoy Gateway supports both PULL and PUSH mode metrics, with metrics exported via Prometheus by default.
Additionally, Envoy Gateway can export metrics using both the OTEL gRPC metrics exporter and the OTEL HTTP metrics exporter, which push metrics over gRPC/HTTP to a remote OTEL collector.
Users can extend these in two ways:
Downstream Collection
Based on the exported data, other tools can collect, process, and export telemetry as needed. Some examples include:
- Metrics in PULL mode: The OTEL collector can scrape Prometheus and export to X.
- Metrics in PUSH mode: The OTEL collector can receive OTEL gRPC/HTTP exporter metrics and export to X.
While the examples above involve OTEL collectors, there are numerous other systems available.
Vendor extensions
The OTEL libraries allow for the registration of Providers/Handlers. While we will offer the default ones (PULL via Prometheus, PUSH via OTEL HTTP metrics exporter) mentioned in Envoy Gateway’s extensibility, we can easily allow custom builds of Envoy Gateway to plug in alternatives if the default options don’t meet their needs.
For instance, users may prefer to write metrics over the OTLP gRPC metrics exporter instead of the HTTP metrics exporter. This is perfectly acceptable, and almost impossible to prevent. OTEL provides ways to register providers/exporters, and Envoy Gateway can ensure its usage is such that it’s not overly difficult to swap out a different provider/exporter.
Stability
Observability is, in essence, a user-facing API. Its primary purpose is to be consumed - by both humans and tooling. Therefore, having well-defined guarantees around their formats is crucial.
Please note that this refers only to the contents of the telemetry - what we emit, the names of things, semantics, etc. Other settings like Prometheus vs OTLP, JSON vs plaintext, logging levels, etc., are not considered.
I propose the following:
Metrics
Metrics offer the greatest potential for providing guarantees. They often directly influence alerts and dashboards, making changes highly impactful. This contrasts with traces and logs, which are often used for ad-hoc analysis, where minor changes to information can be easily understood by a human.
Moreover, there is precedent for this: Kubernetes Metrics Lifecycle has well-defined processes, and Envoy Gateway’s dataplane (Envoy Proxy) metrics are de facto stable.
Currently, all Envoy Gateway metrics lack defined stability. I suggest we categorize all existing metrics as either:
- Deprecated: a metric that is intended to be phased out.
- Experimental: a metric that is off by default.
- Alpha: a metric that is on by default.
We should aim to promote a core set of metrics to Stable within a few releases.
Envoy Gateway API Types
New APIs will be added to Envoy Gateway config, which are used to manage Control Plane Telemetry bootstrap configs.
EnvoyGatewayTelemetry
// EnvoyGatewayTelemetry defines telemetry configurations for envoy gateway control plane.
// Control plane will focus on metrics observability telemetry and tracing telemetry later.
type EnvoyGatewayTelemetry struct {
// Metrics defines metrics configuration for envoy gateway.
Metrics *EnvoyGatewayMetrics `json:"metrics,omitempty"`
}
EnvoyGatewayMetrics
Prometheus metrics will be exposed on 0.0.0.0:19001; this address is not yet configurable.
// EnvoyGatewayMetrics defines control plane push/pull metrics configurations.
type EnvoyGatewayMetrics struct {
	// Sinks defines the metric sinks where metrics are sent to.
	Sinks []EnvoyGatewayMetricSink `json:"sinks,omitempty"`
	// Prometheus defines the configuration for prometheus endpoint.
	Prometheus *EnvoyGatewayPrometheusProvider `json:"prometheus,omitempty"`
}

// EnvoyGatewayMetricSink defines control plane
// metric sinks where metrics are sent to.
type EnvoyGatewayMetricSink struct {
	// Type defines the metric sink type.
	// EG control plane currently supports OpenTelemetry.
	// +kubebuilder:validation:Enum=OpenTelemetry
	// +kubebuilder:default=OpenTelemetry
	Type MetricSinkType `json:"type"`
	// OpenTelemetry defines the configuration for OpenTelemetry sink.
	// It's required if the sink type is OpenTelemetry.
	OpenTelemetry *EnvoyGatewayOpenTelemetrySink `json:"openTelemetry,omitempty"`
}

type EnvoyGatewayOpenTelemetrySink struct {
	// Host defines the sink service hostname.
	Host string `json:"host"`
	// Protocol defines the sink service protocol.
	// +kubebuilder:validation:Enum=grpc;http
	Protocol string `json:"protocol"`
	// Port defines the port the sink service is exposed on.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:default=4317
	Port int32 `json:"port,omitempty"`
}

// EnvoyGatewayPrometheusProvider will expose prometheus endpoint in pull mode.
type EnvoyGatewayPrometheusProvider struct {
	// Disable defines whether to disable the prometheus metrics endpoint in pull mode.
	//
	Disable bool `json:"disable,omitempty"`
}
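As a rough illustration of how the OpenTelemetry sink fields might be consumed, the sketch below validates a sink and derives the collector address. OpenTelemetrySink mirrors the fields of EnvoyGatewayOpenTelemetrySink but is a local, hypothetical copy, and sinkEndpoint is an illustrative helper, not part of Envoy Gateway. The fallback to port 4317 mirrors the kubebuilder default marker shown above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// OpenTelemetrySink is a local, hypothetical copy of the sink fields.
type OpenTelemetrySink struct {
	Host     string
	Protocol string // "grpc" or "http"
	Port     int32  // 0 means "use the default"
}

// sinkEndpoint validates the sink and returns the host:port to push to.
func sinkEndpoint(s OpenTelemetrySink) (string, error) {
	if s.Host == "" {
		return "", errors.New("host is required")
	}
	if s.Protocol != "grpc" && s.Protocol != "http" {
		return "", fmt.Errorf("unsupported protocol %q", s.Protocol)
	}
	port := s.Port
	if port == 0 {
		port = 4317 // same value as the kubebuilder default marker
	}
	return net.JoinHostPort(s.Host, strconv.Itoa(int(port))), nil
}

func main() {
	endpoint, err := sinkEndpoint(OpenTelemetrySink{
		Host:     "otel-collector.monitoring.svc.cluster.local",
		Protocol: "grpc",
	})
	if err != nil {
		fmt.Println("invalid sink:", err)
		return
	}
	fmt.Println(endpoint)
}
```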
Example
- The following example disables Prometheus metrics.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
logging:
  level: null
  default: info
provider:
  type: Kubernetes
telemetry:
  metrics:
    prometheus:
      disable: true
- The following example sends metrics via the Open Telemetry sink to an OTEL gRPC Collector.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
logging:
  level: null
  default: info
provider:
  type: Kubernetes
telemetry:
  metrics:
    sinks:
      - type: OpenTelemetry
        openTelemetry:
          host: otel-collector.monitoring.svc.cluster.local
          port: 4317
          protocol: grpc
- The following example disables the Prometheus metrics endpoint and, at the same time, sends metrics through an OpenTelemetry sink to an OTEL HTTP collector.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
logging:
level: null
default: info
provider:
type: Kubernetes
telemetry:
metrics:
prometheus:
disable: true
sinks:
- type: OpenTelemetry
openTelemetry:
host: otel-collector.monitoring.svc.cluster.local
port: 4318
protocol: http
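The constraints stated in the API comments above (the only supported sink type is OpenTelemetry, the openTelemetry block is required for that type, the protocol is grpc or http, and the port defaults to 4317) can be sketched as a small validation helper. This is a minimal illustration, not Envoy Gateway's implementation: validateSink and the simplified struct names are hypothetical, and treating port 0 as "unset" is an assumption made for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// OpenTelemetrySink is a simplified stand-in for EnvoyGatewayOpenTelemetrySink.
type OpenTelemetrySink struct {
	Host     string
	Protocol string
	Port     int32
}

// MetricSink is a simplified stand-in for EnvoyGatewayMetricSink.
type MetricSink struct {
	Type          string
	OpenTelemetry *OpenTelemetrySink
}

// defaultOTelPort matches the +kubebuilder:default=4317 marker.
const defaultOTelPort int32 = 4317

// validateSink is a hypothetical helper. It checks the documented
// constraints and fills in the default port when none is given
// (assumption: a zero port means the field was omitted).
func validateSink(s *MetricSink) error {
	if s.Type != "OpenTelemetry" {
		return fmt.Errorf("unsupported sink type %q", s.Type)
	}
	if s.OpenTelemetry == nil {
		return errors.New("openTelemetry is required when type is OpenTelemetry")
	}
	ot := s.OpenTelemetry
	if ot.Protocol != "grpc" && ot.Protocol != "http" {
		return fmt.Errorf("unsupported protocol %q", ot.Protocol)
	}
	if ot.Port < 0 {
		return fmt.Errorf("port %d is below the minimum of 0", ot.Port)
	}
	if ot.Port == 0 {
		ot.Port = defaultOTelPort
	}
	return nil
}

func main() {
	sink := &MetricSink{
		Type: "OpenTelemetry",
		OpenTelemetry: &OpenTelemetrySink{
			Host:     "otel-collector.monitoring.svc.cluster.local",
			Protocol: "grpc",
		},
	}
	if err := validateSink(sink); err != nil {
		fmt.Println("invalid sink:", err)
		return
	}
	fmt.Println("port:", sink.OpenTelemetry.Port)
}
```

In a real deployment these checks would normally be enforced by the kubebuilder markers and CRD validation rather than by hand-written code.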
6 - Backend
Overview
This design document introduces the Backend API allowing system administrators to represent backends without the use of a K8s Service resource.
Common use cases for non-Service backends in the K8s and Envoy ecosystem include:
- Cluster-external endpoints, which are currently second-class citizens in Gateway-API (supported using Services and FQDN endpoints).
- Host-local endpoints, such as sidecars or daemons that listen on unix domain sockets or envoy internal listeners, that cannot be represented by a K8s service at all.
Several projects currently support backends that are not registered in the infrastructure-specific service registry.
- K8s Ingress: Resource Backends
- Istio: Service Entry
- Gloo Edge: Upstream
- Consul: External Services
Goals
- Add an API definition to hold settings for configuring Unix Domain Socket, FQDN and IP.
- Determine which resources may reference the new backend resource.
- Determine which existing Gateway-API and Envoy Gateway policies may attach to the new backend resource.
Non Goals
- Support specific backend types, such as S3 Bucket, Redis, AMQP, InfluxDB, etc.
Implementation
The Backend resource is an implementation-specific Gateway-API BackendObjectReference Extension.
Example
Here is an example highlighting how a user can configure a route that forwards traffic to both a K8s Service and a Backend that has both Unix domain socket and IP endpoints. A BackendTLSPolicy is attached to the Backend resource, enabling TLS.
apiVersion: v1
kind: Service
metadata:
name: backend
spec:
ports:
- name: http
port: 3000
targetPort: 3000
selector:
app: backend
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
name: backend-mixed-ip-uds
spec:
appProtocols:
- gateway.envoyproxy.io/h2c
endpoints:
- unix:
path: /var/run/backend.sock
- ip:
address: 10.244.0.28
port: 3000
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: backend
spec:
parentRefs:
- name: eg
hostnames:
- "www.example.com"
rules:
- backendRefs:
- group: gateway.envoyproxy.io
kind: Backend
name: backend-mixed-ip-uds
weight: 1
- group: ""
kind: Service
name: backend
port: 3000
weight: 1
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: BackendTLSPolicy
metadata:
name: policy-btls
spec:
targetRef:
group: gateway.envoyproxy.io
kind: Backend
name: backend-mixed-ip-uds
tls:
caCertRefs:
- name: backend-tls-checks-certificate
group: ''
kind: ConfigMap
hostname: example.com
Design Decisions
- All instances of BackendObjectReference in Envoy Gateway MAY support referencing the Backend kind.
- For security reasons, Envoy Gateway MUST reject references to a Backend in xRoute resources. For example, UDS and localhost references will not be supported for xRoutes.
- All attributes of the Envoy Gateway extended BackendRef resource MUST be implemented for the Backend resource.
- A Backend resource referenced by BackendObjectReference will be translated to Envoy Gateway's IR DestinationSetting. As such, all BackendAddresses are treated as equivalent endpoints with identical weights, TLS settings, etc.
- Gateway-API and Envoy Gateway policies that attach to Services (BackendTLSPolicy, BackendLBPolicy) MUST support attachment to the Backend resource in Envoy Gateway.
- Policy attachment to a named section of the Backend resource is not supported at this time. Currently, BackendObjectReference can only select ports, and not generic section names. Hence, a named section of Backend cannot be referenced by routes, and so attachment of policies to named sections will create translation ambiguity. Users that wish to attach policies to some of the BackendAddresses in a Backend resource can use multiple Backend resources and pluralized BackendRefs instead.
- The Backend API SHOULD support other Gateway-API backend features, such as Backend Protocol Selection. Translation of explicit upstream application protocol setting SHOULD be consistent with the existing implementation for Service resources.
- The Backend upstream transport protocol (TCP, UDP) is inferred from the xRoute kind: TCP is inferred for all routes except for UDPRoute, which is resolved to UDP.
- This API resource MUST be part of the same namespace as the targetRef resource. The Backend API MUST be subject to the same cross-namespace reference restriction as referenced Service resources.
- The Backend resource translation MUST NOT modify Infrastructure. Any change to infrastructure that is required to achieve connectivity to a backend (mounting a socket, adding a sidecar container, modifying a network policy, …) MUST be implemented with an appropriate infrastructure patch in the EnvoyProxy API.
- To limit the overall maintenance effort related to supporting non-Service backends, the Backend API SHOULD support multiple generic address types (UDS, FQDN, IPv4, IPv6), and MUST NOT support vendor-specific backend types.
- Both Backend and Service resources may appear in the same BackendRefs list.
- The optional Port field SHOULD NOT be evaluated when referencing a Backend.
- Referenced Backend resources MUST be translated to envoy endpoints, similar to the current Service translation.
- Certain combinations of Backend and Service are incompatible. For example, a Unix Domain Socket and a FQDN service require different cluster service discovery types (Static/EDS and Strict-DNS accordingly).
- If a Backend that is referenced by a route cannot be translated, the Route resource will have an Accepted=False condition with an UnsupportedValue reason.
- This API needs to be explicitly enabled using the EnvoyGateway API.
Alternatives
- The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
- Users can leverage the existing Envoy Patch Policy or Envoy Extension Manager to inject custom envoy clusters and route configuration. However, these features require a high level of envoy expertise, investment and maintenance.
7 - BackendTrafficPolicy
Overview
This design document introduces the BackendTrafficPolicy API allowing users to configure the behavior for how the Envoy Proxy server communicates with upstream backend services/endpoints.
Goals
- Add an API definition to hold settings for configuring behavior of the connection between the backend services and Envoy Proxy listener.
Non Goals
- Define the API configuration fields in this API.
Implementation
BackendTrafficPolicy is an implied hierarchy type API that can be used to extend Gateway API. It can target either a Gateway or an xRoute (HTTPRoute/GRPCRoute/etc.). When targeting a Gateway, it will apply the configured settings within the BackendTrafficPolicy to all children xRoute resources of that Gateway. If a BackendTrafficPolicy targets an xRoute and a different BackendTrafficPolicy targets the Gateway that route belongs to, then the configuration from the policy that is targeting the xRoute resource will win in a conflict.
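The precedence rule above can be sketched as a lookup that prefers a route-targeted policy over a Gateway-targeted one. This is a simplified illustration: the Policy struct and effectivePolicy function are hypothetical, and the sketch selects a whole policy rather than merging individual fields, which is a simplification of "wins in a conflict".

```go
package main

import "fmt"

// Policy is a hypothetical, reduced view of a BackendTrafficPolicy.
type Policy struct {
	Name       string
	TargetKind string // "Gateway" or an xRoute kind such as "HTTPRoute"
	TargetName string
}

// effectivePolicy returns the policy that applies to the given route.
// A policy targeting the route itself takes precedence; otherwise the
// first policy targeting the parent Gateway is used. It returns nil
// when no policy applies.
func effectivePolicy(policies []Policy, routeKind, routeName, gatewayName string) *Policy {
	var fromGateway *Policy
	for i := range policies {
		p := &policies[i]
		if p.TargetKind == routeKind && p.TargetName == routeName {
			return p
		}
		if fromGateway == nil && p.TargetKind == "Gateway" && p.TargetName == gatewayName {
			fromGateway = p
		}
	}
	return fromGateway
}

func main() {
	policies := []Policy{
		{Name: "default-ipv-policy", TargetKind: "Gateway", TargetName: "eg"},
		{Name: "ipv6-support-policy", TargetKind: "HTTPRoute", TargetName: "ipv6-route"},
	}
	for _, route := range []string{"ipv4-route", "ipv6-route"} {
		p := effectivePolicy(policies, "HTTPRoute", route, "eg")
		fmt.Println(route, "uses", p.Name)
	}
}
```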
Example
Here is an example highlighting how a user can configure this API.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: eg
spec:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: eg
namespace: default
spec:
gatewayClassName: eg
listeners:
- name: http
protocol: HTTP
port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: ipv4-route
namespace: default
spec:
parentRefs:
- name: eg
hostnames:
- "www.foo.example.com"
rules:
- backendRefs:
- group: ""
kind: Service
name: ipv4-service
port: 3000
weight: 1
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: ipv6-route
namespace: default
spec:
parentRefs:
- name: eg
hostnames:
- "www.bar.example.com"
rules:
- backendRefs:
- group: ""
kind: Service
name: ipv6-service
port: 3000
weight: 1
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: default-ipv-policy
namespace: default
spec:
protocols:
enableIPv6: false
targetRef:
group: gateway.networking.k8s.io
kind: Gateway
name: eg
namespace: default
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: ipv6-support-policy
namespace: default
spec:
protocols:
enableIPv6: true
targetRef:
group: gateway.networking.k8s.io
kind: HTTPRoute
name: ipv6-route
namespace: default
Features / API Fields
Here is a list of some features that can be included in this API. Note that this list is not exhaustive.
- Protocol configuration
- Circuit breaking
- Retries
- Keep alive probes
- Health checking
- Load balancing
- Rate limit
Design Decisions
- This API will only support a single targetRef and can bind to only a Gateway or xRoute (HTTPRoute/GRPCRoute/etc.) resource.
- This API resource MUST be part of the same namespace as the resource it targets.
- There can be only ONE policy resource attached to a specific Listener (section) within a Gateway.
- If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
- If multiple policies target the same resource, the oldest resource (based on creation timestamp) will attach to the Gateway Listeners, the others will not.
- If Policy A has a targetRef that includes a sectionName, i.e. it targets a specific Listener within a Gateway, and Policy B has a targetRef that targets the same entire Gateway, then
  - Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName.
  - Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.
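The "oldest policy wins" rule from the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the Policy struct, the resolve function and the sample data are hypothetical, and a target is reduced to a plain string key.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Policy is a hypothetical, reduced view of a policy resource.
type Policy struct {
	Name    string
	Target  string // key of the targeted resource, e.g. "Gateway/eg"
	Created time.Time
}

// resolve attaches, per target, the policy with the oldest creation
// timestamp. Every other policy for that target is reported as
// conflicted (Conflicted=True in the design above).
func resolve(policies []Policy) (attached map[string]string, conflicted []string) {
	sorted := make([]Policy, len(policies))
	copy(sorted, policies)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Created.Before(sorted[j].Created)
	})
	attached = make(map[string]string)
	for _, p := range sorted {
		if _, taken := attached[p.Target]; taken {
			conflicted = append(conflicted, p.Name)
			continue
		}
		attached[p.Target] = p.Name
	}
	return attached, conflicted
}

// samplePolicies returns two policies that target the same Gateway.
func samplePolicies() []Policy {
	return []Policy{
		{Name: "newer", Target: "Gateway/eg", Created: time.Unix(200, 0)},
		{Name: "older", Target: "Gateway/eg", Created: time.Unix(100, 0)},
	}
}

func main() {
	attached, conflicted := resolve(samplePolicies())
	fmt.Println("attached:", attached["Gateway/eg"])
	fmt.Println("conflicted:", conflicted)
}
```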
Alternatives
- The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
8 - Bootstrap Design
Overview
Issue 31 specifies the need for allowing advanced users to specify their custom Envoy Bootstrap configuration rather than using the default Bootstrap configuration defined in Envoy Gateway. This allows advanced users to extend Envoy Gateway and support their custom use cases, such as setting up tracing and stats configuration that is not supported by Envoy Gateway.
Goals
- Define an API field to allow a user to specify a custom Bootstrap
- Provide tooling to allow the user to generate the default Bootstrap configuration as well as validate their custom Bootstrap.
Non Goals
- Allow user to configure only a section of the Bootstrap
API
Leverage the existing EnvoyProxy resource which can be attached to the GatewayClass using the parametersRef field, and define a Bootstrap field within the resource. If this field is set, the value is used as the Bootstrap configuration for all managed Envoy Proxies created by Envoy Gateway.
// EnvoyProxySpec defines the desired state of EnvoyProxy.
type EnvoyProxySpec struct {
......
// Bootstrap defines the Envoy Bootstrap as a YAML string.
// Visit https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#envoy-v3-api-msg-config-bootstrap-v3-bootstrap
// to learn more about the syntax.
// If set, this is the Bootstrap configuration used for the managed Envoy Proxy fleet instead of the default Bootstrap configuration
// set by Envoy Gateway.
// Some fields within the Bootstrap that are required to communicate with the xDS Server (Envoy Gateway) and receive xDS resources
// from it are not configurable and will result in the `EnvoyProxy` resource being rejected.
// Backward compatibility across minor versions is not guaranteed.
// We strongly recommend using `egctl x translate` to generate a `EnvoyProxy` resource with the `Bootstrap` field set to the default
// Bootstrap configuration used. You can edit this configuration, and rerun `egctl x translate` to ensure there are no validation errors.
//
// +optional
Bootstrap *string `json:"bootstrap,omitempty"`
}
Tooling
A CLI tool egctl x translate will be provided to the user to help generate a working Bootstrap configuration. Here is an example where a user inputs a GatewayClass and the CLI generates the EnvoyProxy resource with the Bootstrap field populated.
cat <<EOF | egctl x translate --from gateway-api --to gateway-api -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: eg
spec:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: eg
spec:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
parametersRef:
group: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
name: with-bootstrap-config
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
name: with-bootstrap-config
spec:
bootstrap: |
admin:
access_log:
- name: envoy.access_loggers.file
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: /dev/null
address:
socket_address:
address: 127.0.0.1
port_value: 19000
dynamic_resources:
cds_config:
resource_api_version: V3
api_config_source:
api_type: DELTA_GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
lds_config:
resource_api_version: V3
api_config_source:
api_type: DELTA_GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
static_resources:
clusters:
- connect_timeout: 1s
load_assignment:
cluster_name: xds_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: envoy-gateway
port_value: 18000
typed_extension_protocol_options:
"envoy.extensions.upstreams.http.v3.HttpProtocolOptions":
"@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions"
"explicit_http_config":
"http2_protocol_options": {}
name: xds_cluster
type: STRICT_DNS
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
common_tls_context:
tls_params:
tls_maximum_protocol_version: TLSv1_3
tls_certificate_sds_secret_configs:
- name: xds_certificate
sds_config:
path_config_source:
path: "/sds/xds-certificate.json"
resource_api_version: V3
validation_context_sds_secret_config:
name: xds_trusted_ca
sds_config:
path_config_source:
path: "/sds/xds-trusted-ca.json"
resource_api_version: V3
layered_runtime:
layers:
- name: runtime-0
rtds_layer:
rtds_config:
resource_api_version: V3
api_config_source:
transport_api_version: V3
api_type: DELTA_GRPC
grpc_services:
envoy_grpc:
cluster_name: xds_cluster
name: runtime-0
The user can now modify the output for their use case. Suppose the user wants to change the admin server port from 19000 to 18000. They can do so by editing the previous output and running egctl x translate again to check for validation errors. Validation errors should be surfaced in the Status subresource. The internal validator ensures that the Bootstrap string can be unmarshalled into the Bootstrap object, and that the user cannot override certain fields within the Bootstrap configuration, such as the address and TLS context within the xds_cluster, which are essential for xDS communication between Envoy Gateway and Envoy Proxy.
cat <<EOF | egctl x translate --from gateway-api --to gateway-api -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: eg
spec:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
parametersRef:
group: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
name: with-bootstrap-config
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
name: with-bootstrap-config
spec:
bootstrap: |
admin:
access_log:
- name: envoy.access_loggers.file
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: /dev/null
address:
socket_address:
address: 127.0.0.1
port_value: 18000
dynamic_resources:
cds_config:
resource_api_version: V3
api_config_source:
api_type: DELTA_GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
lds_config:
resource_api_version: V3
api_config_source:
api_type: DELTA_GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
static_resources:
clusters:
- connect_timeout: 1s
load_assignment:
cluster_name: xds_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: envoy-gateway
port_value: 18000
typed_extension_protocol_options:
"envoy.extensions.upstreams.http.v3.HttpProtocolOptions":
"@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions"
"explicit_http_config":
"http2_protocol_options": {}
name: xds_cluster
type: STRICT_DNS
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
common_tls_context:
tls_params:
tls_maximum_protocol_version: TLSv1_3
tls_certificate_sds_secret_configs:
- name: xds_certificate
sds_config:
path_config_source:
path: "/sds/xds-certificate.json"
resource_api_version: V3
validation_context_sds_secret_config:
name: xds_trusted_ca
sds_config:
path_config_source:
path: "/sds/xds-trusted-ca.json"
resource_api_version: V3
layered_runtime:
layers:
- name: runtime-0
rtds_layer:
rtds_config:
resource_api_version: V3
api_config_source:
transport_api_version: V3
api_type: DELTA_GRPC
grpc_services:
envoy_grpc:
cluster_name: xds_cluster
name: runtime-0
EOF
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: eg
spec:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
parametersRef:
group: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
name: with-bootstrap-config
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
name: with-bootstrap-config
spec:
bootstrap: |
admin:
access_log:
- name: envoy.access_loggers.file
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: /dev/null
address:
socket_address:
address: 127.0.0.1
port_value: 18000
dynamic_resources:
cds_config:
resource_api_version: V3
api_config_source:
api_type: DELTA_GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
lds_config:
resource_api_version: V3
api_config_source:
api_type: DELTA_GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
static_resources:
clusters:
- connect_timeout: 1s
load_assignment:
cluster_name: xds_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: envoy-gateway
port_value: 18000
typed_extension_protocol_options:
"envoy.extensions.upstreams.http.v3.HttpProtocolOptions":
"@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions"
"explicit_http_config":
"http2_protocol_options": {}
name: xds_cluster
type: STRICT_DNS
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
common_tls_context:
tls_params:
tls_maximum_protocol_version: TLSv1_3
tls_certificate_sds_secret_configs:
- name: xds_certificate
sds_config:
path_config_source:
path: "/sds/xds-certificate.json"
resource_api_version: V3
validation_context_sds_secret_config:
name: xds_trusted_ca
sds_config:
path_config_source:
path: "/sds/xds-trusted-ca.json"
resource_api_version: V3
layered_runtime:
layers:
- name: runtime-0
rtds_layer:
rtds_config:
resource_api_version: V3
api_config_source:
transport_api_version: V3
api_type: DELTA_GRPC
grpc_services:
envoy_grpc:
cluster_name: xds_cluster
name: runtime-0
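The validation idea described in this section (fields required for xDS communication are protected, everything else may be customized) can be sketched with simplified types. This is only an illustration: the Bootstrap and XDSCluster structs, defaultBootstrap and validateCustom are hypothetical stand-ins, and the real validator works on the Envoy Bootstrap message unmarshalled from the YAML string, a step this sketch skips.

```go
package main

import (
	"errors"
	"fmt"
)

// XDSCluster is a hypothetical summary of the xds_cluster settings
// that Envoy Gateway needs in order to reach its xDS server.
type XDSCluster struct {
	Address    string
	Port       uint32
	TLSContext string
}

// Bootstrap is a hypothetical, heavily reduced view of the Bootstrap.
type Bootstrap struct {
	AdminPort uint32
	XDS       XDSCluster
}

// defaultBootstrap mirrors the values shown in the generated output above.
func defaultBootstrap() Bootstrap {
	return Bootstrap{
		AdminPort: 19000,
		XDS: XDSCluster{
			Address:    "envoy-gateway",
			Port:       18000,
			TLSContext: "xds_certificate",
		},
	}
}

// validateCustom rejects a custom Bootstrap whose protected xDS
// settings differ from the defaults; other fields may change freely.
func validateCustom(custom Bootstrap) error {
	if custom.XDS != defaultBootstrap().XDS {
		return errors.New("xds_cluster address and TLS context cannot be overridden")
	}
	return nil
}

func main() {
	custom := defaultBootstrap()
	custom.AdminPort = 18000 // allowed: the admin port is not protected
	fmt.Println("admin port change:", validateCustom(custom))

	custom.XDS.Address = "other-control-plane" // rejected: protected field
	fmt.Println("xds address change:", validateCustom(custom))
}
```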
9 - ClientTrafficPolicy
Overview
This design document introduces the ClientTrafficPolicy API allowing system administrators to configure the behavior for how the Envoy Proxy server behaves with downstream clients.
Goals
- Add an API definition to hold settings for configuring behavior of the connection between the downstream client and Envoy Proxy listener.
Non Goals
- Define the API configuration fields in this API.
Implementation
ClientTrafficPolicy is a Direct Policy Attachment type API that can be used to extend Gateway API to define configuration that affects the connection between the downstream client and Envoy Proxy listener.
Example
Here is an example highlighting how a user can configure this API.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
name: eg
spec:
controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: eg
namespace: default
spec:
gatewayClassName: eg
listeners:
- name: http
protocol: HTTP
port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: backend
namespace: default
spec:
parentRefs:
- name: eg
hostnames:
- "www.example.com"
rules:
- backendRefs:
- group: ""
kind: Service
name: backend
port: 3000
weight: 1
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
name: enable-proxy-protocol-policy
namespace: default
spec:
targetRef:
group: gateway.networking.k8s.io
kind: Gateway
name: eg
namespace: default
enableProxyProtocol: true
Features / API Fields
Here is a list of features that can be included in this API
- Downstream ProxyProtocol
- Downstream Keep Alives
- IP Blocking
- Downstream HTTP3
Design Decisions
- This API will only support a single targetRef and can bind to only a Gateway resource.
- This API resource MUST be part of the same namespace as the Gateway resource.
- There can be only ONE policy resource attached to a specific Listener (section) within a Gateway.
- If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
- If multiple policies target the same resource, the oldest resource (based on creation timestamp) will attach to the Gateway Listeners, the others will not.
- If Policy A has a targetRef that includes a sectionName, i.e. it targets a specific Listener within a Gateway, and Policy B has a targetRef that targets the same entire Gateway, then
  - Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName.
  - Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.
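The sectionName rule above (a Listener-specific policy takes its Listener, the Gateway-wide policy covers the rest and is marked Overridden=True) can be sketched as follows. The Policy struct and assign function are hypothetical, and the sketch assumes conflicts between policies with the same target have already been resolved.

```go
package main

import "fmt"

// Policy is a hypothetical, reduced view of a ClientTrafficPolicy that
// targets one Gateway. An empty SectionName targets the whole Gateway.
type Policy struct {
	Name        string
	SectionName string
}

// assign maps every Listener name to the policy applied to it and
// reports which Gateway-wide policies were overridden on at least one
// Listener by a Listener-specific policy.
func assign(listeners []string, policies []Policy) (byListener map[string]string, overridden map[string]bool) {
	byListener = make(map[string]string)
	overridden = make(map[string]bool)
	bySection := make(map[string]string)
	gatewayWide := ""
	for _, p := range policies {
		if p.SectionName == "" {
			gatewayWide = p.Name
		} else {
			bySection[p.SectionName] = p.Name
		}
	}
	for _, l := range listeners {
		if name, ok := bySection[l]; ok {
			byListener[l] = name
			if gatewayWide != "" {
				overridden[gatewayWide] = true
			}
			continue
		}
		if gatewayWide != "" {
			byListener[l] = gatewayWide
		}
	}
	return byListener, overridden
}

func main() {
	listeners := []string{"http", "https"}
	policies := []Policy{
		{Name: "policy-a", SectionName: "https"},
		{Name: "policy-b"},
	}
	byListener, overridden := assign(listeners, policies)
	fmt.Println(byListener["http"], byListener["https"], overridden["policy-b"])
}
```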
Alternatives
- The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
10 - Configuration API Design
Motivation
Issue 51 specifies the need to design an API for configuring Envoy Gateway. The control plane is configured statically at startup and the data plane is configured dynamically through Kubernetes resources, primarily Gateway API objects. Refer to the Envoy Gateway design doc for additional details regarding Envoy Gateway terminology and configuration.
Goals
- Define an initial API to configure Envoy Gateway at startup.
- Define an initial API for configuring the managed data plane, e.g. Envoy proxies.
Non-Goals
- Implementation of the configuration APIs.
- Define the status subresource of the configuration APIs.
- Define a complete set of APIs for configuring Envoy Gateway. As stated in the Goals, this document defines the initial configuration APIs.
- Define an API for deploying/provisioning/operating Envoy Gateway. If needed, a future Envoy Gateway operator would be responsible for designing and implementing this type of API.
- Specify tooling for managing the API, e.g. generate protos, CRDs, controller RBAC, etc.
Control Plane API
The EnvoyGateway API defines the control plane configuration, e.g. Envoy Gateway. Key points of this API are:
- It will define Envoy Gateway’s startup configuration file. If the file does not exist, Envoy Gateway will start up with default configuration parameters.
- EnvoyGateway inlines the TypeMeta API. This allows EnvoyGateway to be versioned and managed as a GroupVersionKind scheme.
- EnvoyGateway does not contain a metadata field since it’s currently represented as a static configuration file instead of a Kubernetes resource.
- Since EnvoyGateway does not surface status, EnvoyGatewaySpec is inlined.
- If data plane static configuration is required in the future, Envoy Gateway will use a separate file for this purpose.
The v1alpha1 version and gateway.envoyproxy.io API group get generated:
// gateway/api/config/v1alpha1/doc.go
// Package v1alpha1 contains API Schema definitions for the gateway.envoyproxy.io API group.
//
// +groupName=gateway.envoyproxy.io
package v1alpha1
The initial EnvoyGateway API:
// gateway/api/config/v1alpha1/envoygateway.go
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EnvoyGateway is the Schema for the envoygateways API
type EnvoyGateway struct {
metav1.TypeMeta `json:",inline"`
// EnvoyGatewaySpec defines the desired state of Envoy Gateway.
EnvoyGatewaySpec `json:",inline"`
}
// EnvoyGatewaySpec defines the desired state of Envoy Gateway configuration.
type EnvoyGatewaySpec struct {
// Gateway defines Gateway-API specific configuration. If unset, default
// configuration parameters will apply.
//
// +optional
Gateway *Gateway `json:"gateway,omitempty"`
// Provider defines the desired provider configuration. If unspecified,
// the Kubernetes provider is used with default parameters.
//
// +optional
Provider *EnvoyGatewayProvider `json:"provider,omitempty"`
}
// Gateway defines desired Gateway API configuration of Envoy Gateway.
type Gateway struct {
// ControllerName defines the name of the Gateway API controller. If unspecified,
// defaults to "gateway.envoyproxy.io/gatewayclass-controller". See the following
// for additional details:
//
// https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.GatewayClass
//
// +optional
ControllerName string `json:"controllerName,omitempty"`
}
// EnvoyGatewayProvider defines the desired configuration of a provider.
// +union
type EnvoyGatewayProvider struct {
// Type is the type of provider to use. If unset, the Kubernetes provider is used.
//
// +unionDiscriminator
Type ProviderType `json:"type,omitempty"`
// Kubernetes defines the configuration of the Kubernetes provider. Kubernetes
// provides runtime configuration via the Kubernetes API.
//
// +optional
Kubernetes *EnvoyGatewayKubernetesProvider `json:"kubernetes,omitempty"`
// File defines the configuration of the File provider. File provides runtime
// configuration defined by one or more files.
//
// +optional
File *EnvoyGatewayFileProvider `json:"file,omitempty"`
}
// ProviderType defines the types of providers supported by Envoy Gateway.
type ProviderType string
const (
// KubernetesProviderType defines the "Kubernetes" provider.
KubernetesProviderType ProviderType = "Kubernetes"
// FileProviderType defines the "File" provider.
FileProviderType ProviderType = "File"
)
// EnvoyGatewayKubernetesProvider defines configuration for the Kubernetes provider.
type EnvoyGatewayKubernetesProvider struct {
// TODO: Add config as use cases are better understood.
}
// EnvoyGatewayFileProvider defines configuration for the File provider.
type EnvoyGatewayFileProvider struct {
// TODO: Add config as use cases are better understood.
}
Note: Provider-specific configuration is defined in the {$PROVIDER_NAME}Provider API.
Gateway
Gateway defines desired configuration of Gateway API controllers that reconcile and translate Gateway API resources into the Intermediate Representation (IR). Refer to the Envoy Gateway design doc for additional details.
Provider
Provider defines the desired configuration of an Envoy Gateway provider. A provider is an infrastructure component that Envoy Gateway calls to establish its runtime configuration. Provider is a union type. Therefore, Envoy Gateway can be configured with only one provider based on the type discriminator field. Refer to the Envoy Gateway design doc for additional details.
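The union semantics described above can be sketched as a check that only the member selected by the type discriminator is populated. This is an illustrative helper, not part of the API: validateProvider and the reduced struct names are hypothetical, and the rule that an unset type selects the Kubernetes provider follows the API comment on the Type field.

```go
package main

import (
	"errors"
	"fmt"
)

// KubernetesProvider and FileProvider stand in for the (currently
// empty) provider-specific configuration types.
type KubernetesProvider struct{}
type FileProvider struct{}

// Provider is a reduced view of EnvoyGatewayProvider.
type Provider struct {
	Type       string
	Kubernetes *KubernetesProvider
	File       *FileProvider
}

// validateProvider enforces the union: members that do not match the
// discriminator must be unset. The matching member stays optional.
func validateProvider(p Provider) error {
	t := p.Type
	if t == "" {
		t = "Kubernetes" // an unset type selects the Kubernetes provider
	}
	switch t {
	case "Kubernetes":
		if p.File != nil {
			return errors.New("file must not be set when type is Kubernetes")
		}
	case "File":
		if p.Kubernetes != nil {
			return errors.New("kubernetes must not be set when type is File")
		}
	default:
		return fmt.Errorf("unknown provider type %q", t)
	}
	return nil
}

func main() {
	fmt.Println(validateProvider(Provider{Type: "Kubernetes", Kubernetes: &KubernetesProvider{}}))
	fmt.Println(validateProvider(Provider{Type: "File", Kubernetes: &KubernetesProvider{}}))
}
```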
Control Plane Configuration
The configuration file is defined by the EnvoyGateway API type. At startup, Envoy Gateway searches for the configuration at “/etc/envoy-gateway/config.yaml”.
Start Envoy Gateway:
$ ./envoy-gateway
Since the configuration file does not exist, Envoy Gateway will start with default configuration parameters.
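The fallback to defaults when the file is missing can be sketched as below. This is a minimal illustration, not Envoy Gateway's loader: Config, defaultConfig and load are hypothetical names, and decoding is passed in as a function so the sketch stays within the standard library (a real loader would parse the YAML into the EnvoyGateway type).

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// Config is a hypothetical, reduced view of the EnvoyGateway settings.
type Config struct {
	ControllerName string
	ProviderType   string
}

// defaultConfig returns the defaults named in the API comments above.
func defaultConfig() Config {
	return Config{
		ControllerName: "gateway.envoyproxy.io/gatewayclass-controller",
		ProviderType:   "Kubernetes",
	}
}

// load reads the configuration file at path. A missing file is not an
// error: the defaults are returned instead. Any other read error is
// reported, and file contents are handed to the supplied decoder.
func load(path string, decode func([]byte) (Config, error)) (Config, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return defaultConfig(), nil
	}
	if err != nil {
		return Config{}, err
	}
	return decode(data)
}

func main() {
	cfg, err := load("/etc/envoy-gateway/config.yaml", func([]byte) (Config, error) {
		return Config{}, errors.New("decoding is out of scope for this sketch")
	})
	fmt.Println(cfg, err)
}
```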
The Kubernetes provider can be configured explicitly using provider.kubernetes:
$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
type: Kubernetes
kubernetes: {}
EOF
This configuration will cause Envoy Gateway to use the Kubernetes provider with default configuration parameters.
The Kubernetes provider can be configured using the provider field. For example, the foo field can be set to “bar”:
$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
type: Kubernetes
kubernetes:
foo: bar
EOF
Note: The Provider API from the Kubernetes package is currently undefined and foo: bar is provided for illustration purposes only.
The same API structure is followed for each supported provider. The following example causes Envoy Gateway to use the File provider:
$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
type: File
file:
foo: bar
EOF
Note: The Provider API from the File package is currently undefined and foo: bar is provided for illustration purposes only.
Gateway API-related configuration is expressed through the gateway field. If unspecified, Envoy Gateway will use default configuration parameters for gateway. The following example causes the GatewayClass controller to manage GatewayClasses with controllerName foo instead of the default gateway.envoyproxy.io/gatewayclass-controller:
$ cat << EOF > /etc/envoy-gateway/config.yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
controllerName: foo
EOF
With any of the above configuration examples, Envoy Gateway can be started without any additional arguments:
$ ./envoy-gateway
Data Plane API
The data plane is configured dynamically through Kubernetes resources, primarily Gateway API objects.
Optionally, the data plane infrastructure can be configured by referencing a custom resource (CR) through spec.parametersRef of the managed GatewayClass. The EnvoyProxy API defines the data plane infrastructure configuration and is represented as the CR referenced by the managed GatewayClass. Key points of this API are:
- If unreferenced by gatewayclass.spec.parametersRef, default parameters will be used to configure the data plane infrastructure, e.g. expose Envoy network endpoints using a LoadBalancer service.
- Envoy Gateway will follow Gateway API recommendations regarding updates to the EnvoyProxy CR:
It is recommended that this resource be used as a template for Gateways. This means that a Gateway is based on the state of the GatewayClass at the time it was created and changes to the GatewayClass or associated parameters are not propagated down to existing Gateways.
The initial EnvoyProxy API:
// gateway/api/config/v1alpha1/envoyproxy.go
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// EnvoyProxy is the Schema for the envoyproxies API.
type EnvoyProxy struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec EnvoyProxySpec `json:"spec,omitempty"`
Status EnvoyProxyStatus `json:"status,omitempty"`
}
// EnvoyProxySpec defines the desired state of Envoy Proxy infrastructure
// configuration.
type EnvoyProxySpec struct {
// Undefined by this design spec.
}
// EnvoyProxyStatus defines the observed state of EnvoyProxy.
type EnvoyProxyStatus struct {
// Undefined by this design spec.
}
The EnvoyProxySpec and EnvoyProxyStatus fields will be defined in the future as proxy infrastructure configuration use cases are better understood.
Data Plane Configuration
GatewayClass and Gateway resources define the data plane infrastructure. Note that all examples assume Envoy Gateway is running with the Kubernetes provider.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
Since the GatewayClass does not define spec.parametersRef, the data plane is provisioned using default configuration parameters. The Envoy proxies will be configured with an HTTP listener and a Kubernetes LoadBalancer service listening on port 80.
The following example will configure the data plane to use a ClusterIP service instead of the default LoadBalancer service:
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-class
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    name: example-config
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: example-config
spec:
  networkPublishing:
    type: ClusterIPService
Note: The NetworkPublishing API is currently undefined and is provided here for illustration purposes only.
11 - Data Plane Observability: Accesslog
Overview
Envoy supports extensible access logging to different sinks, such as File and gRPC.
Envoy supports customizable access log formats using predefined fields as well as arbitrary HTTP request and response headers.
Envoy supports several built-in access log filters and extension filters that are registered at runtime.
Envoy Gateway leverages Gateway API for configuring managed Envoy proxies. Gateway API defines core, extended, and implementation-specific API support levels for implementers such as Envoy Gateway to expose features. Since access logging is not covered by the Core or Extended APIs, EG should provide an easy way to configure access log formats and sinks per EnvoyProxy.
Goals
- Support sending access logs to a File or OpenTelemetry backend
- TODO: Support access log filters based on CEL expressions
Non-Goals
- Support non-CEL filters, e.g. status_code_filter, response_flag_filter
- Support HttpGrpcAccessLogConfig or TcpGrpcAccessLogConfig
Use-Cases
- Configure access logging for an EnvoyProxy to File
- Configure access logging for an EnvoyProxy to an OpenTelemetry backend
- Configure multiple access log providers for an EnvoyProxy
ProxyAccessLog API Type
type ProxyAccessLog struct {
// Disable disables access logging for managed proxies if set to true.
Disable bool `json:"disable,omitempty"`
// Settings defines accesslog settings for managed proxies.
// If unspecified, will send default format to stdout.
// +optional
Settings []ProxyAccessLogSetting `json:"settings,omitempty"`
}
type ProxyAccessLogSetting struct {
// Format defines the format of accesslog.
Format ProxyAccessLogFormat `json:"format"`
// Sinks defines the sinks of accesslog.
// +kubebuilder:validation:MinItems=1
Sinks []ProxyAccessLogSink `json:"sinks"`
}
type ProxyAccessLogFormatType string
const (
// ProxyAccessLogFormatTypeText defines the text accesslog format.
ProxyAccessLogFormatTypeText ProxyAccessLogFormatType = "Text"
// ProxyAccessLogFormatTypeJSON defines the JSON accesslog format.
ProxyAccessLogFormatTypeJSON ProxyAccessLogFormatType = "JSON"
// TODO: support format type "mix" in the future.
)
// ProxyAccessLogFormat defines the format of accesslog.
// +union
type ProxyAccessLogFormat struct {
// Type defines the type of accesslog format.
// +kubebuilder:validation:Enum=Text;JSON
// +unionDiscriminator
Type ProxyAccessLogFormatType `json:"type,omitempty"`
// Text defines the text accesslog format, following Envoy accesslog formatting,
// empty value results in proxy's default access log format.
// It's required when the format type is "Text".
// Envoy [command operators](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#command-operators) may be used in the format.
// The [format string documentation](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#config-access-log-format-strings) provides more information.
// +optional
Text *string `json:"text,omitempty"`
// JSON is additional attributes that describe the specific event occurrence.
// Structured format for the envoy access logs. Envoy [command operators](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#command-operators)
// can be used as values for fields within the Struct.
// It's required when the format type is "JSON".
// +optional
JSON map[string]string `json:"json,omitempty"`
}
type ProxyAccessLogSinkType string
const (
// ProxyAccessLogSinkTypeFile defines the file accesslog sink.
ProxyAccessLogSinkTypeFile ProxyAccessLogSinkType = "File"
// ProxyAccessLogSinkTypeOpenTelemetry defines the OpenTelemetry accesslog sink.
ProxyAccessLogSinkTypeOpenTelemetry ProxyAccessLogSinkType = "OpenTelemetry"
)
type ProxyAccessLogSink struct {
// Type defines the type of accesslog sink.
// +kubebuilder:validation:Enum=File;OpenTelemetry
Type ProxyAccessLogSinkType `json:"type,omitempty"`
// File defines the file accesslog sink.
// +optional
File *FileEnvoyProxyAccessLog `json:"file,omitempty"`
// OpenTelemetry defines the OpenTelemetry accesslog sink.
// +optional
OpenTelemetry *OpenTelemetryEnvoyProxyAccessLog `json:"openTelemetry,omitempty"`
}
type FileEnvoyProxyAccessLog struct {
// Path defines the file path used to expose envoy access log(e.g. /dev/stdout).
// Empty value disables accesslog.
Path string `json:"path,omitempty"`
}
// TODO: consider reuse ExtensionService?
type OpenTelemetryEnvoyProxyAccessLog struct {
// Host defines the extension service hostname.
Host string `json:"host"`
// Port defines the port the extension service is exposed on.
//
// +optional
// +kubebuilder:validation:Minimum=0
// +kubebuilder:default=4317
Port int32 `json:"port,omitempty"`
// Resources is a set of labels that describe the source of a log entry, including envoy node info.
// It's recommended to follow [semantic conventions](https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/).
// +optional
Resources map[string]string `json:"resources,omitempty"`
// TODO: support more OpenTelemetry accesslog options(e.g. TLS, auth etc.) in the future.
}
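The ProxyAccessLogFormat comments state that text is required when the type is "Text" and json is required when the type is "JSON". A minimal sketch of that rule in Go follows. The validateFormat helper is hypothetical and not part of the design, and rejecting the unused union member is an assumption about how the +union marker is meant to be enforced.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed copies of the design's types, limited to the fields used here.
type ProxyAccessLogFormatType string

const (
	ProxyAccessLogFormatTypeText ProxyAccessLogFormatType = "Text"
	ProxyAccessLogFormatTypeJSON ProxyAccessLogFormatType = "JSON"
)

type ProxyAccessLogFormat struct {
	Type ProxyAccessLogFormatType
	Text *string
	JSON map[string]string
}

// validateFormat is a hypothetical helper that checks the discriminator
// against the union members. An empty (but non-nil) Text is accepted,
// because the design says it selects the proxy's default format.
func validateFormat(f ProxyAccessLogFormat) error {
	switch f.Type {
	case ProxyAccessLogFormatTypeText:
		if f.Text == nil {
			return errors.New("text must be set when type is Text")
		}
		if len(f.JSON) > 0 {
			return errors.New("json must not be set when type is Text")
		}
	case ProxyAccessLogFormatTypeJSON:
		if len(f.JSON) == 0 {
			return errors.New("json must be set when type is JSON")
		}
		if f.Text != nil {
			return errors.New("text must not be set when type is JSON")
		}
	default:
		return fmt.Errorf("unsupported format type %q", f.Type)
	}
	return nil
}

func main() {
	text := "[%START_TIME%] %RESPONSE_CODE%\n"
	ok := ProxyAccessLogFormat{Type: ProxyAccessLogFormatTypeText, Text: &text}
	bad := ProxyAccessLogFormat{Type: ProxyAccessLogFormatTypeJSON}
	fmt.Println("text format:", validateFormat(ok))
	fmt.Println("json format without fields:", validateFormat(bad))
}
```

In a real controller this kind of check would more likely live in CRD validation rules or a webhook; the function form is only used here to make the rule explicit.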
Example
- The following is an example to disable access log.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: disable-accesslog
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      disable: true
- The following is an example with text format access log.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: text-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: File
              file:
                path: /dev/stdout
- The following is an example with json format access log.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: json-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: JSON
            json:
              status: "%RESPONSE_CODE%"
              message: "%LOCAL_REPLY_BODY%"
          sinks:
            - type: File
              file:
                path: /dev/stdout
- The following is an example that sends access logs to an OpenTelemetry sink.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-access-logging
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"
- The following is an example of sending same format to different sinks.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: multi-sinks
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: Text
            text: |
              [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
          sinks:
            - type: File
              file:
                path: /dev/stdout
            - type: OpenTelemetry
              openTelemetry:
                host: otel-collector.monitoring.svc.cluster.local
                port: 4317
                resources:
                  k8s.cluster.name: "cluster-1"
12 - Data Plane Observability: Metrics
This document aims to cover all aspects of Envoy Gateway data plane metrics observability.
Note
Control plane observability (while important) is outside of scope for this document. For control plane observability, refer to the Control Plane Observability: Metrics design document.
Overview
Envoy provides a robust platform for metrics and supports three kinds of stats: counters, gauges, and histograms.
Envoy enables Prometheus format output via the /stats/prometheus admin endpoint.
Envoy supports different kinds of sinks, but EG will only support the OpenTelemetry sink.
Envoy Gateway leverages Gateway API for configuring managed Envoy proxies. Gateway API defines core, extended, and implementation-specific API support levels for implementers such as Envoy Gateway to expose features. Since metrics are not covered by the Core or Extended APIs, EG should provide an easy way to configure metrics per EnvoyProxy.
Goals
- Support exposing metrics in Prometheus format (reusing the probe port).
- Support the OpenTelemetry stats sink.
Non-Goals
- Support other stats sinks.
Use-Cases
- Enable Prometheus metrics by default
- Disable Prometheus metrics
- Push metrics via the OpenTelemetry sink
- TODO: Customize histogram buckets of target metrics
- TODO: Support stats matcher
ProxyMetrics API Type
type ProxyMetrics struct {
// Prometheus defines the configuration for Admin endpoint `/stats/prometheus`.
Prometheus *PrometheusProvider `json:"prometheus,omitempty"`
// Sinks defines the metric sinks where metrics are sent to.
Sinks []MetricSink `json:"sinks,omitempty"`
}
type MetricSinkType string
const (
MetricSinkTypeOpenTelemetry MetricSinkType = "OpenTelemetry"
)
type MetricSink struct {
// Type defines the metric sink type.
// EG currently only supports OpenTelemetry.
// +kubebuilder:validation:Enum=OpenTelemetry
// +kubebuilder:default=OpenTelemetry
Type MetricSinkType `json:"type"`
// OpenTelemetry defines the configuration for OpenTelemetry sink.
// It's required if the sink type is OpenTelemetry.
OpenTelemetry *OpenTelemetrySink `json:"openTelemetry,omitempty"`
}
type OpenTelemetrySink struct {
// Host defines the service hostname.
Host string `json:"host"`
// Port defines the port the service is exposed on.
//
// +optional
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=65535
// +kubebuilder:default=4317
Port int32 `json:"port,omitempty"`
// TODO: add support for customizing OpenTelemetry sink in https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/stat_sinks/open_telemetry/v3/open_telemetry.proto#envoy-v3-api-msg-extensions-stat-sinks-open-telemetry-v3-sinkconfig
}
type PrometheusProvider struct {
// Disable the Prometheus endpoint.
Disable bool `json:"disable,omitempty"`
}
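The MetricSink comments say that openTelemetry is required when the sink type is OpenTelemetry, and the port markers give a 0 to 65535 range with a default of 4317. The sketch below shows one way those rules could combine into a collector address. The sinkAddress helper is hypothetical, and treating a zero port as "unset, use the default" is an assumption made here because the field is a plain int32 with omitempty.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// Trimmed copies of the design's types, limited to the fields used here.
type MetricSinkType string

const MetricSinkTypeOpenTelemetry MetricSinkType = "OpenTelemetry"

type OpenTelemetrySink struct {
	Host string
	Port int32
}

type MetricSink struct {
	Type          MetricSinkType
	OpenTelemetry *OpenTelemetrySink
}

// defaultOTLPPort mirrors the +kubebuilder:default=4317 marker.
const defaultOTLPPort int32 = 4317

// sinkAddress is a hypothetical helper that validates a sink and returns
// the host:port the proxy would push metrics to.
func sinkAddress(s MetricSink) (string, error) {
	if s.Type != MetricSinkTypeOpenTelemetry {
		return "", fmt.Errorf("unsupported sink type %q", s.Type)
	}
	if s.OpenTelemetry == nil {
		return "", errors.New("openTelemetry is required when type is OpenTelemetry")
	}
	if s.OpenTelemetry.Host == "" {
		return "", errors.New("host is required")
	}
	port := s.OpenTelemetry.Port
	if port == 0 {
		// Assumption: zero means the field was omitted.
		port = defaultOTLPPort
	}
	if port < 0 || port > 65535 {
		return "", fmt.Errorf("port %d is outside 0-65535", port)
	}
	return net.JoinHostPort(s.OpenTelemetry.Host, strconv.Itoa(int(port))), nil
}

func main() {
	sink := MetricSink{
		Type:          MetricSinkTypeOpenTelemetry,
		OpenTelemetry: &OpenTelemetrySink{Host: "otel-collector.monitoring.svc.cluster.local"},
	}
	fmt.Println(sinkAddress(sink))
	fmt.Println(sinkAddress(MetricSink{Type: MetricSinkTypeOpenTelemetry}))
}
```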
Example
- The following is an example to disable prometheus metric.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: prometheus
  namespace: envoy-gateway-system
spec:
  telemetry:
    metrics:
      prometheus:
        disable: true
- The following is an example to send metric via Open Telemetry sink.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-sink
  namespace: envoy-gateway-system
spec:
  telemetry:
    metrics:
      sinks:
        - type: OpenTelemetry
          openTelemetry:
            host: otel-collector.monitoring.svc.cluster.local
            port: 4317
13 - Data Plane Observability: Tracing
Overview
Envoy supports extensible tracing to different sinks, such as Zipkin and OpenTelemetry. An overview of Envoy tracing can be found in the Envoy documentation.
Envoy Gateway leverages Gateway API for configuring managed Envoy proxies. Gateway API defines core, extended, and implementation-specific API support levels for implementers such as Envoy Gateway to expose features. Since tracing is not covered by the Core or Extended APIs, EG should provide an easy way to configure tracing per EnvoyProxy.
Only the OpenTelemetry sink can be configured currently. You can use the OpenTelemetry Collector to export to other tracing backends.
Goals
- Support sending traces to an OpenTelemetry backend
- Support a configurable sampling rate
- Support propagating tags from Literal, Environment and Request Header
Non-Goals
- Support other tracing backends, e.g. Zipkin, Jaeger
Use-Cases
- Configure tracing for an EnvoyProxy to an OpenTelemetry backend
ProxyTracing API Type
type ProxyTracing struct {
// SamplingRate controls the rate at which traffic will be
// selected for tracing if no prior sampling decision has been made.
// Defaults to 100, valid values [0-100]. 100 indicates 100% sampling.
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=100
// +kubebuilder:default=100
// +optional
SamplingRate *uint32 `json:"samplingRate,omitempty"`
// CustomTags defines the custom tags to add to each span.
// If provider is kubernetes, pod name and namespace are added by default.
CustomTags map[string]CustomTag `json:"customTags,omitempty"`
// Provider defines the tracing provider.
// Only OpenTelemetry is supported currently.
Provider TracingProvider `json:"provider"`
}
type TracingProviderType string
const (
TracingProviderTypeOpenTelemetry TracingProviderType = "OpenTelemetry"
)
type TracingProvider struct {
// Type defines the tracing provider type.
// EG currently only supports OpenTelemetry.
// +kubebuilder:validation:Enum=OpenTelemetry
// +kubebuilder:default=OpenTelemetry
Type TracingProviderType `json:"type"`
// Host defines the provider service hostname.
Host string `json:"host"`
// Port defines the port the provider service is exposed on.
//
// +optional
// +kubebuilder:validation:Minimum=0
// +kubebuilder:default=4317
Port int32 `json:"port,omitempty"`
}
type CustomTagType string
const (
// CustomTagTypeLiteral adds hard-coded value to each span.
CustomTagTypeLiteral CustomTagType = "Literal"
// CustomTagTypeEnvironment adds value from environment variable to each span.
CustomTagTypeEnvironment CustomTagType = "Environment"
// CustomTagTypeRequestHeader adds value from request header to each span.
CustomTagTypeRequestHeader CustomTagType = "RequestHeader"
)
type CustomTag struct {
// Type defines the type of custom tag.
// +kubebuilder:validation:Enum=Literal;Environment;RequestHeader
// +unionDiscriminator
// +kubebuilder:default=Literal
Type CustomTagType `json:"type"`
// Literal adds hard-coded value to each span.
// It's required when the type is "Literal".
Literal *LiteralCustomTag `json:"literal,omitempty"`
// Environment adds value from environment variable to each span.
// It's required when the type is "Environment".
Environment *EnvironmentCustomTag `json:"environment,omitempty"`
// RequestHeader adds value from request header to each span.
// It's required when the type is "RequestHeader".
RequestHeader *RequestHeaderCustomTag `json:"requestHeader,omitempty"`
// TODO: add support for Metadata tags in the future.
// EG currently doesn't support metadata for route or cluster.
}
// LiteralCustomTag adds hard-coded value to each span.
type LiteralCustomTag struct {
// Value defines the hard-coded value to add to each span.
Value string `json:"value"`
}
// EnvironmentCustomTag adds value from environment variable to each span.
type EnvironmentCustomTag struct {
// Name defines the name of the environment variable which to extract the value from.
Name string `json:"name"`
// DefaultValue defines the default value to use if the environment variable is not set.
// +optional
DefaultValue *string `json:"defaultValue,omitempty"`
}
// RequestHeaderCustomTag adds value from request header to each span.
type RequestHeaderCustomTag struct {
// Name defines the name of the request header which to extract the value from.
Name string `json:"name"`
// DefaultValue defines the default value to use if the request header is not set.
// +optional
DefaultValue *string `json:"defaultValue,omitempty"`
}
Example
- The following is an example to config tracing.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: tracing
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # sample 100% of requests
      samplingRate: 100
      provider:
        host: otel-collector.monitoring.svc.cluster.local
        port: 4317
      customTags:
        # This is an example of using a literal as a tag value
        key1:
          type: Literal
          literal:
            value: "val1"
        # This is an example of using an environment variable as a tag value
        env1:
          type: Environment
          environment:
            name: ENV1
            defaultValue: "-"
        # This is an example of using a header value as a tag value
        header1:
          type: RequestHeader
          requestHeader:
            name: X-Header-1
            defaultValue: "-"
14 - Debug support in Envoy Gateway
Overview
Envoy Gateway exposes endpoints at localhost:19000/debug/pprof to run Golang profiles to aid in live debugging.
The endpoints are equivalent to those found in the http/pprof package. /debug/pprof/ returns an HTML page listing the available profiles.
Goals
- Add a separate admin server to the Envoy Gateway control plane.
- Add pprof support to the Envoy Gateway control plane.
- Define an API to allow Envoy Gateway to customize the admin server configuration.
- Define an API to allow Envoy Gateway to dump its configuration in the logs.
The following are the different types of profiles end users can run:
PROFILE | FUNCTION |
---|---|
/debug/pprof/allocs | Returns a sampling of all past memory allocations. |
/debug/pprof/block | Returns stack traces of goroutines that led to blocking on synchronization primitives. |
/debug/pprof/cmdline | Returns the command line that was invoked by the current program. |
/debug/pprof/goroutine | Returns stack traces of all current goroutines. |
/debug/pprof/heap | Returns a sampling of memory allocations of live objects. |
/debug/pprof/mutex | Returns stack traces of goroutines holding contended mutexes. |
/debug/pprof/profile | Returns pprof-formatted cpu profile. You can specify the duration using the seconds GET parameter. The default duration is 30 seconds. |
/debug/pprof/symbol | Looks up the program counters listed in the request and returns a table mapping them to function names. |
/debug/pprof/threadcreate | Returns stack traces that led to creation of new OS threads. |
/debug/pprof/trace | Returns the execution trace in binary form. You can specify the duration using the seconds GET parameter. The default duration is 1 second. |
Non Goals
API
- Add admin field in EnvoyGateway config.
- Add address field under admin field.
- Add port and host under address field.
- Add enableDumpConfig field under admin field.
- Add enablePprof field under admin field.
Here is an example configuration to open admin server and enable Pprof:
apiVersion: gateway.envoyproxy.io/v1alpha1
gateway:
  controllerName: "gateway.envoyproxy.io/gatewayclass-controller"
kind: EnvoyGateway
provider:
  type: "Kubernetes"
admin:
  enablePprof: true
  address:
    host: 127.0.0.1
    port: 19000
Here is an example configuration to open envoy gateway config dump in logs:
apiVersion: gateway.envoyproxy.io/v1alpha1
gateway:
  controllerName: "gateway.envoyproxy.io/gatewayclass-controller"
kind: EnvoyGateway
provider:
  type: "Kubernetes"
admin:
  enableDumpConfig: true
15 - egctl Design
Motivation
EG should provide a command line tool with the following capabilities:
- Collect configuration from Envoy Proxy and Envoy Gateway
- Analyze system configuration to diagnose any issues in Envoy Gateway
This tool is named egctl.
Syntax
Use the following syntax to run egctl commands from your terminal window:
egctl [command] [entity] [name] [flags]
where command, entity, name, and flags are:
- command: Specifies the operation that you want to perform on one or more resources, for example config, version.
- entity: Specifies the entity the operation is being performed on such as envoy-proxy or envoy-gateway.
- name: Specifies the name of the specified instance.
- flags: Specifies optional flags. For example, you can use the -c or --config flags to specify the values for installing.
If you need help, run egctl help from the terminal window.
Operation
The following table includes short descriptions and the general syntax for all the egctl operations:
Operation | Syntax | Description |
---|---|---|
version | egctl version | Prints out build version information. |
config | egctl config ENTITY | Retrieve information about proxy configuration from envoy proxy and gateway |
analyze | egctl analyze | Analyze EG configuration and print validation messages |
experimental | egctl experimental | Subcommand for experimental features. These do not guarantee backwards compatibility |
Examples
Use the following set of examples to help you familiarize yourself with running the commonly used egctl operations:
# Retrieve all information about proxy configuration from envoy
egctl config envoy-proxy all <instance_name>
# Retrieve listener information about proxy configuration from envoy
egctl config envoy-proxy listener <instance_name>
# Retrieve the relevant rate limit configuration from the Rate Limit instance
egctl config envoy-ratelimit
16 - Envoy Gateway Extensions Design
As outlined in the official goals for the Envoy Gateway project, one of the main goals is to “provide a common foundation for vendors to build value-added products without having to re-engineer fundamental interactions”. Development of the Envoy Gateway project has been focused on developing the core features for the project and Kubernetes Gateway API conformance. This system focuses on the “common foundation for vendors” component by introducing a way for vendors to extend Envoy Gateway.
To meaningfully extend Envoy Gateway and provide additional features, Extensions need to be able to introduce their own custom resources and have a high level of control over the configuration generated by Envoy Gateway. Simply applying some static xDS configuration patches or relying on the existing Gateway API resources are both insufficient on their own as means to add larger features that require dynamic user-configuration.
As an example, an extension developer may wish to provide their own out-of-the-box authentication filters that require configuration from the end-user. This is a scenario where the ability to introduce custom resources and attach them to HTTPRoutes as an ExtensionRef is necessary. Providing the same feature through a series of xDS patch resources would be too cumbersome for many end-users that want to avoid that level of complexity when managing their clusters.
Goals
- Provide a foundation for extending the Envoy Gateway control plane
- Allow Extension Developers to introduce their own custom resources for extending the Gateway-API via ExtensionRefs, policyAttachments (future) and backendRefs (future).
- Extension developers should NOT have to maintain a custom fork of Envoy Gateway
- Provide a system for extending Envoy Gateway which allows extension projects to ship updates independent of Envoy Gateway’s release schedule
- Modify the generated Envoy xDS config
- Set up a foundation for the initial iteration of Extending Envoy Gateway
- Allow an Extension to hook into the infra manager pipeline (future)
Non-Goals
- The initial design does not capture every hook that Envoy Gateway will eventually support.
- Extend Gateway API Policy Attachments. At some point, these will be addressed using this extension system, but the initial implementation omits these.
- Support multiple extensions at the same time. Due to the fact that extensions will be modifying xDS resources after they are generated, handling the order of extension execution for each individual hook point is a challenge. Additionally, there is no real way to prevent one extension from overwriting or breaking modifications to xDS resources that were made by another extension that was executed first.
Overview
Envoy Gateway can be extended by vendors by means of an extension server developed by the vendor and deployed alongside Envoy Gateway. An extension server can make use of one or more pre/post hooks inside Envoy Gateway before and after its major components (translator, etc.) to allow the extension to modify the data going into or coming out of these components. An extension can be created external to Envoy Gateway as its own Kubernetes deployment or loaded as a sidecar. gRPC is used for the calls between Envoy Gateway and an extension. In the hook call, Envoy Gateway sends data as well as context information to the extension and expects a reply with a modified version of the data that was sent to the extension. Since extensions fundamentally alter the logic and data that Envoy Gateway provides, Extension projects assume responsibility for any bugs and issues they create as a direct result of their modification of Envoy Gateway.
Diagram
Registering Extensions in Envoy Gateway
Information about the extension that Envoy Gateway needs to load is configured in the Envoy Gateway config.
An example configuration:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
extensionManager:
  policyResources:
  - group: example.myextension.io
    version: v1alpha1
    kind: ListenerPolicyKind
  resources:
  - group: example.myextension.io
    version: v2
    kind: OAuth2Filter
  hooks:
    xdsTranslator:
      post:
      - Route
      - VirtualHost
      - HTTPListener
      - Translation
  service:
    fqdn:
      hostname: my-extension.example
      port: 443
    tls:
      certificateRef:
        name: my-secret
        namespace: default
An extension must supply connection information in the extensionManager.service field so that Envoy Gateway can communicate with the extension. The tls configuration is optional. Envoy Gateway supports connecting to an extension server either with TCP or with Unix Domain Sockets as a transport layer.
If the extension wants Envoy Gateway to watch resources for it, then the extension must configure the optional extensionManager.resources field and supply a list of:
- group: the API group of the resource
- version: the API version of the resource
- kind: the Kind of resource
If the extension wants Envoy Gateway to watch for policy resources, then it must configure the optional extensionManager.policyResources field and supply a list of:
- group: the API group of the resource
- version: the API version of the resource
- kind: the Kind of resource
Policy resources, like all Gateway-API policies, must contain targetRef or targetRefs fields in the spec which allow Envoy Gateway to identify which resources are targeted by the policy.
Policies can currently only target Gateway resources, and are provided as context to calls to the HTTPListener hook.
The extension can configure the extensionManager.hooks field to specify which hook points it would like to support. If a given hook is not listed here then it will not be executed even if the extension is configured properly. This allows extension developers to only opt-in to the hook points they want to make use of.
This configuration is required to be provided at bootstrap and modifying the registered extension during runtime is not currently supported.
Envoy Gateway will keep track of the registered extension and its API groups and kinds when processing Gateway API resources.
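The opt-in behaviour of extensionManager.hooks can be pictured as a set lookup performed before each hook call. The sketch below assumes a hypothetical hookSet type built once at bootstrap from the post list in the example configuration; the type and function names are illustrative and are not Envoy Gateway's own.

```go
package main

import "fmt"

// XDSHook names a post-translation hook point, mirroring the values
// listed under hooks.xdsTranslator.post in the example configuration.
type XDSHook string

const (
	HookRoute        XDSHook = "Route"
	HookVirtualHost  XDSHook = "VirtualHost"
	HookHTTPListener XDSHook = "HTTPListener"
	HookTranslation  XDSHook = "Translation"
)

// hookSet is a hypothetical lookup structure built once at bootstrap,
// since the registered extension cannot change at runtime.
type hookSet map[XDSHook]struct{}

func newHookSet(post []XDSHook) hookSet {
	s := make(hookSet, len(post))
	for _, h := range post {
		s[h] = struct{}{}
	}
	return s
}

// shouldCall reports whether the extension opted in to the given hook.
func (s hookSet) shouldCall(h XDSHook) bool {
	_, ok := s[h]
	return ok
}

func main() {
	hooks := newHookSet([]XDSHook{HookRoute, HookHTTPListener})
	all := []XDSHook{HookRoute, HookVirtualHost, HookHTTPListener, HookTranslation}
	for _, h := range all {
		fmt.Printf("%s enabled: %t\n", h, hooks.shouldCall(h))
	}
}
```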
Extending Gateway API and the Data Plane
Envoy Gateway manages Envoy deployments, which act as the data plane that handles actual user traffic. Users configure the data plane using the K8s Gateway API resources which Envoy Gateway converts into Envoy specific configuration (xDS) to send over to Envoy.
Gateway API offers ExtensionRef filters and Policy Attachments as extension points for implementers to use. Envoy Gateway extends the Gateway API using these extension points to provide support for rate limiting and authentication native to the project. The initial design of Envoy Gateway extensions will primarily focus on ExtensionRef filters so that extension developers can reference their own resources as HTTP Filters in the same way that Envoy Gateway has native support for rate limiting and authentication filters, as well as policy resources which can target Gateways.
When Envoy Gateway encounters an HTTPRoute or GRPCRoute that has an ExtensionRef filter with a group and kind that Envoy Gateway does not support, it will first check the registered extension to determine if it supports the referenced object before considering it a configuration error.
This allows users to reference additional filters provided by their Envoy Gateway Extension in their HTTPRoutes / GRPCRoutes:
apiVersion: example.myextension.io/v1alpha1
kind: OAuth2Filter
metadata:
  name: oauth2-filter
spec:
  ...
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    filters:
    - type: ExtensionRef
      extensionRef:
        group: example.myextension.io
        kind: OAuth2Filter
        name: oauth2-filter
    backendRefs:
    - name: backend
      port: 3000
In order to enable the usage of new resources introduced by an extension for translation and xDS modification, Envoy Gateway provides hook points within the translation pipeline, where it calls out to the extension service registered in the EnvoyGateway config if it specifies a group that matches the group of an ExtensionRef filter. The extension will then be able to modify the xDS that Envoy Gateway generated and send back the modified configuration. If an extension is not registered or if the registered extension does not specify support for the group of an ExtensionRef filter then Envoy Gateway will treat it as an unknown resource and provide an error to the user.
Note: Currently (as of v1) Gateway API does not provide a means to specify the namespace or version of an object referenced as an ExtensionRef. The extension mechanism will assume that the namespace of any ExtensionRef is the same as the namespace of the HTTPRoute or GRPCRoute it is attached to rather than treating the name field of an ExtensionRef as a name.namespace string.
If Gateway API adds support for these fields then the design of the Envoy Gateway extensions will be updated to support them.
Similarly, any registered policy resource that targets an HTTPListener will be sent to the HTTPListener hook as context.
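The lookup order described above (natively supported kinds first, then the registered extension, otherwise a configuration error) can be sketched as follows. The registry type, the owner values, and the ExampleNativeFilter kind are hypothetical placeholders chosen for illustration. Matching on both group and kind is an assumption, and the referenced object is taken to live in the route's own namespace as the note states.

```go
package main

import "fmt"

// GroupKind identifies a referenced resource type. ExtensionRef carries
// no version or namespace, so neither appears in the key.
type GroupKind struct {
	Group string
	Kind  string
}

type owner string

const (
	ownerEnvoyGateway owner = "EnvoyGateway"
	ownerExtension    owner = "Extension"
)

// registry is a hypothetical view of what Envoy Gateway supports natively
// and what the single registered extension declared under resources.
type registry struct {
	native    map[GroupKind]bool
	extension map[GroupKind]bool
}

// resolve applies the lookup order: native support, then the extension,
// otherwise an error that would be reported to the user.
func (r registry) resolve(ref GroupKind) (owner, error) {
	if r.native[ref] {
		return ownerEnvoyGateway, nil
	}
	if r.extension[ref] {
		return ownerExtension, nil
	}
	return "", fmt.Errorf("unknown extensionRef %s/%s", ref.Group, ref.Kind)
}

func main() {
	r := registry{
		// ExampleNativeFilter is a made-up kind standing in for a native filter.
		native:    map[GroupKind]bool{{Group: "gateway.envoyproxy.io", Kind: "ExampleNativeFilter"}: true},
		extension: map[GroupKind]bool{{Group: "example.myextension.io", Kind: "OAuth2Filter"}: true},
	}
	fmt.Println(r.resolve(GroupKind{Group: "example.myextension.io", Kind: "OAuth2Filter"}))
	fmt.Println(r.resolve(GroupKind{Group: "other.example.io", Kind: "Unknown"}))
}
```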
Watching New Resources
Envoy Gateway will dynamically create new watches on resources introduced by the registered Extension. It does so by using the controller-runtime to create new watches on Unstructured resources that match the versions, groups, and kinds that the registered extension configured. When communicating with an extension, Envoy Gateway sends these Unstructured resources over to the extension. This eliminates the need for the extension to create its own watches which would have a strong chance of creating race conditions and reconciliation loops when resources change. When an extension receives the Unstructured resources from Envoy Gateway it can perform its own type validation on them. Currently we make the simplifying assumption that the registered extension's Kinds are filters referenced by extensionRef in HTTPRouteFilters. Policy attachments which target Gateway resources work in the same way.
xDS Hooks API
Envoy Gateway supports the following hooks as the initial foundation of the Extension system. Additional hooks can be developed using this extension system at a later point as new use-cases and needs are discovered. The first iteration of the extension hooks focuses solely on the modification of xDS resources.
Route Modification Hook
The Route level Hook provides a way for extensions to modify a route generated by Envoy Gateway before it is finalized.
Doing so allows extensions to configure/modify route fields configured by Envoy Gateway and also to configure the Route's TypedPerFilterConfig, which may be desirable for things such as passing settings and information to ext_authz filters.
The Post Route Modify hook also passes a list of Unstructured data for the extensionRefs owned by the extension on the HTTPRoute that created this xDS route.
This hook is always executed when an extension is loaded that has added Route to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post, and only on Routes which were generated from an HTTPRoute that uses extension resources as extensionRef filters.
// PostRouteModifyRequest sends a Route that was generated by Envoy Gateway along with context information to an extension so that the Route can be modified
message PostRouteModifyRequest {
    envoy.config.route.v3.Route route = 1;
    PostRouteExtensionContext post_route_context = 2;
}
// PostRouteExtensionContext provides resources introduced by an extension and watched by Envoy Gateway
// additional context information can be added to this message as more use-cases are discovered
message PostRouteExtensionContext {
    // Resources introduced by the extension that were used as extensionRefs in an HTTPRoute/GRPCRoute
    repeated ExtensionResource extension_resources = 1;
    // hostnames are the fully qualified domain names attached to the HTTPRoute
    repeated string hostnames = 2;
}
// ExtensionResource stores the data for a K8s API object referenced in an HTTPRouteFilter
// extensionRef. It is constructed from an unstructured.Unstructured marshalled to JSON. An extension
// can unmarshal the bytes from this resource back into an unstructured.Unstructured and then
// perform type checking to obtain the resource.
message ExtensionResource {
    bytes unstructured_bytes = 1;
}
// PostRouteModifyResponse is the expected response from an extension and contains a modified version of the Route that was sent
// If an extension returns a nil Route then it will not be modified
message PostRouteModifyResponse {
    envoy.config.route.v3.Route route = 1;
}
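Because ExtensionResource carries an Unstructured object marshalled to JSON, an extension needs to decode the bytes and check the type before using them. The sketch below shows one way to do that, assuming the payload is a JSON object with the usual Kubernetes apiVersion and kind fields; decode_extension_resource is a hypothetical helper name, not part of any Envoy Gateway library.

```python
import json


def decode_extension_resource(unstructured_bytes, expected_api_version, expected_kind):
    """Decode the JSON payload of an ExtensionResource and type-check it.

    Returns the decoded object as a dict, or None when the object is not of
    the expected apiVersion/kind (the extension should then ignore it).
    """
    obj = json.loads(unstructured_bytes)
    if obj.get("apiVersion") != expected_api_version or obj.get("kind") != expected_kind:
        return None
    return obj


payload = json.dumps({
    "apiVersion": "example.myextension.io/v1alpha1",
    "kind": "OAuth2Filter",
    "metadata": {"name": "oauth2-filter", "namespace": "default"},
    "spec": {},
}).encode()

resource = decode_extension_resource(
    payload, "example.myextension.io/v1alpha1", "OAuth2Filter"
)
print(resource["metadata"]["name"])
```

Returning None for unexpected kinds keeps the hook tolerant of resources it does not understand instead of failing the whole translation.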
VirtualHost Modification Hook
The VirtualHost Hook provides a way for extensions to modify a VirtualHost generated by Envoy Gateway before it is finalized.
An extension can also make use of this hook to generate and insert entirely new Routes not generated by Envoy Gateway.
This hook is always executed when an extension is loaded that has added VirtualHost to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post.
An extension may return nil to not make any changes to the VirtualHost.
// PostVirtualHostModifyRequest sends a VirtualHost that was generated by Envoy Gateway along with context information to an extension so that the VirtualHost can be modified
message PostVirtualHostModifyRequest {
    envoy.config.route.v3.VirtualHost virtual_host = 1;
    PostVirtualHostExtensionContext post_virtual_host_context = 2;
}
// Empty for now but we can add fields to the context as use-cases are discovered without
// breaking any clients that use the API
message PostVirtualHostExtensionContext {}
// PostVirtualHostModifyResponse is the expected response from an extension and contains a modified version of the VirtualHost that was sent
// If an extension returns a nil Virtual Host then it will not be modified
message PostVirtualHostModifyResponse {
    envoy.config.route.v3.VirtualHost virtual_host = 1;
}
HTTP Listener Modification Hook
The HTTP Listener modification hook is the broadest xDS modification Hook available and allows an extension to make changes to a Listener generated by Envoy Gateway before it is finalized.
This hook is always executed when an extension is loaded that has added HTTPListener to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post. An extension may return nil in order to not make any changes to the Listener.
// PostHTTPListenerModifyRequest sends a Listener that was generated by Envoy Gateway along with context information to an extension so that the Listener can be modified
message PostHTTPListenerModifyRequest {
    envoy.config.listener.v3.Listener listener = 1;
    PostHTTPListenerExtensionContext post_listener_context = 2;
}
// Additional fields can be added to the context as use-cases are discovered without
// breaking any clients that use the API
message PostHTTPListenerExtensionContext {
    // Resources introduced by the extension that were used as extension server
    // policies targeting the listener
    repeated ExtensionResource extension_resources = 1;
}
// PostHTTPListenerModifyResponse is the expected response from an extension and contains a modified version of the Listener that was sent
// If an extension returns a nil Listener then it will not be modified
message PostHTTPListenerModifyResponse {
    envoy.config.listener.v3.Listener listener = 1;
}
Post xDS Translation Modify Hook
The Post Translate Modify hook allows an extension to modify the clusters and secrets in the xDS config.
This allows clusters whose configuration may change along with extension-specific configuration to be created dynamically, rather than through custom bootstrap config, which is sufficient only for clusters that are static and not prone to configuration changes.
An example of how this may be used is to inject a cluster that will be used by an ext_authz http filter created by the extension.
The list of clusters and secrets returned by the extension is used as the final list of all clusters and secrets.
This hook is always executed when an extension is loaded that has added Translation to the EnvoyProxy.extensionManager.hooks.xdsTranslator.post.
// PostTranslateModifyRequest currently sends only clusters and secrets to an extension.
// The extension is free to add/modify/remove the resources it received.
message PostTranslateModifyRequest {
    PostTranslateExtensionContext post_translate_context = 1;
    repeated envoy.config.cluster.v3.Cluster clusters = 2;
    repeated envoy.extensions.transport_sockets.tls.v3.Secret secrets = 3;
}
// PostTranslateModifyResponse is the expected response from an extension and contains
// the full list of xDS clusters and secrets to be used for the xDS config.
message PostTranslateModifyResponse {
    repeated envoy.config.cluster.v3.Cluster clusters = 1;
    repeated envoy.extensions.transport_sockets.tls.v3.Secret secrets = 2;
}
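Since the response replaces the full list of clusters and secrets, an extension that only wants to add a cluster still has to echo back everything it received. The following sketch illustrates that contract with plain dicts standing in for the xDS protobuf messages; post_translate_modify and the extra_cluster argument are hypothetical names used only here.

```python
def post_translate_modify(clusters, secrets, extra_cluster):
    """Return the final (clusters, secrets) lists for the xDS config.

    Everything received is passed through unchanged; extra_cluster is
    appended only if no cluster with the same name exists already.
    """
    existing_names = {cluster["name"] for cluster in clusters}
    result = list(clusters)
    if extra_cluster["name"] not in existing_names:
        result.append(extra_cluster)
    # Secrets are returned untouched; omitting them would drop them from
    # the final configuration.
    return result, list(secrets)


received = [{"name": "httproute/default/backend/rule/0"}]
final_clusters, final_secrets = post_translate_modify(
    received, [], {"name": "ext-authz-cluster", "type": "STRICT_DNS"}
)
print([c["name"] for c in final_clusters])
```

Returning only the injected cluster would remove every cluster Envoy Gateway generated, which is why the input list is copied first.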
Extension Service
Currently, an extension must implement all of the following hooks although it may return the input(s) it received if no modification of the resource is desired. A future expansion of the extension hooks will allow an Extension to specify with config which Hooks it would like to “subscribe” to and which Hooks it does not wish to support. These specific Hooks were chosen in order to provide extensions with the ability to have both broad and specific control over xDS resources and to minimize the amount of data being sent.
service EnvoyGatewayExtension {
    rpc PostRouteModify (PostRouteModifyRequest) returns (PostRouteModifyResponse) {};
    rpc PostVirtualHostModify(PostVirtualHostModifyRequest) returns (PostVirtualHostModifyResponse) {};
    rpc PostHTTPListenerModify(PostHTTPListenerModifyRequest) returns (PostHTTPListenerModifyResponse) {};
    rpc PostTranslateModify(PostTranslateModifyRequest) returns (PostTranslateModifyResponse) {};
}
Design Decisions
- Envoy Gateway watches new custom resources introduced by a loaded extension and passes the resources back to the extension when they are used.
- This decision was made to solve the problem of how resources introduced by an extension get watched. If an extension server watched its own resources, it would need some way to trigger an Envoy Gateway reconfiguration when a resource that Envoy Gateway is not watching gets updated. Having Envoy Gateway watch all resources removes any concern about creating race conditions or reconcile loops that would result from Envoy Gateway and the extension server each holding separate state that needs to be synchronized.
- The Extension Server takes ownership of producing the correct xDS configuration in the hook responses
- The Extension Server will be responsible for ensuring the performance of the hook processing time
- The Post xDS level gRPC hooks all currently send a context field even though it contains nothing for several hooks. These fields exist so that they can be updated in the future to pass additional information to extensions as new use-cases and needs are discovered.
- The initial design supplies the scaffolding for both “pre xDS” and “post xDS” hooks. Only the post hooks, which operate on xDS resources after they have been generated, are currently implemented. The pre hooks will be implemented at a later date along with one or more hooks in the infra manager. The infra manager level hook(s) will exist to power use-cases such as dynamically creating Deployments/Services for the extension whenever Envoy Gateway creates an instance of Envoy Proxy. An extension developer might want to take advantage of this functionality to inject a new authorization service as a sidecar on the Envoy Proxy deployment for reduced latency.
- Multiple extensions are not supported at the same time. Preventing conflicts between multiple extensions that are all modifying xDS resources is too difficult to guarantee and is likely to only generate issues.
Known Challenges
Extending Envoy Gateway by using an external extension server which makes use of hook points in Envoy Gateway does come with a few trade-offs. One known trade-off is the time that it takes for the hook calls to be executed. Since an extension makes use of hook points in Envoy Gateway that use gRPC for communication, the time it takes to perform these requests could become a concern for some extension developers. One way to minimize the request time of the hook calls is to run the extension server as a sidecar to Envoy Gateway and use the Unix domain socket transport, which minimizes the impact of networking on the hook calls.
17 - EnvoyExtensionPolicy
Overview
This design document introduces the EnvoyExtensionPolicy API allowing system administrators to configure traffic processing extensibility policies, based on existing Network and HTTP Envoy proxy extension points.
Envoy Gateway already provides two methods of control plane extensibility that can be used to achieve this functionality:
- Envoy Patch Policy can be used to patch Listener filters and HTTP Connection Manager filters.
- Envoy Extension Manager can be used to programmatically mutate Listener filters and HTTP Connection Manager filters.
These approaches require a high level of Envoy and Envoy Gateway expertise and may create a significant operational burden for users (see Alternatives for more details). For this reason, this document proposes to support Envoy data plane extensibility options as first class citizens of Envoy Gateway.
Goals
- Add an API definition to hold settings for configuring extensibility rules on the traffic entering the gateway.
Non Goals
- Define the API configuration fields in this API.
- Define the API for the following extension options:
- Native Envoy extensions: custom C++ extensions that must be compiled into the Envoy binary.
- Non-filter extensions: services, matchers, tracers, private key providers, resource monitors, etc.
Implementation
EnvoyExtensionPolicy is a Policy Attachment type API that can be used to extend Gateway API to define traffic extension rules.
BackendTrafficPolicy is enhanced to allow users to provide per-route config for Extensions.
Example
Here is an example highlighting how a user can configure this API for the External Processing extension.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: ext-proc-policy
  namespace: default
spec:
  priority: 10
  extProc:
    - service:
        backendRef:
          group: ""
          kind: Service
          name: myExtProc
          port: 3000
      processingMode:
        request:
          headers: SEND
          body: BUFFERED
        response:
          headers: SKIP
          body: STREAMED
      messageTimeout: 5s
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default
Features / API Fields
Here is a list of features that can be included in this API
- Network Filters:
- Wasm
- Golang
- HTTP Filters:
- External Processing
- Lua
- Wasm
- Golang
Design Decisions
- This API will only support a single targetRef and can bind to a Gateway resource or a HTTPRoute or GRPCRoute or TCPRoute.
- Extensions that support both Network and HTTP filter variants (e.g. Wasm, Golang) will be translated to the appropriate filter type according to the sort of route that they attach to.
- Extensions that only support HTTP extensibility (Ext-Proc, LUA) can only be attached to HTTP/GRPC Routes.
- A user-defined extension that is added to the request processing flow can have a significant impact on security, resilience and performance of the proxy. Gateway Operators can restrict access to the extensibility policy using K8s RBAC.
- Users may need to customize the order of extension and built-in filters. This will be addressed in a separate issue.
- Gateway operators may need to include multiple extensions (e.g. Wasm modules developed by different teams and distributed separately). This API will support attachment of multiple policies. Extensions will execute in an order defined by the priority field.
- This API resource MUST be part of the same namespace as the targetRef resource.
- If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
- If Policy A has a targetRef that includes a sectionName, i.e. it targets a specific Listener within a Gateway, and Policy B has a targetRef that targets the same entire Gateway then
  - Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName
  - Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.
- A Policy targeting the most specific scope wins over a policy targeting a lesser specific scope, i.e. a Policy targeting a Listener overrides a Policy targeting the Gateway the listener/section is a part of.
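The precedence rule in the last two decisions can be sketched as a small lookup. This is only an illustration under simplifying assumptions: the Policy dataclass and effective_policy function are hypothetical, at most one policy exists per target, and status conditions such as Overridden=True are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical, reduced view of a policy's targetRef."""
    name: str
    gateway: str
    section_name: Optional[str] = None  # None means the whole Gateway is targeted


def effective_policy(policies, gateway, listener):
    """Pick the policy that applies to one listener of a Gateway.

    A policy that names the listener through sectionName wins over a
    policy that targets the entire Gateway.
    """
    for policy in policies:
        if policy.gateway == gateway and policy.section_name == listener:
            return policy
    for policy in policies:
        if policy.gateway == gateway and policy.section_name is None:
            return policy
    return None


policies = [
    Policy("policy-b", gateway="eg"),
    Policy("policy-a", gateway="eg", section_name="https"),
]
print(effective_policy(policies, "eg", "https").name)  # listener-level policy
print(effective_policy(policies, "eg", "http").name)   # gateway-level policy
```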
Alternatives
- The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
- The project can implement support for HTTP traffic extensions using vendor-specific Gateway API Route Filters instead of policies. However, this option is less convenient for the definition of gateway-level extensions.
- Users can leverage the existing Envoy Patch Policy to inject extension filters. However, Envoy Gateway strives to provide a simple abstraction for common use cases and easy operations. Envoy patches require a high level of end-user Envoy expertise, and knowledge of how Envoy Gateway generates XDS. Such patches may be too difficult and fragile for some users to maintain.
- Users can leverage the existing Envoy Extension Manager to inject extension filters. However, this requires a significant investment by users to build and operate an extension manager alongside Envoy Gateway.
18 - EnvoyPatchPolicy
Overview
This design introduces the EnvoyPatchPolicy API allowing users to modify the Envoy xDS configuration that Envoy Gateway generates before sending it to Envoy Proxy.
Envoy Gateway allows users to configure networking and security intent using the upstream Gateway API as well as implementation specific Extension APIs defined in this project to provide a more batteries included experience for application developers.
- These APIs are an abstracted version of the underlying Envoy xDS API to provide a better user experience for the application developer, exposing and setting only a subset of the fields for a specific feature, sometimes in an opinionated way (e.g. RateLimit).
- These APIs do not expose all the features and capabilities that Envoy has, either because these features are desired but the API is not defined yet or because the project cannot support such an extensive list of features. To alleviate this problem, and provide an interim solution for a small section of advanced users who are well versed in the Envoy xDS API and its capabilities, this API is being introduced.
Goals
- Add an API allowing users to modify the generated xDS Configuration
Non Goals
- Support multiple patch mechanisms
Implementation
EnvoyPatchPolicy is a Direct Policy Attachment type API that can be used to extend Gateway API.
Modifications to the generated xDS configuration can be provided as a JSON Patch, which is defined in RFC 6902. This patching mechanism has been adopted in Kubernetes as well as Kustomize to update resource objects.
Example
Here is an example highlighting how a user can configure global ratelimiting using an external rate limit service using this API.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
    - name: eg
  hostnames:
    - "www.example.com"
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: backend
          port: 3000
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: ratelimit-patch-policy
  namespace: default
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default
  type: JSONPatch
  jsonPatches:
    - type: "type.googleapis.com/envoy.config.listener.v3.Listener"
      # The listener name is of the form <GatewayNamespace>/<GatewayName>/<GatewayListenerName>
      name: default/eg/http
      operation:
        op: add
        path: "/default_filter_chain/filters/0/typed_config/http_filters/0"
        value:
          name: "envoy.filters.http.ratelimit"
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit"
            domain: "eag-ratelimit"
            failure_mode_deny: true
            timeout: 1s
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate-limit-cluster
              transport_api_version: V3
    - type: "type.googleapis.com/envoy.config.route.v3.RouteConfiguration"
      # The route name is of the form <GatewayNamespace>/<GatewayName>/<GatewayListenerName>
      name: default/eg/http
      operation:
        op: add
        path: "/virtual_hosts/0/rate_limits"
        value:
          - actions:
              - remote_address: {}
    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
      name: rate-limit-cluster
      operation:
        op: add
        path: ""
        value:
          name: rate-limit-cluster
          type: STRICT_DNS
          connect_timeout: 10s
          lb_policy: ROUND_ROBIN
          http2_protocol_options: {}
          load_assignment:
            cluster_name: rate-limit-cluster
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: ratelimit.svc.cluster.local
                          port_value: 8081
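To make the semantics of the add operations in this example concrete, here is a minimal sketch of the RFC 6902 add operation applied to plain Python data. It is written from the RFC, not taken from Envoy Gateway, and covers only what the example needs; json_patch_add and _pointer_tokens are hypothetical helper names, and the listener dict is a heavily reduced stand-in for the generated xDS.

```python
import copy


def _pointer_tokens(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def json_patch_add(document, path, value):
    """Apply a single JSON Patch add operation and return the new document."""
    tokens = _pointer_tokens(path)
    if not tokens:
        # An empty path replaces the whole document, as in the Cluster patch.
        return copy.deepcopy(value)
    patched = copy.deepcopy(document)
    parent = patched
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        # "-" appends; a numeric index inserts before the existing element.
        index = len(parent) if last == "-" else int(last)
        if index < 0 or index > len(parent):
            raise IndexError("array index out of range")
        parent.insert(index, copy.deepcopy(value))
    else:
        parent[last] = copy.deepcopy(value)
    return patched


listener = {"default_filter_chain": {"filters": [
    {"typed_config": {"http_filters": [{"name": "envoy.filters.http.router"}]}}
]}}
patched = json_patch_add(
    listener,
    "/default_filter_chain/filters/0/typed_config/http_filters/0",
    {"name": "envoy.filters.http.ratelimit"},
)
print([f["name"] for f in patched["default_filter_chain"]["filters"][0]["typed_config"]["http_filters"]])
```

Adding at index 0 places the rate limit filter ahead of the filters that were already in the chain, which is why the Listener patch in the example ends its path with /0.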
Verification
- Offline - Leverage egctl x translate to ensure that the EnvoyPatchPolicy can be successfully applied and the desired output xDS is created.
- Runtime - Use the Status field within EnvoyPatchPolicy to highlight whether the patch was applied successfully or not.
State of the World
- Istio - Supports the EnvoyFilter API which allows users to customize the output xDS using patches and proto based merge semantics.
Design Decisions
- This API will only support a single targetRef and can bind to only a Gateway or GatewayClass resource. This simplifies reasoning of how patches will work.
- This API will always be an experimental API and cannot be graduated into a stable API because Envoy Gateway cannot guarantee
  - that the naming scheme for the generated resource names will not change across releases
  - that the underlying Envoy Proxy API will not change across releases
- This API needs to be explicitly enabled using the EnvoyGateway API
Open Questions
- Should the value only support JSON, or YAML as well (which is a JSON superset)?
Alternatives
- Users can customize the Envoy Bootstrap configuration using EnvoyProxy API and provide static xDS configuration.
- Users can extend functionality by Extending the Control Plane and adding gRPC hooks to modify the generated xDS configuration.
19 - Metadata in XDS resources
Overview
In Envoy, static metadata can be configured on various resources: listener, virtual host, route and cluster.
Static metadata can be used for various purposes:
- Observability: enrichment of access logs and traces with metadata formatters and custom tags.
- Processing: provide configuration context to filters in a certain scope (e.g. vhost, route, etc.).
This document describes how Envoy Gateway manages static metadata for various XDS resources such as listeners, virtual hosts, routes, clusters and endpoints.
Configuration
Envoy Gateway propagates certain attributes of Gateway-API resources to XDS resources. Attributes include:
- Metadata: Kind, Group/Version, Name, Namespace and Annotations (belonging to the metadata.gateway.envoyproxy.io namespace)
- Spec: SectionName (Listener Name, RouteRule Name, Port Name), in-spec annotations (e.g. Gateway Annotations)
Future enhancements may include:
- Additional attribute propagation
- Supporting section-specific metadata, e.g. HTTPRoute Metadata annotations that are propagated only to a specific route rule XDS metadata.
- Supporting additional XDS resources, e.g. endpoints and filter chains.
Translation
Envoy Gateway uses the following namespace for envoy resource metadata: gateway.envoyproxy.io/. For example, an envoy route resource may have the following metadata structure:
Kubernetes resource:
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
metadata:
  annotations:
    gateway.envoyproxy.io/foo: bar
  name: myroute
  namespace: gateway-conformance-infra
spec:
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /mypath
Metadata structure:
name: httproute/gateway-conformance-infra/myroute/rule/0/match/0/*
match:
  path_separated_prefix: "/mypath"
route:
  cluster: httproute/gateway-conformance-infra/myroute/rule/0
metadata:
  filter_metadata:
    envoy-gateway:
      resources:
      - namespace: gateway-conformance-infra
        groupVersion: gateway.networking.k8s.io/v1
        kind: HTTPRoute
        annotations:
          foo: bar
        name: myroute
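The mapping between the two structures above can be sketched as a small function. This is an illustration of the example only, not the translator's real code: resource_metadata is a hypothetical name, and the annotation prefix is assumed to be gateway.envoyproxy.io/ as used in the example HTTPRoute.

```python
ANNOTATION_PREFIX = "gateway.envoyproxy.io/"  # assumed from the example above


def resource_metadata(obj):
    """Build the static metadata block for one Kubernetes resource.

    Only annotations carrying the prefix are propagated, and the prefix is
    stripped from their keys.
    """
    meta = obj["metadata"]
    annotations = {
        key[len(ANNOTATION_PREFIX):]: value
        for key, value in meta.get("annotations", {}).items()
        if key.startswith(ANNOTATION_PREFIX)
    }
    return {
        "filter_metadata": {
            "envoy-gateway": {
                "resources": [{
                    "namespace": meta["namespace"],
                    "groupVersion": obj["apiVersion"],
                    "kind": obj["kind"],
                    "annotations": annotations,
                    "name": meta["name"],
                }]
            }
        }
    }


route = {
    "kind": "HTTPRoute",
    "apiVersion": "gateway.networking.k8s.io/v1",
    "metadata": {
        "annotations": {"gateway.envoyproxy.io/foo": "bar", "unrelated/key": "x"},
        "name": "myroute",
        "namespace": "gateway-conformance-infra",
    },
}
print(resource_metadata(route)["filter_metadata"]["envoy-gateway"]["resources"][0]["annotations"])
```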
Envoy Gateway translates Gateway-API in the following manner:
- Gateway metadata is propagated to envoy listener metadata. If merge-gateways is enabled, Gateway Class is used instead.
- Gateway metadata and Listener Section name are propagated to envoy virtual host metadata
- HTTPRoute and GRPCRoute metadata is propagated to envoy route metadata. When Gateway-API adds support for named route rules, the route rule name will be propagated as well.
- TCP/UDPRoute and TLSRoute resource attributes are not propagated. These resources are translated to envoy filter chains, which do not currently support static metadata.
- Service, ServiceImport and Backend metadata and port name are propagated to envoy cluster metadata.
Usage
Users can consume metadata in various ways:
- Adding metadata to access logs using the metadata operator, e.g. %METADATA(ROUTE:envoy-gateway:resources)
- Accessing metadata in CEL expressions through the xds.*_metadata attribute
20 - Rate Limit Design
Overview
Rate limit is a feature that allows the user to limit the number of incoming requests to a predefined value based on attributes within the traffic flow.
Here are some reasons why a user may want to implement rate limits:
- To prevent malicious activity such as DDoS attacks.
- To prevent an application and its resources (such as a database) from getting overloaded.
- To create API limits based on user entitlements.
Scope Types
The rate limit type here describes the scope of rate limits.
Global - In this case, the rate limit is common across all the instances of Envoy proxies where it is applied, i.e. if the data plane has 2 replicas of Envoy running, and the rate limit is 10 requests/second, this limit is shared and will be hit if 5 requests pass through the first replica and 5 requests pass through the second replica within the same second.
Local - In this case, the rate limits are specific to each instance/replica of Envoy running. Note - This is not part of the initial design and will be added as a future enhancement.
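The difference between the two scopes can be shown with a toy model of a single time window. This is a sketch for intuition only: GlobalLimiter and LocalLimiter are hypothetical classes, window resets are not modelled, and a real global limit is enforced through an external rate limit service, not an in-process counter.

```python
from collections import Counter


class GlobalLimiter:
    """One counter shared by every replica (models the Global scope)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self, replica):
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class LocalLimiter:
    """One counter per replica (models the Local scope)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()

    def allow(self, replica):
        if self.counts[replica] >= self.limit:
            return False
        self.counts[replica] += 1
        return True


global_limiter, local_limiter = GlobalLimiter(10), LocalLimiter(10)
for replica in ["replica-1"] * 5 + ["replica-2"] * 5:
    global_limiter.allow(replica)
    local_limiter.allow(replica)

# The eleventh request in the same window:
print(global_limiter.allow("replica-1"))  # shared budget of 10 is spent
print(local_limiter.allow("replica-1"))   # replica-1 has used only 5 of its 10
```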
Match Types
Rate limit a specific traffic flow
- Here is an example of a rate limit implemented by the application developer to limit a specific user by matching on a custom x-user-id header with a value set to one
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-specific-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - name: x-user-id
            value: one
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-specific-user
    backendRefs:
    - name: backend
      port: 3000
Rate limit all traffic flows
- Here is an example of a rate limit implemented by the application developer that limits the total requests made to a specific route to safeguard the health of internal application components. In this case, no specific headers match is specified, and the rate limit is applied to all traffic flows accepted by this HTTPRoute.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-all-requests
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 1000
          unit: Second
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-all-requests
    backendRefs:
    - name: backend
      port: 3000
Rate limit per distinct value
- Here is an example of a rate limit implemented by the application developer to limit any unique user by matching on a custom x-user-id header. Here, user A (recognised from the traffic flow using the header x-user-id and value a) will be rate limited at 10 requests/hour and so will user B (recognised from the traffic flow using the header x-user-id and value b).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-per-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-per-user
    backendRefs:
    - name: backend
      port: 3000
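The effect of a Distinct header match can be pictured as one counter per observed header value. The sketch below models a single time window in process; DistinctHeaderLimiter is a hypothetical class, and letting requests without the header pass this rule is an assumption of the sketch, not something the design text states.

```python
from collections import defaultdict


class DistinctHeaderLimiter:
    """Keeps a separate request budget for each distinct header value."""

    def __init__(self, header, limit):
        self.header = header
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, headers):
        value = headers.get(self.header)
        if value is None:
            # Assumption: the rule does not match requests lacking the header.
            return True
        if self.counts[value] >= self.limit:
            return False
        self.counts[value] += 1
        return True


limiter = DistinctHeaderLimiter("x-user-id", limit=10)
for _ in range(10):
    limiter.allow({"x-user-id": "a"})

print(limiter.allow({"x-user-id": "a"}))  # user a has exhausted its budget
print(limiter.allow({"x-user-id": "b"}))  # user b has its own counter
```

Each user is limited independently, so one client exhausting its budget does not affect the others.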
Rate limit per source IP
- Here is an example of a rate limit implemented by the application developer that limits the total requests made to a specific route by matching on source IP. In this case, requests from x.x.x.x will be rate limited at 10 requests/hour.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-per-ip
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - sourceIP: x.x.x.x/32
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-per-ip
    backendRefs:
    - name: backend
      port: 3000
Rate limit based on JWT claims
- Here is an example of a rate limit implemented by the application developer that limits the total requests made to a specific route by matching on the JWT claim. In this case, requests with JWT claim information of {"name":"John Doe"} will be rate limited at 10 requests/hour.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: jwt-example
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  jwt:
    providers:
    - name: example
      remoteJWKS:
        uri: https://raw.githubusercontent.com/envoyproxy/gateway/main/examples/kubernetes/jwt/jwks.json
      claimToHeaders:
      - claim: name
        header: custom-request-header
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-specific-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - name: custom-request-header
            value: John Doe
        limit:
          requests: 10
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - "www.example.com"
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 3000
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /foo
Multiple RateLimitFilters, rules and clientSelectors
- Users can create multiple RateLimitFilters and apply them to the same HTTPRoute. In such a case each RateLimitFilter will be applied to the route and matched (and limited) in a mutually exclusive way, independent of each other.
- Rate limits are applied for each RateLimitFilter rule when ALL the conditions under clientSelectors hold true.
Here's an example highlighting this:
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-all-safeguard-app
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 100
          unit: Hour
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: ratelimit-per-user
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: example
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
        limit:
          requests: 100
          unit: Hour
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example
spec:
  parentRefs:
  - name: eg
  hostnames:
  - www.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /foo
    filters:
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-per-user
    - type: ExtensionRef
      extensionRef:
        group: gateway.envoyproxy.io
        kind: RateLimitFilter
        name: ratelimit-all-safeguard-app
    backendRefs:
    - name: backend
      port: 3000
- The user has created two RateLimitFilters and has attached them to an HTTPRoute - one (ratelimit-all-safeguard-app) to ensure that the backend does not get overwhelmed with requests, so that any excess requests are rate limited irrespective of the attributes within the traffic flow, and another (ratelimit-per-user) to rate limit each distinct user client who can be differentiated using the x-user-id header, to ensure that each client does not make excessive requests to the backend.
- If user baz (identified with the header and value of x-user-id: baz) sends 90 requests within the first second, and user bar sends 11 more requests during that same interval of 1 second, so that bar sends the 101st request within that second, the rule defined in ratelimit-all-safeguard-app gets activated and Envoy Gateway will rate limit the request sent by bar (and any other request sent within that 1 second). After 1 second, the rate limit counter associated with the ratelimit-all-safeguard-app rule is reset and again evaluated.
- If user bar also ends up sending 90 more requests within the hour, summing up bar’s total request count to 101, the rate limit rule defined within ratelimit-per-user will get activated, and bar’s requests will be rate limited again until the hour interval ends.
- Within the same above hour, if baz sends 991 more requests, summing up baz’s total request count to 1001, the rate limit rule defined within ratelimit-per-user will get activated for baz, and baz’s requests will also be rate limited until the hour interval ends.
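The independent counting described above can be sketched as a small simulation. This is only an illustration, not how Envoy Gateway or the rate limit service is implemented: the FixedWindowRule class, the fixed-window counting, and the choice to count a request even when it is rejected are all assumptions. The limits of 100 come from the manifests above, and the 1 second safeguard window follows the narrative.

```python
from collections import defaultdict


class FixedWindowRule:
    """At most `limit` requests per `window_seconds` for each bucket key."""

    def __init__(self, name, limit, window_seconds, key_fn):
        self.name = name
        self.limit = limit
        self.window_seconds = window_seconds
        # key_fn maps request headers to a bucket key, or None when the
        # rule does not select the request.
        self.key_fn = key_fn
        self.counters = defaultdict(int)

    def over_limit(self, headers, now):
        key = self.key_fn(headers)
        if key is None:
            return False
        window = int(now // self.window_seconds)
        self.counters[(key, window)] += 1
        return self.counters[(key, window)] > self.limit


def is_rate_limited(rules, headers, now):
    # Evaluate every rule (no short-circuit) so each counter advances
    # independently; the request is limited if any rule is over its limit.
    verdicts = [rule.over_limit(headers, now) for rule in rules]
    return any(verdicts)


rules = [
    FixedWindowRule("ratelimit-all-safeguard-app", 100, 1, lambda h: "all"),
    FixedWindowRule("ratelimit-per-user", 100, 3600, lambda h: h.get("x-user-id")),
]


def send(user, count, now):
    return [is_rate_limited(rules, {"x-user-id": user}, now) for _ in range(count)]


baz_first_second = send("baz", 90, now=0.5)  # 90 requests, none limited
bar_first_second = send("bar", 11, now=0.5)  # bar's 11th is the 101st overall
bar_later = send("bar", 90, now=1.5)         # bar reaches 101 within the hour
```

In this sketch only bar's last request in the first second is rejected by the safeguard rule, and only bar's last request in the later batch is rejected by the per-user rule, because each rule keeps its own counters.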
Design Decisions
- The initial design uses an Extension filter to apply the Rate Limit functionality on a specific HTTPRoute. This was preferred over the PolicyAttachment extension mechanism, because it is unclear whether Rate Limit will be required to be enforced or overridden by the platform administrator or not.
- The RateLimitFilter can only be applied as a filter to an HTTPRouteRule, applying it across all backends within an HTTPRoute; it cannot be applied as a filter within an HTTPBackendRef for a specific backend.
- The HTTPRoute API has a matches field within each rule to select a specific traffic flow to be routed to the destination backend. The RateLimitFilter API, which can be attached to an HTTPRoute via an extensionRef filter, also has a clientSelectors field within each rule to select attributes within the traffic flow to rate limit specific clients. The two levels of selectors/matches allow for flexibility and aim to hold match information specific to its use, allowing the author/owner of each configuration to be different. It also allows the clientSelectors field within the RateLimitFilter to be enhanced in the future with other matchable attributes, such as IP subnet, that are not relevant in the HTTPRoute API.
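To make the clientSelectors semantics concrete, here is a minimal sketch of how a selector could be evaluated against request headers. The bucket_key function is hypothetical, and treating a missing type as an exact match is an assumption based on the examples above, where one selector gives a value and the other gives type: Distinct.

```python
def bucket_key(selector, headers):
    """Return a bucket key if ALL header conditions hold, else None."""
    # HTTP header names are case-insensitive, so compare in lower case.
    normalized = {name.lower(): value for name, value in headers.items()}
    parts = []
    for cond in selector["headers"]:
        name = cond["name"].lower()
        value = normalized.get(name)
        if value is None:
            return None  # header missing: the selector does not match
        kind = cond.get("type", "Exact")  # assumed default
        if kind == "Exact":
            if value != cond["value"]:
                return None
            parts.append((name, cond["value"]))
        elif kind == "Distinct":
            # Every distinct header value gets its own counter.
            parts.append((name, value))
        else:
            raise ValueError(f"unsupported match type: {kind}")
    return tuple(parts)


john_only = {"headers": [{"name": "custom-request-header", "value": "John Doe"}]}
per_user = {"headers": [{"type": "Distinct", "name": "x-user-id"}]}

john_key = bucket_key(john_only, {"Custom-Request-Header": "John Doe"})
jane_key = bucket_key(john_only, {"custom-request-header": "Jane Doe"})
baz_key = bucket_key(per_user, {"x-user-id": "baz"})
bar_key = bucket_key(per_user, {"x-user-id": "bar"})
```

A request that yields no key is simply not counted by that rule, while baz and bar end up in separate buckets.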
Implementation Details
Global Rate limiting
- Global rate limiting in Envoy Proxy can be achieved using the following -
  - Actions can be configured per xDS Route.
  - If the match criteria defined within these actions is met for a specific HTTP Request, a set of key value pairs called descriptors defined within the above actions is sent to a remote rate limit service, whose configuration (such as the URL for the rate limit service) is defined using a rate limit filter.
  - Based on information received by the rate limit service and its programmed configuration, a decision is computed, whether to rate limit the HTTP Request or not, and is sent back to Envoy, which enforces this decision on the data plane.
- Envoy Gateway will leverage this Envoy Proxy feature by -
  - Translating the user facing RateLimitFilter API into Rate limit Actions as well as Rate limit service configuration to implement the desired API intent.
  - Envoy Gateway will use the existing reference implementation of the rate limit service.
  - The Infrastructure administrator will need to enable the rate limit service using new settings that will be defined in the EnvoyGateway config API.
  - The xDS IR will be enhanced to hold the user facing rate limit intent.
  - The xDS Translator will be enhanced to translate the rate limit field within the xDS IR into Rate limit Actions as well as instantiate the rate limit filter.
  - A new runner called rate-limit will be added that subscribes to the xDS IR messages and translates them into a new Rate Limit Infra IR, which contains the rate limit service configuration as well as other information needed to deploy the rate limit service.
  - The infrastructure service will be enhanced to subscribe to the Rate Limit Infra IR and deploy a provider specific rate limit service runnable entity.
  - A Status field within the RateLimitFilter API will be added to reflect whether the specific configuration was programmed correctly in these multiple locations or not.
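The translation step can be pictured with a rough sketch that turns one user-facing rule into per-route actions plus a matching descriptor entry for the rate limit service. The dictionaries are loosely modeled on Envoy's rate limit actions and on the reference rate limit service's descriptor configuration; the translate_rule function and the rule-N and rule-N-match-M key names are hypothetical and are not the names Envoy Gateway generates.

```python
def translate_rule(index, rule):
    """Return (route_actions, service_descriptor) for one rate limit rule."""
    limit = {
        "unit": rule["limit"]["unit"].lower(),
        "requests_per_unit": rule["limit"]["requests"],
    }
    selectors = rule.get("clientSelectors", [])
    if len(selectors) > 1:
        raise NotImplementedError("this sketch handles at most one selector")

    if not selectors:
        # No selector: every request on the route shares a single counter.
        key = f"rule-{index}"
        actions = [{"generic_key": {"descriptor_key": key, "descriptor_value": key}}]
        return actions, {"key": key, "value": key, "rate_limit": limit}

    actions, chain = [], []
    for i, cond in enumerate(selectors[0]["headers"]):
        key = f"rule-{index}-match-{i}"
        if cond.get("type", "Exact") == "Distinct":
            # Send the header value itself, so each value gets its own counter.
            actions.append({"request_headers": {
                "header_name": cond["name"], "descriptor_key": key}})
            chain.append({"key": key})
        else:
            # Send a fixed entry only when the header equals the expected value.
            actions.append({"header_value_match": {
                "descriptor_key": key, "descriptor_value": key,
                "headers": [{"name": cond["name"],
                             "string_match": {"exact": cond["value"]}}]}})
            chain.append({"key": key, "value": key})

    # Nest the entries so the limit applies only when all of them are present.
    chain[-1]["rate_limit"] = limit
    for parent, child in zip(chain, chain[1:]):
        parent["descriptors"] = [child]
    return actions, chain[0]


hour_100 = {"requests": 100, "unit": "Hour"}
safeguard_actions, safeguard_descriptor = translate_rule(0, {"limit": hour_100})
user_actions, user_descriptor = translate_rule(1, {
    "clientSelectors": [{"headers": [{"type": "Distinct", "name": "x-user-id"}]}],
    "limit": hour_100,
})
```

The actions would live on the xDS Route, while the descriptor entries would be collected into the configuration handed to the rate limit service.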
21 - Running Envoy Gateway locally
Overview
Today, Envoy Gateway runs only on Kubernetes. This is an ideal solution when the applications are running in Kubernetes. However, there might be cases where the applications run directly on a host, which would require Envoy Gateway to run locally.
Goals
- Define an API to allow Envoy Gateway to retrieve configuration while running locally.
- Define an API to allow Envoy Gateway to deploy the managed Envoy Proxy fleet on the host machine.
Non Goals
- Support multiple ways to retrieve configuration while running locally.
- Support multiple ways to deploy the Envoy Proxy fleet locally on the host.
API
- The provider field within the EnvoyGateway configuration only supports Kubernetes today, which provides two features - the ability to retrieve resources from the Kubernetes API Server as well as deploy the managed Envoy Proxy fleet on Kubernetes.
- This document proposes adding a new top level provider type called Custom with two fields called resource and infrastructure to allow the user to configure the sub providers for providing resource configuration and an infrastructure to deploy the Envoy Proxy data plane in.
- A File resource provider will be introduced to enable retrieving configuration locally by reading it from a file.
- A Host infrastructure provider will be introduced to allow Envoy Gateway to spawn an Envoy Proxy child process on the host.
Here is an example configuration
provider:
  type: Custom
  custom:
    resource:
      type: File
      file:
        paths:
        - "config.yaml"
    infrastructure:
      type: Host
      host: {}
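As a rough illustration of how such a configuration might be checked once it has been parsed into a dictionary, the sketch below accepts only the combinations this proposal describes. The validate_provider function and its error messages are hypothetical, and the rule that at least one path is required is an assumption.

```python
def validate_provider(provider):
    """Return the config file paths, or None for the Kubernetes provider."""
    kind = provider.get("type")
    if kind == "Kubernetes":
        return None
    if kind != "Custom":
        raise ValueError(f"unsupported provider type: {kind!r}")

    custom = provider.get("custom") or {}
    resource = custom.get("resource") or {}
    if resource.get("type") != "File":
        raise ValueError("custom.resource.type must be File")
    paths = (resource.get("file") or {}).get("paths") or []
    if not paths:
        raise ValueError("custom.resource.file.paths must list at least one file")

    infrastructure = custom.get("infrastructure") or {}
    if infrastructure.get("type") != "Host":
        raise ValueError("custom.infrastructure.type must be Host")
    return paths


# The example configuration above, already parsed into a dictionary.
example = {
    "type": "Custom",
    "custom": {
        "resource": {"type": "File", "file": {"paths": ["config.yaml"]}},
        "infrastructure": {"type": "Host", "host": {}},
    },
}
config_paths = validate_provider(example)
```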
22 - SecurityPolicy
Overview
This design document introduces the SecurityPolicy API, allowing system administrators to configure authentication and authorization policies for the traffic entering the gateway.
Goals
- Add an API definition to hold settings for configuring authentication and authorization rules on the traffic entering the gateway.
Non Goals
- Define the API configuration fields in this API.
Implementation
SecurityPolicy is a Policy Attachment type API that can be used to extend Gateway API to define authentication and authorization rules.
Example
Here is an example highlighting how a user can configure this API.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: backend
  namespace: default
spec:
  parentRefs:
  - name: eg
  hostnames:
  - "www.example.com"
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 3000
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: jwt-authn-policy
  namespace: default
spec:
  jwt:
    providers:
    - name: example
      remoteJWKS:
        uri: https://raw.githubusercontent.com/envoyproxy/gateway/main/examples/kubernetes/jwt/jwks.json
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: eg
    namespace: default
Features / API Fields
Here is a list of features that can be included in this API
- JWT based authentication
- OIDC Authentication
- External Authorization
- Basic Auth
- API Key Auth
- CORS
Design Decisions
- This API will only support a single targetRef and can bind to a Gateway resource or an HTTPRoute or GRPCRoute.
- This API resource MUST be part of the same namespace as the targetRef resource.
- There can only be ONE policy resource attached to a specific targetRef e.g. a Listener (section) within a Gateway.
- If the policy targets a resource but cannot attach to it, this information should be reflected in the Policy Status field using the Conflicted=True condition.
- If multiple policies target the same resource, the oldest resource (based on creation timestamp) will attach to the Gateway Listeners, the others will not.
- If Policy A has a targetRef that includes a sectionName i.e. it targets a specific Listener within a Gateway and Policy B has a targetRef that targets the same entire Gateway then
  - Policy A will be applied/attached to the specific Listener defined in the targetRef.SectionName
  - Policy B will be applied to the remaining Listeners within the Gateway. Policy B will have an additional status condition Overridden=True.
- A Policy targeting the most specific scope wins over a policy targeting a lesser specific scope. i.e. A Policy targeting a xRoute (HTTPRoute or GRPCRoute) overrides a Policy targeting a Listener that is this route’s parentRef which in turn overrides a Policy targeting the Gateway the listener/section is a part of.
Alternatives
- The project can indefinitely wait for these configuration parameters to be part of the Gateway API.
23 - TCP and UDP Proxy Design
Even though most of the use cases for Envoy Gateway are at Layer-7, Envoy Gateway can also work at Layer-4 to proxy TCP and UDP traffic. This document will explore the options we have when operating Envoy Gateway at Layer-4 and explain the design decision.
Envoy can work as a non-transparent proxy or a transparent proxy for both TCP and UDP, so ideally, Envoy Gateway should also be able to work in these two modes:
Non-transparent Proxy Mode
For TCP, Envoy terminates the downstream connection, connects the upstream with its own IP address, and proxies the TCP traffic from the downstream to the upstream.
For UDP, Envoy receives UDP datagrams from the downstream, and uses its own IP address as the sender IP address when proxying the UDP datagrams to the upstream.
In this mode, the upstream will see Envoy’s IP address and port.
Transparent Proxy Mode
For TCP, Envoy terminates the downstream connection, connects the upstream with the downstream IP address, and proxies the TCP traffic from the downstream to the upstream.
For UDP, Envoy receives UDP datagrams from the downstream, and uses the downstream IP address as the sender IP address when proxying the UDP datagrams to the upstream.
In this mode, the upstream will see the original downstream IP address and Envoy’s MAC address.
Note: Even in transparent mode, the upstream can’t see the port number of the downstream because Envoy doesn’t forward the port number.
The Implications of Transparent Proxy Mode
Escalated Privilege
Envoy needs to bind to the downstream IP when connecting to the upstream, which means Envoy requires escalated CAP_NET_ADMIN privileges. This is often considered a bad security practice and is not allowed in some sensitive deployments.
Routing
The upstream can see the original source IP, but the original port number won’t be passed. The return traffic from the upstream must therefore be routed back to Envoy, because only Envoy knows how to send it back to the right port number of the downstream. This requires routing to be set up on the upstream side. In a Kubernetes cluster, Envoy Gateway will have to carefully cooperate with CNI plugins to get the routing right.
The Design Decision (For Now)
The implementation will only support proxying in non-transparent mode i.e. the backend will see the source IP and port of the deployed Envoy instance instead of the client.
24 - Wasm OCI Image Support
Motivation
Envoy Gateway (EG) should support Wasm OCI images as a remote Wasm code source. This feature will allow users to deploy Wasm modules from OCI registries, such as Docker Hub, Google Container Registry, and Amazon Elastic Container Registry, to Envoy proxies managed by EG. Deploying Wasm modules from OCI registries has several benefits:
- Versioning: Users can use the tag feature of the OCI image to manage the version of the Wasm module.
- Security: Users can use private registries to store the Wasm module.
- Distribution: Users can use the existing distribution mechanism of the OCI registry to distribute the Wasm module.
Goals
- Define the system components needed to support Wasm OCI images as remote Wasm code sources.
Architecture
Control Plane Wasm File Cache
Envoy lacks native OCI image support. Therefore, EG needs to download Wasm modules from their original OCI registries, cache them locally in the file system, and serve them to Envoy over HTTP.
HTTP Code Source
For the HTTP code source, we have two options: serve Wasm modules directly from their original HTTP URLs, or cache them in EG (as with OCI images). Caching both the HTTP Wasm modules and OCI images inside EG can make the UI consistent; for example, the sha256sum can be calculated on the EG side and made optional in the API. This will also make the Envoy proxy side more efficient, as it won’t have to download the Wasm module from the original URL every time, which can be slow.
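A minimal sketch of computing the checksum on the EG side, so that a user-supplied value becomes optional, might look like the following. The sha256_of_file and checksum_ok helpers are hypothetical names, and the temporary file merely stands in for a cached Wasm module.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a cached file chunk by chunk, without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(path, expected=None):
    """Enforce a user-supplied checksum if present; otherwise accept."""
    actual = sha256_of_file(path)
    return expected is None or expected.lower() == actual


# Stand-in for a downloaded module: the Wasm magic bytes plus version.
module_bytes = b"\0asm" + b"\x01\x00\x00\x00"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(module_bytes)
computed = sha256_of_file(tmp.name)
accepted_without_checksum = checksum_ok(tmp.name)
accepted_with_wrong_checksum = checksum_ok(tmp.name, "0" * 64)
os.remove(tmp.name)
```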
Resource Consumption
Memory: Since we cache Wasm modules in the file system, we can optimize the memory usage of the cache and the HTTP server to avoid introducing too much memory consumption by this feature. For example, when receiving a Wasm pull request from Envoy, we can open the Wasm file, write the file content directly to the response, and then close the file. There won’t be significant memory consumption involved.
Disk: Though it’s possible to mount a volume to the container for the Wasm file cache, the current implementation just stores the Wasm files in the EG container’s file system. The disk space consumed by the cache is limited to 1GB by default; it can be made configurable in the future.
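The memory point can be illustrated with a sketch that copies a cached file to a response stream in fixed-size chunks, so that memory use stays bounded by the chunk size rather than the module size. The stream_file function is hypothetical, and an in-memory buffer stands in for the HTTP response body.

```python
import io
import os
import shutil
import tempfile


def stream_file(path, response, chunk_size=64 * 1024):
    """Copy a cached file to a writable stream one chunk at a time."""
    with open(path, "rb") as src:
        shutil.copyfileobj(src, response, chunk_size)


# Stand-in for a cached Wasm module that is larger than one chunk.
payload = os.urandom(200 * 1024)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

response = io.BytesIO()  # stands in for the HTTP response body
stream_file(tmp.name, response)
os.remove(tmp.name)
```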
Caching Mechanism
Cached files will be evicted based on an LRU (least recently used) algorithm. If an image’s tag is latest, the cached copy will be updated periodically. The cache clean and update periods will be configurable.
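A size-bounded LRU policy of this kind can be sketched with an ordered dictionary. The LRUFileCache class below is a hypothetical illustration that tracks only file sizes; a real cache would also delete the evicted files from disk and handle the periodic refresh of latest tags.

```python
from collections import OrderedDict


class LRUFileCache:
    """Tracks cached files by size and evicts the least recently used."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = OrderedDict()  # key -> size in bytes, oldest first
        self.total = 0

    def get(self, key):
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)  # mark as most recently used
        return True

    def put(self, key, size):
        if key in self.entries:
            self.total -= self.entries.pop(key)
        self.entries[key] = size
        self.total += size
        evicted = []
        # Evict from the oldest end until the cache fits again, but keep
        # the entry that was just added.
        while self.total > self.max_bytes and len(self.entries) > 1:
            old_key, old_size = self.entries.popitem(last=False)
            self.total -= old_size
            evicted.append(old_key)
        return evicted


cache = LRUFileCache(max_bytes=100)
cache.put("a.wasm", 40)
cache.put("b.wasm", 40)
cache.get("a.wasm")                # a.wasm is now more recent than b.wasm
evicted = cache.put("c.wasm", 40)  # 120 bytes exceeds the 100 byte limit
```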
Restrict Access to Private Images
- Client Authn with mTLS: To prevent unauthorized proxies from accessing the Wasm modules, the communication between the Envoy and EG will be secured using mTLS.
- User Authn with Registry Credentials: To prevent unauthorized users from accessing the Wasm modules, the user who creates the EEP must have the appropriate permissions to access the OCI registry. For example, if two users create EEPs in different namespaces (ns1, ns2) accessing the same OCI image, each must also create a unique secret with registry credentials (secret1 for user1 in ns1, secret2 for user2 in ns2) and provide it in the EEP configuration. EG will validate the provided secret against the OCI registry before serving the Wasm module to the target HTTPRoute/Gateway of that EEP.
- Unguessable Download URLs: It’s possible that users who have no permission to access a private OCI image could create an EnvoyPatchPolicy (EPP) to bypass the EG permission check. For example, a user could create an EPP, inject a Wasm filter, and put the download URL of a private OCI image in the Wasm filter configuration. To prevent this, we need to make the download URL unguessable. The download URL will be generated by EG and will contain a random string that is infeasible to guess. If a user can get the config dump of the Envoy Proxy, they can still get the download URL. However, they won’t be able to do so without the permission to access the config dump of Envoy Proxy, which is a more restricted permission (usually Admin role).
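One way to picture the unguessable URL is a registry that maps a cryptographically random path to a cached module. This is a sketch under stated assumptions: the DownloadRegistry class, the 32 byte token length, and the .wasm suffix are hypothetical choices rather than details taken from the design.

```python
import secrets


class DownloadRegistry:
    """Maps random, hard-to-guess URL paths to cached Wasm files."""

    def __init__(self):
        self._paths = {}  # random URL path -> cached file location

    def register(self, cached_file):
        # 32 random bytes from the OS CSPRNG, URL-safe base64 encoded.
        path = "/" + secrets.token_urlsafe(32) + ".wasm"
        self._paths[path] = cached_file
        return path

    def resolve(self, path):
        # Unknown paths resolve to None, which a server would turn into 404.
        return self._paths.get(path)


registry = DownloadRegistry()
first = registry.register("/var/cache/wasm/module-1")
second = registry.register("/var/cache/wasm/module-2")
```

Only a client that already knows the generated path, for example from the xDS configuration that EG sends to the proxy, can resolve it to a module.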
Alternatives Considered
Inline Bytes
EG downloads the Wasm modules, caches them in memory, and pushes the Wasm code through xDS as inline bytes. This could inflate the xDS, potentially causing memory issues for EG and Envoy.
Data Plane Agent
Mount Wasm modules in the local file system of the Envoy container. We’ll need an agent deployed in the same pod as the Envoy for this. It would be too expensive to implement this as we’ll need to intercept the xDS at the agent.
Standalone Wasm HTTP Server
Deploying the Wasm HTTP server as a standalone service. While this has no obvious benefits, it increases operational costs.
Wait for Envoy OCI image support
We could wait indefinitely for Envoy to support OCI images as a remote Wasm code source.