Use a custom proxy integration for service mesh

Note

The Connect Native Golang SDK and the v1/agent/connect/authorize, v1/agent/connect/ca/leaf, and v1/agent/connect/ca/roots APIs are deprecated and will be removed in a future release. Connect Native still operates as designed, but we do not recommend using it because it will be removed once the long-term replacement for native application integration (such as a proxyless gRPC service mesh integration) is delivered. Refer to GH-10339 for additional information and to track progress toward one potential replacement.

The Native App Integration does not support many of Consul's service mesh features and is not under active development. The Envoy proxy should be used for most production environments.

This topic describes the process and API endpoints you can use to extend proxies for integration with Consul.

Overview

You can extend any proxy to support Consul service mesh. Consul ships with a built-in proxy suitable for an out-of-the-box development experience, but you may require a more robust proxy solution for production environments.

The proxy you integrate must be able to accept inbound connections and/or establish outbound connections identified as a particular service. In some cases, either ability may be acceptable, but both are generally required to support full sidecar functionality.

Sidecar proxies may support L4 or L7 network functionality. L4 integration is simpler and adequate for securing all traffic. L4 treats all traffic as TCP, however, so advanced routing or metrics features are not supported.

Full L7 support is built on top of L4 support. An L7 proxy integration supports most or all of the L7 traffic routing features in Consul service mesh by dynamically configuring routing, retries, and other L7 features. The built-in proxy only supports L4, while Envoy supports the full L7 feature set.

Areas where the integration approach differs between L4 and L7 are identified in this topic.

The noun connect is used throughout this documentation to refer to the connect subsystem that provides Consul's service mesh capabilities.

Accepting Inbound Connections

To accept inbound connections, the proxy must listen for TLS connections on a port.

Obtaining and validating client certificates

Call the /v1/agent/connect/ca/leaf/ API endpoint to obtain the client certificate, e.g.:

$ curl http://<host-ip>:8500/v1/agent/connect/ca/leaf/<service-name>

The client certificate from the inbound connection must be validated against the service mesh CA root certificates. Call the /v1/agent/connect/ca/roots endpoint to obtain the root certificates from the service mesh CA, e.g.:

$ curl http://<host-ip>:8500/v1/agent/connect/ca/roots
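The two calls above can be combined into a TLS listener configuration. The following is a minimal sketch using only the Go standard library, not the Consul SDK. It assumes an agent at 127.0.0.1:8500 and a service named web, and it decodes only the CertPEM, PrivateKeyPEM, and RootCert response fields. The helper names getJSON and inboundTLSConfig are hypothetical.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// leafCert and caRoots mirror only the response fields this sketch needs.
type leafCert struct {
	CertPEM       string
	PrivateKeyPEM string
}

type caRoots struct {
	Roots []struct{ RootCert string }
}

// getJSON fetches url from the agent and decodes the JSON body into out.
func getJSON(url string, out interface{}) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// inboundTLSConfig presents the service's leaf certificate and requires
// callers to present a certificate that chains to a mesh CA root.
func inboundTLSConfig(leaf leafCert, roots caRoots) (*tls.Config, error) {
	cert, err := tls.X509KeyPair([]byte(leaf.CertPEM), []byte(leaf.PrivateKeyPEM))
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	for _, r := range roots.Roots {
		if !pool.AppendCertsFromPEM([]byte(r.RootCert)) {
			return nil, errors.New("could not parse a root certificate")
		}
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert,
	}, nil
}

func main() {
	agent := "http://127.0.0.1:8500" // assumed local agent address
	var leaf leafCert
	var roots caRoots
	if err := getJSON(agent+"/v1/agent/connect/ca/leaf/web", &leaf); err != nil {
		fmt.Println("fetching leaf failed:", err)
		return
	}
	if err := getJSON(agent+"/v1/agent/connect/ca/roots", &roots); err != nil {
		fmt.Println("fetching roots failed:", err)
		return
	}
	cfg, err := inboundTLSConfig(leaf, roots)
	if err != nil {
		fmt.Println("building TLS config failed:", err)
		return
	}
	fmt.Println("trusted roots loaded:", len(roots.Roots), "client auth:", cfg.ClientAuth)
}
```

The resulting configuration can be passed to tls.Listen. A real integration would also rebuild it whenever the leaf or roots change.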

Authorizing the connection

After validating the client certificate from the caller, the proxy can authorize the entire connection (L4) or each request (L7), depending on the protocol of the proxied service. Authentication is based on "service identity" (TLS) and is implemented at the transport layer.

Note: Some features, such as (local) rate limiting or max connections, are expected to be proxy-level configurations enforced separately when authorization calls are made. Proxies can enforce the configurations based on information about request rates and other states that should already be available.

The proxy can authorize the connection by either calling the /v1/agent/connect/authorize API endpoint or by querying the intention match API endpoint.

The /v1/agent/connect/authorize endpoint should be called in the connection path for each received connection. If the local Consul agent is down or unresponsive, the success rate of new connections will be compromised. The agent uses locally cached data to authorize the connection and typically responds in microseconds. As a result, the authorization call typically adds only microseconds to the TLS handshake.

Note: This endpoint is only suitable for L4 (e.g., TCP) integration. The endpoint always treats intentions with Permissions defined (i.e., L7 criteria) as deny intentions during evaluation.
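As an illustration of the per-connection call, the sketch below posts the caller's identity to the authorize endpoint using the Go standard library. The agent address, the SPIFFE URI, and the serial number are placeholders. The allowed helper, which rejects the connection whenever the agent cannot be reached, is a hypothetical design choice and not a documented requirement.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// authorizeRequest is the JSON body sent to the authorize endpoint.
type authorizeRequest struct {
	Target           string // name of the destination service
	ClientCertURI    string // URI SAN taken from the caller's certificate
	ClientCertSerial string // serial number of the caller's certificate
}

// authorizeResponse is the agent's decision.
type authorizeResponse struct {
	Authorized bool
	Reason     string
}

// authorize asks the agent at the given address whether the caller may connect.
func authorize(agent string, req authorizeRequest) (authorizeResponse, error) {
	var out authorizeResponse
	body, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := http.Post(agent+"/v1/agent/connect/authorize", "application/json", bytes.NewReader(body))
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("authorize: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// allowed fails closed: any error talking to the agent rejects the connection.
func allowed(resp authorizeResponse, err error) bool {
	return err == nil && resp.Authorized
}

func main() {
	req := authorizeRequest{
		Target:           "db",
		ClientCertURI:    "spiffe://example.consul/ns/default/dc/dc1/svc/web", // placeholder
		ClientCertSerial: "04:00:00:00:00:01",                                 // placeholder
	}
	resp, err := authorize("http://127.0.0.1:8500", req) // assumed agent address
	fmt.Println("allowed:", allowed(resp, err), "reason:", resp.Reason)
}
```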

The proxy can query the intention match API endpoint on startup to retrieve a list of intentions that match the proxy destination. The matches should be stored in the native filter configuration of the proxy, such as RBAC for Envoy.

For performance and reliability reasons, querying the intention match API endpoint is the recommended method for implementing intention enforcement. The cached intentions should be consulted for each incoming connection (L4) or request (L7) to determine if the connection or request should be accepted or rejected.
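A proxy without a native filter language could consult its cached intentions along the lines of the sketch below. It models only the SourceName, Action, Precedence, and Permissions fields of a matched intention and ignores namespaces and partitions. The defaultAllow parameter stands in for whatever default policy the cluster uses and is an assumption of this sketch. Treating intentions that carry Permissions as deny for L4 mirrors the behavior described above for the authorize endpoint.

```go
package main

import (
	"fmt"
	"sort"
)

// intention holds a subset of the fields of a matched intention.
type intention struct {
	SourceName  string
	Action      string        // "allow" or "deny"; empty when Permissions is set
	Precedence  int           // higher values are evaluated first
	Permissions []interface{} // L7 criteria, opaque to this L4 sketch
}

// authorizeL4 returns the decision of the highest-precedence intention whose
// source matches, or defaultAllow when no intention matches.
func authorizeL4(matches []intention, source string, defaultAllow bool) bool {
	// Copy before sorting so the cached slice is left untouched.
	ordered := append([]intention(nil), matches...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Precedence > ordered[j].Precedence
	})
	for _, in := range ordered {
		if in.SourceName != source && in.SourceName != "*" {
			continue
		}
		if len(in.Permissions) > 0 {
			return false // L7 criteria cannot be evaluated per connection
		}
		return in.Action == "allow"
	}
	return defaultAllow
}

func main() {
	cached := []intention{
		{SourceName: "*", Action: "deny", Precedence: 8},
		{SourceName: "web", Action: "allow", Precedence: 9},
	}
	fmt.Println("web:", authorizeL4(cached, "web", false))
	fmt.Println("batch:", authorizeL4(cached, "batch", false))
}
```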

Persistent TCP connections and intentions

For a proxied service configured with the TCP protocol, potentially long-lived TCP connections are only authorized when they are initially established. Because many services, such as databases, typically use persistent connection pools, changing intentions to deny access does not terminate existing connections, so traffic that violates the updated intention can continue to flow. In these cases, it may appear as if the intention is not being enforced.

Implement one of the following strategies to close connections:

Configure connections to terminate after a maximum lifetime, e.g., several hours. This balances the overhead of establishing new connections with determining how long existing connections remain open after an intention changes.
Periodically re-authorize every open connection. The authorization call is inexpensive and should be a local, in-memory operation on the Consul agent. Periodically authorizing thousands of open connections (e.g., once every minute) is likely to add negligible overhead, but doing so enforces a tighter upper bound on how long it takes to enforce intention changes without affecting the protocol efficiency of persistent connections.
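The second strategy can be sketched as a periodic sweep over a table of open connections. All names here (tracked, sweep, reauthorizeLoop) are hypothetical, and the authorized callback stands in for whichever authorization method the proxy uses. The sketch assumes a single goroutine owns the connection table.

```go
package main

import (
	"fmt"
	"time"
)

// tracked is one open inbound connection and the identity that opened it.
type tracked struct {
	SourceURI string
	Close     func() error
}

// sweep re-authorizes every open connection, closes the ones that are no
// longer permitted, and returns how many it closed.
func sweep(open map[int]tracked, authorized func(sourceURI string) bool) int {
	closed := 0
	for id, c := range open {
		if authorized(c.SourceURI) {
			continue
		}
		_ = c.Close()
		delete(open, id) // deleting during range is safe in Go
		closed++
	}
	return closed
}

// reauthorizeLoop runs sweep on every tick until stop is closed.
func reauthorizeLoop(interval time.Duration, stop <-chan struct{}, open map[int]tracked, authorized func(string) bool) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			sweep(open, authorized)
		case <-stop:
			return
		}
	}
}

func main() {
	noop := func() error { return nil }
	open := map[int]tracked{
		1: {SourceURI: "web", Close: noop},
		2: {SourceURI: "batch", Close: noop},
	}
	// Pretend an intention change now denies "batch".
	permitted := map[string]bool{"web": true}
	n := sweep(open, func(uri string) bool { return permitted[uri] })
	fmt.Println("closed:", n, "still open:", len(open))
}
```

A proxy would typically start reauthorizeLoop once, with an interval such as time.Minute, rather than calling sweep directly.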

Certificate serial in authorization

Intentions currently use TLS URI Subject Alternative Name (SAN) for enforcement. The AuthZ API in the Go SDK contains a field for passing the serial number (consul/connect/tls.go). Proxies may provide this value during authorization.

Updating data

The API endpoints described in this section operate on agent-local data that is updated in the background. The leaf, roots, and intentions should be updated in the background by the proxy.

The leaf cert, root cert, and intentions endpoints support blocking queries, which should be used to get near-immediate updates for root key rotations, new leaf certs before expiry, and intention changes.
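A blocking query passes the index returned by the previous response back to the agent, which then holds the request until the data changes or the wait time elapses. The sketch below issues one such query against the roots endpoint. The agent address, the 5m wait, and the helper names are assumptions. Resetting the index to zero when it goes backwards follows the general guidance for Consul blocking queries.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strconv"
	"time"
)

// nextIndex returns the index to send on the next request. It resets to 0
// when the header is unparsable or the index went backwards.
func nextIndex(prev uint64, header string) uint64 {
	idx, err := strconv.ParseUint(header, 10, 64)
	if err != nil || idx < prev {
		return 0
	}
	return idx
}

// poll issues one blocking query. With index 0 the agent answers immediately;
// otherwise it holds the request until the data changes or the wait elapses.
func poll(client *http.Client, url string, index uint64) ([]byte, uint64, error) {
	resp, err := client.Get(fmt.Sprintf("%s?index=%d&wait=5m", url, index))
	if err != nil {
		return nil, index, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, index, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, index, err
	}
	return body, nextIndex(index, resp.Header.Get("X-Consul-Index")), nil
}

func main() {
	// The client timeout must be longer than the requested wait.
	client := &http.Client{Timeout: 6 * time.Minute}
	url := "http://127.0.0.1:8500/v1/agent/connect/ca/roots" // assumed agent address
	body, index, err := poll(client, url, 0)
	if err != nil {
		fmt.Println("poll failed:", err)
		return
	}
	fmt.Printf("received %d bytes; send index=%d on the next request\n", len(body), index)
}
```

A background goroutine would call poll in a loop, feeding each returned index into the next call and rebuilding the proxy's TLS or filter configuration whenever the body changes.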

SPIFFE certificates

Although Consul follows the SPIFFE spec for certificates, some CA providers do not allow strict adherence. For example, CA certificates may not have the correct trust-domain SPIFFE URI SAN for the cluster. If the proxy performs SPIFFE validation, it should be possible to opt out of it. Otherwise, certain CA providers supported by Consul will not be compatible with that proxy. Neither Envoy nor the built-in proxy currently validates the SPIFFE URI of the chain beyond the leaf certificate.

Establishing Outbound Connections

For outbound connections, the proxy should communicate with a mesh-capable endpoint for a service and provide a client certificate from the /v1/agent/connect/ca/leaf/ API endpoint. The proxy must use the root certificate obtained from the /v1/agent/connect/ca/roots endpoint to verify the certificate served from the destination endpoint.
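One way to express this in Go is sketched below. Because mesh certificates identify services by URI SAN rather than by hostname, the sketch disables the standard hostname check and verifies the chain in a callback instead. Checking that the destination's leaf certificate carries an expected SPIFFE URI is an extra step assumed by this sketch and not stated in this topic. The function names and the example URI are hypothetical.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// verifyPeer builds a callback that checks the served chain against the mesh
// roots and requires the leaf to carry the expected URI SAN.
func verifyPeer(roots *x509.CertPool, expectedURI string) func([][]byte, [][]*x509.Certificate) error {
	return func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
		if len(rawCerts) == 0 {
			return errors.New("no peer certificate presented")
		}
		leaf, err := x509.ParseCertificate(rawCerts[0])
		if err != nil {
			return err
		}
		intermediates := x509.NewCertPool()
		for _, raw := range rawCerts[1:] {
			c, err := x509.ParseCertificate(raw)
			if err != nil {
				return err
			}
			intermediates.AddCert(c)
		}
		opts := x509.VerifyOptions{Roots: roots, Intermediates: intermediates}
		if _, err := leaf.Verify(opts); err != nil {
			return err
		}
		for _, u := range leaf.URIs {
			if u.String() == expectedURI {
				return nil
			}
		}
		return fmt.Errorf("peer certificate has no URI SAN %q", expectedURI)
	}
}

// outboundTLSConfig presents the local leaf certificate as the client
// certificate and verifies the destination with verifyPeer.
func outboundTLSConfig(leaf tls.Certificate, roots *x509.CertPool, expectedURI string) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{leaf},
		// Hostname verification is replaced by the callback below.
		InsecureSkipVerify:    true,
		VerifyPeerCertificate: verifyPeer(roots, expectedURI),
	}
}

func main() {
	uri := "spiffe://example.consul/ns/default/dc/dc1/svc/db" // placeholder
	cfg := outboundTLSConfig(tls.Certificate{}, x509.NewCertPool(), uri)
	err := cfg.VerifyPeerCertificate(nil, nil)
	fmt.Println("verification without a peer certificate:", err)
}
```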

Configuration Discovery

The /v1/agent/service/:service_id API endpoint enables any proxy to discover proxy configurations registered with a local service. This endpoint supports hash-based blocking, which enables long-polling for changes to the registration/configuration. Any changes to the registration/config will result in the new config being returned immediately.
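Hash-based blocking works like index-based blocking, except that the client echoes back a content hash instead of an index. The sketch below assumes the hash query parameter and the X-Consul-ContentHash response header. The service ID web-sidecar-proxy, the agent address, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// serviceWatchURL builds the request URL. Without a previous hash the agent
// responds immediately; with one it blocks until the registration changes.
func serviceWatchURL(agent, serviceID, lastHash string) string {
	u := agent + "/v1/agent/service/" + url.PathEscape(serviceID)
	if lastHash == "" {
		return u
	}
	q := url.Values{}
	q.Set("hash", lastHash)
	q.Set("wait", "5m")
	return u + "?" + q.Encode()
}

// watchService performs one request and returns the registration body along
// with the hash to send on the next request.
func watchService(agent, serviceID, lastHash string) ([]byte, string, error) {
	resp, err := http.Get(serviceWatchURL(agent, serviceID, lastHash))
	if err != nil {
		return nil, lastHash, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, lastHash, fmt.Errorf("service %s: %s", serviceID, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, lastHash, err
	}
	return body, resp.Header.Get("X-Consul-ContentHash"), nil
}

func main() {
	body, hash, err := watchService("http://127.0.0.1:8500", "web-sidecar-proxy", "")
	if err != nil {
		fmt.Println("watch failed:", err)
		return
	}
	fmt.Printf("received %d bytes; next request uses hash=%s\n", len(body), hash)
}
```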

Refer to the built-in proxy for an example implementation. Using the Go SDK, the proxy calls the HTTP "pull" API via the watch package: consul/connect/proxy/config.go.

The discovery chain for each upstream service should be fetched from the /v1/discovery-chain/:service_id API endpoint. This will return a compiled graph of configurations a sidecar needs for a particular upstream service.

If you are only implementing L4 support in your proxy, set the OverrideProtocol value to tcp when fetching the discovery chain so that L7 features, such as HTTP routing rules, are not returned.
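For an L4-only proxy, the override can be sent in the request body when fetching the chain. This sketch assumes the override is accepted as a JSON body on a POST request and decodes only the ServiceName, Protocol, and Targets fields of the response. The agent address, the upstream name db, and the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// chainResponse holds the subset of the compiled chain used by this sketch.
type chainResponse struct {
	Chain struct {
		ServiceName string
		Protocol    string
		Targets     map[string]struct {
			Service    string
			Datacenter string
		}
	}
}

// decodeChain parses a discovery chain response body.
func decodeChain(body []byte) (chainResponse, error) {
	var out chainResponse
	err := json.Unmarshal(body, &out)
	return out, err
}

// fetchL4Chain requests the chain for an upstream with the protocol forced
// to tcp, so that L7 routing rules are left out of the result.
func fetchL4Chain(agent, service string) (chainResponse, error) {
	reqBody, err := json.Marshal(map[string]string{"OverrideProtocol": "tcp"})
	if err != nil {
		return chainResponse{}, err
	}
	resp, err := http.Post(agent+"/v1/discovery-chain/"+service, "application/json", bytes.NewReader(reqBody))
	if err != nil {
		return chainResponse{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return chainResponse{}, fmt.Errorf("discovery chain for %s: %s", service, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return chainResponse{}, err
	}
	return decodeChain(body)
}

func main() {
	chain, err := fetchL4Chain("http://127.0.0.1:8500", "db") // assumed address and upstream
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("protocol:", chain.Chain.Protocol, "targets:", len(chain.Chain.Targets))
}
```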

For each target in the resulting discovery chain, a list of healthy, mesh-capable endpoints may be fetched from the /v1/health/connect/:service_id API endpoint as described in the Service Discovery section.

The remaining nodes in the chain include configurations that should be translated into the nearest equivalent for features, such as HTTP routing, connection timeouts, connection pool settings, rate limits, etc. See the full discovery chain documentation and relevant config entry documentation for details about supported configuration parameters.

Service Discovery

Proxies can use Consul's service discovery API to return all available, mesh-capable endpoints for a given service. This endpoint supports a cached query parameter, which uses agent caching to improve performance. The API package provides a UseCache query option to leverage caching. In addition to performance improvements, using the cache makes the mesh more resilient to Consul server outages. This is because the mesh "fails static", continuing to use the last known set of service instances rather than returning errors on new connections.

Proxies can decide whether to perform just-in-time queries to the API when a new connection needs to be routed, or to use blocking queries to load the current set of endpoints for a service and keep that list updated. The SDK and built-in proxy currently use just-in-time resolution. However, many existing proxies are likely to find it easier to integrate by pulling the set of endpoints and maintaining it in local memory using blocking queries.
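A just-in-time lookup could resemble the sketch below, which queries the health endpoint with the passing and cached parameters and turns each entry into a host:port pair. Falling back to the node address when the service address is empty reflects how Consul registrations are commonly interpreted. The agent address, the service name db, and the helper names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
	"strconv"
)

// healthEntry holds the subset of a health response entry used here.
type healthEntry struct {
	Node    struct{ Address string }
	Service struct {
		Address string
		Port    int
	}
}

// decodeEndpoints converts a health response body into host:port strings.
func decodeEndpoints(body []byte) ([]string, error) {
	var entries []healthEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(entries))
	for _, e := range entries {
		host := e.Service.Address
		if host == "" {
			host = e.Node.Address // service registered without its own address
		}
		out = append(out, net.JoinHostPort(host, strconv.Itoa(e.Service.Port)))
	}
	return out, nil
}

// fetchEndpoints returns the healthy, mesh-capable endpoints for a service,
// allowing the agent to answer from its cache.
func fetchEndpoints(agent, service string) ([]string, error) {
	resp, err := http.Get(agent + "/v1/health/connect/" + service + "?passing&cached")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("health for %s: %s", service, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeEndpoints(body)
}

func main() {
	eps, err := fetchEndpoints("http://127.0.0.1:8500", "db") // assumed address and service
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("endpoints:", eps)
}
```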

Upstreams may be defined with the Prepared Query target type. These upstreams should use Consul's prepared query API to determine a list of upstream endpoints for the service. Note that the PreparedQuery API does not support blocking, so proxies choosing to populate endpoints in memory will need to poll the endpoint at a suitable and, ideally, configurable frequency.

Service-resolver configuration entries are the long-term replacement: they will completely replace prepared queries in future versions of Consul. In some instances, however, prepared queries are still used.

Sidecar Instantiation

Consul does not start or manage sidecar proxy processes. Proxies running on a physical host or VM are designed to be started and run by process supervisor systems, such as init, systemd, supervisord, etc. If deployed within a cluster scheduler (Kubernetes, Nomad), proxies should run as a sidecar container in the same namespace.

Proxies use the CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN environment variables to contact Consul and fetch certificates. The CONSUL_HTTP_TOKEN environment variable must contain a Consul ACL token that has the necessary permissions to read configurations for that service. If you use the Go api package, the environment variables are read and the client is configured for you automatically.
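A proxy that does not use the Go api package can read the same variables itself. The sketch below builds an agent request from them and sends the token in the X-Consul-Token header. The fallback to 127.0.0.1:8500, the plain-HTTP scheme default, and the helper name agentRequest are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// agentRequest builds a GET request for the given API path using the
// CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN values returned by getenv.
func agentRequest(getenv func(string) string, path string) (*http.Request, error) {
	addr := getenv("CONSUL_HTTP_ADDR")
	if addr == "" {
		addr = "127.0.0.1:8500" // assumed default
	}
	if !strings.Contains(addr, "://") {
		addr = "http://" + addr
	}
	req, err := http.NewRequest(http.MethodGet, addr+path, nil)
	if err != nil {
		return nil, err
	}
	if token := getenv("CONSUL_HTTP_TOKEN"); token != "" {
		req.Header.Set("X-Consul-Token", token)
	}
	return req, nil
}

func main() {
	req, err := agentRequest(os.Getenv, "/v1/agent/connect/ca/roots")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	hasToken := req.Header.Get("X-Consul-Token") != ""
	fmt.Println("request URL:", req.URL.String(), "token set:", hasToken)
}
```

Passing getenv as a parameter keeps the function easy to exercise with fixed values instead of the process environment.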

Alternatively, you may also use the flags -token or -token-file to provide the Consul ACL token.

Providing a Consul ACL Token

$ consul connect envoy -sidecar-for "web" -token-file=/etc/consul.d/consul.token

If TLS is enabled on Consul, you will also need to set the CONSUL_CACERT, CONSUL_CLIENT_CERT, and CONSUL_CLIENT_KEY environment variables prior to starting the proxy.

These values can also be provided as CLI flags. Refer to the consul connect proxy documentation for details.

The proxy service ID comes from the user. See consul connect envoy for an example. You can use the -proxy-id flag to specify the ID of the proxy service you have already registered with the local agent.

Alternatively, you can start the service using the -sidecar-for=<service> option. This option queries Consul for a proxy that is registered as a sidecar for the specified <service>. If exactly one service associated with the proxy is returned, the ID will be used to start the proxy. Your controller only needs to accept -proxy-id as an argument; the Consul CLI will resolve the ID for the name specified in the -sidecar-for flag.