Scale API gateways on Kubernetes
Use Gateway annotations to configure per-gateway scaling for the managed Consul GatewayClass on Kubernetes. You can set a fixed replica count for a gateway or let Consul manage a Kubernetes horizontal pod autoscaler (HPA) for that gateway.
Enterprise
This feature requires Consul Enterprise.
Requirements
Gateway scaling is available only when all of the following are true:
- You are using the managed Consul API gateway class on Kubernetes.
- The Helm value connectInject.apiGateway.managedGatewayClass.scaling.enabled is set to true.
- The connected Consul cluster reports a valid Consul Enterprise license.
If gateway scaling is not enabled, Consul ignores gateway scaling annotations and does not create controller-managed HPAs.
Enable gateway scaling
Set the connectInject.apiGateway.managedGatewayClass.scaling.enabled Helm value to true when you install or upgrade Consul:
values.yaml
connectInject:
  enabled: true
  apiGateway:
    managedGatewayClass:
      scaling:
        enabled: true
Refer to the Helm chart reference for connectInject.apiGateway for additional context.
Configure static replicas
Add the consul.hashicorp.com/default-replicas annotation to a Gateway resource when you want Consul to keep the gateway deployment at a fixed replica count.
gateway.yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  annotations:
    consul.hashicorp.com/default-replicas: "4"
spec:
  gatewayClassName: consul
  listeners:
    - name: http
      protocol: HTTP
      port: 8080
The default-replicas value must be a positive integer. Kubernetes annotation values are strings, so quote the number as shown in the example.
Configure controller-managed HPA
Add HPA annotations to a Gateway resource when you want Consul to create and reconcile an HPA for the gateway deployment.
gateway.yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  annotations:
    consul.hashicorp.com/hpa-enabled: "true"
    consul.hashicorp.com/hpa-minimum-replicas: "3"
    consul.hashicorp.com/hpa-maximum-replicas: "25"
    consul.hashicorp.com/hpa-cpu-utilisation-target: "70"
spec:
  gatewayClassName: consul
  listeners:
    - name: http
      protocol: HTTP
      port: 8080
When HPA mode is enabled, Consul creates a controller-managed HPA named <gateway-name>-hpa.
If you omit optional HPA annotations, Consul uses the following defaults:
- Minimum replicas: 1
- Maximum replicas: 10
- CPU utilization target: 80
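As an illustration of these defaults, the following sketch enables HPA mode without any of the optional annotations. It assumes that consul.hashicorp.com/hpa-enabled is the only annotation needed to turn on HPA mode, which this page implies but does not state outright. The gateway name and listener are reused from the earlier example.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  annotations:
    # Only the enabling annotation is set. Per the documented defaults,
    # the resulting HPA would use a minimum of 1 replica, a maximum of
    # 10 replicas, and a CPU utilization target of 80.
    consul.hashicorp.com/hpa-enabled: "true"
spec:
  gatewayClassName: consul
  listeners:
    - name: http
      protocol: HTTP
      port: 8080
```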
The HPA annotations use the following validation rules:
- consul.hashicorp.com/hpa-minimum-replicas must be at least 1
- consul.hashicorp.com/hpa-maximum-replicas must be at least 1
- Minimum replicas cannot be greater than maximum replicas
- consul.hashicorp.com/hpa-cpu-utilisation-target must be between 1 and 100
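To make the rules concrete, the annotation fragment below would not satisfy them. The values are hypothetical, and this page does not describe how Consul reports or handles a configuration that fails validation.

```yaml
# Fragment of a Gateway resource; spec omitted for brevity.
metadata:
  annotations:
    consul.hashicorp.com/hpa-enabled: "true"
    # Breaks a rule: the minimum (12) is greater than the maximum (10).
    consul.hashicorp.com/hpa-minimum-replicas: "12"
    consul.hashicorp.com/hpa-maximum-replicas: "10"
    # Breaks a rule: 0 is outside the allowed range of 1 to 100.
    consul.hashicorp.com/hpa-cpu-utilisation-target: "0"
```

Raising the maximum to at least 12, or lowering the minimum, and choosing a CPU target between 1 and 100 would bring the fragment back within the stated rules.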
Precedence and deprecated fields
Consul resolves gateway scaling in the following order:
- A user-managed HPA that targets the gateway deployment
- Gateway scaling annotations
- Deprecated GatewayClassConfig.spec.deployment fields
If a user-managed HPA already targets the gateway deployment, Consul does not create or manage its own HPA for that gateway.
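For reference, a user-managed HPA is an ordinary Kubernetes HorizontalPodAutoscaler whose scaleTargetRef points at the gateway deployment. The sketch below uses the standard autoscaling/v2 API. The HPA name and numeric values are hypothetical, and it assumes the gateway deployment has the same name as the Gateway resource (api-gateway). Check the actual deployment name in your cluster before applying anything like this.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  # Hypothetical name chosen for this example.
  name: api-gateway-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    # Assumption: the deployment is named after the Gateway resource.
    name: api-gateway
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```

With an HPA like this in place, the precedence order above means Consul would leave scaling of that gateway to the user-managed HPA.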
The GatewayClassConfig.spec.deployment fields defaultInstances, minInstances, and maxInstances are deprecated. Use gateway annotations instead for new configurations and when migrating existing ones. Refer to the GatewayClassConfig reference for the full resource schema.
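A migration might look like the sketch below, which shows a deprecated GatewayClassConfig fragment followed by a Gateway metadata fragment. It assumes that defaultInstances corresponds to the fixed replica count set by consul.hashicorp.com/default-replicas. This page does not spell out that mapping, so confirm it against the GatewayClassConfig reference. The value 3 is illustrative.

```yaml
# Before: deprecated field on the GatewayClassConfig (fragment only).
spec:
  deployment:
    defaultInstances: 3
---
# After: annotation on the Gateway resource (fragment only).
# Assumed equivalent of defaultInstances; not confirmed by this page.
metadata:
  annotations:
    consul.hashicorp.com/default-replicas: "3"
```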
Manual scaling behavior
If a gateway uses neither gateway scaling annotations nor a controller-managed HPA, the replica count of the gateway deployment remains user-managed after the initial deployment is created. You can then scale the deployment manually without Consul resetting it to an earlier replica count.