Well-Architected Framework
Design your network
Network capacity planning prevents service outages and reduces latency for distributed applications. The performance of the network that connects your services is critical to ensuring your applications run quickly and smoothly. As your application gains more users, sends more traffic, and adds more backend services, the load on your network can grow quickly. A well-designed network distributes traffic to minimize latency and maximize performance.
Why design your network
Proper network design and capacity planning address the following strategic challenges:
- Prevent service degradation from network congestion: Inadequate network capacity causes connection timeouts, slow response times, and failed requests that directly impact user experience and application reliability.
- Reduce data transfer costs: Inefficient network design leads to unnecessary data transfer across regions or availability zones, resulting in high egress charges and wasted bandwidth.
- Enable global scalability: Without proper network architecture and content distribution, applications cannot serve users across different geographic regions with acceptable latency.
- Minimize security vulnerabilities: Poorly designed network segmentation and traffic routing expose services to unauthorized access and increase the attack surface.
The network design workflow follows these steps:
- Plan for known capacity: First, run load and stress tests to understand traffic patterns, bandwidth requirements, and connection limits.
- Design to meet demand: Next, based on test results, configure load balancers, CDNs, and service mesh to distribute traffic and reduce latency.
- Respond to scale issues: Finally, monitor network metrics and adjust capacity to address performance issues before they affect users.
Plan for your known capacity
This section explains how to use load and stress testing to understand your network's traffic patterns, bandwidth limits, and connection capacity.
To plan for your network capacity, you must first understand what your network traffic looks like. Ask yourself questions like the following:
- How many concurrent connections will this network serve? An application that serves tens of thousands of connections versus a service that serves millions will have different requirements.
- How many requests will each connection make on average? A single user session could generate 50-200 requests for a web application.
- What do those requests look like? Small API requests might transfer 1-10KB, while image-heavy pages could transfer several MB per request.
- Are there known times that traffic will spike? E-commerce sites often see an increase in traffic during sales, requiring capacity planning for peak loads.
- How does scaling this application affect the network? Adding 100 compute instances increases network traffic proportionally if load balancing isn't optimized.
- How does scaling the network affect cost? Cloud providers charge for data transfer, with higher costs for cross-region traffic.
- Where are the connections coming from? A global user base requires CDN distribution, while regional applications can use single-region infrastructure.
Use load and stress testing to answer these questions and inform your network design decisions.
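You can turn the answers to these questions into a rough capacity estimate. The following Terraform locals block is a back-of-envelope sketch; every number is an illustrative placeholder that you should replace with measurements from your own load tests.

```hcl
# Back-of-envelope bandwidth estimate. All values are illustrative
# placeholders; replace them with numbers from your load tests.
locals {
  concurrent_connections = 10000 # expected peak concurrent users
  requests_per_second    = 2     # average requests per connection per second
  avg_response_kb        = 50    # average response payload in KB

  # Peak throughput in megabits per second:
  # connections * requests/s * KB per response * 8 bits / 1000
  peak_bandwidth_mbps = local.concurrent_connections * local.requests_per_second * local.avg_response_kb * 8 / 1000
}

output "estimated_peak_bandwidth_mbps" {
  value = local.peak_bandwidth_mbps
}
```

With these example values, the estimate works out to 8,000 Mbps of peak throughput, which tells you whether a single region or load balancer tier can plausibly serve your traffic.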
Just as you load and stress test your compute infrastructure to find bottlenecks in CPU and memory usage, you can perform similar tests on your network. Use network load testing tools like Apache Bench (ab), wrk, or Vegeta to simulate realistic traffic patterns. By simulating expected, normal usage on your network, you can identify the key metrics to monitor and use them to determine when to scale your infrastructure. Common metrics include the following:
- Total bandwidth
- Connection or user count
- Requests per second
- Latency
Once you understand the load on your network under normal use, stress tests can help you identify where your network breaks under sudden, extreme workloads. By testing these high-demand situations, you can verify that your scaling plan responds to these needs correctly.
Design to meet demand
Demand on your infrastructure and network can change frequently and suddenly. Many factors can affect the number of people using your application, such as time of day, time of year, holidays, and special events. For example, an online store faces much higher demand on its infrastructure during a holiday sale than it normally does. By understanding your traffic patterns, you can preemptively provision and scale additional infrastructure as code to prepare for the increased traffic.
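As a sketch of provisioning for a known spike, the following hypothetical Terraform configuration pre-scales an autoscaling group ahead of a sale. The launch template, subnets, and dates are assumptions for illustration; adapt them to your own infrastructure.

```hcl
# Hypothetical example: pre-scale capacity ahead of a known traffic spike.
# Assumes an aws_launch_template named "app" and subnet data source exist.
resource "aws_autoscaling_group" "app" {
  name                = "app-asg"
  min_size            = 2
  max_size            = 20
  desired_capacity    = 2
  vpc_zone_identifier = data.aws_subnets.app.ids

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

# Raise the baseline capacity before the holiday sale begins.
resource "aws_autoscaling_schedule" "sale_start" {
  scheduled_action_name  = "holiday-sale-start"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 10
  max_size               = 20
  desired_capacity       = 10
  start_time             = "2025-11-28T08:00:00Z" # placeholder date
}
```

A matching schedule after the event can return the group to its normal baseline so you do not pay for idle capacity.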
Network optimization strategies help distribute traffic efficiently and reduce latency:
Load balancers distribute incoming network traffic across multiple servers to prevent any single server from becoming overwhelmed. Content Delivery Networks (CDNs) cache static content at edge locations closer to users, reducing the load on origin servers and improving response times. Service mesh provides infrastructure-level service discovery and communication management for microservices, eliminating the need to manage individual IP addresses or DNS entries.
Choose network optimization strategies based on your traffic patterns and user distribution:
Use load balancers when you need to distribute traffic across multiple application instances and require automatic failover if instances become unavailable. Use CDNs when you serve static content (images, videos, CSS, JavaScript) to geographically distributed users and want to reduce origin server load. Use service mesh when you have microservices architectures with many inter-service communications (such as applications connecting to databases and other backend services) and need service discovery, traffic management, and observability.
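For the CDN case, the following is a minimal, hedged Terraform sketch of a CloudFront distribution that caches static assets at edge locations. The bucket name is a placeholder, and a production distribution would also need an origin access control, a custom domain, and TLS configuration.

```hcl
# Minimal CDN sketch: serve static assets from an S3 bucket through
# CloudFront edge locations. "app-static-assets" is a placeholder bucket.
resource "aws_cloudfront_distribution" "static" {
  enabled = true

  origin {
    domain_name = "app-static-assets.s3.amazonaws.com"
    origin_id   = "s3-static"
  }

  default_cache_behavior {
    target_origin_id       = "s3-static"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}
```

Caching at the edge means repeat requests for images, CSS, and JavaScript never reach your origin servers, which reduces both latency for distant users and load on your network.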
You can use Terraform to configure load balancers alongside your application infrastructure.
Before configuring load balancers, you need existing infrastructure including compute instances or container services, a VPC, and subnets. The following example assumes these resources exist. This Terraform configuration creates an application load balancer that distributes traffic across multiple instances:
resource "aws_lb" "app" {
  name               = "app-lb"
  internal           = false
  load_balancer_type = "application"
  subnets            = data.aws_subnets.app.ids

  tags = {
    Environment = "production"
  }
}

resource "aws_lb_target_group" "app" {
  name     = "app-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = data.aws_vpc.app.id

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 30
  }
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
This configuration creates a load balancer that distributes incoming HTTP traffic across application instances, performs health checks every 30 seconds, and automatically removes unhealthy instances from rotation.
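The target group above routes traffic only to targets registered with it. As a hypothetical example, the following attachment registers a single existing instance (assumed to be defined elsewhere as `aws_instance.app`); with an autoscaling group, you would instead set `target_group_arns` on the group so instances register automatically.

```hcl
# Hypothetical attachment: register an existing compute instance with the
# target group so the load balancer routes traffic to it.
resource "aws_lb_target_group_attachment" "app" {
  target_group_arn = aws_lb_target_group.app.arn
  target_id        = aws_instance.app.id
  port             = 80
}
```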
One of the benefits of using Consul is its ability to build and maintain a service mesh.
Before configuring service mesh with Consul, you need Consul installed and running on your infrastructure. The following example assumes Consul is configured and agents are connected to the cluster. This Consul configuration defines a service that registers itself with the service mesh:
service {
  name = "redis"
  id   = "redis"
  port = 80
  tags = ["primary"]

  tagged_addresses = {
    lan = {
      address = "192.168.0.55"
      port    = 8000
    }
    wan = {
      address = "198.18.0.23"
      port    = 80
    }
  }
}
This configuration registers a Redis service with Consul's service mesh, defining both LAN and WAN addresses for internal and external access. Once registered, any machine in the Consul service mesh can send requests to redis.service.consul to reach this service, enabling automatic service discovery without manual IP address management.
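On the consuming side, a service can reach Redis through a Consul Connect sidecar proxy instead of a hard-coded address. The following is a hedged sketch of a hypothetical "web" service definition; the service name and ports are assumptions for illustration.

```hcl
# Hedged sketch: a web service that reaches redis through Consul's
# service mesh (Consul Connect). The sidecar proxy binds redis to a
# local port, so the application connects to localhost rather than
# tracking IP addresses itself.
service {
  name = "web"
  port = 8080

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "redis"
            local_bind_port  = 6379
          }
        ]
      }
    }
  }
}
```

With this definition, the web application connects to `localhost:6379`, and the mesh handles discovery, routing, and mutual TLS between the sidecars.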
After configuring your network architecture with load balancers, CDNs, and service mesh, implement monitoring to track network performance and identify capacity issues.
Respond to scale issues
How you react to growing capacity needs depends on your cloud provider. For example, AWS, GCP, and Azure do not place limits on the bandwidth an individual subnet can accommodate; instead, they place limits on resources such as compute instances and other individual pieces of your infrastructure. Refer to your cloud provider's documentation to learn more.
Monitor the key metrics that you identified in your load and stress testing, and configure alerts to notify you when you approach the limits of your network. For example, set alerts when bandwidth utilization exceeds 60%, when latency rises more than 50% above baseline, or when connection counts approach 80% of capacity limits. Monitoring also gives you better insight into usage patterns and changing needs over time, letting you adapt your network to these changing use cases. While early design decisions such as autoscaling and load balancing help your network absorb capacity issues, proper monitoring catches the situations you did not design for before they affect users.
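As one concrete way to codify such an alert, the following Terraform sketch creates a CloudWatch alarm on the load balancer's response time. The threshold and SNS topic are placeholders; derive real values from the baselines you measured in your load tests.

```hcl
# Illustrative alert: notify when the load balancer's average response
# time rises well above baseline. Threshold and topic are placeholders.
resource "aws_cloudwatch_metric_alarm" "lb_latency" {
  alarm_name          = "app-lb-high-latency"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "TargetResponseTime"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0.5 # seconds; set to roughly 1.5x your baseline

  dimensions = {
    LoadBalancer = aws_lb.app.arn_suffix
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
```

Requiring three consecutive elevated periods filters out brief spikes so the alert fires on sustained degradation rather than noise.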
HashiCorp resources
- Read the Monitor network traffic Well-Architected Framework documentation.
- Learn how to Set up monitoring agents
- Configure dashboards and alerts
- Read the Terraform resource documentation for aws_lb.
- Read the Manage network ingress and egress for zero trust security Well-Architected Framework documentation.
Next steps
In this section of Select and design infrastructure, you learned how to profile the workload on your network with load and stress tests, react to growing networking needs, and design to minimize the latency between your applications and services. Select and design infrastructure is part of the Optimize systems pillar.
To learn more about how to design and scale your infrastructure, refer to the following resources: