Addressing private IPv4 exhaustion with AWS Cloud WAN service insertion

In this post, we describe how you can use Amazon Web Services (AWS) Cloud WAN with service insertion to centralize your private NAT Gateways and PrivateLink to effectively and efficiently address private IPv4 exhaustion. We demonstrate how you can maximize the usage of available IP space while minimizing cost impact.

Private IPv4 space, defined in RFC 1918 and also RFC 6598, can initially seem like a vast and inexhaustible source of addresses when used correctly with careful planning. However, in most enterprise networks, customers frequently struggle to allocate adequate ranges to new projects, environments, and sometimes entire locations as their organizations grow. This can be attributed to several reasons: the difficulty of planning growth over many years and anticipating scale, changes in design direction resulting in inefficient usage, mergers and acquisitions, and the adoption of new technologies such as containers that consume large amounts of IP space.

This can leave customers with limited space in their private IPv4 ranges to allocate to their AWS environment as they begin their cloud journey or need to scale to meet business needs. To address this situation, AWS provides several solutions and recommendations, such as Amazon VPC Lattice, extensive support for IPv6, PrivateLink, and the use of distributed Private NAT Gateways, as described in the post, How to solve Private IP exhaustion with Private NAT Solution.

Another strategic approach is to use AWS Cloud WAN to provide a durable, scalable, and cost-effective solution.

Prerequisites and assumptions

This post assumes that you are familiar with AWS networking services including NAT Gateways, AWS PrivateLink and AWS Cloud WAN, along with a fundamental understanding of enterprise networking concepts. We won’t focus on defining these services and concepts, but we do outline their capabilities and how you can use them for the highlighted architecture.

Solution overview

Currently, customers facing a shortage of private IPv4 addresses, and who have not yet moved to Amazon VPC Lattice or IPv6, typically deploy Private NAT Gateways within their VPCs.

The solution described in this post allows you to eliminate the need to NAT traffic remaining within AWS and maximize the usage of available IP space, which lets you scale both the number and size of VPCs that you can create.

The approach is based on assigning a private IPv4 range that is routable only within your AWS environment and deploying centralized NAT and PrivateLink at the Regional level rather than at the VPC level. Figure 1 shows the conceptual architecture.

Figure 1: Conceptual architecture—AWS routable IP Domain

In this example, we assign the Carrier-Grade NAT (CGN) range 100.64.0.0/10 (RFC 6598) to the AWS environment and divide it into /12 CIDRs to assign to each AWS Region. This range is routed across AWS Regions but is translated using Regional private NAT gateways into a company-wide routable IP domain (10.0.0.0/8 in the example) as it leaves AWS. This allows you to have an extended range that you can use to deploy VPCs and workloads and route freely within AWS without the need to NAT traffic leaving one VPC going to another.
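The Regional split described above can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the /10 range and the /12 size come from the example, while the Region list and the `allocate_regional_cidrs` helper are hypothetical and exist only for illustration.

```python
import ipaddress

# Example range from the post: the RFC 6598 shared address space.
AWS_ROUTABLE_DOMAIN = ipaddress.ip_network("100.64.0.0/10")

# Hypothetical Region list, chosen only for illustration.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1", "sa-east-1"]


def allocate_regional_cidrs(domain, regions, new_prefix=12):
    """Pair each Region with one block of the given prefix length carved from the domain."""
    blocks = list(domain.subnets(new_prefix=new_prefix))
    if len(regions) > len(blocks):
        raise ValueError(f"{domain} only yields {len(blocks)} /{new_prefix} blocks")
    return dict(zip(regions, blocks))


regional_cidrs = allocate_regional_cidrs(AWS_ROUTABLE_DOMAIN, REGIONS)
for region, cidr in regional_cidrs.items():
    print(f"{region}: {cidr} ({cidr.num_addresses} addresses)")
```

A /10 yields exactly four /12 blocks, so this particular split covers at most four Regions. If you operate in more Regions, you would need smaller Regional blocks.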

The range chosen here is used as an example, because customers have traditionally elected to use it behind NAT devices and not route it across company WANs. If you are routing this range, then you can choose any other private range that is only deployed behind NAT devices and that can be reused in multiple locations, such as AWS, corporate, and third-party data centers.

If you do not have a large enough non-routed IP space, then you can also choose a smaller range that you can reuse across all your AWS Regions, on-premises locations, and third-party data centers. In this case, traffic remaining within the Region isn’t translated. However, all traffic leaving the Region, either to another AWS Region or to on-premises locations, must be translated by the Regional private NAT gateways.

In this post, we explore how this can be accomplished by applying the logic outlined in the following points and shown in Figure 2 as key design principles:

  1. AWS routable IP domain to AWS routable IP domain: Routed
  2. AWS routable IP domain to company-wide routable IP domain: NAT
  3. Company-wide routable IP domain to AWS routable IP domain: PrivateLink
  4. Company-wide routable IP domain to company-wide routable IP domain: Routed

Figure 2: Design principles
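The four design principles can be read as a lookup keyed on the IP domain of the initiator and of the destination. The following sketch encodes that lookup using the example ranges from this post. The function names and the action labels are illustrative assumptions, not part of any AWS API.

```python
import ipaddress

AWS_ROUTABLE = ipaddress.ip_network("100.64.0.0/10")   # example range from the post
COMPANY_ROUTABLE = ipaddress.ip_network("10.0.0.0/8")  # example range from the post

# (initiator domain, destination domain) -> how the traffic is handled
ACTIONS = {
    ("aws", "aws"): "route",
    ("aws", "company"): "nat",
    ("company", "aws"): "privatelink",
    ("company", "company"): "route",
}


def domain_of(address):
    """Return which IP domain an address belongs to."""
    ip = ipaddress.ip_address(address)
    if ip in AWS_ROUTABLE:
        return "aws"
    if ip in COMPANY_ROUTABLE:
        return "company"
    raise ValueError(f"{address} is in neither IP domain")


def action_for(initiator, destination):
    """Apply the design principles to a flow, identified by who initiates it."""
    return ACTIONS[(domain_of(initiator), domain_of(destination))]


print(action_for("100.64.1.10", "100.80.5.20"))  # workload to workload
print(action_for("100.64.1.10", "10.50.0.20"))   # workload to on-premises
```

Note that the direction of initiation matters: a flow that starts on-premises cannot target a 100.64.0.0/10 address directly, so it reaches the workload through a PrivateLink endpoint that has a company-wide routable address.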

Solution architecture

The high-level architecture in Figure 3 shows the logical configuration of a Region, which includes the following elements:

  • The workload VPCs using the AWS routable IP space (100.64.0.0/10), where the ability to use a large CIDR allows for greater scalability. Because this range is routable within AWS, the workload VPCs do not need their own NAT gateways or VPC endpoints.
  • A single centralized outbound NAT VPC to translate traffic as it leaves AWS to on-premises. The use of centralized NAT reduces the number of private NAT gateways required. This VPC needs a small routable subnet (10.2.100.0/26 in the example), where a /28 could be assigned to each Availability Zone (AZ) to accommodate the NAT gateways.
  • A single centralized inbound VPC to deploy VPC endpoints for inbound traffic to applications. The use of a centralized inbound VPC allows you to maximize the usage of the available routable IP space (10.1.0.0/20 in this example) instead of having to divide it into smaller subnets assigned to workload VPCs for each AZ. Using the space more efficiently allows you to maximize the number of VPC endpoints that you can have.

Figure 3: High-level architecture
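To make the sizing argument concrete, the sketch below does the subnet arithmetic for the two centralized VPCs. The CIDRs come from the example. The /22-per-AZ layout for the inbound VPC and the /28-per-workload-VPC alternative are assumptions made only for comparison. The calculation relies on the fact that AWS reserves five addresses in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address in every subnet.
AWS_RESERVED_PER_SUBNET = 5

NAT_VPC_CIDR = ipaddress.ip_network("10.2.100.0/26")  # example from the post
INBOUND_CIDR = ipaddress.ip_network("10.1.0.0/20")    # example from the post


def usable(subnet):
    """Addresses left in a subnet after the AWS reservations."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


# One /28 per Availability Zone for the private NAT gateways.
nat_subnets = list(NAT_VPC_CIDR.subnets(new_prefix=28))

# Assumption: the centralized inbound VPC uses one /22 subnet per AZ.
centralized = list(INBOUND_CIDR.subnets(new_prefix=22))
# Assumption: the distributed alternative gives a /28 to each workload VPC per AZ.
distributed = list(INBOUND_CIDR.subnets(new_prefix=28))

centralized_capacity = sum(usable(s) for s in centralized)
distributed_capacity = sum(usable(s) for s in distributed)

print([str(s) for s in nat_subnets])
print(f"centralized: {len(centralized)} subnets, {centralized_capacity} usable addresses")
print(f"distributed: {len(distributed)} subnets, {distributed_capacity} usable addresses")
```

Under these assumptions, the same /20 offers 4,076 usable addresses when it is kept in a few large subnets, but only 2,816 when it is cut into /28s, because the fixed per-subnet reservation is paid 256 times instead of 4.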

Building connectivity with AWS Cloud WAN

To demonstrate how the routing works, Figure 4 introduces AWS Cloud WAN and AWS Direct Connect and shows the VPC design in more detail.

The Network Account owner is responsible for deploying both the centralized outbound NAT VPC and the centralized inbound VPC. They also own the connectivity, such as AWS Cloud WAN and Direct Connect. The routable subnets, shown in blue in Figure 4, are assigned 10.2.100.0/26 for NAT and 10.1.0.0/20 for VPC endpoints. These two CIDRs need to be advertised to on-premises networks over Direct Connect.

 

Figure 4: Traffic flows within a Region

Traffic initiated from a workload in a VPC to on-premises

  1. Traffic initiated from a workload in VPC 1 is routed to the AWS Cloud WAN core network through the VPC attachment.
  2. AWS Cloud WAN is configured to route all traffic destined for on-premises networks to the centralized outbound NAT VPC.
  3. The NAT gateway translates the source IP address to its own address (10.2.100.1 in this example) and routes the packet back to AWS Cloud WAN.
  4. AWS Cloud WAN then forwards the traffic to on-premises networks over AWS Direct Connect.

Return traffic follows the same path back, steps (5) through (8). Return packets are destined to the NAT IP address (10.2.100.1). The NAT Gateway translates the destination IP address from its own to that of the workload that initiated the request. It then sends the packet to AWS Cloud WAN, which forwards it back to the VPC and the workload.
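The translation in steps 3 and 7 can be illustrated with a toy source NAT model. This is not how the NAT gateway is implemented. The `Packet` and `PrivateNat` classes, the sequential port allocation, and the sample addresses other than 10.2.100.1 are assumptions that only demonstrate the address rewriting and the session lookup on the return path.

```python
from dataclasses import dataclass, field

NAT_ADDRESS = "10.2.100.1"  # NAT gateway address used in the post's example


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class PrivateNat:
    nat_ip: str = NAT_ADDRESS
    next_port: int = 1024  # simplistic sequential port allocation
    # (nat_port, remote_ip, remote_port) -> (workload_ip, workload_port)
    sessions: dict = field(default_factory=dict)

    def outbound(self, pkt):
        """Rewrite the source to the NAT address and remember the session."""
        nat_port = self.next_port
        self.next_port += 1
        self.sessions[(nat_port, pkt.dst_ip, pkt.dst_port)] = (pkt.src_ip, pkt.src_port)
        return Packet(self.nat_ip, nat_port, pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt):
        """Rewrite the destination of a reply back to the original workload."""
        workload_ip, workload_port = self.sessions[(pkt.dst_port, pkt.src_ip, pkt.src_port)]
        return Packet(pkt.src_ip, pkt.src_port, workload_ip, workload_port)


nat = PrivateNat()
request = Packet("100.64.1.10", 40000, "10.50.0.20", 443)  # workload to on-premises
translated = nat.outbound(request)
reply = Packet("10.50.0.20", 443, translated.src_ip, translated.src_port)
restored = nat.inbound(reply)
print(translated)
print(restored)
```

The on-premises host only ever sees 10.2.100.1, which is why that address, and not the workload's 100.64.0.0/10 address, must be reachable from on-premises networks.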

Traffic initiated from on-premises to a workload in a VPC

  1. Traffic initiated from on-premises networks is routed to AWS Cloud WAN over Direct Connect.
  2. AWS Cloud WAN forwards the traffic directly to the VPC endpoint (10.1.0.1 in this example) deployed in the centralized inbound VPC, as it does not need any translation.
  3. Using PrivateLink, the VPC endpoint sends the packet to the Network Load Balancer (NLB) that fronts the application in the workload VPC.
  4. Then, the NLB can send it to one of the instances in its target group.

The return traffic follows the same path back, steps (5) through (8). The instance sends the return packet to the NLB, which forwards it through the same VPC endpoint. From there, the packet is sent to AWS Cloud WAN, which then routes it back to the on-premises network via Direct Connect.

Routing with AWS Cloud WAN service insertion

Now that we have described the architecture and the flow patterns, let's examine the solution implementation. We demonstrate how you can use AWS Cloud WAN service insertion to insert the centralized NAT gateways in the path of outbound traffic and use route sharing between segments for inbound traffic.

Figure 5 shows the AWS Cloud WAN segments and provides a view of the route tables for both the segments and VPC subnets. It also shows how multiple accounts can use the centralized inbound VPC to deploy the VPC endpoints for their respective applications.

Figure 5: Routing with AWS Cloud WAN service insertion

Configure AWS Cloud WAN segments and VPC attachments

To deploy this architecture, you need to configure four segments on the AWS Cloud WAN core network and the associated attachments:

  • Production segment: Used to associate workload VPC attachments.
  • Hybrid segment: Used to associate connectivity to on-premises networks over Direct Connect.
  • Inbound segment: Used to associate the centralized inbound VPC. This is the VPC hosting the PrivateLink endpoints.
  • Outbound Network Function Group (NFG): This is a managed segment created by AWS Cloud WAN service insertion and used to attach the centralized outbound NAT VPC and insert the NAT gateway in the path of outbound traffic.

The deployment of AWS Cloud WAN core network and the configuration of segments and attachments are outside of the scope of this post but are detailed in the AWS Cloud WAN User Guide.

Configure routing for outbound traffic

Outbound traffic initiated from a workload in a VPC toward on-premises resources needs to be translated (NAT) before it leaves AWS. Using service insertion, you define your intent in policy, and AWS Cloud WAN makes sure that outbound traffic is routed through the centralized outbound NAT VPC and the NAT gateways within it.

Depending on your applications, you might also need to allow traffic initiated from workloads inside of a VPC to reach a PrivateLink endpoint deployed in the centralized inbound VPC. The endpoint resides in the company-wide routable IP domain (10.1.0.0/20 in this example), whereas the workloads reside in the AWS routable IP domain (100.64.0.0/10 range). Therefore, this traffic must also be translated (NAT) as per the design principles described previously.

Traffic remaining within the Production segment does not need to be NAT’d.

The necessary service insertion policy intent is as follows:

Source segment | Destination segment | Action
---------------|---------------------|----------------------
Production     | Hybrid              | Send-via Outbound NFG
Production     | Inbound             | Send-via Outbound NFG

Table 1: Service insertion policy intent

To set up service insertion for the Production segment, follow the steps described in the section Add a segment action of the AWS Cloud WAN User Guide.
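As a rough illustration of what Table 1 could look like in a core network policy document, the sketch below builds the relevant fragment as a Python dictionary and prints it as JSON. The key names reflect our understanding of the Cloud WAN policy format and should be verified against the User Guide. The NFG name `OutboundNfg`, the `single-hop` mode, and the `send_via` helper are assumptions.

```python
import json

OUTBOUND_NFG = "OutboundNfg"  # hypothetical network function group name


def send_via(source_segment, destination_segment, nfg=OUTBOUND_NFG):
    """Build one send-via segment action (key names to be verified in the User Guide)."""
    return {
        "action": "send-via",
        "segment": source_segment,
        "mode": "single-hop",  # assumption: single-Region deployment
        "when-sent-to": {"segments": [destination_segment]},
        "via": {"network-function-groups": [nfg]},
    }


policy_fragment = {
    "network-function-groups": [
        {"name": OUTBOUND_NFG, "require-attachment-acceptance": True}
    ],
    "segment-actions": [
        send_via("Production", "Hybrid"),   # row 1 of Table 1
        send_via("Production", "Inbound"),  # row 2 of Table 1
    ],
}

print(json.dumps(policy_fragment, indent=2))
```

This is a fragment only. A complete policy also needs the core network configuration, the segment definitions, and the attachment policies, which are outside the scope of this sketch.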

When traffic reaches the NAT gateway, the gateway translates the source IP address of the packet to its own address (10.2.100.1 in this example). To make sure that return traffic can reach the NAT gateway, a static route is needed in both the Hybrid and Inbound segments. Although service insertion dynamically propagates routes between the source and destination segments, it does not propagate the routes of VPCs attached directly to the NFG. In Figure 5, the static route is shown in red in both the Hybrid and Inbound segment route tables. It is the only static route that the solution needs. You configure it at initial implementation, and it points to the entire CIDR of the centralized outbound NAT VPC.
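The static route for return traffic could be expressed in the policy document along the following lines. This is a sketch under assumptions: the `create-route` key names reflect our understanding of the policy format and should be checked against the User Guide, the attachment ID is a placeholder, and the helper name is hypothetical.

```python
import json

NAT_VPC_CIDR = "10.2.100.0/26"                 # example from the post
NAT_VPC_ATTACHMENT_ID = "attachment-EXAMPLE"   # placeholder, not a real ID


def static_route_to_nat_vpc(segment, cidr=NAT_VPC_CIDR, attachment_id=NAT_VPC_ATTACHMENT_ID):
    """Build a static route that points a segment at the NAT VPC attachment."""
    return {
        "action": "create-route",
        "segment": segment,
        "destination-cidr-blocks": [cidr],
        "destinations": [attachment_id],
    }


# The same route is needed in both segments that receive translated traffic.
static_routes = [static_route_to_nat_vpc(name) for name in ("Hybrid", "Inbound")]
print(json.dumps(static_routes, indent=2))
```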

Configure routing for inbound traffic

Traffic from on-premises networks coming into AWS needs to be directed to the PrivateLink endpoints deployed in the centralized inbound VPC. The subnets used in this VPC belong to the company-wide routable IP domain, so this traffic can be routed directly without the need for a NAT gateway.

To route directly between segments, you need to use segment sharing instead of service insertion. The latter, as the name suggests, is used to insert a service in the path of traffic, which is not needed here.

Configuring segment sharing between the Inbound and Hybrid segments allows them to dynamically share routes. This results in the Hybrid segment learning the routes for the centralized inbound VPC and the Inbound segment learning the routes for on-premises networks.
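A segment-sharing action for these two segments might look like the following in the policy document. As with any policy sketch, treat the key names as our reading of the Cloud WAN policy format and confirm them in the User Guide. The `share_segments` helper is hypothetical.

```python
import json


def share_segments(segment, share_with):
    """Build a share action so that the listed segments exchange attachment routes."""
    return {
        "action": "share",
        "mode": "attachment-route",
        "segment": segment,
        "share-with": list(share_with),
    }


inbound_share = share_segments("Inbound", ["Hybrid"])
print(json.dumps(inbound_share, indent=2))
```

A single action is sketched here because, as described above, the route exchange between the two segments works in both directions.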

This establishes a direct path between on-premises networks and the centralized inbound VPC, which is the desired outcome. If you need to inspect traffic from on-premises networks, then you can create a separate inspection VPC and use service insertion to route traffic to it. This is described in the post, Simplify global security inspection with AWS Cloud WAN Service Insertion.

Multi-Region architecture

While this post focuses on a single-Region implementation, the concepts presented can be extended to multi-Region deployments. A detailed exploration of multi-Region architectures, however, is beyond the scope of this discussion.

Things to know

  • The AWS routable IP domain can be reused outside AWS but should not be advertised on the company network.
  • If the AWS routable IP domain is used outside AWS, then it needs to be behind both outbound and inbound NAT.
  • Consider your DNS resolution and make sure that resources outside of AWS do not resolve domain names to IP addresses within the AWS routable IP domain.
  • The NFG does not dynamically propagate routes from the NAT VPC attached to it. You must configure a static route in other segments to point back to the NAT VPC to allow return traffic.
  • The Hybrid segment dynamically learns the AWS routable IP domain routes from the Production segment. Make sure that these are not advertised to on-premises networks.
  • The route tables in this post show the relevant routes only. Other routes can be dynamically learned but do not impact routing.
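Several of these points come down to keeping the AWS routable IP domain out of on-premises route advertisements. The following sketch shows one way to check a candidate prefix list before it is advertised. The function name and the sample prefixes are assumptions, and the check itself is independent of any AWS API.

```python
import ipaddress

AWS_ROUTABLE = ipaddress.ip_network("100.64.0.0/10")  # example range from the post


def split_advertisements(candidate_prefixes, internal_domain=AWS_ROUTABLE):
    """Separate prefixes that are safe to advertise from those that overlap the internal domain."""
    allowed, blocked = [], []
    for prefix in candidate_prefixes:
        network = ipaddress.ip_network(prefix)
        if network.overlaps(internal_domain):
            blocked.append(str(network))
        else:
            allowed.append(str(network))
    return allowed, blocked


# Hypothetical candidate list: the two routable CIDRs plus routes learned from Production.
candidates = ["10.2.100.0/26", "10.1.0.0/20", "100.64.0.0/12", "100.80.16.0/20"]
allowed, blocked = split_advertisements(candidates)
print("advertise:", allowed)
print("filter out:", blocked)
```

The same membership test can be applied to DNS answers to flag records that would resolve to addresses inside the AWS routable IP domain for clients outside AWS.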

Conclusion

In this post, we discussed how you can use AWS Cloud WAN with service insertion to address the challenge of private IPv4 exhaustion across your organization. The architecture described here allows you to create a scalable platform by using IP ranges that are only routable within the AWS environment. At the same time, you can optimize costs by centrally aggregating your NAT resources for an entire Region rather than deploying them within your workload VPCs. Furthermore, the use of a centralized inbound VPC together with AWS PrivateLink allows you to maximize the usage of the routable private IPv4 range available to you.

About the authors

Mehdi Dahane

Mehdi is a network specialist solutions architect at AWS. He supports global accounts, advising on cloud architectures and developing solutions that deliver desired outcomes and optimize efficiency, operations, and costs. He has over 20 years of cross-industry experience in a wide range of areas, including public and private cloud environments, SDN, global connectivity, security, and multi-tenant environments. When not working, he can be found running or enjoying the outdoors with family and friends.

Philipp Schaefer

Philipp Schaefer is a principal solutions architect at AWS. With over 20 years of experience in software design in the manufacturing, healthcare, and energy industries, he works with AWS customers and partners on their cloud transformation programs.