Best Buy Health’s Resilient Care Centers powered by AWS Cloud WAN and SD-WAN

In the Retail and Healthcare industries, critical connectivity through highly resilient networks is required to serve customers and run operations. Customers in these sectors often need to connect their corporate enterprise networks to stores, contact centers, distribution centers, or, in the case of Best Buy Health, customer care contact centers. Best Buy Health provides care solutions for patients and customers that require highly resilient voice network paths, including care-at-home solutions, senior-friendly mobile phone devices designed for simplicity, and expert health response teams for technical, emergency, and social needs. This post explores a comprehensive technical solution that Best Buy Health used for integrating Fortinet FortiGate Software-Defined Wide Area Networks (SD-WAN) within Amazon Virtual Private Cloud (Amazon VPC) using Amazon Web Services (AWS) Cloud WAN. This unique dual-standalone approach avoids the drawbacks of traditional High Availability (HA) architectures and yielded a 66% latency reduction when connecting customers to workloads in AWS.

The Best Buy Health mission

Best Buy’s mission is to enrich people’s lives through technology. Best Buy has grown to become a household name that’s synonymous with high-quality electronics and exceptional customer service. Best Buy Health, a division of Best Buy, aims to enrich and save lives through technology and meaningful connections. Best Buy Health is focused on enabling care at home by providing consumer health products that help customers live healthier lives, device-based emergency response services for the active aging population, and virtual care offerings that help connect patients and physicians.

Best Buy’s cloud-native network architecture

Best Buy has a cloud-first mandate under which new workloads and supporting infrastructure should use cloud-native solutions wherever possible. For Best Buy Health’s important health-related voice traffic, low-latency and highly available connectivity from Care Centers to workloads in AWS is crucial. This traffic could be related to a healthcare emergency, so a robust architecture was needed.

Before we discuss Best Buy Health’s cloud-native current state network architecture, the following section describes their previous state architecture.

Previous state architecture

Best Buy Health’s original design connecting AWS workloads to Care Centers is shown in Figure 1.

Figure 1: Best Buy Health’s previous state network architecture using routing through on-premises datacenters

This design follows Best Buy Health’s original pre-cloud-first design tenets. Traffic from Best Buy Health’s Care Centers routes through multiple hops of a Multi-Protocol Label Switching (MPLS) WAN network to their data centers, then to AWS Transit Gateways through AWS Direct Connect connections and a Direct Connect gateway. From there, traffic is finally routed to the target production and non-production VPCs (attached to respective Transit Gateways) containing the application workloads. Although this previous state does allow for HA through an active/passive configuration, routing through the data centers wasn’t ideal.

Challenges with previous state architecture

Best Buy Health’s Care Centers would serve their customers by connecting to workloads in AWS and routing through Best Buy data centers. This design had drawbacks, such as the following:

  • Several more milliseconds of latency and extra hops for sensitive voice traffic, which degraded the customer experience.
  • Data transfer costs charged by telcos for routing through the MPLS WAN and on-premises data centers.
  • Increased operational overhead on network operations teams that needed to manage the MPLS WAN links, Direct Connect connections, and Transit Gateway routing tables.
  • Most importantly, the previous design didn’t follow Best Buy’s cloud-native design requirements and continued to depend on their data centers.

A new solution was needed to improve customer experience, follow Best Buy’s design tenets, and reduce operational overhead.

Current state architecture using AWS Cloud WAN

There were two AWS cloud-native SD-WAN connectivity solutions that were evaluated:

  1. Transit Gateway with Transit Gateway Connect attachments
  2. AWS Cloud WAN with Tunnel-less Connect attachments

Best Buy’s previous experiences with Transit Gateway Connect attachments in their retail business led them to use AWS Cloud WAN for Best Buy Health for the following reasons:

  • AWS Cloud WAN offers comprehensive multi-Region networking capabilities with built-in support for dynamic routing across AWS Regions, while abstracting the complexity of network management. Best Buy Health uses a single Region (us-west-2) for their workloads today. AWS Cloud WAN enables seamless future AWS Region expansion and streamlined routing configuration using global routing tables and network segmentation. AWS Cloud WAN removes the need to manually configure and maintain individual Transit Gateways, thereby significantly reducing operational overhead while natively enhancing security through consistent policy enforcement and centralized network management across the global infrastructure.
  • Tunnel-less Connect attachments with AWS Cloud WAN allow for higher bandwidth and reduced overhead because they don’t need Generic Routing Encapsulation (GRE) tunnels between the SD-WAN appliances and the AWS Cloud WAN network. On the other hand, Transit Gateway Connect attachments need GRE tunnels to be established between the SD-WAN appliances and the Transit Gateway, thereby limiting the maximum bandwidth that can be achieved.
  • AWS Cloud WAN streamlines multi-Region networking by providing dynamic routing, segmentation, and global routing tables through a centralized policy-driven framework. This intent-based approach, supported by versioned policy documents with rollbacks, reduces configuration overhead while keeping security and network management consistent across the global infrastructure.

When Best Buy began implementing AWS Cloud WAN and SD-WAN with Tunnel-less Connect attachments, they followed the traditional active-standby HA reference architecture from their SD-WAN vendor, Fortinet. Soon after, they discovered that BGP convergence times during failover events would interrupt live voice traffic from customers to customer Care Centers. A new SD-WAN architecture was needed to eliminate these convergence times. Best Buy experimented with other architectures and landed on a unique dual-standalone architecture. This architecture enables seamless traffic convergence during planned maintenance of SD-WAN devices (such as patching and code upgrades) and reduces convergence time during unplanned events, including catastrophic failures of an SD-WAN appliance or an AWS Availability Zone (AZ).

Current state architecture details

Figure 2 shows Best Buy Health’s current state network architecture using Fortinet FortiGate SD-WAN appliances and AWS Cloud WAN for routing traffic between Care Centers and applications running in Amazon VPCs.

Figure 2: Best Buy Health’s current state network architecture using Fortinet FortiGate SD-WAN appliances and AWS Cloud WAN

In the current state architecture, Fortinet FortiGate SD-WAN Virtual Machines (VMs) serve as headend devices across two AZs, providing redundancy while maintaining seamless AWS Cloud WAN integration. Unlike traditional active-standby configurations, this architecture implements a dual-standalone model where both VMs simultaneously serve Best Buy Health Care Centers. Each VM has two Elastic Network Interfaces (ENIs): one interface in a private subnet and another in a public subnet mapped to an Elastic IP address (EIP). The VMs operate independently, each with a dedicated underlay over the public internet and a BGP-over-IPsec (Internet Protocol Security) overlay to AWS Cloud WAN. This makes sure that the failure of one VM doesn’t impact the other. As a result, the traffic path is streamlined, which reduces latency, hops, and complexity because AWS Direct Connect is no longer needed. Best Buy Health Care Centers are connected directly to AWS using SD-WAN over the internet using two separate paths. This mitigates concerns about any unreliability of internet circuits.

Each Care Center FortiGate appliance uses an Amazon Route 53 Resolver ENI in each of the SD-WAN VPC AZs as the target for its IP Service Level Agreement (SLA) health checks. The best path for the SD-WAN architecture is determined based on the metrics gathered from these IP SLA targets. Best Buy opted not to use the SD-WAN headend devices themselves as IP SLA targets, so that the health check result for an AZ stays independent of the headend device in that AZ. This design provides true dual-standalone redundancy rather than traditional failover configurations.
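The path selection described above can be sketched in a few lines of Python. This is a minimal illustration, not FortiGate's actual algorithm: the `ProbeResult` type, the threshold values, and the path names are all hypothetical, and the tie-breaking order (latency, then jitter, then loss) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Aggregated health-check metrics for one overlay path (hypothetical type)."""
    path: str          # for example "headend-az1"
    latency_ms: float
    jitter_ms: float
    loss_pct: float


# Hypothetical SLA thresholds for voice traffic; these are not from the source.
MAX_LATENCY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_LOSS_PCT = 1.0


def meets_sla(probe):
    """True when every metric is within its threshold."""
    return (
        probe.latency_ms <= MAX_LATENCY_MS
        and probe.jitter_ms <= MAX_JITTER_MS
        and probe.loss_pct <= MAX_LOSS_PCT
    )


def pick_best_path(probes):
    """Return the name of the best path that meets the SLA, or None if none does."""
    healthy = [p for p in probes if meets_sla(p)]
    if not healthy:
        return None
    # Assumed preference order: lowest latency, then jitter, then loss.
    best = min(healthy, key=lambda p: (p.latency_ms, p.jitter_ms, p.loss_pct))
    return best.path


if __name__ == "__main__":
    samples = [
        ProbeResult("headend-az1", 40.0, 5.0, 0.0),
        ProbeResult("headend-az2", 25.0, 4.0, 0.0),
    ]
    print(pick_best_path(samples))
```

Because the probe targets sit in the AZs rather than on the headend VMs, a sketch like this scores the path to each AZ independently of the appliance that terminates the overlay there.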

Integration strategy, results, and considerations

Best Buy’s unique dual-standalone approach and integration with AWS Cloud WAN yields results that meet specific business requirements for Best Buy Health. These include improved traffic flow, simplicity, fast failovers, and improved performance for Care Center employees using workloads in AWS. In the following sections we walk through the decisions, integration strategy, tradeoffs, and real-world observed performance.

AWS Cloud WAN integration strategy

Integration with AWS Cloud WAN uses a VPC attachment as the foundational transport mechanism. Both SD-WAN VMs share a single Connect attachment (the light blue line block in the preceding Figure 2) while maintaining individual connectivity paths through the VPC transport attachment, which extends across the private subnets where the VMs’ private interfaces connect. This Connect attachment supports two AWS Cloud WAN Tunnel-less Connect BGP peers, with one Connect BGP peer dedicated to each VM (the green line in the preceding Figure 2). Using a Tunnel-less Connect BGP peer is beneficial because it eliminates the five-tuple single flow principle (which GRE tunnels in Transit Gateway Connect attachments enforce). Tunnel-less Connect BGP peers also allow for improved throughput and let the peer addresses correspond directly to the SD-WAN VMs’ private interface IP addresses. Lastly, the tunnel-less approach reduces overhead on the line and streamlines troubleshooting.
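As a rough sketch of the layout above (one shared Tunnel-less Connect attachment, one Connect peer per VM), the following Python builds the request parameters that could be passed to the AWS Network Manager `CreateConnectAttachment` and `CreateConnectPeer` operations. The parameter names reflect our understanding of that API and should be checked against the current documentation. Every ID, ARN, IP address, and ASN used with it is a hypothetical placeholder, and the helper function names are ours.

```python
def connect_attachment_request(core_network_id, edge_location, vpc_attachment_id):
    """Parameters for a Tunnel-less Connect attachment on top of a VPC attachment."""
    return {
        "CoreNetworkId": core_network_id,
        "EdgeLocation": edge_location,
        "TransportAttachmentId": vpc_attachment_id,
        # NO_ENCAP selects Tunnel-less Connect; GRE is the tunneled alternative.
        "Options": {"Protocol": "NO_ENCAP"},
    }


def connect_peer_requests(connect_attachment_id, vms, peer_asn):
    """One Connect peer per SD-WAN VM, addressed by the VM's private interface IP."""
    return [
        {
            "ConnectAttachmentId": connect_attachment_id,
            "PeerAddress": vm["private_ip"],
            "SubnetArn": vm["subnet_arn"],
            "BgpOptions": {"PeerAsn": peer_asn},
        }
        for vm in vms
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    vms = [
        {"private_ip": "10.0.1.10",
         "subnet_arn": "arn:aws:ec2:us-west-2:111122223333:subnet/subnet-az1"},
        {"private_ip": "10.0.2.10",
         "subnet_arn": "arn:aws:ec2:us-west-2:111122223333:subnet/subnet-az2"},
    ]
    print(connect_attachment_request("core-network-EXAMPLE", "us-west-2",
                                     "attachment-vpc-EXAMPLE"))
    for request in connect_peer_requests("attachment-connect-EXAMPLE", vms, 65000):
        print(request)
```

The sketch only assembles dictionaries, so it runs without AWS credentials. In practice the Connect attachment ID comes from the response to the first call.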

Branch connectivity and BGP routing

All branch FortiGate devices establish overlay tunnels to both headend SD-WAN VMs’ public interfaces. Branch-level redundancy includes dual internet circuits at each site. This, combined with dual headend VMs, creates robust multi-path redundancy. Each VM establishes an independent BGP neighbor relationship with the AWS Cloud WAN Core Network Edge (CNE) through the respective AWS Cloud WAN Connect peers. The transport VPC attachment is associated with a dedicated SD-WAN segment within AWS Cloud WAN, which makes sure that the Connect attachment inherits the same segment membership.

When the BGP sessions are operational, routing becomes bidirectional. Routes permitted outbound from the SD-WAN VMs toward AWS Cloud WAN propagate throughout the SD-WAN segment. The routes that are learned within the segment propagate back to the VMs, which provides complete route visibility and control.

Workload network integration

Production and non-production workload VPCs associate with their own dedicated AWS Cloud WAN segments, thus maintaining environment isolation. Route propagation uses the AWS Cloud WAN segment sharing capabilities, where both the production and non-production segments share routing information with the SD-WAN segment and vice versa. This segment sharing model enables SD-WAN VMs to learn about workload networks without direct attachment to production and non-production segments, maintaining security boundaries while preserving connectivity. Workload VPC networks are dynamically propagated to SD-WAN VMs’ routing tables, thus eliminating manual route management.
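The segment sharing model can be illustrated with a fragment of a core network policy, expressed here as a Python dictionary, plus a small helper that derives which segments exchange routes. This is a sketch under assumptions: the segment names and ASN range are hypothetical, the policy is trimmed to the relevant keys, and the helper `shared_peers` is ours and handles only the list form of `share-with`.

```python
# Trimmed core network policy fragment. Segment names and the ASN range are
# hypothetical; us-west-2 is the Region named in this post.
POLICY = {
    "version": "2021.12",
    "core-network-configuration": {
        "asn-ranges": ["64512-64555"],
        "edge-locations": [{"location": "us-west-2"}],
    },
    "segments": [
        {"name": "sdwan"},
        {"name": "production"},
        {"name": "nonproduction"},
    ],
    "segment-actions": [
        {
            "action": "share",
            "mode": "attachment-route",
            "segment": "sdwan",
            "share-with": ["production", "nonproduction"],
        }
    ],
}


def shared_peers(policy):
    """Map each segment to the set of segments it exchanges routes with."""
    peers = {segment["name"]: set() for segment in policy["segments"]}
    for action in policy["segment-actions"]:
        if action.get("action") != "share":
            continue
        source = action["segment"]
        for target in action["share-with"]:
            # Sharing is two-way between the source and each listed segment,
            # but the listed segments do not share with each other.
            peers[source].add(target)
            peers[target].add(source)
    return peers


if __name__ == "__main__":
    for name, others in shared_peers(POLICY).items():
        print(name, sorted(others))
```

In this sketch the SD-WAN segment sees both workload segments, while production and non-production never see each other, which is the isolation property the paragraph describes.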

Unique connectivity model: dual-standalone or traditional HA

An important requirement for Best Buy was sub-second BGP convergence. The nature of the Best Buy Health business is such that important health-related voice traffic cannot be interrupted during planned maintenance or unplanned outages of an SD-WAN headend appliance. In traditional active-standby configurations, where one VM handles all routing functionality while the second remains in standby, a failover typically involves a lengthy BGP convergence time that interrupts traffic. These configurations depend on the default BGP timers (keepalive and hold time) within AWS Cloud WAN, and the BGP convergence times cannot be shortened through Bidirectional Forwarding Detection (BFD) because it is not supported today. The dual-standalone model circumvents these timing dependencies by maintaining continuously active sessions from both VMs. Therefore, route convergence becomes less critical because alternative paths remain constantly available.
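To make the timer dependency concrete, the following sketch computes how long a BGP speaker can take to declare a silent neighbor down when only keepalive and hold timers are available (no BFD). The logic follows standard BGP behavior: the hold timer restarts on each received keepalive. The timer values in the usage example are illustrative assumptions, not documented AWS Cloud WAN defaults.

```python
def detection_window(keepalive_s, hold_s):
    """Return (best_case_s, worst_case_s) to detect a silently failed BGP peer.

    Worst case: the peer fails right after its keepalive was received, so the
    full hold time must expire. Best case: the peer fails just before it would
    have sent the next keepalive, so one keepalive interval has already elapsed.
    """
    if keepalive_s <= 0 or hold_s < keepalive_s:
        raise ValueError("expected 0 < keepalive_s <= hold_s")
    return (hold_s - keepalive_s, hold_s)


if __name__ == "__main__":
    # Illustrative timers only: 10-second keepalive, 30-second hold time.
    best, worst = detection_window(10, 30)
    print(f"failure detected after {best} to {worst} seconds")
```

Either bound is far above a sub-second target, which is why keeping both headends active, rather than waiting for a standby to take over after detection, matters for live voice traffic.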

Real-world performance outcomes

Although this dual-standalone architecture provides significant advantages, real-world testing reveals important limitations that organizations must understand. During catastrophic failure scenarios where traffic fails over from AZ 1 to AZ 2 (such as public interface failure or VM unresponsiveness), AWS Cloud WAN still needs approximately 30 seconds to fully redirect return traffic from AWS toward Best Buy Health branches to the AZ 2 FortiGate VM. On the performance side, moving away from the previous configuration, where data centers were part of the routing path, gave Best Buy a 66% decrease in round-trip latency, which directly translates into Care Center employee and customer experience improvements.

The dual-standalone model offers seamless failover primarily for planned maintenance scenarios, but it faces challenges with unexpected catastrophic failures and with Equal-Cost Multi-Path (ECMP) routing implementation. Despite these limitations, the architecture provides significant benefits such as HA, fewer single points of failure, decreased latency, and automated route propagation. It scales efficiently, allowing for additional branches and workload VPCs without core infrastructure changes. However, organizations must carefully consider bandwidth requirements for AWS Cloud WAN Connect attachments, because they handle all branch-to-workload traffic. Although this solution is not perfect for every scenario, it offers a robust and scalable network architecture with zero-downtime maintenance capabilities, balancing performance and manageability for enterprise networks.

Other considerations

Like many architecture solutions, this one has tradeoffs, the most significant of which concerns SD-WAN session synchronization. Traditional SD-WAN HA provides session synchronization between the nodes, which makes sure that connection states and security policies remain consistent during failover events. Best Buy’s dual-standalone model cannot provide session synchronization because the VMs operate independently without shared state information. This limitation necessitates disabling stateful inspection features on the headend SD-WAN VMs to prevent connection disruptions when traffic paths change between VMs. Although this represents a security tradeoff at the headend level, branch SD-WAN devices maintain full security inspection capabilities, making sure that the security inspection perimeter remains intact.
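The reason stateful inspection has to be disabled on the headends can be shown with a toy model. In the sketch below, each headend keeps its own session table with no synchronization, so a return packet that arrives at the other headend matches no session. The `Headend` class and the flow tuple are hypothetical simplifications and do not represent FortiGate behavior in detail.

```python
class Headend:
    """Toy model of one standalone headend VM with its own session table."""

    def __init__(self, name, stateful):
        self.name = name
        self.stateful = stateful
        self.sessions = set()  # not shared with any other headend

    def forward_outbound(self, flow):
        """Forward the first packet of a flow and record it as a session."""
        self.sessions.add(flow)
        return True

    def forward_return(self, flow):
        """Forward a return packet; a stateful headend needs a matching session."""
        if not self.stateful:
            return True
        return flow in self.sessions


if __name__ == "__main__":
    flow = ("10.1.0.5", "10.20.0.8", 5060)  # hypothetical branch-to-workload flow

    for stateful in (True, False):
        az1 = Headend("az1", stateful)
        az2 = Headend("az2", stateful)
        az1.forward_outbound(flow)           # the request leaves through AZ 1
        delivered = az2.forward_return(flow)  # the reply comes back through AZ 2
        print(f"stateful={stateful}: return packet delivered={delivered}")
```

With stateful inspection on, the reply is dropped at the second headend; with it off, the reply is forwarded, which is the behavior the dual-standalone design relies on when paths shift between VMs.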

Conclusion

Best Buy Health’s dual-standalone SD-WAN integration solution represents an innovative departure from traditional HA models, prioritizing performance and convergence speed over certain security features. Combining FortiGate SD-WAN capabilities with the advanced routing features of AWS Cloud WAN allowed Best Buy Health to achieve enterprise-grade networking that scales with cloud adoption. For Best Buy Health’s customers, the 66% latency reduction from the former architecture means an improved experience as Care Center employees work with applications hosted in AWS. This solution particularly benefits organizations that have stringent convergence requirements, such as those supporting voice communications, while maintaining security boundaries through distributed inspection at branch locations. Best Buy Health’s unique design helps them continue to delight customers through their healthcare customer Care Centers.

To learn more about implementing AWS Cloud WAN for your organization, visit the AWS Cloud WAN documentation.

About the authors

Mandar Sawant is a Senior Solutions Architect with broad experience in AWS Core Networking Services, helping customers design Well-Architected solutions. He is passionate about networking technologies and loves to innovate and help solve complex customer problems. Mandar holds a master’s degree in Computer Networking from University of Missouri, Kansas City. Mandar is based out of Seattle and enjoys outdoor photography.

Moulee Natarajan is a Staff Network Engineer at Best Buy, focusing on Data Center and Cloud Networking Core Services. He is passionate about networking technologies and enjoys designing, building, and supporting reliable network solutions. His interests include traditional Data Centers, SDN, Security, SD-WAN, and private and public cross-cloud connectivity.

Jason Schamp is a Principal Solutions Architect based out of Cleveland, Ohio. Jason is focused on guiding enterprise Retail/CPG customers through their cloud journeys, accelerating migrations, modernizing workloads, and adopting new ways of working. Jason has a specialty in Security and Compliance and is passionate about container security, cloud operations, automation, and self-service.