AWS Storage Blog
SAN boot your Amazon EC2 enterprise environments from Amazon FSx for NetApp ONTAP
Traditionally, many enterprises and organizations with on-premises infrastructure have booted from a Storage Area Network (SAN) rather than from locally attached storage. Booting from SAN offers centralized management and backup of boot volumes, supports high availability through multipathing, and adds flexibility while lowering cost by letting systems boot from preconfigured OS images hosted on a shared storage array.
Amazon FSx for NetApp ONTAP brings these benefits to the cloud. As a fully managed Amazon Web Services (AWS) service, FSx for ONTAP delivers a virtualized enterprise-class storage array supporting features such as high-throughput I/O, deduplication, compression, compaction, replication, and block-level access through iSCSI and NVMe/TCP. Most importantly for SAN booting, it supports thin cloning: a single thinly provisioned LUN can serve as the base “golden image” for an operating system (OS), and read-write snapshot clones of this LUN can be rapidly provisioned and presented to hundreds of servers as individual boot volumes. Because each clone stores only the minimal differences that define a server’s identity, this approach dramatically reduces overall storage requirements. Furthermore, because FSx for ONTAP is aware of shared data regions, frequently accessed blocks can be cached once in memory and served to all clones, which effectively extends the apparent cache size and improves performance across the board. FSx for ONTAP also offers high availability and disaster recovery (HA/DR) capabilities through advanced replication, so boot volumes can be integrated into HA/DR workflows. This keeps the OS state consistent across environments without manual effort.
The primary challenge with SAN booting has historically been the requirement for specialized boot firmware or host bus adapters, which are commonly unavailable in cloud environments. But what if you could realize the benefits of SAN boot without the need for specialized hardware? In the following sections we demonstrate how.
Background on AWS boot devices
AWS instances typically boot from Amazon Elastic Block Store (Amazon EBS) volumes, which are tightly integrated with Amazon Elastic Compute Cloud (Amazon EC2). This integration enables fast and predictable boot times through features like EBS Fast Snapshot Restore and EBS Provisioned IOPS for volume initialization. Amazon EBS also offers enhanced security using customer-managed key (CMK) encryption, high resiliency with independent boot volumes, and time-based AMI copies for efficient and consistent AMI distribution across Regions. Designed for both general-purpose and high-performance workloads, Amazon EBS is the default boot device for Amazon EC2.
In this post, I demonstrate how you can boot from iSCSI LUNs hosted on FSx for ONTAP file systems—whether in Single-Availability Zone (AZ) or Multi-AZ configurations. These LUNs can be thinly provisioned, space-efficient, and replicable across AZs or AWS Regions.
Properly configured, SAN booting from FSx for ONTAP can help reduce storage costs at scale while streamlining HA/DR operations.
We explore the two primary use cases for SAN booting with FSx for ONTAP, walk through the technical boot process, demonstrate working Linux and Windows examples, and share best practices for implementation in production environments.
Boot volume cost reduction
On-premises, SAN booting is commonly used to reduce costs when deploying hundreds of servers with nearly identical boot volumes. The same principle applies in the cloud when using iSCSI boot with FSx for ONTAP. Using thin provisioning and snapshot-based cloning means that storage capacity requirements for 100 to 200 boot volumes can be reduced to not much more than that of a single boot volume. Each server only consumes space for its unique differences from the golden image, dramatically minimizing overall storage usage. Furthermore, when following the best practices outlined later in this post, you can avoid provisioning dedicated IOPS for boot volumes, thanks to the FSx for ONTAP performance pooling. The result is significant cost savings with minimal impact on performance.
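As a rough illustration of the capacity arithmetic described above, the following Python sketch compares full per-server copies of a boot image with one golden image plus thin clones. The image size (30 GiB), the per-clone delta (100 MiB), and the function names are hypothetical values chosen for illustration, not measurements, and the model ignores deduplication, compression, and metadata overhead.

```python
def full_copy_capacity_mib(image_mib: int, server_count: int) -> int:
    """Capacity consumed when every server stores its own full boot image."""
    return image_mib * server_count


def thin_clone_capacity_mib(image_mib: int, server_count: int, delta_mib: int) -> int:
    """Capacity consumed by one golden image plus one thin clone per server.

    Each clone is assumed to store only delta_mib of blocks that differ
    from the golden image (an illustrative simplification).
    """
    return image_mib + server_count * delta_mib


# Hypothetical fleet: 200 servers, 30 GiB golden image, 100 MiB unique data per clone.
IMAGE_MIB = 30 * 1024
SERVERS = 200
DELTA_MIB = 100

full = full_copy_capacity_mib(IMAGE_MIB, SERVERS)              # 6144000 MiB
thin = thin_clone_capacity_mib(IMAGE_MIB, SERVERS, DELTA_MIB)  # 50720 MiB

print(f"Full copies: {full / 1024:.1f} GiB")
print(f"Thin clones: {thin / 1024:.1f} GiB")
```

Under these assumed numbers, the whole fleet consumes less than twice the capacity of a single boot image. The actual ratio depends on how much each server diverges from the golden image over time.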
Simplified HA/DR and OS lifecycle management
OS updates and configuration changes are an ongoing necessity for enterprise workloads. SAN booting streamlines HA/DR by replicating boot volumes across AZs and remote AWS Regions. Because FSx for ONTAP supports both Multi-AZ and long-distance replication, any changes to the OS or boot volume are automatically synchronized and made highly available. This reduces the number of manual steps needed to recover from failures and lowers the risk of human error, which makes it easier to meet strict Recovery Time Objectives (RTOs). Furthermore, updates can be staged on a clone of the golden image, thoroughly tested, and promoted to production only once validated, which streamlines the OS update process while minimizing disruption.
How to SAN boot from FSx for ONTAP volumes
To boot from FSx for ONTAP, we use a network-based chain-loader boot device, sometimes called a “jumpboot”, as shown in Figure 1. The EC2 instance first boots a very small, locked-down image from a 1 GB EBS volume. This image contains iPXE, open source network boot firmware that extends the Preboot eXecution Environment (PXE). Then, iPXE chain-boots to a volume on FSx for ONTAP that contains the actual Linux or Windows OS image. You can compile your own iPXE Amazon Machine Image (AMI), or use an AWS certified iPXE AMI that exists in every AWS Region, available as a community AMI. Chain-loading the OS allows continued use of Amazon EC2 console integration for launch, start, and stop operations, as well as features such as the serial console. But how does iPXE know which FSx for ONTAP file system and iSCSI volume to boot from? When we start the desired EC2 instance with the iPXE AMI, we pass that information in the user data script. iPXE reads the script and then chain-loads the OS located on the block volume that the script indicates. For example, a SAN booted EC2 instance running Linux is shown in Figure 2, and an EC2 instance running Windows is shown in Figure 3.
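To make the user data step more concrete, the following Python sketch renders a minimal iPXE script of the kind that could be passed as EC2 user data. It relies on the iPXE `sanboot` command and the iPXE iSCSI SAN URI format (`iscsi:<server>:<protocol>:<port>:<LUN>:<target IQN>`). The function name, IP address, and IQNs are hypothetical placeholders, and a production script would likely need additional steps, such as multipath targets or error handling, that are not shown here.

```python
def build_ipxe_user_data(target_ip: str, target_iqn: str,
                         initiator_iqn: str, lun: int = 0,
                         port: int = 3260) -> str:
    """Render an iPXE script that SAN boots from an iSCSI LUN.

    The protocol field of the SAN URI is left empty so that iPXE uses
    its default (TCP). iPXE expects the LUN in hexadecimal.
    """
    san_uri = f"iscsi:{target_ip}::{port}:{lun:x}:{target_iqn}"
    lines = [
        "#!ipxe",                              # marks the file as an iPXE script
        f"set initiator-iqn {initiator_iqn}",  # identity presented to the iSCSI target
        f"sanboot {san_uri}",                  # attach the LUN and boot from it
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values: replace with the iSCSI endpoint and IQNs of your own environment.
user_data = build_ipxe_user_data(
    target_ip="10.0.1.25",
    target_iqn="iqn.1992-08.com.netapp:sn.example:vs.3",
    initiator_iqn="iqn.2010-04.org.ipxe:example-host",
)
print(user_data)
```

The rendered text would be supplied as the instance user data when launching from the iPXE AMI. Keeping the script generation in a function makes it straightforward to produce one script per server, each pointing at its own cloned LUN.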

Figure 1: Chain-loading from iPXE to cloned, deduplicated, compressed copies of FSx for ONTAP Block LUNs

Figure 2: Chain-loading from iPXE into Linux
Figure 3: Chain-loading from iPXE into Windows
Practical considerations and best practices
Booting from SAN using FSx for ONTAP in AWS brings many of the same planning and operational considerations found in traditional on-premises SAN environments, along with some cloud-specific best practices.
One important factor to address is operating system licensing. Because boot volumes are often cloned, each instance must still meet its own licensing requirements, especially for commercial operating systems such as Microsoft Windows.
Storage placement is also crucial. Unless there’s another specific need, it’s best to place both the boot and data volumes for a given EC2 instance on the same FSx for ONTAP file system. This helps provide good data locality and consistent performance.
Another best practice is to avoid overloading a single FSx for ONTAP file system with too many boot volumes. In large-scale recovery scenarios, many servers booting at once (commonly referred to as a “boot storm”) can delay boot times. Fortunately, unlike traditional on-premises arrays, there’s typically no meaningful cost difference in AWS between placing a given amount of storage on one FSx for ONTAP file system and distributing it across several. This means you can scale out horizontally without incurring significant cost penalties, which limits the impact of a boot storm. Consider a moderately sized FSx for ONTAP file system that has 50 TB of SSD. By default, without any additional provisioned IOPS, it can reach up to 150,000 IOPS. If the file system supports SAN boot for 200 servers, then during the boot phase each server would average 750 IOPS (150,000 / 200), over six times the speed of an average HDD. Because applications and boot volumes are co-located on the same file system, and applications don’t start until the servers have booted, there should be no application-related I/O competing with boot I/O.
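The sizing arithmetic in this example can be sketched in a few lines of Python. The figures are the ones used in the paragraph above. The function names and the idea of a per-server IOPS floor are illustrative assumptions, and the model assumes that boot I/O is spread evenly across servers, which a real boot storm may not do.

```python
def boot_iops_per_server(file_system_iops: int, server_count: int) -> float:
    """Average IOPS available to each server if all of them boot at once."""
    return file_system_iops / server_count


def max_servers_for_floor(file_system_iops: int, min_iops_per_server: int) -> int:
    """Largest number of simultaneously booting servers that still leaves
    each one at least min_iops_per_server on average."""
    return file_system_iops // min_iops_per_server


FILE_SYSTEM_IOPS = 150_000  # default SSD IOPS for the 50 TB example
SERVERS = 200

print(boot_iops_per_server(FILE_SYSTEM_IOPS, SERVERS))  # 750.0
print(max_servers_for_floor(FILE_SYSTEM_IOPS, 750))     # 200
```

If the per-server average falls below what you consider acceptable for boot times, the second function gives a rough upper bound on how many boot volumes to place on one file system before distributing the rest across additional file systems.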
To prevent disruptions, make sure that multipathing is correctly configured and validated for all iSCSI-attached boot volumes. Reliable path failover is essential for both performance and resilience.
Finally, it’s critical to test your HA and failover configuration before moving into production. You can simulate a failover event by temporarily increasing the throughput capacity of your FSx for ONTAP file system. This triggers a non-disruptive controller failover, allowing you to verify multipath handling and OS resilience. Once the configuration is validated, you can scale the throughput back down as needed.
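One way to script this failover test is through the Amazon FSx `UpdateFileSystem` API, shown here using the AWS SDK for Python (Boto3). This is a minimal sketch under stated assumptions: the file system ID and the throughput values (in MBps) are hypothetical, the valid throughput values depend on your deployment type, and the helper function names are illustrative only. The request builder is kept separate from the API call so that the request can be inspected without AWS credentials.

```python
def build_throughput_update(file_system_id: str, throughput_mbps: int) -> dict:
    """Build the keyword arguments for an FSx UpdateFileSystem request
    that changes the throughput capacity of an FSx for ONTAP file system."""
    return {
        "FileSystemId": file_system_id,
        "OntapConfiguration": {"ThroughputCapacity": throughput_mbps},
    }


def apply_throughput_update(request: dict) -> dict:
    """Send the request to Amazon FSx. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the sketch loads without the AWS SDK installed

    return boto3.client("fsx").update_file_system(**request)


# Hypothetical file system ID and throughput values for illustration.
scale_up = build_throughput_update("fs-0123456789abcdef0", 256)
scale_down = build_throughput_update("fs-0123456789abcdef0", 128)
print(scale_up)

# Uncomment to run against a real (non-production) file system:
# apply_throughput_update(scale_up)    # triggers the failover to validate
# apply_throughput_update(scale_down)  # restore after the update completes and validation passes
```

While the update is in progress, monitor the SAN booted instances to confirm that iSCSI paths fail over and recover as expected before you scale back down.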
How to get started
There are quite a few steps to set up your OS to SAN boot in AWS from FSx for ONTAP. The best way to get started is to explore the page of the iPXE website dedicated to AWS, or to contact AWS directly if your organization wants to implement SAN boot. For large migrations to SAN boot from existing environments, whether on-premises or already in AWS, Cirrus Data, an AWS Partner, automates all of the steps, including iSCSI and multipathing configuration, provisioning, and LUN mapping, which is no small feat at scale. The following figure shows how Cirrus Data’s automated tool simplifies SAN boot support. If you are interested in learning more about Cirrus Data, I recently published a post about using it for seamless VMware migration to Amazon EC2, in which migrating to SAN boot is an option.
Figure 4: Migrate boot volumes to Amazon EC2 (SAN Boot from either Amazon EBS or FSx for ONTAP Block)
Is SAN boot from FSx for ONTAP right for you?
Booting from SAN using FSx for ONTAP block storage isn’t the right fit for most AWS environments. However, for organizations that have historically relied on SAN boot to streamline DR or orchestrate large-scale infrastructure with consistent OS images, this capability is now available in AWS. If you’re managing large fleets of EC2 instances that need HA, cross-AZ failover, or cross-Region replication, FSx for ONTAP allows you to significantly lower boot volume costs while streamlining failover and DR workflows. In short, you can use proven SAN boot strategies in the cloud, with AWS scale and resiliency.