AWS Storage Blog

How a customer reduced storage TCO by 28% with Amazon FSx for NetApp ONTAP

Organizations with multiple branch offices face significant challenges in managing distributed file systems, particularly when dealing with traditional on-premises infrastructure. The complexity of maintaining seamless file sharing across geographically dispersed locations while ensuring robust security, efficient data management, and reliable authentication has become increasingly challenging in today’s digital landscape. Amazon FSx for NetApp ONTAP addresses these challenges by providing a fully managed, cloud-native solution that delivers high-performance file storage with built-in replication, automated data synchronization, and intelligent caching capabilities.

Branch office operations today face multiple challenges in managing distributed file systems, primarily centered around data consistency, security, scalability, and network performance. Security concerns are paramount as organizations must protect sensitive data across all locations while maintaining regulatory compliance through encryption and unified access controls. Scalability issues emerge as businesses expand, requiring flexible infrastructure that can accommodate growing storage needs and new branch locations without excessive hardware investments. Network bandwidth limitations significantly impact performance, as branches often experience varying connectivity levels, affecting file access speeds and real-time synchronization capabilities. These challenges are compounded by the need to maintain data consistency across locations, prevent version conflicts, and ensure effective backup and disaster recovery processes while managing bandwidth costs and maintaining optimal user experience across all branch offices.

Solution

This post walks through deploying an Amazon FSx for NetApp ONTAP Multi-AZ file system across two Availability Zones (AZs) for high availability, in line with the AWS Well-Architected Framework (as shown in Figure 1). First, we establish the network infrastructure: configuring VPCs and subnets across the AZs, implementing the appropriate route tables and security groups for FSx for ONTAP access, setting up SMB endpoints, and integrating with Active Directory for authentication. Next, we configure the on-premises side by deploying ONTAP FlexCache volumes in the VMware vSphere environment. This involves setting up a high availability (HA) pair with an interconnect network, configuring the data, management, and cluster networks, implementing a Mediator VM for automatic failover, and allocating datastores with anti-affinity rules in VMware. Finally, we run the data migration phase by transferring an initial test set of user data (several GBs) to validate access permissions and performance, confirming that the solution meets the required operational standards before proceeding with full-scale migration.
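As a rough illustration of the first step, the sketch below assembles the request parameters for the Amazon FSx `CreateFileSystem` API for a Multi-AZ FSx for ONTAP file system. The helper name `build_fsx_ontap_request`, the default values, and all resource IDs are hypothetical placeholders; the post does not show the exact parameters the customer used.

```python
def build_fsx_ontap_request(subnet_ids, preferred_subnet_id,
                            security_group_ids, route_table_ids,
                            ssd_capacity_gib=1024, throughput_mbps=256):
    """Build keyword arguments for the FSx CreateFileSystem call.

    Hypothetical helper: it only assembles and sanity-checks a dict,
    it does not call AWS.
    """
    if len(subnet_ids) != 2:
        raise ValueError("Multi-AZ expects two subnets, one per Availability Zone")
    if preferred_subnet_id not in subnet_ids:
        raise ValueError("preferred subnet must be one of the two subnets")
    if ssd_capacity_gib < 1024:
        raise ValueError("SSD capacity must be at least 1,024 GiB")
    return {
        "FileSystemType": "ONTAP",
        "StorageCapacity": ssd_capacity_gib,          # SSD tier size in GiB
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "OntapConfiguration": {
            "DeploymentType": "MULTI_AZ_1",
            "ThroughputCapacity": throughput_mbps,    # MBps
            "PreferredSubnetId": preferred_subnet_id,
            "RouteTableIds": list(route_table_ids),
        },
    }


# Placeholder IDs for illustration only.
request = build_fsx_ontap_request(
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    preferred_subnet_id="subnet-aaaa1111",
    security_group_ids=["sg-cccc3333"],
    route_table_ids=["rtb-dddd4444"],
)
print(request["OntapConfiguration"]["DeploymentType"])
```

With AWS credentials configured, a dict like this could be passed to `boto3.client("fsx").create_file_system(**request)`. Active Directory integration and SMB shares are configured separately and are not covered by this sketch.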

Figure 1: Distributed file system with Amazon FSx for NetApp ONTAP architecture

The solution architecture centers on Amazon FSx for NetApp ONTAP deployed in a Multi-AZ configuration, providing high availability and resilient performance across AWS Availability Zones. Branch offices connect securely to FSx for ONTAP through established VPN connections, enabling seamless integration with cloud storage resources. The implementation uses NetApp FlexCache technology to serve frequently accessed data from a local cache, which reduces egress costs and improves performance. Data migration is streamlined through a unified interface, using NetApp SnapMirror technology for efficient and consistent data transfer from on-premises environments to AWS.

Figure 2 shows performance testing that compares ONTAP Select caching against local file server access at 300 Mbps of bandwidth. For a 50 MB file transfer, the ONTAP Select cache backed by FSx for NetApp ONTAP took 26.09 seconds, while the local on-premises file server completed the same transfer in 24.20 seconds, a difference of 1.89 seconds (about 8%). This small gap indicates that ONTAP Select caching provides near-local access speeds while offering the benefits of cloud storage, making it an effective solution for branch office file access. Users experience performance similar to local file servers, while organizations gain the advantages of centralized cloud storage management, improved data protection, and simplified infrastructure maintenance through Amazon FSx for NetApp ONTAP.

Figure 2: Performance comparison
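The comparison above can be reproduced with a few lines of arithmetic. The sketch below takes the 50 MB file size and the two transfer times reported in this post and derives the relative overhead of the cache path and the effective throughput of each transfer. The function names are illustrative, and 1 MB is assumed to be 10^6 bytes.

```python
def effective_throughput_mbps(file_size_mb, seconds):
    """Effective throughput in megabits per second (assumes 1 MB = 10^6 bytes)."""
    return file_size_mb * 8 / seconds


def relative_overhead(cache_seconds, local_seconds):
    """Extra time taken by the cache path, as a fraction of the local time."""
    return (cache_seconds - local_seconds) / local_seconds


FILE_SIZE_MB = 50
CACHE_SECONDS = 26.09   # ONTAP Select cache backed by FSx for NetApp ONTAP
LOCAL_SECONDS = 24.20   # local on-premises file server

overhead = relative_overhead(CACHE_SECONDS, LOCAL_SECONDS)
print(f"Absolute difference: {CACHE_SECONDS - LOCAL_SECONDS:.2f} s")
print(f"Relative overhead:   {overhead:.1%}")
print(f"Cache throughput:    {effective_throughput_mbps(FILE_SIZE_MB, CACHE_SECONDS):.2f} Mbps")
print(f"Local throughput:    {effective_throughput_mbps(FILE_SIZE_MB, LOCAL_SECONDS):.2f} Mbps")
```

Both transfers run well below the 300 Mbps link rate, which suggests that in this test the elapsed time was dominated by factors other than raw bandwidth.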

Latency considerations

The performance analysis (Figure 3) demonstrates a crucial inflection point in FlexCache’s write-back mode efficiency, particularly relevant for branch office deployments using FSx for NetApp ONTAP. Based on NetApp’s testing, the optimal benefits of write-back caching begin to materialize when the latency between cache and origin exceeds 8ms, though this threshold can vary depending on specific workload characteristics. The testing, conducted using NFSv3 with 256KB read/write sizes and 64ms WAN latency, reveals that FlexCache’s effectiveness is directly correlated to both file size and network latency conditions. This insight is particularly valuable for organizations planning branch office deployments, suggesting that FlexCache implementation decisions should be based on careful consideration of typical file sizes, network conditions, and workload patterns. For branch offices experiencing higher latency to the FSx for NetApp ONTAP origin, write-back caching can significantly improve performance, while locations with low latency might benefit more from direct access. This understanding helps organizations optimize their caching strategy and achieve the best balance between performance and resource utilization in their distributed file system architecture.

Figure 3: Elapsed time vs size for different targets
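One way to apply the 8 ms guidance is to measure round-trip latency from each branch to the origin and compare the median against the threshold. The sketch below is a hypothetical first-pass filter, not NetApp tooling: the function name, branch names, and sample values are invented for illustration, and the threshold should be adjusted to the workload as noted above.

```python
from statistics import median

# Threshold taken from the guidance above; it varies by workload.
WRITE_BACK_LATENCY_THRESHOLD_MS = 8.0


def write_back_likely_beneficial(latency_samples_ms,
                                 threshold_ms=WRITE_BACK_LATENCY_THRESHOLD_MS):
    """Return True if the median cache-to-origin latency exceeds the threshold."""
    if not latency_samples_ms:
        raise ValueError("at least one latency sample is required")
    if any(sample < 0 for sample in latency_samples_ms):
        raise ValueError("latency samples must be non-negative")
    return median(latency_samples_ms) > threshold_ms


# Made-up measurements for two hypothetical branch offices.
branches = {
    "branch-near": [3.1, 2.9, 3.4, 3.0],
    "branch-far": [61.0, 64.2, 66.5, 63.8],
}
for name, samples in branches.items():
    verdict = "consider write-back" if write_back_likely_beneficial(samples) else "evaluate direct access"
    print(f"{name}: median {median(samples):.1f} ms -> {verdict}")
```

The median is used rather than the mean so that a single slow probe does not change the outcome. Typical file sizes and workload patterns still need to be weighed separately.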

NetApp ONTAP FlexCache write-back prerequisites

  • CPU & Memory – It is strongly recommended that each origin cluster node have at least 128 GB of RAM and 20 CPUs to absorb the write-back messages initiated by write-back enabled caches. This is the equivalent of an A400 or greater. If the origin cluster serves as the origin to multiple write-back enabled FlexCaches, it will require more CPU and RAM. For a scale-up architecture, Amazon FSx for NetApp ONTAP requires minimum baseline specifications: a Standard deployment type with at least 1,024 GiB (1 TiB) of SSD storage capacity, supporting up to 2 GB/s of throughput with automatic storage tiering to the capacity pool.
  • ONTAP Version – The origin must be running ONTAP 9.15.1 or later. Any caching cluster that needs to operate in write-back mode must be running ONTAP 9.15.1 or later. Any caching cluster that does not need to operate in write-back mode can run any generally supported ONTAP version.
  • Licensing – FlexCache, including the write-back mode of operation, is included with your ONTAP purchase (billed at actuals). No extra license is required.
  • Peering – The origin and cache clusters must be cluster peered. The storage virtual machines (SVMs) on the origin and cache clusters must be vserver peered with the FlexCache option.
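The prerequisites above can be turned into a simple pre-flight checklist. The following sketch is a hypothetical helper, not part of any NetApp or AWS tool: it takes values an administrator has already collected and reports which prerequisites are not met. It assumes ONTAP version strings begin with three dot-separated numbers (for example `9.15.1` or `9.15.1P3`).

```python
import re
from dataclasses import dataclass

MIN_WRITE_BACK_VERSION = (9, 15, 1)
MIN_ORIGIN_RAM_GB = 128
MIN_ORIGIN_CPUS = 20


@dataclass
class OriginNode:
    name: str
    ram_gb: int
    cpus: int


def parse_ontap_version(version):
    """Extract (major, minor, patch) from a string such as '9.15.1P3'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized ONTAP version: {version!r}")
    return tuple(int(part) for part in match.groups())


def write_back_blockers(origin_nodes, origin_version, cache_version,
                        cluster_peered, svm_peered_with_flexcache):
    """Return a list of unmet write-back prerequisites (empty if none)."""
    issues = []
    for node in origin_nodes:
        if node.ram_gb < MIN_ORIGIN_RAM_GB or node.cpus < MIN_ORIGIN_CPUS:
            issues.append(f"{node.name}: below recommended 128 GB RAM / 20 CPUs")
    if parse_ontap_version(origin_version) < MIN_WRITE_BACK_VERSION:
        issues.append("origin must run ONTAP 9.15.1 or later")
    if parse_ontap_version(cache_version) < MIN_WRITE_BACK_VERSION:
        issues.append("write-back cache must run ONTAP 9.15.1 or later")
    if not cluster_peered:
        issues.append("origin and cache clusters are not cluster peered")
    if not svm_peered_with_flexcache:
        issues.append("SVMs are not vserver peered with the FlexCache option")
    return issues


# Illustrative values only.
nodes = [OriginNode("origin-01", ram_gb=128, cpus=20)]
print(write_back_blockers(nodes, "9.15.1", "9.14.1", True, True))
```

Versions are compared as integer tuples rather than strings so that, for example, 9.9.1 sorts before 9.15.1.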

This solution offers improved scalability, enhanced security features, and better performance optimization capabilities. Amazon FSx for NetApp ONTAP serves as a comprehensive solution for distributed file systems, offering a unified global namespace that provides seamless data access across all branch locations while implementing intelligent caching through FlexCache technology to optimize performance and reduce bandwidth consumption. The service incorporates automated storage tiering between high-performance SSD and cost-effective capacity pool storage, maximizing storage efficiency and cost optimization. Security is enhanced through end-to-end encryption, integration with AWS Key Management Service (AWS KMS), and granular access controls via Active Directory integration, while built-in data protection features include automatic backups, cross-Region replication, and disaster recovery capabilities through NetApp SnapMirror technology. The solution enables policy-based data management for improved governance and compliance, allowing organizations to implement consistent data policies across their distributed environment while meeting regulatory requirements through features like NetApp FPolicy and audit logging.

NetApp Harvest

NetApp Harvest serves as a comprehensive monitoring solution for Amazon FSx for NetApp ONTAP, extending beyond the basic metrics available in Amazon CloudWatch to provide detailed performance analytics and system insights. When integrated with Grafana for visualization and NetApp Cloud Insights for advanced analytics, Harvest collects and processes granular ONTAP performance metrics that are typically only accessible through the ONTAP CLI, including detailed storage efficiency ratios, protocol-specific metrics, and in-depth volume performance statistics. The tool enables administrators to monitor critical performance indicators such as IOPS, latency, and throughput at various levels (SVM, volume, LUN), cache hit ratios, and storage efficiency savings, while also providing real-time monitoring of system health, capacity utilization, and performance bottlenecks. This enhanced visibility helps organizations optimize their FSx for ONTAP deployments, make data-driven capacity planning decisions, and proactively address potential performance issues before they impact business operations. To learn more, see Monitoring FSx for ONTAP file systems with Harvest and Grafana and NetApp Harvest on GitHub.

Estimating the TCO of FSx for ONTAP in comparison to an on-premises storage system in a hybrid use case

Are you looking to optimize both performance and cost for your storage solution? The configuration implemented here is FSx for ONTAP (FSxN) in a Multi-AZ deployment, with storage split 30-70 between the SSD tier and the capacity pool tier and 256 MBps of throughput capacity. To get a clear picture of the costs, enter a 100 TB scenario with these specifications into the AWS Pricing Calculator. It is a good starting point for understanding the financial implications of this setup. This configuration helps balance storage cost against the performance levels your applications demand.
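To make the 30-70 split concrete, the sketch below breaks the 100 TB scenario into SSD and capacity pool portions and combines them with unit prices supplied by the caller. The function names are illustrative, 1 TB is assumed to be 1,000 GB, and the prices in the example are placeholders rather than AWS prices. Use the AWS Pricing Calculator for real figures, which also include items not modeled here.

```python
def split_capacity_tb(total_tb, ssd_percent=30):
    """Split total capacity into (ssd_tb, capacity_pool_tb)."""
    if not 0 <= ssd_percent <= 100:
        raise ValueError("ssd_percent must be between 0 and 100")
    ssd_tb = total_tb * ssd_percent / 100
    return ssd_tb, total_tb - ssd_tb


def monthly_estimate(total_tb, ssd_price_per_gb, pool_price_per_gb,
                     throughput_mbps, price_per_mbps, ssd_percent=30):
    """Rough monthly cost from caller-supplied unit prices (1 TB = 1,000 GB)."""
    ssd_tb, pool_tb = split_capacity_tb(total_tb, ssd_percent)
    storage_cost = (ssd_tb * 1000 * ssd_price_per_gb
                    + pool_tb * 1000 * pool_price_per_gb)
    throughput_cost = throughput_mbps * price_per_mbps
    return storage_cost + throughput_cost


ssd_tb, pool_tb = split_capacity_tb(100)
print(f"SSD tier: {ssd_tb:.0f} TB, capacity pool: {pool_tb:.0f} TB")

# Placeholder unit prices for illustration only; these are NOT AWS prices.
estimate = monthly_estimate(
    total_tb=100,
    ssd_price_per_gb=1.0,
    pool_price_per_gb=0.5,
    throughput_mbps=256,
    price_per_mbps=2.0,
)
print(f"Estimate with placeholder prices: {estimate:,.2f}")
```

Keeping the SSD percentage as a parameter makes it easy to see how the estimate shifts if more or less of the data set turns out to be active.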

Conclusion

Amazon FSx for NetApp ONTAP is an effective solution for organizations struggling with distributed file system management across branch offices. This solution demonstrates its effectiveness through three key aspects: 1/ The Multi-AZ deployment with FlexCache technology delivers near-local access speeds, as evidenced by performance testing that showed a difference of less than two seconds (26.09 versus 24.20 seconds) compared to local file servers for 50 MB transfers. 2/ Write-back caching is particularly effective when latency between cache and origin exceeds 8 ms, making it well suited to geographically dispersed branch offices. 3/ Comprehensive monitoring through NetApp Harvest, combined with cost-effective storage tiering (a 30-70 split between SSD and capacity pool storage) and 256 MBps of throughput, provides a balance of performance and cost efficiency. This architecture not only addresses traditional branch office challenges around data consistency, security, and scalability but also offers a path for organizations transitioning to hybrid cloud environments while maintaining enterprise-grade performance and security standards.

Sachin Bawse

Sachin serves as a GTM Specialist Storage Solutions Architect at Amazon Web Services (AWS), where he specializes in optimizing storage solutions, facilitating migrations, and enhancing workload performance for customers. Sachin is an avid explorer who enjoys discovering new destinations, immersing himself in diverse cultures, and experiencing different cuisines.

Vishnu Vashist

Vishnu Vashist serves as a Partner Success Solutions Architect in the AWS Partner Organization, where he covers healthcare and life sciences (HCLS) accounts along with travel, transport, and logistics. He specializes in migration and modernization, supporting large-scale migrations to AWS and guiding customers and partners on solution architecture and infrastructure design on AWS.