AWS HPC Blog

Category: Compute

AWS Batch Dos and Don’ts: Best Practices in a Nutshell

AWS Batch is a service that enables scientists and engineers to run computational workloads at virtually any scale without requiring them to manage a complex architecture. In this blog post, we share a set of best practices and practical guidance derived from our experience working with customers on running and optimizing their computational workloads. Readers will learn how to optimize their costs with Amazon EC2 Spot on AWS Batch, how to troubleshoot their architecture should an issue arise, and how to tune their architecture and container layout to run at scale.
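As a flavor of the Spot guidance the post covers, here is a minimal sketch (not code from the post) that uses boto3 to create a managed AWS Batch compute environment backed by Spot capacity, with a capacity-optimized allocation strategy and a diversified set of instance types. The names, subnets, security groups, and role ARNs are hypothetical placeholders.

```python
import boto3

batch = boto3.client("batch")

# Minimal sketch: a managed compute environment backed by EC2 Spot.
# All names, subnets, security groups, and role ARNs below are
# hypothetical placeholders -- substitute your own resources.
batch.create_compute_environment(
    computeEnvironmentName="spot-ce-example",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "SPOT",
        # Prefer Spot pools with spare capacity to reduce interruptions.
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,
        "maxvCpus": 1024,
        # Diversify instance types so Batch can draw from many Spot pools.
        "instanceTypes": ["c5.4xlarge", "c5a.4xlarge", "m5.4xlarge"],
        "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "arn:aws:iam::111122223333:instance-profile/ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::111122223333:role/AWSBatchServiceRole",
)
```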

Running the Harmonie numerical weather prediction model on AWS

The Danish Meteorological Institute (DMI) is responsible for running atmospheric, climate, and ocean models covering the Kingdom of Denmark. We worked together with the DMI to port a full numerical weather prediction (NWP) cycling dataflow based on the Harmonie model to AWS and run it there. You can find a report of the porting and operational experience in the ACCORD community newsletter. In this blog post, we expand on that report to present initial timing results from running the forecast component of the Harmonie model on AWS. We also compare these as-is timings with as-is timings attained on supercomputing systems based on the Cray XC40 and the Intel Xeon-based Cray XC50.

Cost-optimization on Spot Instances using checkpoint for Ansys LS-DYNA

A major portion of the costs incurred for running Finite Element Analysis (FEA) workloads on AWS comes from the usage of Amazon EC2 instances. Amazon EC2 Spot Instances offer a cost-effective architectural choice, allowing you to take advantage of unused EC2 capacity for up to a 90% discount compared to On-Demand Instance prices. In this post, we describe how you can run fault-tolerant FEA workloads on Spot Instances using Ansys LS-DYNA’s checkpointing and auto-restart utility.
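To give a sense of the fault-tolerant pattern, the sketch below polls the EC2 instance metadata service for a Spot interruption notice and, when one appears, calls a hypothetical request_checkpoint() hook that would ask the solver to write a restart file. The actual LS-DYNA checkpoint and auto-restart mechanics are described in the post; only the interruption-notice endpoint here is standard EC2 behavior.

```python
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def imds_token() -> str:
    # IMDSv2: fetch a session token before querying instance metadata.
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    return urllib.request.urlopen(req, timeout=2).read().decode()

def interruption_pending() -> bool:
    # This endpoint returns 404 until a two-minute Spot interruption
    # notice has been issued for the instance.
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()},
    )
    try:
        urllib.request.urlopen(req, timeout=2)
        return True
    except urllib.error.HTTPError:
        return False

def request_checkpoint() -> None:
    # Hypothetical hook: signal the running solver to write a restart
    # file (for LS-DYNA, via its checkpointing/auto-restart utility).
    pass

if __name__ == "__main__":
    while True:
        if interruption_pending():
            request_checkpoint()
            break
        time.sleep(5)
```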

Quantum Chemistry Calculation with FHI-aims code on AWS

This article was contributed by Dr. Fabio Baruffa, Sr. HPC and QC Solutions Architect at AWS, and Dr. Jesús Pérez Ríos, Group Leader at the Fritz Haber Institute, Max-Planck Society. Quantum chemistry – the study of the inherently quantum interactions between atoms forming part of molecules – is a cornerstone of modern chemistry. […]

New: Introducing AWS ParallelCluster 3

Running HPC workloads, like computational fluid dynamics (CFD), molecular dynamics, or weather forecasting, typically involves a lot of moving parts. You need hundreds or thousands of compute cores, a job scheduler for keeping them fed, a shared file system that’s tuned for throughput or IOPS (or both), loads of libraries, a fast network, and […]

Supporting climate model simulations to accelerate climate science

Through the Amazon Sustainability Data Initiative (ASDI), AWS is donating cloud resources, technical support, and access to scalable infrastructure and fast networking, providing high performance computing solutions to support simulations of near-term climate using the National Center for Atmospheric Research (NCAR) Community Earth System Model Version 2 (CESM2) and its Whole Atmosphere Community Climate Model (WACCM). In collaboration with ASDI, AWS, and SilverLining, a nonprofit dedicated to ensuring a safe climate, NCAR will run an ensemble of 30 climate-model simulations on AWS. The runs will simulate the Earth system over the years 2022-2070 under a median warming scenario, and the results will be made available through the AWS Open Data Program. The simulation work will demonstrate the ability of cloud infrastructure to advance climate modeling in support of robust scientific studies by researchers around the world, and aims to accelerate and democratize climate science.

High Burst CPU Compute for Monte Carlo Simulations on AWS

Playtech mathematicians and game designers need accurate, detailed game play simulation results to create fun experiences for players. While software developers have been able to iterate on code in an agile manner for many years, for non-analytical solutions mathematicians have had to rely on slow, CPU-bound Monte Carlo simulations, waiting, as software engineers once did, many hours or overnight for the results of their latest changes. These statistics are also required as evidence of game fairness in the highly regulated online gaming business. Playtech has developed a serverless solution based on AWS Lambda that provides massive burst compute performance, allowing game simulations to run in minutes rather than hours. This post goes into the details of the architecture, as well as some examples of using the system in our development and operations.
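To illustrate the fan-out pattern in general terms (not Playtech's actual code), here is a hedged sketch: a Lambda handler that runs a batch of simulated game rounds against a toy payout model, and a driver that invokes many copies of the function in parallel for burst compute. The function name, payload shape, and payout model are all assumptions.

```python
import json
import random
from concurrent.futures import ThreadPoolExecutor

import boto3

# --- Lambda handler (deployed as a hypothetical function "game-sim") ---
def handler(event, context):
    rounds = int(event.get("rounds", 100_000))
    rng = random.Random(event.get("seed"))
    payout = 0.0
    for _ in range(rounds):
        # Toy payout model standing in for the real game mathematics.
        payout += 10.0 if rng.random() < 0.095 else 0.0
    return {"rounds": rounds, "total_payout": payout}

# --- Driver: fan out many parallel invocations for burst compute ---
def run_burst(num_workers: int = 200, rounds_per_worker: int = 100_000) -> float:
    lam = boto3.client("lambda")

    def invoke(i: int) -> dict:
        resp = lam.invoke(
            FunctionName="game-sim",  # hypothetical function name
            Payload=json.dumps({"rounds": rounds_per_worker, "seed": i}),
        )
        return json.loads(resp["Payload"].read())

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(invoke, range(num_workers)))

    total_rounds = sum(r["rounds"] for r in results)
    total_payout = sum(r["total_payout"] for r in results)
    return total_payout / total_rounds  # average payout per round
```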

Bare metal performance with the AWS Nitro System

High Performance Computing (HPC) is known as a domain where applications are well-optimized to get the highest performance possible on a platform. Unsurprisingly, a common question when moving a workload to AWS is how its performance compares with an existing on-premises “bare metal” platform. This blog post shows that the performance differential between “bare metal” instances and instances that use the AWS Nitro hypervisor is negligible for the evaluated HPC workloads.

Scalable and Cost-Effective Batch Processing for ML workloads with AWS Batch and Amazon FSx

Batch processing is a common need across varied machine learning use cases such as video production, financial modeling, drug discovery, or genomic research. The elasticity of the cloud provides efficient ways to scale and simplify batch processing workloads while cutting costs. In this post, you’ll learn a scalable and cost-effective approach to configuring AWS Batch array jobs to process datasets that are stored on Amazon S3 and presented to compute instances with Amazon FSx for Lustre.
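A hedged sketch of the array-job pattern: a driver submits a single AWS Batch array job, and each child job reads the AWS_BATCH_JOB_ARRAY_INDEX environment variable to select its own shard from the FSx for Lustre mount that fronts the S3 dataset. The queue, job definition, mount path, and file naming below are placeholders, not values from the post.

```python
import os

import boto3

def submit_array_job(num_shards: int = 1000) -> str:
    """Submit one array job whose child jobs each process one shard."""
    batch = boto3.client("batch")
    resp = batch.submit_job(
        jobName="dataset-processing",       # placeholder name
        jobQueue="ml-processing-queue",     # placeholder queue
        jobDefinition="shard-processor:1",  # placeholder job definition
        arrayProperties={"size": num_shards},
    )
    return resp["jobId"]

def shard_path_for_this_child(mount_point: str = "/fsx/input") -> str:
    """Inside the container: map the array index to a file on the FSx mount."""
    index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    return os.path.join(mount_point, f"shard-{index:05d}.csv")
```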

Building highly-available HPC infrastructure on AWS

In this blog post, we will explain how to launch highly available HPC clusters across an AWS Region. The solution is deployed using the AWS Cloud Development Kit (AWS CDK), a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation, hiding the complexity of integration between the components.
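To give a flavor of defining that infrastructure in code (a minimal sketch, not the stacks from the post), the CDK Python snippet below declares a VPC spanning three Availability Zones, a typical starting point for placing redundant cluster components across a Region. The stack and construct names are placeholders.

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from constructs import Construct

class HpcNetworkStack(Stack):
    """Minimal sketch: network layer for a highly available HPC deployment."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Spread subnets across up to three Availability Zones so cluster
        # components can fail over between zones within the Region.
        self.vpc = ec2.Vpc(self, "HpcVpc", max_azs=3)

app = App()
HpcNetworkStack(app, "HpcNetworkStack")
app.synth()
```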