AWS for M&E Blog
How Amazon S3 Glacier Instant Retrieval can simplify your content library supply chain
Media organizations around the world use Amazon Simple Storage Service (Amazon S3) to store extensive content libraries because it allows them to scale resources to meet fluctuating needs and optimize storage cost. At re:Invent 2021, AWS launched the Amazon S3 Glacier Instant Retrieval archive storage class that delivers the lowest cost storage for long-lived, rarely accessed data that requires milliseconds retrieval. This simplifies how content archives are used, especially in news, sports, and content creation spaces.
Challenges with optimizing your ever-expanding media archives
Every day, news and sports organizations restore hundreds of pieces of content (such as highlights from past sporting events, clips of every speech given by former government officials, and anniversary clips of historical importance) to develop stories that celebrate accomplishments and connect history to current-day events. When news breaks, such as a president’s intention to run for re-election, the passing of a significant figure, or an athlete’s intention to retire, news organizations race the clock to curate endearing and meaningful stories from hundreds of thousands of clips within minutes of the event or broadcast.
Decades of content reside in these archives, resulting in hundreds of petabytes of data. Amazon S3 has become the storage of choice because of its easy-to-use management features and cost-effective storage classes for organizing data and optimizing costs. For example, the S3 Glacier Deep Archive storage class delivers the lowest cost storage in the cloud, at prices significantly lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data off-site. Additionally, to keep costs low while still accommodating varying retrieval needs, the S3 Glacier Flexible Retrieval storage class (formerly S3 Glacier) provides three retrieval options, with access times ranging from a few minutes to several hours. But cost is not the only concern for content library supply chain functions: performance, availability, and data durability are equally important for news and sports organizations.
In the past, media organizations had to choose between two options: the S3 Standard-Infrequent Access (S3 Standard-IA) storage class when a piece of content required milliseconds retrieval, or the S3 Glacier Flexible Retrieval storage class, which trades retrieval times of minutes to hours for a lower storage cost. Producers and editors can wait to access some content, such as assets scheduled for distribution at a specific time or to a specific licensee (for example, episodic content or anniversary flashbacks). In this scenario, customers use the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes because immediate access is not required and they can further optimize cost. Other content requires faster access, particularly when being the first to report a news story is of paramount importance. In these scenarios, customers typically either select the S3 Standard-IA storage class, paying a higher storage cost for rapid access, or optimize their supply chain through the use of proxies. S3 Standard-IA is the ideal storage class if you plan on accessing your data about once every month or two.
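The two-way choice described above can be sketched as a small decision function. This is only an illustration: the function name and the five-minute cut-off are assumptions (the cut-off borrows the upper end of the 1-5 minute expedited retrieval window quoted in this post), while the storage class strings are the identifiers the Amazon S3 API uses.

```python
# Storage class identifiers as used by the Amazon S3 API.
STANDARD_IA = "STANDARD_IA"  # S3 Standard-Infrequent Access, milliseconds retrieval
GLACIER = "GLACIER"          # S3 Glacier Flexible Retrieval, minutes to hours

# Hypothetical cut-off: if a workflow cannot tolerate the upper end of the
# expedited retrieval window (about 5 minutes), archive storage is too slow.
FASTEST_ARCHIVE_RESTORE_SECONDS = 5 * 60


def choose_storage_class(max_wait_seconds: float) -> str:
    """Pick a storage class from how long an editor can wait for the asset."""
    if max_wait_seconds < FASTEST_ARCHIVE_RESTORE_SECONDS:
        # Breaking-news material: pay more for storage, get immediate access.
        return STANDARD_IA
    # Scheduled distribution, episodic content, anniversary flashbacks.
    return GLACIER


if __name__ == "__main__":
    print(choose_storage_class(1))          # breaking-news clip
    print(choose_storage_class(6 * 3600))   # asset scheduled for tomorrow
```

A real workflow would likely weigh access frequency and asset size as well; wait time alone is used here because it is the trade-off the paragraph above focuses on.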
Proxies are lower quality and therefore much smaller renditions of the original content that can support basic editing and screening needs. Customers commonly use AWS Elemental MediaConvert to create these proxies of the original mezzanine file to be stored in S3 Standard or S3 Standard-IA, with the higher quality version archived to S3 Glacier Flexible Retrieval. It looks something like this:
 
 
Figure 1: Media organizations often develop a serverless workflow to generate proxies as part of their content ingestion workflow
This workaround lets producers and editors curate video segments without delay, then retrieve only the full-resolution media files required for a particular news story by using expedited retrievals in S3 Glacier Flexible Retrieval, which can restore content in 1-5 minutes. Expedited retrievals are available at an additional cost based on the number of requests plus a fee for every gigabyte of data returned. Customers can also use provisioned capacity units with expedited retrievals when their workload requires highly reliable and predictable access to a subset of data in minutes. However, in situations like covering breaking news, news desk producers and editors may need more immediate access to their content libraries to curate clips and develop stories. In these use cases, speed and agility are essential as customers “look around corners” to find ways to optimize operations and cost.
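As a rough sketch of what an expedited retrieval involves, the helpers below build the restore request and estimate its cost from the two components named above (a per-request fee and a per-gigabyte fee). The helper names, bucket, and key are hypothetical, and no prices are hard-coded because they vary by Region; the `restore_object` call and the `Expedited` tier are part of the AWS SDK for Python (boto3).

```python
def build_restore_request(days: int = 1) -> dict:
    """Restore parameters for an expedited retrieval, kept for `days` days."""
    return {
        "Days": days,
        "GlacierJobParameters": {"Tier": "Expedited"},
    }


def expedited_retrieval_cost(
    request_count: int,
    gigabytes_returned: float,
    price_per_request: float,
    price_per_gb: float,
) -> float:
    """Cost = per-request fees + per-GB fees; prices are supplied by the caller."""
    return request_count * price_per_request + gigabytes_returned * price_per_gb


def request_expedited_restore(bucket: str, key: str, days: int = 1) -> dict:
    """Start the restore. Requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the helpers above work without the SDK

    s3 = boto3.client("s3")
    return s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_restore_request(days),
    )
```

For example, `request_expedited_restore("example-archive-bucket", "mezzanine/clip-0001.mxf")` would start a restore for a single archived mezzanine file; both names are placeholders.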
Optimizing storage cost with Amazon S3 Glacier Instant Retrieval
To recap, media organizations are accumulating petabytes of archive data that must be immediately accessible but is accessed only a few times a year. Many of these organizations share a similar data profile: content libraries span multiple petabytes and are stored in perpetuity, yet only a small percentage of the content is ever accessed, and then just a few times a year at most. In the event of a breaking news cycle, media organizations require that certain content be immediately accessible so editors and newsrooms can act quickly.
With the launch of S3 Glacier Instant Retrieval, media customers can lower storage cost when storing long-lived, rarely accessed content, while still having the ability to access this content instantly. Customers using S3 Standard-IA today can save up to 68% on storage costs by using S3 Glacier Instant Retrieval, in exchange for higher data access costs. Now customers can choose from three S3 Glacier archive storage classes optimized for different access patterns and storage duration. For certain content, like final cut feature films and episodic content that can be retrieved in minutes to hours for distribution, organizations should continue to use S3 Glacier Flexible Retrieval, or the S3 Glacier Deep Archive storage class as it delivers the lowest cost storage in the cloud. S3’s archive storage classes are all designed for 99.999999999% (11 9s) of durability by redundantly storing data across a minimum of three physically separated AWS Availability Zones.
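The trade-off between lower storage cost and higher data access cost can be worked through with a small calculation. The prices below are placeholder values, not figures from this post: the storage prices are chosen so that their ratio matches the 68% saving quoted above, and the retrieval prices are assumptions for illustration only. Check the Amazon S3 pricing page for current figures in your Region.

```python
# Placeholder prices in USD (assumptions for illustration, not quoted pricing).
STANDARD_IA_STORAGE_PER_GB_MONTH = 0.0125
GLACIER_IR_STORAGE_PER_GB_MONTH = 0.004
STANDARD_IA_RETRIEVAL_PER_GB = 0.01
GLACIER_IR_RETRIEVAL_PER_GB = 0.03


def storage_savings_fraction(current_price: float, new_price: float) -> float:
    """Fraction of storage cost saved by moving to the cheaper class."""
    return (current_price - new_price) / current_price


def monthly_cost(
    stored_gb: float,
    retrieved_gb: float,
    storage_price_per_gb: float,
    retrieval_price_per_gb: float,
) -> float:
    """Monthly storage plus retrieval cost (request fees are ignored here)."""
    return stored_gb * storage_price_per_gb + retrieved_gb * retrieval_price_per_gb


if __name__ == "__main__":
    saving = storage_savings_fraction(
        STANDARD_IA_STORAGE_PER_GB_MONTH, GLACIER_IR_STORAGE_PER_GB_MONTH
    )
    print(f"Storage saving: {saving:.0%}")

    # A 1 PB library (taken as 1,000,000 GB) with 10 TB read in the month.
    for name, storage, retrieval in [
        ("S3 Standard-IA", STANDARD_IA_STORAGE_PER_GB_MONTH, STANDARD_IA_RETRIEVAL_PER_GB),
        ("S3 Glacier Instant Retrieval", GLACIER_IR_STORAGE_PER_GB_MONTH, GLACIER_IR_RETRIEVAL_PER_GB),
    ]:
        print(name, round(monthly_cost(1_000_000, 10_000, storage, retrieval), 2))
```

Under these assumed prices, the more of the library that is read each month, the smaller the net saving becomes, which is why the class suits content that is rarely accessed.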
Conclusion
By using Amazon S3 Glacier Instant Retrieval, you get the lowest storage cost with milliseconds retrieval for your continuously growing content libraries. You can get started with S3 Glacier Instant Retrieval with just a few clicks in the S3 Console. You can upload data directly into S3 Glacier Instant Retrieval through the Amazon S3 API or AWS Command Line Interface (CLI), or use S3 Lifecycle to transition data from the S3 Standard and S3 Standard-IA storage classes into S3 Glacier Instant Retrieval.
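Both routes mentioned above, direct upload and an S3 Lifecycle transition, can be sketched with the AWS SDK for Python (boto3). The bucket name, prefix, rule ID, and the 90-day threshold are hypothetical values chosen for illustration; `GLACIER_IR` is the API identifier for S3 Glacier Instant Retrieval, and `put_object` and `put_bucket_lifecycle_configuration` are existing boto3 S3 client methods.

```python
GLACIER_IR = "GLACIER_IR"  # API identifier for S3 Glacier Instant Retrieval


def build_lifecycle_configuration(prefix: str, days: int = 90) -> dict:
    """Lifecycle rule moving objects under `prefix` to Glacier Instant Retrieval."""
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier-ir",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": GLACIER_IR}],
            }
        ]
    }


def build_upload_parameters(bucket: str, key: str, body: bytes) -> dict:
    """Keyword arguments for put_object that write straight into Glacier IR."""
    return {"Bucket": bucket, "Key": key, "Body": body, "StorageClass": GLACIER_IR}


def apply_to_bucket(bucket: str, prefix: str) -> None:
    """Apply the rule and upload one object. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_configuration(prefix),
    )
    s3.put_object(**build_upload_parameters(bucket, prefix + "example.mxf", b"..."))
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration, so in practice any current rules would need to be merged into the `Rules` list before calling it.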
To learn more about S3 Glacier Instant Retrieval, visit the AWS News blog post, the storage class page, and the Amazon S3 Developer guide.