AWS Public Sector Blog

66 new or updated datasets available on the Registry of Open Data on AWS


The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS, develop new cloud-based techniques, formats, and tools that lower the cost of working with data, and encourage the development of communities that benefit from access to shared datasets. Through the AWS Open Data Sponsorship Program, customers are making over 300 PB of high-value, cloud-optimized data available for public use.

All publicly available datasets can be found in the Registry of Open Data on AWS and are now also discoverable on AWS Data Exchange. This quarter, AWS released 66 new or updated datasets.

What are people currently doing with AWS Open Data?

Workshops and tutorials on leveraging open datasets

  • Our blog on using AWS Open Data in Amazon Bedrock shows how to use open data as a knowledge base in Bedrock applications. The post discusses how you can make technical information, like precipitation and snow depth, available to a set of users that might not be comfortable with SQL commands or other tools commonly used to search these types of data. Now nontechnical decision-makers can have access to highly technical data in an accessible and understandable format through a chat-based assistant.
  • At the recent Human Cell Atlas General Meeting in Singapore, we launched a workshop on Single-cell Omics on the Open Data Program, teaching researchers how to analyze single-cell data from public datasets using AWS HealthOmics. The workshop also demonstrates how to use public datasets to build a knowledge base and analyze datasets through Amazon Bedrock. As a part of this workshop, we released a Jupyter notebook—Accessing AWS Open Data Using Boto3—that demonstrates how to programmatically access and analyze datasets with Python’s boto3 library.
  • Similarly, the “Working with NOAA satellite data in the AWS Open Data Program” workshop shows how to visualize forest fire detection data using Amazon SageMaker. These intermediate-level tutorials guide researchers and data scientists through accessing, processing, and analyzing open datasets on AWS, demonstrating how to use cloud services effectively for scientific research.
  • The AWS Open Data team has published three how-to guides to help users work with open datasets, all available in the aws-opendata-samples GitHub repository.
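
For readers who want a feel for the programmatic access that the boto3 notebook mentioned above covers, here is a minimal sketch. It is not taken from the notebook. It assumes the dataset's bucket allows anonymous reads, which is common for Registry datasets but not guaranteed, and the bucket name and the helper names (make_anonymous_s3_client, list_keys, main) are hypothetical placeholders. Look up the real bucket name on the dataset's Registry page.

```python
def make_anonymous_s3_client():
    """Create an S3 client that sends unsigned (anonymous) requests."""
    # Imported inside the function so list_keys() can be exercised
    # with any client-like object, even where boto3 is not installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_keys(client, bucket, prefix="", limit=10):
    """Return up to `limit` (key, size_in_bytes) pairs under a prefix."""
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]


def main():
    # Hypothetical bucket name: replace it with the bucket listed on the
    # dataset's page in the Registry of Open Data on AWS.
    bucket = "example-open-data-bucket"
    s3 = make_anonymous_s3_client()
    for key, size in list_keys(s3, bucket):
        print(f"{size:>12}  {key}")
```

Calling main() prints the first few object keys and their sizes. From there, the client's download_file method can copy an individual object to local disk for analysis.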

What can you build with these datasets?

Brain Encoding Response Generator (BERG)

The Brain Encoding Response Generator (BERG) dataset from University of California, Berkeley provides comprehensive brain encoding responses, offering researchers valuable data for neuroscience research and artificial intelligence applications in understanding human brain activity patterns.

Brain Encoding Response Generator (BERG) joins 65 other new or updated datasets on the Registry of Open Data in the following categories.

Climate and weather

Geospatial

Life sciences

Machine learning

How can you make your data available?

Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to:

  • Democratize access to data by making it available for analysis on AWS
  • Develop new cloud-native techniques, formats, and tools that lower the cost of working with data
  • Encourage the development of communities that benefit from access to shared datasets

Learn how to propose your dataset to the AWS Open Data Sponsorship Program.

Learn more about open data on AWS.