AWS Public Sector Blog
Downscaled CMIP5, 1950 US Census, and open genomics data for Galaxy: The latest open data on AWS
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). We work with data providers to: democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets.
Our full list of publicly available datasets is on the Registry of Open Data on AWS. This quarter, we released 13 new or updated datasets, including CMIP5, the 1950 US Decennial Census, and open genomics data for Galaxy. Read on for highlights among the new datasets:
CMIP5 UWPD Dataset
The National Oceanic and Atmospheric Administration (NOAA) released the Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling (UWPD) Dataset. According to the CMIP5 UWPD documentation, statistical downscaling makes the dataset easier to use by “weeding out” less probable forecasts based on CMIP5 climate models, so users can more easily visualize weather events such as local and regional storm fronts. UWPD adds daily precipitation as well as maximum and minimum temperature to the dataset. Learn more about the CMIP5 UWPD dataset.
1950 US Decennial Census
On April 1, 2022, the US National Archives and Records Administration (NARA) made the complete 1950 Census available to the public via AWS. Kept confidential for 72 years, the 1950 Census contains information about individuals living in the United States during the pivotal post-WWII period. Details for accessing the full dataset can be found on the 1950 Census Registry of Open Data page. Read more about the 1950 Census on the AWS Public Sector Blog and at the National Archives website.
Open bioinformatics reference data for Galaxy from Galaxy and Bioconductor Projects
Galaxy is an open-source platform that enables users to apply diverse bioinformatics tools through a user-friendly graphical web interface. So that many tools can be used in concert, Galaxy seamlessly provides users with the references and indexes these tools require. Now that these references are on the Registry of Open Data on AWS, any Galaxy server can readily consume them with high availability and scalability. In addition, in collaboration with Bioconductor Projects, the Galaxy resource in the Registry of Open Data also contains experiment data packages that include sample datasets and experimental outcomes for analyses ranging from single-cell genomics to RNA sequencing. Learn more about the open bioinformatics reference data for Galaxy dataset.
Find these and other recently released datasets in the latest What’s New.
We’re excited to see how you can put these great datasets to work. If you have examples of tutorials, applications, tools, or publications that use these datasets, make sure to list them on the Registry of Open Data on AWS so the community can find them. Learn how to propose your dataset to the AWS Open Data Sponsorship Program and learn more about open data on AWS.
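For readers who want a quick first look at one of these datasets, here is a minimal sketch of listing objects in a public Amazon S3 bucket without AWS credentials, using only the Python standard library and the S3 ListObjectsV2 REST call. It assumes the bucket allows anonymous listing, which many but not all buckets in the Registry do. The bucket name in the usage note is a placeholder, not a real dataset location, and the helper names (`build_list_url`, `parse_keys`, `list_public_objects`) are illustrative. Take the actual bucket name and Region from the dataset's Registry of Open Data page.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListObjectsV2 responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, prefix="", max_keys=10, region="us-east-1"):
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than '+'
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_keys(xml_payload):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_payload)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_public_objects(bucket, prefix="", max_keys=10, region="us-east-1"):
    """Fetch one page of object keys; works only if anonymous listing is allowed."""
    url = build_list_url(bucket, prefix, max_keys, region)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_keys(response.read())
```

Calling `list_public_objects("example-open-data-bucket", prefix="some/prefix/")` with a real bucket name and Region would return the first few keys and their sizes. For larger workloads, the AWS SDKs and the AWS CLI handle pagination and retries for you.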
Read related stories about AWS and open data:
- Street-scale global maps, orca sounds, and COVID-19 detection data: The latest open data on AWS
- How to set up Galaxy for research on AWS using Amazon Lightsail
- AWS hosts new open dataset to help businesses identify climate finance risks and investments
- Predicting global biodiversity patterns in Costa Rica with ecosystem modeling on AWS
- Bringing world-class satellite imagery to smallholder farmers with open data
- Introducing 10 minute cloud tutorials for research
