AWS Public Sector Blog
Data dissemination for public sector on AWS
Sharing data is essential for organizations to make informed decisions and foster innovation. Amazon Web Services (AWS) offers a variety of tools and services to help distribute data securely and at scale. Whether it’s sharing public data for the common good, monetizing private datasets for business purposes, or collaborating with colleagues, AWS provides the necessary infrastructure and support. Using the AWS Cloud allows organizations to securely share data with teams, AWS Partners, and users, thus enabling them to extract valuable insights and drive growth. AWS allows data sharing to become not only a requirement but also a strategic advantage in navigating the complexities of today’s data-driven landscape.
AWS Open Data
Open data initiatives are gaining traction as more people see the benefits of sharing data easily. AWS supports this movement with its AWS Open Data program, which makes a wide range of datasets available to the public through the Registry of Open Data on AWS. The cost of storing these datasets is covered by the AWS Open Data Sponsorship Program.
The Registry of Open Data on AWS catalogs a variety of public datasets such as government data, scientific research, life sciences, climate, satellite imagery, geospatial, and genomic data.
AWS Open Data encourages collaboration and innovation through its Registry of Open Data, where users can contribute datasets using AWS’s reliable and secure infrastructure. People can easily access this valuable data for research, analysis, and building applications without needing to download or store it themselves. Through the AWS Open Data Sponsorship Program, customers can make highly valued datasets publicly available with AWS covering storage costs. Both data providers and data consumers can share notebooks and other analyses on the Registry page for a dataset, along with publications and other resources that cite the data.
Benefits of AWS Open Data:
- Global impact: AWS Open Data makes your dataset available to users around the world, enabling collaboration and growth on a global scale.
- Accelerated growth and innovation: Data residing in AWS means you can now use AWS services to quickly process, analyze, and gain insights from the data. AWS offers a range of compute, analytics, and machine learning (ML) services that enable you to run your analysis at scale next to the data.
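To illustrate what accessing data in place can look like, here is a minimal Python sketch that lists objects in a publicly readable S3 bucket without AWS credentials, using the unsigned ListObjectsV2 REST call. The bucket name, Region, and prefix are placeholders rather than a real dataset, and not every open dataset permits anonymous listing, so treat this as a sketch under those assumptions and check the dataset's Registry page for its actual access details.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by Amazon S3 REST API responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_objects_url(bucket: str, region: str, prefix: str = "", max_keys: int = 10) -> str:
    """Build an unsigned ListObjectsV2 URL for a publicly readable bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_text: str) -> list[tuple[str, int]]:
    """Extract (key, size in bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_public_objects(bucket: str, region: str, prefix: str = "") -> list[tuple[str, int]]:
    """Fetch and parse one page of object keys, sending no credentials."""
    url = list_objects_url(bucket, region, prefix)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```

Calling `list_public_objects("example-open-data-bucket", "us-east-1", "2024/")` with a real bucket name from the Registry would return the first page of keys and sizes. Because the data stays in Amazon S3, the same keys can then be read directly by analytics or ML services running in the same Region.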
AWS Data Exchange
AWS Data Exchange is a service that makes it simple to find, subscribe to, and use third-party data in AWS. The service provides a catalog of data products from qualified data providers, allowing organizations to quickly discover and access the data they need without the typical challenges of licensing, ingesting, and managing third-party data. AWS Data Exchange is a complete end-to-end service that facilitates the discovery, subscription, and distribution of data products. It lets users import various datasets, including those from Open Data sources, and manage subscriptions and data grants efficiently, and it gives data providers a platform to publish and monetize their data products on AWS Marketplace. This integrated approach streamlines the entire data exchange process, from sourcing to commercialization, all within the AWS ecosystem.
One of the key benefits of using AWS Data Exchange for data dissemination is streamlined distribution. Data providers can publish, list, and sell their data products on AWS Data Exchange, much as they would with other data catalogs, while data consumers can search for, subscribe to, and use these datasets in their applications and workflows. Data consumers can subscribe to the products they need through the AWS Management Console or API, without having to negotiate individual licensing agreements or manage complex data transfer processes.
AWS Data Exchange uses the scale and security of AWS to provide reliable and consistent data delivery. Data is delivered directly from AWS, providing high availability and durability, while also benefiting from the comprehensive security controls and compliance certifications of AWS. This gives data providers the confidence that their intellectual property is protected, while data consumers can trust that the data they receive is authentic and up-to-date.
AWS Data Exchange streamlines the commercial aspects of data sharing. The service provides a standardized, pay-as-you-go pricing model, allowing data consumers to purchase and access data products. This democratizes access to valuable data resources, enabling organizations of all sizes to incorporate third-party data into their applications and analyses, ultimately driving innovation and business value.
Benefits of AWS Data Exchange:
- Extensive data collection: AWS Data Exchange is a centralized data repository providing access to more than 3,500 datasets from more than 300 data providers across the globe.
- Streamlined data acquisition: AWS Data Exchange centralizes and accelerates the data acquisition process. You can consolidate data ingestion across data providers using a single API.
- Native integration with AWS services: AWS Data Exchange integrates with AWS analytical services and ML models, allowing you to rapidly extract insights from your data. It also supports AWS authentication and governance.
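As a rough illustration of the single-API point above, the following Python sketch lists the data sets a subscriber is entitled to. It assumes a client object that behaves like the AWS SDK for Python (Boto3) `dataexchange` client, whose `list_data_sets` call accepts `Origin` and `NextToken` and returns `DataSets` and `NextToken`. The helper names `iter_entitled_data_sets` and `summarize` are hypothetical and exist only for this example.

```python
from typing import Any, Iterable, Iterator


def iter_entitled_data_sets(client: Any) -> Iterator[dict]:
    """Yield every data set the caller is entitled to, following pagination."""
    kwargs = {"Origin": "ENTITLED"}
    while True:
        page = client.list_data_sets(**kwargs)
        yield from page.get("DataSets", [])
        token = page.get("NextToken")
        if not token:
            break
        # Ask for the next page on the following loop iteration.
        kwargs["NextToken"] = token


def summarize(data_sets: Iterable[dict]) -> list[str]:
    """Format each data set as 'Name (AssetType)' for a quick overview."""
    return [f'{ds["Name"]} ({ds["AssetType"]})' for ds in data_sets]
```

With Boto3 installed and credentials configured, usage might look like `summarize(iter_entitled_data_sets(boto3.client("dataexchange")))`. Passing the client in as a parameter keeps the sketch free of any SDK import and makes it straightforward to try against a stub.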
Storage Browser for Amazon Simple Storage Service
AWS has introduced Storage Browser for Amazon S3, a new feature that allows developers to embed a customizable file browser directly within their applications. It enables users to access, view, and interact with data stored in Amazon S3 buckets without leaving the application environment. Storage Browser for S3 is particularly useful in scenarios where users need to manage personal files or collaborate on a smaller scale. It can also be integrated into open data portals or research platforms, allowing easy access to large public datasets stored in S3 buckets. For instance, scientific institutions can use it to share research data, enabling other researchers to browse and access valuable information without downloading entire datasets. This promotes open science and collaboration on a global scale.
Storage Browser supports common file operations and can be tailored to match an application’s branding, making it versatile for both business and consumer applications. By integrating S3 data access directly into applications, Storage Browser significantly enhances user experience and productivity, eliminating the need to switch between different interfaces to manage S3-stored files. This feature represents a notable step in making cloud storage more accessible and user-friendly within application workflows. Get started with Storage Browser today.
Benefits of Storage Browser for Amazon S3:
- Integration: Storage Browser can be embedded directly within applications, providing a native user experience that is simple to understand.
- Reduced data movement: By allowing direct access to S3 data, Storage Browser minimizes the need to duplicate or move large datasets. This is more efficient than traditional data exchange methods that often involve copying or downloading entire datasets.
- Real-time access: Users can browse, search, and interact with S3 data in real time within the application. This is more immediate than other data exchange methods which might involve requesting and waiting for data transfers.
Build-your-own (BYO) Lens on AWS
Build-your-own (BYO) Lens on AWS provides a customized solution for organizations to interpret data with a unique perspective. Developing a tailored platform allows businesses to facilitate smooth data sharing among teams. Organizations can apply their specialized knowledge, experience, and understanding of the subject matter, leading to more insightful interpretations. A BYO Lens can be tailored to specific organizational needs, priorities, and decision-making processes. This helps make sure that the insights generated are directly actionable for organizations.
BYO Lens core steps involve data preparation, analysis, and visualization. AWS provides a wide range of services that allow organizations to build these core steps. Services such as Amazon QuickSight, AWS Glue, Amazon Athena, and AWS Lambda enable data to be tailored to specific needs. AWS also offers services specific to data storage, such as Amazon S3, allowing you to integrate multiple data sources. AWS services are designed to be scalable, allowing BYO Lens workflows to handle large volumes of data. BYO Lens workflows within AWS are highly secure, protecting the confidentiality, integrity, and availability of the data and insights generated through your custom data dissemination lens.
These AWS services can be integrated into a custom workflow, outlined here as a common data workflow:
- Data ingestion: This process works in tandem with user requests on the data dissemination side. Raw data from various custom sources is written to Amazon S3, and S3 Event Notifications and Amazon EventBridge automatically route it to compute applications, which process the raw data and start the data transformation process.
- Data delivery and security: Users enter the desired domain, which is managed through Amazon Route 53, and are served static and dynamic content by Amazon CloudFront with an Amazon S3 origin. The custom BYO Lens is protected from cross-site scripting, SQL injection, and other common attacks with AWS WAF.
- Data dissemination: API requests made within the static and dynamic content are forwarded to Amazon API Gateway, where the requests are authenticated with Amazon Cognito. If authentication is successful, Lambda manages both invocation and responses for these API requests, connecting the data dissemination, data transformation, and data ingestion workflows.
- Data transformation: The compute applications use AWS Glue to extract, transform, and load the data into Amazon S3, and the AWS Glue Data Catalog is created during this ETL process. Lambda sends incoming requests to Athena, which uses SQL to query the data through the AWS Glue Data Catalog. Athena returns the query results to Lambda, which sends them back through API Gateway to the user-facing content that CloudFront serves from the S3 bucket origin.
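The dissemination and transformation steps in this workflow can be sketched as a Lambda handler that runs an Athena query and returns the rows in the response shape that an API Gateway proxy integration expects. This is a minimal sketch, not the architecture's actual code. The database name, results location, and table name are hypothetical placeholders, the Athena client is passed in (for example, one created with Boto3's `client("athena")`), and the request is assumed to have already been authenticated at API Gateway with Amazon Cognito.

```python
import json
import time

DATABASE = "example_catalog_db"          # hypothetical AWS Glue Data Catalog database
OUTPUT = "s3://example-athena-results/"  # hypothetical query results location


def run_query(athena, sql, poll_seconds=1.0, max_polls=30):
    """Start an Athena query, wait for it to finish, and return rows as dicts."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT},
    )["QueryExecutionId"]

    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Athena query did not finish in time")

    # Only the first page of results is read here, to keep the sketch short.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For SELECT statements, the first row holds the column names.
    header = [col.get("VarCharValue") for col in rows[0]["Data"]]
    return [
        dict(zip(header, (col.get("VarCharValue") for col in row["Data"])))
        for row in rows[1:]
    ]


def make_handler(athena):
    """Return a Lambda-style handler bound to the given Athena client."""

    def handler(event, context):
        params = event.get("queryStringParameters") or {}
        try:
            # Clamp the user-supplied limit so only a bounded integer reaches the SQL.
            limit = max(1, min(int(params.get("limit", 10)), 100))
        except ValueError:
            return {"statusCode": 400, "body": json.dumps({"error": "limit must be an integer"})}
        rows = run_query(athena, f"SELECT * FROM example_table LIMIT {limit}")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(rows),
        }

    return handler
```

Polling inside the function keeps the example compact. For long-running queries, a production design would more likely hand the query ID back to the caller or use an asynchronous pattern so the function does not sit idle waiting on Athena.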
Benefits of a BYO Lens:
- Full control: A BYO Lens allows you to have complete control over the architecture, data sources, data transformation, and dissemination workflows. This allows you to tailor the platform to your specific requirements, integrate it with your existing systems, and customize the user experience.
- Choice of AWS services: A BYO Lens gives you the flexibility to choose the AWS services that best fit your needs, rather than being committed to how a fully managed service is constructed in the backend.
- Security/data governance: Building your own platform allows you to have more control over data governance, access policies, and security measures. You can grant different teams within your organization granular privileges to specific services within the architecture.
- Customization: A BYO Lens allows you to fully customize the user experience, branding, and interfaces to align with your organization’s visual identity and user requirements. This can help provide a more seamless and branded experience for your consumers. It’s simple to integrate the data dissemination workflows with your existing data pipelines, analytics tools, and other business systems, creating an efficient ecosystem of services within AWS.
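As one illustration of the granular privileges mentioned under security and data governance, the sketch below builds an IAM policy document that lets a team list and read only the objects under a single prefix of a bucket. The bucket name and prefix are hypothetical, and the function name `read_only_prefix_policy` is invented for this example. A real deployment would attach a policy like this to the team's IAM role and adapt it to its own governance rules.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing read-only access to one prefix of one bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, restricted to the team's prefix.
                "Sid": "ListTeamPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is granted only on objects under the team's prefix.
                "Sid": "ReadTeamObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Example with placeholder names: print the policy as JSON for review.
print(json.dumps(read_only_prefix_policy("example-curated-data", "climate"), indent=2))
```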
 
 
Figure 1. Common data workflow with BYO Lens data dissemination. The major components are Amazon Route 53, Amazon CloudFront, Amazon S3, Amazon API Gateway, AWS Lambda, Amazon Cognito, Amazon DynamoDB, Amazon Athena, AWS Glue, Amazon EKS, Amazon ECS, and Amazon EventBridge.
Conclusion
AWS offers a comprehensive suite of tools and services for data dissemination, including AWS Open Data, AWS Data Exchange, Storage Browser for Amazon S3, and Build-your-own (BYO) Lens. These solutions cater to various data sharing needs, from public datasets to commercial data products, enabling organizations to securely and efficiently distribute data at scale. AWS Open Data supports open data initiatives, while AWS Data Exchange facilitates the discovery and use of third-party data products. The Storage Browser for Amazon S3 and BYO Lens options provide customizable solutions for embedding file browsers in applications and creating tailored data interpretation platforms, respectively, all leveraging AWS’s robust infrastructure and security features.
