AWS Storage Blog
Category: AWS Glue
Cloud-powered tick data: revolutionizing financial data storage with Amazon S3 and LSEG
Data has become the lifeblood of modern financial markets, driving everything from investment decisions to regulatory compliance. Nowhere is this more evident than in electronic trading, where the ability to efficiently store, process, and analyze historical market data can make the difference between success and failure. Market participants are witnessing an unprecedented surge in tick […]
From raw to refined: building a data quality pipeline with AWS Glue and Amazon S3 Tables
Organizations often struggle to extract maximum value from their data lakes when running generative AI and analytics workloads due to data quality challenges. Although data lakes excel at storing massive amounts of raw, diverse data, they need robust governance and management practices to prevent common quality issues. Without proper data validation, cleansing processes, and ongoing […]
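The full post walks through the pipeline end to end; purely for flavor, here is a minimal sketch of what a rule-based validation step can look like inside an AWS Glue job using Glue Data Quality. The database, table, and rule names are hypothetical placeholders, not taken from the post, and the script only runs inside the Glue job environment.

```python
# Minimal sketch of a Glue Data Quality check inside an AWS Glue job.
# Database, table, and rule names are hypothetical placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the raw data registered in the Glue Data Catalog.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db",       # hypothetical
    table_name="orders",     # hypothetical
)

# Declare expectations in Data Quality Definition Language (DQDL).
ruleset = """Rules = [
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "amount" >= 0
]"""

# Evaluate the ruleset; the result holds one row per rule outcome.
outcomes = EvaluateDataQuality.apply(
    frame=raw,
    ruleset=ruleset,
    publishing_options={"dataQualityEvaluationContext": "raw_orders_check"},
)
outcomes.toDF().show(truncate=False)
```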
How to consume tabular data from Amazon S3 Tables for insights and business reporting
When was the last time you found yourself staring at rows and rows of data in a spreadsheet, struggling to interpret them and draw conclusions? Many analysts and engineers experience the same challenge every day. Whether it’s analyzing sales trends, monitoring operational metrics, or understanding customer behavior, the challenge lies not just in interpreting […]
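One common way to turn such table data into reporting numbers is to query it with Amazon Athena. The sketch below, offered only as an illustration, assumes the S3 table is already queryable from Athena through the configured catalog; the database, table, and result-bucket names are hypothetical.

```python
# Minimal sketch: run a reporting query with Amazon Athena via boto3.
# Database, table, and bucket names are hypothetical placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = athena.start_query_execution(
    QueryString="SELECT region, SUM(sales) AS total FROM sales_reports GROUP BY region",
    QueryExecutionContext={"Database": "analytics_db"},                 # hypothetical
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```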
How Pendulum achieves 6x faster processing and 40% cost reduction with Amazon S3 Tables
Pendulum is an AI-powered analytics platform that aggregates and analyzes real-time data from social media, news, and podcasts. Designed to help organizations stay ahead, it enables reputation monitoring, early crisis detection, and influencer activity tracking. Machine learning (ML) enables Pendulum to surface key insights from multiple channels, providing a comprehensive view of the digital […]
Bringing more to the table: How Amazon S3 Tables rapidly delivered new capabilities in the first 5 months
Amazon S3 redefined data storage when it launched as the first generally available AWS service in 2006, delivering highly reliable, durable, secure, low-latency storage with virtually unlimited scale. While designed to deliver simple storage, S3 has proven capable of handling the explosive growth of data we have seen in the last 19 […]
Connect Snowflake to S3 Tables using the SageMaker Lakehouse Iceberg REST endpoint
Organizations today seek data analytics solutions that provide maximum flexibility and accessibility. Customers need their data to be readily available through their preferred query engines, and want to break down barriers across different computing environments. At the same time, they want a single copy of data to be used across these solutions, to track lineage, and be cost […]
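The post covers the catalog-integration setup in detail; as an illustration only, once Snowflake can see the Iceberg table through the SageMaker Lakehouse Iceberg REST endpoint, reading that single copy of data is an ordinary query. All connection parameters and object names below are hypothetical placeholders.

```python
# Illustrative sketch: query an S3 Tables-backed Iceberg table from Snowflake
# once the catalog integration described in the post is in place.
# Connection parameters and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",        # hypothetical
    user="ANALYST",                   # hypothetical
    authenticator="externalbrowser",  # browser-based SSO login
    warehouse="ANALYTICS_WH",         # hypothetical
    database="LAKEHOUSE_DB",          # hypothetical
)

try:
    cur = conn.cursor()
    # The same single copy of data in S3 Tables, queried through Snowflake.
    cur.execute("SELECT COUNT(*) FROM iceberg_sales_table")  # hypothetical table
    print(cur.fetchone())
finally:
    conn.close()
```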
Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose
Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and improving urban planning. The ability to use data as it is generated has become a critical […]
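The post walks through creating the Firehose stream with an Apache Iceberg destination pointing at an S3 table; for flavor, here is a sketch of the producer side, assuming such a stream already exists. The stream name and record shape are hypothetical.

```python
# Minimal producer sketch: send JSON records to an Amazon Data Firehose stream
# that (per the post's setup) delivers into an Amazon S3 Tables table.
# The stream name and record shape are hypothetical placeholders.
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

event = {"user_id": "u-123", "action": "checkout", "amount": 42.50}

firehose.put_record(
    DeliveryStreamName="clickstream-to-s3-tables",  # hypothetical stream
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```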
Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint
Modern data lakes integrate with multiple engines to meet a wide range of analytics needs, from SQL querying to stream processing. A key enabler of this approach is the adoption of Apache Iceberg as the open table format for building transactional data lakes. However, as the Iceberg ecosystem expands, the growing variety of engines and languages has […]
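Based on the pattern the post describes, a minimal PyIceberg connection to the AWS Glue Iceberg REST endpoint might look like the following. The account ID, table bucket, Region, and table names are placeholders.

```python
# Minimal sketch: connect PyIceberg to the AWS Glue Iceberg REST endpoint
# and scan an S3 Tables table. Account ID, bucket, Region, and table names
# are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": "https://glue.us-east-1.amazonaws.com/iceberg",
        # Warehouse format for S3 Tables: "<account-id>:s3tablescatalog/<table-bucket>".
        "warehouse": "111122223333:s3tablescatalog/my-table-bucket",
        # Sign REST requests with AWS SigV4 credentials.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": "us-east-1",
    },
)

table = catalog.load_table("my_namespace.events")  # hypothetical namespace.table
print(table.scan(limit=10).to_pandas())
```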
How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3
In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]
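The post details the specific optimizations Amazon Ads applied; one widely used Iceberg maintenance lever in Spark, shown here purely as a generic illustration rather than the post's exact technique, is compacting small data files with the rewrite_data_files procedure. Catalog configuration and table names are hypothetical, and the Iceberg Spark runtime JAR must be on the classpath.

```python
# Illustration only: a standard Iceberg table-maintenance operation in Spark,
# compacting small data files to speed up scans. Not necessarily the post's
# exact optimization; all configuration values are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-compaction")
    # Register an Iceberg REST catalog; values are hypothetical.
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "rest")
    .config("spark.sql.catalog.my_catalog.uri",
            "https://glue.us-east-1.amazonaws.com/iceberg")
    # Enable Iceberg SQL extensions so CALL procedures are available.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Rewrite (compact) small data files in the target table.
spark.sql(
    "CALL my_catalog.system.rewrite_data_files(table => 'analytics.events')"
).show()
```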
How Delhivery migrated 500 TB of data across AWS Regions using Amazon S3 Replication
Delhivery is one of the largest third-party logistics providers in India. It fulfills millions of packages every day, servicing over 18,000 PIN codes in India, powered by more than 20 automated sort centers, 90 warehouses, and over 2,800 delivery centers. Data is at the core of Delhivery’s business. In anticipation of potential regulatory […]
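S3 Replication itself is configured declaratively on the source bucket. The sketch below is a minimal, hedged example with boto3; the bucket names and IAM role ARN are hypothetical placeholders, and the destination bucket must already exist in the target Region with versioning enabled on both buckets.

```python
# Minimal sketch: enable S3 Replication from a source bucket to a bucket in
# another Region. Bucket names and the IAM role ARN are hypothetical, and
# both buckets must have versioning enabled.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-source-data",  # hypothetical source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",  # hypothetical
        "Rules": [
            {
                "ID": "replicate-all-cross-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter replicates the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-data"  # hypothetical
                },
            }
        ],
    },
)
```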