AWS Glue DataBrew

Clean and normalize data up to 80% faster

Introducing AWS Glue DataBrew

AWS Glue DataBrew is a visual data preparation tool that makes it easier for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning (ML). You can choose from over 250 prebuilt transformations to automate data preparation tasks, all without writing any code. You can automate tasks such as filtering anomalies, converting data to standard formats, and correcting invalid values. After your data is ready, you can immediately use it for analytics and ML projects. You only pay for what you use, with no upfront commitment.

Capabilities

Profile

Screenshot: column statistics for a sample dataset, including data quality metrics, value distribution, correlations, and unique values for the 'start station id' column.

Evaluate the quality of your data by profiling it to understand data patterns and detect anomalies. Connect data directly from your data lake, data warehouses, and databases.
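Profiling can also be driven programmatically. As a minimal sketch using the boto3 DataBrew client, the snippet below registers an S3 object as a dataset and runs a profile job against it; the bucket, key, role ARN, and names are illustrative placeholders, not values from this page.

```python
import boto3

databrew = boto3.client("databrew")

# Register an S3 object as a DataBrew dataset (bucket and key are placeholders).
databrew.create_dataset(
    Name="citibike-trips",
    Format="CSV",
    Input={
        "S3InputDefinition": {
            "Bucket": "my-example-bucket",
            "Key": "raw/citibike/trips.csv",
        }
    },
)

# Create a profile job that writes column statistics to S3, then start it.
databrew.create_profile_job(
    Name="citibike-trips-profile",
    DatasetName="citibike-trips",
    RoleArn="arn:aws:iam::123456789012:role/DataBrewServiceRole",  # placeholder role
    OutputLocation={
        "Bucket": "my-example-bucket",
        "Key": "profiles/citibike/",
    },
)
run = databrew.start_job_run(Name="citibike-trips-profile")
print(run["RunId"])
```

The resulting profile output (data quality, value distributions, correlations) is what the console visualizes in the dashboard shown above.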

Clean and normalize

Screenshot: merging latitude and longitude columns into a new 'latlong' column in a Citi Bike dataset, with summary statistics, column-merge options, and rows of geospatial data.

Choose from over 250 built-in transformations to visualize, clean, and normalize your data with an interactive point-and-click visual interface.
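Transformations applied in the visual interface are saved as recipe steps, which can also be defined through the API. The sketch below creates a recipe with a single merge step mirroring the screenshot; the operation and parameter names follow the DataBrew recipe action reference but should be treated as illustrative assumptions, as should the column names.

```python
import json
import boto3

databrew = boto3.client("databrew")

# A recipe is an ordered list of transformation steps. This MERGE step combines
# latitude and longitude into a 'latlong' column; operation and parameter keys
# are illustrative and should be checked against the recipe action reference.
databrew.create_recipe(
    Name="citibike-clean",
    Steps=[
        {
            "Action": {
                "Operation": "MERGE",
                "Parameters": {
                    "sourceColumns": json.dumps(
                        ["start station latitude", "start station longitude"]
                    ),
                    "delimiter": ",",
                    "targetColumn": "latlong",
                },
            }
        },
    ],
)
```

Additional steps (filtering, standardizing formats, correcting invalid values) are appended to the same list and replayed in order whenever the recipe runs.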

Map data lineage

Screenshot: data lineage diagram for the 'nycCitibikes' project, showing datasets, a project, a recipe, a job, and S3 sources and outputs in a data workflow, including joins and a running job that writes Parquet output.

Visually map the lineage of your data to understand the various data sources and transformation steps that the data has been through.

Automate

Screenshot: the 'Create recipe job' workflow, including recipe selection and job input settings for dataset transformation.

Automate data cleaning and normalization tasks by applying saved transformations directly to new data as it arrives in your source system.
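A recipe job ties a saved recipe to a dataset and an output location, so the same cleaning steps can be re-applied whenever new data lands. The sketch below, again using boto3 with placeholder names, publishes the recipe from the previous example and creates a job that writes Parquet output to S3.

```python
import boto3

databrew = boto3.client("databrew")

# Publish the working recipe so the job can reference a stable version.
databrew.publish_recipe(Name="citibike-clean")

# Create a recipe job that applies the saved transformations to the dataset
# and writes Parquet output to S3 (bucket, role ARN, and names are placeholders).
databrew.create_recipe_job(
    Name="citibike-clean-job",
    DatasetName="citibike-trips",
    RecipeReference={"Name": "citibike-clean", "RecipeVersion": "LATEST_PUBLISHED"},
    RoleArn="arn:aws:iam::123456789012:role/DataBrewServiceRole",
    Outputs=[
        {
            "Location": {"Bucket": "my-example-bucket", "Key": "clean/citibike/"},
            "Format": "PARQUET",
        }
    ],
)

# Re-run the job whenever new data arrives, for example on a schedule
# or from an event-driven trigger.
databrew.start_job_run(Name="citibike-clean-job")
```

Each run applies the full recipe to the current contents of the source dataset, which is what keeps automated cleaning in step with incoming data.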