Artificial Intelligence

Category: AWS Glue

Data visualization and anomaly detection using Amazon Athena and Pandas from Amazon SageMaker

Many organizations use Amazon SageMaker for their machine learning (ML) requirements and source data from a data lake stored on Amazon Simple Storage Service (Amazon S3). The petabyte-scale source data on Amazon S3 may not always be clean because data lakes ingest data from several source systems, such as flat files, external feeds, […]

Access Amazon S3 data managed by AWS Glue Data Catalog from Amazon SageMaker notebooks

In this blog post, I’ll show you how to perform exploratory analysis on massive corporate datasets in Amazon SageMaker. From your Jupyter notebook running on Amazon SageMaker, you’ll identify and explore several datasets in the corporate data lake that seem interesting to you. You’ll discover that each contains only a subset of the information you need. You’ll join them to extract the interesting information, then continue analyzing and visualizing your data in the same Amazon SageMaker notebook, in a seamless experience.
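
To make the join step concrete, here is a minimal sketch in pandas. It is not the code from the post: the two small DataFrames, their column names, and the helper `join_datasets` are hypothetical stand-ins for tables you would load from the data lake, and the sketch assumes the data already sits in memory in your notebook.

```python
import pandas as pd

# Hypothetical stand-ins for two data lake tables, each holding
# only part of the information needed for the analysis.
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "order_total": [120.0, 35.5, 99.9],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "region": ["EMEA", "APAC", "AMER"],
})


def join_datasets(left, right, key):
    """Inner-join two DataFrames on a shared key column (illustrative helper)."""
    return left.merge(right, on=key, how="inner")


# Only customers present in both tables survive the inner join.
joined = join_datasets(orders, customers, "customer_id")

# Continue the analysis on the combined data, for example totals per region.
totals_by_region = joined.groupby("region")["order_total"].sum()
print(totals_by_region)
```

An inner join keeps only the keys found in both datasets. If you would rather keep unmatched rows while exploring, passing `how="left"` or `how="outer"` to `merge` is the usual alternative.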