Overview

This Guidance demonstrates how to use pgvector and Amazon Aurora PostgreSQL for sentiment analysis, a powerful natural language processing (NLP) task. The Guidance shows how to integrate Amazon Aurora PostgreSQL-Compatible Edition with the Amazon Comprehend Sentiment Analysis API, enabling sentiment analysis inferences through SQL commands. By using Amazon Aurora PostgreSQL with the pgvector extension as your vector store, you can accelerate vector similarity search for Retrieval Augmented Generation (RAG), delivering queries up to 20 times faster with pgvector's Hierarchical Navigable Small World (HNSW) indexing.
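The pgvector setup described above can be sketched as follows. This is an illustrative example, not code from the Guidance repository; the table and column names (`documents`, `content`, `embedding`) and the embedding dimension are assumptions:

```sql
-- Enable the pgvector extension (available in Aurora PostgreSQL-Compatible Edition)
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table holding text alongside its ML-generated embedding
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(1536)   -- dimension depends on your embedding model
);

-- HNSW index to accelerate approximate nearest-neighbor search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```

With the index in place, similarity queries that order by a pgvector distance operator can use the HNSW graph instead of a sequential scan.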

Important: This Guidance requires the use of AWS Cloud9 which is no longer available to new customers. Existing customers of AWS Cloud9 can continue using and deploying this Guidance as normal.

How it works

This architecture diagram shows how to generate sentiment analysis using Amazon Aurora PostgreSQL-Compatible Edition with pgvector enabled as the vector store. It details the process of integrating Amazon Aurora with an Amazon Comprehend Sentiment Analysis API and generating sentiment analysis inferences using SQL commands.
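The SQL-driven integration works through the `aws_ml` extension, which exposes the Amazon Comprehend sentiment API as a SQL function. A minimal sketch is below; the `reviews` table and `review_text` column are hypothetical, and the Aurora cluster must have an IAM role that permits calling Comprehend:

```sql
-- Enable the extension that exposes Amazon Comprehend to SQL
CREATE EXTENSION IF NOT EXISTS aws_ml CASCADE;

-- Run sentiment analysis over a column of text
SELECT review_text,
       s.sentiment,    -- POSITIVE, NEGATIVE, NEUTRAL, or MIXED
       s.confidence    -- Comprehend's confidence score for that label
FROM   reviews,
       aws_comprehend.detect_sentiment(review_text, 'en') AS s;
```

Each row's text is sent to Comprehend, and the inference comes back as ordinary query results that can be filtered, joined, or stored like any other column.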

Get Started

Deploy this Guidance

Use sample code to deploy this Guidance in your AWS account
Open sample code on GitHub

Well-Architected Pillars

The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

The provided CloudFormation script automates the deployment of key resources, including an Aurora PostgreSQL cluster, a SageMaker notebook instance, an AWS Cloud9 instance, virtual private cloud (VPC), subnets, security groups, and AWS Identity and Access Management (IAM) roles. This automated deployment streamlines operations, reduces manual effort, and mitigates configuration errors, promoting operational excellence.

Read the Operational Excellence whitepaper 

An IAM role integrates Aurora with Amazon Comprehend, granting the minimum required permissions. This role is associated with the Aurora cluster and does not have credentials such as passwords or access keys, enhancing security. Database user credentials are securely stored in AWS Secrets Manager, preventing unauthorized access and potential security breaches.

IAM roles and policies provide controlled access to the Amazon Comprehend sentiment analysis API from Aurora, limiting permissions to only what's necessary. This least-privilege approach to access management strengthens the Guidance's security posture.

Read the Security whitepaper 

Aurora with pgvector enables storing and searching machine learning (ML)-generated embeddings while leveraging PostgreSQL features like indexing and querying. Aurora provides high availability and reliability by maintaining six copies of data across three Availability Zones, with read replicas and global database replication options.

Using Aurora with pgvector as the vector store offers vector capabilities combined with data reliability and durability, eliminating the need to move data into a separate vector store. Aurora's resiliency features and pgvector's capabilities allow you to use an existing relational database as a vector store, seamlessly integrating with artificial intelligence (AI) and ML services like Amazon Comprehend and SageMaker.
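A similarity search against that single store is just a SQL query. The sketch below assumes a hypothetical `documents` table with a pgvector `embedding` column and a `:query_embedding` parameter produced by the same embedding model:

```sql
-- Return the 5 stored documents most similar to the query embedding
SELECT id, content
FROM   documents
ORDER  BY embedding <=> :query_embedding   -- pgvector cosine-distance operator
LIMIT  5;
```

Because the embeddings live next to the relational data, the same query can join against business tables, apply WHERE filters, or feed results into a RAG prompt without crossing systems.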

Read the Reliability whitepaper 

Aurora PostgreSQL with pgvector offers optimized storage, compute resources, and vector indexing capabilities within the relational database, helping ensure efficient workload performance. Aurora Optimized Reads can boost vector search performance with pgvector by up to nine times for workloads that exceed regular instance memory. Aurora with pgvector not only provides vector search, indexing, and sentiment analysis capabilities, but also features for optimal query performance, combining the benefits of a relational database with vector capabilities.
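HNSW query behavior can also be tuned per session. As an illustrative (not Guidance-specific) example, pgvector's `hnsw.ef_search` setting controls how much of the index graph is explored per query:

```sql
-- Higher values scan more of the HNSW graph: better recall, slower queries
SET hnsw.ef_search = 100;   -- pgvector's default is 40
```

This lets you trade recall for latency per workload without rebuilding the index.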

Read the Performance Efficiency whitepaper 

SageMaker offers Savings Plans, reducing costs by up to 64 percent, in addition to flexible on-demand pricing for Studio notebooks, notebook instances, and inference. Using the AWS Cloud9 IDE instead of dedicated Amazon Elastic Compute Cloud (Amazon EC2) instances further decreases costs. Additionally, Amazon Comprehend API's pay-per-use model optimizes expenses. These services provide cost-effective options through on-demand and Savings Plans to help you align with your budget.

Read the Cost Optimization whitepaper 

Aurora clusters on AWS Graviton instances consume up to 60 percent less energy than comparable EC2 instances while delivering the same performance and better price performance. This Guidance uses temporary resources like AWS Cloud9 and SageMaker notebooks to reduce carbon footprint. AWS Cloud9, a temporary IDE, integrates Aurora with Amazon Comprehend and generates inferences through SQL statements, further minimizing the environmental impact.

Read the Sustainability whitepaper 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.