AWS Database Blog

Better together: Amazon RDS for SQL Server and Amazon SageMaker Lakehouse, a generative AI data integration use case

Generative AI solutions are transforming how businesses operate worldwide, captivating users and customers across industries with their innovative capabilities. It has now become paramount for businesses to integrate generative AI capabilities into their customer-facing services and applications. The challenge they often face is the need to use massive amounts of relational data hosted on SQL Server databases to contextualize these new generative AI solutions.

In this post, we demonstrate how you can address this challenge by combining Amazon Relational Database Service (Amazon RDS) for SQL Server and Amazon SageMaker Lakehouse. These managed services work seamlessly together, enabling you to build integrated solutions that enhance generic foundation models (FMs) through Retrieval Augmented Generation (RAG). By connecting your existing SQL Server data with powerful AI capabilities, you can create more accurate, contextually relevant generative AI features for your services and applications.

Solution overview

Amazon RDS for SQL Server simplifies the process of setting up, operating, and scaling SQL Server deployments in the AWS Cloud. You can deploy multiple editions of SQL Server, including Express, Web, Standard, and Enterprise, in minutes with cost-efficient and resizable compute capacity. Amazon RDS frees you from time-consuming database administration tasks, so you can focus on application development.

SageMaker Lakehouse unifies your data across Amazon Simple Storage Service (Amazon S3) data lakes, Amazon Redshift data warehouses, and other relational data sources (like Amazon RDS for SQL Server), helping you build powerful analytics, machine learning (ML), and generative AI applications on a single copy of data. SageMaker Lakehouse provides integrated access controls and supports open source Apache Iceberg for data interoperability and collaboration. With SageMaker Lakehouse, you can build an open lakehouse on your existing data investments without changing your data architecture.

To support the integration between SageMaker Lakehouse and Amazon RDS, you can create a new SQL Server data source from Amazon SageMaker Unified Studio, which in turn is persisted as a federated data source in the lakehouse. The following diagram illustrates the key services involved in the solution.


Prerequisites

We assume you have familiarity with navigating the AWS Management Console. For this post, you also need the following resources in your AWS account:

  • An RDS for SQL Server instance
  • The sample wwi database restored on that instance (see the reference and example that follow)

Refer to Importing and exporting SQL Server databases using native backup and restore for instructions to perform a database restore into an RDS for SQL Server instance.
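
As a concrete illustration, the native restore on Amazon RDS for SQL Server is driven by stored procedures rather than a standard RESTORE DATABASE statement. The following sketch assumes your instance has the SQLSERVER_BACKUP_RESTORE option enabled and that the S3 ARN points to your own backup file (the bucket and file names here are placeholders):

    -- Kick off a native restore of the wwi sample database from a backup file in Amazon S3.
    -- The S3 ARN is a placeholder; replace it with the location of your own .bak file.
    exec msdb.dbo.rds_restore_database
        @restore_db_name = 'wwi',
        @s3_arn_to_restore_from = 'arn:aws:s3:::your-backup-bucket/wwi.bak';

    -- Check the progress of the restore task.
    exec msdb.dbo.rds_task_status @db_name = 'wwi';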

Create a SageMaker Unified Studio domain

Before starting the onboarding process, verify that the VPC you choose to onboard the SageMaker domain meets the following requirements:

  • The VPC spans three or more Availability Zones
  • The VPC contains at least three private subnets

When the preceding requirements are met, proceed with the following steps:

  1. On the SageMaker console, choose Create a Unified Studio domain.
  2. Select Quick setup (recommended for exploration), then choose an appropriate option for VPC. For this post, we choose Create VPC to create a new VPC. When you choose Create VPC, you are redirected to the AWS CloudFormation console to create a new VPC named SageMakerUnifiedStudio-VPC. The stack takes a few minutes to deploy and incurs additional cost.

    Proceed to the following steps after the new VPC named SageMakerUnifiedStudio-VPC is created successfully.

  3. For Domain name, enter a name (for example, RDSLakeHouse).
  4. For Domain Execution role, keep the default selection AmazonSageMakerDomainExecution.
  5. For Domain Service role, keep the default selection AmazonSageMakerDomainService.
  6. For Data encryption, keep the Customize encryption settings (advanced) check box unselected.
  7. For Provisioning role, keep the default selection AmazonSageMakerProvisioning-<YourAccountNumber>.
  8. For Manage access role, keep the default Create and use a new service role option.
  9. For Virtual private cloud (VPC), choose the appropriate VPC.
  10. For Subnets, choose at least three subnets in different Availability Zones.
  11. In the Generative AI resources section, for Model provisioning role, keep the default selection AmazonSageMakerBedrockModelManagementRole-<YourAccountNumber>.
  12. For Model consumption role, keep the default option Use an Amazon SageMaker Unified Studio managed role for each model selected.
  13. Choose Continue.
  14. In the Create IAM Identity Center user section, enter an email address, first name, and last name.
  15. Choose Create domain.
  16. Choose the newly created SageMaker domain.
  17. On the domain details page, choose Open unified studio to launch SageMaker Unified Studio (follow the prompts to complete the two-factor authentication process).

Create a new SQL Server data source from SageMaker Lakehouse

After you configure a SageMaker domain with the appropriate VPC, create a new SQL Server data source using SageMaker Lakehouse built-in features to connect to the sample wwi database hosted on Amazon RDS for SQL Server. Complete the following steps:

  1. On the SageMaker Unified Studio landing page, choose Create project.
  2. Under Create project, provide the following information:
    1. For Project name, enter a name (for example, RDSSQLLakehouseDemo).
    2. For Description, enter an optional description (for example, Gen-AI with RDS SQL Server as a data source demo).
    3. Choose All capabilities for the project profile.
    4. Choose Continue.

  3. In the Customize blueprint parameters section, provide the requested information, then choose Continue.
    1. For Retention, select 180 from the drop-down menu.
    2. For glueDbName, enter glue_db.
    3. For redshiftDbName, enter dev.

  4. Review your selections and choose Create project. It may take a few minutes to create the project in SageMaker Unified Studio. Proceed to the next steps after the project is created successfully.
  5. On the RDSSQLLakehouseDemo project page, choose Data in the navigation pane to open SageMaker Lakehouse.
  6. Choose the plus sign to create a new data source.
  7. Choose Add connection, then choose Next.
  8. Choose Microsoft SQL Server as the connection type, then choose Next.
  9. Under Add data, provide the following information:
    1. For Name, enter a name (for example, lakehouse-rdssql-demo).
    2. For Description, enter an optional description (for example, RDS SQL Server data connection via SageMaker Lakehouse).
    3. For Host, enter your RDS for SQL Server instance endpoint.
    4. For Port, enter 1433 (unless your instance was configured with a different port).
    5. For Database, enter wwi.
    6. For JDBC URL Parameters, enter trustServerCertificate=true.
    7. Under Authentication, select Username and password, then enter your RDS for SQL Server username and password (as of this writing, there’s a UI issue with passwords that contain special characters).
    8. Choose Add data.

    The Add data feature connects to and scans your data source, infers its schema, and prepares the data for use in notebooks or pipelines. This can take a few minutes depending on data size, permissions, and environment readiness.

    After this task completes successfully, a new entry appears in the Lakehouse pane, under the default AwsDataCatalog data connection.

  10. Expand the lakehouse-rdssql-demo data connection to browse available schemas in the wwi database, expand the application schema, then choose the countries table.
  11. Choose the options menu (three vertical dots) next to the countries table and choose Query with Athena.

This action launches the query editor with a prepopulated Amazon Athena SQL query, which runs automatically. After a few seconds, a short list of 10 countries is displayed on the page, as shown in the following screenshot. To browse all countries, remove the limit 10 portion of the SQL statement, disable the Limit 100 option, and rerun the query.
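
For reference, the generated statement is roughly of the following form. The catalog and schema identifiers shown here are assumptions based on the connection name (lakehouse-rdssql-demo) and the wwi database used in this post; the exact identifiers appear in your own query editor.

    -- A sketch of the prepopulated Athena query; identifiers are assumptions based on
    -- the connection and database names used in this post.
    select * from "lakehouse-rdssql-demo"."application"."countries" limit 10;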

Generative AI RAG scenarios using SageMaker Lakehouse

After you have created a SageMaker project with a properly configured SQL Server data source, you’re ready for the generative AI portion of this demo. Complete the steps in the following sections to run generative AI RAG scenarios using the sample wwi SQL Server database as a source.

Image generation use case

For our first use case, we run a sample query whose results serve as the source data for an image generation example using a Stability AI FM. To run the query using the SageMaker Unified Studio query editor, complete the following steps:

  1. In SageMaker Unified Studio, choose Build from the menu bar, then choose Query Editor under Data Analysis & Integration.
  2. In the query editor, enter the following T-SQL command, then choose Run all to run the query:
    select top 10 c.customername as customername, sum(t.transactionamount) as totalsalesusd
    from sales.customertransactions t
    inner join sales.customers c on c.customerid = t.customerid
    group by c.customername
    order by totalsalesusd desc;

    After a few seconds, the top 10 list of customers will be shown.

  3. Choose Discover on the menu bar and choose Image and video playground under Generative AI.
  4. Under Image and video playground, for Configurations, choose Stability AI Stable Image Ultra 1.0.
  5. For Size, choose 1344 x 768.
  6. Keep the remaining default values.
  7. Enter the following prompt and then choose the play icon:
    “Two dogs are sitting and holding signs. The first dog, the biggest, is a Saint Bernard and it is holding a sign that reads "97053". The second dog, the smallest, is a short-haired Chihuahua and it is holding a sign that reads "56435". The size difference between the dogs is extreme. The first dog is much bigger than the second dog.”

An image similar to the one in the following screenshot will appear. Because these FMs are non-deterministic (they are statistical models), the image generated in your environment will likely be different. An interesting exercise is retrying the same prompt several times to see what the FM generates each time. This example showcases the SageMaker Unified Studio image generation capabilities using specific data points from a SQL Server source (the “97053” and “56435” sales totals) as literal values extracted from the query and embedded in the prompt.
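
If you prefer to derive the two sign values instead of copying them from the result grid, the following sketch (built only on the tables used in the query above) returns the largest and smallest of the top 10 customer totals, rounded to whole dollars:

    -- Sketch: pull the largest and smallest of the top 10 customer sales totals
    -- so they can be embedded in the image prompt as the two sign values.
    with top10 as (
        select top 10 sum(t.transactionamount) as totalsalesusd
        from sales.customertransactions t
        inner join sales.customers c on c.customerid = t.customerid
        group by c.customername
        order by totalsalesusd desc
    )
    select cast(round(max(totalsalesusd), 0) as int) as biggest_sign_value,
           cast(round(min(totalsalesusd), 0) as int) as smallest_sign_value
    from top10;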

Text generation use case

For the second use case, we run another sample query whose results serve as the source data for a text generation example using an Amazon Nova FM. Complete the following steps:

  1. In SageMaker Unified Studio, choose Build from the menu bar, then choose Query Editor under Data Analysis & Integration.
  2. In the query editor, enter the following T-SQL command, then choose Run all to run the query:
    select top 10 s.stockitemname as stockitemname, sum(il.quantity * il.unitprice) as totalsalesusd
    from sales.invoicelines il
    inner join warehouse.stockitems s on il.stockitemid = s.stockitemid
    group by s.stockitemname
    order by totalsalesusd desc;

    After a few seconds, the top 10 list of stock items will be shown.

  3. Download the result set by choosing the CSV icon.
  4. Open the CSV file, select the entire result set, and copy it to the clipboard for later use.
  5. Choose Build on the menu bar and choose Prompt under Machine Learning & Generative AI.
  6. Choose the Amazon Nova Pro FM.
  7. Keep the remaining default values.
  8. Enter the following prompt: “Write a summary sales report with a commentary on the top selling stock items based on the sales results included below: {{resultset2}}”.
  9. Edit the resultset2 variable Value field and enter the SQL Server result set obtained from the previous steps.
  10. Choose Run and wait for a few seconds for the summary sales report to generate.
    You should see a report similar to the one in the following screenshot.
  11. Copy the report, paste it into your preferred text editor, and examine its contents.

Because these FMs are non-deterministic (they are all statistical models), the report generated in your environment might be slightly different. An interesting exercise is retrying the same prompt several times to see what the FM generates in each case.
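
If you want the generated report to comment on sales volumes as well as revenue, a variation of the query such as the following sketch (using only the columns already referenced above) produces a richer result set that you can paste into the same {{resultset2}} variable:

    -- Sketch: add total quantity sold alongside revenue to give the FM more context.
    select top 10 s.stockitemname as stockitemname,
           sum(il.quantity) as totalquantity,
           sum(il.quantity * il.unitprice) as totalsalesusd
    from sales.invoicelines il
    inner join warehouse.stockitems s on il.stockitemid = s.stockitemid
    group by s.stockitemname
    order by totalsalesusd desc;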

Clean up

To avoid incurring future charges and remove the components you created while testing this use case, complete the following steps:

  1. On the Amazon RDS console, choose Databases in the navigation pane.
  2. Select the database you set up, and on the Actions menu, choose Delete.
  3. On the SageMaker console, choose View existing domains and choose your domain.
  4. Choose Delete.
  5. On the AWS CloudFormation console, select the SageMakerUnifiedStudio-VPC stack, then choose Delete to delete the components created by the stack.
  6. Confirm the deletion request and follow the required prompts.

Conclusion

In this post, we demonstrated how to integrate Amazon SageMaker Lakehouse with Amazon RDS for SQL Server to build powerful generative AI applications. Through practical examples, we explored how Retrieval Augmented Generation (RAG) can enhance foundation models, using Stability AI Stable Image Ultra 1.0 for image generation and Amazon Nova Pro for text generation. The solution’s straightforward setup process allows you to access and use data from your RDS for SQL Server database with minimal configuration effort. This post highlights how the seamless integration between these two AWS services can help accelerate your generative AI development journey. By combining their capabilities, you can quickly build and deploy sophisticated AI-powered solutions that use your existing data assets. We encourage you to try this integration in your own environment and explore how it can benefit your specific use cases. If you have any feedback or questions, please leave them in the comments section.


About the authors

Camilo Leon

Camilo is a Principal Solutions Architect at AWS specialized in databases and based in San Francisco, California. He works with AWS customers, providing architectural guidance and technical support for the design, deployment, and management of their AWS relational database workloads and business applications. In his spare time, he enjoys mountain biking, photography, and movies.

Minesh Chande

Minesh is a Senior Database Specialist Solutions Architect at AWS, focused on helping customers across diverse industries optimize their database solutions. He specializes in designing, migrating, and enhancing SQL Server workloads on managed database platforms such as Amazon RDS, Amazon RDS Custom, and Babelfish for Aurora PostgreSQL.

Kannan Iyer

Kannan is a Senior Solutions Architect at AWS specialized in in-memory databases and based in San Francisco, California. He works with AWS customers to provide architectural guidance and technical support for the design, development, deployment, and management of their AWS managed in-memory database workloads and business applications. In his spare time, he enjoys hiking.