AWS Big Data Blog
Best practices for migrating Teradata BTEQ scripts to Amazon Redshift RSQL
When migrating from Teradata BTEQ (Basic Teradata Query) to Amazon Redshift RSQL, following established best practices helps ensure maintainable, efficient, and reliable code. While the AWS Schema Conversion Tool (AWS SCT) automatically handles the basic conversion of BTEQ scripts to RSQL, it primarily focuses on SQL syntax translation and basic script conversion. However, to achieve optimal performance, better maintainability, and full compatibility with the architecture of Amazon Redshift, additional optimization and standardization are needed.
The best practices that we share in this post complement the automated conversion supplied by AWS SCT by addressing areas such as performance tuning, error handling improvements, script modularity, logging enhancements, and Amazon Redshift-specific optimizations that AWS SCT might not fully implement. These practices can help you transform automatically converted code into production-ready, efficient RSQL scripts that fully use the capabilities of Amazon Redshift.
BTEQ
BTEQ is Teradata’s legacy command-line SQL tool that has served as the primary interface for Teradata databases since the 1980s. It’s a powerful utility that combines SQL querying capabilities with scripting features; you can use it to perform various tasks from data extraction and reporting to complex database administration. BTEQ’s robustness lies in its ability to handle direct database interactions, manage sessions, process variables, and execute conditional logic while providing comprehensive error handling and report formatting capabilities.
RSQL is a modern command-line client tool provided by Amazon Redshift and is specifically designed to execute SQL commands and scripts in the AWS ecosystem. Similar to PostgreSQL’s psql but optimized for the unique architecture of Amazon Redshift, RSQL offers seamless SQL query execution, efficient script processing, and sophisticated result set handling. It stands out for its native integration with AWS services, making it a powerful tool for modern data warehousing operations.
The transition from BTEQ to RSQL has become increasingly relevant as organizations embrace cloud transformation. This migration is driven by several compelling factors. Businesses are moving from on-premises Teradata systems to Amazon Redshift to take advantage of cloud benefits. Cost optimization plays a crucial role in these moves, because Amazon Redshift typically offers more economical data warehousing solutions with its pay-as-you-go pricing model.
Furthermore, organizations want to modernize their data architecture to take advantage of enhanced security features, better scalability, and seamless integration with other AWS services. The migration also brings performance benefits through columnar storage, parallel processing capabilities, and optimized query performance offered by Amazon Redshift, making it an attractive destination for enterprises looking to modernize their data infrastructure.
Best practices for BTEQ to RSQL migration
Let’s explore key practices across code structure, performance optimization, error handling, and Redshift-specific considerations that will help you create robust and efficient RSQL scripts.
Parameter files
Parameters in RSQL function as variables that store and pass values to your scripts, similar to BTEQ's .SET VARIABLE functionality. Instead of hardcoding schema names, table names, or configuration values directly in RSQL scripts, use dynamic parameters that can be modified for different environments (dev, test, prod). This approach reduces manual errors, simplifies maintenance, and supports better version control by keeping sensitive values separate from code.
Create a separate shell script containing environment variables:
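A minimal sketch follows; the file name params.sh and the specific values are illustrative:
#!/bin/bash
# params.sh - environment-specific parameters (illustrative values)
export STAGING_TABLE_SCHEMA=stg_schema    # schema holding the staging tables
export TARGET_TABLE_SCHEMA=tgt_schema     # illustrative target schema
export QUERY_GROUP=daily_load             # label used later with set query_group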
Then import these parameters into your RSQL scripts using:
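For example, assuming the parameter file is saved as params.sh alongside the RSQL wrapper script:
. ./params.sh    # load the exported variables so $STAGING_TABLE_SCHEMA, $QUERY_GROUP, and so on are available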
Secure credential management
For better security and maintainability, use JDBC or ODBC temporary AWS Identity and Access Management (IAM) credentials for database authentication. For details, see Connect to a cluster with Amazon Redshift RSQL.
Query logging and debugging
Debugging and troubleshooting SQL scripts can be challenging, especially when dealing with complex queries or error scenarios. To simplify this process, it’s recommended to enable query logging in RSQL scripts.
RSQL provides the --echo-queries option, which prints the executed SQL queries along with their execution status. By invoking the RSQL client with this option, you can track the progress of your script and identify potential issues.
rsql --echo-queries -D testiam
Here, testiam represents a DSN connection configured in odbc.ini with an IAM profile.
You can store these logs by redirecting the output when executing your RSQL script:
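A minimal sketch, assuming the SQL commands are saved in a file named my_script.sql (an illustrative name):
rsql --echo-queries -D testiam -f my_script.sql > my_script_$(date +%Y%m%d%H%M%S).log 2>&1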
With query logging enabled, you can examine the output and identify the specific query that caused an error or unexpected behavior. This information can be invaluable when troubleshooting and optimizing your RSQL scripts.
Error handling with incremental exit codes
Implement robust error handling using incremental exit codes to identify specific failure points. Proper error handling is crucial in a scripting environment, and RSQL is no exception. In BTEQ scripts, errors are typically handled by checking the error code and taking appropriate actions. However, in RSQL, the approach is slightly different. To help ensure robust error handling and straightforward troubleshooting, it's recommended that you implement incremental exit codes at the end of each SQL operation. The incremental exit code approach works as follows:
- After executing a SQL statement (such as SELECT, INSERT, UPDATE, and so on), check the value of the :ERROR variable.
- If the :ERROR variable is non-zero, it indicates that an error occurred during the execution of the SQL statement.
- Print the error message, error code, and additional relevant information using RSQL commands such as \echo and \remark.
- Exit the script with an appropriate exit code using the \exit command, where the exit code represents the specific operation that failed.
By using incremental exit codes, you can identify the point of failure within the script. This approach not only aids in troubleshooting but also allows for better integration with continuous integration and deployment (CI/CD) pipelines, where specific exit codes can trigger appropriate actions.
Example:
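A minimal sketch of this pattern; the schema, table names, and messages are illustrative:
select count(*) from $STAGING_TABLE_SCHEMA.customer_stg;
\if :ERROR <> 0
    \echo 'SELECT on customer_stg failed'
    \exit 1
\endif

insert into $TARGET_TABLE_SCHEMA.customer
select * from $STAGING_TABLE_SCHEMA.customer_stg;
\if :ERROR <> 0
    \echo 'INSERT into customer failed'
    \exit 2
\endif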
In the preceding example, if the SELECT statement fails, the script exits with an exit code of 1. If the INSERT statement fails, the script exits with an exit code of 2. By using unique exit codes for different operations, you can quickly identify the point of failure and take appropriate actions.
Use query groups
When troubleshooting issues in your RSQL scripts, it can be helpful to identify the root cause by analyzing query logs. By using query groups, you can label a group of queries that are run during the same session, which can help pinpoint problematic queries in the logs.
To set a query group at the session level, you can use the following command:
set query_group to $QUERY_GROUP;
By setting a query group, queries executed within that session will be associated with the specified label. This technique can significantly aid in effective troubleshooting when you need to identify the root cause of an issue.
Use a search path
When creating an RSQL script that refers to tables from the same schema multiple times, you can simplify the script by setting a search path. By using a search path, you can reference table names directly in your queries (for example, SELECT, INSERT, and so on) without specifying the schema name.
To set the search path at the session level, you can use the following command:
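set search_path to $STAGING_TABLE_SCHEMA;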
After setting the search path to $STAGING_TABLE_SCHEMA, you can refer to tables within that schema directly, without including the schema name.
For example:
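Assuming an illustrative table named customer_stg in that schema:
select count(*) from customer_stg;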
If you haven’t set a search path, you need to specify the schema name in the query, as shown in the following example:
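Using the same illustrative table name:
select count(*) from $STAGING_TABLE_SCHEMA.customer_stg;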
It's recommended to use fully qualified object names in RSQL scripts, but setting the search path prevents abrupt execution failures when a fully qualified name isn't provided.
Combine multiple UPDATE statements into a single INSERT
A BTEQ script might contain multiple sequential UPDATE statements for the same table. However, this approach can be inefficient and lead to performance issues, especially when dealing with large datasets, because of I/O-intensive operations.
To address this concern, it's recommended to combine all or some of the UPDATE statements into a single INSERT statement. This can be achieved by creating a temporary table, converting the UPDATE statements into a LEFT JOIN with the staging table using a SELECT statement, and then inserting the temporary table data into the staging table.
Example:
The existing BTEQ SQL statements in the following example first INSERT the data into staging_table from staging_table1 and then UPDATE the columns for the inserted data if certain conditions are satisfied:
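For illustration, the BTEQ logic might look like the following; the column names and conditions are illustrative:
INSERT INTO staging_table
SELECT id, col_a, col_b
FROM staging_table1;

UPDATE staging_table
SET col_a = 'Y'
WHERE id IN (SELECT id FROM staging_table2 WHERE flag = 1);

UPDATE staging_table
SET col_b = 'N'
WHERE id NOT IN (SELECT id FROM staging_table2);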
The following RSQL operation achieves the same result by first loading the data into the staging table and then applying the UPDATE logic through a single INSERT into a temporary table. After this, it truncates staging_table and inserts the temporary table staging_table_temp1 data into staging_table.
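A minimal sketch of the converted RSQL logic follows; column names, join keys, and conditions are illustrative, and the sketch assumes id is unique in staging_table2:
-- Step 1: create a temporary table with the same structure as the staging table
create temp table staging_table_temp1 (like staging_table);

-- Step 2: a single INSERT that applies the former UPDATE logic through a LEFT JOIN
insert into staging_table_temp1
select s.id,
       case when t.flag = 1 then 'Y' else s.col_a end as col_a,
       case when t.id is null then 'N' else s.col_b end as col_b
from staging_table s
left join staging_table2 t on s.id = t.id;

-- Step 3: swap the data back into the staging table
truncate table staging_table;
insert into staging_table
select * from staging_table_temp1;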
The following is an overview of the preceding logic:
- Create a temporary table with the same structure as the staging table.
- Execute a single INSERT statement that combines the logic of all the UPDATE statements from the BTEQ script. The INSERT statement uses a LEFT JOIN to merge data from the staging table and the staging_table2 table, applying the necessary transformations and conditions.
- After inserting the data into the temporary table, truncate the staging table and insert the data from the temporary table into the staging table.
By consolidating multiple UPDATE statements into a single INSERT operation, you can improve the overall performance and efficiency of the script, especially when dealing with large datasets. This approach also promotes better code readability and maintainability.
Execution logs
Troubleshooting and debugging scripts can be a challenging task, especially when dealing with complex logic or error scenarios. To aid in this process, it’s recommended to generate execution logs for RSQL scripts.
Execution logs capture the output and error messages produced during the script’s execution, providing valuable information for identifying and resolving issues. These logs can be especially helpful when running scripts on remote servers or in automated environments, where direct access to the console output might be limited.
To generate execution logs, you can execute the RSQL script from the Amazon Elastic Compute Cloud (Amazon EC2) machine and redirect the output to a log file using the following command:
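A minimal sketch, assuming the RSQL wrapper script is named rsql_script.sh (an illustrative name):
sh rsql_script.sh > rsql_script_$(date +%Y%m%d_%H%M%S).log 2>&1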
The preceding command executes the RSQL script and redirects the output, including error messages and debugging information, to the specified log file. Adding a time parameter to the log file name, as shown, produces a distinct file for each run of the RSQL script.
By maintaining execution logs, you can review the script’s behavior, track down errors, and gather relevant information for troubleshooting purposes. Additionally, these logs can be shared with teammates or support teams for collaborative debugging efforts.
Capture an audit parameter in the script
Audit parameters such as start time, end time, and the exit code of an RSQL script are important for troubleshooting, monitoring, and performance analysis. You can capture the start time at the beginning of your script and the end time and exit code after the script completes.
Here’s an example of how you can implement this:
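A minimal sketch; the rsql invocation and SQL file name are illustrative:
#!/bin/bash
start=$(date +%s)                     # capture the start time (epoch seconds)

rsql --echo-queries -D testiam -f load_customer.sql
rsqlexitcode=$?                       # capture the exit code returned by RSQL

end=$(date +%s)                       # capture the end time (epoch seconds)
echo "start=$start end=$end duration=$((end - start))s exit_code=$rsqlexitcode"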
The preceding example captures the start time in start=$(date +%s). After the RSQL code is complete, it captures the exit code in rsqlexitcode=$? and the end time in end=$(date +%s).
Sample structure of the script
The following is a sample RSQL script that follows the best practices outlined in the preceding sections:
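The following is a minimal sketch that ties the preceding practices together; the DSN, file, schema, and table names are illustrative and should be adapted to your environment:
#!/bin/bash
# Illustrative wrapper script combining the practices described in this post.

. ./params.sh                      # parameter file: load environment-specific variables
start=$(date +%s)                  # audit: capture the start time

rsql --echo-queries -D testiam <<EOF
set query_group to '$QUERY_GROUP';
set search_path to $STAGING_TABLE_SCHEMA;

select count(*) from customer_stg;
\if :ERROR <> 0
    \echo 'SELECT on customer_stg failed'
    \exit 1
\endif

insert into $TARGET_TABLE_SCHEMA.customer
select * from customer_stg;
\if :ERROR <> 0
    \echo 'INSERT into customer failed'
    \exit 2
\endif
EOF

rsqlexitcode=$?                    # audit: capture the exit code returned by RSQL
end=$(date +%s)                    # audit: capture the end time
echo "start=$start end=$end duration=$((end - start))s exit_code=$rsqlexitcode"
exit $rsqlexitcode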
Conclusion
In this post, we’ve explored crucial best practices for migrating Teradata BTEQ scripts to Amazon Redshift RSQL. We’ve shown you essential techniques including parameter management, secure credential handling, comprehensive logging, and robust error handling with incremental exit codes. We’ve also discussed query optimization strategies and methods that you can use to improve data modification operations. By implementing these practices, you can create efficient, maintainable, and production-ready RSQL scripts that fully use the capabilities of Amazon Redshift. These approaches not only help ensure a successful migration, but also set the foundation for optimized performance and straightforward troubleshooting in your new Amazon Redshift environment.
To get started with your BTEQ to RSQL migration, explore these additional resources:
- AWS Schema Conversion Tool User Guide
- Amazon Redshift best practices
- Amazon Redshift sample database scripts
About the authors
Ankur Bhanawat is a Consultant with the Professional Services team at AWS, based out of Pune, India. He holds AWS certifications in three areas and specializes in databases and serverless technologies. He has experience in designing, migrating, deploying, and optimizing workloads on the AWS Cloud.
Raj Patel is an AWS Lead Consultant for Data Analytics solutions based out of India. He specializes in building and modernizing analytical solutions. His background is in data warehouse architecture, development, and administration. He has been in the data and analytics field for over 14 years.