AWS Database Blog

Overview and best practices of multithreaded replication in Amazon RDS for MySQL, Amazon RDS for MariaDB, and Amazon Aurora MySQL

Multithreaded replication (MTR) is a feature in MySQL that enhances binlog replication performance, particularly for high-throughput databases such as those managed in Amazon Aurora MySQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for MySQL and MariaDB. This technology is useful in standard replication scenarios as well as in modern operational practices, including Amazon RDS Blue/Green deployments, where it simplifies database upgrades and modifications with minimal downtime.

In this first post, we dive into the world of MySQL replication, with a special focus on parallel replication techniques. We start with a quick overview of how MySQL replication works, then explore the intricacies of multithreaded replication. We discuss key configuration options and best practices for optimization. In the second part, we will cover multithreaded replication monitoring.

MySQL binlog replication overview

The following image shows the high-level architecture of MySQL replication with single threaded replication.

MySQL replication architecture diagram

Let’s dive into each of the components shown in the preceding diagram, on both the source and the replica.

In the source

  • When binary logs are enabled and a DML (Data Manipulation Language), DCL (Data Control Language), or DDL (Data Definition Language) statement is executed on the source and committed, MySQL persists the statement or transaction, as events, to the binary log files. Once persisted, the replicas fetch the events through the dump thread.
  • Binary log dump thread – For each replica, the source database creates a thread to send the binary log contents to a replica when the replica connects. You can identify this thread in the output of SHOW PROCESSLIST on the source as the Binlog Dump thread.

In the replica

  • IO thread – The replica creates an IO (receiver) thread, which connects to the source and pulls the updates recorded in its binary logs. The IO thread reads the updates that the source’s Binlog Dump thread sends and copies them to local files that comprise the replica’s relay log. The state of this thread is shown as Replica_IO_running in the output of SHOW REPLICA STATUS. There is always a single IO thread, regardless of whether replication is single-threaded or multithreaded, and it rarely becomes a bottleneck in replication.
  • SQL (applier) thread – Reads the relay logs and applies the events to the replica’s database (as any other normal client would). There is only one SQL thread in single-threaded replication, which is the default before MySQL 8.0.27; there are multiple SQL threads in MTR.
  • If the replica has binary logs enabled, the SQL thread is also responsible for writing new binary logs to disk. This can cause replication lag due to the binlog overhead.

MTR overview

When MTR is enabled (by setting slave_parallel_workers or replica_parallel_workers to a value greater than 1), the SQL thread is divided into two types: a coordinator thread and multiple worker threads. The coordinator thread’s job is to read events from the relay log, analyze event dependencies, and then assign them to the worker threads for parallel execution. Event dependencies are determined in the source instance to track which events can be safely executed in parallel on replicas while maintaining data consistency. Worker or applier threads are the workhorses of the replication process, replaying the transactions on the replica. This structure allows for more efficient parallel processing of replication events.

In the following example, the parameter replica_parallel_workers is set to 4. When the query SELECT * FROM information_schema.processlist WHERE User='system user'; is executed on the read replica, you’ll find a total of six threads in the output: one IO thread, one coordinator thread, and four worker threads.

  • IO_THREAD: One per replication channel (Id: 419 in the following output)
  • SQL threads:
    • Coordinator thread: One per replication channel. (Id: 420 in the following output)
    • Worker threads: replica_parallel_workers per channel. (Id 421-424 in the following output)
select * from information_schema.processlist where User='system user';
+------+-----------------+--------------------+------+---------+--------+----------------------------------------------------------+-----------------------+
| Id   | User            | Host               | db   | Command | Time   | State                                                    | Info                  |
+------+-----------------+--------------------+------+---------+--------+----------------------------------------------------------+-----------------------+
|  419 | system user     | connecting host    | NULL | Connect | 494497 | Waiting for source to send event                         | NULL                  |
|  420 | system user     |                    | NULL | Query   |      9 | Replica has read all relay log; waiting for more updates | NULL                  |
|  421 | system user     |                    | NULL | Query   |      9 | Waiting for an event from Coordinator                    | NULL                  |
|  422 | system user     |                    | NULL | Query   | 494925 | Waiting for an event from Coordinator                    | NULL                  |
|  423 | system user     |                    | NULL | Query   | 494925 | Waiting for an event from Coordinator                    | NULL                  |
|  424 | system user     |                    | NULL | Query   | 494925 | Waiting for an event from Coordinator                    | NULL                  |
+------+-----------------+--------------------+------+---------+--------+----------------------------------------------------------+-----------------------+
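On MySQL 8.0, the same applier threads can also be inspected through the Performance Schema, which maps each worker to its replication channel. A query along these lines should work on recent versions (column availability can vary by minor version):

```sql
-- One row per applier worker thread, per replication channel
SELECT CHANNEL_NAME, WORKER_ID, THREAD_ID, SERVICE_STATE
FROM performance_schema.replication_applier_status_by_worker;
```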

How MTR works

MTR relies on replication dependency tracking. This system determines how transactions can be safely executed in parallel on replica servers while maintaining data consistency. MySQL provides two primary methods for this dependency tracking: COMMIT ORDER and WRITESET.

COMMIT_ORDER dependency tracking relies on group commit timing on the source server. The dependency information written by the replication source is represented using logical timestamps and persisted in the binary log events. There are two logical timestamps for each transaction to determine dependencies.

  • sequence_number – This is one for the first transaction in each binary log, two for the second transaction, and so on. The numbering restarts with one in each binary log file.
  • last_committed – This refers to the sequence_number of the most recently committed transaction found to conflict with the current transaction. This value is always less than the sequence_number.

The following code block shows a simplified binlog snippet after being decoded using mysqlbinlog. In this example, the transactions with sequence numbers 2347, 2348, 2349, and 2351 can execute in parallel in the replica because their last committed timestamps all point to a transaction before this set. Transaction with sequence number 2350 is dependent on 2348, so it can’t be replicated in parallel with 2347, 2348, 2349, and 2351.

#308741 15:23:45... last_committed=2345 sequence_number=2346
#308741 15:23:45... last_committed=2346 sequence_number=2347
#308741 15:23:45... last_committed=2346 sequence_number=2348
#308741 15:23:45... last_committed=2346 sequence_number=2349
#308741 15:23:45... last_committed=2348 sequence_number=2350
#308741 15:23:45... last_committed=2345 sequence_number=2351
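One way to reason about these two timestamps is as a logical interval: each transaction "holds" the range (last_committed, sequence_number], and two transactions may be applied in parallel when their ranges overlap, meaning both were uncommitted on the source at the same moment. The following is a minimal model of that rule, not MySQL code, using pairs from the snippet above:

```python
def can_run_in_parallel(t1, t2):
    """LOGICAL_CLOCK model: each transaction is the logical interval
    (last_committed, sequence_number]; two transactions can be applied
    in parallel on the replica when those intervals overlap."""
    (lc1, s1), (lc2, s2) = t1, t2
    return lc1 < s2 and lc2 < s1

# (last_committed, sequence_number) pairs from the binlog snippet
t2347, t2348, t2350 = (2346, 2347), (2346, 2348), (2348, 2350)

print(can_run_in_parallel(t2347, t2348))  # True: same commit window
print(can_run_in_parallel(t2348, t2350))  # False: 2350 must wait for 2348
```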

This approach excels in environments with highly concurrent workloads on the source server or when large group commit sizes are configured by adjusting the binlog_group_commit_sync_delay parameter, which we explore in detail later. However, this method’s effectiveness is heavily influenced by timing factors and doesn’t consider the actual independence of data access patterns. As a result, in real-world scenarios, the level of parallelism achieved is often more limited than one might initially expect.

WRITESET represents a more sophisticated approach to dependency tracking in MySQL replication, moving beyond the simple commit time window used in COMMIT ORDER to track actual data modifications at the row level. For each row modified by a transaction, MySQL generates a unique hash. The collection of these hashes for a transaction forms its write set. This tracking enables parallel execution of transactions on the replica if their write sets don’t overlap, regardless of their original execution order on the source, session ownership, or commit window timing. By focusing on real data dependencies rather than temporal relationships, this method can substantially enhance replication parallelization capabilities on replica servers. This can result in improved replication performance and throughput compared to COMMIT ORDER. However, there are cases where WRITESET doesn’t outperform COMMIT ORDER, which we explore in detail in the best practices section.
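Conceptually, WRITESET reduces to a set-intersection test: hash the identifying key of every row a transaction modifies, and let two transactions run in parallel when their hash sets are disjoint. The following sketch only illustrates the idea; the actual server hashes encoded primary and unique key values with XXHASH64 or MURMUR32, and the schema, table, and row values here are hypothetical:

```python
import hashlib

def row_hash(schema, table, pk_value):
    # Stand-in for MySQL's internal key hashing (the server uses
    # XXHASH64/MURMUR32 over the encoded key, not SHA-1).
    return hashlib.sha1(f"{schema}.{table}.{pk_value}".encode()).hexdigest()

def write_set(rows):
    """rows: iterable of (schema, table, pk_value) tuples modified by a transaction."""
    return {row_hash(*row) for row in rows}

def can_apply_in_parallel(ws1, ws2):
    # Independent transactions have non-overlapping write sets.
    return ws1.isdisjoint(ws2)

txn_a = write_set([("shop", "orders", 1), ("shop", "orders", 2)])
txn_b = write_set([("shop", "orders", 3)])
txn_c = write_set([("shop", "orders", 2)])

print(can_apply_in_parallel(txn_a, txn_b))  # True: disjoint rows
print(can_apply_in_parallel(txn_a, txn_c))  # False: both modify orders row 2
```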

MTR configurations and best practices

MTR is supported in RDS for MySQL 5.7, 8.0, and 8.4; RDS for MariaDB 10.0.5 and higher; Aurora MySQL version 3; and Aurora MySQL version 2.12.1 and higher. However, to fully benefit from MTR, we strongly recommend upgrading both your source and target (replica) servers to more recent versions. As of this writing, optimal performance and operation are achieved with source servers running RDS for MySQL 5.7.44+, Aurora MySQL 2.12.5+, RDS for MySQL 8.0.35+, or Aurora MySQL 3.10.0+. For replica servers, we advise using RDS for MySQL 8.0.35+ and Aurora MySQL 3.10.0+. Now, let’s dive into the key configuration parameters and best practices for optimizing MTR in your RDS for MySQL, RDS for MariaDB, and Aurora MySQL environment. Note that parameter names might differ between versions due to updates in terminology.

In the source

binlog_transaction_dependency_tracking

The valid values for this parameter are COMMIT_ORDER, WRITESET, and WRITESET_SESSION. We already covered the meaning of COMMIT_ORDER and WRITESET. WRITESET_SESSION is the same as WRITESET, with one additional constraint: two transactions that were committed in the same client session can’t be applied in parallel.

From Improving the Parallel Applier with Writeset-based Dependency Tracking, you can see WRITESET and COMMIT_ORDER perform similarly under a highly concurrent workload on the source, but WRITESET shows better performance in environments with lower concurrency. This is particularly relevant for real-world applications, which typically have lower concurrent DML levels than artificial benchmark tests such as sysbench, which are designed to run at higher thread counts in most cases. We recommend setting this parameter to WRITESET (or WRITESET_SESSION if your application requires it).

There are cases where WRITESET can’t be used and MySQL falls back to non-writeset dependency tracking, including:

  • Tables without primary or unique keys – The WRITESET dependency tracking method relies on the ability to uniquely identify modified rows. Without primary or unique keys, the system cannot accurately track the full set of changes made by a transaction. It’s important to note that regardless of using WRITESET or not, all InnoDB tables should have an explicit primary or unique key to avoid potential performance issues. See primary key optimization for more details.
  • Transactions with DDL statements – DDL statements such as CREATE TABLE or ALTER TABLE modify the database schema rather than only the data. These schema changes aren’t easily tracked in the same way as regular data modifications, which can impact the effectiveness of replication dependency tracking. As a best practice, it’s recommended to minimize or avoid DDL operations, if possible, during periods of high DML activity. This not only benefits replication performance but also aligns with general database management best practices.
  • Transactions accessing parent tables in foreign key relationships – When a transaction modifies data in a child table that has a foreign key relationship, the changes to the parent table might not be fully captured in the write sets, potentially leading to incomplete dependency tracking.
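To spot tables that trigger the first fallback case, you can look for base tables without a primary key using information_schema. The following query is one common way to write this check, excluding the system schemas:

```sql
-- Base tables without a PRIMARY KEY (these defeat WRITESET tracking)
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON  c.table_schema = t.table_schema
       AND c.table_name = t.table_name
       AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_type IS NULL
  AND t.table_schema NOT IN
      ('mysql', 'sys', 'information_schema', 'performance_schema');
```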

binlog_transaction_dependency_history_size

This parameter sets an upper limit on the number of row hashes that are kept in memory and used for looking up the transaction that last modified a given row. When this number of hashes has been reached, the history is purged. You can increase this parameter on large instance classes such as 4xlarge or larger. However, setting this value too high can cause performance issues, because it can lead to excessive memory consumption and high CPU usage for tracking dependencies, so test carefully when configuring it so you’re aware of the trade-offs.

binlog_format

This system variable sets the binary logging format and can be STATEMENT, ROW, or MIXED. Note that binlog_format is deprecated as of MySQL 8.0.34 and is subject to removal in a future version of MySQL. This implies that support for logging formats other than row-based is also subject to removal in a future release. In RDS for MySQL, RDS for MariaDB, and Aurora MySQL, we recommend using binlog_format=ROW for performance and compatibility reasons.

binlog_group_commit_sync_delay

The binlog_group_commit_sync_delay parameter determines how many microseconds the binary log commit waits before synchronizing the binary log file to disk. This setting is particularly effective when used with binlog_transaction_dependency_tracking = COMMIT_ORDER. It has no effect on dependency tracking when binlog_transaction_dependency_tracking = WRITESET. By intentionally introducing a slight delay in the commit process on the source, MySQL can batch more writes into each group commit, creating larger commit windows and increasing parallel execution on replicas. However, this optimization comes with a trade-off: It also increases transaction latency on the source server, which might impact client application performance. We strongly recommend thorough testing to find the optimal value that balances improved replication performance against acceptable transaction latency for your specific use case.
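In RDS and Aurora, this parameter is changed through the DB parameter group rather than with SET GLOBAL. For example, with the AWS CLI, where the parameter group name and the 50-microsecond value are placeholders to adjust for your environment:

```shell
aws rds modify-db-parameter-group \
    --db-parameter-group-name my-source-params \
    --parameters "ParameterName=binlog_group_commit_sync_delay,ParameterValue=50,ApplyMethod=immediate"
```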

Transaction size

In MySQL, maintaining appropriate transaction sizes is important for optimal performance. Large transactions can lead to several issues: They hold locks for extended periods, potentially blocking other operations and reducing overall system throughput, while also increasing RollbackSegmentHistoryListLength, which can impact overall database performance. In the context of MTR, large transactions might significantly limit parallelism on replica servers, leading to replication lag, which is discussed in detail in the replica_pending_jobs_size_max section. To mitigate these issues, it’s recommended to avoid large transactions whenever possible. For operations that must modify substantial amounts of data, consider breaking them into smaller, manageable chunks that can be processed in separate transactions.
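As a sketch of the chunking idea, a single large delete can be rewritten as a series of bounded statements, each committed as its own transaction on the server. The table name and batch size below are hypothetical:

```python
def batched_delete_statements(table, ids, batch_size=1000):
    """Split one large DELETE into many small statements; on a real
    server each statement would run and commit as its own transaction."""
    for i in range(0, len(ids), batch_size):
        chunk = ids[i:i + batch_size]
        id_list = ", ".join(str(x) for x in chunk)
        yield f"DELETE FROM {table} WHERE id IN ({id_list});"

stmts = list(batched_delete_statements("orders_archive", list(range(2500))))
print(len(stmts))  # 3 bounded statements instead of one 2,500-row transaction
```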

In the replica

binlog_format

Aurora MySQL doesn’t require binary logs to be enabled for intra-cluster replication or backup and recovery, so if you don’t have downstream replication, we recommend setting binlog_format to OFF in the DB cluster parameter group to disable binary logging on the Aurora MySQL replica cluster. This helps boost replication performance. Setting binlog_format to OFF resets the binlog_format session variable to the default value of ROW in the database. For RDS for MySQL or RDS for MariaDB replica servers, you can achieve similar performance gains by disabling automated backups, which in turn disables binary logging. However, this comes with the trade-off that it removes the ability to perform point-in-time recovery for the replica.

replica_parallel_type or slave_parallel_type

The possible values are:

  • LOGICAL_CLOCK – Transactions are applied in parallel on the replica, based on timestamps that the replication source writes to the binary log. Dependencies between transactions are tracked based on their logical timestamps to provide additional parallelization where possible.
  • DATABASE – Transactions that update different databases are applied in parallel. This value is only appropriate if data is partitioned into multiple databases that are being updated independently and concurrently on the source. There must be no cross-database constraints because such constraints might be violated on the replica.

We recommend setting LOGICAL_CLOCK because it provides more granular dependency tracking and better parallelism, unless you have a specific use case to use DATABASE. In all RDS for MySQL 8.0 and Aurora MySQL 3 versions, the default value is LOGICAL_CLOCK.

replica_parallel_workers or slave_parallel_workers

Setting this parameter to a value higher than 1 enables MTR on the replica and sets the number of applier threads for executing replication transactions in parallel. Prior to RDS for MySQL 8.0.27 and Aurora MySQL 3.04.0, the default value of this system variable is 0, so replicas use a single worker thread by default. Beginning with RDS for MySQL 8.0.27 and Aurora MySQL 3.04.0, the default value is 4, which means that replicas are multithreaded by default. The optimal value for replica_parallel_workers depends on your specific hardware and workload characteristics. If your server is larger than 2xlarge, you can start with four worker threads, but you need to monitor and tune this parameter, which we will cover in our next post. We don’t recommend setting this parameter too high, because beyond a certain point it can even reduce performance due to concurrency effects such as lock contention.

Setting replica_parallel_workers has no immediate effect. You need to restart replication by using the mysql.rds_stop_replication and mysql.rds_start_replication stored procedures. Tables without primary keys, which usually harm performance as mentioned previously, might have an even greater negative performance impact on replicas with replica_parallel_workers greater than 1.
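On an RDS or Aurora read replica, that restart sequence looks like the following, run as the master user on the replica (the worker count itself is changed beforehand in the DB parameter group):

```sql
-- after updating replica_parallel_workers in the DB parameter group:
CALL mysql.rds_stop_replication;
CALL mysql.rds_start_replication;
-- confirm the applier threads restarted:
SHOW REPLICA STATUS\G
```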

replica_pending_jobs_size_max

This variable sets the maximum memory available for queues holding events yet to be applied by worker threads on replica servers. The value of this variable is a soft limit and can be set to match the normal workload. If an unusually large event exceeds this size, the transaction is held until all the worker threads have empty queues and then processed. All subsequent transactions are held until the large transaction has been completed. Although this ensures event processing, it can lead to significantly reduced worker concurrency, hence replication lag. Therefore, it’s important to set this parameter high enough to handle your typical event sizes. Additionally, on multithreaded replicas, this value should be at least equal to, if not greater than, the max_allowed_packet setting on the source to prevent replication failures due to large packets, as mentioned in the MySQL documentation.

aurora_binlog_replication_sec_index_parallel_workers (Aurora MySQL only)

In Aurora MySQL version 3.06 and higher, you can improve performance for binary log replicas when replicating transactions for large tables with more than one secondary index. This feature introduces a thread pool to apply secondary index changes in parallel on a binlog replica. The feature is controlled by the aurora_binlog_replication_sec_index_parallel_workers DB cluster parameter, which controls the total number of parallel threads available to apply the secondary index changes. The parameter is set to 0 (disabled) by default. Enabling this feature doesn’t require an instance restart. To enable this feature, stop ongoing replication, set the desired number of parallel worker threads, and then start replication again.

aurora_in_memory_relaylog (Aurora MySQL only)

Aurora MySQL version 3.10 extends the in-memory relay log cache support for binary log replicas. This feature, first introduced in version 3.05, can improve binary log replication throughput by up to 40%. The in-memory relay log cache is enabled by default for single-threaded binary log replication, multi-threaded replication with GTID auto-positioning enabled, and starting with version 3.10, it’s also enabled for multi-threaded replication with replica_preserve_commit_order = ON (even without GTIDs).

replica_preserve_commit_order or slave_preserve_commit_order

When replica_preserve_commit_order (or slave_preserve_commit_order) is set to ON (the default in MySQL 8.0.27 and later), transactions are executed and committed on the replica in the same order as they appear in the replica’s relay log. This prevents gaps in the sequence of transactions that have been executed from the replica’s relay log and preserves the same transaction history on the replica as on the source. When replica_preserve_commit_order=ON is set, the executing worker thread waits until all previous transactions are committed before committing. Although a given thread is waiting for other worker threads to commit their transactions, it reports its status as Waiting for preceding transaction to commit. This might slightly reduce the level of parallelism in the replica, but it’s recommended for data consistency, especially for applications that depend on commit order.

Lastly, to increase the resilience of your database against unexpected halts, we recommend that you enable global transaction identifier (GTID) replication on the source and allow GTIDs on the replica. To allow GTID replication, set gtid_mode to ON_PERMISSIVE on both the source and replica. For more information about GTID-based replication, see Using GTID-based replication.

Conclusion

In this post, we discussed how MySQL replication and MTR works and its key configuration options and best practices. In our next post, we will discuss methods to effectively monitor parallel replication performance.


About the authors

Huy Nguyen

Huy is a Senior Engineer in AWS Support. He specializes in Amazon RDS and Amazon Aurora. He provides guidance and technical assistance to customers, enabling them to build scalable, highly available, and secure solutions in the AWS Cloud.

Arun Gadila

Arun is a Cloud Support Database Engineer II at AWS with over 3.5 years of experience specializing in RDS for MySQL, Aurora MySQL, and RDS for SQL Server. He is recognized as a Subject Matter Expert (SME) in both RDS for MySQL and Aurora MySQL, and is dedicated to helping customers optimize their database environments and resolve complex challenges across AWS’s managed database offerings.

Marc Reilly

Marc is a Senior Database engineer on the Amazon Aurora MySQL team.