Split Read and Writes in Magento 2 using AWS RDS and ProxySQL

Hello,

I have been exploring options lately to scale MySQL and tested this solution. In my tests, I am using small db.t2.medium AWS RDS cluster with Write and Read replica.

For test purposes following components were used:
- AWS account setup of RDS with public access with IP-only limitation.
- Magento 2 gitpod instance (https://github.com/nemke82/magento2gitpod) started.
- Modified m2-install.sh file to connect to Remote AWS RDS environment and install fresh Magento 2 install with Performance fixtures data (small).
- ProxySQL installed on the Gitpod environment and configured

Before you start, you have to create a test database. In my example:

mysql -u nemanja -h nemanja-instance-1.cusgvuriflp3.us-east-1.rds.amazonaws.com -pRandomPassword123 -e 'create database nemanja;'  

Then we can add privileges to that database:

GRANT ALL PRIVILEGES ON nemanja.* TO 'nemanja'@'%';  
FLUSH PRIVILEGES;  
exit;  

Modify m2-install.sh (if you are testing on the Magento 2 Gitpod platform) file with the new MySQL hostname endpoint, username, and password and execute the installation.

Once the installation of fresh Magento 2 is ready and data is in your AWS RDS instance, we will proceed with ProxySQL configuration.

To set up ProxySQL to split SELECT queries between an AWS RDS master instance and its read replicas, follow these steps:

Install ProxySQL
Install ProxySQL on an EC2 instance or a server within your VPC, so it has access to your RDS instances.

URL: https://proxysql.com/documentation/installing-proxysql/

Configure ProxySQL
Edit the ProxySQL configuration file, usually located at /etc/proxysql.cnf, and configure it according to your needs. An example configuration is shown below:

datadir="/var/lib/proxysql"  
admin_variables=  
{
  admin_credentials="admin:admin"
  mysql_ifaces="0.0.0.0:6032"
}
mysql_variables=  
{
  threads=4
  max_connections=2048
  default_query_delay=0
  default_query_timeout=36000000
  have_compress=true
  poll_timeout=2000
  interfaces="0.0.0.0:6033"
  default_schema="information_schema"
  stacksize=1048576
  server_version="5.7.27"
  connect_timeout_server=3000
  monitor_history=600000
  monitor_connect_interval=60000
  monitor_ping_interval=10000
  ping_interval_server_msec=10000
  ping_timeout_server=200
  commands_stats=true
  sessions_sort=true
}

Start the ProxySQL service:

sudo systemctl start proxysql  

Next task is to Configure RDS instances in ProxySQL.

Let's access the ProxySQL admin interface:

mysql -u admin -p -h 127.0.0.1 -P 6032 --prompt='ProxySQLAdmin> '  

Default password is "admin" for Fresh ProxySQL installation.

Add your RDS master and read replica instances:

-- Add master instance
INSERT INTO mysql_servers(hostgroup_id, hostname, port)  
VALUES (0, 'your_master_instance_endpoint', 3306);

-- Add read replica instance
INSERT INTO mysql_servers(hostgroup_id, hostname, port)  
VALUES (1, 'your_read_replica_instance_endpoint', 3306);

LOAD MYSQL SERVERS TO RUNTIME;  
SAVE MYSQL SERVERS TO DISK;  

Here we start with one master and 1 read replica endpoint. We can scale later and add more easily.

Set up query rules

The next step is to create rules to route SELECT queries to the read replica:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)  
VALUES (1, 1, '^SELECT.*FOR UPDATE$', 0, 1),  
       (2, 1, '^SELECT', 1, 1);             

LOAD MYSQL QUERY RULES TO RUNTIME;  
SAVE MYSQL QUERY RULES TO DISK;  

-- Send 'SELECT ... FOR UPDATE' to the master
-- Send other SELECTs to the read replica
You can extend this with anything you think could benefit route SELECT queries to the read replica.

Configure user credentials

Add the database user credentials to ProxySQL:

INSERT INTO mysql_users(username, password, default_hostgroup)  
VALUES ('your_username', 'your_password', 0);

LOAD MYSQL USERS TO RUNTIME;  
SAVE MYSQL USERS TO DISK;  

We will use the same username and password we used to define when creating the AWS RDS cluster. I advise using anything except the "admin" username, as that is the default username for ProxySQL, which we could change later, but is not crucial in the current proof-of-concept demo.

Monitor and adjust
Monitor ProxySQL's performance and adjust settings as needed. You can add more read replicas and change the load balancing settings for better performance.

Start by adjusting the env.php file. In the 'host' field, enter:

127.0.0.1:6033  

We could experiment here by adding Socket instead of a loopback which I think should optimize this area better.

Next, adjust the 'username' and 'password' fields with the one you created above.

Now, your application should connect to ProxySQL on port 6033 instead of directly connecting to the RDS instances. ProxySQL will route SELECT queries to the read replica and other write queries to the master instance.

-- Horizontally scaling Read replicas --
To add more read replicas and adjust the load balancing settings for better performance, follow these steps:

Add more read replicas in AWS RDS
First, create additional read replicas of your master RDS instance from the AWS Management Console or the AWS CLI.

Add read replicas to ProxySQL
Connect to the ProxySQL admin interface:

mysql -u admin -p -h 127.0.0.1 -P 6032 --prompt='ProxySQLAdmin> '  

Add the new read replica instances to ProxySQL, assigning them to the same hostgroup (1 in this example):

INSERT INTO mysql_servers(hostgroup_id, hostname, port)  
VALUES (1, 'your_additional_read_replica_instance_endpoint_1', 3306),  
       (1, 'your_additional_read_replica_instance_endpoint_2', 3306),
       ...;

LOAD MYSQL SERVERS TO RUNTIME;  
SAVE MYSQL SERVERS TO DISK;  

-- Add additional read replica instances

Configure load balancing settings
By default, ProxySQL uses the round-robin algorithm to balance the load among the read replicas. You can adjust the weight of each server to control the traffic distribution. A higher weight means more traffic will be sent to that server.

To modify the weight of a read replica, update the weight column in the mysql_servers table:

UPDATE mysql_servers SET weight = new_weight  
WHERE hostgroup_id = 1 AND hostname = 'your_read_replica_instance_endpoint';

LOAD MYSQL SERVERS TO RUNTIME;  
SAVE MYSQL SERVERS TO DISK;  

Here we can say for readreplicainstance_endpoint. We will put 50, then next one, we can put 50 if two read replicas are added. Overall that makes them split traffic even. Monitor the performance of your read replicas and adjust their weights accordingly. You can use the ProxySQL stats schema to gather information about traffic distribution and query performance.

alt

For example, to view the traffic distribution among the read replicas, you can run:

select * from stats_mysql_connection_pool;  

You can see several queries sent and other useful data:
alt

Adjust the read replica weights based on the performance metrics and your application's requirements. Continue to monitor and fine-tune the load balancing settings to achieve optimal performance.

Remember that adding more read replicas may increase costs but can help distribute read traffic and improve performance for read-heavy workloads. Ensure you monitor your instances' resource usage to balance performance and price. All the above can be partially or fully automated with a systemd or cron task checking the state of the AWS RDS server and then scaling up replicas based on selected patterns.

We can also enable ProxySQL UI and watch stats in the browser.
Ref article: https://proxysql.com/documentation/http-web-server/

Connect to the ProxySQL admin interface:

mysql -u admin -p -h 127.0.0.1 -P 6032 --prompt='ProxySQLAdmin> '  

Then:

SET admin-web_enabled='true';  
LOAD ADMIN VARIABLES TO RUNTIME;  

I usually use SSH tunnel to connect and get to the UI, for example:

ssh -p <SSHport> -L 6080:127.0.0.1:6080 <SSHuser>@<SSHhost>  

Then you can open up your favorite browser and visit https://127.0.0.1:6080
username and password by default are admin:admin

Where are problems here?

A typical issue with Indexer is the following:

Product Price index process error during indexation process:  
SQLSTATE[Y0000]: <<Unknown error>>: 9006 ProxySQL Error: connection is locked to hostgroup 0 but trying to reach hostgroup 1, query was: SELECT `i`.`entity_id`, `o`.`option_id` FROM `catalog_product_index_price_temp` AS `i`  
 INNER JOIN `catalog_product_entity` AS `e` ON e.entity_id = i.entity_id
 INNER JOIN `catalog_product_option` AS `o` ON o.product_id = e.entity_id
Catalog Search index process error during indexation process:  
SQLSTATE[Y0000]: <<Unknown error>>: 9006 ProxySQL Error: connection is locked to hostgroup 0 but trying to reach hostgroup 1, query was: SELECT `indexer_state`.* FROM `indexer_state` WHERE (`indexer_state`.`indexer_id`='catalogsearch_fulltext')  

To fix:
As is explained in the documentation

https://proxysql.com/documentation/global-variables/mysql-variables/#mysql-setquerylockonhostgroup

So to resolve this issue on the single primary server on Group replication, try to do this.

Log into ProxySql admin monitor.

set mysql-set_query_lock_on_hostgroup=0;  
load mysql variables to runtime;  
save mysql variables to disk;  

Not mandatory, but the ProxySql service can be restarted to check if variables are saved on disk. Every setting here was done with "hot-reload". The advantage of ProxySQL service is that we can use proxysql.cnf file and add them as default instead.

Stress testing this just phase one. I will update you with additional how time goes on. Using the tool mysqlslap
(https://dev.mysql.com/doc/refman/8.0/en/mysqlslap.html), I've tried to stress/load test this with 100 concurrent users sending some generic SELECT query to see how READS are spread across the board. It was to select all Quote Items:

gitpod /workspace/magento2gitpod $ mysqlslap -unemanja -P6033 -p -h127.0.0.1  --concurrency=100 --iterations=20 --create-schema=nemanja --query="SELECT * from quote_item" --verbose  
Enter password:  

Benchmark
The average number of seconds to run all queries: 0.369 seconds
Minimum number of seconds to run all queries: 0.218 seconds
Maximum number of seconds to run all queries: 1.144 seconds
Number of clients running queries: 100
The average number of queries per client: 1

Next, Benchmark using the same query was a lot faster since the query entered Query Cache, which ProxySQL provides by default even if AWS RDS is 8.0.26 used.

Benchmark
The average number of seconds to run all queries: 0.292 seconds
Minimum number of seconds to run all queries: 0.184 seconds
Maximum number of seconds to run all queries: 0.494 seconds
Number of clients running queries: 100
The average number of queries per client: 1

You can load separate terminals or screens (monitor) and watch how SELECT queries from the same Load/Stress test are spread across two Read replicas that I've created in this Concept Proof demo:

watch -n 1 "mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e 'select * from stats_mysql_connection_pool;'"  

I hope this article was helpful. Good luck!