In this investigation, we analyzed three solutions: MySQL with MHA (Master High Availability Manager), MySQL with Galera Replication which is synchronous data replication cross-node, and AWS RDS, a data solution from the Amazon promising higher availability and read scalability.
All these three platforms were only evaluated to know how fast the service would recover in case of any crash? And what is the performance while managing both concurrent read and write traffic and incoming writes.
These were then followed by an investigation on knowing how to implement a multi-region higher availability solution when requested by the customers?
Furthermore, this evaluation will then also be used to assist the customers in selecting the best but suitable HA solution.
MySQL + Galera was proven as the most efficient solution in the presence of read and write load. Consequently, it performed a full-failover with AWS-RDS in only 15 seconds as compared to 50 seconds. On the other hand, it took more than two minutes for MySQL with MHA.
Tests indicate that MySQL with MHA is the best platform because it is easy to manage read/write operations more efficiently two times faster than the MySQL Galera. But, RDS is consistently in the last place.
From the above tests, we are now in a state to answer a simple question "Which tool is the best for the job?"
If HA and low failover time are the major factors, then MySQL with Galera is the right choice. Otherwise, MySQL with MHA is aright choice if the focus is on performance and business capable enough to afford several minutes of downtime.
Finally, RDS is recommendable if there is a need for an environment with limited concurrent writes, notable scale in reads, and the need to cover the bursts of read requests in a high-selling season.
Why This Investigation?
The research and tests are done to answer questions for clarification and recommendations around the RDS and EC2. The main objective was to focus on real scenarios and numbers seen in any production environments.
Every possible thing was done to execute the tests consistently across different platforms.
Errors were pruned by the errors while executing tests on a different platform before the data collection. Also, it had scopes to identify the saturation limit which was unusual for every tested architecture.
During the real-time test execution, we repeated the test execution many times intending to identify and reduce all kinds of possible deviation.
Things to Answer:
This investigation was to answer the following things by platform
- Time to failover
- Service interruption time
- Lag in execution at saturation level
Ingest & Compliance test
- Execution time
- Rows inserted/sec
- Rows select/sec (compliance only)
- Rows delete/sec (compliance only)
The Description on MHA, Galera, and RDS
MHA solutions sit on top of the MySQL nodes always checking the status of every node, and using the custom scripts to manage the failover.
One important thing we must keep in mind is that MHA doesn't act like the “man in the middle,” and no connection is sent to the MHA Controller. Instead, the controller manages an entry point with a Virtual IP as an HAProxy setting.
At the MySQL level, the same MHA controller recognizes the failing master and elects the most updated new master. It also tries to re-adjust the gap between the failed master and the new master by using original binary logs only if it is available and accessible.
Scalability is provided through standard MySQL design, having one master in the read/write and several servers in the read mode. Replication is also asynchronous; therefore it easily lags while leaving the read nodes quite far behind the master.
MySQL with Galera Replication
MySQL + Galera works on the cluster of nodes sharing the same dataset.
It means that MySQL+Galera is a cluster of three or five MySQL instances that shares the same dataset, where the data synchronously is distributed between nodes.
The focus is more on the data, but not on the nodes sharing the same data and status. Transactions are distributed across all the active nodes representing the primary component.
These Nodes can leave and re-join the cluster by modifying the conceptual view which expresses the central component.
NOTE: Each node at the end of the transaction has the same data; given that the application will connect to any node and read or write data consistently.
As mentioned previously, replication is a virtual synchronous where data is locally validated, certified, and the conflicts are managed to keep the data internally consistent.
Additionally, the failover is more of an external need than the Galera one. It means the application can be set to connect only one node or on all the available nodes. If one of the nodes isn't possible, the application should utilize the others.
However, not all the applications have this out of the box function!
It is a common practice to add another layer with Virtual IP as HAProxy to manage and distribute the application connection across the nodes or else to use a single node as the main point of reference, then shifting to the others if needed.
But, the shift is limited to move the connection points from one node to another, in simple words Node A to Node B.
MySQL/Galera's write scalability is limited due to the capacity for managing the total amount of incoming writes by a single node. There is no write scaling while adding the nodes.
Therefore, MySQL/Galera is simply a write distribution platform and the reads can be performed consistently on every node.
RDS is based on the ease of management, data on disk, built-in backup, , etc. Also, Amazon states that RDS offers a better replication mechanism of~100 ms lag.
This architecture is based on the main node and the other number of nodes acting as a read/write node and read-only nodes respectively. When reading from the replica nodes, the replication is within ~100ms, safe and effective.
A read replica node can be distributed by availability zone (AZ), but it must reside in the same region.
The applications connect to an entry point that never changes. In case of failover, the complete internal mechanism moves the entry point from the failed node to a new master, including all the read replicas which will be perfectly aligned to the new master.
Data replication is at a low level and is directly pushed to a read data store from the read/write data store. And here is a situation where there should be a minimal lag (~100ms) and limited delay.
RDS replication is not synchronous, only one node is active at a time, and there no consistency in data validation or check.
In terms of scalability, RDS is not to write scaling, and the only way to scale the writes is by upgrading the master instance to a powerful one while the reads are also scaled by adding a new read replica.
What About The Multi-Region?
One of the increasing requests we receive is to design the architectures capable enough to manage the failover across regions.
There are many issues in replicating data across the regions such as data security, frame dimension modifications, etc. given we must use existing networks for that.
For the multi-region investigation, it is important to note that the only solution which offers internal replication cross-region is the Galera using Segments.
But, we usually never recommend using this solution while talking about the regions across countries and continents.
Galera is also the one which helps in optimizing asynchronous replications, using multiple nodes to replicate on some other cluster.
AWS-RDS and standard MySQL must rely on the basic asynchronous replication along with all the related limitations.
Architecture for the investigation
The investigation we conducted used several components which are given below:
- EIP = 1
- VPC = 1
- Subnets = 4 (1 public, 3 private)
- HAProxy = 6
- MHA Monitor (micro ec2) = 1
- NAT Instance (EC2) =1 (hosting EIP)
- DB Instances (EC2) = 6 + (2 stand by) (m4.xlarge)
- Application Instances (EC2) = 6
- EBS SSD 3000 PIOS
- RDS RDS node = 3 (db.r3.xlarge)
- Snapshots = (at regular intervals)
- MySQL_HA_failover - POC architecture
The application nodes connect to the databases either using an RDS entry point or HAProxy.
For MHA, the controller action was changed to act on an HAProxy configuration instead of a VIP (Virtual IP).
For Galera, the shift was driven after the recognition of the node status using HAProxy check functionality.
Meaning replication was distributed across several availability zones; no traffic from outside the VPC reached the databases.
ELB connected to the HAProxy instances is a right way to rotate across several HAProxy instances if additional HA layer is needed.
For the broader scope of the investigation, each application was hosting a local HAProxy by using the “by tile” approach. The ELB was tested in that phase which was dedicated to identifying errors and saturation, but not in the performance and HA tests.
Furthermore, the NAT instance was configured for allowing access to each HAProxy web interface in order to review statistics and the node status.
Tests Done By Us
We performed 3 different types of tests:
- High availability
- Data Ingest
- Data compliance
TEST 1: High Availability
This test was quite simple; we ran script while inserting, collecting the time of the execution command, storing the time of SQL execution with NOW(), returning the value to be printed, and lastly collecting the error codes from the MySQL command.
The result was as follows:
2015-09-30 21:12:49 2015-09-30 21:12:49 0
/\ /\ /\
| | |
Date time sys now() MySQL exit error code (bash)
Log from bash :
2015-09-30 21:12:49 2015-09-30 21:12:49 0
Log from bash:
2015-09-30 21:12:49 2015-09-30 21:12:49 0
Inside the table:
CREATE TABLE `ha` (
`a` int(11) NOT NULL AUTO_INCREMENT,
`d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`c` datetime NOT NULL;
PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=178 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
select a,d from ha;
| a | d |
| 1 | 2015-09-30 21:12:31 |
The HAProxy settings, for Galera were;
server node1 10.0.2.40:3306 check port 3311 inter 3000 rise 1 fall 2 weight 50
server node2Bk 10.0.1.41:3306 check port 3311 inter 3000 rise 1 fall 2 weight 50 backup
server node3Bk 10.0.1.42:3306 check port 3311 inter 3000 rise 1 fall 2 weight 10 backup
We ran the test scripts on different platforms without heavy load close to the MySQL/RDS saturation point.
High Availability Results
MySQL with MHA took around 2 minutes to perform the full failover.
Under stress, the master was ahead of the slaves, and replication lag was very significant such that failover with binlog application took too long to consider a valid option in the comparison to the other two.
This result was not at all a surprise but pushed us to analyze the MHA solution more independently because the behavior was completely diverging from the other two and wasn't comparable.
The more interesting behavior was between MySQL/Galera and RDS. MySQL/Galera stood consistently more efficient than the RDS, without or with the load.
It is worth mentioning; of the 8 seconds taken by the MySQL/Galera to perform failover, 6 seconds were due HAProxy settings having 2 set loops before executing failover and 3000 ms interval.
RDS performed a decent failover when the load was low, but on increasing load, the read nodes became less aligned with the write node and weren't ready for the failover.
ALTER SYSTEM CRASH [INSTANCE|NODE]
Also, while doing the tests, we had the opportunity to observe RDS replica lag using CloudWatch that reported a much higher lag value than the claimed ~100 ms.
In this case, we were getting almost 18 seconds of lag in the RDS replication which is much higher than ~100 ms!
Here, 2E16 has several decades of latency.
Another interesting metric which we collected was the latency between the moment of execution and the application sending requests.
Once more, MySQL/Galera managed the requests more efficiently. Also, with a high load and saturation level, MySQL/Galera took 61 seconds to serve the request while RDS took 204 seconds for the same operation.
It indicates how high can be the impact of load in case of saturation for the execution and response time. It's an important metric that must be observed well to decide when to scale up.
Exclude What is Not a Full Fit
As mentioned previously, this investigation was done to answer several questions about the high-availability, the failover time, the MySQL/MHA analysis, etc. We analyzed the performance of the MHA MySQL solution in conjunction to the others and now are presenting the complete information.
TEST 2: Ingest tests
It was done to know how the two platforms behaved in a significant amount of inserts.
We used IIbench with a single table and our own StressTool which instead uses several configurable tables in addition to other configurable options such as:
- Batch inserts
- Insert rate
- Different access method to PK (simple PK or composite)
- Multiple tables and configurable table structure.
In the table definition, two benchmarking tools also differ:
CREATE TABLE `purchases_index` (
`transactionid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`dateandtime` datetime DEFAULT NULL,
`cashregisterid` int(11) NOT NULL,
`customerid` int(11) NOT NULL,
`productid` int(11) NOT NULL,
`price` float NOT NULL,
`data` varchar(4000) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`transactionid`),
KEY `marketsegment` (`price`,`customerid`),
KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `tbtest1` (
`autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
`a` int(11) NOT NULL,
`uuid` char(36) COLLATE utf8_unicode_ci NOT NULL,
`b` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`c` char(200) COLLATE utf8_unicode_ci NOT NULL,
`counter` bigint(20) DEFAULT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`partitionid` int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`strrecordtype` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`autoInc`,`date`),
KEY `IDX_a` (`a`),
KEY `IDX_date` (`date`),
KEY `IDX_uuid` (`uuid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 CHARSET=utf8 COLLATE=utf8_unicode_ci
IIbench was executed after using 32 threads for each application server while the StressTool completed running 16/32/64 on each of the application nodes finally, resulting in 96, 192, and 384 threads respectively where each is executing the batch insert with a 50 insert per batch (19,200 rows).
Ingest Test Results are as follows
FOR IIBENCH -
- Execution time : Time to insert ~305 ML rows, in one single table using 192 threads was more for RDS, less for Galera EC2.
- Rows Inserted/Sec :
- Insert Time: The result is once more quite clear that MySQL/Galera manages the load in one-third of the time taken by RDS. MySQL/Galera was found more consistent in the insert time taken along with the growth of the rows number inside the table.
The number of rows inserted was reduced faster in MySQL/Galera as compared to the RDS.
JAVA STRESS TOOL -
- Execution Time: This test mainly focused on multiple inserts (like IIbench), but by using an increasing number of multiple tables and threads. This test shows what could happen in real life if given parallelization and multiple entry points are more common than a single table insert in the relational database.
- In this test, RDS performs better, and the distance was less evident than the IIbench test.
Also, we can see that RDS was able to perform as a standard MySQL instance, but it didn't continue with the increase of concurrent threads. MySQL/Galera managed the load of 384 threads in the 1/3 of the time taken by RDS.
If we go deep, we can see that MySQL/Galera manages the commit phase part more efficiently along with the replication and data validation.
- Row inserted/ Commit Handler Calls/ Com Commit: In short, we can say that MySQL/Galera performed better than the RDS.
TEST 3: Compliance Tests
The compliance tests we ran used Tpcc-MySQL and StressTool with 200 warehouses and 15 parent tables, 15 child tables respectively generating a basic dataset of 100K entries via Select/Insert/Delete.
All tests were done with the saturated buffer pool.
Tests for Tpcc-MySQL used 32, 64, and 128 threads while the StressTool used 16, 32, and 64 threads.
Compliance tests results are as follows;
JAVA STRESS TOOL -
- Execution Time: In this test, the applications performed concurrent access and read/write action in rows on several tables. It is clear that MySQL/Galera processed the load more efficiently than the RDS. But, both platforms had a significant increase in the concurrent threads and execution time.
Both reached a saturation level with this test by using a total of 192 concurrent threads. The saturation was at a different moment, hitting a different resource during the 192 concurrent threads test.
But, in the case of RDS, it was CPU-bound with the replication lag, for MySQL/Galera the jam were i/o and flow control.
- Rows Insert and Read: With the rows managed, MySQL/Galera performed far better in terms of quantity of rows, but this trend significantly went down while increasing the concurrency. Both the read and write operations were affected while RDS was managing less in terms of volume and became more consistent while the concurrency increased.
- Com Select Insert Delete: If analyzed by the type of command, it is simply possible to identify that MySQL/Galera was affected more in the read operations while write operations showed less variation.
- Handlers Calls: In the write operation, RDS inserted less volume of rows but was more consistent with the increase in concurrency. It is probably because of the load which exceeded the capacity of the platform.
On the other hand, RDS acted at its limit. MySQL/Galera managed the load better with 2 times the performance of RDS, even if the increasing concurrency was negatively affecting the trend.
TEST 4: TPCC-MySQL
- Transactions: The Tpcc-MySQL emulated the CRUD activities of transaction users against N warehouses. In this test, we used 200 warehouses where each warehouse was having 10 terminals in addition, to 32, 64, and 128 threads.
Again, MySQL/Galera was consistently better than the RDS in terms of volume of transactions. Also, the average per-second results were always over around 2.6 times better than the RDS.
On the other hand, the RDS showed fewer fluctuations in serving the transactions having more consistent trends for each execution.
- Average Transactions: Galera has more average insert/sec by number of thread
What Are The Test Conclusion?
MHA excluded, the two other platforms managed the failover operation in a limited time frame, i.e., below 1 minute. Nevertheless, MySQL/Galera was more efficient and consistent, but the RDS didn't justify the episodes of replication lags. This result is a direct outcome of the synchronous replication.
In our opinion, the replication method in RDS was efficient but allowed the node misalignments which is not optimal upon the need to promote a read-only node in a read/write instance.
MySQL/Galera outperformed the RDS in all tests -- by the execution time, number of transactions, and rows managed. Also, scaling up the RDS didn't have the impact we were expecting. Actually, it didn't match the EC2 MySQL/Galera performance in less memory and CPUs.
Even when MHA is excluded, the performance achieved using standard MySQL was far better than MySQL/Galera or the RDS.
General Comment on the RDS
The RDS failover mechanism is not so different from the other solutions. In case of the crash of one node, another node gets elected as the new primary node based upon the "most up-to-date" rule.
Replication is a combination of the block device distribution and semi-sync replication, not affected by the delay in the copy of a block, but gets dispatched. The volume’s data will be fixed while copying the data from another location or volume hosting the correct data.
It is similar to the HDFS mechanism, where the latency in the operation is relevant. Also, if the primary node crashes during the election of a secondary, there will be a service interruption that can lie for up to 2 minutes.
The replication can take about ~100 ms and is dependent on the traffic in writes incoming to the primary server. We have already reported it, it is not true because replication takes a significantly longer time.
More the writes, farther the distance exist between the visible data in the primary node and the one in the replica. No matter how efficient is the replication mechanism, it is not a synchronous replication and doesn't guarantee consistent data reading by the transaction.
Finally, replication across regions is not even present in the design of the RDS solution and must rely on the standard replication.
The RDS is not performing scaling in writes, but perform it in the reads. More power and memory is equal to more writes. That said, we think the RDS is a precious solution that requires extensive read scaling (in/out), for rolling out a product in phase 1.
MHA performance Graphs
As previously mentioned MySQL/MHA acted significantly better than the other two solutions. Also, the IIBench test complete in 281 seconds against the 4039 seconds of Galera.
IIBench Execution Time
MySQL/MHA execution time (Ingest & Compliance)
The execution of the longest and compliance test in MySQL/MHA was three times faster than MySQL/Galera.
MySQL/MHA Rows Inserted (Ingest & Compliance)
The number of inserted rows are consistent with the execution time, being three times to the one allowed in the MySQL/Galera solution. MySQL/MHA managed the increasing concurrency by simple inserts in the case of concurrent read/write operations.