Horizontal Scaling

Amazon RDS vs Amazon EC2

When hosting Magento on AWS, RDS is often assumed to be the default database.

At luroConnect we do not use RDS, and we have saved thousands of dollars each month for many of our customers. This article is an RDS vs EC2 comparison focused on Magento, born out of migrating a customer from RDS to EC2.

The story was interesting. When we onboarded the customer over a year ago, the agency insisted that we use RDS. We had calls trying to convince them we could handle the database on EC2. The Magento database was over 500GB, with a large catalog and around 1,000 orders a day.

However, a year later, we were asked to see how we could reduce the AWS costs, and the RDS costs stuck out prominently. An RDS instance plus a read-only replica for a busy website can cost a lot! The agency was ready to try – provided we gave them a ~0 downtime transition and an option of ~0 downtime rollback to RDS.

The feedback when we moved to EC2? "We find uncached pages are served faster".

The migration learnings will lead to more articles, but this one focuses on why we use EC2 instead of RDS for Magento. Needless to say, we have built devops tools and processes that make this decision a no-brainer if you host on luroConnect.

It is about marketing, isn’t it? A feature you can sell at a premium, a feature not easy for competitors to emulate, a feature you may never use!

RDS is one of them! Anyone, with or without having worked with AWS, tells me it is a no-brainer to use RDS for your mariadb database – in our case for Magento websites. No questions, no arguments. I do get into arguments, and I am shown AWS documentation.

Of course RDS has features that are not easy to emulate. But AWS has been good with its documentation, so for those who care to read, it is possible to replicate them, at least where it matters.

From my perspective, RDS has these features:

  • Easy to change configuration values. The UI tells you clearly which parameters need a restart and which do not. Either way, the change is committed, so a restart will retain it. The last bit is crucial – especially when a restart is not needed. On a self-managed server, you can change a dynamic parameter and forget to commit it to the config file; one restart and you lose your change.
  • Atomic writes – not needing “double write buffers”. What are double write buffers? “This buffer was implemented to recover from half-written pages. This can happen in case of a power failure while InnoDB is writing a page (16KB = 32 sectors) to disk. On reading that page, InnoDB would be able to discover the corruption from the mismatch of the page checksum. However, in order to recover, an intact copy of the page would be needed.” Double write buffers are for safety, but they lead to a performance issue: “Both the checksum calculation and the double writing consume time and thus reduce the performance of page flushing. The effect becomes visible only with fast storage and heavy write load.” For Magento, this affects websites with a large catalog and higher-IOPS disks. AWS RDS gives atomic writes. But AWS also documents how this can be implemented on EC2. Please click here to access the AWS documentation.
  • Easy to create a read-only replica, reliably. If you have worked with mysql / mariadb, you know the issues with creating a read-only replica (slave). There is a lot of documentation, but when it actually comes to making a slave, there is always a doubt whether it will work. The key reason is getting a consistent snapshot or backup. Even using AWS snapshots, we have not found a way to do this reliably and without downtime – either you create a write lock, take (or start) a snapshot and release the lock, or you stop mysql, take (or start) a snapshot and restart mysql. Creating an RDS replica, on the other hand, works without downtime. Alternatively, a backup strategy also requires some locking – provided in the mysql command line. When using backups, we have always found it better to take them when indexing or setup upgrade is not running.
  • AWS RDS is also quicker in creating the replica – even though it may use a snapshot strategy. AWS snapshots take time based on the size of the (occupied) disk; for large disks this can take long.
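The EC2 alternative for atomic writes that the AWS documentation describes boils down to two my.cnf settings – shown below as a sketch, not a drop-in config. Only apply it if the underlying storage genuinely guarantees atomic 16KB page writes (for example, certain NVMe instance store volumes); the parameter names are standard mysql/mariadb settings:

```ini
# /etc/my.cnf — sketch only; verify atomic-write support first.
[mysqld]
# Bypass the OS page cache so writes hit the device directly.
innodb_flush_method = O_DIRECT
# Safe to disable ONLY when the storage guarantees 16KB atomic writes;
# on ordinary disks this risks half-written pages after a power failure.
innodb_doublewrite = 0
```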

  • Easy to add a proxy. Again, it is just an option to select and a proxy starts. There is no need to configure CPU/RAM for the proxy. However, there is no guide to tell you whether a proxy will give a performance improvement.

But let us look at the flip side – costs. This was part of an analysis we did for an existing customer in June 2025. If you use RDS, the only savings option is a Reserved Instance. EC2 gives a “lighter” commitment with a savings plan. (The RDS estimate is for single AZ, no proxy. All disks are 12000 IOPS, 500Mbps. All in the Ohio region.)

A 50% month-on-month cost difference cannot be ignored!

What do you give up on EC2? A few seconds of downtime during replica creation, and an external proxy (we use proxysql on EC2) has to be set up and configured. Managing configuration values with discipline can be accomplished by a shell script – with knowledge of which values update dynamically and which do not. Values are always written to the my.cnf file, ensuring a restart will keep new values.
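A sketch of what that discipline looks like in practice (the variable and value here are just examples): apply the dynamic change at runtime, then persist the identical value in my.cnf so the next restart keeps it.

```ini
# Step 1 (runtime change, takes effect immediately, lost on restart):
#   mysql -e "SET GLOBAL max_connections = 500;"
# Step 2 (persist the same value in /etc/my.cnf so a restart keeps it):
[mysqld]
max_connections = 500
```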

NOTE: We have had many devops engineers swear RDS is faster. That has not been our experience – for the same configuration. Open to discussion on this topic. RDS Aurora is a different product and a new blog for later!

Using AWS Autoscale “warm pools” to reduce costs

AWS Autoscale added a new feature, “Warm Pool”. Let us explore this feature and see how luroConnect uses it to reduce hosting costs.

The autoscale latency problem

Usually, AWS Autoscale launches a new server with the given AMI image based on the configured launch configuration or launch template. Launching a new server takes about 4 minutes or more. Say a scale-out event is configured to launch a server when the CPU across all autoscale instances exceeds 70% for 1 minute. Now, a sale promotion on facebook causes a surge in traffic that triggers this event. It takes AWS 4+ minutes to respond and add a new server. If during this 4-minute period the surge goes past 70% and reaches, say, 90-100%, visitors are likely to see a slowdown or even errors. This 4+ minute period is called the autoscale latency, and it plays a crucial role in designing the scale-out and scale-in parameters.
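As an illustration, a 70%-CPU scale-out trigger like the one described above can be expressed as a target tracking policy. The group and policy names below are placeholders; the JSON would be passed to `aws autoscaling put-scaling-policy --cli-input-json`:

```json
{
  "AutoScalingGroupName": "magento-app-asg",
  "PolicyName": "cpu-target-70",
  "PolicyType": "TargetTrackingScaling",
  "TargetTrackingConfiguration": {
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 70.0
  }
}
```

With target tracking, AWS creates the CloudWatch alarms itself, so the evaluation periods do not have to be configured by hand.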

For a website that sees frequent surges in traffic in short spurts, one would be prompted to use a lower threshold for a scale-out event. A lower threshold will result in frequent triggering of scale-out events.

At the same time, the scale-in threshold will also have to be reduced to ensure enough spread between scale-out and scale-in events. A lower spread will result in an unhealthy sequence of a scale-out event adding a resource only for it to be immediately removed.

Autoscale designers then tend to add a higher number of minimum instances, possibly of larger sizes. That reduces the effectiveness of autoscale – and increases AWS costs.

Lowering the autoscale latency results in a better autoscale system. As the latency reduces, the need for larger number of minimum instances or larger size instances reduces. This results in savings in the AWS bill.
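The effect of latency can be shown with back-of-the-envelope arithmetic. All numbers below are illustrative assumptions, not measurements:

```python
# Illustrative arithmetic only: growth rate and thresholds are assumptions.
def cpu_when_capacity_arrives(cpu_at_trigger, growth_per_min, latency_min):
    """Fleet CPU% reached before the new instance starts serving traffic."""
    return cpu_at_trigger + growth_per_min * latency_min

# Cold launch: trigger at 70% CPU, traffic adds 5% CPU per minute,
# instance takes ~4 minutes to come up.
cold = cpu_when_capacity_arrives(70, 5, 4)    # reaches 90% - slowdown territory
# Warm pool: same trigger, instance ready in ~35s (~0.6 min).
warm = cpu_when_capacity_arrives(70, 5, 0.6)  # reaches 73% - surge absorbed
```

The shorter the latency, the less headroom the minimum fleet needs to carry – which is exactly where the savings come from.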

Introducing the warm pool

AWS now introduces the concept of a warm pool. The cost savings of a warm pool come from AWS’s policy of not charging for instances in the stopped state – except for their disks. A warm pool is a set of autoscale instances that are launched but kept in the stopped state. When a scale-out event happens, the latency is now reduced to the boot time of an instance plus any initialization needed – we measured that adding 3 instances took about 35 seconds to start serving traffic for Magento.

A scale-in policy simply stops the selected instance and adds it back to the warm pool.

Warm Pool For Autoscale

How to use a warm pool?

If you are using a launch template for your autoscale, creating a warm pool is easy and documented here. If using lifecycle events, newer events have been introduced.
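For reference, once the group uses a launch template, creating the pool is a single call. The group name below is a placeholder; the JSON is passed to `aws autoscaling put-warm-pool --cli-input-json`:

```json
{
  "AutoScalingGroupName": "magento-app-asg",
  "MinSize": 2,
  "PoolState": "Stopped",
  "InstanceReusePolicy": {
    "ReuseOnScaleIn": true
  }
}
```

`PoolState: Stopped` is what makes the pool nearly free (only the disks are billed), and `ReuseOnScaleIn` makes scale-in return stopped instances to the pool instead of terminating them.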

If using a launch configuration, we suggest upgrading to a launch template before using a warm pool. While upgrading is easy, it is advisable to read about launch templates, as they are a different and larger concept.

Changing your instance image when in a warm pool

AWS has support for “instance refresh” – a term AWS uses for updating the AMI image of all running and warm pool instances in a single command. However, this update has a crucial flaw – it can keep your website inaccessible for a short time, because AWS terminates an instance before adding a new one. If an image has to be updated – such as for a new code deploy – a custom strategy has to be used to ensure the website does not go down.

luroConnect support for warm pool

luroConnect now supports warm pools across all its autoscale plans, with a scripted image update policy that ensures 0 downtime during image change as well as a code deploy strategy that ensures 0 downtime on code deploy.

Issues with AWS Reference architecture and tools for a Magento application

At luroConnect we implemented our autoscaling system after addressing flaws in many implementations we had seen.

AWS autoscale by default is integrated with AWS load balancers – ELB or ALB. Using the AWS reference implementation will put the code in an autoscale instance with nginx or apache, php, and the code. Traffic is routed through the ELB/ALB, which handles SSL and routes the traffic to each autoscale instance.

When code has to be updated, a new AMI will be created and AWS instance refresh can be run to update the instances.

You could use AWS CodeDeploy as described here, but you need to set it up to make sure Magento setup upgrade can be run when required.

Problems with autoscale implementations for Magento

  1. Issues configuring FPC (Full Page Cache) with this configuration: if varnish is configured on all autoscale instances (as we have seen many implementations do), each server will warm its cache on its own. Clearing pages from the cache will also be difficult. Using redis as the FPC increases per-page latency for cached pages.
  2. Media and var folders need to be shared across all servers. NFS is typically used for this. However, each autoscale instance has to be configured so that it can discover and mount the folders from the NFS server.
  3. When a code change has to be deployed, it is not clear how it can be done without causing a downtime of the website. Using AWS Code Deploy requires a complex setup to ensure setup upgrade is run before one of the 0 downtime strategies can be used.
  4. When a new server is launched, conditions to check the health of the website are not easy to write. This results in a few error responses before the server is ready to serve traffic.
  5. It is difficult to use an AWS ALB to route traffic for specific purposes – for example, routing traffic to a wordpress server for /blog urls.

luroConnect Autoscale on AWS: smooth setup and running

luroConnect Autoscale solves these problems.

luroConnect lets AWS monitor instances and decide when to add or remove (scale out or scale in) instances. luroConnect autoscale for AWS responds to the cloudwatch events and lifecycle hooks generated by AWS Autoscale to ensure a very smooth autoscaling operation. luroConnect uses nginx as a load balancer and does not require an ALB/ELB to operate. luroConnect Autoscale supports AWS Autoscale with warm instances and has a mechanism to update the AMI when needed without any downtime.

  1. Using nginx as a load balancer allows high flexibility in deciding which urls go to varnish for full page cache and which should be directly served by php. varnish as a full page cache gives the maximum impact of full page caching.
  2. An NFS server holds the shareable content of Magento – the media and var folders, for example. Using NIS, autofs and NFS, each new app server is able to discover the NFS share.
  3. When a code change has to be deployed, the php code is shared to each app server over NFS. A php reload and opcache configuration ensure the new code is kept in php’s opcache memory for all future operations; a php file is loaded from the NFS share only once.
  4. Before a server is added to the nginx load balancer, extensive checks are done to ensure the new autoscale instance is ready to take traffic, including warming the opcache.
  5. nginx as a load balancer brings a lot of flexibility in routing traffic – such as sending /blog to a wordpress website, custom rewrites, etc.
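The routing flexibility in points 1 and 5 can be sketched as an nginx config fragment. The upstream addresses, ports and url patterns below are illustrative, not our actual configuration:

```nginx
# Sketch only: addresses and paths are placeholders.
upstream varnish_fpc { server 127.0.0.1:6081; }   # full page cache
upstream php_direct  { server 127.0.0.1:8080; }   # php app servers
upstream wordpress   { server 10.0.1.20:8080; }   # separate blog server

server {
    listen 443 ssl;
    server_name shop.example.com;

    # /blog urls go to the wordpress site, not Magento
    location /blog/ { proxy_pass http://wordpress; }

    # cart/checkout/admin must never be served from the full page cache
    location ~ ^/(checkout|customer|admin) { proxy_pass http://php_direct; }

    # everything else goes through varnish
    location / { proxy_pass http://varnish_fpc; }
}
```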

Would you like to switch to a modern hosting platform?

Schedule a call for a free evaluation!

With features like ~0 downtime code deploy and autoscale to reduce your hosting costs, luroConnect offers you an unparalleled hosting environment for Magento.

Schedule a call and we will show you how we can

  • Improve your hosting, possibly with autoscale
  • Set up managed dev, staging and production environments
  • Measure server performance every minute, with alerts for a slowdown
  • Run a multi-point health check every day
  • Optimize your hosting costs