It all started when a customer asked us for an objective view on which hosting provider they should select. At about the same time, after years of recommending physical servers for hosting Magento, the high-performance folks at Magento told us it was OK to use cloud servers and changed the pricing of Magento Enterprise. Nothing had really changed, other than customer opinion that cloud servers were better. Our customer demanded a logical explanation with proof. That set us out to find the measures and metrics that really matter. The outcome was luroConnect Insight, and this is the story of how we got there.
Around then, a very public event involving Flipkart took place. (Flipkart is not our customer, and our observations are drawn from the public domain.) Flipkart is an India-based eCommerce vendor, a unicorn company with a multi-billion dollar valuation. Flipkart ran its first “billion day” sale. A few hours into the sale, Flipkart announced that it had reached its goal for the day and declared a massive success. Almost immediately, social media was flooded with complaints from customers who could not complete their purchases. Some complaints were typical of such a sale, where items go out of stock by the time you check out, but many were hosting related: some customers saw response times much worse than the average. We can assume Flipkart was monitoring sales numbers and scaling its app servers based on CPU and memory usage. But it evidently did not see the problems that users saw. Otherwise the success announcement would not have gone out, only to be withdrawn later to manage a customer support crisis.
To give our customer an answer, we first evaluated AWS, the platform the customer was already using. We came up with an evaluation framework that would give direct answers.
- We ran JMeter from the internet rather than from inside the hosting network, so that request latencies matched what real visitors would see. We recorded a few typical user actions, from login to purchase.
- We then simulated multiple users starting from 1 and scaling up.
- Using the nginx log file, we monitored the total time for each response as well as the time PHP took to respond, which nginx exposes as $upstream_response_time. We then averaged the readings per minute and, to highlight the variation in response, also calculated the standard deviation per minute.
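The per-minute averaging described above can be sketched in a few lines of Python. This is only an illustration, not the tooling we used: the log layout is an assumption (a hypothetical nginx log_format that contains [$time_local] and ends with rt=$request_time urt=$upstream_response_time), and the function name per_minute_stats is made up for this example.

```python
import re
import statistics
from collections import defaultdict

# Assumed (hypothetical) nginx log_format:
#   ... [$time_local] "$request" $status $bytes rt=$request_time urt=$upstream_response_time
# The "minute" group keeps the timestamp up to the minute and drops the seconds.
LINE_RE = re.compile(
    r"\[(?P<minute>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]+\].*"
    r"rt=(?P<rt>[\d.]+) urt=(?P<urt>[\d.]+|-)"
)


def per_minute_stats(lines):
    """Return {minute: (mean, std_dev)} of the PHP (upstream) response time."""
    buckets = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        # nginx logs "-" when no upstream was involved (e.g. static files).
        if match is None or match.group("urt") == "-":
            continue
        buckets[match.group("minute")].append(float(match.group("urt")))
    return {
        minute: (statistics.mean(times), statistics.pstdev(times))
        for minute, times in buckets.items()
    }


if __name__ == "__main__":
    import sys

    # Usage: python per_minute.py < access.log
    for minute, (mean, std_dev) in sorted(per_minute_stats(sys.stdin).items()):
        print(f"{minute}  mean={mean:.3f}s  std_dev={std_dev:.3f}s")
```

A minute with a low mean but a high standard deviation is exactly the case that an average alone hides: most users are fine while a few wait far longer.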
We then repeated the experiment on three other providers: cloud servers from Rackspace and ElasticHosts, and physical servers from the Indian provider Netmagic. Since each provider offers different configurations, the comparison cannot be 100% accurate. Trends can still be extrapolated, though. Here is a summary of what we found.
- Physical servers give the most consistent results and scale up to a well-defined limit. However, make sure the CPU and other hardware are recent.
- Cloud servers give a higher variation in response and in general deteriorate faster and more dramatically than physical servers.
- AWS provides very good raw hardware with terrific response times, but you need to size your server appropriately. AWS and other cloud services also give you the flexibility to scale horizontally by adding app servers.
CPU and memory usage are not good measures for deciding when to scale. A better approach is to know how your site performs under load and to scale on the number of hits the server receives. In fact, a well-configured system will not get overloaded!
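As a rough illustration of scaling on hits rather than on CPU, the sketch below turns a measured hit rate into a number of app servers. Everything in it is an assumption made for the example: the function name servers_needed, the idea of a single per-server capacity figure taken from a load test, and the 20% headroom default. It is not how luroConnect Insight works internally.

```python
import math


def servers_needed(hits_per_minute, capacity_per_server, headroom_pct=20):
    """Estimate how many app servers a given hit rate calls for.

    capacity_per_server is a hypothetical figure from your own load test:
    the hits per minute one server handles before response times degrade.
    headroom_pct reserves part of that capacity for sudden bursts.
    """
    usable = capacity_per_server * (100 - headroom_pct) / 100
    # Always keep at least one server running, even with no traffic.
    return max(1, math.ceil(hits_per_minute / usable))


if __name__ == "__main__":
    # Example: one server was measured at 500 hits/minute, traffic is 900.
    print(servers_needed(900, 500))
```

The point of the design is that the trigger comes from measured site behaviour (hits against a tested limit), so new servers are added before response times degrade rather than after CPU graphs finally show it.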
- Magento Managed Service
  - Done-for-you Magento managed hosting, on any hosting platform
  - We manage the OS and the Magento stack
  - We configure, monitor and act, giving you peace of mind
- luroConnect Insight
  - A cloud platform that monitors Magento servers, giving insights through dashboards, alerts and reports
  - All relevant data is extracted using a server-side agent
  - Drill-down reports on badly performing URLs so developers can gain insights: the 404, 5xx and PHP exception reports
  - Bot report showing what bots are seeing, including performance. Google uses site speed to rank pages, and not just the home page, so you need to know what site performance Google sees.
  - (Coming soon!) Drive AWS scaling directly based on these parameters