Magento's response to increasing traffic is non-linear: response time does not rise in proportion to traffic. Instead, performance drops dramatically and erratically once a saturation point is reached. In this case study we show how a customer faced routine downtime with their Magento site and what we did to get them hosted appropriately.
When hosting Magento for the first time, a big question is what size of host fits the requirement. A do-it-yourself person would search Google for answers, talk to hosting providers, and so on. The result is an unscientific assessment and hence a wrongly sized server. We routinely get customers like the one featured in this case study, and we help them scale right.
The customer was hosted with US-based Arvixe (https://www.arvixe.com/). Arvixe is known for its low-priced servers and 24×7 support. With the help of Arvixe engineers, the customer was set up on a bare metal server. As is common in the hosting industry, WHM (https://cpanel.com/products/) based hosting was made available to the customer, and Arvixe engineers helped set up the Magento website. Arvixe engineers also helped them point their MX records to the server. A mail server was enabled as well, and the customer's work emails were sent and received from here. To make access easier for the customer's developers, phpMyAdmin was set up to give access to the database, which ran on the same server.
Everything worked, with one problem known to the customer: routine downtime. Arvixe's 24×7 support was very responsive in restarting services, and the customer gained enough expertise in WHM to do some restarts themselves.
During our conversation with the customer we realized they knew that they were outgrowing their environment. We also realized that there were fundamental differences in our approach to hosting a production Magento website and Arvixe’s. But the customer wanted more analysis before they would accept our solution.
We spent one to two weeks observing the problems the customer was having with their server. During this time we analyzed their Apache log files to see how many hits per minute they were getting, and graphed the results in Excel. We also added a log parameter to record the response time of each hit. We then analyzed a downtime incident: instead of restarting services, we collected the error logs and studied system resource utilization using the Linux “top” command. We observed further downtimes as well and identified several distinct causes:
- Database size.
Here are the top 10 tables, sorted by data size (the data and idx columns are in MB).
+--------------------------------+------------+--------+--------+------------+
| TABLE_NAME                     | table_rows | data   | idx    | Size in MB |
+--------------------------------+------------+--------+--------+------------+
| j2t_autoadd_salesquote         |      21712 | 507.55 |   1.86 |     509.41 |
| core_url_rewrite               |    1517105 | 328.56 | 590.02 |     918.58 |
| catalog_category_product_index |    2528808 | 233.08 | 241.34 |     474.42 |
| catalog_product_index_eav_idx  |    1692500 | 214.53 | 363.36 |     577.89 |
| catalog_product_entity_varchar |    3338042 | 196.70 | 271.59 |     468.30 |
| catalog_product_index_eav      |    1724290 | 172.58 | 260.41 |     432.98 |
| log_url_info                   |    1032207 | 169.22 |   0.00 |     169.22 |
| catalogsearch_fulltext         |     195208 | 120.45 |  81.57 |     202.02 |
| catalog_product_entity_text    |     505555 | 118.63 |  38.09 |     156.72 |
| log_visitor_info               |     556272 | 111.16 |   0.00 |     111.16 |
+--------------------------------+------------+--------+--------+------------+
Some tables in Magento can safely be truncated. Others, like core_url_rewrite, need to be rebuilt periodically.
- Database cache was not tuned.
The SQL query used by the menu was not cached, so every hit ran the query.
The SQL query cache was sized too small. The value followed general recommendations, but it hurt this Magento site.
- Although the server had 8 CPU cores, Apache was configured to run 200 processes. When there was a spurt in hits, all hits were served slowly, and MySQL could crash under this load as it ran out of memory. Each Apache process could grow to 500MB, increasing the memory requirement under load.
- They used email to communicate internally, and large design files were routinely sent among employees as attachments. Whenever such an email went to many employees, all of whom had Outlook configured to download email, the main site slowed down.
- There were coding errors that would routinely return an error on a page.
- WHM made it too easy to change configuration settings. Many changes were experimental, and some triggered a rebuild of the entire software stack, causing inadvertent downtime of the website.
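The hit-rate and response-time analysis behind these findings could be approximated with a short script instead of a spreadsheet. The following is a minimal sketch, not the tooling actually used: it assumes the Apache "combined" log format with the `%D` directive (time taken to serve the request, in microseconds) appended as the last field, and the function name and sample lines are illustrative.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: "combined" log format with %D (microseconds) as the last field.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*\s(?P<usec>\d+)$')


def per_minute_stats(lines):
    """Return {minute: (hits, mean_response_ms)} for the given log lines."""
    buckets = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        minute = ts.strftime("%Y-%m-%d %H:%M")
        buckets[minute].append(int(match.group("usec")) / 1000.0)
    return {
        minute: (len(times), sum(times) / len(times))
        for minute, times in sorted(buckets.items())
    }


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Jan/2014:10:15:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0" 250000',
    '203.0.113.6 - - [10/Jan/2014:10:15:40 +0000] "GET /cart HTTP/1.1" 200 2048 "-" "Mozilla/5.0" 750000',
    '203.0.113.7 - - [10/Jan/2014:10:16:02 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0" 4000000',
]

if __name__ == "__main__":
    for minute, (hits, mean_ms) in per_minute_stats(SAMPLE).items():
        print(f"{minute}  hits={hits}  mean_response_ms={mean_ms:.0f}")
```

Plotting hits per minute against mean response time for the same minute is one way to make a saturation point visible: below it the response time stays roughly flat, and above it the response time climbs sharply.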
Our solution included:
- Hosting on 2 servers, one for the database and the other for the application. The servers were sized according to observed load conditions. The database server had a very fast disk to reduce IO waits. Both ran our standard, tuned open source stack, which included:
  - Redis (https://redis.io/) for Magento caching and Magento session storage,
  - nginx + php-fpm,
  - Percona MySQL database with Percona tools,
  - two-level authentication for admin access, and
  - no password-based access.
- Backup with a disaster recovery plan.
- Search implemented using Sphinx (http://sphinxsearch.com/) with the help of a plugin.
- Email moved out to Office 365, with an automated process using imapsync (https://imapsync.lamiral.info/) to migrate the mailboxes to the new setup.
- No WHM or phpMyAdmin access, because of potential security concerns.
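To illustrate how sizing from observed load can work, here is a minimal sketch of the memory arithmetic behind a worker limit. The 200 processes and 500MB per process come from the observations on the old server. The 32GB of RAM and the 8GB reserved for the operating system and other services are hypothetical figures chosen only for illustration, and the function names are illustrative too.

```python
def worst_case_memory_mb(max_workers, per_worker_mb):
    """Memory needed if every worker grows to its observed peak size."""
    return max_workers * per_worker_mb


def max_workers_for(ram_mb, reserved_mb, per_worker_mb):
    """Largest worker count that fits in RAM after reserving memory for other services."""
    usable_mb = ram_mb - reserved_mb
    return max(usable_mb // per_worker_mb, 0)


if __name__ == "__main__":
    # Figures reported for the old server: 200 processes, up to 500MB each.
    print(worst_case_memory_mb(200, 500), "MB needed in the worst case")

    # Hypothetical application server: 32GB RAM, 8GB reserved for the OS and other services.
    print(max_workers_for(32 * 1024, 8 * 1024, 500), "workers fit in memory")
```

With the reported figures, the old configuration could demand about 100GB of memory under a traffic spike, which is consistent with the database running out of memory on a shared server. Capping the worker count at what memory allows turns a crash into queued requests.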
Our proposal asked them to spend more on hosting and support in return for stability and scalability.
Here is what the customer had to say about us 6 months later.
“We started with shifting our servers to a new host and moving managed services of our magento based shopping cart … our downtimes have come to nearly zero … People are pleasant … and show genuine concern for our business. We look forward to a long term association with them due to their competence and warm positive outlook.”
That was January of 2014. They continue to be our customer and a great source of referrals.