caches

Managing caches is key to performance in Magento and Wordpress. These blog articles discuss cache management.

Deploying a Magento PWA project

Why PWA might be the future of headless eCommerce

Progressive Web Apps (PWAs) are designed to address the mobile revenue gap indicated below.

In most markets, online retail has a higher proportional audience reach among mobile users than desktop. However, mobile sales numbers are much lower. There can be many reasons for this, some in the realm of technology.

At one time it was thought mobile engagement was best achieved using an app. This was based on data for mobile users in general, but was possibly skewed towards gaming and social media. An app has advantages of being able to deliver notifications, use mobile features the way a typically website cannot, be installed as an icon resulting in easier access.

On the flip side, there is data suggesting mobile users were reluctant to install apps due to issues like memory limitations. Other disadvantages include a need for an OK from Apple to be on the app store, lack of easy-to-test infrastructure, the rather slow process of distributing updates – for example some users may not have auto update on.

Using service worker technology, widely supported by browsers today, the PWA (or progressive web application) will give some of the benefits of an app without the cost of consuming memory, requiring an OK from Apple and being as easy to deploy new updates as it is to update a website. PWAs are also as easy to test as a website. A PWA will install on a mobile as an icon – much like an app, service workers allow push notifications as well local storage, allowing for some offline capabilities.

From a technology perspective PWAs pose a completely different problem – the relative newness of technology means developers are limited, as are systems to reliably host & deploy. The cost of development will come down as more developers and websites adopt the technology.

Hosting related issues with PWA for Magento

From a hosting and code deployment perspective. Vue-storefront for example, replicates the entire catalog in elasticsearch and uses 2 nodejs processes to run the frontend of the store. PWA Studio is expected to be Magento (read php) native, yet the reference implementation of its Upward Specifcation Is in nodejs. Both developer and production environments pose challenges.

Developer Environment

It is no more a localhost WAMP stack that you can deploy and get a development environment for PWA setup. A single project will require setup of various components (vue-storefront, vue-storefront-api, Magento, graphql, elasticsearch, redis, rabbitmq, etc).

The developer environment will affect the learning curve of the many new developers starting with these new technologies as well as the productivity of experienced developers.

Here are some challenges

  1. Launch a development environment for a new project
  2. Setup a developer environment for an existing project
  3. An ability to change and test any component easily – for example if a js file was modified that affects the UI, what component(s) need be redeployed? Can this process be automated?

It is too early for us to start work on solutions for developers – since we do not do project work ourselves. However, we are working with our partners and we have an eye on releasing a “developer stack”. Contact us if you would be interested.

Production Environment

We recently took a PWA website live. Developed by our partner Codilar, it was a first for us. Some of the challenges faced and lessons learnt are summarized below.

  1. Setting up a production environment.
    Since there were so many components, not natively supported by Magento, many configuration files had to be manually modified.

    1. Vue-storefront (the UI end that replaces varnish in a classic Magento 2) needs to communicate to Redis for Full Page Cache and the vue-storefont-api
    2. Vue-storefront-api communicates with elasticsearch and Magento 2 backend via a rest API. Ideally vue-storefront should replicate the entire catalog through a indexing process into elasticsearch, but that is not fully operational yet.
    3. Magento 2 has its own redis cache and redis session. Magento 2 FPC is not used. Magento 2.3 uses RabbitMQ in addition to connection to its database.
      Here is our architecture for the deployment. We used Virtual machines as shown. We did not use a containerised architecture. The reasons will possibly be a different blog post.
  1. Starting nodejs processes automatically. Vue-storefront uses pm2 for process management. However, developer information and documentation is written using yarn to run the pm2 processes with log files being stored in ~/.pm2. In order for better control from a system administration perspective, we installed pm2 at the global level, generated systemd files (using pm2 startup) and modified them to suit the environment. We can now use “service vue-storefront start/stop/restart”.
  2. Monitoring all the components.
    Log files for each component are taken to a central log processing server using CNCF project fluentd.
    A key challenge is observability of failures. A Magento 2 API failure is not obvious. An error return code from vue-storefront needs to be traced to vue-storefront-api to Magento. Correlating the actual hit that caused a non-fatal Magento error is another challenge.
  3. How to deploy new code with minimum or even 0 downtime
    For Magento 2, until a database change is required (via a bin/magento setup:upgrade and/or indexing), we have a process to make a deployable package, giving an opportunity to deploy with 0-downtime. Check out our bitbucket pipeline presentation.
  4. How can one deploy a vue-storefront based PWA?
    The project we migrated ran on 2 git repos – one for Magento, the other for vue. Upon deploy we need to find the files that changed since the last release and decide if the change is in Magento, vue-storefront or vue-storefront-api and decide the build steps appropriately. Presently since the repos are different, we have 2 separate builds running on the production servers. A pipeline based deploy is our next step.

Note: We think a monorepo for both Magento and vue is essential in the long run due to possibility of versioning incompatibilities.

Conclusion

This is yet early work-in-progress and we hope to update our process and keep updating this article as we go.

Magento Caches – Scaling Magento Part 2

Introduction

Caches are an integral part of strategy of performance and scaling a Magento website. A Cache is the term used to describe a mechanism to store calculated values such as a query or HTML so a subsequent request does not need to recalculate.

While Part 1 looked at FPC specifically, in this article we review all other caches needed for a Magento store. Some of these are commonly found to be referred in best practices, but a deeper understanding will help put them in perspective of Magento.

In general, for any cache the factors to consider are

  • Effectiveness – measured in terms of impact on page speed. Caches should be highly effective.
  • Invalidation flexibility – is an inbuilt automatic mechanism available? Is the mechanism too aggressive? A High score means the mechanism is very flexible.
  • Performance of miss – if a miss were to occur what would be the performance in terms of page speed. A low score represents bad performance.
  • Cost of refill –how much would it cost to refill the cache completely to be back in its most effective state?
  • Hit ratio achievable – In a real world environment if hit ratio were to be measured periodically what can be expected. A high would be over 99% hits over a reasonable period such as a day.
  • Memory required : How much memory is needed on the server side to cache the content.
Cache Type Effectiveness Invaliation Flexibility Peformance of miss cost to refill Hit ratio achievable Memory required
Browser High High Low Moderate High N/A
CDN High Low Moderate Moderate High N/A
FPC High Low Low High Moderate High
HTML Block Cache Moderate High Moderate Moderate High Low
Mysql cache High Low Low High High Moderate

Browser cache

  • Easy to setup – at the hosting level. Static resources like css, js and images should have caching enabled.
  • There is no need for invalidation as the browser checks for each cached resource if a newer version is available.
  • The ability of browsers to find changed resources means cacheable resources should have a very long expiry – say over a year.
  • Browser’s request for detecting change has a performance impact based on number of resources.
  • A key question to be answered is how merging css and js files impact browser caching performance. If merging is enabled, each page type (home, category, product, CMS) are likely to have different css and js files. Not merging allows the browser to cache individual files and reuse them across pages. However, in HTTP/1 browsers limit the number of connections to each domain. So, the advice is to not merge css and js files if using HTTP/2 or splitting domains for skin and js if using HTTP/1.

CDN

  • CDN networks cache at edge servers closer to users – reducing round trip latency from browser to server.
  • CDN also offloads the server from serving static cacheable resources, improving network performance of the server as well as freeing up server CPU to serve dynamic content.
  • CDNs may also take the load of SSL validation however, caution is needed here as the traffic between CDN and server may be unsecure making the site vulnerable to some type of attacks.
  • CDNs are notorious for invalidations – some charge for APIs, others take a few minutes before the invalidations are effective across all edge points. When evaluating a CDN, this is a key factor that is not evaluated.
  • Having many edge locations may not be a good thing – as each edge records the first access to a resource as a miss
  • While a single miss is easily retrieved from the server or backup store, a full invalidation requires multiple GBs to be transferred to make the cache effective again
  • CDNs when full give great performance benefit on page load times
  • Modern CDNs like section.io can also do FPC (html) caching using a distributed varnish cache architecture.

FPC

We have reviewed aspects of FPC in part 1 here.

Recap

  • FPC caches full HTML pages – except for variable content
  • Excellent for caching dynamic content
  • Requires very high amount of RAM on the server side
  • Depending on the quality of code, FPC invalidation or miss can have impact on resource utilization
  • Best implemented with autoscale – so servers are added automatically when cache is invalidated

HTML Block Cache

  • Also caches HTML but cached at the block level. Magento uses HTML blocks for building a page.
  • Since blocks may be shared across pages, these blocks do not have a high impact on invalidation as they will be regenerated once and used multiple times
  • Can dramatically improve performance if used consistently and correctly.
  • Needs developer help as many blocks are not cached by default. To cache a block, one needs a unique key that correctly identifies its variation. Check this technical blog info.
  • Invalidation can be either via a key or time (TTL). If using a key, developer needs to write appropriate event callbacks to detect change.
  • Examples of major speed up include home page blocks where latest products are shown. Depending on frequency of store updated, a 10 minute to 1 hour TTL on the block will result in dramatic improvement of home page speed.

We can analyze your site for free

Schedule a call

Do you know how your website performs and want an expert to look at it?

  • We will analyze your site using public information.
  • We will run a synthetic test from the internet.
  • We will ask you to give us a 1 day web server log file.
  • We will also try to identify what steps if any you should take to improve your sites performance goals.

mysql cache

Mysql can store result queries in memory. The amount of memory is specified in my.cnf as a combination of query_cache_limit (max memory for a single query result) and query_cache_size (max memory all cached queries).

  • mysql automatically invalidates a cached query if any table that was used in the query changes.
  • to access the cache mysql uses a lock on a table thereby reducing the effectiveness of the cache
  • Many mysql articles recommend smaller cache values due to the lock problem but it is best to test the size of the cache for your situation. It is best to monitor “low mem prunes” and “table locks wait”. The first gives the number of times a query was cached by removing another and the latter gives the number of times table lock was not immediately available. Both should be as low as possible.

There are other caches too

  • Php opcache
    php code is read and converted to “opcodes” which are then interpreted. An opcode cache stores the opcodes reducing one step each time. Opecode cache should have sufficient RAM and keys (equal to number of php files).
    Using opcache.php the status should be checked regularly.
    It is best to leave invalidation automatic so when the underlying php file changes the opcode cache is invalidated.
  • Operating system cache
    linux has an excellent file system cache – whenever additional RAM is available, linux will store opened files in RAM. This is important if for example you do not have a CDN. Files will actually be retrieved most likely from RAM rather than disk.
  • Magento has caches other than block. These should always be enabled on a production server.

Conclusion

Setting up caches is crucial for achieving performance in Magento. But with so many caches and so much advice there is overload and scatter of information. In the first 2 parts of our Magento scaling series, we have tried to pull out the important points.

Full Page Cache (FPC): Scaling Magento Part 1

Introduction

The most important and often ignored factor to scaling is the quality of code. Well written code will scale better. The next most important factor is perhaps caching. There are many types of caches that developers, managers and store owners need to understand. Full Page Cache (FPC) is seen by store owners as a magic solution to speed issues. Understanding the benefits and compromises of a caching mechanism is important to understand scaling.

FPC Options

Magento Enterprise 1.x and Magento Community & Enterprise 2.x comes with a FPC module inbuilt.

There are many plugins available for Magento Community 1.x. Some hosting providers will help setup a Varnish based FPC with appropriate hole punching.

Magento 2 has two mechanisms for FPC – php based (called FPC) and varnish. This option generates a .vcl file that will be required by your hosting provider to setup varnish.

The discussion below applies to all these mechanisms as well as to Magento 1.x and Magento 2.

What is Full Page Cache (FPC)?

FPC is a cache of a full HTML page – except variable content such as a login status or items in cart or stock status of a product or sometimes even price of a product. When a hit is encountered – i.e. the page required is in the cache, FPC will return very fast compared to when there is a miss i.e. the page is not in the cache, which will require re-generating the content. FPC may store the cache in files, but more likely for maximum benefit, it will be stored in memory.

FPC affects resource utilization – memory and CPU. In a way it trades memory for CPU time.

Traditional FPC stores the page in Magento cache and is a part of Magento. Varnish stores the page HTML after it has been generated and is not a part of Magento. It is a separate process.

What FPC is not!

Let us understand FPC better

  • Memory needed to store the entire site in FPC
    Let us say each page is 100KB and you have 10000 pages to cache. That would take about 1GB of RAM. The problem is when the number of pages or page size starts rising above this, the RAM requirement goes up. So, if you now had 20000 pages (result of each option in layered navigation for example), you would need 2GB or if each page was 120KB the 20000 pages would need  4GB. Pages are not just products – they are category pages as well. If layered navigation is added the pages multiply fast as each combination is unique and needs to be stored independently. If you start exceeding the RAM available, you need to decide what to do when you hit the memory limit.
  • Cache warming.
    Cache warming is the process of automatically adding pages to the cache before a real visitor hit comes to the cached page. When a cache is cleared, you may need to warm the cache to make FPC effective early. Cache warming uses a crawler to artificially visit pages of a site. A typical crawler will recursively crawl the site starting from the home page. This sounds logical but here are some things to think through

    • If possible find the most likely pages you need to be in the cache and warm the cache with only those pages. This will give the maximum benefit.
    • If you cannot fit all the pages in memory, the use of crawling to warm the caches becomes a problem – they will recycle pages out of memory at random, not based on the end user popularity of the pages.
    • When the cache is being warmed your resource requirement in terms of CPU will rise as both the crawler and real traffic are being served.
    • If possible crawl the site in parallel – the earlier the pages get cached the more likely a visitor to a page will already be in the cache (scoring a hit).

Performance degradation on FPC full invalidation

The above figure shows the bad response immediately when a FPC that had built to 1.5GB was cleared completely. The top image is from redis usage graph from munin and the one below is AWS cloudwatch latency (time to serve a page) averaged per minute. The latency came down as AWS Autoscale added more instances, costing money.

  • Invalidating the cache :
    Magento automatically invalidates FPC (internal or varnish) by tagging or hashing the content with keys that refer to the type of content. For example, it may generate a tag / hash CATEGORY_123 if the page depends on category 123. Now, when category 123 changes, Magento sends out a invalidate message that says “all pages that have tag / hash CATEGORY_123 should be invalid”. Magento has a elaborate tag convention.
  • FPC and robotic crawlers (BOTS)
    Even if you do not use a crawler for warming, robotic crawlers on the internet (such as google’s indexer Googlebot) will start filling the FPC cache with pages they happen to crawl. It is our advice that a site with FPC should have robots.txt and a front end processor (nginx, WAF) restricting BOTs.
  • CPU and time needed to re-generate a page
    A FPC can fully invalidate (clear) due to a (p)html or css file changing or partially due to a data change such as a product update. A miss from FPC results in the page being regenerated. The CPU requirement for a miss is much higher than a hit. If a crawler is used to warm the cache or if traffic is high, CPU requirement can be quite high as the FPC fills up. Yet, the visitor experience is not good during this period. Using autoscale, this performance degradation can be contained to some extent as additional instances are launched to handle the high CPU requirement.
  • Discipline when using FPC – know when invalidation happens
    It is important to add discipline for code update as it has the worst effect on user experience.

    • Code update should be done at low traffic times.
    • Category changes should be carefully planned at low traffic times.
    • Magento indexing should be set to manual (M1) or on schedule (M2) with a cron running the indexer.

Our recommendation for FPC

  • Do not use a random crawler to warm the FPC cache. Use a page popularity based crawler to warm the cache if necessary.
  • Avoid using a crawler during high traffic – the crawler will compete for system resources with live traffic
  • If possible update code during low traffic times as it causes FPC to invalidate
  • If your site is horizontally scaled, pre-launch instances to your load balancer before invalidating FPC, either explicitly or indirectly, so the latency of starting an instance does not further worsen the user experience

Magento 1.x FPC Plugins

  1. Free Lesti FPC : https://github.com/GordonLesti/Lesti_Fpc. Use this guide to install
  2. Magento connect search results for FPC

Should FPC be a part of scaling strategy?

FPC is concerned with speed. Scale is concerned with the process that helps the site add resources when needed. FPC helps in scalability by reducing the use of resources per hit to the website under certain conditions. It changes the dynamics of when and how many resources will be needed.

FPC has to be considered to be part of scaling strategy – but as one of many parts.

Read part 2 where we discuss other Magento caches.

Read the overview of our Magento scaling series here.

We can analyze your site for free

Schedule a call

Not happy with your website performance and want an expert to look at it?

  • We will analyze your site using public information.
  • We will ask you to give us a 1 day web server log file.
  • We will try to identify what steps if any you should take to improve your sites performance goals.