Web Worker Performance Tips 101

There are many potential benefits to using web workers. They can provide a significant web application performance boost by moving heavy-duty work off the main browser thread. It’s also true that in certain instances you may be slowing down your application in ways you didn’t expect.

Tip #1 – The cost of using a web worker is not free. JavaScript must serialize your data to pass it to a background thread and it must also serialize data when sending it from a background thread back to the main thread. That’s two serialization processes for each round trip and each process takes CPU cycles and time. Depending on what type of data and how large it is you may be surprised how long it can take.

Tip #2 – Not all browsers treat web workers equally. Web worker performance gains in one browser may not represent a similar gain in a different browser. If you are building a cross-browser application, make sure you specifically test and measure each of your web workers in the various browsers that you will be supporting. This often takes developers by surprise. The issue is mainly due to differences in how each browser vendor implements their data serialization algorithms. If you want more information on this, its officially referred to as structural cloning algorithms.

Tip #3 – Measure the total time it takes to use a web worker. Set console.time() before the initializing the worker and set console.timeEnd() where you get a message back. You’ll want to compare these results against running the same code directly on the main thread.

Example:

    console.time("parseTestTimer"); // Start the timer
    
    // Initialize the worker
    var worker = new Worker("ParserWorker.js");

    // Send the data to the worker
    worker.postMessage([first.value,second.value]);

    // Get the data back from the worker
    worker.onmessage = function(result){
        console.timeEnd("parseTestTimer"); // End the timer
        // Do something
    }

Tip #4 – Even using binary transferable objects can have a cost. The Transferable pattern for web workers are designed for high performance, however depending on what you are transferring, the browser, the browser version and the device type (mobile vs desktop) your mileage may vary. In more technical terms this pattern, at least in theory, uses a zero-copy, pass-by-reference approach which is intended to have very low overhead. You should definitely consider testing the transferable objects pattern and compare timing benchmarks against the standard web worker postMessage() pattern. You might as well be thorough, especially since there’s no guarantees of how each browser vendor implemented this functionality under the hood.

Example:

    // Transferrable object pattern using binary data
    worker.postMessage(uInt8Array.buffer, [uInt8Array.buffer]);

Additional References:

Advanced Web Worker Performance – this post provides several important details for determine if your web worker is provide a positive or negative performance gain.

Samples apps demoing no workers, one worker and two workers

MDN – Structured Clone Algorithms

HTML Living Standard – Transferable Objects

Advanced Web Worker Performance

This post offers some advanced considerations on the mechanics of squeezing the best performance possible out of workers. I only mention a few (very) rough benchmarks since the goal is to focus on guidelines and food-for-thought, rather than specifics. It’s also fair to say that not all tasks benefit from using web workers. The best way to know is to test performance with and without the workers.

Processing costs. There are a variety of costs associated with manipulating data. Manipulating data means any task you run against it such as looping, converting, parsing and analyzing. Each one of these tasks should be evaluated for the cost in terms of CPU usage and time spent. By looking at these aspects, you can determine the benefits of having a web worker versus not having a web worker.

Pre-processing – manipulating data before sending it to the web worker.
Peri-processing – manipulating data while doing work on the background thread.
Post-processing – manipulating data after it’s been received back from the web worker.
Total processing – this benchmarks the processing from the very beginning until the very end.

Time stamp before and after each process that handles data. Tools such as console.time are invaluable in identifying actual and potential bottlenecks. Also make liberal use of the developer tools such as Chrome’s CPU profiler that now includes screen captures.

Total CPU and memory usage. There’s often a misperception that just because CPU usage has been outsourced to the background thread that it won’t negatively impact an application or contribute to jank. This is not true, especially on mobile devices that are already CPU and memory constrained. CPU headroom, or the amount available for application processing, is finite and typically doesn’t care whether CPU is used on the main thread or a background thread.

Web apps are wholly dependent on how each browser vendor uses CPUs and manages threads behind the scenes for every browser version and operating system. Using JavaScript you cannot guarantee how the browser will spread CPU load across cores.

In fact, I’ve seen web worker performance improvements made in Chrome that create almost an exact opposite performance decrease in Safari! Talk about a WTF moment.

Cloning costs. Careful consideration should be placed on the costs associated with the act of sending and receiving messages. Sending and receiving isn’t “free”, in fact the amount of CPU consumed can directly cause jank.

Web workers use a cloning algorithm for serializing objects that go in and back out, and cloning uses CPU cycles. Yes, even if you are using transferable objects there may still be performance implications especially when dealing with large amounts of data (>100MBs) as well as on mobile devices. You simply will never know unless you test, test, test.

In some cases, the cloning costs related to both sending and receiving messages may outweigh the performance benefit of using a web worker. If the entire round-trip time and CPU utilization is greater than simply keeping the processing on the main thread – then you probably don’t need a web worker.

Pooled workers. Should you use one worker or several? The only reliable way to know is to experiment. One approach to eke out greater performance is to create a pool of workers. The concept is a task that can be broken up and passed on into smaller concurrent worker tasks and then reassembled after all tasks have completed.

Here’s an example application that retrieves a GeoJSON file, parses it, then displays the data on a map. Open up the developer console and run each application multiple times. You can also download and experiment when this application yourself with increasing and decreasing the size of the thread pool:

Example 1: Parsing using no web workers
Example 2: Parsing using one web workers
Example 3: Parsing using two web workers

With respect to binary versus non-binary data, I’ve seen diminishing returns when using larger number of workers with JSON files. On the other hand, there are a number of posts on the web that show great performance processing image data with increasing numbers of workers.

This also may be dependent on the browsers-threading model and how it uses the number of cores on your device or laptop. How do you know? Test and benchmark.

Re-use workers, or re-create for each loop? There are costs associated with initializing workers. Depending on the type of application, it may benefit you to re-use workers rather than re-create them for each loop. This depends on how complex your web workers are, how many you need to spin up, the type of device they are running on and how often you are using them.

Some applications may run on a timer once every 5 minutes. If that’s the case does it make sense to keep workers sitting unused in memory for that long? Maybe or maybe not. Other applications may only need the workers at startup and there’s definitely no need to keep them around after that. The list of use cases are endless.

Here’s some very rough test results on the time to simply initialize web workers. This example used identical workers that contained 67 lines of code:

Macbook Pro: 2 workers, 0.4 milliseconds on average
Macbook Pro: 4 workers, 0.6 milliseconds on average
Nexus 5: 2 workers, 6 milliseconds on average
Nexus 5: 4 workers, 15 milliseconds on average (border-line UI jank)

As far as computational horsepower, the Macbook used in the tests had 2.6 GHz i7 with 16GB of 1600 MHz DDR3. The Nexus 5 had a Snapdragon 2.26 GHz with 2GB of 800 MHz RAM. Clearly the Macbook outclasses the Nexus. It’s important to note how much longer it took to spin up a pool of workers on a mobile device. That’s an example of an app that would work seamlessly on a powerful laptop, but could produce jank on a mobile device.

Closing Notes. Web workers are fairly straight forward to bolt into an application, but tuning them up to gain the true performance benefits can be a bit more tricky. With a little exploration and experimentation, workers can potentially provide huge benefits for your application’s performance.

Sample Application

[Fixed broken links: July 7, 2016]

Debunking the myth of the one second web page load rule

Ever since Google announced that every mobile web site had to achieve less than one second loading times, I’ve been meaning to do a fun, psuedo-scientific study to help start the discussion of putting this well intentioned goal into a perspective that everyday developers and businesses can understand. This topic has been heavily promoted, with most of the blog posts simply trumpeting agreement and either implying or coming right and saying that any page that loads over one second is pretty much worthless. In case you’ve never read the actual article, here’s a qoute from Google:

“we must deliver and render the above the fold (ATF) content in under one second, which allows the user to begin interacting with the page as soon as possible.”

I propose that most consumers will put up with a majority of sites that don’t deliver in exactly one second as long as the following general criteria are met: the user finds what they want or need the majority of the time, the user interface is fairly easy-to-use and intuitive, that there are limited or no overbearing advertisement banners (whoops!), and that the page loads within a reasonable amount of time. I really haven’t done enough research to know what a ‘reasonable amount of time’ is, but I know from testing that very few sites today deliver true sub-second performance. So let’s tear into this a bit.

First don’t get me wrong, I think sub-second performance is a very worthy goal and I absolutely think every web developer worth his or her salt should strive as hard as they can to make their web page performance as fast as possible. Reality is that budgets can be limited, time frames for deliverables can be short and not everyone is a website performance expert. However…I think we need to start asking some hard questions to make sense of the one-second rule and understand how it can be applied in our everyday development work rather than simply taking it at face value, for example:

  • What industries in particular does the rule apply? I suspect it’s mostly applied to the online retail industry. I think web site visitors cut other industries a reasonable amount of slack.
  • Does the rule only apply to first time visitors?
  • Do repeat visitors abide by other page speed rules? Repeat visitors can take advantage of cached browser pages to speed up their viewing experience
  • How do lousy mobile internet connections factor into this equation? For example, if someone knows they have a lousy internet connection most of the time do they factor that into their online buying decisions?
  • Does the rule apply to all countries or just the U.S.?
  • What is the ideal internet connection speed that this rule is based on? It seems unlikely that a page would have to load under one second regardless of the connection speed.
  • Does this apply to only self-hosted websites? What if your website is hosted on Amazon Webstore or Etsy because with these sites you don’t really have any control over how their webservers, DNS, cloud or internet pipe are configured.

I then went about verifying who are the largest online retailers in the U.S. by sheer sales volume and I came up with Amazon, Staples, Apple and Walmart as good candidates for the top four. However you verify this list, we can all agree that these four sites generate a massive amount of internet traffic, billions of dollars in revenue per year and perhaps even a majority share of internet sales. Given the fact that these stores are were tens of millions of people successfully shop every day I wanted to use the seemingly undisputable shopping experience of these retailers as a basis for comparison.

It seems like a fair assumption that these retailers must be doing something right, and therefore whatever they are doing could be a potential guideline for others. I theorize people’s online shopping and surfing expectations are formed by the websites on which they shop the most. You tend to do in-store shopping at places where you are comfortable and that the same can be said for on-line shopping. Therefore, we need to understand these leading retailers performance baselines to get some basic numbers that we can compare our against our own websites performance.

The Device

For my device, I used my middle of the pack Nexus 4 on a DSL WiFi to ensure the best possible consistent connection. Where I live 4G speeds can fluctuate quite a bit during the day, so in order to normalize those issues out of the tests I simply went with WiFi:

Android Nexus 4, Android v4.4.2
Native Chrome browser
12 Mb/sec DSL/WiFi via G Band (verified between 10 – 12 Mb/sec) – Your own WiFi experience will vary significantly.

To measure performance, I used the latest desktop version of Chrome Canary and it’s new mobile inspection tools that were hooked up to my phone via USB cable. This works really, really well, by the way.

The Criteria

Here is what I was looking for. Your test results will vary based on your device, other applications running and internet connection speeds. I didn’t test iTunes because, well, I don’t use iTunes on my Android and it’s not a website. Believe it or not, when I went to Apple.com on my Android I got a desktop website and not a mobile website.

I chose the following criteria to put context around the very first page load, since that’s what Google seems to focus on the most. My goal was to load each page two times. First time is with an empty browser cache and second time is with the website cached in the browser. Then I repeat the tests multiple times to help account for any anomalies.

Here’s the criteria I looked  at:

  • Page lag with the browser cache empty. This represents a first time page visitor. By the way, I’m making a distinction between the technical time that Chrome Devtools reports the page has loaded and when the various parts and pieces within a web page finish spinning up first. This may result in a slight delay until you can actually start navigating around. This is a very subject number and it’s really hard to eyeball it accurately, but we’ve all experienced it. A web page can ‘appear’ to look like it has loaded however when you go to scroll the page down nothing may happen for a short period of time. I fully acknowledge that some of my perceived delays are due to the lag time looking back and forth between a timer and the web page which were right next to each other.
  • Page lag with web page cached. I report both the technical page load time and the perceived page load time. This represents a repeat visitor.
  • Total download time with no cache. This is the time in which all HTML, JavaScript and CSS has finished loading. As mentioned in bullet 1 this represents the number reported by the developer tools. Lazily loaded content can continue to happen unbeknownst the to user and drag down page performance for the first time page visitor.
  • Total download time with cache. Lazily loaded content can also drag down page performance for repeat visitors.
  • And last but not least, Google’s PageSpeed Insights online tool gives a few guidelines for trying to examine how well a website page stacks up against specific criteria. My only sticking point is it’s not 100% clear what criteria is being used. But, I will point out that not a single top four website got excellent ratings in the ‘speed’ category. In fact, if we were giving out grades, two of them were in the ‘C’ category and the other two were in the ‘F’ category.

Amazon.com Tests (Averaged)

  1. Page lag no cache – 1.18 seconds reported, however based on my perception it looked more like between 2 and 3 seconds as the page finished visually loading.
  2. Page lag cached – 1.53 seconds according to Chrome dev tools. Strangely, the cached page tests seemed just a hair slower than the non-cached page. I’ve noticed that browsers can sometimes be a bit slow when grabbing cached files. It would take more research to dig into how they construct their page and what cached settings are used.
  3. Total download time (no cache) 36.52 seconds, 819KB, 103 requests (Yes, that’s right…around 35 – 36 seconds for a full and complete page load)
  4. Total download time (cached) 4.59 seconds, 428KB, 78 requests
  5. PageSpeed Insights
    1. Speed – 71/100
    2. User Experience – 99/100

Walmart.com Tests (Averaged)

  1. Page lag no cache – 415ms reported, based on my perception it looked more like between 1 and 3 seconds as the page finished visually loading. There was a somewhat brief spinner icon that displayed as the page loaded. Fast!
  2. Page lag cached – 378ms actual, based on my perception it looked more like between 1 and 2 seconds. I was able to start scrollable immediately.
  3. Total download time (no cache) 10.32 seconds, 358KB, 41 requests
  4. Total download time (cached) 9.61 seconds, 47KB, 30 requests
  5. PageSpeed Insights
    1. Speed – 77/100 (This number surprised me, but again we don’t know how the number was calculated)
    2. User Experience – 99/100

Staples.com Tests (Averaged)

  1. Page lag no cache – 5.52 seconds reported, and that approximately matched what I could see.
  2. Page lag cached – 4.45 actual and that also matched what I could see.
  3. Total download time (no cache) 8.25 seconds, 572KB, 44 requests
  4. Total download time (cached) 7.04 seconds, 25KB, 39 requests. Wow, 7 seconds for 25KBs??
  5. PageSpeed Insights
    1. Speed – 50/100 (Yikes!)
    2. User Experience – 96/100

Apple.com Tests (Averaged)

Apple gets the worse grade of the group because when I surfed to www.apple.com I got a full blown desktop website instead of a mobile-enabled website. PageSpeed Insights apparently agreed with me.

  1. Page lag no cache – 2.41 seconds reported, and my eyeballing it said between 2 and 4 seconds.
  2. Page lag cached – 1.69 seconds according to Chrome dev tools. My eyeballing it tended to look like around 2 seconds.
  3. Total download time (no cache) 3.66 seconds, 2.8MBs, 72 requests.
  4. Total download time (cached) 2.46 seconds, 905KB, 71 requests.
  5. PageSpeed Insights
    1. Speed – 58/100 (Yikes!)
    2. User Experience – 60/100 (Double Yikes!)

Conclusions

Since the vast majority of internet users buy products from these major retails, I believe their overall perceptions of how a web site should perform is in a great part established by their experiences while buying products online from them. None of the top sites were perfect, and there is always room for continued improvements.

Only one website out of the top four Internet retailers delivered a technical page load speed that was under one second: Walmart.com. Amazon came really, really close. Staples had mediocre mobile performance. Apple didn’t offer my Android phone a mobile-enabled website.

There is a difference between the times when the page is loaded in the browser as reported by the developer tools and when all web page components become completely visible and then a short time later, ranging from several hundred to several thousand milliseconds, fully usable. As a mobile web developer I can tell you it takes a bit of time for a mobile application to be 100% ready. Many (most?) of us have experienced the often herky-jerky surfing experience as a web page bounces up and down while the content is still loading in the background. iPads have this nasty habit if you aren’t patient enough to wait and wait until the page “appears” to have finished loading. Because of this experience, defining technically at what point the page becomes fully usable can be fairly subjective. This is especially true because some retails treat tablets like desktop machines and deliver a full blown version of a website. Testing for when the page becomes visible and usable is very dependent on the phone’s capabilities, any other applications that might be using the phones hardware and bandwidth resources, as well as the internet connection at that point in time and the users perceptions!

Repeat page visits almost always load faster. Web developers would already know this, but it’s important to keep in mind for making sense of page performance discussions: first time visitors will get a different experience than repeat visitors who come back frequentyly. There is all sort of magic that can be done to control and tweak page caching.

References

Mobile Analysis in PageSpeed Insights
Mobile Path to Purchase: Five Key Findings (interesting info on how people use mobile for retail)
Amazon’s sales versus others (WSJ)
Top 5 largest online retailers (netonomy.net)