Advanced Web Worker Performance

This post offers some advanced considerations on the mechanics of squeezing the best performance possible out of workers. I only mention a few (very) rough benchmarks since the goal is to focus on guidelines and food-for-thought, rather than specifics. It’s also fair to say that not all tasks benefit from using web workers. The best way to know is to test performance with and without the workers.

Processing costs. There are a variety of costs associated with manipulating data. Manipulating data means any task you run against it such as looping, converting, parsing and analyzing. Each one of these tasks should be evaluated for the cost in terms of CPU usage and time spent. By looking at these aspects, you can determine the benefits of having a web worker versus not having a web worker.

Pre-processing – manipulating data before sending it to the web worker.
Peri-processing – manipulating data while doing work on the background thread.
Post-processing – manipulating data after it’s been received back from the web worker.
Total processing – this benchmarks the processing from the very beginning until the very end.

Time stamp before and after each process that handles data. Tools such as console.time are invaluable in identifying actual and potential bottlenecks. Also make liberal use of the developer tools such as Chrome’s CPU profiler that now includes screen captures.

Total CPU and memory usage. There’s often a misperception that just because CPU usage has been outsourced to the background thread that it won’t negatively impact an application or contribute to jank. This is not true, especially on mobile devices that are already CPU and memory constrained. CPU headroom, or the amount available for application processing, is finite and typically doesn’t care whether CPU is used on the main thread or a background thread.

Web apps are wholly dependent on how each browser vendor uses CPUs and manages threads behind the scenes for every browser version and operating system. Using JavaScript you cannot guarantee how the browser will spread CPU load across cores.

In fact, I’ve seen web worker performance improvements made in Chrome that create almost an exact opposite performance decrease in Safari! Talk about a WTF moment.

Cloning costs. Careful consideration should be placed on the costs associated with the act of sending and receiving messages. Sending and receiving isn’t “free”, in fact the amount of CPU consumed can directly cause jank.

Web workers use a cloning algorithm for serializing objects that go in and back out, and cloning uses CPU cycles. Yes, even if you are using transferable objects there may still be performance implications especially when dealing with large amounts of data (>100MBs) as well as on mobile devices. You simply will never know unless you test, test, test.

In some cases, the cloning costs related to both sending and receiving messages may outweigh the performance benefit of using a web worker. If the entire round-trip time and CPU utilization is greater than simply keeping the processing on the main thread – then you probably don’t need a web worker.

Pooled workers. Should you use one worker or several? The only reliable way to know is to experiment. One approach to eke out greater performance is to create a pool of workers. The concept is a task that can be broken up and passed on into smaller concurrent worker tasks and then reassembled after all tasks have completed.

Here’s an example application that retrieves a GeoJSON file, parses it, then displays the data on a map. Open up the developer console and run each application multiple times. You can also download and experiment when this application yourself with increasing and decreasing the size of the thread pool:

Example 1: Parsing using no web workers
Example 2: Parsing using one web workers
Example 3: Parsing using two web workers

With respect to binary versus non-binary data, I’ve seen diminishing returns when using larger number of workers with JSON files. On the other hand, there are a number of posts on the web that show great performance processing image data with increasing numbers of workers.

This also may be dependent on the browsers-threading model and how it uses the number of cores on your device or laptop. How do you know? Test and benchmark.

Re-use workers, or re-create for each loop? There are costs associated with initializing workers. Depending on the type of application, it may benefit you to re-use workers rather than re-create them for each loop. This depends on how complex your web workers are, how many you need to spin up, the type of device they are running on and how often you are using them.

Some applications may run on a timer once every 5 minutes. If that’s the case does it make sense to keep workers sitting unused in memory for that long? Maybe or maybe not. Other applications may only need the workers at startup and there’s definitely no need to keep them around after that. The list of use cases are endless.

Here’s some very rough test results on the time to simply initialize web workers. This example used identical workers that contained 67 lines of code:

Macbook Pro: 2 workers, 0.4 milliseconds on average
Macbook Pro: 4 workers, 0.6 milliseconds on average
Nexus 5: 2 workers, 6 milliseconds on average
Nexus 5: 4 workers, 15 milliseconds on average (border-line UI jank)

As far as computational horsepower, the Macbook used in the tests had 2.6 GHz i7 with 16GB of 1600 MHz DDR3. The Nexus 5 had a Snapdragon 2.26 GHz with 2GB of 800 MHz RAM. Clearly the Macbook outclasses the Nexus. It’s important to note how much longer it took to spin up a pool of workers on a mobile device. That’s an example of an app that would work seamlessly on a powerful laptop, but could produce jank on a mobile device.

Closing Notes. Web workers are fairly straight forward to bolt into an application, but tuning them up to gain the true performance benefits can be a bit more tricky. With a little exploration and experimentation, workers can potentially provide huge benefits for your application’s performance.

Sample Application

[Fixed broken links: July 7, 2016]