Advanced Web Worker Performance

This post offers some advanced considerations on the mechanics of squeezing the best performance possible out of workers. I only mention a few (very) rough benchmarks since the goal is to focus on guidelines and food-for-thought, rather than specifics. It’s also fair to say that not all tasks benefit from using web workers. The best way to know is to test performance with and without the workers.

Processing costs. There are a variety of costs associated with manipulating data. Manipulating data means any task you run against it such as looping, converting, parsing and analyzing. Each one of these tasks should be evaluated for the cost in terms of CPU usage and time spent. By looking at these aspects, you can determine the benefits of having a web worker versus not having a web worker.

Pre-processing – manipulating data before sending it to the web worker.
Peri-processing – manipulating data while doing work on the background thread.
Post-processing – manipulating data after it’s been received back from the web worker.
Total processing – this benchmarks the processing from the very beginning until the very end.

Time stamp before and after each process that handles data. Tools such as console.time are invaluable in identifying actual and potential bottlenecks. Also make liberal use of the developer tools such as Chrome’s CPU profiler that now includes screen captures.

Total CPU and memory usage. There’s often a misperception that just because CPU usage has been outsourced to the background thread that it won’t negatively impact an application or contribute to jank. This is not true, especially on mobile devices that are already CPU and memory constrained. CPU headroom, or the amount available for application processing, is finite and typically doesn’t care whether CPU is used on the main thread or a background thread.

Web apps are wholly dependent on how each browser vendor uses CPUs and manages threads behind the scenes for every browser version and operating system. Using JavaScript you cannot guarantee how the browser will spread CPU load across cores.

In fact, I’ve seen web worker performance improvements made in Chrome that create almost an exact opposite performance decrease in Safari! Talk about a WTF moment.

Cloning costs. Careful consideration should be placed on the costs associated with the act of sending and receiving messages. Sending and receiving isn’t “free”, in fact the amount of CPU consumed can directly cause jank.

Web workers use a cloning algorithm for serializing objects that go in and back out, and cloning uses CPU cycles. Yes, even if you are using transferable objects there may still be performance implications especially when dealing with large amounts of data (>100MBs) as well as on mobile devices. You simply will never know unless you test, test, test.

In some cases, the cloning costs related to both sending and receiving messages may outweigh the performance benefit of using a web worker. If the entire round-trip time and CPU utilization is greater than simply keeping the processing on the main thread – then you probably don’t need a web worker.

Pooled workers. Should you use one worker or several? The only reliable way to know is to experiment. One approach to eke out greater performance is to create a pool of workers. The concept is a task that can be broken up and passed on into smaller concurrent worker tasks and then reassembled after all tasks have completed.

Here’s an example application that retrieves a GeoJSON file, parses it, then displays the data on a map. Open up the developer console and run each application multiple times. You can also download and experiment when this application yourself with increasing and decreasing the size of the thread pool:

Example 1: Parsing using no web workers
Example 2: Parsing using one web workers
Example 3: Parsing using two web workers

With respect to binary versus non-binary data, I’ve seen diminishing returns when using larger number of workers with JSON files. On the other hand, there are a number of posts on the web that show great performance processing image data with increasing numbers of workers.

This also may be dependent on the browsers-threading model and how it uses the number of cores on your device or laptop. How do you know? Test and benchmark.

Re-use workers, or re-create for each loop? There are costs associated with initializing workers. Depending on the type of application, it may benefit you to re-use workers rather than re-create them for each loop. This depends on how complex your web workers are, how many you need to spin up, the type of device they are running on and how often you are using them.

Some applications may run on a timer once every 5 minutes. If that’s the case does it make sense to keep workers sitting unused in memory for that long? Maybe or maybe not. Other applications may only need the workers at startup and there’s definitely no need to keep them around after that. The list of use cases are endless.

Here’s some very rough test results on the time to simply initialize web workers. This example used identical workers that contained 67 lines of code:

Macbook Pro: 2 workers, 0.4 milliseconds on average
Macbook Pro: 4 workers, 0.6 milliseconds on average
Nexus 5: 2 workers, 6 milliseconds on average
Nexus 5: 4 workers, 15 milliseconds on average (border-line UI jank)

As far as computational horsepower, the Macbook used in the tests had 2.6 GHz i7 with 16GB of 1600 MHz DDR3. The Nexus 5 had a Snapdragon 2.26 GHz with 2GB of 800 MHz RAM. Clearly the Macbook outclasses the Nexus. It’s important to note how much longer it took to spin up a pool of workers on a mobile device. That’s an example of an app that would work seamlessly on a powerful laptop, but could produce jank on a mobile device.

Closing Notes. Web workers are fairly straight forward to bolt into an application, but tuning them up to gain the true performance benefits can be a bit more tricky. With a little exploration and experimentation, workers can potentially provide huge benefits for your application’s performance.

Sample Application

[Fixed broken links: July 7, 2016]

You can’t just wrap old websites in bootstrap and call it a day

Just because you wrapped an existing website in bootstrap doesn’t necessarily mean it’s ready for use on mobile devices. This is especially true if the website was originally designed for desktop browsers. Yes, bootstrap can significantly improve the user interface and make it flexible across multiple screen sizes. But it’s also up to you to roll up your sleeves and make sure the code behind the scenes is also worthy of being mobile-ready.

So, here are a three challenges to consider that will help keep your smartphone using visitors happy.

Challenge 1 – The internet connection on mobile devices is not as reliable as your home or office wired network. That’s a fact. Download speeds can and will vary significantly. The larger the website in MBs, the more links it has to download, the more non-optimized images then the longer it will take to render and become ready for use, especially on a mobile device. And mobile users are a very impatient bunch when it comes to sluggishness.

Size does matter with web sites. Smaller sized files download faster. The fewer number of files that make up a website also means faster downloads. Optimize, optimize and optimize some more. Minify files. Combine multiple files into one. Optimize images for web display.

Challenge 2 – Site navigation needs to be rethought and resized with mobile users in mind. Modern mobile devices use finger-based navigation, as opposed to high-precision mouse pointers. Teeny, tiny buttons or links that look cool on an ultra-high resolution MacBook retina screen positively suck when you are trying multiple times to click on them with your fingertip. On some websites using desktop navigational elements on your phone becomes like a macabre video game as you repeatedly play hit-or-miss with your fingertips.

Mobile websites should be finger lickin’ good. Okay, maybe you don’t really have to lick your fingers, but at least right size your navigational elements while keeping people’s fingers in mind. And fingers come in all shapes, sizes and levels of dexterity. Bootstrap can help with this.

Furthermore, your design and testing should work equally well in both portrait and landscape modes (phone right side up or phone on its side), and you should be able to switch back and forth seamless between the two modes in the same browser session.

Challenge 3 – Mobile devices are significantly more sensitive to browser memory leaks and bloated web pages. The mobile operating system will simply kill of any app that it deems to be using too much memory. And most of us are simply not good stewards at keeping our mobile browsers tuned up and happy. Browser caches grow huge and don’t get cleaned out regularly; we keep too many tabs open and probably have more information in our browser history than the library of congress. To further add to our woes, many of us let our phones run for weeks without a restart which can allow memory leakage to grow over time.

Tweak as many aspects of web page performance that your time will allow. Optimize old code by re-writing and striving to add new efficiencies. I know it may sound crazy, but if your site is particularly large and complex then consider creating focused, mobile-only sites that have scaled down content, rather than trying the one-size fits all approach. Smaller sites not only load faster but they are easier to navigate, take up less memory and typically perform better.

Fastest way to find an item in a JavaScript Array

There are many different ways to find an item in a JavaScript array. With a little bit of testing and tinkering, I found some methodologies were faster than others by close to 200%!

I’ve been doing some performance tweaking on a very CPU intensive JavaScript application and I needed really fast in-memory searching on a temporary array before writing that data to IndexedDB. So I did some testing to decide on an approach with the best search times. My objective was to coax out every last micro-ounce of performance. The tests were completed using a pure JavaScript methodology, and no third party libraries were used, so that I could see exactly what was going on in the code.

I looked at five ways to parse what I’ll call a static Array. This is an array that once it is written you aren’t going to add anything new too it, you simply access its data as needed and when you are done you delete it.

  1. Seek. Create an index Array based exactly on the primary Array. It only contains names or unique ids in the same exact order as the primary. Then search for indexArray.indexOf(“some unique id”) and apply that integer against the primary Array, for example primaryArray[17] to get your result. If this doesn’t make sense take a look at code in my JSFiddle.
  2. Loop. Loop thru every element until I find the matching item, then break out of the loop. This pattern should be the most familiar to everyone.
  3. Filter. Use Array.prototype.filter.
  4. Some. Use Array.prototype.Some.
  5. Object. Create an Object and access it’s key/value pairs directly using an Object pattern such as parsedImage.image1 or parseImage[“image1”]. It’s not an Array, per se, but it works with the static access pattern that I need.

I used the Performance Interface to get high precision, sub-millisecond numbers needed for this test. Note, this Interface only works on Chrome 20 and 24+, Firefox 15+ and IE 10. It won’t run on Safari or Chrome on iOS. I bolted in a shim so you can also run these tests on your iPad or iPhone.

My JSFiddle app creates an Array containing many base64 images and then loops thru runs hundreds of tests against it using the five approaches. It performs a random seek on the Array, or Object during each iteration. The offers a better reflection of how the array parse algorithm would work under production conditions. After the loops are finished, it then spits out an average completion time for each approaches.

The results are very interesting in terms of which approach is more efficient. Now, I understand in a typical application you might only loop an Array a few times. In those cases a tenth or even hundredth of a millisecond may not really matter. However if you are doing hundreds or even thousands of manipulations repetitively, then having the most efficient algorithm will start to pay off for your app performance.

Here are some of the test results based on 300 random array seeks against a decent size array that contained 300 elements. It’s actually the same base64 image copied into all 300 elements. You can tweak the JSFiddle and experiment with different size arrays and number of test loops. I saw similar performance between Firefox 29 and Chrome 34 on my MacBook Pro as well as on Windows. Approach #1 SEEK seems to be consistently the fastest on Arrays and Object is by far the fastest of any of the approaches:

OBJECT Average 0.0005933333522989415* (Fastest.~191% less time than LOOP)
SEEK Average 0.0012766665895469487 (181% less time than LOOP)
SOME Average 0.010226666696932321
FILTER Average 0.019943333354603965
LOOP Average 0.02598666658741422 (Slowest)

————–

OBJECT Average 0.0006066666883028423* (Fastest.~191% less time than slowest)
SEEK Average 0.0012900000368244945 (181% less time than LOOP)
SOME Average 0.012076666820018242
FILTER Average 0.020773333349303962
LOOP Average 0.026383333122745777 (Slowest)

As for testing on Android, I used my Android Nexus 4 running 4.4.2. It’s interesting to note that the OBJECT approach was still the fastest, however the LOOP approach (Approach #2) was consistently dead last.

On my iPad 3 Retina using both Safari and Chrome, the OBJECT approach was also the fastest, however the FILTER (Approach #3) seemed to come in dead last.

I wasn’t able to test this on IE 10 at the time I wrote this post and ran out of time.

Conclusion

Some folks have blogged that you should never use Arrays for associative search. I think this depends on exactly what you need to do with the array, for example if you need to do things like slice(), shift() or pop() then sticking to an Array structure will make your life easier. For my requirements where I’m using a static Array pattern, it looks like using the Object pattern has a significant performance advantage. If you do need an actual Array then the SEEK pattern was a close second in terms of speed.

References:

JSFiddle Array Parse tests
Performance Interface

[Updated: May 18, 16:06, fixed incorrect info]