Node.js: moving intensive tasks to a child process

If you have code in your Node application that runs for more than a few seconds, you should consider moving it off the main thread, especially if it is a task that you will be running many times per hour or day. Because Node runs your JavaScript on a single thread, long-running synchronous work can block other code from executing and give end users the impression that your application is slow. An example of an intensive task might be retrieving one or more RSS feeds at regular intervals and then post-processing the data. The network requests themselves are asynchronous, but parsing and transforming large responses is CPU-bound work.
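
To make the blocking effect concrete, here is a minimal sketch. The helper name blockFor and the 200 ms duration are assumptions chosen for illustration. It schedules a zero-delay timer and then busy-waits, so the timer cannot fire until the synchronous loop has finished.

```javascript
// Hypothetical helper: spin the CPU synchronously for roughly `ms` milliseconds.
function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // busy-wait: nothing else can run on this thread in the meantime
  }
  return Date.now() - start;
}

// A timer that asks to fire as soon as possible...
const scheduled = Date.now();
setTimeout(function () {
  console.log("timer fired after " + (Date.now() - scheduled) + " ms");
}, 0);

// ...is held up until the synchronous work below has finished.
const elapsed = blockFor(200);
console.log("blocked the event loop for " + elapsed + " ms");
```

In a web server the delayed timer would be an incoming request waiting its turn, which is the motivation for moving this kind of work into another process.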

There are several different ways to handle intensive and repetitive tasks in Node. For this post I’m only going to focus on #2:

  1. External Cron Job. You can implement a server-based cron process that runs completely outside of your application process, kicks off a Node application at regular intervals, and then writes the results to a database or static file.
  2. Child Process. Or, you can create a new, separate process to handle your long-running code and then pass messages between the new process and your main Node application. Node opens an inter-process communication (IPC) channel between the two processes for this. These messages are not the same thing as Unix/Linux signals such as SIGTERM, although the two are easy to confuse.

There are plenty of articles on the internet that discuss how to set up cron jobs, so I’m going to skip that. I have found very few, if any, full blog articles that discuss in detail how to build an application with child processes that handle repetitive and intensive tasks. There are Stack Overflow snippets that are great, but they leave out much of the nitty-gritty of how to get everything working, especially for newcomers.

The good news is that the steps for implementing a child process are fairly straightforward. The following pseudo-code snippets demonstrate the steps. You can download or fork the full source code from this GitHub repo.

Step 1. Create a new file for the code that you want to run in a separate process. We’ll name this file retriever.js and it will contain a timer and our mock intensive task. Note that the timer doesn’t have to be in the same file; I just put it there for convenience to help illustrate my point.

I also recommend setting up a counter to keep track of the total number of errors related to sending data from retriever.js back to index.js or specifically related to your intensive task. It’s important for you to know that the child process can continue to run even if the parent process stops accepting messages. When this happens the child process will throw errors. By counting the number of errors associated with your task or with sending/receiving, you can force the child process to shut down gracefully.

var timers = require("timers"),
    http = require("http"),
    ___backgroundTimer;

process.on('message',function(msg){

    // Mock intensive task: builds one result per URL and sends the batch to index.js
    function longRunningTask(urls){
        var finalArray = [];
        urls.forEach(function(url){
            //TODO do something here to create the 'result'
            var result = url; // placeholder so the sample runs
            finalArray.push(result);
        });

        //Send the results back to index.js
        if(finalArray.length > 0){
            var data = {
                "error":null,
                "content":finalArray
            };

            try{
                process.send(data);
            }
            catch(err){
                console.log("retriever.js: problem with process.send() " + err.message + ", " + err.stack);
            }
        }
        else{
            console.log("retriever.js: no data processed");
        }
    }

    function startTimer(){
        var count = 0;

        // Replace any timer left over from an earlier message
        clearInterval(___backgroundTimer);

        ___backgroundTimer = timers.setInterval(function(){

            try{
                var date = new Date();
                console.log("retriever.js: datetime tick: " + date.toUTCString());
                longRunningTask(msg.content);
            }
            catch(err){
                count++;
                if(count >= 3){
                    console.log("retriever.js: shutdown timer...too many errors. " + err.message);
                    clearInterval(___backgroundTimer);
                    process.disconnect();
                }
                else{
                    console.log("retriever.js error: " + err.message + "\n" + err.stack);
                }
            }
        },msg.interval);
    }

    // Only start when the parent asked for it and supplied a non-empty list of URLs
    if(msg.start === true && Array.isArray(msg.content) && msg.content.length > 0){
        startTimer();
    }
    else{
        console.log("retriever.js: content empty. Unable to start timer.");
    }

})

process.on('uncaughtException',function(err){
    console.log("retriever.js: " + err.message + "\n" + err.stack + "\n Stopping background timer");
    clearInterval(___backgroundTimer);
})
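
The error-counting idea from Step 1 can also be isolated into a small helper. This is only a sketch: the name createErrorCounter, the limit of three, and the simulated failure are assumptions for illustration and are not taken from the repo.

```javascript
// Hypothetical helper: returns a function that records one failure per call
// and reports whether the configured limit has been reached.
function createErrorCounter(maxErrors) {
  let count = 0;
  return function recordError(err) {
    count += 1;
    console.log("error " + count + " of " + maxErrors + ": " + err.message);
    return count >= maxErrors;
  };
}

// Usage sketch: stop a repeating task once three failures have been recorded.
const tooManyErrors = createErrorCounter(3);
const timer = setInterval(function () {
  try {
    throw new Error("simulated failure"); // stand-in for the real task
  } catch (err) {
    if (tooManyErrors(err)) {
      clearInterval(timer);
      // In a forked child this is also where process.disconnect() would go.
    }
  }
}, 10);
```

Keeping the counter separate from the timer makes it easy to share one limit between task errors and send errors, should you want both to count toward the shutdown.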

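Step 2. Fork a child process from index.js. Despite the name, fork() does not clone the current process; it starts a new Node process that runs the module you pass in and opens a communication channel back to the parent. The fork request is executed immediately by the application.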

var childProcess = require("child_process");
this._retrieveChild = childProcess.fork("./background/retriever");

Step 3. Pass message(s) from index.js to the forked process using the send() method. Note, since the data being passed back and forth is automatically serialized (as JSON by default), you’ll need to stick to values that survive serialization: plain Objects and Arrays, Strings, Numbers, Booleans and null. Anything else, such as functions or class instances, will have to be manually serialized and de-serialized down to its component parts.


var _finalizedData = null,
    _httpRequestArray = ["https://someurl","https://someurl2","https://someurl3"];

var data = {
    "start":true,
    "interval": 60 * 60 * 1000,
    "content": _httpRequestArray
}

this._retrieveChild.send(data);
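
To see what the automatic serialization does to a message, you can approximate it with a JSON round trip. This sketch assumes the default 'json' serialization mode. The extra fields createdAt, lookup and onDone are hypothetical and exist only to show what gets changed or lost.

```javascript
// Hypothetical message containing values that JSON cannot represent as-is.
const original = {
  start: true,
  interval: 60 * 60 * 1000,
  content: ["https://someurl"],
  createdAt: new Date(0),        // arrives as an ISO date string
  lookup: new Map([["a", 1]]),   // arrives as an empty object
  onDone: function () {}         // dropped entirely
};

// Approximates what the receiving process sees after send().
const received = JSON.parse(JSON.stringify(original));

console.log(received);
console.log(typeof received.createdAt);
console.log("onDone" in received);
```

Running a message through this round trip before sending it is a cheap way to spot fields that will not arrive intact.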

Step 4. Receive messages from the forked process and process them in index.js.

this._retrieveChild.on('message', function(msg){
    console.log("Recv'd message from background process.");
    _finalizedData = msg.content;
}.bind(this))
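
Besides 'message', the object returned by fork() also emits 'error' and 'exit', and it is usually worth listening for those so the parent knows when the background process has gone away. The sketch below is an assumption about how you might organize that. The helper watchChild and its state object are hypothetical, and a plain EventEmitter stands in for the forked child so the example runs on its own.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: attaches listeners to a forked child (or any EventEmitter)
// and records what happened in a plain state object.
function watchChild(child) {
  const state = { data: null, lastError: null, running: true };

  child.on("message", function (msg) {
    if (msg && msg.error) {
      state.lastError = msg.error;
    } else if (msg) {
      state.data = msg.content;
    }
  });

  // 'error' is emitted when the child could not be spawned, killed, or sent a message.
  child.on("error", function (err) {
    state.lastError = err.message;
  });

  // 'exit' is emitted once the child process has ended.
  child.on("exit", function (code, signal) {
    state.running = false;
    console.log("child exited, code=" + code + ", signal=" + signal);
  });

  return state;
}

// Stand-in for childProcess.fork(...) so the sketch is self-contained.
const fakeChild = new EventEmitter();
const state = watchChild(fakeChild);
fakeChild.emit("message", { error: null, content: ["result"] });
fakeChild.emit("exit", 0, null);
console.log(state);
```

In index.js you would pass this._retrieveChild instead of fakeChild.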

Step 5. Verify that everything works by running the application and opening it in a web browser. You can also use a terminal window and grep for any node processes. If your code was implemented correctly it should run without any errors, and grep should show your background process running as its own node process:


bash-3.2$ ps aux | grep node
andy        79497   1.2  0.1  3039268  15040 s000  S     1:27PM   0:02.87 /usr/local/bin/node --debug-brk ./samples/currentweather/utils/retriever
andy        79531   0.0  0.0  2432768    612 s000  U+    1:31PM   0:00.00 grep node

References:

Github repo: node-background-processer
Node Child Process Class
Linux Signals Fundamentals – Part 1