Troubleshooting multi-threading problems related to Android’s onResume()

This post is a continuation of a previous post I wrote about best practices for using onResume(). I found a particularly testy bug that cost me two painful hours to track down. The tricky part was that it only showed up when no debugger was attached, which told me right away that it was a threading problem. I suspected that the debugger slowed things down just enough for all the threads to complete in the expected order, which was not the order they completed in when the device ran in stand-alone mode.

The test case. This is actually a very common workflow, and perhaps so common that we just don’t think about it much:

  • Cold start the application without a debugger attached. By cold start I mean that the app was in a completely stopped, non-cached state.
  • Minimize the app like you are going to do some other task.
  • Open the app again to ensure that onResume() gets called.

Now, fortunately, I already had good error handling built in. I kept seeing, both in logcat and in a toast message, that a java.lang.NullPointerException was occurring. What happened next was troubleshooting a multi-threaded app without the benefit of a debugger. Not fun. I knew I had to do it because of the visibility of the use case. I couldn’t let this one go.

How to narrow down the problem. The pattern I used to hunt down the bug was to wrap each line of code or code block with Log messages like this:

Log.d("Test","Test1");
setLocationListener(true, true);
Log.d("Test","Test2");

Then I used the following methodology, starting inside the method where the NullPointerException was occurring. I did this step-by-step, app rebuild by app rebuild, through the next 250 lines of related code:

  1. Click debug in Eclipse to build the new version of the app that included any new logging code as shown above, and load it on the device.
  2. Wait until the application was running, then shut down the debug session through Eclipse.
  3. Restart the app on the device. Note: the debugger was shut down so it wouldn’t re-attach.
  4. Watch the messages in Logcat.
  5. If I saw one message, such as Test1, followed by the NullPointerException with no test message after it, then I knew the code between those two messages was the offending code block, method or property. If it was a method, then I followed the same pattern through the individual lines of code inside that method. This looked very much like what you would do with step-through debugging, except this was done manually. Ugh.
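
The manual bisection above can be made a little less tedious with a tiny helper that remembers the last checkpoint reached, so that a single catch block can report where execution stopped. What follows is only a plain-Java sketch: the Checkpoints and CheckpointDemo classes are hypothetical names that are not part of the original app, and on Android you would likely call Log.d("Test", name) instead of System.out.println.

```java
// Hypothetical helper (not from the original app): records the most recent
// checkpoint so a catch block can report the last line that ran.
final class Checkpoints {
    private static volatile String last = "(none)";

    private Checkpoints() {}

    /** Record and print a checkpoint, tagged with the current thread's name. */
    static void mark(String name) {
        last = name;
        System.out.println("Test [" + Thread.currentThread().getName() + "]: " + name);
    }

    /** Name of the most recent checkpoint reached. */
    static String last() {
        return last;
    }
}

final class CheckpointDemo {
    /** Runs three steps, the second of which throws; returns the last checkpoint reached. */
    static String run() {
        String uninitialized = null; // stands in for the half-initialized property
        try {
            Checkpoints.mark("Test1");
            int a = "ok".length();           // this step succeeds
            Checkpoints.mark("Test2");
            int b = uninitialized.length();  // this step throws NullPointerException
            Checkpoints.mark("Test3");       // never reached
        } catch (NullPointerException e) {
            System.out.println("NPE after checkpoint " + Checkpoints.last());
        }
        return Checkpoints.last();
    }

    public static void main(String[] args) {
        run();
    }
}
```

Because the helper keeps a single shared value, it is only a rough guide when several threads log checkpoints at once; the thread name in each message helps untangle the interleaving.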

What caused the problem? As time went on, I was surprised that I had to keep going deeper and deeper into the code, and I became very curious. It turned out to be a multi-threading bug in a third-party library: the library wasn’t fully initialized even though its method for checking whether initialization was complete said it was. The boolean state property was plainly wrong. One portion of the library wasn’t taken into account when declaring initialization complete, and I was trying to access a property in that portion before it had been initialized. Oops…now that’s a bug.
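
To make the failure mode concrete, here is a minimal plain-Java sketch of a "ready" flag that ignores part of the initialization. FlakyLibrary, its method names and the "gps" value are all hypothetical stand-ins and not the vendor's real API. In the real library the two steps presumably ran on different threads; here they are called one after the other so the bad window is reproducible.

```java
// Hypothetical stand-in for the third-party library (not the real vendor API).
final class FlakyLibrary {
    private volatile boolean coreReady;   // set by the first init step
    private volatile String provider;     // set by a second, slower init step

    void finishCoreInit() { coreReady = true; }
    void finishProviderInit() { provider = "gps"; }

    /** The bug: reports "initialized" without checking the second step. */
    boolean isInitialized() { return coreReady; }

    String getProvider() { return provider; }
}

final class FlakyLibraryDemo {
    /** Returns true if a NullPointerException occurs while the flag says "ready". */
    static boolean hitsNullWhileFlaggedReady() {
        FlakyLibrary lib = new FlakyLibrary();
        lib.finishCoreInit();             // first step done, second still pending
        if (!lib.isInitialized()) {
            return false;
        }
        try {
            lib.getProvider().length();   // provider is still null here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```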

The workaround? To work around the problem, I simply wrapped the access to the offending property in a try/catch block. Then, using the pattern I described in the previous blog post, I kept running verification checks until the property was either correctly initialized or the checks failed after a certain number of attempts. This isn’t 100% ideal, yet it let me keep going forward with the project until the vendor fixes the bug.
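
A rough plain-Java sketch of that try/catch-and-retry idea follows. The Retry class and its untilReady method are hypothetical names, not code from the app. It blocks with Thread.sleep between attempts, so on Android it would have to run off the main thread, or be reworked to re-post itself with Handler.postDelayed.

```java
import java.util.function.Supplier;

// Hypothetical helper: keeps checking a value that may not be initialized yet.
final class Retry {
    private Retry() {}

    /**
     * Calls the supplier until it returns a non-null value without throwing,
     * or until maxAttempts is used up. Returns null if every attempt failed
     * or the waiting thread was interrupted.
     */
    static <T> T untilReady(Supplier<T> supplier, int maxAttempts, long delayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T value = supplier.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                // e.g. a NullPointerException from a half-initialized library
                System.out.println("attempt " + attempt + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return null;
                }
            }
        }
        return null;
    }
}
```

A caller would pass the risky access as a lambda, for example Retry.untilReady(() -> library.getSomething(), 5, 500), where library.getSomething() is again a placeholder, and treat a null result as "gave up".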

Lessons Learned. I’ve done kernel-level debugging on Windows applications, but I really didn’t feel like learning how to do it with one or multiple Android devices. I was determined to try to narrow down the bug using the rather primitive tools at hand. The good news is it only took two hours. For me, it reaffirmed my own practice of implementing good error handling, because I knew immediately where to start looking. I had multiple libraries and several thousand lines of code to work with. And, as I’ve seen before, there are some bugs in Android that simply fail with little meaningful information. By doubling down and taking it step-by-step I was able to mitigate a very visible bug.

Best Practices for using onResume() in Android Apps

onResume() is a tricky part of an Android application’s life cycle that is called after onRestoreInstanceState(bundle), onRestart(), or onPause(). Its typical usage looks like this inside an Activity:

@Override
protected void onResume() {
	super.onResume();
	//do something
}

There are two things to be aware of when using onResume():

1)      The application may not be visible yet to the user.

2)      Code that you want to access may not be fully initialized yet.

It seems very simple on the surface. When an application resumes it’s really no different than when you wake up in the morning. It may take some time to get going and there may be certain necessary rituals to be completed. For example, some people need a few cups of coffee (or tea), and applications are the same way. Of course, applications don’t drink coffee or tea (yet). But, anyway, it takes time and there may be certain rituals that need to be done for certain aspects of your application to spin back up. This is especially true when you have implemented your own threads.

It’s important to note: onResume() does not indicate that the operating system knows anything about the state of your application, and this is where you can get into trouble. This event is, for the most part, just an announcement by the operating system that it has resumed your Activity and that you can start accessing your app or hardware items such as the camera. What makes this confusing is that some aspects of your Activity, such as user interface components, will come back to life without your help, while other aspects, such as any custom threads you built, will not.

So, some key items to consider in your code are:

1)      If you are concerned about visibility then check onWindowFocusChanged(). You can do this using the pattern described below for #3 and #4.

2)      Did you pause any threads prior to the onResume() event? If you did, you’ll need to unpause them. If you don’t unpause them they won’t start back up again automatically.

3)      Do you have anything that takes additional time to re-initialize? An example of this might be an RSS refresh request that has been kicked off, but whose response payload hasn’t been received and processed yet, and you want to synchronize that with other methods.

4)      If the device is under load when your application resumes, the methods you attempt to access and their responses, as well as any event handling, may be sluggish. Examples of a device under load include limited memory conditions, high CPU usage, and high-bandwidth usage. If you don’t handle this properly the app can crash.
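
For item #2, one common way to give a custom worker thread a pause and unpause switch is a small wait/notify gate. The sketch below is plain Java. The PauseGate and PauseGateDemo classes are hypothetical, and the idea that you would call pause() from onPause() and unpause() from onResume() is an assumption about how your own threads are managed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate that a worker thread checks between units of work.
final class PauseGate {
    private final Object lock = new Object();
    private boolean paused = false;

    /** Ask the worker to stop at its next check, e.g. from onPause(). */
    void pause() {
        synchronized (lock) { paused = true; }
    }

    /** Let the worker continue, e.g. from onResume(). */
    void unpause() {
        synchronized (lock) {
            paused = false;
            lock.notifyAll(); // wake any worker blocked in awaitIfPaused()
        }
    }

    /** Called by the worker thread; blocks for as long as the gate is paused. */
    void awaitIfPaused() throws InterruptedException {
        synchronized (lock) {
            while (paused) {
                lock.wait();
            }
        }
    }
}

final class PauseGateDemo {
    /** Returns {ranWhilePaused, ranAfterUnpause}. */
    static boolean[] run() {
        PauseGate gate = new PauseGate();
        AtomicBoolean ran = new AtomicBoolean(false);
        gate.pause();
        Thread worker = new Thread(() -> {
            try {
                gate.awaitIfPaused(); // blocks until unpause() is called
                ran.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            Thread.sleep(100);
            boolean ranWhilePaused = ran.get();
            gate.unpause();
            worker.join(2000);
            return new boolean[] {ranWhilePaused, ran.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The demo shows the point made in the list: the worker does nothing while the gate is paused, and it only starts again once something explicitly calls unpause().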

To work around items #3 and #4, there are two relatively easy ways to help prevent your app from crashing: Handlers and AsyncTasks. Use them to manage aspects of your application that don’t or can’t spring immediately back to life. If you aren’t familiar with Handlers or AsyncTask, they give you an easy way to off-load time-consuming or intensive tasks from the main user interface thread, and they also provide easy methods for re-synching messages, methods or objects back into the main thread. The concept behind this is that the end user can continue working with a responsive user interface that still accepts input while these special methods work on their tasks in the background, and then return control to the main thread when the tasks are done.
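
For readers who have not met the idea before, the "work in the background, then hand the result back" pattern can be sketched in plain Java with java.util.concurrent. This is an analogy, not Android code. The single-thread executor named "main-stand-in" only plays the role of the UI thread, and the OffloadDemo class and the "parsed feed" string are made up for illustration. On Android the hand-back step would typically go through a Handler bound to the main Looper. Note that AsyncTask itself was deprecated in API level 30.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OffloadDemo {
    /** Returns the names of the threads that ran the slow work and the "UI" update. */
    static String[] run() {
        ExecutorService background =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "background"));
        ExecutorService mainThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "main-stand-in"));
        try {
            String[] threads = new String[2];
            CompletableFuture<Void> done = CompletableFuture
                    .supplyAsync(() -> {
                        // Slow work (e.g. fetching and parsing a feed) runs here.
                        threads[0] = Thread.currentThread().getName();
                        return "parsed feed";
                    }, background)
                    .thenAcceptAsync(result -> {
                        // The result is handed back to the stand-in "UI" thread.
                        threads[1] = Thread.currentThread().getName();
                        System.out.println("update UI with: " + result);
                    }, mainThread);
            done.join(); // only the demo waits; a real UI thread would not block
            return threads;
        } finally {
            background.shutdown();
            mainThread.shutdown();
        }
    }
}
```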

There are plenty of posts that explain Handlers and AsyncTasks and show how to fully implement them, so I’m not going to cover that. I will, however, show you one example to demonstrate what I’d consider a best practice to cover you on items #3 and #4. In this example, the application is being resumed and must wait until the RSS feed has been retrieved before running an analysis on the feed. Both the RSS HTTP request/response and the analysis can be time-consuming, and the analysis could still be running in the background while another RSS feed request is taking place. By using background threads, we can better manage this scenario and reduce the chance of an application crash.

public boolean getRSSPayloadReady(){
	boolean rssReceived = false;
	//determine if RSS has been received and processed.
	....
	return rssReceived;
}

@Override
protected void onResume() {
	super.onResume();
	rssQueue.unpause(); //threaded method for retrieving RSS
	delayedStart(15000);
}

/**
* Use with onResume().
* Check for the RSS update after the specified delay.
* @param delay how long to wait in milliseconds
*/
public void delayedStart(final int delay){
	//Bind the Handler to the main thread's Looper so the check runs on the UI thread.
	final Handler handler = new Handler(Looper.getMainLooper());

	//postDelayed() returns immediately, so no extra Thread is needed to schedule the check.
	handler.postDelayed(new Runnable() {

		@Override
		public void run() {

			try{
				boolean ready = getRSSPayloadReady(); //Has RSS refresh completed?

				if(ready){
					//this algorithm runs as an AsyncTask
					runParsingAlgorythm(); //won't work if RSS payload = null
				}
				else{
					sendToastMessageRSSFailed(); //Let user know there was a problem.
				}
			}
			catch(Exception exc){
				Log.d("Test","delayedStart(): " + exc.getMessage());
			}
		}
	}, delay);
}
