Troubleshooting multi-threading problems related to Android’s onResume()

This post is a continuation of a previous post I wrote about best practices for using onResume(). I found a particularly testy bug that caused me 2 hours of pain time to track down. The tricky part was it would only show up when there was no debugger attached. Right away this told me it was a threading problem. I suspected that the debugger slowed things down just enough that all the threads could complete in the expected order, but not the actual order that occurred when running the device in stand-alone mode.

The test case. This is actually a very common workflow, and perhaps so common that we just don’t think about it much:

  • Cold start the application without a debugger attached. By cold start I mean that the app was in a completely stopped, non-cached state.
  • Minimize the app like you are going to do some other task.
  • Open the app again to ensure that onResume() gets called.

Now, fortunately I already had good error handling built-in. I kept seeing in logcat and a toast message that a java.lang.NullPointerException was occuring. What happened next was troubleshooting a multi-threaded app without the benefit of a debugger. Not fun. I knew I had to do it because of the visibility of the use case. I couldn’t let this one go.

How to narrow down the problem. The pattern I used to hunt down the bug was to wrap each line of code or code block with Log messages like this.

Log.d("Test","Test1");
setLocationListener(true, true);
Log.d("Test","Test2");

Then I used the following methodology starting inside the method were the NullPointerException was occurring. I did this step-by-step, app rebuild by app rebuild, through the next 250 lines of related code:

  1. Click debug in Eclipse to build the new version of the app that included any new logging code as shown above, and load it on the device.
  2. Wait until the application was running, then shutdown the debug session through Eclipse.
  3. Restart the app on device. Note: debugger was shutdown so it wouldn’t re-attach.
  4. Watch the messages in Logcat.
  5. If I saw one message , such as Test1, followed by the NullPointerException with no test message after it, then I knew it was the offending code block, method or property. If it was a method, then I followed the same pattern through the individual lines of code inside that method. This looked very much like you would do with step-thru debugging, except this was done manually. Ugh.

What caused the problem? As time went on, and I was surprised that I had to keep going and going deeper in the code, I became very curious.  It turned out to be a multi-threading bug in a third party library that wasn’t fully initialized even though it had a method to check if initialization was complete. The boolean state property was plainly wrong. This one portion of the library wasn’t taken into account when declaring initialization was complete. And I was trying to access a property that wasn’t initialized. Oops…now that’s a bug.

The workaround? To work around the problem  I simply wrapped the offending property in a try/catch block. Then using the pattern I described in the previous blog post I was able to keep running verification checks until this property was either correctly initialized, or fail after a certain number of attempts. This isn’t 100% ideal, yet it let me keep going forward with the project until the vendor fixes the bug.

Lessons Learned. I’ve done kernel level debugging on Windows applications, but I really didn’t feel like learning how to do it with one or multiple Android devices. I was determined to try and narrow down the bug using the rather primitive tools at hand. The good news is it only took two hours. For me, it reaffirmed my own practice of implementing good error handling because I knew immediately where to start looking. I had multiple libraries and several thousand lines of code to work with. And, as I’ve seen before there are some bugs in Android that simply fail with little meaningful information. By doubling down and taking it step-by-step I was able to mitigate a very visible bug.

Handle null values when using Android’s LocationManager

I’d noticed a number of fatal app crashes recently, and it was only through luck I discovered that I wasn’t handling null values that were being returned by the device’s  LocationManager. I only discovered this through a bit of detective work and being lucky enough to have the app crash while hooked up to logcat when the device decided to act up. I can now say with certainty that it’s true a GPS_PROVIDER or LOCATION_PROVIDER can indeed return null values. So, as a best practice  now I simply listen for them and handle them gracefully.

I’ve also noticed that sometimes these null values can happen sporadically, so I’ve also implemented a counter system that says if a certain number of null values happen in a row then shut down the LocationManager and notify the user, otherwise simply ignore them. Here’s an example of the basic pattern:

// Acquire a reference to the system Location Manager
LocationManager locationManager = (LocationManager) this.getSystemService(Context.LOCATION_SERVICE);

// Define a listener that responds to location updates
LocationListener locationListener = new LocationListener() {

	int counter = 0;

	public void onLocationChanged(Location location) {
      // Called when a new location is found by the network location provider.
		if(location != null){
			counter = 0
			double lat = loc.getLatitude();
			double lon = loc.getLongitude();
		}
		else
		{
			counter++;
			if(counter > 3)
			{
				// Remove the listener you previously added
				locationManager.removeUpdates(locationListener);
                                //locationManager = null;

				//Let user know there was a problem and that GPS was shut off....
			}

		}
    }

    public void onStatusChanged(String provider, int status, Bundle extras) {}

    public void onProviderEnabled(String provider) {}

    public void onProviderDisabled(String provider) {}
};

// Register the listener with the Location Manager to receive location updates
locationManager.requestLocationUpdates(LocationManager.NETWORK_PROVIDER, 0, 0, locationListener);

Here’s one example of a GPS problem on a Motorola Atrix in logcat:

06-01 15:15:43.298: E/libgps(1653): recv_command_nmea() : NMEA_PARSE_ERROR_DATA_FIELDS_MISSING: The sentence does not contain enough fields for its data type

The problem with JavaScript Obfuscators and Minification – Tracking down errors

JavaScript obfuscators and minifiers do their job well. In fact, some obfuscators have anti-debugging features. However, if you are a legitimate developer building applications against one of these libraries, chances are you’ve gotten an indecipherable error such as “z=null line 14300” and it brings your development efforts to a halt. Error messages like this provide no useful information on what the problem really is, or give any hints on how you might be able solve it. You’ve probably even looked at the jumbled source code in a last ditch attempt to make some sense out of the error. And, whether it’s your own library or a mainstream ones as jQuery or Dojo, it doesn’t matter. The amount of productivity lost because of these errors in probably very large, not to mention the frustration it causes.

I hope the the developers of these obfuscators are reading this…because I have a proposed solution to the problem.

Now, I want to start out by mentioning that I fully understand why obfuscators exist for reasons such as source code protection and decreasing download size. What I propose takes this fully into account, yet makes your library developer friendly in a secure way:

During the obfuscation process create an index file that maps each variable, function and class to a real line number and store this file in a web folder.  Then create a small html file that lets you search the index and return the real line number. Provide an option for return the variable, function or class name, too.

The concept is that if there is an error, like the  “z=null line 14300” I mentioned above, developers can then at least have some hope of narrowing down the general area of the code where it might be occurring.

The bonus is, if you own an obfuscated commercial library, now your tech support people can also look up the general area where a customer might be having a problem. For security reasons you don’t have to share the index file, But, even then, there isn’t enough information in it to de-compile the library. Now, if I post my error to the forum:  What is “z=null line 14300”? Tech support will be able to tell me that I’m missing a custom property on a widget’s HTML DIV element. It’s a win-win situation.

What do you think?