More Blog Entries

JavaScript Race Conditions

Posted by Nolan Cafferky on Thursday, April 22, 2010.


In some situations, JavaScript can be more difficult to debug than most other languages. Your execution environment is inextricably bound to a piece of third-party software, the browser, or rather to several different browsers, each with slightly different quirks and caveats. Combine this with an application that communicates asynchronously with a server, and sooner or later you're going to run into a programmer's worst nightmare: an inscrutable bug that is only rarely reproducible.


We encountered such a bug while working on Congregate, our instant message client for the NetAdventist 3.0 platform. The feature involved works like this: when a new message comes in for a user, we draw attention to it with brighter colors and pulsating text in the Congregate box, so the user knows an unread message has arrived.


Example 1: Unread messages for the local user, as seen from the Users tab


When the user views the conversation that message is in, the highlighted interface pieces revert to their normal state.

Example 2: Chats tab after reading the new message


However, very rarely, when the user clicks to read the message, their "Chats" tab inexplicably disappears.

Example 3: Missing Chats tab

Of course, the first time anyone experienced this was when we were showing off Congregate to our representatives from NetAdventist, and one of them informed us of the problem. Such is the way of software development.


We didn't experience the problem firsthand until today. Some Firebug inspection revealed the symptom: a "display: none;" was getting added to the link in the Chats tab. This didn't make any sense based on our knowledge of the code; there were no calls to hide() or toggle() on any links. After some speculation and testing, we decided the problem was most likely due to a race condition, possibly the code that updates CSS for unread messages being interrupted partway through by communications with the Openfire server. That felt like a long shot, and we couldn't come up with a concrete scenario that would cause this, or explain why jQuery might be hiding the link, but it was worth investigating.


We ended up implementing mutual exclusion for everything that edited page content, or so we thought. Bruce Wallace wrote a good article about it, and you can find a version of the article here. However, after putting all the relevant entry points behind mutexes and verifying that we didn’t break anything, the Chats tab disappeared again. Obviously, either we were wrong or we missed something.
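
For readers who haven't seen mutual exclusion in JavaScript, here is a minimal sketch of the general idea. It is not Bruce Wallace's implementation and not the code we used in Congregate; createMutex, run and release are hypothetical names chosen for illustration. Each task receives a release function and the next queued task does not start until the current one calls it.

```javascript
// Hypothetical helper: runs queued tasks one at a time.
function createMutex() {
  var queue = [];
  var locked = false;

  function next() {
    if (locked || queue.length === 0) return;
    locked = true;
    var task = queue.shift();
    var released = false;
    // The task decides when it is done by calling release().
    task(function release() {
      if (released) return; // ignore a double release
      released = true;
      locked = false;
      next();
    });
  }

  return {
    run: function (task) {
      queue.push(task);
      next();
    }
  };
}

// Usage: task B cannot start until task A releases the lock.
var log = [];
var mutex = createMutex();
var releaseA;
mutex.run(function (release) { log.push('A start'); releaseA = release; });
mutex.run(function (release) { log.push('B start'); release(); });
log.push('B is still waiting');
releaseA();
console.log(log); // [ 'A start', 'B is still waiting', 'B start' ]
```

A lock like this only protects the code that actually goes through it, which is exactly the gap we were about to discover.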


After some more frustrated debugging and a fortunate observation, we discovered the problem.

jQuery(document).ready(function() {
  ...
 
  congregate_blink_alert = setInterval(function() {
    jQuery('.CongregateChat .Attention a').fadeOut(150,function() {
      jQuery('.CongregateChat .Attention a').fadeIn(150);
    });
  }, 1000);
 
  ...
});

This bit of code tells the text indicators for unread messages to fade out and back in about once per second. We hadn’t put this code behind a mutex, thinking that it was harmless. Yet it is here that the second piece of our race condition lies. Here's the sequence of events that exposes the bug:

  • A jQuery selector finds all links in Attention divs and invokes fadeOut() on them
  • The 150 ms timer for fadeOut() expires. The end result is a "display: none;" in the style for the link
  • A user clicks to read a message, removing the Attention class from some divs
  • A jQuery selector finds all links in Attention divs, missing the ones we just removed the Attention class from, and invokes fadeIn() on them
  • The orphans stay hidden until a new unread message comes in, adding the Attention class back to them
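
The sequence above can be replayed without a browser. The sketch below is a simplified model, not the Congregate code: makeLink and selectAttentionLinks are hypothetical stand-ins for real link elements and for the jQuery selector, and the fades are reduced to setting a display value.

```javascript
// Hypothetical stand-in for a link element; not a real DOM node.
function makeLink(name) {
  return { name: name, attention: true, display: '' };
}

// Stand-in for jQuery('.CongregateChat .Attention a'): re-queries on every call.
function selectAttentionLinks(links) {
  return links.filter(function (link) { return link.attention; });
}

function simulateRace() {
  var all = [makeLink('chats'), makeLink('users')];

  // fadeOut() is invoked on every link currently inside an Attention div.
  var fading = selectAttentionLinks(all);

  // The fade finishes and leaves "display: none" on everything it targeted.
  fading.forEach(function (link) { link.display = 'none'; });

  // The user reads the message, so 'chats' loses its Attention flag.
  all[0].attention = false;

  // The selector runs again for fadeIn() and no longer matches 'chats'.
  selectAttentionLinks(all).forEach(function (link) { link.display = ''; });

  return all;
}

var result = simulateRace();
console.log(result.map(function (link) {
  return link.name + ': ' + (link.display || 'visible');
})); // [ 'chats: none', 'users: visible' ]
```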

We rewrote that bit of code to avoid the race condition.

jQuery(document).ready(function() {
  ...
 
  congregate_blink_alert = setInterval(function() {
    // Ensure we use the same set of elements for fadeIn() as we do for fadeOut().
    var links = jQuery('.CongregateChat .Attention a');
    links.fadeOut(150,function() {
      links.fadeIn(150);
    });
  }, 1000);
 
  ...
});
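
To see why capturing the selection helps, here is the same kind of browser-free model applied to the rewritten code. As before, makeLink and selectAttentionLinks are hypothetical stand-ins rather than real jQuery or DOM calls, and the fades are reduced to setting a display value.

```javascript
// Hypothetical stand-in for a link element; not a real DOM node.
function makeLink(name) {
  return { name: name, attention: true, display: '' };
}

// Stand-in for jQuery('.CongregateChat .Attention a').
function selectAttentionLinks(links) {
  return links.filter(function (link) { return link.attention; });
}

function simulateFixedBlink() {
  var all = [makeLink('chats'), makeLink('users')];

  // Query once and keep the result, as in the rewritten code.
  var captured = selectAttentionLinks(all);

  // fadeOut() finishes: every captured link is hidden.
  captured.forEach(function (link) { link.display = 'none'; });

  // The user reads the message, so 'chats' loses its Attention flag.
  all[0].attention = false;

  // fadeIn() runs on the captured set, so 'chats' is restored as well.
  captured.forEach(function (link) { link.display = ''; });

  return all;
}

var fixedResult = simulateFixedBlink();
var stillHidden = fixedResult.filter(function (link) {
  return link.display === 'none';
});
console.log(stillHidden.length); // 0
```

Because jQuery queues effects per element, chaining fadeOut(150).fadeIn(150) on a single selection should give a similar guarantee without a callback, though we have not tested that variant in Congregate.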

Lessons learned:

  • It didn't occur to me that fadeOut() leaves a "display: none;" in its target’s style when finished, and knowing that would have saved me hours of debugging. Many of the jQuery effects can do this.
  • Any time you have asynchronous JavaScript events reading and modifying the DOM, you need to either implement mutual exclusion across the board, write code that expects your DOM to be volatile, or use some combination of both. Make sure your manipulations operate on the same set of elements from beginning to end and leave all of them in a stable state.
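
One way to "leave all of them in a stable state" is to clean up at the point where the Attention class is removed. The sketch below is an assumption about how that could look, not code from Congregate: clearAttention is a hypothetical helper written against the standard classList and querySelectorAll APIs, and makeFakeContainer exists only so the example runs outside a browser.

```javascript
// Hypothetical helper: call it wherever the Attention class is removed.
// Assumes `container` exposes the standard classList and querySelectorAll APIs.
function clearAttention(container) {
  container.classList.remove('Attention');
  var links = container.querySelectorAll('a');
  for (var i = 0; i < links.length; i++) {
    // Undo whatever an earlier fadeOut() may have left in the inline style.
    links[i].style.display = '';
    links[i].style.opacity = '';
  }
}

// Minimal fake container so the sketch runs outside a browser.
function makeFakeContainer() {
  var classes = ['Attention'];
  var link = { style: { display: 'none', opacity: '0' } };
  return {
    classes: classes,
    link: link,
    classList: {
      remove: function (name) {
        var index = classes.indexOf(name);
        if (index !== -1) classes.splice(index, 1);
      }
    },
    querySelectorAll: function () { return [link]; }
  };
}

var container = makeFakeContainer();
clearAttention(container);
console.log(container.classes.length, JSON.stringify(container.link.style));
```

This only helps if no fade is still running on those links. With jQuery you would probably also want to stop any in-flight animation first, for example with .stop(true, true), before resetting the styles.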



© 2010 K3 Integrations, LLC | 888.524.6150 | software@K3integrations.com