888.524.6150 | software@K3integrations.com
Posted by Nolan Cafferky on Thursday, April 22, 2010.
In some situations, JavaScript can be more difficult to debug than most other languages. Your execution environment is inextricably bound to a piece of third-party software, or rather several different pieces of third-party software, each with slightly different quirks and caveats. Combine this with an application that communicates asynchronously with a server, and sooner or later you're going to run into a programmer's worst nightmare: an inscrutable bug that is only rarely reproducible.
We encountered such a bug while working on Congregate, our instant message client for the NetAdventist 3.0 platform. The problem was thus: when a new message comes in for a user, we draw attention using brighter colors and pulsating text in the Congregate box to let them know an unread message has arrived.
Example 1: Unread messages for the local user, as seen from the Users tab
When the user views the conversation that message is in, the highlighted interface pieces revert to their normal state.

Example 2: Chats tab after reading the new message
However, very rarely, when the user clicks to read the message, their "Chats" tab inexplicably disappears.

Of course, the first time anyone experienced this was when we were showing off Congregate to our representatives from NetAdventist, and one of them informed us of the problem. Such is the way of software development.
We didn't experience the problem firsthand until today. Some Firebug inspection revealed the symptom: a "display: none;" was getting added to the link in the Chats tab. This didn't make any sense based on our knowledge of the code; there were no calls to hide() or toggle() on any links. After some speculation and testing, we decided the problem was most likely due to a race condition, possibly the code that updates CSS for unread messages being interrupted halfway through by communications with the Openfire server. That felt like a long shot, and we couldn't come up with a concrete scenario that would cause this, or explain why jQuery might be adding that style, but it was worth investigating.
We ended up implementing mutual exclusion for everything that edited page content, or so we thought. Bruce Wallace wrote a good article about it, and you can find a version of the article here. However, after putting all the relevant entry points behind mutexes and verifying that we didn’t break anything, the Chats tab disappeared again. Obviously, either we were wrong or we missed something.
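For readers unfamiliar with the idea, here is a minimal sketch of what putting page edits behind a mutex can look like in JavaScript. It is an assumption for illustration only: the Mutex helper, pageLock, and the logged messages are hypothetical, and this is neither the algorithm from the article nor the code we shipped in Congregate.

```javascript
// Hypothetical helper: a queue-based lock that serializes multi-step updates.
function Mutex() {
  this.locked = false;
  this.waiting = [];
}

// Run task(release) once the lock is free; the task must call release() when done.
Mutex.prototype.run = function (task) {
  var self = this;
  if (self.locked) {
    self.waiting.push(task); // someone else holds the lock, so wait in line
    return;
  }
  self.locked = true;
  task(function release() {
    self.locked = false;
    var next = self.waiting.shift();
    if (next) { self.run(next); }
  });
};

var pageLock = new Mutex();
var log = [];
var finishA;

// Update A holds the lock until some later event (e.g. a server reply) arrives.
pageLock.run(function (release) {
  log.push('update A starts');
  finishA = function () { log.push('update A ends'); release(); };
});

// Update B is requested while A is still in progress, so it is queued.
pageLock.run(function (release) {
  log.push('update B starts');
  release();
});

finishA(); // only now does B get to run
console.log(log.join(', '));
// prints "update A starts, update A ends, update B starts"
```

A lock like this only protects code that actually goes through it, which is exactly the weakness described next.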
After some more frustrated debugging and a fortunate observation, we discovered the problem.
jQuery(document).ready(function() {
    ...
    congregate_blink_alert = setInterval(function() {
        jQuery('.CongregateChat .Attention a').fadeOut(150, function() {
            jQuery('.CongregateChat .Attention a').fadeIn(150);
        });
    }, 1000);
    ...
});
This bit of code tells the text indicators for unread messages to pulsate in and out about once per second. We hadn’t put this code behind a mutex, thinking that it was harmless. Yet it is here that the second piece of our race condition lies. Here's the sequence of events to expose the vulnerability:
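One way to see the interleaving, inferred from the snippet above rather than taken from a recorded trace, is a toy model that runs without a browser. It assumes that fadeOut() finishes by setting display: none (which matches what Firebug showed) and that reading a message removes the Attention class while the 150 ms fade is still in progress. The helpers makeLink, query, fadeOut and fadeIn are hypothetical stand-ins, not jQuery's API, and the class is placed on the link itself to keep the model small.

```javascript
// Toy stand-in for the page: each link has a class list and a display value.
function makeLink(name) {
  return { name: name, classes: ['Attention'], display: 'inline' };
}

var page = [makeLink('chatsTab')];

// Like re-running a selector: returns whichever links match *right now*.
function query() {
  return page.filter(function (link) {
    return link.classes.indexOf('Attention') !== -1;
  });
}

// Model of fadeOut: returns a function that "finishes" the animation,
// hiding the links and then running the completion callback.
function fadeOut(links, done) {
  return function finish() {
    links.forEach(function (link) { link.display = 'none'; });
    done();
  };
}

function fadeIn(links) {
  links.forEach(function (link) { link.display = 'inline'; });
}

// 1. The interval fires and starts fading out everything marked Attention.
var finishFade = fadeOut(query(), function () {
  // 3. The callback re-queries, so it only sees links still marked Attention.
  fadeIn(query());
});

// 2. Mid-animation, the user reads the message and the highlight is removed.
page[0].classes = [];

// The fade completes: the tab is hidden, and nothing fades it back in.
finishFade();
console.log(page[0].display); // prints "none"
```

The window for step 2 is only as long as the fade itself, which would explain why the bug showed up so rarely.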
We rewrote that bit of code to avoid the race condition.
jQuery(document).ready(function() {
    ...
    congregate_blink_alert = setInterval(function() {
        // Ensure we use the same set of elements for fadeIn() as we do for fadeOut().
        var links = jQuery('.CongregateChat .Attention a');
        links.fadeOut(150, function() {
            links.fadeIn(150);
        });
    }, 1000);
    ...
});
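To check the reasoning behind the rewrite, the same kind of browser-free toy model can be run with the selection captured once. This is a sketch under stated assumptions, not a test against jQuery: simulateFixedPulse and its internals are hypothetical names, and the model only assumes that fadeOut() ends with the element hidden and fadeIn() ends with it visible.

```javascript
// Hypothetical toy model of one pulse, using a single captured selection.
function simulateFixedPulse() {
  var tab = { classes: ['Attention'], display: 'inline' };
  var page = [tab];

  // Like re-running a selector: returns whichever links match right now.
  function query() {
    return page.filter(function (link) {
      return link.classes.indexOf('Attention') !== -1;
    });
  }

  // Select once; both halves of the pulse share this array.
  var links = query();

  // The fade completes later, so model its end as a deferred step.
  var finishFadeOut = function () {
    links.forEach(function (link) { link.display = 'none'; });
    // fadeIn works on the captured set, not on a fresh query.
    links.forEach(function (link) { link.display = 'inline'; });
  };

  // The message is read while the fade is still running.
  tab.classes = [];

  finishFadeOut();
  return { display: tab.display, stillMatches: query().length };
}

var result = simulateFixedPulse();
console.log(result.display);      // prints "inline"
console.log(result.stillMatches); // prints 0
```

Even though the link no longer matches the selector by the time the fade-out ends, it is still in the captured set, so it is faded back in.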
Lessons learned:
© 2010 K3 Integrations, LLC | 888.524.6150 | software@K3integrations.com