Asynchronous events and Mochitesting

Recently, I found that I needed my chrome Mochitest to wait for an XBL binding to apply before continuing the test. For the end-user, waiting is hardly an issue – by the time they see the new user interface, all the bindings have applied. Mochitest (and most other automated test harnesses) runs straight through, and brakes for nobody. In some cases you can pull some trickery with nsIThreadManager, convincing Firefox to wait… but that’s hardly reliable.

So I came up with a three-part solution.

  1. Force my XBL bindings to fire DOM events when the constructor executes (sketched just after this list).
  2. Break up my test functions into (slightly) smaller functions, each of which handles a part of the overall test. Two parts cover the main functional testing, while a third part is specific to the XBL binding.
  3. Implement a generic testing harness for Mochitest which can manage the flow from one function to another.
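
For step 1, the dispatch itself is small. Something like this goes in the binding’s <constructor> body (a minimal sketch – “bindingconstructed” is a placeholder event name I made up, not what Verbosio actually fires):

// Inside the XBL <constructor>: announce that this binding has applied.
var event = document.createEvent("Events");
event.initEvent("bindingconstructed", true, true); // bubbles, cancelable
this.dispatchEvent(event);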

This new test harness, which I call PendingEventsTest, adds a layer of complexity on top of SimpleTest. Each test function is added via PendingEventsTest.addTest(), and you can pause for a DOM event with PendingEventsTest.addEventLock(). The whole test sequence can abort with an error by calling PendingEventsTest.abortFail(). You can advance to the next test function in the sequence with PendingEventsTest.asyncRun() or PendingEventsTest.run().
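
A typical test sequence might look like this (a hedged sketch – only the PendingEventsTest method names come from this post; the real signatures and semantics may differ):

// Sketch: pause between test functions until the binding announces itself.
PendingEventsTest.addTest(function applyBinding() {
  var node = document.getElementById("target");
  // Hold the sequence until the binding's constructor fires its event.
  PendingEventsTest.addEventLock(node, "bindingconstructed");
  node.className = "bound-element"; // matches a -moz-binding CSS rule
});
PendingEventsTest.addTest(function verifyBinding() {
  var node = document.getElementById("target");
  if (!("boundProperty" in node))
    PendingEventsTest.abortFail("XBL binding never applied");
  ok(true, "XBL binding applied");
  PendingEventsTest.run(); // advance to the next test function, if any
});
PendingEventsTest.asyncRun(); // kick off the sequence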

There’s more complexity to it than that, of course. To move data from one test function to another, I also added “payload” support, for storing a generic object. Also, this is written for chrome Mochitests – it will likely fail in content Mochitests. Still, this new test harness helped me finish support in Verbosio for my equivalent of the <xbl:children/> element in markup templates.

Lastly, it (and the logging service I wrote a few weeks ago) is licensed under MPL 1.1/GPL 2.0/LGPL 2.1, so anyone who wants to borrow this for Mozilla code is welcome to. Cheers!

The strangest crash I’ve seen in a while

Last week, I wrote about a new logging infrastructure I built to track down
a bug. In the original post, I warned that logging wasn’t a panacea – that it
could cause more trouble than it solved. I was more right than I knew, as I
added a bit of logging which led inexorably to a crash via a JS_ASSERT failure.
(For those who don’t know what a JS_ASSERT is, it’s an assertion inside the
JavaScript Engine that all is well. Since JavaScript is one of the primary
languages of the World Wide Web, these assertions are really important. If
JavaScript is misbehaving, it can’t be trusted. Fail one such assertion, and
the program aborts, forcing the developer to take notice.)

At first, the crash didn’t make a damned bit of sense. I prefer to write JavaScript,
not just because I entered into software engineering (and Mozilla development)
through JavaScript, but also because it’s super-stable. In fact, the only new
code I was introducing was in JavaScript. It shouldn’t have crashed. Yet through
an unlikely chain of events, it did.

This article details how I was able to analyze the crash itself, how logging
set up the conditions for the crash, and how other logging ultimately
proved that the crash resulted from a logging call. I’ve filed this as bug 543304. Ultimately, reducing
this test case to something small enough for a usable bug report will not be easy – but it will be worth it. More details in the extended entry. (Warning: the technobabble content ratio is high. If you do not want to see how this
developer thinks and debugs Mozilla code, do not read on.)

Continue reading The strangest crash I’ve seen in a while

XPCOM and Logging, part 2

First off, thanks to everyone who took the time to reply to my original blog post. Several people pointed me to moz4js, and I was quite pleasantly surprised to find out the Mozilla Thunderbird team uses it quite a bit. I hadn’t heard of it before, but when I looked into it through the comments, I found it has the same problem NSPR logging has: it supports only a single language.

Then I realized: if we put moz4js in a JavaScript module, then a component can reference that module and expose the same interfaces to XPCOM. Chrome JS and JS components can load the module, and all other languages can use the XPCOM component. Maybe not simple, but certainly workable.
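
In rough code, the idea might look like this (a sketch, not moz4js’s real API – the file names, resource URL, contract ID, and UUID here are all made up):

// logging-module.jsm – chrome JS and JS components can import this directly.
var EXPORTED_SYMBOLS = ["Logging"];
var Logging = {
  log: function(aModule, aSeverity, aMessage) {
    dump(aModule + " [" + aSeverity + "]: " + aMessage + "\n");
  }
};

// LoggingService.js – an XPCOM component wrapping the module, so C++,
// Python, and friends can reach the same logger through an IDL interface.
Components.utils.import("resource://gre/modules/XPCOMUtils.jsm");
Components.utils.import("resource://myapp/logging-module.jsm"); // made-up URL

function LoggingService() {}
LoggingService.prototype = {
  classDescription: "Hypothetical logging service",
  classID: Components.ID("{b2c3d4e5-1234-4abc-8def-0123456789ab}"), // made up
  contractID: "@example.com/logging-service;1", // made up
  QueryInterface: XPCOMUtils.generateQI([]), // real code would list its IDL interface
  log: function(aModule, aSeverity, aMessage) {
    Logging.log(aModule, aSeverity, aMessage);
  }
};
var NSGetModule = XPCOMUtils.generateNSGetModule([LoggingService]);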

No, I’m not signing up to own the toolkit moz4js bug. Personally, I’m not wholly convinced it’s the right way to do logging. What I do think, though, is that we need a general discussion on logging APIs and consumers.

In the bug, Flock mentioned their own API – which I also didn’t know about. I mentioned that Skyfire has an internal logging system. It makes me wonder how many other companies and projects (Songbird, Komodo, Disruptive Innovations, etc.) have implemented their own logging APIs. This is one subject that genuinely affects the whole Mozilla developer community.

If MozPad were still active, this is one subject I’d put right to them, with the goal of developing a standard logging API. Since we don’t have MozPad anymore, I would like to put it to the community at large:

  1. What do you require logging for?
  2. What logging capabilities do you need?
  3. What logging infrastructure (API, code) have you already implemented or used?
  4. What tests exist and pass for the logging infrastructure you use?
  5. How extensively do you use your logging infrastructure?
  6. How easy is it to work with your logging infrastructure?
  7. If you use someone else’s logging infrastructure, what’s missing?
  8. Are you willing to work on getting your logging infrastructure into the Mozilla code base? (Writing code, tests, specifications)
  9. Are you willing to work on developing a common Mozilla standard logging API and infrastructure, and getting that into the Mozilla code base? (Writing code, tests, specifications)
  10. Are you willing to actively “own” and drive an effort towards adding a Mozilla standard logging API to the Mozilla code base?

Please reply in the comments. The quality and quantity will tell me how seriously I should be taking this.

(P.S. I found a bug in my logging interfaces already. Not the JavaScript code, but the IDL. Bonus points to the first person to spot it besides me – it’s a little undocumented gotcha.)

XPCOM and Logging

What we have now

Over the last several years, I’ve learned to appreciate good message logging. Mozilla’s NSPR Logging API is a very good example, and it works well for compiled code.

JavaScript components? They can’t use NSPR. Neither can chrome, PyXPCOM, or any other privileged code. The best alternative I’ve got is a dump() statement. dump() writes to the standard error console, commonly known as “stderr”. In debug builds, though, stderr gets populated with a bunch of other output that I really don’t care about at any particular moment. (I’m not entirely free of blame, either – leftover dump() statements from my code pollute that console as much as anything else.)

At Skyfire, I’ve had to implement a couple of logging systems as well, and the generated logs have helped us find and isolate bugs relatively quickly. What you log can help you identify bugs in most cases. Logging isn’t a panacea, though: doing it wrong can really cost you.

Over the past few days, I hit a really weird bug in Verbosio’s templates system (which I’ve been quietly working on for about five months now, in a complete rewrite with tests). One test fails at a critical point, but it’s pretty far removed from the starting point. Amid all the other dump() statements and debug output, I can’t quite figure out where the code started going wrong. To pin it down, I think the best approach is to insert some logging.

Oh, and did I mention that Mochitest includes logging for pass, fail, and todo conditions… but conveniently leaves out an informational-only logging feature? (Having that helps when you’re trying to trace where a log statement came from – and the conditions at the time.)

The five paragraphs above combine to form one idea: Maybe I should implement my own logging service with tests.

Thoughts on the design

  • First of all, keep it really simple. Most logging implementations I’ve seen use a module name, a severity level, and a summary string. I figure that plus an optional details string should cover most messages.
  • For where those messages go, files are good. Streams are better. With a stream, you can feed it into a file, or read it into dump(), a XUL page’s multiline textbox, or anywhere else.
  • Someone reading the logs probably wants a way of separating messages – so if I give them a “multipart” stream, where each part is a separate stream… it’s easy to tell each message apart.
  • Make sure the user requesting the logging stream can specify which messages they want to receive – both by module name and severity level, and whether or not they want the details with each message.
  • Make sure to cache all the messages, so that a logging stream created later still gets all the messages that came before it.
  • Finally, since each message ultimately ends up as a simple string – use a simple string stream interface to construct each message as needed.

Did I overdo the design? Probably. On the other hand, it really maximizes flexibility. I don’t know how much memory multipart input streams consume, though, so I don’t recommend using this in a production environment. If you use this logging service for debugging, make sure you remove all references to it before checking in!
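
Here’s roughly what the pieces look like, in a stripped-down sketch (not the actual service I wrote – the filtering logic is simplified, severities are assumed to be numeric with higher meaning more severe, and the real service also appends messages that arrive after a stream is handed out):

// Each message becomes its own string stream, appended to a multiplex
// ("multipart") stream that the log consumer reads from.
function buildMessageStream(aModule, aSeverity, aSummary, aDetails) {
  var text = aModule + " [" + aSeverity + "] " + aSummary;
  if (aDetails)
    text += "\n" + aDetails;
  text += "\n";
  var stream = Components.classes["@mozilla.org/io/string-input-stream;1"]
                         .createInstance(Components.interfaces.nsIStringInputStream);
  stream.setData(text, text.length);
  return stream;
}

var cachedMessages = []; // every message, so late subscribers still catch up

function getLoggingStream(aModuleFilter, aMinSeverity, aWantDetails) {
  var multiplex = Components.classes["@mozilla.org/io/multiplex-input-stream;1"]
                            .createInstance(Components.interfaces.nsIMultiplexInputStream);
  cachedMessages.forEach(function(msg) {
    if (msg.module == aModuleFilter && msg.severity >= aMinSeverity)
      multiplex.appendStream(buildMessageStream(msg.module, msg.severity,
                                                msg.summary,
                                                aWantDetails ? msg.details : null));
  });
  return multiplex;
}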

Did it solve my problem?

No.

Oh, the logging itself worked beautifully. But to get some of the data I wanted to log, I had to call other functions. Some of those functions triggered JavaScript garbage collection, which tried to run some cleanup code, which tried to execute JavaScript code I wrote, which resulted in a 100% reproducible crash. I don’t yet know what I’m going to do about that. I’ll probably first try to reduce the testcase and post it.

Also, logging itself will never solve the problem you’ve got. All it will do is provide data, and hints about where you are in the code execution. You still have to tell it what and when to log, and you have to read those logs to find the bug.

The good news is, though, that a logging infrastructure like this only took one afternoon to write. Sometimes, that’s all I need: a quick, simple component to organize the dirty work into one place.

(The code I wrote I am making available under MPL 1.1 / LGPL 3 / GPL 3 tri-license. If you want it under another license, feel free to e-mail me, and I’ll almost certainly say yes. I don’t currently plan on submitting this in a bug for landing in mozilla-central, though.)

Forking internally – suggestions?

Imagine the following: You’re working on a new feature for your application, dependent on XULRunner 1.9.1. It’s complex, but it ends up being all chrome & JavaScript components. You write tests for it as you’re developing, and you write a specification. Everything’s coming along nicely.

Then you start thinking about moving to XULRunner 1.9.2 beta x. You compile the code, and just to be sure everything’s fine, you run a test you wrote a few weeks ago. The test and the code it is testing haven’t changed at all. The test that passed in 1.9.1.x fails in 1.9.2.x.

Now, what do you do?

This is the situation I faced a few weeks ago. It’s not an idle question. Obviously I want to find out what regressed, but my test case isn’t minimal – it’s approaching maximal. On the one hand, since the bug isn’t in code I’m working on right now (far from it), I could simply ignore it for now and keep working on my feature. On the other hand, my confidence in my code is now badly shaken. Did I misuse an API? Did the API change without my knowledge? Did I make a change in between the 1.9.1 test and the 1.9.2 test? I have to know what broke, so I can work around it for supporting both versions of XULRunner.

(As it turns out, it was a regression in the Mozilla code base itself.)

I started thinking about how to make this a deterministic test: something that I can point to and say “In this set-up, everything works… but in the other set-up, something’s off.” When I think about it though, it’s a bigger problem than just A/B comparisons. My whole approach to developing software is too informal.

More in the extended entry.

Continue reading Forking internally – suggestions?

NSID gone too far

I have a more serious article which I’ll try to write tonight… but this I couldn’t resist. I went to weatherbonk.com, because it’s getting rather chilly in Mountain View and San Jose, CA, USA. I spotted something quite disturbing when I typed “Mountain View” into the Quick Forecast field…

[Screenshot: nsid.PNG – the Quick Forecast result on weatherbonk.com]

What’s next? Wookiees? Geico advertisements?

“Why are you doing this?”

One of the hardest questions for me to answer is, “Why are you doing
this?” It’s also an excellent question, one that sanity-checks what I am
working on. I recently found myself asking this question when trying to use
something I’ve never tried to use before: weak references to DOM elements.

What are weak references?

For those of you coming from JavaScript-land, you know that once you hold
a DOM element as a property, it will last as long as the object you attached
that property to. (Depending on the usage, longer.) This is because XPConnect
generates a “strong” reference to the element, which through the magic of
XPCOM reference counting, requires the element not be deleted until all
strong references are gone – including yours. Weak references are different.
They are objects which hold a pointer to your real target, but they don’t
require the real target “stay alive”. In JavaScript pseudo-code, it looks a
little like this:

function getWeakReference(obj) {
  var weakPtr = {
    _realPtr: obj,

    // nsIWeakReference: hand back the real object, QI'd to the requested
    // interface – or null once the target has gone away.
    QueryReferent: function(aIID)
    {
      if (this._realPtr)
        return this._realPtr.QueryInterface(aIID);
      return null;
    },

    QueryInterface: function(aIID)
    {
      if (aIID.equals(Components.interfaces.nsIWeakReference) ||
          aIID.equals(Components.interfaces.nsISupports))
        return this;
      throw Components.results.NS_ERROR_NO_INTERFACE;
    }
  };
  return weakPtr;
}

There’s one thing this code can’t show you: Some time in the future,
weakPtr._realPtr could be destroyed, and set to null. That
doesn’t happen in normal code (thank heaven!). But with weak references, it
can happen at any time.
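
In real chrome JavaScript, by the way, you don’t have to hand-roll a wrapper like the one above: as far as I know, Components.utils.getWeakReference() builds one for you, with a JS-friendly get() method rather than QueryReferent(). A sketch:

var weakRef = Components.utils.getWeakReference(someElement);
someElement = null; // drop our strong reference

// ...much later, possibly after garbage collection has run...
var element = weakRef.get(); // the element, or null if it has been collected
if (element)
  element.setAttribute("status", "still-alive");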

Why are you doing this?

There’s a long list of sub-projects and sub-sub-projects, etc., each of
which seems to require looking at something I haven’t tried before. Here’s the
current list, and how I started thinking (wrongly) I needed weak
references.

(Continued on the next page…)

Continue reading “Why are you doing this?”

What do you need to know to do your job?

I mean, what languages and tools are requirements in your profession as a software engineer? What do you have to have in your toolbag?

I’ve been thinking about this a bit over the last few months, and the list, for me at least, is pretty long:

There are also a few I infrequently use:

  • Python
  • Perl
  • PHP
  • MySQL
  • Apache
  • GDB debugging (on occasion)
  • CVS
  • Macintosh operating system
  • MathML
  • SVG
  • RDF
  • HTTP

Now, I don’t need to be an expert in all of them. I do need to be an expert in a few of them (JavaScript, HTML, CSS, XUL, XBL, XPIDL, Makefiles, tests). The rest, I can look the details up when I need them.

Even so, it’s a lot – and I imagine Professor Humphrey’s students might be intimidated to see a list like this. The good news is you don’t have to learn it all at once! In the corporate world, for new professionals, there is a learning curve. The bad news is that this list is always changing. Two years ago, I doubt I’d have listed Python above – and I suspect in less than two years, it will move into the “must-know” list.

So here’s a new Internet meme for you software engineers: list out the languages and tools you need to do your job. Extra credit if you post it as a unified diff against my own list here, linking back to this post.

I just hope I haven’t missed anything terribly obvious.

Verbosio progress, 08/06/2009

It’s the project that Just Won’t Die Already!

A few months ago, I was thinking about Verbosio’s proposed template system. I realized it would be a lot more involved than what I had envisioned two years ago. That realization, combined with Mozilla 1.9.1’s HTML 5 drag-and-drop support, forced me to scrap the original wizard design and start something new.

(Plus, it helps to give your users as much control as you can – while still guiding them through the complicated process. I didn’t give the user enough flexibility in the first design.)

It’s not finished yet, but the most difficult parts are done. (Or as we like to say at Skyfire, “feature-complete”.) I’ve had to create a whole new set of XBL bindings and JavaScript modules to support my perception of how some of this stuff should work.

Read on in the extended entry for the bindings and modules.

Continue reading Verbosio progress, 08/06/2009
