Making a hash of things

Storing references to objects

As I’ve said before, I’m building a markup templates system. The idea is to
reduce markup into reusable patterns, and to separate output from the user
interface. The idea is that for every significant node in the output (and thus,
in the user’s markup), there’s a matching user-interface node, and vice
versa.

Keeping track of these node pairings is the challenge. In a simple case, you
have a DOM tree like this:

  <markup:template>
<markup:output>
<xul:groupbox>
<xul:caption/>
<markup:children/>
</xul:groupbox>
</markup:output>
<markup:ui-section>
<xul:groupbox>
<xul:caption>
<xul:textbox markup:xpath="xul:groupbox/xul:caption/@label"
markup:source="@value"
markup:required="@value"
/>
</xul:caption>
<markup:children markup:xpath="xul:groupbox/markup:children"/>
</xul:groupbox>
</markup:ui-section>

The markup in italics indicates one node pairing – between the output’s
caption element (actually, its label attribute) and the user-interface’s
textbox. Another pairing exists between the underlined nodes. The
<markup:output/> and <markup:ui-section/>
elements define yet another pairing.

So, ultimately, I need a way of referencing one node from the other.
(Specifically, I’d needed references to the output nodes from the
user-interface elements.) The most obvious way is to store each desired output
node as a property of the corresponding user-interface node – through the
setUserData() and getUserData() methods DOM 3 Core
provides. However, I scaled the template system upwards to include repeating
fragments (think for (var i = 0; i < x; i++) {...} in XML).
Since users can rearrange those fragments – or even create new ones and remove
existing ones – pretty soon, when sending the updates back to the output
section, it gets… complicated. Now, you have user-interface nodes in a
different order than output nodes, some output nodes without user interface
nodes (because those were removed), and some user interface nodes without
output nodes (because they were inserted).

More details in the extended entry.

To illustrate this, consider Table 1, which illustrates the original
ordering of such a template, and Table 2, which shows one scenario based on
user interaction:

Table 1: Original ordering of nodes (vertical) and relationships
(horizontal)
User Interface nodes Output nodes
User Input 1 Output node 1
User Input 2 Output node 2
User Input 3 Output node 3
User input 4 Output node 4

Table 2: Ordering of nodes (vertical) and relationships (horizontal)
after user interaction
User Interface nodes Output nodes
User Input 3 Output node 3
New User Input 5 null
User Input 1 Output node 1

The user has removed two rows, inserted a brand-new one, and reordered the
nodes. To answer this, the application must – at synchronization time – tell
the output section what order of nodes it should use… but in one case,
there’s no matching output node to use at all! Thus, a row in the user
interface cannot guarantee that it has a matching output element, nor what
order it should be in overall among its siblings.

No, the elements which coordinate rows must manage this ordering and filling
in the blanks. But consider another scenario, where a row extracted from one
repeating section is inserted into another. Now it’s really interesting. The
user interface node might have a match, or it might not… and the repeating
section, if it sees a match, has to extract the match from wherever it came
from. (This scenario might arise given a HTML table, where cell (3, 2) moves to
cell (4, 1), for example.) There’s also the added complication that a foreign
row might not match the format of the existing row, but that’s irrelevant for
this matter.

So let’s drop the output-as-property-of-user-interface model, and build a
larger table, where every node (or more realistically, row from a repeating
section) must appear once, must be designated
as output or user-interface, and must have at most one
matching object in the other column. Then, Tables 1 and 2 merge into Table
3:

Table 3: Relationships between nodes, irregardless of order
User Interface nodes Output nodes
User Input 1 Output node 1
User Input 2 Output node 2
User Input 3 Output node 3
User input 4 Output node 4
New User Input 5 null

Then all the repeat sections need are the list and ordering of nodes in
their control, a method to look up nodes in this table, and a method to fill
gaps in this table. In the export phase, you collect the list and
ordering of nodes you have, and send that. In the import phase, you
get the list and ordering of nodes that the other element has, match
those up to the nodes that go on your side, fill in the gaps, and
insert the nodes in the corresponding order.

So, the larger table only allows a node to appear once… that sounds like a
hashtable to me, where each node is a key. (You could argue this means instead
each node appears once as a key, and once as a value… but that makes the
table more complicated and less flexible. What if you have a third column, a
third type of object to match?) What’s the value that goes with each key? How
about an unique set of objects, one of which is the key, another of which is
the desired object? In other words, a row in Table 3 becomes the value for a
given key object:

Table 4: A hashtable between nodes and Table 3 rows
Key Value
User Input 1 Table 3, Row 1
User Input 2 Table 3, Row 2
User Input 3 Table 3, Row 3
User Input 4 Table 3, Row 4
User Input 5 Table 3, Row 5
Output node 1 Table 3, Row 1
Output node 2 Table 3, Row 2
Output node 3 Table 3, Row 3
Output node 4 Table 3, Row 4

As long as you maintain Table 3 and Table 4, and you know which keys you
need, and in what order, it’s pretty easy to update your own repeating section
– or any other set of objects. The combination of these two leads to what I am
calling a 1:1 Hashtable: each object appears once, and has semantic links to
other objects which also appear only once, but have different “columns”,
different meaning.

Taking the object as a key: the hash in hashtable

I still had to implement this 1:1 hashtable, though. My procedure was
simple: get it written, then get it tested, then get it optimized. The
simplest hashtable available to JavaScript is a native one: the object literal,
{}. Keys would be strings, and values are properties. Nice, neat, simple.

Objects are not strings. (Unless you build them by calling new
String('...')
, that is.) Neither are DOM nodes – which are really
important in the above scheme. So I originally thought I couldn’t use an object
literal. (I could, but we’ll come back to that.)

Instead, I wrote a simple, inefficient scheme, like this:

function Hashtable() {
this._keyArray = [];
this._valArray = [];
}
Hashtable.prototype = {
add: function(key, val) {
var index = this._keyArray.length;
this._keyArray[index] = key;
this._valArray[index] = val;
},
get: function(key) {
var index = this._keyArray.indexOf(key);
return index == -1 ? null : this._valArray[index];
}
};

From that, key objects became members of the keyArray, and values
(in this case values from Table 4, or rows from Table 3) became members of
valArray. I wrote an extensive test harness along with my implementation,
debugged it, and had the whole thing working. The inefficiency, though, nagged
at me.

I figured, well, somebody had to have looked at this problem before. Sure
enough, someone had implemented a JS-based hashtable
constructor
. His name is Tim Down, and he’s from the UK. His design is
actually quite innovative: he first generates a string representation of your
key by calling toString(). This won’t generate an unique key by
itself, but a pretty good one. So he takes a JavaScript object literal, {}, and
assigns the string hash a property value which is an array:

_table = {"[object Object]": [], "[object HTMLInputElement]": [], /*
... */}

Then he pushes [key, value] onto the array. To look up an item,
he uses both a hash function and an equality function to match the provided key
against keys stored in the table, like (but not exactly) this:

var bucket = _table["[object HTMLInputElement]"];
for (var i = bucket.length - 1; i >= 0; i--) {
if (equalityFunction(bucket[i][0], item)) {
return bucket[i][1];
}
}

I took a look at his code, and once I understood the basic concept, I liked
it. I thought it could be made a little more efficient, though. I suggested a
solution to him, involving Array.indexOf to generate a string hash on the fly.
He and I chatted over e-mail about this option, and he’ll be testing it. We
don’t know if it’s more efficient or not – or for that matter, if it would pass
tests of, say, deletion of an object from the hashtable.

In the meantime, I also started thinking, Why don’t I create my own
XPCOM-based hashtable component, based on hashtables that Mozilla binary
components can already build and use?
I’m talking specifically about
wrapping nsInterfaceHashTable<nsISupportsHashKey,
nsISupports>
in a XPCOM component. Before I got too far down that
path (didn’t someone tell me in a bug comment that XPCOM shouldn’t be used as a
hammer?), I realized something very fundamental.

Each of the objects I care about storing in my hashtable is either:

  • A JavaScript object whose base classes I defined and own – and thus, I
    can extend with extra properties, or
  • A DOM node which implements setUserData() and
    getUserData(), and thus, I can add extra properties to them
    through “user data”.

In other words, there’s nothing preventing me from generating my own hash
keys for these objects as I need them. I could even generate string hash keys
for them, and go back to using a JavaScript object literal for a hashtable.

So here’s a string
hash key interface
and JavaScript
module
which exports functions to build key generators from. It also has a
handy function near the end, getHashStringKey(), to retrieve that
string hash key for use in hash tables. Note that the key generators do
not hold references to the objects which we need hashed – just to a
base string and a counter. Instead, the objects hold references to the hash
key. As long as each instance of the base class calls the hash function once,
it will have a reference which indeed is unique, until the application
closes.

The “1:1” hashtable implementation, interface and chrome Mochitest are also available.

When I get down to it, that’s all I really need.

One thought on “Making a hash of things”

  1. Thats pretty smart. I’ve been using the double array + indexOf for my javascript DOMNode to DOMNode HashTable for ages. But this sounds like it could be much faster.
    Thanks for sharing!

Comments are closed.