A new design for the Document Object Model's Node objects

The traditional view of a DOM node is fairly simple: many interfaces expose the same basic properties. A Node holds references to its parent node and to its child nodes. An Element also holds references to its attributes. This works well in a static DOM, one that changes only minimally.

However, experience with the DOM in Gecko shows this is only a piece of the puzzle. Gecko implements XBL "shadow content", where nodes can be children of a parent node, but not be among that parent node's official children. In addition, there are some elements in Gecko's DOM which hide content in a way even XBL cannot expose. The implementation of the <video/> element is one case. The <textarea/> element of HTML also has a "hidden DOM" in Gecko, where the text you type appears in a shadow document. Then there's the <iframe/> element, which exposes a .contentDocument property. That's four different ways of hiding content from the primary DOM which scripts see, and they all play by different rules.

Furthermore, the DOM has no built-in mechanism for tracking changes to a document. We have some ad-hoc mechanisms instead. First, there is the UndoManager specification, which augments the de facto standard .execCommand(). Second, there are mutation events and mutation observers. Mozilla uses a TransactionManager and special Transaction objects to record changes so they can be easily undone and redone. However, none of this accounts for "forking" development of a DOM: isolating bugs in one branch while continuing development of a major feature on the trunk branch.

I've reluctantly come to the conclusion that the DOM as Mozilla currently implements it will not meet my needs for an editing environment: multiple shadow content models and a non-scalable history tracking mechanism need to be replaced. I believe to meet my needs, I need an entirely new way of looking at nodes: one where the Node objects are separated from their relationships to other Nodes and attributes.

One node, different parent nodes based on time

Traditionally, the Node handles references to child nodes, and to parent nodes. For instance, if I call foo.appendChild(bar), where foo and bar are Node objects, foo will add bar to an internally stored list of DOM nodes (its child nodes), and bar will place foo in a special pointer for its parent node. There are other operations that have to be considered (does bar have a parent node already?), but for the purposes of this discussion those are not relevant.

The bar node thus establishes a "parent node" relationship to foo, and breaks previously existing relationships. Some far-off other mechanism is responsible for tracking what the bar node's "parent node" might have been before. To undo the foo.appendChild(bar) operation, I have to write some complicated logic to restore the previous state.

I propose maintenance of these relationships be off-loaded to a separate but private object, which the Node owns: a DOMRelations object. This DOMRelations object is responsible for maintaining an in-memory record of what the parent node is at any time in the evolution of that Node. We can think of this simply as an array, hosted above a timeline.

Figure 1

Changes to the overall document happen on the lower timeline in Figure 1. When I call foo.appendChild(bar), then bar's DOMRelations object notes the change in memory, based on the timeline's location of the change. We might simply store {5: foo} for the parent node of bar, in an array of parent node states.

So suppose I'm at time index 6. I want to look up the parent node of bar. The bar node looks at the overall timeline, and finds 6 for the current coordinate. The bar node then asks the DOMRelations object to find the parent node assignment nearest to 6, but no later than 6. Doing a quick search, the DOMRelations object finds a parent node assignment to foo at 5. So the parent node of bar is "currently" foo.
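
The lookup described above can be sketched in a few lines of TypeScript. This is only an illustration under assumptions: ParentHistory, record() and parentAt() are hypothetical names, parents are represented by plain strings instead of Node objects, and the earlier "baz" parent is invented for the example.

```typescript
// Sketch only: ParentHistory and its methods are hypothetical names.
class ParentHistory<T> {
  // Assignments as { time, parentValue } records, kept sorted by ascending time index.
  private entries: Array<{ time: number; parentValue: T | null }> = [];

  // Record that the parent became `parentValue` at time index `time`.
  record(time: number, parentValue: T | null): void {
    let i = 0;
    while (i < this.entries.length && this.entries[i].time < time) i++;
    // Overwrite an existing entry at the same time index, otherwise insert.
    const overwrite = i < this.entries.length && this.entries[i].time === time ? 1 : 0;
    this.entries.splice(i, overwrite, { time, parentValue });
  }

  // Return the assignment nearest to `time` but no later than it, or null.
  parentAt(time: number): T | null {
    let found: T | null = null;
    for (const entry of this.entries) {
      if (entry.time > time) break;
      found = entry.parentValue;
    }
    return found;
  }
}

const barParents = new ParentHistory<string>();
barParents.record(5, "foo"); // the {5: foo} entry from the text
barParents.record(2, "baz"); // hypothetical earlier parent, added out of order

const parentAtSix = barParents.parentAt(6);  // "foo": nearest assignment at or before 6
const parentAtFour = barParents.parentAt(4); // "baz"
const parentAtOne = barParents.parentAt(1);  // null: nothing recorded yet
```

A linear scan is enough here because each node is expected to store only a handful of assignments; a binary search would be the obvious swap if that assumption fails.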

What if I want to undo a change? That's as simple as decrementing the time index from 6 to 5 and repainting the nodes which changed from 5 to 6. From the Node object's perspective, the relationships to parent node, child nodes, etc. update transparently - because it gets those relationships from the DOMRelations object it owns.
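
To make the "undo is just moving the time index" idea concrete, here is a rough TypeScript sketch. DocumentClock and TimedNode are hypothetical names, repainting is left out entirely, and the handling of an abandoned redo branch is an assumption of this sketch, not something the design above specifies.

```typescript
// Sketch only: a shared time index for the whole document (hypothetical name).
class DocumentClock {
  current = 0;
}

class TimedNode {
  // Parent assignments as [timeIndex, parent] pairs, ascending by time index.
  private parentLog: Array<[number, TimedNode | null]> = [];

  constructor(readonly label: string, private clock: DocumentClock) {}

  // Every change advances the shared clock by one step.
  setParent(newParent: TimedNode | null): void {
    const now = this.clock.current + 1;
    // Entries at or after `now` belong to an abandoned redo branch;
    // this sketch drops them for this node only.
    this.parentLog = this.parentLog.filter(([t]) => t < now);
    this.parentLog.push([now, newParent]);
    this.clock.current = now;
  }

  // The parent is derived from the clock on every read, never stored directly.
  get parentNode(): TimedNode | null {
    let found: TimedNode | null = null;
    for (const [t, p] of this.parentLog) {
      if (t > this.clock.current) break;
      found = p;
    }
    return found;
  }
}

const clock = new DocumentClock();
const foo = new TimedNode("foo", clock);
const baz = new TimedNode("baz", clock);
const bar = new TimedNode("bar", clock);

bar.setParent(baz); // clock.current becomes 1
bar.setParent(foo); // clock.current becomes 2

const parentBeforeUndo = bar.parentNode; // foo
clock.current -= 1;                      // undo: no node is touched
const parentAfterUndo = bar.parentNode;  // baz
clock.current += 1;                      // redo
const parentAfterRedo = bar.parentNode;  // foo
```

Neither undo nor redo writes to any node; only the clock moves, and every relationship read picks up the change.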

Similarly, I can store histories of child nodes and elements' attributes.

One node, different parent nodes based on shadow content bindings

Shadow content is equally simple to handle in this new model - perhaps simpler. The difference is, instead of a one-dimensional timeline, we have a two-dimensional grid of coordinates:

Figure 2

In Figure 2, the red diamonds represent the same parentNode property changes that happened in Figure 1. But suppose in one level of shadow content, I have an XBL binding on the foo element:

<binding>
  <content>
    <div>
      <children/>
    </div>
  </content>
</binding>

This looks like the blue diamonds in Figure 2. To the bar element, its parent node is actually the div element in this "shadow" content, but in the "real" world, its parent node remains the foo element. So the following DOM in the real world:

<foo>
  <bar/>
</foo>

will appear in the first "shadow" world as:

<foo>
  <div>
    <bar/>
  </div>
</foo>

Now, I could add additional levels of XBL bindings to generate different parent nodes at deeper "shadow" levels, to reflect the green and purple diamonds of Figure 2. That's a minor detail, though. The more important point is that there is a completely different set of rules for shadow content in this model.
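
As a rough illustration of one node having a different parent at each shadow level, here is a TypeScript sketch that ignores the undo axis. LeveledNode, setParentAt() and parentAt() are hypothetical names, and the fallback rule (a deeper level inherits the nearest shallower assignment) is an assumption of this sketch.

```typescript
// Sketch only: one node, one parent slot per shadow level (hypothetical names).
class LeveledNode {
  // parentByLevel[0] is the "real" parent; higher indexes are deeper shadow levels.
  // A missing slot means "no override at this level".
  private parentByLevel: Array<LeveledNode | undefined> = [];

  constructor(readonly tag: string) {}

  setParentAt(level: number, parentNode: LeveledNode): void {
    this.parentByLevel[level] = parentNode;
  }

  // Walk from the requested level toward the real DOM until a parent is found.
  parentAt(level: number): LeveledNode | null {
    for (let l = level; l >= 0; l--) {
      const candidate = this.parentByLevel[l];
      if (candidate !== undefined) return candidate;
    }
    return null;
  }
}

const fooEl = new LeveledNode("foo");
const divEl = new LeveledNode("div");
const barEl = new LeveledNode("bar");

barEl.setParentAt(0, fooEl); // real world: bar is a child of foo
barEl.setParentAt(1, divEl); // first shadow level: the binding's div wraps bar
divEl.setParentAt(1, fooEl); // the div only exists from shadow level 1 onward

const realParent = barEl.parentAt(0);    // fooEl
const shadowParent = barEl.parentAt(1);  // divEl
const deeperParent = barEl.parentAt(2);  // divEl: no override at level 2, falls back to 1
const divRealParent = divEl.parentAt(0); // null: the div is absent from the real DOM
```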

We can implement all forms of shadow content in this model, in order of priority. For instance, native anonymous content for a <video/> element could introduce XBL-bound content. The shadow content model could detect this new content, and apply shadow content rules in a given order:

  1. XBL-bound content
  2. Native-anonymous content
  3. Content belonging to an inline frame, browser or editor
  4. Some other anonymous content model
  5. etc.

When all changes at a particular level have finished processing, we can move on to the next level, introducing levels of shadow content needed, until every missing DOM node has been inserted... somewhere.

Then, by looking at the final shadow levels, we can determine what to paint on the screen.
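
The level-by-level processing could be sketched roughly as follows. Everything here is an assumption made for illustration: ShadowRule, buildShadowLevels(), the "first matching rule wins" policy, the maxLevels safety cap, and the tag names used in the example rules. Nodes are reduced to tag strings to keep the sketch short.

```typescript
// Sketch only: each rule may generate shadow content for an element (hypothetical names).
interface ShadowRule {
  label: string;
  // Tags of the shadow nodes this rule generates for `tag`; empty if the rule does not apply.
  expand(tag: string): string[];
}

// Build successive shadow levels until no rule generates anything new.
// `rules` must be listed in priority order; `maxLevels` guards against runaway expansion.
function buildShadowLevels(rootTags: string[], rules: ShadowRule[], maxLevels: number): string[][] {
  const levels: string[][] = [rootTags];
  while (levels.length < maxLevels) {
    const currentLevel = levels[levels.length - 1];
    const nextLevel: string[] = [];
    for (const tag of currentLevel) {
      for (const rule of rules) {
        const generated = rule.expand(tag);
        if (generated.length > 0) {
          nextLevel.push(...generated);
          break; // assumption: the first matching rule wins
        }
      }
    }
    if (nextLevel.length === 0) break; // this level introduced nothing: done
    levels.push(nextLevel);
  }
  return levels;
}

// Example rules: native anonymous content for video introduces XBL-bound content.
const exampleRules: ShadowRule[] = [
  { label: "XBL-bound content", expand: (tag) => (tag === "videocontrols" ? ["button", "scale"] : []) },
  { label: "Native-anonymous content", expand: (tag) => (tag === "video" ? ["videocontrols"] : []) },
];

const shadowLevels = buildShadowLevels(["video"], exampleRules, 8);
// shadowLevels: [["video"], ["videocontrols"], ["button", "scale"]]
```

The last entry of shadowLevels plays the role of the "final shadow level" that would decide what gets painted.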

Nodes' relations only store coordinates they care about

In Figure 2, there are sixteen coordinates (the diamonds) the bar node cares about for its .parentNode property. However the grid illustrates there are 256 potential combinations of shadow content and undo history. How would we store what we need, and ignore what we don't?

First, you need the overall coordinate manager for the document, to store all coordinates of all changes:

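interface ContentManager : UndoManager {
  // Every undo coordinate at which any node in the document changed.
  attribute Object[] undoCoordinates;
  // Every shadow level at which any node in the document changed.
  attribute Object[] shadowCoordinates;
};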

Then you need a DOMRelations tracking device:

interface DOMRelations {
  // Both values are resolved for the document's current coordinates.
  attribute Node? parentNode;
  attribute Node[] childNodes;
};

Internally, DOMRelations.childNodes would implement something like the following:

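interface DOMRelationsInternal {
  // axis is either "undo" or "shadow"; returns only the coordinates this object has data for.
  Object[] getStoredCoordinates(in string axis);
  // Returns null when nothing is stored at exactly this pair of coordinates.
  Object? getDataAt(in Object undoCoordinate, in Object shadowCoordinate);
  void setDataAt(in Object undoCoordinate, in Object shadowCoordinate, in Object data);
};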

To get the "current" bar.childNodes, I'd call the .getStoredCoordinates() method twice: once for undo coordinates, once for shadow coordinates. I'd discard any returned coordinates that lie beyond my "current" position in the undo history and shadow level (there's no sense in looking into the future, is there?), leaving a set of coordinates to iterate over. Then it's a matter of calling DOMRelationsInternal.getDataAt() with each remaining combination, nearest coordinates first, until one of them returns stored data.
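
Here is one way that lookup might look in TypeScript. It is a sketch under several assumptions: coordinates are plain numbers (the interface above leaves them as opaque Objects), RelationsStore and resolve() are hypothetical names, and the search order (deepest shadow level first, then latest undo coordinate) is a guess at what "nearest" should mean on a two-dimensional grid.

```typescript
type Coordinate = number; // assumption: numeric coordinates

// Sketch only: a sparse grid that stores data just at the coordinates a node cares about.
class RelationsStore<T> {
  private cells: Array<{ undo: Coordinate; shadow: Coordinate; data: T }> = [];

  // Distinct coordinates this store has data for on one axis, ascending.
  getStoredCoordinates(axis: "undo" | "shadow"): Coordinate[] {
    const seen: Coordinate[] = [];
    for (const cell of this.cells) {
      const value = axis === "undo" ? cell.undo : cell.shadow;
      if (seen.indexOf(value) === -1) seen.push(value);
    }
    return seen.sort((a, b) => a - b);
  }

  // Exact-match read: undefined when nothing is stored at this pair.
  getDataAt(undo: Coordinate, shadow: Coordinate): T | undefined {
    for (const cell of this.cells) {
      if (cell.undo === undo && cell.shadow === shadow) return cell.data;
    }
    return undefined;
  }

  setDataAt(undo: Coordinate, shadow: Coordinate, data: T): void {
    for (const cell of this.cells) {
      if (cell.undo === undo && cell.shadow === shadow) {
        cell.data = data;
        return;
      }
    }
    this.cells.push({ undo, shadow, data });
  }

  // Resolve what is visible at the current position, never looking into the future.
  resolve(currentUndo: Coordinate, currentShadow: Coordinate): T | undefined {
    const undos = this.getStoredCoordinates("undo").filter((u) => u <= currentUndo).reverse();
    const shadows = this.getStoredCoordinates("shadow").filter((s) => s <= currentShadow).reverse();
    for (const s of shadows) {
      for (const u of undos) {
        const data = this.getDataAt(u, s);
        if (data !== undefined) return data;
      }
    }
    return undefined;
  }
}

// Child node lists reduced to arrays of tag names for the example.
const barChildren = new RelationsStore<string[]>();
barChildren.setDataAt(5, 0, ["a"]);
barChildren.setDataAt(7, 0, ["a", "b"]);
barChildren.setDataAt(6, 1, ["wrapper"]);

const realAtSix = barChildren.resolve(6, 0);     // ["a"]
const realAtEight = barChildren.resolve(8, 0);   // ["a", "b"]
const shadowAtEight = barChildren.resolve(8, 1); // ["wrapper"]
const shadowAtFive = barChildren.resolve(5, 1);  // ["a"]: falls back to the real level
const beforeAnything = barChildren.resolve(4, 1); // undefined
```

With two or three stored coordinates per axis, the nested loop makes at most a handful of getDataAt() calls per read.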

The stored coordinates for each DOMRelations internal object are likely to be very few (rarely more than 2 or 3 per axis). So the cost of finding the right array of child nodes will be pretty small in the long run. True, it won't be quite as efficient as storing the child nodes array directly on the Node object (either in speed or memory), but undo operations are very fast, as are shifts from the "real" content to "shadow" content.

(Note that DOMRelations.parentNode would also do a lookup on its own DOMRelationsInternal object. A similar arrangement would exist for attributes, ordered first by namespace, then by local name, and finally by DOMRelationsInternal data.)

Reaching across to the native DOM

Alas, I do not plan on implementing this in the native Gecko DOM. I do not think I can impose this new model directly onto the Gecko browser without some serious rewriting, and I'm not sure I could sell Mozilla DOM peers on this model. I will pay some costs for that decision later.

In particular, my model does not handle layout and rendering of DOM nodes on the screen.

So, the "final" nodes which represent what the user sees will have to map onto native DOM nodes from the Gecko DOM. When an attribute in the native DOM changes, I'll have to mirror it to my DOM, and vice versa. Ditto for child nodes, etc. (Notably, this will cost me all the performance savings I'd have from a fast undo model.)
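
A two-way mirror like this needs some way to stop a change from bouncing back and forth forever. The TypeScript sketch below shows one simple option, which is an assumption of this example and not part of the design above: each side ignores writes that do not change the stored value. AttrStore and mirror() are hypothetical stand-ins; a real bridge to the Gecko DOM would listen through something like mutation observers instead of this toy listener list.

```typescript
type AttrListener = (attrName: string, value: string) => void;

// Sketch only: a toy attribute holder standing in for either DOM (hypothetical name).
class AttrStore {
  private values: { [attrName: string]: string } = {};
  private listeners: AttrListener[] = [];
  writes = 0; // counts real changes, to show that the echo is suppressed

  onChange(listener: AttrListener): void {
    this.listeners.push(listener);
  }

  getAttribute(attrName: string): string | undefined {
    return this.values[attrName];
  }

  setAttribute(attrName: string, value: string): void {
    if (this.values[attrName] === value) return; // unchanged: no write, no notification
    this.values[attrName] = value;
    this.writes++;
    for (const listener of this.listeners) listener(attrName, value);
  }
}

// Forward every attribute change on one side to the other side.
function mirror(a: AttrStore, b: AttrStore): void {
  a.onChange((attrName, value) => b.setAttribute(attrName, value));
  b.onChange((attrName, value) => a.setAttribute(attrName, value));
}

const nativeSide = new AttrStore();
const modelSide = new AttrStore();
mirror(nativeSide, modelSide);

nativeSide.setAttribute("value", "3"); // mirrored to modelSide, echo back is ignored
modelSide.setAttribute("value", "4");  // mirrored to nativeSide, echo back is ignored
```

Each change costs one extra write on the other side plus one ignored echo, which is the mirroring overhead mentioned above in miniature.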

It's important to note that I am not re-implementing XBL. Instead, I have a repeating content templates model which requires its own shadow content implementation. I tried XBL: it doesn't meet my needs, especially in a non-window environment.

The native DOM of Mozilla will (by the time I finish this) rely on mutation observers for tracking changes. For my purposes, I primarily need to look at user-interface controls in XUL, which hopefully will notify mutation observers appropriately. The <xul:textbox type="number"/> element is a classic case: I'll use it to tell me how many copies of a template I need at a given time.

Conclusion

I do not plan on implementing the full HTML5 Document Object Model. Nor do I plan on implementing the full XBL model, or XBL2. I plan on implementing only the parts I need for my template model. In particular, that means a model where the shadow content is inserted before the element owning that shadow content. I believe that this new model will help me achieve the complete template system I need for my XML editing project.

One problem this does not solve - and potentially makes harder - is sharing a document or fragment among multiple environments: for instance, coordinating two people who are editing the same document on their own machines. I don't have an answer for that.