Thoughts on MathML, the DOM, and XML

I apologize for this seeming a mish-mash of two separate articles; I
couldn’t figure out a clean way to state one set of issues without stating
the other.
Identity Problems
ID-type and IDREF-type attributes are wonderful. They offer a convenient
way for XML documents to reference and cross-reference elements in a
document. The DOM’s handling of these attributes seems a little bit lacking,
though. The problems start when you clone a node, or when you wish to use
similar markup in multiple sections without XML ID attribute validity
problems.
Example: MathML embedded in XML
Technically ID attributes don’t present much of a problem to an XML
document with embedded MathML. You could have several MathML fragments
describing the progress of a simple theorem (such as the Fundamental Theorem
of Calculus). But to use the powerful xref attribute of MathML,
each element you reference by xref must have a unique
id attribute. With possibly dozens of elements per fragment
needing id attributes, and several fragments slowly changing
from one to another, you need a systematic naming scheme for ID attributes
across the XML document. The W3C has not yet suggested a particular scheme
for ID attributes in this scenario.
A simple, if less-than-elegant, solution would be for each id
and xref attribute within each MathML fragment to have its value
prefixed with the id attribute value of the MathML fragment’s
root node.
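The prefixing idea can be sketched with Python's standard xml.dom.minidom
module. This is only an illustration under stated assumptions: the helper
names (iter_elements, prefix_fragment_ids), the "." separator, and the sample
fragment are all hypothetical and not part of MathML or the DOM.

```python
from xml.dom import minidom


def iter_elements(node):
    """Yield every element at or below node, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from iter_elements(child)


def prefix_fragment_ids(fragment_root, separator="."):
    """Prefix each descendant id/xref value with the fragment root's id."""
    prefix = fragment_root.getAttribute("id")
    if not prefix:
        raise ValueError("the fragment root needs an id attribute")
    for element in iter_elements(fragment_root):
        if element is fragment_root:
            continue  # the root keeps its own id unchanged
        for name in ("id", "xref"):
            if element.hasAttribute(name):
                old_value = element.getAttribute(name)
                element.setAttribute(name, prefix + separator + old_value)


# Hypothetical sample: two fragments that reuse the same local ids.
FRAGMENT = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" id="{root_id}">'
    "<semantics>"
    '<mrow id="lhs" xref="lhs-c"><mi>x</mi></mrow>'
    '<annotation-xml encoding="MathML-Content">'
    '<ci id="lhs-c">x</ci>'
    "</annotation-xml>"
    "</semantics>"
    "</math>"
)
doc = minidom.parseString(
    "<proof>"
    + FRAGMENT.format(root_id="step1")
    + FRAGMENT.format(root_id="step2")
    + "</proof>"
)

for math in doc.getElementsByTagName("math"):
    prefix_fragment_ids(math)

all_ids = [e.getAttribute("id") for e in iter_elements(doc) if e.hasAttribute("id")]
print(all_ids)
```

Each fragment author can then keep short, reusable local names, and the
document-wide uniqueness comes from the fragment root's id alone.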
On another note, the XML Schemas specification does allow “keys” to be
defined. However, MathML 2.0 does not use XML Schemas. Perhaps a
revision to MathML 2.0 could use XML Schemas to create “local”
id and xref attributes.
Example: Cloning a node
If the document’s author includes a feature for cloning a node and then
appending the clone somewhere in the document, the XML ID-type attributes
pose a particular problem. Unless the author explicitly changes or removes
every ID-type attribute in the clone, it cannot be appended to the document
without violating XML validity, because two elements would then share an ID.
Likewise, the author must update all corresponding IDREF-type attributes.
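A rough sketch of that cleanup, again using Python's xml.dom.minidom. It
assumes the application already knows which attribute names are ID-type and
IDREF-type (here "id" and "xref"); the function name clone_with_fresh_ids and
the suffix convention are hypothetical.

```python
from xml.dom import minidom

# Assumption: these attribute names are known in advance to be ID-type
# and IDREF-type for the XML language in use.
ID_ATTRS = ("id",)
IDREF_ATTRS = ("xref",)


def iter_elements(node):
    """Yield every element at or below node, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from iter_elements(child)


def clone_with_fresh_ids(node, suffix):
    """Deep-clone node, renaming its IDs and the references to them."""
    clone = node.cloneNode(True)
    renamed = {}
    # First pass: give every ID inside the clone a new value.
    for element in iter_elements(clone):
        for name in ID_ATTRS:
            if element.hasAttribute(name):
                old_value = element.getAttribute(name)
                renamed[old_value] = old_value + suffix
                element.setAttribute(name, renamed[old_value])
    # Second pass: repoint references whose target was cloned.
    # References to elements outside the clone are left untouched.
    for element in iter_elements(clone):
        for name in IDREF_ATTRS:
            target = element.getAttribute(name)
            if target in renamed:
                element.setAttribute(name, renamed[target])
    return clone


doc = minidom.parseString(
    '<doc><item id="a"><part id="b" xref="a"/><part xref="outside"/></item>'
    '<note id="outside"/></doc>'
)
item = doc.getElementsByTagName("item")[0]
copy = clone_with_fresh_ids(item, "-copy")
doc.documentElement.appendChild(copy)
print(copy.toxml())
```

The caller is still responsible for choosing a suffix that does not collide
with IDs already in the document.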
The DOM does not currently specify any way to retrieve elements by their
IDREF-type attributes. A NodeFilter may be constructed to
retrieve elements via the names and/or values of attributes, but not by the
schema type of attributes. (Note: DOM Level 3 Core, WD, adds the
schemaType property to attributes; this property would have a
name property which for IDREF-type attributes corresponds to
“IDREF”.)
Opinions on MathML
MathML 2.0 is a specification with limited flexibility compared to more
recent XML specifications. Although written as an XHTML-compatible module
per Modularization of XHTML, it does not lend itself well to containing other
XML languages in its content. For instance, to include XML markup from a
theoretical Geometry Markup Language, you might very well end up using the
csymbol and semantics elements significantly.
Moreover, as noted above, id and xref are not very
flexible with regard to multiple MathML fragments in an XML document.
For that matter, I am continually stunned to see the Math WG
specifications completely ignore geometry. The SVG specification has done an
excellent job of describing graphical images; the mathematical approach to
geometry is not quite included in SVG, and I would hope the SVG WG looks at
that for version 2.0. I e-mailed the SVG WG about this issue some time ago. It
is my opinion that the SVG and Math Working Groups should coauthor a
specification for a Geometry Markup Language which would fit seamlessly into
MathML, SVG and XHTML.
It is also my opinion, for these reasons, that the W3C Math Working
Group should consider starting a new minor version of MathML, tentatively
MathML 2.1. The basic foundations of MathML 2.0 are solid (though I have
heard complaints about the MathML DOM as well). However, two years have
passed since MathML 2.0. MathML is very useful from a mathematical standpoint,
but not as useful from an XML/DOM standpoint.
Opinions on the DOM
Thank God for DOM 2 Traversal! Had I known about it when I was writing
JavaScript Developer’s Dictionary, I would have certainly included it. (I
use the same excuse for JavaScript strict warnings…) It’s even better than
DOM 2 Range.
With it and support for DOM 3 Core (when it becomes a Candidate
Recommendation), it should be trivial to write NodeFilters for
appropriate cleanup of ID- and IDREF-type attributes — without knowing the
type of each attribute beforehand. Then you could be completely certain that
you won’t violate XML validity appending a cloned node.
Also with Traversal, you could implement a “sub-identity” attribute. The
idea is a subset of an XML Schemas key: an ancestor element has a
specific ID-type attribute, and you use that element as the root node of your
NodeIterator or TreeWalker. Because the sub-identity
attribute need only be unique within the contents of that root node,
you could then iterate through the descendant elements to find the target
element.
Similarly, you could also define and find “sub-identity reference”
attributes, serving the same purpose IDREF-type attributes serve for ID-type
attributes.
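As a minimal sketch of the sub-identity idea: Python's xml.dom.minidom has no
DOM 2 Traversal, so a plain recursive walk stands in for the NodeIterator
here. The attribute names sub_id and sub_idref and all function names are
hypothetical, and the ID-type attribute is assumed to be named "id".

```python
from xml.dom import minidom


def iter_elements(node):
    """Yield every element at or below node, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from iter_elements(child)


def find_scope(root, id_value):
    """Return the first element under root whose id equals id_value."""
    for element in iter_elements(root):
        if element.hasAttribute("id") and element.getAttribute("id") == id_value:
            return element
    return None


def get_element_by_sub_id(root, id_value, sub_id):
    """Find the element with the given sub_id inside the scope named by id."""
    scope = find_scope(root, id_value)
    if scope is None:
        return None
    for element in iter_elements(scope):
        if element is not scope and element.getAttribute("sub_id") == sub_id:
            return element
    return None


def get_elements_by_sub_id_ref(root, id_value, sub_id):
    """List the elements inside the scope whose sub_idref names sub_id."""
    scope = find_scope(root, id_value)
    if scope is None:
        return []
    return [
        element
        for element in iter_elements(scope)
        if element.hasAttribute("sub_idref")
        and element.getAttribute("sub_idref") == sub_id
    ]


doc = minidom.parseString(
    "<proof>"
    '<math id="step1"><mi sub_id="x">x</mi><mo sub_idref="x">+</mo></math>'
    '<math id="step2"><mi sub_id="x">y</mi></math>'
    "</proof>"
)
in_step1 = get_element_by_sub_id(doc, "step1", "x")
in_step2 = get_element_by_sub_id(doc, "step2", "x")
print(in_step1.toxml(), in_step2.toxml())
```

The same sub_id value "x" appears in both fragments without conflict, because
each lookup is limited to the subtree of one ID-bearing ancestor. Nested
scopes are not handled specially in this sketch.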
XML Schemas could easily enforce uniqueness constraints on the id+sub_id
combination via the element, and serve a
similar purpose for id+sub_idref via the
element. Traversal could then help us define several methods:
* Node.getElementById(id) (for nodes which haven’t
been appended yet)
* (Node/Document).getElementsByIdRef(id) returns
NodeList
* (Node/Document).getElementBySubId(id, sub_id) returns
Element
* (Node/Document).getElementsBySubIdRef(id, sub_id)
returns NodeList
* (Node/Document).getElementByKey(key) returns
Element
* (Node/Document).getElementsByKeyRef(key) returns
NodeList
The first four in this list I can implement using DOM 2 Traversal and an
XML language where I already know the IDREF-type and “sub”-id and -idref
attributes, or DOM 2 Traversal and DOM 3 Core. The last two are much more
difficult; they would require a specific definition of how an XML Schemas key
would be referenced in the DOM. This is something I would not expect the DOM
Working Group to attempt before Level 4, but they should consider it.
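For the first two methods in the list, a sketch along these lines might work
when the IDREF-type attribute names are known in advance. It uses Python's
xml.dom.minidom with a plain recursive walk in place of DOM 2 Traversal; the
function names and the choice of "id" and "xref" as attribute names are
assumptions made for illustration.

```python
from xml.dom import minidom


def iter_elements(node):
    """Yield every element at or below node, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from iter_elements(child)


def get_element_by_id(root, id_value, id_attr="id"):
    """Search any subtree, attached to a document or not, for an ID."""
    for element in iter_elements(root):
        if element.hasAttribute(id_attr) and element.getAttribute(id_attr) == id_value:
            return element
    return None


def get_elements_by_id_ref(root, id_value, idref_attrs=("xref",)):
    """List elements whose IDREF (or IDREFS) attribute names id_value."""
    matches = []
    for element in iter_elements(root):
        for name in idref_attrs:
            # split() also covers IDREFS-style whitespace-separated lists.
            if id_value in element.getAttribute(name).split():
                matches.append(element)
                break
    return matches


doc = minidom.parseString(
    '<doc><a id="t"/><b xref="t"/><c xref="u t"/><d xref="u"/></doc>'
)
refs = get_elements_by_id_ref(doc, "t")
print([element.tagName for element in refs])

# A subtree that has been created but not yet appended to the document.
detached = doc.createElement("group")
leaf = doc.createElement("leaf")
leaf.setAttribute("id", "new")
detached.appendChild(leaf)
found = get_element_by_id(detached, "new")
```

Because the search starts from whatever node it is given, the same function
covers both the document-wide case and the not-yet-appended case.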
Opinions on XBL Note
XBL has a wonderful potential that I think no one is aware of:
* It can serve as an alternative to XSLT.
* It allows you to treat a bound element as part of the DOM.
* It allows you to add custom extensions to the bound element — in
essence, creating your own DOM for the element. (Properties and methods,
not just anonymous nodes!)
XBL cannot replace XSLT, and vice versa. But for simple transformations
or extensions, XBL is beautiful. The W3C should really look at standardizing
XBL and integrating it into its other XML specifications wherever feasible.
XBL is a gold mine for XML application developers.