Imagine the following:
- You have an XML or HTML document from a version control system (SVN, Hg, git, etc.).
- You have an editor which parses that into a DOM document.
- You edit the document using a GUI, not looking at the source code – just adding in paragraphs, new widgets, math formulae, vector graphics, etc. with a visual editor of some kind.
- You finish your changes and save the document.
- You get ready to commit your altered document back to the source code repository, looking over the change set, and wonder what your editor did to your document: it only looks vaguely like what you started with.
This is a hard problem, I think, and for the most part it comes down to handling white space inside an XML tag. For instance, an element’s attributes may all appear on the same line as the element name, or each on a new line by itself, with varying indentation.
On the one hand, a visual editor like this isn’t required to preserve source code formatting. As long as two differently formatted versions of the same XML document parse to equal DOM nodes, the editor normally does not care. On the other hand, version control systems rarely deal with DOM nodes; they see only the source markup that an editor generates.
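To make the mismatch concrete, here is a minimal sketch in Python. The `widget` element and its attributes are made up for illustration, and the standard library's ElementTree stands in for a real DOM implementation. It parses two layouts of the same start tag, compares the resulting trees, and then serializes one of them again.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same element; only the white space
# inside the start tag differs.
one_line = '<widget id="w1" type="button" label="OK"/>'
multi_line = '<widget\n    id="w1"\n    type="button"\n    label="OK"\n/>'


def same_element(a, b):
    """Compare tag, attributes, text, tail, and children recursively."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and (a.tail or "") == (b.tail or "")
        and len(a) == len(b)
        and all(same_element(x, y) for x, y in zip(a, b))
    )


first = ET.fromstring(one_line)
second = ET.fromstring(multi_line)

# The parser sees no difference between the two layouts...
trees_equal = same_element(first, second)
# ...but a line-based diff in version control does.
sources_equal = one_line == multi_line

# Serializing the parsed tree discards the original line breaks.
round_tripped = ET.tostring(second, encoding="unicode")
layout_preserved = round_tripped == multi_line

print(trees_equal, sources_equal, layout_preserved)  # True False False
```

The parsed tree carries no record of where the line breaks inside the tag were, so an editor that wants to honor them would have to capture that information separately from the tree.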
In trying to build a better XML editor (Verbosio) which supports editing in visual modes, I feel I have to respect both visual editing and existing source formatting. Now, there may be a solution out there – a set of algorithms and a specification – but I have certainly not heard of it. At the same time, many XML documents under version control follow formatting conventions that their authors enforce. I am much less concerned with formatting a document the way I think it should be than with honoring existing patterns of source formatting.
So, would you please post relevant links as comments to this blog entry? I am looking for either an existing specification for this sort of problem, or diverse XML samples from projects where people regularly edit the source by hand.