Text styling by Mozilla editor?

Let’s face it: when you’re editing plain text in Mozilla, you don’t get any nice text highlighting. It’s just not supported. Having edited raw text (XML documents, JavaScript, CSS files, you get the idea) many times, I’ve often wanted syntax highlighting that works in a simple but very flexible way.

There are really two parts to the problem: (1) Figuring out the method by which text is highlighted (or otherwise “styled”), and (2) Figuring out how to determine the groups of text which are highlighted.

Call me crazy, but I feel the second part can only be answered by regular expressions, like ECMAScript’s.

One idea I had involves reusing the CSS stylesheet mechanism, and a new Mozilla-specific CSS selector (which no one’s proposed yet):

-moz-regexp("/foo/") {
  color: #ff0000;
}

The above stylesheet would be applied to the content document of a <xul:editor type="text/plain"/>, and some internal magic would apply the CSS rules to the matching text.
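
To make the “internal magic” a little more concrete, here is a minimal sketch in plain JavaScript of one way the matching could work: cut the text into matched and unmatched pieces, and wrap each match in a span that an ordinary stylesheet can select. Nothing here is an existing Mozilla API. The rule table, the class names, and the functions highlight() and escapeHtml() are all made up for illustration.

```javascript
// Hypothetical rule table: each entry pairs an ECMAScript regular
// expression with a class name. The names and the format are assumptions
// made for this sketch, not anything Mozilla provides.
const rules = [
  { className: "comment", pattern: /\/\/[^\n]*/ },
  { className: "string", pattern: /"(?:[^"\\\n]|\\.)*"/ },
  { className: "keyword", pattern: /\b(?:var|function|return)\b/ },
];

// Escape characters that are special in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Cut the source into matched and unmatched pieces, wrapping each match
// in a <span> whose class names the rule that matched.
function highlight(source, ruleList) {
  // Combine the rules into one alternation; group i + 1 belongs to rule i.
  // This assumes the individual patterns contain no capturing groups.
  const combined = new RegExp(
    ruleList.map((rule) => "(" + rule.pattern.source + ")").join("|"),
    "g"
  );
  let html = "";
  let last = 0;
  for (const match of source.matchAll(combined)) {
    if (match[0].length === 0) continue; // ignore empty matches
    const ruleIndex = ruleList.findIndex((_, i) => match[i + 1] !== undefined);
    html += escapeHtml(source.slice(last, match.index));
    html += '<span class="' + ruleList[ruleIndex].className + '">' +
      escapeHtml(match[0]) + "</span>";
    last = match.index + match[0].length;
  }
  return html + escapeHtml(source.slice(last));
}
```

With this approach the styling itself stays in ordinary CSS, for example a rule like .keyword { color: #ff0000; }. Because all rules are tried at each position and the leftmost match wins, a keyword inside a string literal is treated as part of the string.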

Another idea, somewhat more convoluted, would involve implementing a new XPath function, regexp-match("/foo/"), and using XSLT stylesheets to transform the source text into an HTML document, similar to what XML pretty printing in Mozilla does. A <xul:editor type="text/html"/> would then display the document for editing. When practical, the editor would grab the post-processed and edited text and re-run it (or at least the changed portion) through the XSLT stylesheet to re-highlight the changes.

A time delay to allow the user to finish typing before repainting the screen would be entirely acceptable. When you factor in DOM mutation events, it’s practically a necessity.
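
Here is a minimal sketch of that time delay, written as a small debounce helper in plain JavaScript. The helper itself is an assumption for illustration, not an existing editor API. The timers parameter exists only so the sketch can be exercised with a fake clock, and rehighlight and editorDocument in the usage comment are hypothetical names.

```javascript
// Returns a wrapper that postpones `callback` until `delayMs` milliseconds
// have passed without another call. Each new call cancels the pending one.
// `timers` is injectable for testing; by default the global timers are used.
function debounce(
  callback,
  delayMs,
  timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  }
) {
  let pending = null;
  return function (...args) {
    if (pending !== null) timers.clear(pending);
    pending = timers.set(() => {
      pending = null;
      callback(...args);
    }, delayMs);
  };
}

// Hypothetical usage: re-highlight 300 ms after the last text change.
// `editorDocument` and `rehighlight` are placeholders for this sketch.
//
//   editorDocument.addEventListener(
//     "DOMCharacterDataModified", debounce(rehighlight, 300), false);
```

A burst of keystrokes fires many mutation events, but only the last one in the burst ends up triggering a repaint.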

I’m wondering how in the world to do this. If someone could write up a proposal on doing this, it’d make for a really great feature for Gecko 1.9. Any Summer of Code ’06 takers? Anyone else willing to do this?

Does anyone understand what I’m talking about here on the technical side? 🙂

UPDATE (at 3 am): Okay, I just had a nice 45 minute discussion with Daniel Brooks (aka db48x). Combine that with comments 2-4 on this blog entry, and I’ve got some rethinking to do. A new blog entry covering the conversation with Daniel is coming once I’ve had time to think it over. In the meantime, your comments are still welcome. (For those of you wanting to preview the conversation, find some #developers moznet logs, roughly from 2:00 am to 3:00 am Pacific time, 06 May 2005.) For now, I’m going to sleep.

6 thoughts on “Text styling by Mozilla editor?”

  1. Have you done any of the computer science behind syntax highlighting? Have you studied context free grammars? Have you read the sources of existing parsers, tokenisers and lexers, and competent highlighting tools? Do you still think it’s competent to do syntax highlighting for irregular grammars with regular expressions?
    (From Alex: Whoever you are, you don’t sound like a Mozilla developer. Our Mozilla guys at least sign their names.
    I haven’t done any of that, no, but I would appreciate links to such documents, instead of rhetorical questions that basically call me an idiot – especially the last one.)

  2. Alex: maybe anon was a bit harsh, but he has a point. Regexp-based parsers suck, as anyone who tried to write and maintain one knows.
    You really ought to check how existing parsers and good highlighters are implemented before suggesting things like this.
    Last, it seems that Daniel has been working on a decent syntax highlighter. I hope he’ll share his code when it’s done.

  3. I think syntax highlighting based on lexical analysis is the way to go. I think this is what most other syntax highlighting editors do, except for super-sophisticated ones that actually parse and/or compile the source code on the fly, which is overkill for most applications.
    Lexical analysis of a document basically means tokenization via regular expressions, plus a state machine (usually, but not necessarily, a finite state machine). Look at the flex documentation to see what lexers do.
    In Mozilla the way to implement this would probably be to make the document a bunch of spans. Each span represents one lexed token, and has an attribute saying what the token is. Periodically in the document you’d store checkpoints of the lexer state at that point in the document. When the document changes, you rerun lexing from the previous checkpoint, updating the document span structure as you go. Whenever you reach a new checkpoint, if the new state is the same as the checkpointed state, stop, otherwise continue to lex the next chunk of the document.

  4. The tricky part with syntax highlighting is that you need to cut stuff into pieces, however that’s done.
    XPath selects nodes, not ranges. CSS selects elements; you can’t even select individual text nodes. So however you cut those parts, you need to make DOM modifications to mark up the document fragments.
    I suggest reading a bit on syntax highlighting in Eclipse, too.
    I’m not sure what the editor does in source view, or how its performance compares between rich and source mode.

  5. Isn’t this already done for the view-source windows? Though I suppose that could be done backwards from the already-parsed document.
    (From Alex: That’s done through an XSLT transformation. I’m considering this possibility too, but it’s a bit painful.)
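
The checkpoint scheme described in comment 3 can be sketched roughly as follows, in plain JavaScript. This is a toy under stated assumptions, not anyone’s actual implementation: the lexer knows only two states ("code" and "comment", for /* */ block comments), it keeps one checkpoint per line, and it stores tokens in arrays rather than DOM spans. The names lexLine, lexDocument, and editLine are made up for illustration.

```javascript
// Lex one line starting in `state` ("code" or "comment"). Returns the
// tokens found and the lexer state in effect at the end of the line.
function lexLine(line, state) {
  const tokens = [];
  let pos = 0;
  let searchFrom = 0; // earliest position where a closing "*/" may start
  while (pos < line.length) {
    if (state === "code") {
      const open = line.indexOf("/*", pos);
      const end = open === -1 ? line.length : open;
      if (end > pos) tokens.push({ type: "code", text: line.slice(pos, end) });
      pos = end;
      if (open !== -1) {
        state = "comment";
        searchFrom = open + 2; // do not read "/*/" as open plus close
      }
    } else {
      const close = line.indexOf("*/", Math.max(pos, searchFrom));
      const end = close === -1 ? line.length : close + 2;
      tokens.push({ type: "comment", text: line.slice(pos, end) });
      pos = end;
      if (close !== -1) state = "code";
    }
  }
  return { tokens, endState: state };
}

// Lex a whole document. startStates[i] is the checkpoint: the lexer state
// at the start of line i.
function lexDocument(lines) {
  const doc = { lines: lines.slice(), tokens: [], startStates: ["code"] };
  for (let i = 0; i < lines.length; i++) {
    const result = lexLine(lines[i], doc.startStates[i]);
    doc.tokens[i] = result.tokens;
    doc.startStates[i + 1] = result.endState;
  }
  return doc;
}

// Replace one line, then re-lex from its checkpoint. Stops as soon as the
// state arriving at the next checkpoint matches what is stored there.
// Returns the number of lines that had to be lexed again.
function editLine(doc, index, newText) {
  doc.lines[index] = newText;
  let relexed = 0;
  for (let i = index; i < doc.lines.length; i++) {
    const result = lexLine(doc.lines[i], doc.startStates[i]);
    doc.tokens[i] = result.tokens;
    relexed++;
    if (doc.startStates[i + 1] === result.endState) break;
    doc.startStates[i + 1] = result.endState;
  }
  return relexed;
}
```

Most edits leave the state at the end of the edited line unchanged, so only that line is lexed again. Typing an unclosed /* is the expensive case: lexing continues until a line closes the comment and the states agree again.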
