Etna, a wysiwyg XML RELAX NG- and Gecko-based editor
Abstract
In February 2004, the Connexions Project, hosted by Rice University in Houston, Texas, contacted our company to build a new XML editor. The requirements for that editor were simple: open-source, wysiwyg, independent from other software sources, simple to use, and based on RELAX NG.
We carefully reviewed the possible solutions with Connexions and proposed to build a new editing tool based on Mozilla Firefox’s rendering engine, Gecko.
This document describes this project, from inception to its current status, and will cover technical implementation details.
About the authors

Daniel Glazman
Daniel Glazman is the CEO and founder of Disruptive Innovations SARL. Holding two engineering diplomas from Ecole Polytechnique and Sup'Télécom Paris, Daniel started working on markup editors in 1991 when he joined Grif SA, a software vendor specialized in SGML editors, where he implemented one of the very first wysiwyg editing environments for CALS tables. From 1994 to 2000, Daniel held several positions, from research engineer to team manager, at the Research & Development Center of Electricité de France, the French national energy provider. He participated in the standardization of the HTML 4 and CSS 2 specifications, and is still an Invited Expert in W3C's CSS Working Group, editor or author of several CSS 3 modules. In January 2000, Daniel joined Amazon.com as the CTO of the French subsidiary Amazon.fr, and did not like it... He left to become the CTO of the French subsidiary of Swedish Halogen, the merger of a consulting company and a web agency. At the end of 2000, Daniel joined Netscape Communications, where he implemented new features in both the CSS engine and the HTML editor, Composer. After the fall of Netscape in July 2003, Daniel founded Disruptive Innovations and implemented Nvu.

Laurent Jouanneau
Laurent Jouanneau started working on the Web in 1999, after his MIAGE diploma (Maîtrise informatique Appliqué à la Gestion d'Entreprise). In a former professional life, he implemented Web applications using various languages (PHP, Java, VBScript...) and databases (MySQL, PostgreSQL, Oracle...). In 2003, he fell into “web standards mode” and started promoting them. He is one of the founders of Openweb, the most famous French web site about web standards. Later in the year, he discovered Mozilla and started the first French web site about XUL and the other underlying technologies of Mozilla: xulfr.org. In June 2004, he joined Disruptive Innovations to play with Gecko, XUL, XPCOM, XBL and create Etna, a new XML wysiwyg editor.
Genesis
In February 2004, after the first public releases of Nvu, the HTML/XHTML editor based on Mozilla, Brent Hendricks from the Connexions Project contacted us because they wanted to hire a software engineer to develop an open-source, wysiwyg, validating XML editor. Open-source was mandatory. Since they were also open to contracting opportunities, and had already evaluated the potential of the editor in Gecko, we quickly shifted to more technical discussions around very precise proposals.
Connexions is an environment for collaboratively developing, freely sharing, and rapidly publishing scholarly content on the Web. They develop and distribute free software meeting their needs, and gather content coming from all around the globe.
After only a few conference calls, we understood Connexions was really interested in a standalone application with the following requirements:
Wysiwyg; wysiwygness was a key factor here, since the editor was meant to be used by people with no technical background in markup languages at all, and it was totally out of the question to ask them to acquire such knowledge.
Open-source; Connexions comes from the academic world, and open-source was not up for debate. They also wanted to build a community around the editing tool, and open-source was the natural choice for that. MPL was the preferred license.
RELAX NG; that’s Connexions’ choice, based on the power and complexity of all existing Model and Schema languages.
Validation; under no circumstances should the editor allow the creation of invalid markup.
Independent; Connexions preferred having a really standalone application, minimizing the number of dependencies to other software.
At that time, there was no open-source project meeting these requirements, and none of the existing commercial closed-source software fulfilled all five conditions; extending commercial software was out of reach, both in terms of feasibility and cost.
After a short while, extending the editor in Gecko came as a natural choice because of the tri-license MPL/GPL/LGPL of Mozilla’s code and because Gecko is perfectly cross-platform. The modernity of the rendering engine and the wide extensibility/localizability of XUL-based applications were also key factors in favour of a Mozilla-based solution. Nvu, our HTML/XHTML editor sponsored by Linspire Inc., just proved not only the technical feasibility of such a project, but also the ability of our company to handle it.
But we still needed a RELAX NG validator and Mozilla did not provide one…
In the list above, item 4 (validation) was a major condition that drastically reduced the number of existing solutions we could reuse. Even in terms of embeddable libraries, the choice was narrow.
Wysiwygness also drastically affected our choice, since we needed technical solutions for problems RELAX NG does not solve. For instance, how do you describe what happens when the user presses the CR key at the end of a given element? What should the next element be? Or what is the “blank document” for a given Schema?
Because of that, we decided not to use libxml2 (we probably still owe Daniel Veillard a beer on that one…) and instead implement our own RELAX NG validator.
The project was codenamed ETNA, standing for Editing Tool for Networked Authors. The implementation of Etna started in September 2004, after two months of design and brainstorming sessions. We originally planned to reuse the codebase of Nvu and “map” XML authoring commands onto an HTML rendering, but we quickly saw that this was a dead end and decided to implement an XML editor inside Gecko.
Etna

Screenshot of the main window of Etna 0.3.1
So let’s describe a bit what is and what’s not Etna… Etna is:
A wysiwyg editor; the document you edit is styled using CSS and if, for instance, you save the document, browse it with Mozilla Firefox and print it, you’ll see no difference between the printed result and your editing window (except for the reflow caused by the size of the viewport, of course). The full power of the CSS style engine inside Gecko is available, for the greatest pleasure of the user.
An editor requiring little, if any, knowledge of XML or RELAX NG; we worked hard on that, keeping “geeks and XML freaks are not our primary target” in mind…
A validating editor; whatever you do in Etna, it’s valid because choices leading to invalid markup are never available. All bits of our menus, buttons and dialogs are enabled or disabled on the fly depending on what the RELAX NG validator says about the current selection.
An extensible editor; just like Firefox, Etna is extensible through downloadable packages. And since we have a Schema manager in Etna, it’s possible to build packages containing on one hand a RELAX NG Schema, its attached styles and localization files, and on the other hand XUL+JS chrome files (binary components are also possible) adding UI elements to Etna’s main window. These UI bits will be visible and enabled only for a document based on the corresponding Schema, which makes it possible to adapt Etna’s UI to virtually any kind of Schema.
A cross-platform application; for the time being, it’s available on WinXP, Mac OS X and Linux…
Why a new parser/validator?
The key reason why we invested a lot of time in a new parser/validator for RELAX NG instead of reusing an existing implementation is the extensions we needed for Etna. Basically, wysiwyg editing of structured documents raises a few problems that were identified more than fifteen years ago, and remain unresolved:
As we said above, what is the “blank document” for a given Schema? Can we even say “the blank document” or should we say “a blank document”? Should a blank document be minimal and how can we specify default content in such a document? In other words, what are the default templates for a given Schema and how can we associate these templates with the Schema?
What is the model for a “blank” instance of a given element in a given context? Same as above, how can we template that and link the templates to the Schema?
What are the default style sheets attached to a given Schema and how are they described in the Schema? How can we specify a style sheet used only in the editing environment that saved documents will never link to? More generally, how do you tell a Schema to specify an XML processing instruction?
Similarly, how can we specify the default value for an attribute that is not mandatory?
How do you specify in a given Schema the DOCTYPE that documents based on that Schema should declare?
How do you give a human-readable title to an element type or attribute name? How can you specify that in the en-US locale, the “RPRTH1” element should be presented as “Level 1 Report Header”?
How do you specify what the initial selection should be in a blank instance (of a document or an element)?
And finally, how do you control the behaviours you expect from all normal wysiwyg editors, for instance interrupting a list and falling back to an unnumbered element when you press the CR key twice?
In agreement with Connexions, and after a close look at all the editors on the market, we decided to extend RELAX NG and introduce in RELAX NG schemas our own elements from our own namespace.
The namespace is
http://disruptive-innovations.com/ns/editor-rng-extension/1.0
and the usual prefix for it is “di”.
- <di:blank>
- to define a “blank” instance for a given RELAX NG pattern. It’s possible to have multiple “blank” instances for a given pattern. In that case, a choice dialog will let the user decide exactly what he/she wants to create.
- di:defaultValue
- is an attribute available on attribute patterns to define the default value of the attribute
- <di:externalDoctype>
- specifies a DOCTYPE to be added to all document instances based on the current schema
- <di:processingInstruction>
- is precisely what you think it is…
- <di:label> and <di:description>
- are here to attach human readable information to RELAX NG patterns and provide the user with localizable and understandable UI dialogs…
- <di:localizationProperties>
- specifies the chrome files used to localize the application when it deals with the current schema. This is very similar to the *.dtd and *.properties files used by Mozilla to localize XUL-based applications.
- <di:editorStylesheets>
- while <di:processingInstruction> can specify a default stylesheet for a document based on the current schema, this element specifies a stylesheet to be used only in the editor, one that saved document instances will never link to.
- <di:collapsedSelection>, <di:startSelection> and <di:endSelection>
- place the caret inside a given “blank” instance or specify that a given set of nodes is selected by default in that “blank” instance.
- <di:semantics>
- is probably a bad name and is still a work in progress. This element attaches UI behaviours to a given element. It will also make it possible to enable or disable on the fly some very common UI buttons in wysiwyg editors, like the “numbered list item” button.
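To make the list above more concrete, here is a minimal Python sketch of how an application might read two of these annotations from a schema. The namespace URI and the names di:label and di:defaultValue come from the list above; the placement of <di:label> as a child of an element pattern, the use of xml:lang to pick a locale, the sample schema and the helper names are all assumptions made for illustration, not Etna's actual syntax or code.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
DI = "http://disruptive-innovations.com/ns/editor-rng-extension/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical schema fragment: the layout of the di:* annotations is assumed.
SCHEMA = """\
<element name="RPRTH1" xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:di="http://disruptive-innovations.com/ns/editor-rng-extension/1.0">
  <di:label xml:lang="en-US">Level 1 Report Header</di:label>
  <optional>
    <attribute name="status" di:defaultValue="draft"/>
  </optional>
  <text/>
</element>
"""


def labels(root, locale):
    """Map each element pattern name to its label for the given locale."""
    result = {}
    for pattern in root.iter(f"{{{RNG}}}element"):
        for label in pattern.findall(f"{{{DI}}}label"):
            if label.get(XML_LANG) == locale:
                result[pattern.get("name")] = (label.text or "").strip()
    return result


def default_values(root):
    """Map each attribute pattern name to its di:defaultValue, if any."""
    result = {}
    for attr in root.iter(f"{{{RNG}}}attribute"):
        value = attr.get(f"{{{DI}}}defaultValue")
        if value is not None:
            result[attr.get("name")] = value
    return result


root = ET.fromstring(SCHEMA)
print(labels(root, "en-US"))
print(default_values(root))
```

A UI layer could then show the label instead of the raw element name, and pre-fill the attribute with its default value when the user creates a new instance.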
Since we extended RELAX NG using our own namespace, schemas using our extensions can easily be read by other applications. And in case it’s needed, it’s trivial to strip all our <di:*> extensions from a given schema.
Of course, it’s still possible in Etna to use a schema without such extensions. In that case, Etna will always offer you all the possible choices or fall back to the minimal solution.
So, this is our own extension system to RELAX NG, and no, it’s not a standard. But nobody really addresses the issues we raise above, so we had no choice. We don’t believe, as we were told, that the schema is not the right place to fix these issues. We are open to discussion to improve our solution, or even completely revamp it, if good suggestions come. We are also of course totally open to standardization.
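As an illustration of how simple the stripping can be, here is a short Python sketch that removes every element and attribute belonging to the extension namespace from a parsed schema. Only the namespace URI comes from this paper; the sample schema and the function name strip_di_extensions are hypothetical, and this is not the code Etna uses.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
DI = "http://disruptive-innovations.com/ns/editor-rng-extension/1.0"
DI_PREFIX = "{" + DI + "}"

# Hypothetical annotated schema, used only to exercise the function below.
ANNOTATED = """\
<element name="note" xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:di="http://disruptive-innovations.com/ns/editor-rng-extension/1.0">
  <di:label>Note</di:label>
  <attribute name="kind" di:defaultValue="info"/>
  <text/>
</element>
"""


def strip_di_extensions(root):
    """Remove, in place, all elements and attributes in the di namespace."""
    for parent in list(root.iter()):
        # Drop child elements from the extension namespace.
        for child in list(parent):
            if child.tag.startswith(DI_PREFIX):
                parent.remove(child)
        # Drop attributes from the extension namespace.
        for name in list(parent.attrib):
            if name.startswith(DI_PREFIX):
                del parent.attrib[name]
    return root


plain = strip_di_extensions(ET.fromstring(ANNOTATED))
print(ET.tostring(plain, encoding="unicode"))
```

The sketch assumes the root of the schema is itself a RELAX NG element, which is always the case for a valid schema.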
How does it work?
The RELAX NG parser/validator is the root of Etna. Whatever you do in Etna, the validator is called to check the validity of the action and of its result. When a choice is offered in a dialog or popup, the possible choices always come from the validator depending on the context.
And of course, everything starts when you want to load a given schema in Etna’s Schema Manager… An error will occur if the schema is invalid. If it’s not, then you can create a document based on that schema and Etna will first load the schema itself.
The RELAX NG parser then creates a validation graph for each pattern in the schema, each graph being the parent of the validation graphs of its child patterns. The parser also gathers all the data provided by our <di:*> extensions.
Then we have a dedicated API to validate a given element in a given context against the schema.
We are also able to send basic queries to the validator, for instance to retrieve the list of elements that can be inserted after a given element in a given context, or to know if a given element can be deleted. As we said before, all UI actions are enabled or disabled by the validator itself. More technically, the IsEnabled() method of all Etna XUL commands refreshing our UI calls the validator.
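The two queries mentioned above can be sketched in a few lines of Python. Note that this sketch does not reproduce Etna's validation graph: it swaps in the well-known derivative technique for RELAX NG validation, restricted to a toy content model made of element names only. All class and function names here are hypothetical.

```python
from dataclasses import dataclass


# Toy patterns: element names only, no attributes, text or interleave.
@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class NotAllowed:
    pass


@dataclass(frozen=True)
class Elem:
    name: str


@dataclass(frozen=True)
class Seq:
    first: object
    second: object


@dataclass(frozen=True)
class Choice:
    left: object
    right: object


@dataclass(frozen=True)
class ZeroOrMore:
    item: object


def nullable(p):
    """True if the pattern accepts an empty sequence of children."""
    if isinstance(p, (Empty, ZeroOrMore)):
        return True
    if isinstance(p, Seq):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    return False


def seq(a, b):
    if isinstance(a, NotAllowed) or isinstance(b, NotAllowed):
        return NotAllowed()
    return b if isinstance(a, Empty) else a if isinstance(b, Empty) else Seq(a, b)


def choice(a, b):
    return b if isinstance(a, NotAllowed) else a if isinstance(b, NotAllowed) else Choice(a, b)


def deriv(p, name):
    """Pattern that remains to be matched after consuming one child `name`."""
    if isinstance(p, Elem):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Seq):
        d = seq(deriv(p.first, name), p.second)
        return choice(d, deriv(p.second, name)) if nullable(p.first) else d
    if isinstance(p, Choice):
        return choice(deriv(p.left, name), deriv(p.right, name))
    if isinstance(p, ZeroOrMore):
        return seq(deriv(p.item, name), p)
    return NotAllowed()


def is_valid(model, children):
    for name in children:
        model = deriv(model, name)
    return nullable(model)


def names_in(p):
    if isinstance(p, Elem):
        return {p.name}
    parts = [v for v in vars(p).values() if not isinstance(v, str)]
    return set().union(*[names_in(v) for v in parts]) if parts else set()


def insertable_after(model, children, index):
    """Element names that keep the content valid when inserted after `index`."""
    return sorted(n for n in names_in(model)
                  if is_valid(model, children[:index + 1] + [n] + children[index + 1:]))


def can_delete(model, children, index):
    """True if removing the child at `index` keeps the content valid."""
    return is_valid(model, children[:index] + children[index + 1:])


# Hypothetical content model: section = title, (para | list)*
SECTION = Seq(Elem("title"), ZeroOrMore(Choice(Elem("para"), Elem("list"))))
children = ["title", "para"]
print(insertable_after(SECTION, children, 0))  # ['list', 'para']
print(can_delete(SECTION, children, 0))        # False: the title is mandatory
print(can_delete(SECTION, children, 1))        # True
```

In this model, an IsEnabled()-style check for an “insert list” command would reduce to testing whether "list" appears in the result of insertable_after() for the current selection.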
We implemented all RELAX NG datatypes, and also a lot of XML Schema datatypes. The validator itself is schema-agnostic, and it can easily be adapted to live with another schema language, XML Schema for instance. It is also possible to extend the set of datatypes through extensions (XPIs) using our diIDatatypeLibrary and diIDataType interfaces.
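The idea of a pluggable datatype set can be sketched as a small registry keyed by datatype library URI and type name. This Python sketch does not reproduce the diIDatatypeLibrary and diIDataType interfaces, whose methods are not described here; the DatatypeRegistry class, the urn:example:colors library and the hexColor type are hypothetical. The empty URI for RELAX NG's built-in library (string and token) and the XML Schema datatypes URI are the standard ones.

```python
import re

BUILTIN = ""  # RELAX NG built-in datatype library (string, token)
XSD = "http://www.w3.org/2001/XMLSchema-datatypes"
XML_WS = " \t\r\n"


class DatatypeRegistry:
    """Maps (library URI, type name) to a function checking a lexical value."""

    def __init__(self):
        self._checkers = {}

    def register(self, library, name, checker):
        self._checkers[(library, name)] = checker

    def allows(self, library, name, value):
        try:
            checker = self._checkers[(library, name)]
        except KeyError:
            raise LookupError(f"unknown datatype {name!r} in library {library!r}")
        return checker(value)


registry = DatatypeRegistry()

# Built-in RELAX NG datatypes accept any character data.
registry.register(BUILTIN, "string", lambda v: True)
registry.register(BUILTIN, "token", lambda v: True)

# Two XML Schema datatypes, checked on their lexical form only.
registry.register(XSD, "boolean",
                  lambda v: v.strip(XML_WS) in {"true", "false", "1", "0"})
registry.register(XSD, "integer",
                  lambda v: re.fullmatch(r"[+-]?[0-9]+", v.strip(XML_WS)) is not None)

# A datatype contributed by a hypothetical extension package.
registry.register("urn:example:colors", "hexColor",
                  lambda v: re.fullmatch(r"#[0-9a-fA-F]{6}", v) is not None)

print(registry.allows(XSD, "integer", " 42 "))
print(registry.allows("urn:example:colors", "hexColor", "red"))
```

Keying the registry on the library URI mirrors the way RELAX NG itself selects datatypes through the datatypeLibrary attribute, so an extension only has to pick a URI of its own.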
What we did to Gecko
So we extended Gecko a bit.
First, we implemented a new XMLEditor object inheriting from nsEditor. It lives in a new directory called, how surprising, mozilla/libeditor/xml. It’s fairly independent from the rest of the code.
We also added an “about:xmlblank” URL used by the XMLEditor just as “about:blank” is the default for the HTMLEditor, but that should change in the short-term future.
Finally, we only slightly tweaked nsEditingSession to make it aware of XML and instantiate an XMLEditor when needed.
Overall, the changes are isolated enough, and the goal is clearly to contribute that code to cvs.mozilla.org.
Issues with Gecko?
Yep, a few, unfortunately…
First an old, old, really old problem in Gecko… It’s impossible to place the caret in an empty element. Since an empty block generates no nsIFrame, there’s no selection possible. I have no doubt Mozilla gurus David Baron and Boris Zbarsky will provide us some day with a superb fix for that longstanding issue. The HTMLEditor solves the problem by inserting an “invisible” <br> element in the block, but we have no <br> in an arbitrary XML schema! And since we deal with a validating editor, we cannot insert such a <br> or the document would become invalid… So we have to find workarounds, and such workarounds are complex, ugly and have a huge cost.
In fact, we’d love to be able to specify in Gecko that a given element acts like a line break… The behaviour of the HTML <br> element is hard-coded, and we really miss something like (just thinking aloud here):
display: -moz-line-break;
We miss the CSS 3 content property on replaced elements, for instance for images, to specify in the stylesheet what attribute should be used to retrieve the external resource.
What’s the future for Etna?
We still have a lot to do…
we need to extend the APIs for extension authors, to help them easily add UI for their schemas
our Find/Replace feature is weak (hear not implemented yet…)
we copy/cut/paste only text nodes for the time being and we should really deal with an XML clipboard here.
the main window toolbars are not customizable yet, only because that is easy to fix and we chose to focus on more important and harder stuff.
one of the top items of user feedback is about MathML… We need to do something in that space.
tabs, the world wants tabs. Firefox has tabs, Nvu has tabs, hey even IE7 has tabs. Etna needs tabs to be able to edit multiple documents in a single window.
we want to move to a xulrunner-based application.
we probably need to do an extensive cleanup of nsEditor because it’s not isolated enough from nsHTMLEditor…
oh right, we need a web site and forums and a mailing-list
Conclusion
From our perspective, Etna is not only a new XML editing tool. It’s also the living proof that it’s possible, with a fairly small investment (a year and a half of work for a single software engineer), to build a professional application based on Mozilla.
We aim to build a very user-friendly open-source XML editor, available on all platforms, and we hope it’s going to become a “de facto” standard for structured editing in Linux distros, a worthy companion to Nvu and cousins Firefox and Thunderbird.
We would like to deeply thank the Connexions Project, and in particular Geneva Henry and Brent Hendricks, who funded all the work done on Etna until the end of 2005.
On a more XML note, we hope to finally show that Wysiwyg and Structured are not two contradictory notions, as we often hear.
Etna is available from http://rhaptos.org/downloads/editing/etna/




