TreeBind: an API to bind them all

Introduction

What does “data binding” mean?

Although Wikipedia doesn't provide a generic definition of “data binding”, it does provide a definition of “XML data binding”:

XML data binding refers to the process of representing the information in an XML document as an object in computer memory. This allows applications to access the data in the XML from the object rather than using the DOM to retrieve the data from a direct representation of the XML itself.

.../...

When this process is applied to convert an XML document to an object, it is called unmarshalling. The reverse process, to serialize an object as XML, is called marshalling.

So what?

Back in 2001, in a famous keynote session, James Clark cited XML processing as one of the five most important challenges for XML, with a special mention of loosely coupled data binding interfaces "automating the process of mapping between the XML and the data model", even when the internal structure is not a tree but a directed graph (quoted from my report on xmlhack).

This issue is important not only because using classical XML processing APIs is boring and a waste of time, but also because this type of processing is too often a good excuse to abandon object orientation and write procedural code.

Data binding is really the kind of low-level work that your programming environment should do for you.

Progress seems to be stalled, why?

Five years after James Clark's keynote, the situation seems to be pretty much the same.

A reason for this slow progress was given by James Clark in that very same keynote (quoted from Edd Dumbill's report on XML.com):

Most existing solutions to this problem use annotations on a schema document, and Clark noted this was one approach. He observed that this was probably, however, working at the wrong level of abstraction for effective data modelling. He described two other approaches he thought promising: code-centric, where program classes could be annotated to indicate mappings to XML; and a modelling approach, where a higher-level representation such as UML is used to drive both schema and code generation.

Is data binding specific to XML?

Although the above definition contains multiple instances of the word “XML”, this term could be replaced by the name of any technology that stores data independently of the logic.

Curiously, when you replace the term “XML” with “relational database”, data binding is often called “persistence”, but it is basically the same issue.

And there is no reason to limit the scope to XML and relational databases. Other data storage and data models such as RDF and LDAP are very good candidates for data binding.

Why TreeBind?

TreeBind (http://treebind.org) is a generic data binding API that can be used with multiple data storage technologies and data models through specific “sources” and “sinks”.

The current XML source and sink are “code centric” as defined by James Clark: they do not rely on XML schemas and do their job through introspection (support of annotations should be added soon).

TreeBind is still a work in progress.

The current version supports XML, RDF, LDAP and Java objects but not all the combinations between these sources and sinks have been developed yet.

TreeBind follows best practices that have made modern frameworks such as Ruby on Rails successful:

  • Use conventions over configuration: most of the mappings between names rely on conventions that can be overridden if needed rather than using configuration files or directives.

  • Don't Repeat Yourself (DRY): using techniques such as reflection avoids duplicating information and improves the system's maintainability.

What users need to know

XML to Java

To bind an XML document into a tree of Java objects you need:

  • An XML document

  • Java package(s) with classes matching the structure of the XML document.

  • And of course, the TreeBind library!

For example, if your XML document is:

<?xml version="1.0" encoding="UTF-8"?>
 <library>
     <book id="b0836217462" available="true">
         <isbn>0836217462</isbn>
         <title xml:lang="en">Being a Dog Is a Full-Time Job</title>
         <author id="CMS">
             <name>Charles M Schulz</name>
             <born>1922-11-26</born>
             <died>2000-02-12</died>
         </author>
         <character id="PP">
             <name>Peppermint Patty</name>
             <born>1966-08-22</born>
             <qualification>bold, brash and tomboyish</qualification>
         </character>
         <character id="Snoopy">
             <name>Snoopy</name>
             <born>1950-10-04</born>
             <qualification>extroverted beagle</qualification>
         </character>
         <character id="Schroeder">
             <name>Schroeder</name>
             <born>1951-05-30</born>
             <qualification>brought classical music to the Peanuts strip</qualification>
         </character>
         <character id="Lucy">
             <name>Lucy</name>
             <born>1952-03-03</born>
             <qualification>bossy, crabby and selfish</qualification>
         </character>
     </book>
 </library>

You will have a Library Java class looking like:

public class Library {
    List books = new LinkedList();

    public void addBook(Book book) {
        books.add(book);
    }

    public String toString() {
        return "I am a library with " + books.size() + " book(s).";
    }
    .../...
}

A Book class looking like:

public class Book {

    String id;
    String available;
    String isbn;
    List titles = new LinkedList();
    List authors = new LinkedList();
    List characters = new LinkedList();

    public void setAvailable(String available) {
        this.available = available;
    }

    public void setId(String id) {
        this.id = id;
    }

    public void setIsbn(String isbn) {
        this.isbn = isbn;
    }

    public void addTitle(Title title) {
        titles.add(title);
    }

    public void addAuthor(Author author) {
        authors.add(author);
    }

    public void addCharacter(Character character) {
        characters.add(character);
    }
}

And so forth.

The conventions being used here are:

  • Class names are UCC (UpperCamelCase) versions of element and attribute names.

  • Method names are UCC (UpperCamelCase) versions of element and attribute names with a “set” or an “add” prefix.

  • The root of method and class names is the same (for instance, a book element is added through an addBook method).
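The conventions listed above can be sketched in a few lines of Java. This is only an illustration: the NamingConventionSketch class and its methods are hypothetical names of mine, not part of the TreeBind API, and the handling of separators such as “-” is an assumption.

```java
// Hypothetical helper, not part of TreeBind: it only illustrates the naming conventions.
final class NamingConventionSketch {

    private NamingConventionSketch() {
    }

    /** Turns an element or attribute name into UpperCamelCase ("book" becomes "Book"). */
    static String toUpperCamelCase(String xmlName) {
        StringBuilder result = new StringBuilder();
        boolean upperNext = true;
        for (char c : xmlName.toCharArray()) {
            if (c == '-' || c == '_' || c == '.') {
                // Assumption: separators are dropped and the next letter is capitalised.
                upperNext = true;
            } else if (upperNext) {
                result.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    /** Name of the method expected for a property that occurs once. */
    static String setterName(String xmlName) {
        return "set" + toUpperCamelCase(xmlName);
    }

    /** Name of the method expected for a property that may be repeated. */
    static String adderName(String xmlName) {
        return "add" + toUpperCamelCase(xmlName);
    }

    public static void main(String[] args) {
        System.out.println(toUpperCamelCase("library"));
        System.out.println(adderName("book"));
        System.out.println(setterName("isbn"));
    }
}
```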

These conventions have been chosen because they are common practice, and you are probably already following them if you use getters and setters in your own classes.

Using getters and setters is controversial: some believe that getters and setters are not Object Oriented and I tend to agree.

TreeBind is flexible and modular enough to allow other paradigms to be used instead of the current getters and setters convention if needed.

With these classes around, the actual binding is trivial:

/**
 * @param args
 * @throws Exception
 */
public static void main(String[] args) throws Exception {
    XmlSax2JavaObjectPipe pipe = new XmlSax2JavaObjectPipe();
    pipe.setParameter("namespace2package", "",
            Library.class.getPackage().getName());
    File xmlFile = new File(args[0]);
    pipe.parse(xmlFile.toURI().toASCIIString());
    Library library = (Library) pipe.getObject();
    /* use the library object here */
    System.out.println(library);
}

The first instruction creates a new “XmlSax2JavaObjectPipe”. As the name says, this class is a pipeline that binds XML SAX events into a tree of Java objects.

The second instruction sets a parameter that will control the binding. The parameter used here (“namespace2package”) controls the binding between XML namespaces and Java package names. Here, we bind the empty namespace to the package of our current class.

The third instruction creates a File object from the first command line parameter.

The fourth one instructs the pipeline to parse the URI corresponding to this file.

The XML parsing of the resource addressed by this URI pours SAX events into the pipe, and the object created through these events is cast into a Library object by the fifth instruction.

The Library object can now be used.
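To give an idea of what binding “through introspection” involves, here is a minimal sketch of how a child object could be attached to its parent by looking up an addXxx or setXxx method with the Java reflection API. The ReflectiveBindingSketch class and its attach method are hypothetical and much simpler than what a real sink has to do; they are not TreeBind's own code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these classes are hypothetical and are not TreeBind's own code.
final class ReflectiveBindingSketch {

    static class Book {
        String isbn;

        public void setIsbn(String isbn) {
            this.isbn = isbn;
        }
    }

    static class Library {
        final List<Book> books = new ArrayList<>();

        public void addBook(Book book) {
            books.add(book);
        }
    }

    /** Looks for a public addXxx or setXxx method on the parent and calls it with the child. */
    static void attach(Object parent, String xmlName, Object child) {
        String suffix = Character.toUpperCase(xmlName.charAt(0)) + xmlName.substring(1);
        for (String prefix : new String[] {"add", "set"}) {
            for (Method method : parent.getClass().getMethods()) {
                if (method.getName().equals(prefix + suffix)
                        && method.getParameterCount() == 1
                        && method.getParameterTypes()[0].isInstance(child)) {
                    try {
                        method.invoke(parent, child);
                    } catch (IllegalAccessException | InvocationTargetException e) {
                        throw new IllegalStateException("Cannot call " + method, e);
                    }
                    return;
                }
            }
        }
        throw new IllegalArgumentException(
                "No add/set method for <" + xmlName + "> on " + parent.getClass());
    }

    public static void main(String[] args) {
        Library library = new Library();
        Book book = new Book();
        attach(book, "isbn", "0836217462"); // resolves to setIsbn(String)
        attach(library, "book", book);      // resolves to addBook(Book)
        System.out.println(library.books.size() + " book(s), isbn " + library.books.get(0).isbn);
    }
}
```

Raising an exception when no method matches mirrors the general policy described later for missing methods; whether the lookup tries “add” before “set” is an assumption of this sketch.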

Java to XML

To bind a tree of Java objects into an XML document you need:

  • Java package(s) with classes matching the structure of the XML document.

  • And of course, the TreeBind library!

The Library class would look like:

public class Library {
    List books = new LinkedList();
    .../...
    public Iterator getBooks() {
        return books.iterator();
    }
    .../...
}

The Book class could look like:

public class Book {
    String id;
    String available;
    String isbn;
    List titles = new LinkedList();
    List authors = new LinkedList();
    List characters = new LinkedList();

    .../...
    public String getId() {
        return id;
    }

    public String getIsbn() {
        return isbn;
    }

    public Iterator getTitles() {
        return titles.iterator();
    }

    public Iterator getAuthors() {
        return authors.iterator();
    }

    public Iterator getCharacters() {
        return characters.iterator();
    }
}

The Title class could look something like:

public class Title {
    String lang;
    String value;
    .../...
    public String getLang() {
        return lang;
    }

    public String getValue() {
        return value;
    }
}

And so forth.

The binding itself can be written as:

/* Java to XML */
JavaObject2XmlSaxPipe pipe2 = new JavaObject2XmlSaxPipe();
pipe2.setParameter("package2namespace",
        Library.class.getPackage().getName(), "");
StringWriter writer = new StringWriter();
ContentHandler contentHandler = new XMLSerializer(writer,
        new OutputFormat("xml", "utf-8", true));
pipe2.setContentHandler(contentHandler);
pipe2.pourObject(library);
System.out.println(writer.toString());

The first instruction creates a new “JavaObject2XmlSaxPipe”. As you'd guess from the name, this is a pipeline that binds Java objects into a stream of SAX events.

The second instruction says that the package of the Library class must be bound to the empty namespace.

The third and fourth ones create a SAX content handler that will serialize the XML into a StringWriter.

The fifth one connects this content handler to the pipeline.

The sixth one pours our Library object into the pipe.

The last one displays the result which, surprise, surprise is:

<?xml version="1.0" encoding="utf-8"?>
<library>
    <book>
        <character>
            <born>1966-08-22</born>
            <qualification>bold, brash and tomboyish</qualification>
            <name>Peppermint Patty</name>
            <id>PP</id>
        </character>
        <character>
            <born>1950-10-04</born>
            <qualification>extroverted beagle</qualification>
            <name>Snoopy</name>
            <id>Snoopy</id>
        </character>
        <character>
            <born>1951-05-30</born>
            <qualification>brought classical music to the Peanuts strip</qualification>
            <name>Schroeder</name>
            <id>Schroeder</id>
        </character>
        <character>
            <born>1952-03-03</born>
            <qualification>bossy, crabby and selfish</qualification>
            <name>Lucy</name>
            <id>Lucy</id>
        </character>
        <available>true</available>
        <isbn>0836217462</isbn>
        <title>
            <lang>en</lang>Being a Dog Is a Full-Time Job</title>
        <author>
            <born>1922-11-26</born>
            <died>2000-02-12</died>
            <name>Charles M Schulz</name>
            <id>CMS</id>
        </author>
        <id>b0836217462</id>
    </book>
</library>

Let's face it: there is no round-trip built into the current version of TreeBind! This document has a structure which is similar to the original document, but there are a number of differences:

  • The relative order of sub-elements isn't preserved.

  • Everything is serialized as elements (there are no attributes).

  • This is true even for “complex type simple content elements” which are converted into mixed content such as in “<title><lang>en</lang>Being a Dog Is a Full-Time Job</title>” (the good news is that this proves that TreeBind supports mixed content!).

  • The namespace of the xml:lang attribute has been lost!

This is the price to pay for relying on conventions. These conventions can be improved (for the last two bullet points for instance), but to fully control the XML serialization, source code annotations as advocated by James Clark really seem to be a nice solution.

We will also see that users can write or override existing TreeBind filters to implement their own conventions.
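To see why conventions alone lose this information, consider a minimal sketch of how element names might be derived from getters by reflection. The GetterIntrospectionSketch class is hypothetical and is not how TreeBind's Java source is actually written. A getter says nothing about whether a value was an attribute or an element, and Class.getMethods() is documented as returning methods in no particular order, which is at least consistent with the loss of element order (the sketch sorts the names only to get a stable result).

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration: not TreeBind's actual Java source implementation.
final class GetterIntrospectionSketch {

    static class Title {
        String lang = "en";
        String value = "Being a Dog Is a Full-Time Job";

        public String getLang() {
            return lang;
        }

        public String getValue() {
            return value;
        }
    }

    /** Returns the names derived from public no-argument getXxx methods, sorted alphabetically. */
    static List<String> elementNames(Class<?> type) {
        TreeSet<String> names = new TreeSet<>();
        for (Method method : type.getMethods()) {
            String name = method.getName();
            if (name.startsWith("get")
                    && name.length() > 3
                    && method.getParameterCount() == 0
                    && !name.equals("getClass")) {
                // "getLang" becomes "lang": nothing tells us it used to be an attribute.
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        System.out.println(elementNames(Title.class));
    }
}
```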

From LDAP to Java objects

This (still experimental) pipe follows the same principles as the XML to Java objects pipe. To use it, you need:

  • Java packages matching the structure of your LDAP schema.

  • LDAP data available either as LDIF documents or as LDAPSearchResults objects.

  • The TreeBind library.

The current version doesn't follow the relations between objects that can exist in an LDAP repository and generates objects that are only “one level deep”.

As an example, let's say that we have stored our characters in an LDAP repository and exported them as the following LDIF document:

dn: uid=PP,ou=characters,o=treebind,c=org
objectclass: top
objectclass: person
objectclass: character
uid:PP
name:Peppermint Patty
born:1966-08-22
qualification:bold, brash and tomboyish

dn: uid=Snoopy,ou=characters,o=treebind,c=org
objectclass: top
objectclass: person
objectclass: character
uid:Snoopy
name:Snoopy
born:1950-10-04
qualification:extroverted beagle

dn: uid=Schroeder,ou=characters,o=treebind,c=org
objectclass: top
objectclass: person
objectclass: character
uid:Schroeder
name:Schroeder
born:1951-05-30
qualification:brought classical music to the Peanuts strip

dn: uid=Lucy,ou=characters,o=treebind,c=org
objectclass: top
objectclass: person
objectclass: character
uid:Lucy
name:Lucy
born:1952-03-03
qualification:bossy, crabby and selfish

We could write the following Characters class to hold this set of characters:

public class Characters {

    List characters = new LinkedList();

    public void addCharacter(Character character) {
        characters.add(character);
    }
}

And adapt our existing Character class so that it accepts objectclass and uid properties (up to now, we had no objectclass property and uid was called id):

public class Character {

    .../...
    public void setUid(String id) {
        this.id = id;
    }
    .../...
    public void addObjectclass(String o) {
    }
}

To bind the LDIF document into objects from these classes, we would write:

/* LDIF to Java */
Ldap2JavaObjectPipe pipe = new Ldap2JavaObjectPipe();
pipe.setParameter("rootLevel", Characters.class.getName());
pipe.setParameter("topLevel", Character.class.getName());
pipe.parse(new FileInputStream(args[0]));
Characters characters = (Characters) pipe.getObject();
System.out.println(characters);

The first instruction creates an Ldap2JavaObjectPipe object. As indicated by its name, this class implements a pipe binding LDAP data into Java objects.

The second instruction indicates that the whole LDAPSearchResults object or LDIF document should be bound into a Characters object.

The third instruction says that top level objects should be bound into Character objects.

The fourth one parses the LDIF document and pours the result into the pipe.

The fifth instruction casts the result as a Characters object.
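To make the shape of the LDIF input more concrete, here is a rough sketch of how such a document splits into records whose attributes can be multi-valued (which is why objectclass is bound through an “add” method). LdifSketch is a hypothetical class of mine: it ignores LDIF features such as line continuations and base64 values, and says nothing about how TreeBind actually reads LDIF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified reader: no continuation lines, no base64 values, no change records.
final class LdifSketch {

    private LdifSketch() {
    }

    /** Splits LDIF text into records; each record maps an attribute name to all of its values. */
    static List<Map<String, List<String>>> parse(String ldif) {
        List<Map<String, List<String>>> records = new ArrayList<>();
        Map<String, List<String>> current = null;
        for (String line : ldif.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                current = null; // a blank line ends the current record
                continue;
            }
            int colon = line.indexOf(':');
            if (line.startsWith("#") || colon < 0) {
                continue; // comments and anything unexpected are skipped in this sketch
            }
            if (current == null) {
                current = new LinkedHashMap<>();
                records.add(current);
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            current.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
        }
        return records;
    }

    public static void main(String[] args) {
        String ldif = "dn: uid=PP,ou=characters,o=treebind,c=org\n"
                + "objectclass: top\n"
                + "objectclass: person\n"
                + "uid:PP\n"
                + "name:Peppermint Patty\n";
        System.out.println(parse(ldif));
    }
}
```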

If we bind this Characters object into an XML document, we get:

<characters>
    <character>
        <born>1966-08-22</born>
        <qualification>bold, brash and tomboyish</qualification>
        <name>Peppermint Patty</name>
        <id>PP</id>
    </character>
    <character>
        <born>1950-10-04</born>
        <qualification>extroverted beagle</qualification>
        <name>Snoopy</name>
        <id>Snoopy</id>
    </character>
    <character>
        <born>1951-05-30</born>
        <qualification>brought classical music to the Peanuts strip</qualification>
        <name>Schroeder</name>
        <id>Schroeder</id>
    </character>
    <character>
        <born>1952-03-03</born>
        <qualification>bossy, crabby and selfish</qualification>
        <name>Lucy</name>
        <id>Lucy</id>
    </character>
</characters>

LDAP to XML

LDAP can also be bound directly into XML (without going through a set of Java objects).

To bind the same LDIF document as XML, we could write:

/* LDAP to XML */
Ldap2XmlSaxPipe pipe3 = new Ldap2XmlSaxPipe();
pipe3.setParameter("ldifElement", "characters");
pipe3.setParameter("ldifRecord", "character");
pipe3.setParameter("defaultLdapNamespace", "");
StringWriter writer2 = new StringWriter();
ContentHandler contentHandler2 = new XMLSerializer(writer2,
        new OutputFormat("xml", "utf-8", true));
pipe3.setContentHandler(contentHandler2);
pipe3.parse(new FileInputStream(args[0]));
System.out.println(writer2.toString());

The first instruction creates a new Ldap2XmlSaxPipe object which, surprise, surprise, binds LDAP data into a stream of XML SAX events.

The second one says that the root should be a characters element and the third that top level elements should be character elements. And yes, I have noticed that this is not very consistent with the conventions used in the Ldap2JavaObjectPipe and I need to change that!

The fourth instruction sets the default namespace.

The fifth and sixth create a SAX content handler which serializes its events in a StringWriter.

The seventh instruction plugs this SAX content handler into the pipe and the eighth parses the LDIF file.

The last instruction prints the following result:

<?xml version="1.0" encoding="utf-8"?>
<characters>
    <character>
        <ldapDN>uid=PP,ou=characters,o=treebind,c=org</ldapDN>
        <objectclass>top</objectclass>
        <objectclass>person</objectclass>
        <objectclass>character</objectclass>
        <name>Peppermint Patty</name>
        <uid>PP</uid>
        <qualification>bold, brash and tomboyish</qualification>
        <born>1966-08-22</born>
    </character>
    <character>
        <ldapDN>uid=Snoopy,ou=characters,o=treebind,c=org</ldapDN>
        <objectclass>top</objectclass>
        <objectclass>person</objectclass>
        <objectclass>character</objectclass>
        <name>Snoopy</name>
        <uid>Snoopy</uid>
        <qualification>extroverted beagle</qualification>
        <born>1950-10-04</born>
    </character>
    <character>
        <ldapDN>uid=Schroeder,ou=characters,o=treebind,c=org</ldapDN>
        <objectclass>top</objectclass>
        <objectclass>person</objectclass>
        <objectclass>character</objectclass>
        <name>Schroeder</name>
        <uid>Schroeder</uid>
        <qualification>brought classical music to the Peanuts strip</qualification>
        <born>1951-05-30</born>
    </character>
    <character>
        <ldapDN>uid=Lucy,ou=characters,o=treebind,c=org</ldapDN>
        <objectclass>top</objectclass>
        <objectclass>person</objectclass>
        <objectclass>character</objectclass>
        <name>Lucy</name>
        <uid>Lucy</uid>
        <qualification>bossy, crabby and selfish</qualification>
        <born>1952-03-03</born>
    </character>
</characters>

The differences between this result and the previous one, where we serialized as XML the object tree generated from the LDIF file, come from the design of the Character class:

  • The addObjectclass method is empty and objectclass properties are lost.

  • The setUid method is storing its value in the id property which is then fetched through a getId method. The result is that the LDAP uid property was previously serialised as an id element.

  • There is neither an addLdapDN nor a setLdapDN method, so this information was lost (the general policy is to raise an exception when a method is not found, but LDAP DN setters are considered optional since classes may not want to store these identifiers).

RDF to Java objects

This is pretty similar to the bindings that we've seen previously, except that, of course, the starting point is an RDF model (the RDF to Java binding works directly with RDF models and doesn't care at all about their serialization).

To use it, you need:

  • Java packages matching the structure of your RDF model.

  • An RDF model.

  • The TreeBind library.

One of the many ways to represent our library in RDF is:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://ns.treebind.org/xtech2006/" xmlns:xt="http://ns.treebind.org/xtech2006/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <library>
    <book rdf:resource="#b0836217462"/>
  </library>
  <book rdf:ID="b0836217462" xt:available="true">
    <isbn>0836217462</isbn>
    <title rdf:parseType="Resource" xml:lang="en">
      <value>Being a Dog Is a Full-Time Job</value>
    </title>
    <writtenBy rdf:resource="#CMS"/>
    <featuring rdf:resource="#PP"/>
    <featuring rdf:resource="#Snoopy"/>
    <featuring rdf:resource="#Schroeder"/>
    <featuring rdf:resource="#Lucy"/>
  </book>
  <author rdf:ID="CMS">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
    <died>2000-02-12</died>
  </author>
  <character rdf:ID="PP">
    <name>Peppermint Patty</name>
    <born>1966-08-22</born>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
  <character rdf:ID="Snoopy">
    <name>Snoopy</name>
    <born>1950-10-04</born>
    <qualification>extroverted beagle</qualification>
  </character>
  <character rdf:ID="Schroeder">
    <name>Schroeder</name>
    <born>1951-05-30</born>
    <qualification>brought classical music to the Peanuts strip</qualification>
  </character>
  <character rdf:ID="Lucy">
    <name>Lucy</name>
    <born>1952-03-03</born>
    <qualification>bossy, crabby and selfish</qualification>
  </character>
</rdf:RDF>

The Library, Character and Author classes can be kept the same as before for this binding. However, look carefully at the book element:

<book rdf:ID="b0836217462" xt:available="true">
  <isbn>0836217462</isbn>
  <title rdf:parseType="Resource" xml:lang="en">
    <value>Being a Dog Is a Full-Time Job</value>
  </title>
  <writtenBy rdf:resource="#CMS"/>
  <featuring rdf:resource="#PP"/>
  <featuring rdf:resource="#Snoopy"/>
  <featuring rdf:resource="#Schroeder"/>
  <featuring rdf:resource="#Lucy"/>
</book>

You'll notice that the name of the predicate in the triple linking books to authors is “writtenBy” and the name of the predicate in the triple linking books to characters is “featuring”.

In its RDF to Java binding, TreeBind derives method names from predicate names. It will therefore expect to find not addAuthor or setAuthor but addWrittenBy or setWrittenBy (and likewise addFeaturing or setFeaturing), so we need to adapt our Book class accordingly:

public class Book {
    .../...
    public void addWrittenBy(Author author) {
        authors.add(author);
    }
    .../...
    public void addFeaturing(Character character) {
        characters.add(character);
    }
    .../...
}

With these modifications, the actual binding is done as:

/* RDF to objects */
JenaRdf2JavaObjectPipe pipeRdf = new JenaRdf2JavaObjectPipe();
pipeRdf.setParameter("namespace2package",
        "http://ns.treebind.org/xtech2006/", Library.class.getPackage().getName());
File rdfFile = new File(args[1]);
pipeRdf.parse(rdfFile.toURI().toASCIIString());
pipeRdf.pourModelFromType("http://ns.treebind.org/xtech2006/library");
Library rdfLibrary = (Library) pipeRdf.getObject();

The first instruction is as usual to create a pipe.

The second one sets the mapping between namespaces and package names.

The third and fourth ones parse the RDF document into a model. Unlike with XML, the model isn't poured into the pipe yet. The reason is that RDF models can be pretty big and that they are not trees but graphs.

The fifth instruction pours resources from the model that have a type of library into the pipe.

The sixth instruction casts the result into a Library object.
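The reason for pouring from a chosen type rather than pouring the whole model can be illustrated with a toy graph walker. Everything in this sketch is hypothetical (the GraphPourSketch class, the string-based triples, the “wrote” back-reference and the cycle guard); it does not use Jena and does not claim to reproduce what pourModelFromType really does. It only shows that a graph needs a starting point and some protection against cycles before it can be unfolded into a tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: hypothetical names, no Jena, not TreeBind's implementation.
final class GraphPourSketch {

    static final String TYPE = "rdf:type";

    /** One statement; the object is either a literal or the identifier of another resource. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsResource;

        Triple(String subject, String predicate, String object, boolean objectIsResource) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsResource = objectIsResource;
        }
    }

    /** Unfolds the graph into an indented trace, starting from every resource of the given type. */
    static List<String> pourFromType(List<Triple> model, String type) {
        List<String> out = new ArrayList<>();
        for (Triple t : model) {
            if (t.predicate.equals(TYPE) && t.object.equals(type)) {
                pour(model, t.subject, 0, new HashSet<>(), out);
            }
        }
        return out;
    }

    private static void pour(List<Triple> model, String subject, int depth,
                             Set<String> onPath, List<String> out) {
        if (!onPath.add(subject)) {
            return; // already open higher up in the tree: stop here to avoid looping forever
        }
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        for (Triple t : model) {
            if (!t.subject.equals(subject) || t.predicate.equals(TYPE)) {
                continue;
            }
            out.add(indent + t.predicate + " -> " + t.object);
            if (t.objectIsResource) {
                pour(model, t.object, depth + 1, onPath, out);
            }
        }
        onPath.remove(subject);
    }

    /** A tiny model with a cycle: the book points to its author and the author back to the book. */
    static List<Triple> sampleModel() {
        return Arrays.asList(
                new Triple("lib", TYPE, "library", true),
                new Triple("lib", "book", "b1", true),
                new Triple("b1", TYPE, "book", true),
                new Triple("b1", "isbn", "0836217462", false),
                new Triple("b1", "writtenBy", "CMS", true),
                new Triple("CMS", TYPE, "author", true),
                new Triple("CMS", "name", "Charles M Schulz", false),
                new Triple("CMS", "wrote", "b1", true));
    }

    public static void main(String[] args) {
        for (String line : pourFromType(sampleModel(), "library")) {
            System.out.println(line);
        }
    }
}
```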

To check what we've got, we can serialize this object as XML and get:

<?xml version="1.0" encoding="utf-8"?>
<library>
    <book>
        <character>
            <born>1966-08-22</born>
            <qualification>bold, brash and tomboyish</qualification>
            <name>Peppermint Patty</name>
            <id>PP</id>
        </character>
        <character>
            <born>1950-10-04</born>
            <qualification>extroverted beagle</qualification>
            <name>Snoopy</name>
            <id>Snoopy</id>
        </character>
        <character>
            <born>1951-05-30</born>
            <qualification>brought classical music to the Peanuts strip</qualification>
            <name>Schroeder</name>
            <id>Schroeder</id>
        </character>
        <character>
            <born>1952-03-03</born>
            <qualification>bossy, crabby and selfish</qualification>
            <name>Lucy</name>
            <id>Lucy</id>
        </character>
        <available>true</available>
        <isbn>0836217462</isbn>
        <title>
            <lang>en</lang>Being a Dog Is a Full-Time Job</title>
        <author>
            <born>1922-11-26</born>
            <died>2000-02-12</died>
            <name>Charles M Schulz</name>
            <id>CMS</id>
        </author>
        <id>b0836217462</id>
    </book>
</library>

The internals

How is that working? Fine, thank you!

Data model

The data model behind TreeBind is simple and basic enough to be a superset of the XML, RDF, LDAP and Java objects data models...

It is based on names, complex properties and leaf properties.

Names

The Name interface includes explicit provision for a full name composed of a domain name and a local name.

Some of its implementations such as XmlElementName, RdfResourceName, or JavaClassName make use of these two components.

Other implementations such as JavaMethodName or LdapName do not provide a domain name.

Finally, implementations such as XmlAttributeName hide a more complex behaviour beneath this simple interface.

Equality between names is left to the equals method, which can use an algorithm as complex as needed...

Properties

Properties have two names: a role and a nature.

Leaf properties have a value while complex properties have a list of sub properties.
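As a rough picture of this data model, the following sketch shows names and the two kinds of properties as plain Java classes. The class names and fields are assumptions made for illustration: in TreeBind, Name is an interface with several implementations, whereas this sketch collapses it into a single class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical shapes: TreeBind's real Name is an interface with several implementations.
final class DataModelSketch {

    /** A name made of an optional domain (namespace, package, ...) and a local part. */
    static final class Name {
        final String domain; // null for unqualified names such as method or LDAP names
        final String local;

        Name(String domain, String local) {
            this.domain = domain;
            this.local = Objects.requireNonNull(local);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Name)) {
                return false;
            }
            Name that = (Name) other;
            return Objects.equals(domain, that.domain) && local.equals(that.local);
        }

        @Override
        public int hashCode() {
            return Objects.hash(domain, local);
        }
    }

    /** Every property carries two names: its role and its nature. */
    abstract static class Property {
        final Name role;
        final Name nature;

        Property(Name role, Name nature) {
            this.role = role;
            this.nature = nature;
        }
    }

    /** A leaf property has a value. */
    static final class LeafProperty extends Property {
        final String value;

        LeafProperty(Name role, Name nature, String value) {
            super(role, nature);
            this.value = value;
        }
    }

    /** A complex property has a list of sub properties. */
    static final class ComplexProperty extends Property {
        final List<Property> children = new ArrayList<>();

        ComplexProperty(Name role, Name nature) {
            super(role, nature);
        }
    }
}
```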

Comparison between data models

A rough comparison between the four data models supported by the current version can be summarized in the following table:

Table 1: comparison between data models

| Data model | Structure | Role names | Nature names |
|---|---|---|---|
| Java objects | Graph | Unqualified (method names) | Qualified by package names (class names) |
| XML | Tree (unless ID/IDREF are used) | None: nature names are used as role names | Qualified by namespaces (element and attribute names) |
| RDF | Graph | Qualified (predicates are URIs) | Qualified (objects are URIs) |
| LDAP | Graph | Unqualified | Unqualified |

XML has the additional property of having several types of nodes (element, text, attribute, PI, comment, namespace, document, ...).

The current version of TreeBind supports element, text and attribute nodes and differentiates these types of nodes through different implementations of the Name interface:

  • XmlElementName identifies elements.

  • XmlAttributeName identifies attributes.

  • Text nodes are identified by the static name XmlProductionName.TEXTNODE.

Text nodes are explicitly exposed only in elements that are not simple type elements:

  • In <foo>simple type element</foo> TreeBind sees a leaf property whose nature is an XmlElementName and whose value is “simple type element”.

  • In <foo bar="bat">complex type element</foo> TreeBind sees a complex property with two leaf properties, the second of them having the name XmlProductionName.TEXTNODE and the value “complex type element”.

This very simple principle means that the TreeBind data model supports advanced XML content models (such as mixed content models) and could support other XML node types (such as PIs and comments) if that was needed.
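The rule for text nodes can be written down as a small function. This is only a sketch under assumptions: TextNodeSketch and its describe method are hypothetical, "#text" stands in for XmlProductionName.TEXTNODE, properties are rendered as plain strings, and child elements are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the text node rule; child elements are not handled.
final class TextNodeSketch {

    /** Placeholder standing in for XmlProductionName.TEXTNODE. */
    static final String TEXT_NODE = "#text";

    private TextNodeSketch() {
    }

    /** Describes the properties seen for an element with the given attributes and text. */
    static List<String> describe(String element, Map<String, String> attributes, String text) {
        List<String> events = new ArrayList<>();
        if (attributes.isEmpty()) {
            // Simple type element: a single leaf property named after the element.
            events.add("leaf " + element + " = " + text);
            return events;
        }
        // Otherwise: a complex property whose text becomes an explicit leaf.
        events.add("start " + element);
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            events.add("leaf @" + attribute.getKey() + " = " + attribute.getValue());
        }
        events.add("leaf " + TEXT_NODE + " = " + text);
        events.add("end " + element);
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        System.out.println(describe("foo", attributes, "simple type element"));
        attributes.put("bar", "bat");
        System.out.println(describe("foo", attributes, "complex type element"));
    }
}
```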

Pipelines

Names and properties are the data model manipulated by TreeBind.

The actual data binding takes place in SAX-like pipelines composed of a source, zero or more filters and a sink. Properties move through these pipelines as events materialized by method calls.

Events

TreeBind knows only two events materialized by three method calls:

addLeaf
This method is used to send a leaf property. Its arguments are the role, nature and value of the property.
startProperty
This method is used to send the beginning of a complex property. Its arguments are the role and nature.
endProperty
This method is used to send the end of a complex property. It takes no arguments.
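The three method calls can be pictured with a small sketch. The method names come from the description above, but the Sink interface shown here, with plain strings for roles and natures, is an assumption made to keep the example short; the real signatures work with Name objects and may differ. TraceSink simply records what it receives, indented by nesting depth.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical signatures: strings replace TreeBind's Name objects to keep the sketch short.
final class EventSketch {

    interface Sink {
        void addLeaf(String role, String nature, String value);

        void startProperty(String role, String nature);

        void endProperty();
    }

    /** Records every event as an indented line. */
    static final class TraceSink implements Sink {
        final List<String> lines = new ArrayList<>();
        private int depth;

        @Override
        public void addLeaf(String role, String nature, String value) {
            lines.add(indent() + "leaf " + role + " = " + value);
        }

        @Override
        public void startProperty(String role, String nature) {
            lines.add(indent() + "start " + role);
            depth++;
        }

        @Override
        public void endProperty() {
            if (depth == 0) {
                throw new IllegalStateException("endProperty without startProperty");
            }
            depth--;
            lines.add(indent() + "end");
        }

        private String indent() {
            StringBuilder spaces = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                spaces.append("  ");
            }
            return spaces.toString();
        }
    }

    /** Sends the events for a book with an isbn and a title carrying a language. */
    static void pourSampleBook(Sink sink) {
        sink.startProperty("book", "book");
        sink.addLeaf("isbn", "isbn", "0836217462");
        sink.startProperty("title", "title");
        sink.addLeaf("lang", "lang", "en");
        sink.addLeaf("#text", "#text", "Being a Dog Is a Full-Time Job");
        sink.endProperty();
        sink.endProperty();
    }

    public static void main(String[] args) {
        TraceSink sink = new TraceSink();
        pourSampleBook(sink);
        for (String line : sink.lines) {
            System.out.println(line);
        }
    }
}
```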

Sinks

Sinks receive TreeBind events. There is one sink implementation per target data model and new data models can be added by implementing new Sink classes.

Sources

Sources send TreeBind events. There is one source implementation per source data model and new data models can be added by implementing new Source classes.

To be sure that sources and sinks can be used in different bindings, it is important, when you design one, to expose all the features of the data model without taking into account the conversions in which the source or sink will be used.

If you design both a source and a sink for the same model, a good way to check that they are well designed is to check that you achieve round trips when you chain them together.

Filters

Filters are where most of the magic of the binding lives: they take care of the impedance mismatch between a source and a sink that have not been designed to work together!

There is one filter per source/sink pair, and its main responsibility is to translate the property names.
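A name-translating filter can be sketched as a sink that forwards every event to the next sink after renaming it. As before, the string-based Sink interface and the class names are hypothetical simplifications, not TreeBind's API; the point is only that the structure of the event flow is left untouched while the names change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a renaming filter; signatures are simplified to strings.
final class FilterSketch {

    interface Sink {
        void addLeaf(String role, String nature, String value);

        void startProperty(String role, String nature);

        void endProperty();
    }

    /** Renames roles and natures, then forwards the event unchanged in structure. */
    static final class RenamingFilter implements Sink {
        private final Sink next;
        private final UnaryOperator<String> rename;

        RenamingFilter(Sink next, UnaryOperator<String> rename) {
            this.next = next;
            this.rename = rename;
        }

        @Override
        public void addLeaf(String role, String nature, String value) {
            next.addLeaf(rename.apply(role), rename.apply(nature), value);
        }

        @Override
        public void startProperty(String role, String nature) {
            next.startProperty(rename.apply(role), rename.apply(nature));
        }

        @Override
        public void endProperty() {
            next.endProperty();
        }
    }

    /** Keeps a textual record of what reaches the end of the pipe. */
    static final class RecordingSink implements Sink {
        final List<String> events = new ArrayList<>();

        @Override
        public void addLeaf(String role, String nature, String value) {
            events.add("leaf " + role + "/" + nature + "=" + value);
        }

        @Override
        public void startProperty(String role, String nature) {
            events.add("start " + role + "/" + nature);
        }

        @Override
        public void endProperty() {
            events.add("end");
        }
    }

    /** "book" becomes "Book": the kind of translation an XML to Java filter performs. */
    static String upperCamel(String name) {
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        Sink pipe = new RenamingFilter(sink, FilterSketch::upperCamel);
        pipe.startProperty("book", "book");
        pipe.addLeaf("isbn", "isbn", "0836217462");
        pipe.endProperty();
        System.out.println(sink.events);
    }
}
```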

Pipelines

Pipelines are just there to make it easier to manipulate sources, filters and sinks.

They act as containers for a source, zero or more filters and a sink and store configuration parameters that can be used by their elements.

For your convenience, they often also act as a façade or proxy to give you access to the source and sink methods to pour stuff in the pipe and get the result.

TreeBom

DOM users will tell you that there are cases where streaming APIs become really boring.

For these cases, a TreeBom (TreeBind Object Model) is available that stores properties as a tree.

TreeBom can be used either as a sink or as a filter.

As a sink, it can be plugged into a source and queried like a DOM to navigate through the property tree.

As a filter, it can be plugged within a pipe and used as a cache or memory of the events that flow through the pipe.
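The double role of such an object model can be sketched as a sink that builds a tree and optionally forwards what it receives. The TreeBomSketch classes below are hypothetical and only loosely inspired by the description above: they are not the TreeBom API, and the string-based Sink interface is the same simplification used for illustration elsewhere.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not the TreeBom API.
final class TreeBomSketch {

    interface Sink {
        void addLeaf(String role, String nature, String value);

        void startProperty(String role, String nature);

        void endProperty();
    }

    /** A stored property; value is null for complex properties. */
    static final class Node {
        final String role;
        final String nature;
        final String value;
        final List<Node> children = new ArrayList<>();

        Node(String role, String nature, String value) {
            this.role = role;
            this.nature = nature;
            this.value = value;
        }

        /** Returns the first child with the given role, or null if there is none. */
        Node child(String childRole) {
            for (Node child : children) {
                if (child.role.equals(childRole)) {
                    return child;
                }
            }
            return null;
        }
    }

    /** Stores events as a tree; forwards them too when a next sink is given (filter usage). */
    static final class TreeSink implements Sink {
        private final Sink next; // null when used as the final sink
        private final Node root = new Node("", "", null);
        private final Deque<Node> open = new ArrayDeque<>();

        TreeSink(Sink next) {
            this.next = next;
            open.push(root);
        }

        Node root() {
            return root;
        }

        @Override
        public void addLeaf(String role, String nature, String value) {
            open.peek().children.add(new Node(role, nature, value));
            if (next != null) {
                next.addLeaf(role, nature, value);
            }
        }

        @Override
        public void startProperty(String role, String nature) {
            Node node = new Node(role, nature, null);
            open.peek().children.add(node);
            open.push(node);
            if (next != null) {
                next.startProperty(role, nature);
            }
        }

        @Override
        public void endProperty() {
            if (open.size() == 1) {
                throw new IllegalStateException("endProperty without startProperty");
            }
            open.pop();
            if (next != null) {
                next.endProperty();
            }
        }
    }
}
```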

Example: the XmlSax2JavaObjectPipe class

As an example, let's have a look at the XmlSax2JavaObjectPipe class.

XmlSax2JavaObjectPipe pipes chain an XmlSaxSource source, an XmlSax2JavaObjectFilter filter and a JavaObjectSink sink.

The source and sink are designed so that they could be used in other pipes with other sinks and sources, and the filter is the only pipe element specific to this combination of source and sink:

  • XmlSaxSource is a SAX content handler which transforms SAX events into TreeBind events. The names of the property roles and natures are instances of name implementations specific to XML (XmlElementName, XmlAttributeName and XmlProductionName.TEXTNODE), and the structure of the TreeBind events follows the structure of the SAX events: if you chained an XmlSaxSink to an XmlSaxSource you'd have a perfect round trip of the XML productions supported by TreeBind.

  • XmlSax2JavaObjectFilter is a simple filter implementation that translates XML names into Java class and method names, keeping the structure of the event flow unmodified.

  • JavaObjectSink takes a stream of TreeBind events and builds a graph of Java objects. Again, this sink isn't specific to XML to Java binding and can be used to bind other sources into Java objects, assuming the right filter is used.

Customizing TreeBind

I am a strong believer that simple things are most powerful and I am trying to keep TreeBind as simple and focused as possible.

However, as a TreeBind user I have seen situations where TreeBind needed to be customized and there are three different ways to achieve customization.

Pipe parameters

Pipe parameters are configuration properties that pipe elements can read.

They are used for properties that most applications will need to customize, such as namespaces to package names mappings, deciding whether TreeBind should raise an exception when a method name can't be found, ...

I am trying to keep the number of these parameters as small as possible since TreeBind favours convention over configuration and also because there are other customization methods available.
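One simple way to picture pipe parameters is as a small store shared by the elements of a pipe. The two-argument and three-argument forms of setParameter mirror the calls shown in the examples above, but the PipeParametersSketch class, its getParameter method and the way values are stored are my own assumptions, as are the package names used in main.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter store; only the setParameter call shapes come from the examples.
final class PipeParametersSketch {

    private final Map<String, Map<String, String>> parameters = new HashMap<>();

    /** Keyed form, e.g. ("namespace2package", namespace, package). */
    void setParameter(String name, String key, String value) {
        parameters.computeIfAbsent(name, n -> new HashMap<>()).put(key, value);
    }

    /** Plain form, e.g. ("rootLevel", className); stored under the empty key in this sketch. */
    void setParameter(String name, String value) {
        setParameter(name, "", value);
    }

    /** Returns the value stored for the name and key, or null if nothing was set. */
    String getParameter(String name, String key) {
        Map<String, String> byKey = parameters.get(name);
        return byKey == null ? null : byKey.get(key);
    }

    public static void main(String[] args) {
        PipeParametersSketch pipe = new PipeParametersSketch();
        pipe.setParameter("namespace2package", "http://ns.treebind.org/xtech2006/", "org.example.library");
        pipe.setParameter("rootLevel", "org.example.library.Characters");
        System.out.println(pipe.getParameter("namespace2package", "http://ns.treebind.org/xtech2006/"));
        System.out.println(pipe.getParameter("rootLevel", ""));
    }
}
```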

Pipe element override

The second customization method is to override existing pipe elements.

TreeBind is still relatively simple, with classes kept small, and once you get familiar with its internals, it shouldn't be a big deal to override classes that do not behave the way you want.

For instance, if you don't like the getters and setters used to bind to and from Java objects, you could override the JavaObjectSink and JavaObjectSource classes.

Adding new filters

A third customization method that I have found most useful is to add new filters in pipelines.

If, for instance, you need to bind a microformat (i.e. an XHTML subset in which “class” attributes are used the way you would use element names in an XML vocabulary) to Java objects, you can write a TreeBind filter that derives property roles and natures from “class” leaf properties and include this filter in an XmlSax2JavaObjectPipe pipe.

Future work

The architecture of TreeBind is incredibly flexible and the number of useful features that could be developed seems endless.

Here are just a few ideas that I find most interesting:

  • Java annotations: when binding to or from Java classes, Java annotations could be used as a customization method.

  • Class generation: that's a kind of chicken/egg problem... People starting with Java classes will find TreeBind interesting as it is now since they can generate XML documents out of their classes with close to zero effort. People starting with XML documents would probably be happy to be able to generate Java classes out of their existing documents. This approach would lead to using these documents as “schemas” and this is the idea behind Examplotron (http://examplotron.org). A cool way to achieve that would be to write an Examplotron to Java pipeline that would read an Examplotron “schema” and generate Java classes.

  • XML roundtrip: XML roundtrip requires additional information, and this information could be found in annotations and/or in schemas. This feature would play well with an Examplotron to Java pipeline that would insert enough annotations in Java classes to produce XML serializations that conform to the schema.

Acknowledgement

TreeBind has been developed for INSEE (Institut national de la statistique et des études économiques), and this organization has decided to publish it under the GPL licence, as advised for such projects by the French administration.