The Intelligent Design of Microformats
Introduction
Microformats exploded in popularity throughout 2005 and will likely continue to grow in 2006. While the technological ideas behind microformats are not new, the Web appears to have grown up to the point where using semantic markup is now feasible.
Microformats are, by design, a conservative technology: a way of providing benefit to the web by inventing as little as possible and building on top of well-established technology. As much as possible, the design and development of microformats are based on common Web practices and ideas.
Web as Platform
The web is a platform- a well established set of technologies on top of which new technologies can be built. The underlying protocols- TCP and IP- are mature, stable and well understood. HTTP is very useful, scalable and understood by many developers. HTML, despite its incompatibilities between implementations, is a mature technology, also well understood by many developers.
It's tempting to look at these technologies, point out their shortcomings, then seek to build new technologies for the sake of 'fixing' the problems. This is the revolutionary approach, wherein old regimes are thrown off and new ones created. However, when we discard technologies, we throw away a lot of value. This value should be measured not just in technical capabilities and operational software, but also in knowledge and experience.
An important part of this platform, which is often overlooked, is HTML. HTML was created to solve a specific need- to create an interoperable hypertext system for physics researchers. It has since grown to be the largest information system in the history of mankind.
Growth of the web means growth of technology and products useful for browsing, indexing and authoring for the web. More importantly, it also means growth of the community of people familiar with web technology. This is perhaps the most significant resource for the web- authors capable of writing HTML documents.
Human comprehensibility of HTML is vital to its usage. There are many useful tools available for creating websites, which allow authors to ignore much of the drudgery of producing documents, but effective usage of those tools still requires at least a minimal knowledge of HTML. Creating such tools, in turn, requires intimate knowledge of HTML.
This knowledge is an important part of the economy of the web and justifies more attention than I have to give it here.
Web as Playground
Many see HTML as an evolutionary dead end, yet, in fact, there are several ways to extend HTML.
First, with XHTML, HTML can be extended with XML namespaces. This takes advantage of decentralized development, by allowing anyone to create and use a namespace for their data. However, it introduces a new problem, known by some as "The Tower of Babel Problem". By allowing anyone to create a namespace, and then pushing all the vocabularies into namespaces, we create a situation where two data formats, seemingly identical, cannot communicate. They're saying the same thing, but in different dialects of XML.
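The mismatch described above can be sketched in a few lines of Python. This is only an illustration: the two namespace URIs, the card and name elements, and the find_name helper are all invented for the example and do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that describe the same thing.
# Only the namespace URI differs; the markup is otherwise identical.
DOC_A = '<card xmlns="http://example.org/ns/a"><name>Ada Lovelace</name></card>'
DOC_B = '<card xmlns="http://example.net/ns/b"><name>Ada Lovelace</name></card>'


def find_name(xml_text, namespace):
    """Return the text of the <name> child in the given namespace, or None."""
    root = ET.fromstring(xml_text)
    element = root.find("{%s}name" % namespace)
    return element.text if element is not None else None


# A consumer written against vocabulary A can read document A...
name_from_a = find_name(DOC_A, "http://example.org/ns/a")

# ...but finds nothing in document B, because the element names
# are qualified by a different namespace.
name_from_b = find_name(DOC_B, "http://example.org/ns/a")

print(name_from_a, name_from_b)
```

To a human reader the two documents say the same thing, but a namespace-aware consumer treats them as unrelated formats unless someone writes an explicit mapping between them.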
Also, in the specific case of extending HTML with XML namespaces, we have a problem with our most popular user-agents, browsers. It is not clear what browsers should do with elements and attributes outside the XHTML namespace, besides providing access to them in the browser's DOM for scripting. There are certainly no presentational heuristics, and there is no current support for namespaces in CSS.
A second way to extend HTML is to build on top of it, using its constructs and tools as a baseline. There are several HTML constructs which are very useful for adding semantics to the language. First, the <meta> element can take on new values for the name and content attributes, which are defined by a profile URI. With the <meta> element, you can add new bits of metadata to your HTML documents in a way that's understandable to current user-agents. This has been used for supplying data to web applications, embedding Dublin Core metadata and various other purposes.
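As a rough sketch of how a consumer might read such metadata, the Python below uses the standard library's html.parser module. The sample page, the profile URI and the MetaCollector class are assumptions made for illustration; the DC.creator and DC.date names follow the common convention for embedding Dublin Core in HTML.

```python
from html.parser import HTMLParser

# Hypothetical page; the profile URI and metadata values are made up.
SAMPLE = """
<html>
  <head profile="http://example.org/profile">
    <title>Example</title>
    <meta name="DC.creator" content="Jane Doe">
    <meta name="DC.date" content="2005-12-01">
  </head>
  <body><p>Hello</p></body>
</html>
"""


class MetaCollector(HTMLParser):
    """Collect the head profile URI and all name/content pairs."""

    def __init__(self):
        super().__init__()
        self.profile = None
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "head":
            # The profile URI says where the meta names are defined.
            self.profile = attributes.get("profile")
        elif tag == "meta" and "name" in attributes:
            self.metadata[attributes["name"]] = attributes.get("content")


collector = MetaCollector()
collector.feed(SAMPLE)
print(collector.profile)
print(collector.metadata)
```

Nothing in the page is new syntax: a browser that knows nothing about the profile simply ignores the extra metadata, while a tool that does know it can read the values.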
Elements in HTML which have URI attributes (<a> and <link>) also have two attributes, rel and rev, which allow authors to apply semantics to the link. The rel attribute defines the relationship between the referenced resource and the current document. The rev attribute defines the relationship in the opposite direction. These constructs have been used to add stylesheets to documents and to provide machine-understandable navigation, and they are the basis of several microformats.
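A minimal sketch of reading link relationships, again using Python's html.parser: the sample markup, its URLs and the RelCollector class are hypothetical. The rel values shown (stylesheet, next, tag, license) are ones in common use; tag and license are the values used by the rel-tag and rel-license microformats.

```python
from html.parser import HTMLParser

# Hypothetical fragment; all URLs are placeholders.
SAMPLE = """
<link rel="stylesheet" href="/style.css">
<link rel="next" href="/page2.html">
<p>Filed under
  <a rel="tag" href="http://example.org/tags/microformats">microformats</a>,
  licensed under
  <a rel="license" href="http://example.org/license">these terms</a>.</p>
<a href="/plain.html">no relationship stated</a>
"""


class RelCollector(HTMLParser):
    """Group the href of every <a> and <link> by its rel values."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel holds a space-separated list of link types.
        for rel in (attributes.get("rel") or "").lower().split():
            self.links.setdefault(rel, []).append(href)


collector = RelCollector()
collector.feed(SAMPLE)
print(collector.links)
```

The last link in the sample carries no rel attribute, so it is an ordinary hyperlink and the collector skips it.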
The third, and most popular, way to extend HTML is with the class attribute. Authors may not think of class attribute values as an extension of HTML; many think of them only as a hook for styling with CSS. However, the HTML specification allows using the class attribute for "general user agent processing." So, by adding class names to elements in their HTML, authors are essentially able to expand the semantics of the language.
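To show how class names can carry machine-readable meaning, here is a small Python sketch that gathers the text found inside each class-annotated element. The sample markup borrows class names from the hCard microformat (vcard, fn, url, org) purely as an example; the ClassTextCollector class is hypothetical and far simpler than a real microformat parser.

```python
from html.parser import HTMLParser

# Hypothetical contact snippet using hCard-style class names.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/">Jane Doe</a>
  <span class="org">Example Corp</span>
</div>
"""


class ClassTextCollector(HTMLParser):
    """Map each class name to the text found inside elements carrying it.

    Limitation: assumes every start tag has a matching end tag, so void
    elements written without a closing slash (such as <br>) are not handled.
    """

    def __init__(self):
        super().__init__()
        self._open = []    # one list of class names per open element
        self.values = {}   # class name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        # class holds a space-separated list of names.
        classes = (dict(attrs).get("class") or "").split()
        self._open.append(classes)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Credit the text to every class on every enclosing element.
        for classes in self._open:
            for name in classes:
                self.values.setdefault(name, []).append(text)


collector = ClassTextCollector()
collector.feed(SAMPLE)
print(collector.values)
```

The same class names remain available to CSS selectors, so the markup can be styled and machine-read at once without any new syntax.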
Web as Process
The Web is a great place for experimenting with new ideas. Once an idea is published on the Web, it is often picked up by many people, who iterate and improve on it. As an innovator on the Web, it is important to watch and learn from how innovations are adopted and adapted.
Several years ago, many Web developers began to push for using more semantic HTML markup in conjunction with CSS in order to create more extensible and maintainable documents. Since these practices were out in the open, collaboration and convergence were bound to happen. Best practices were identified and evolved into standard practices for progressive web developers.
The idea for microformats grew out of this natural convergence of vocabulary usage. As people naturally converge on vocabularies, it is possible to standardize them with just enough rigor that the human semantics can become machine semantics. At that point, we can take formerly unstructured data and treat it as structured data, enabling a new form of interoperability.
When designing microformats, we seek to observe and document human behavior. Since we are trying to make publishing structured data on the web as simple as possible for the publisher, studying common usage is vitally important.
Conclusion
Microformats are a novel approach to publishing structured data on the Web. The novelty lies not only in the choice of technology used, but also in the process employed to create the technology. Rather than creating schemas and formats based on a theoretical or idealistic view of how people should be publishing on the Web, with microformats we prefer to observe and standardize on common, emergent behavior.




