The Intelligent Design of Microformats
Microformats exploded in popularity throughout 2005 [http://microformats.org/blog/2006/01/04/2005-year-in-review/] and will likely continue to grow in 2006. While the ideas behind microformats are not new (there are similar proposals regarding semantic XHTML going back to 2000), the Web appears to have matured to the point where using semantic markup is feasible.
In this talk I’d like to explain some of the vision behind microformats and how the ideas and technology provide unique benefits for the web.
Microformats have been called the ‘lowercase semantic web’, a label that reflects both the similarities and the differences between the semantic XHTML/microformats way and the (uppercase) Semantic Web approach. The two methodologies share a common goal: to create a world-wide web of data, to which anyone can publish, and on top of which many can build tools.
Like proponents of the Semantic Web, microformats developers wish to create distributed ecosystems of data that avoid the common problems of walled gardens and service lock-in. This mode of publishing lets people control their own data, which in turn creates virtuous feedback cycles leading to a richer ecosystem of data.
Microformats take a different approach to publishing data than the Semantic Web. Specifically, microformats offer a low barrier to entry, with the aim of creating a larger, higher-quality base of data on the web. Anyone who understands how to author (X)HTML can publish microformats by learning only a few conventions. This approach will allow more people to publish data on the web, so that web-scale knowledge systems can emerge.
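To make the "few conventions" concrete, here is a minimal sketch of what publishing an hCard amounts to: ordinary markup plus agreed class names. The class names (vcard, fn, url, org) come from the hCard microformat; the function name render_hcard and the sample values are hypothetical, chosen only for illustration.

```python
from html import escape


def render_hcard(name: str, url: str, org: str) -> str:
    """Render a person as ordinary XHTML that is also an hCard.

    The only microformat-specific parts are the class attribute values;
    everything else is markup an author would write anyway.
    """
    return (
        '<div class="vcard">'
        f'<a class="fn url" href="{escape(url, quote=True)}">{escape(name)}</a>, '
        f'<span class="org">{escape(org)}</span>'
        "</div>"
    )


# Hypothetical sample data, not taken from any real page.
print(render_hcard("Jane Doe", "http://example.org/", "Example Co."))
```

The result renders in a browser as a plain link followed by an organization name; the structure is carried entirely by the class attributes.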
In this way, microformats can be seen as a low-barrier-to-entry method for supplying data for the Semantic Web. All data published via microformats is trivially transformable to RDF/XML via GRDDL [http://www.w3.org/TeamSubmission/grddl/].
Additionally, microformats are a way to structure data that is already present on the web, rather than creating secondary public representations of the data. This has several benefits. First, by structuring visible data, publishers obey the DRY principle [http://c2.com/cgi/wiki?DontRepeatYourself], which helps prevent data drift and cruft. Because the structured data is the same data that people read on the page, errors are more likely to be noticed and corrected: more eyes lead to fewer bugs, and so to a higher-quality data set.
Second, by marking up already-published data, we enable a new set of loosely coupled applications, from Greasemonkey [http://greasemonkey.mozdev.org/] hacks to spidering user-agents that extract semantic data. Because there is no need for auto-discovery of secondary data representations, user-agents can work in a more ad-hoc manner, with simpler specifications.
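As a rough illustration of how little such a user-agent needs, the following sketch pulls hCard properties straight out of a page's visible markup using only Python's standard library. It is an assumption-laden toy, not a conforming hCard parser: the class name HCardExtractor, the three-property subset, and the sample page are all hypothetical, and nested cards and badly nested HTML are ignored.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag
PROPERTIES = {"fn", "org", "url"}  # small illustrative subset of hCard properties


class HCardExtractor(HTMLParser):
    """Collect hCard properties from elements inside class="vcard" roots."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard root found
        self._stack = []  # one (is_card_root, text_props) entry per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = "vcard" in classes
        if is_root:
            self.cards.append({})
        in_card = is_root or any(root for root, _ in self._stack)
        props = classes & PROPERTIES if in_card else set()
        # For links, take the url property from href rather than the link text.
        if "url" in props and attrs.get("href"):
            self.cards[-1]["url"] = attrs["href"]
            props = props - {"url"}
        self._stack.append((is_root, sorted(props)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Append text to every property carried by a currently open element.
        for _, props in self._stack:
            for prop in props:
                card = self.cards[-1]
                card[prop] = card.get(prop, "") + data


# Hypothetical page fragment: the data being extracted is the visible text itself.
page = (
    '<p>Written by <span class="vcard">'
    '<a class="fn url" href="http://example.org/">Jane Doe</a> of '
    '<span class="org">Example Co.</span></span>.</p>'
)
extractor = HCardExtractor()
extractor.feed(page)
print(extractor.cards)
```

Note that the extractor works on the page a reader already sees, with no separate feed or discovery step, which is the loose coupling described above.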
Microformats are designed for humans first and machines second, for the purpose of creating larger, more reliable webs of data, published by more people. The microformats approach is the low-cost, efficient way to build a web of data.




