
RDFA: The Easy Way to Publish Your Metadata

Introduction

RDFA is a new technique for embedding metadata in any XML document using a small number of attributes. Its primary use is in XHTML documents, where it lets an ordinary home page carry enough metadata to serve as a FoaF file, an RSS feed, or even a list of items for sale.

Aim

This paper introduces RDFA and shows how it can make publishing metadata as easy as publishing any other kind of information.

What is RDF?

Although most people have heard of RDF, it is often associated with its rather opaque syntax, RDF/XML. In fact, RDF is nothing more than a very general way of describing data. As with many very general things, this gives it both power and the potential to confuse.

The basic idea of RDF is to reduce any collection of data to ‘nuggets’ of information called triples. A triple is nothing more than ‘some item’ (the subject) having a property with some value. It might be:

Mark has an address of London, UK

or:

XTech 2006 has a venue of Amsterdam
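To make the idea concrete, here is a minimal Python sketch. It stores the two statements above as (subject, property, value) tuples and looks up a value. The plain strings stand in for what would be URIs in real RDF. The `Triple` type and the `values` helper are hypothetical names chosen for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One RDF-style statement: a subject has a property with a value."""
    subject: str
    prop: str
    value: str

# The two example statements from the text, written as triples.
# Plain strings are used for readability; real RDF would use URIs.
graph = {
    Triple("Mark", "address", "London, UK"),
    Triple("XTech 2006", "venue", "Amsterdam"),
}

def values(graph, subject, prop):
    """Return every value the graph records for the subject's property."""
    return {t.value for t in graph if t.subject == subject and t.prop == prop}

print(values(graph, "XTech 2006", "venue"))  # {'Amsterdam'}
```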

By breaking things down to such fundamental building blocks, RDF can be useful for anything from knowledge management to database definitions, to marking up metadata about web pages.

The RDF Stack

RDF has evolved over the years, and a number of powerful layers have been built on top of it. For example, reasoning software can use existing RDF statements to infer new ones. Suppose a system knows these facts:

XTech 2006 has a venue of Amsterdam
XTech 2006 has a start date of May 16th
XTech 2006 has an end date of May 19th

If the system also discovers that:

XTech 2006 has a speaker of Mark

then it could deduce that on May 18th:

Mark has a location of Amsterdam
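The deduction above can be sketched as a single hand-written rule over a set of triples. This is only an illustration under stated assumptions:

- the year is taken to be 2006;
- dates are Python `date` objects;
- the rule itself (a speaker is at the venue on every day of the event) is hypothetical.

Real RDF reasoners derive such conclusions from declared rules or ontologies rather than from ad hoc code like this.

```python
from datetime import date

# Facts from the text as (subject, property, value) triples.
# Assumption: the event year is 2006.
facts = {
    ("XTech 2006", "venue", "Amsterdam"),
    ("XTech 2006", "start date", date(2006, 5, 16)),
    ("XTech 2006", "end date", date(2006, 5, 19)),
    ("XTech 2006", "speaker", "Mark"),
}

def value(subject, prop):
    """Return the value recorded for (subject, prop), or None."""
    for s, p, v in facts:
        if s == subject and p == prop:
            return v
    return None

def location_on(person, day):
    """Hypothetical rule: a speaker is at the event's venue on every
    day from the event's start date to its end date, inclusive."""
    for s, p, v in facts:
        if p == "speaker" and v == person:
            start, end = value(s, "start date"), value(s, "end date")
            if start is not None and end is not None and start <= day <= end:
                return value(s, "venue")
    return None

print(location_on("Mark", date(2006, 5, 18)))  # Amsterdam
print(location_on("Mark", date(2006, 5, 20)))  # None
```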

The problem

The major problem that people have been trying to tackle is that although RDF itself is fairly simple, the language usually used to express it is not. RDF is usually ‘carried’ in RDF/XML, which is notorious for its difficulty.

This matters because a great deal of the web’s metadata resides in HTML pages on web-sites. Company addresses, the weather in Tokyo, the price of a second-hand car… every day millions of pieces of metadata are placed on the internet that software cannot use, because they are not formatted in any standardised way. Without some mechanism for extracting this information, the dream of the Semantic Web will remain just that.

What’s needed is not to throw away RDF, but to find an easier way of encoding it that allows the output of HTML authors to be placed at the centre of the Semantic Web.

Microformats and GRDDL

The growth of interest in so-called microformats shows that there is a real need for, and interest in, such a solution. The technique is to agree that certain patterns of HTML usage can be ‘codified’ to represent some agreed-upon metadata. For example, the following HTML mark-up:

<div class="vcard">
  <a class="url fn" href="http://tantek.com/">
    Tantek Çelik
  </a>
  <div class="org">Technorati</div>
</div>

uses CSS class names to identify pieces of the mark-up as metadata, and so represents a vCard (this microformat is known as hCard).
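The catch is that a consumer must know in advance what each class name means for this particular format. A rough sketch using Python's standard `html.parser` module illustrates this. The parser below recognises only the `fn`, `org` and `url` class names from the example, and is not a complete hCard implementation. `HCardParser` is a hypothetical name.

```python
from html.parser import HTMLParser

HCARD = """
<div class="vcard">
  <a class="url fn" href="http://tantek.com/">
    Tantek Çelik
  </a>
  <div class="org">Technorati</div>
</div>
"""

class HCardParser(HTMLParser):
    """Collects a few hCard properties from class names.
    Simplified: knows only 'fn', 'org' and 'url', and assumes
    well-nested tags."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # stack of (tag, text properties on that element)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # 'url' takes its value from the href attribute.
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]
        # 'fn' and 'org' take their value from the element's text.
        self._open.append((tag, [c for c in classes if c in ("fn", "org")]))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            for prop in self._open[-1][1]:
                self.card[prop] = text

parser = HCardParser()
parser.feed(HCARD)
print(parser.card)
# {'url': 'http://tantek.com/', 'fn': 'Tantek Çelik', 'org': 'Technorati'}
```

Every new format (iCalendar, and so on) would need its own parser of this kind, built around that format's own class names.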

This is the same approach as that used by GRDDL, which uses XSLT to extract pieces of metadata from an XHTML document. The problem with both solutions is that they don’t scale. Each existing metadata format (vCard, iCalendar, and so on) needs a ‘partner’ or ‘mirror’ definition that guides document production. Until such a definition exists for a given format, data marked up in it cannot become part of the Semantic Web.

Despite these problems, the goal of initiatives like microformats is important, since it sets out the possibility of carrying metadata in an ordinary HTML document. But ultimately, without addressing the problem of scale, we are not much closer to our goal of building a Semantic Web that parallels the visible web that we’ve been using for years.

The RDFA Approach

The approach taken by RDFA is that any RDF structure should be representable. Instead of ‘codifying’ each format to describe how it must be marked up, we simply provide a set of rules that explain how any RDF can be marked up. Any RDF ‘language’ can then be used.

This means that a library, for example, can still use the complex taxonomies and schemas it relies on, while a web author can mark up their home page using something like Friend-of-a-Friend (FoaF).

Example: A home page

Jo has lots of friends, family and work colleagues with whom she would like to stay in touch despite her busy schedule. She would like to set up a home page where people who know her can find useful contact information, such as her phone number or work email.

Setting Up the Web Page

Jo's first stop is to create a page that contains information about her that can be read by anyone using a web browser. She begins with some details for people who might be trying to contact her at work:

<html>
  <head>
    <title>Jo Lambda's Home Page</title>
  </head>
  <body>
    <p>
      Hello. This is Jo Lambda's home page.
    </p>
    <h2>Work</h2>
    <p>
      If you want to contact me at work, you can
      either <a href="mailto:jo.lambda@example.org">email
      me</a>, or call +1 777 888 9999.
    </p>
  </body>
</html>

Jo can now give her friends the address of her home page, which is http://jo-lambda.example.org/.

Adding Name and Contact Metadata

One of Jo's friends, Terri, tells Jo that the address-book software she uses can keep Jo's details up to date automatically. All Jo needs to do is add some attributes to the mark-up of her home page to help the software understand her data. The terms that Terri's address book understands come from a special list, often called a vocabulary, designed for describing people and the relationships between them. This particular vocabulary is called 'Friend-of-a-Friend', or FoaF.

The first thing Jo needs to do is add a namespace declaration to the top of her document. It binds the prefix foaf to the FoaF vocabulary, making that vocabulary available to the rest of her home page:

<html
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
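The prefix is only shorthand. A processor replaces foaf: with the URI it is bound to, so foaf:name becomes http://xmlns.com/foaf/0.1/name.

A minimal Python sketch using the standard `xml.etree.ElementTree` module follows. The helper names `prefixes` and `expand` are hypothetical. The sketch assumes the full URI is formed by simply concatenating the namespace URI and the local name. ElementTree does not expose xmlns declarations as ordinary attributes, so the bindings are read from the parser's start-ns events instead.

```python
import io
import xml.etree.ElementTree as ET

DOC = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <body><span property="foaf:name">Jo Lambda</span></body>
</html>"""

def prefixes(source):
    """Return the prefix -> namespace URI bindings declared in source."""
    events = ET.iterparse(io.BytesIO(source.encode("utf-8")), events=("start-ns",))
    # Each start-ns event carries a (prefix, uri) pair.
    return {prefix: uri for _, (prefix, uri) in events}

def expand(qname, ns):
    """Expand a QName such as 'foaf:name' into a full URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in ns:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return ns[prefix] + local

ns = prefixes(DOC)
print(expand("foaf:name", ns))  # http://xmlns.com/foaf/0.1/name
```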

Jo then looks through the FoaF vocabulary and sees that the pieces of information in her page (name, phone number and email address) all have names within FoaF. She therefore adds those names to her document, following these rules:

1. If the value she wants to use for a property is in the href attribute of an <a> element, a rel attribute is added to that element, with the name of the property as its value.

2. If the value she wants to use for a property has no element of its own to contain it, one must be added.

3. The name of the property that describes the contents of an element is placed in an attribute called property.

Let's look at each of those rules.

Using URLs as Property Values

Jo has provided a link in her home-page to her email address, which is jo.lambda@example.org:

...
If you want to contact me at work, you can
either <a href="mailto:jo.lambda@example.org">email
...

To ensure that Terri's address-book software understands this link, Jo adds the FoaF mailbox property, foaf:mbox, in a rel attribute:

...
If you want to contact me at work, you can either
<a
 rel="foaf:mbox"
 href="mailto:jo.lambda@example.org"
>
  email me
</a>
...

Note that using a QName to name the property makes it clear and unambiguous. Because the foaf prefix is bound to the FoaF namespace URI, foaf:mbox means the FoaF mbox property wherever it appears, in any document that declares the prefix this way.
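Rule 1 can be sketched in a few lines of Python. The sketch makes two assumptions:

- with no other subject given, the statement is about the page itself (http://jo-lambda.example.org/);
- the object is the href value taken as a URI, not as text.

`rel_triples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
PAGE = "http://jo-lambda.example.org/"  # assumed subject: the page itself

LINK = '<a rel="foaf:mbox" href="mailto:jo.lambda@example.org">email me</a>'

def rel_triples(elem, subject, ns):
    """Rule 1: rel names the property; href supplies its value (a URI)."""
    triples = []
    for el in elem.iter():
        rel, href = el.get("rel"), el.get("href")
        if rel and href:
            for qname in rel.split():
                prefix, _, local = qname.partition(":")
                if prefix in ns:  # skip plain HTML values such as "nofollow"
                    triples.append((subject, ns[prefix] + local, href))
    return triples

print(rel_triples(ET.fromstring(LINK), PAGE, {"foaf": FOAF}))
# [('http://jo-lambda.example.org/', 'http://xmlns.com/foaf/0.1/mbox', 'mailto:jo.lambda@example.org')]
```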

Using Text As Property Values

In addition to her email address, Jo also wants to add her name and phone number. The values she would like to use for these properties are not yet separated from the surrounding text, so, as per rule 2, Jo adds some simple wrapper elements:

<p>
  Hello. This is <span>Jo Lambda</span>'s home page.
</p>
<h2>Work</h2>
<p>
  If you want to contact me at work, you can either
  <a
   rel="foaf:mbox"
   href="mailto:jo.lambda@example.org"
  >email me</a>,
  or call <span>+1 777 888 9999</span>.
</p>

Now that the text is inside span elements, it is easy to add the FoaF properties for name and phone number using the RDFA property attribute:

<p>
  Hello. This is
  <span property="foaf:name">Jo Lambda</span>'s home page.
</p>
<h2>Work</h2>
<p>
  If you want to contact me at work, you can either
  <a
   rel="foaf:mbox"
   href="mailto:jo.lambda@example.org"
  >email me</a>,
  or call
  <span property="foaf:phone">+1 777 888 9999</span>.
</p>
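Rule 3 is the counterpart for text. The property attribute names the property, and the element's content becomes the value, as a plain string rather than a URI.

Below is a minimal Python sketch over a trimmed copy of the fragment above. It again assumes the page itself is the subject. `property_triples` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
PAGE = "http://jo-lambda.example.org/"  # assumed subject: the page itself

FRAGMENT = """<p>
  Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.
  If you want to contact me at work, call
  <span property="foaf:phone">+1 777 888 9999</span>.
</p>"""

def property_triples(root, subject, ns):
    """Rule 3: property names the property; the element's text is the value."""
    triples = []
    for el in root.iter():
        qname = el.get("property")
        if qname:
            prefix, _, local = qname.partition(":")
            if prefix in ns:
                text = "".join(el.itertext()).strip()
                triples.append((subject, ns[prefix] + local, text))
    return triples

for t in property_triples(ET.fromstring(FRAGMENT), PAGE, {"foaf": FOAF}):
    print(t)
```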

Complete Mark-up

<html
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
  <head>
    <title>Jo Lambda's Home Page</title>
  </head>
  <body>
    <p>
      Hello. This is <span property="foaf:name">Jo Lambda</span>'s
      home page.
    </p>
    <h2>Work</h2>
    <p>
      If you want to contact me at work, you can
      either
      <a
       rel="foaf:mbox"
       href="mailto:jo.lambda@example.org"
      >email me</a>,
      or call
      <span property="foaf:phone">+1 777 888 9999</span>.
    </p>
  </body>
</html>

Now all Terri needs to do is give her contact software the address of Jo's home page. The software can then extract the following information about Jo, recording the address it was given as her home page:

foaf:name     = "Jo Lambda"
foaf:mbox     = "mailto:jo.lambda@example.org"
foaf:phone    = "+1 777 888 9999"
foaf:homepage = "http://jo-lambda.example.org/"

More formally, the mark-up Jo added to her XHTML defines a set of RDF triples, each of which represents one property of her data.
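Pulling the rules together, here is a minimal Python sketch of an extractor for the complete page, using only the standard library. Its assumptions:

- the subject of every triple is the page's own address, as in RDFa 1.0, where an element without an about attribute describes the current document;
- only the rel/href and property rules from this paper are handled;
- the mark-up is well-formed XML.

`extract`, `literal` and `to_ntriples` are hypothetical helper names. The output is written as N-Triples, a simple line-based RDF syntax.

```python
import io
import xml.etree.ElementTree as ET

PAGE = "http://jo-lambda.example.org/"  # the document's own address

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head><title>Jo Lambda's Home Page</title></head>
  <body>
    <p>Hello. This is <span property="foaf:name">Jo Lambda</span>'s home page.</p>
    <h2>Work</h2>
    <p>If you want to contact me at work, you can either
      <a rel="foaf:mbox" href="mailto:jo.lambda@example.org">email me</a>,
      or call <span property="foaf:phone">+1 777 888 9999</span>.</p>
  </body>
</html>"""

def extract(source, subject):
    """Return (subject, predicate, object, object_is_uri) tuples in document order."""
    data = source.encode("utf-8")
    # Prefix bindings come from the parser's start-ns events.
    ns = dict(v for _, v in ET.iterparse(io.BytesIO(data), events=("start-ns",)))
    triples = []
    for el in ET.fromstring(data).iter():
        rel, href, prop = el.get("rel"), el.get("href"), el.get("property")
        if rel and href:  # rule 1: the value is a URI
            for qname in rel.split():
                prefix, _, local = qname.partition(":")
                if prefix in ns:
                    triples.append((subject, ns[prefix] + local, href, True))
        if prop:  # rule 3: the value is the element's text
            prefix, _, local = prop.partition(":")
            if prefix in ns:
                text = "".join(el.itertext()).strip()
                triples.append((subject, ns[prefix] + local, text, False))
    return triples

def literal(text):
    """Quote text as an N-Triples string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialise the tuples as N-Triples, one statement per line."""
    lines = []
    for s, p, o, is_uri in triples:
        obj = f"<{o}>" if is_uri else literal(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(extract(DOC, PAGE)))
```

Handling an explicit subject (an about attribute, for instance) is left out to keep the sketch short.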

Conclusion

The final mark-up shows how Jo’s document can double as both an ordinary home page and a collection of FoaF data. This has a twofold advantage. Firstly, Jo can publish metadata as easily as she publishes a web page: she need only own a blog to participate in the Semantic Web. Secondly, the Semantic Web itself comes a step closer. The techniques and tools established for years in the world of RDF can begin to be applied to an ever-increasing quantity of information, data that has hitherto been unavailable.