RSS and Atom feed icon News feeds

The Viper Solution: A Data Persistence Model using XML and PHP

Abstract

In this paper, we present an XML persistence model and API, called the XML Content Store (XCS). The XCS helps manage XML data that is common to many web applications by abstracting the persistence details. The components of the XCS will be described as well as an implementation in PHP using the latest release of the IBM DB2 database, codenamed Viper. Finally, some common applications that may benefit from the XCS will also be discussed.

Introduction

Recently, web frameworks, such as Ruby on Rails [1], have been developed to make web development easier. Much of the “scaffolding code” for the most common type of web applications is created automatically so that the web developer need not concentrate on the often tedious and repetitive configuration tasks. Rather, the developer can devote his/her energies to more interesting tasks, such as domain-specific logic and application behavior.

One important piece of these frameworks is the persistence model used to simplify and abstract database access. This follows the same philosophy: leave the details of database access and manipulation to the framework so that the developer can concentrate on the behavior of the data. Many persistence and object-relational models exist, including Ruby on Rails’ own Active Record [2], JBoss’ Hibernate [3], and Sun’s Entity Enterprise Java Beans [4]. We propose a simple persistence model in PHP using IBM’s upcoming DB2 database release, codenamed Viper, with its native XML support.

PHP is a popular scripting language that is suited for web development. It is not only easy to learn, but can be quickly used for developing sophisticated applications. Our proposed model, tentatively called XML Content Store (XCS), abstracts all database Create, Read, Update, and Delete (CRUD) operations. Furthermore, by using XML as the underlying storage structure, the developer need not know anything about the underlying table schema. The schema “automatically” evolves as the developer’s needs evolve.

All this is made possible by using Viper’s XML capabilities. Viper uses a native XML data type and provides a means to represent, access, analyze, and integrate data and content across heterogeneous information sources. This inherently allows the data management and interchange that web development requires. Viper provides a powerful XQuery engine and standardized SQL/XML publishing functions leading to the use of “XML-on-the-edge” technology. This use of XML functionality, abstracted by the XCS and without inherent knowledge of how Viper uses the data store, is a viable option for those new to web applications and XML, but familiar with basic programming techniques. Still, for the more sophisticated programmer who is familiar with standard SQL and data access methods, it is hard to ignore the added value the DB2 engine provides as an enterprise database management system.

This paper will focus on the XCS, an XML data access and persistence model and API that allows programmers to develop simple web applications that require a database. An implementation of the XCS using PHP and Viper will demonstrate how Viper allows for faster and more intuitive PHP application development and maintenance. Finally, some code samples using the XCS will be shown, along with a summary of some common type of data-centric web applications that will benefit from the use of the XCS.

XML, XML Everywhere

XML provides a simple format that is flexible enough to accommodate wildly diverse needs. It is simple, extensible, interoperable, and open [5]. For these reasons, XML use in software has become almost ubiquitous, especially with the emergence of Web 2.0 technologies. Some examples of its wide use include web services, AJAX, RSS/Atom, messages, templates, user interfaces, and configuration files.

Application developers use various technologies to manipulate and process XML, including DOM, SAX, XQuery, and XSLT. Similarly, XML data, when it must persist for any length of time, is stored in various ways, either in a file system or in a database. Within a database, it can be shredded into relational columns, stored as a Character Large Object (CLOB) or Binary Large Object (BLOB), or stored natively logically and physically in an XML data model [6].

Each of these choices has its own unique set of use cases and complexities associated with it and choosing a particular programming API or storage paradigm can be a daunting task for not only a beginning programmer, but for an experienced programmer as well. Many times, performing even a basic create, read, update, or delete (CRUD) operation on a simple document requires several non-trivial resource management steps. Also, most of these often tedious steps must be repeated for subsequent operations.

Frameworks and Persistence

For software development, and web development in particular, one approach to address this problem of complexity for common tasks is abstraction. Abstraction reduces and factors out details so that the developer can focus on a few concepts at a time. A common technique for abstracting complex details is the use of frameworks. Frameworks are a defined support structure in which another software project can be organized and developed. This may include support programs, code libraries, toolkits, or even a scripting language. These are used to help develop and glue together the different components of a software project and they shift responsibility of resource handling from the client or application to the framework. By developing applications using a framework, much of the mundane work is done by the framework allowing the developer to concentrate on the more interesting application-specific logic.

There are dozens of frameworks, open-source and otherwise, available on the web. A small sampling includes Apache Cocoon [7], Apache Struts [8], Eclipse [9], Ruby on Rails, CakePHP [10], Spring [11], and recently in beta, the Zend Framework [12]. Most of these frameworks already include some persistence mechanisms for saving data, either to a file system or to a database. Some examples include Ruby on Rails Active Record, Hibernate, Entity Enterprise Java Beans (EJB), and Service Data Objects (SDO) [13]. Most are geared towards object-relational mapping and often this approach is used when an XML document is persisted. Even for simple XML documents, this can still be a difficult task and may involve multiple steps to configure this mapping.

Clearly, there is a need for abstracting the manipulation of XML data to allow for the development of applications quickly, easily, and without a steep learning curve. Furthermore, a portion of this abstraction involves a simple mechanism for persisting the data. We propose a data persistence model for XML data, called the XML Content Store (XCS). The XCS abstracts the details of configuration and resource management. The notion of manipulating a file system, or mapping to a database is abstracted away with a simple API. All that the API needs to know is where the XML will be persisted, via a “connection handle”, whatever that connection may be. Once the programmer provides the connection information, all other persistence operations will be hidden.

The proposed API will be presented to demonstrate its potential for a rapid application development paradigm. For their ease of use and as a proof of concept, this first implementation was completed using the PHP scripting language and the IBM DB2 database’s next version beta release, codenamed Viper and with native XML features. But the concepts and API are generic enough, such that they can be implemented against any XML “data source”, i.e., file system or other relational database management system. Furthermore, the API can be implemented using any programming or scripting language, such as Java, .NET, Python, Ruby, or any other programming language of choice.

XML Content Store Components

In this section, we'll provide an architectural overview of the XCS. We'll describe each of the components that make up the XCS and explain how they work together.

XMLContent

As mentioned before, a web developer encounters XML data in many forms, such as web services messages, RSS/Atom feeds, and/or configuration files. Once he/she decides that this XML data needs to be saved somewhere, we will assume several things about this data. First, we need a way to uniquely identify it, so that once it is saved, it can easily be programmatically found and retrieved (though the API will contain some basic search capability which will be discussed below). The unique name can be a numeric id or a user-provided name.

The second assumption is that the XML data will be stored as is. It will not be modified or changed in any way. Any modifications that are required will be done by the application outside of the XCS. Internally, the XML data is stored as a DOM document, but an application is free to access the data as a file stream, string, or several other convenient access methods which may or may not be implementation-dependent.

Third, the capability will be provided to add additional metadata about the XML data that is saved. For example, if the XML data is a blog entry, perhaps the application would care to know the date and title of the entry, or the hostname where the entry originated. This metadata is saved in an about property and is also XML.

Finally, often times XML data will be accompanied by binary data, such as .jpeg, .pdf, or .doc files. An attachment property will associate this binary data with the XML data. In this first implementation of XMLContent, attachment can contain either 0 or at most 1 item, though optionally, a future version can contain any number of items.

These properties of the XML data are encapsulated in an object called XMLContent. XMLContent objects are the fundamental components of the XCS as they are the XCS representation of XML data. As we will see below, the XMLContentStore component needs to know about the persistence technology (for example, a relational database) and how to access it, but XMLContent objects need not know anything about it.

XMLContentStore

The XMLContentStore represents a repository of XML documents. It is responsible for updating the data source based on changes made to an XMLContent object in the repository as well as retrieving XMLContent objects from the data source based on search or id criteria.

A data source is defined very generally as the persistence layer where the XML data is saved. It can be a relational database, an XML database, or a file system and it stores the XML data in its own format. When an XMLContentStore object is instantiated, it receives a “connection handle” which describes in a meaningful way what the data source is.

An XMLContentStore knows how to access the persistence layer to perform CRUD operations and abstracts these operations by using convenience methods. For example, if the persistence layer is a relational database, a call to the method insert() will build an appropriate SQL insert statement based on the structure of the underlying tables used and the content of the XMLContent object. It will connect to the database and execute the statement. Other CRUD methods work in a similar fashion and include: update(), delete(), deleteById(), and selectAll().

In addition, XMLContentStore also contains a simple search facility that retrieves XMLContent by its id or by searching within the XML data or the about data in XMLContent. The search on XML data is done using XPath expressions. These methods are find(), and findById().

The following classes are helper classes that allow for easier processing of XMLContent objects.

XMLIterator

It is possible that a search returns one XMLContent or a set of XMLContent objects. In the case where a set is returned, the XMLIterator class is used to iterate over the set of XML documents that meet the search criteria. An XMLIterator knows several essential things about the set of XMLContent objects over which it is iterating. These include its current location in the set, how to retrieve the next object in the set, how to go back to the beginning of the set, and when it has reached the last item in the set. This allows the developer to assign behavior on the XML data at each iteration without having to worry about the details of loop control.

XMLUtil

XMLUtil is a utility class that provides convenience methods for passing XML data back and forth from the application to XMLContent, either for the raw XML data or for the about XML metadata. Though the XML is stored internally as a DOM, XMLUtil can allow an application to use strings, file streams, or any other implementation-specific object representation of XML data. Convenience methods for converting between these different types of representations and DOM are provided.

An XCS Implementation

XMLContentStore is defined as an abstract class so that an implementation of it can define for itself what it means to insert, update, delete or find. As a proof of concept, we defined an implementation using PHP and the beta release of DB2 Viper.

PHP

PHP is a dynamic language that is easy to use and learn. It started out as a small open source project that evolved as more and more people discovered how useful it was. PHP is a scripting language that is embedded in HTML and interpreted by the server. It can be used to manage dynamic content, work with databases, handle session tracking, and even build entire e-commerce sites. It works well with a number of popular databases, include MySQL, Oracle, SQL Server, and DB2 [14].

PHP is an integral part of the LAMP architecture that has become popular in the Web industry as a way of deploying inexpensive, reliable, scalable, and secure web applications. LAMP is short for Linux, Apache, MySQL, PHP, all of which are open-source products [15]. The PHP model can be seen as an alternative to Microsoft’s ASP .NET [16], ColdFusion [17], or Java/JSP [18].

One of the many useful data structures available in PHP that was especially useful in the development of the XCS is the array. In PHP, an array is actually an ordered map, also called an associative array, which maps values to keys. This allows the array data type to be very versatile and to be easily used as an array, list or vector, hash table, dictionary, collection, stack, queue, or tree [19].

Finally, basic object-oriented features to PHP were added in version 3. The latest release, version 5 released in July 2004, contains enhanced and more robust and powerful features to allow PHP to be used as a full-fledged object-oriented language.

Because it is so easy to learn and use (and because it is free), PHP has become the dominate language for serving dynamic web content. As of April 2006, there were over 20,000,000 domains and 1,200,000 IP addresses using PHP [20].

DB2 Viper

DB2 Viper, developed by IBM and now in beta, is the next version of the DB2 database management system. It features significant new support for storing, managing, and searching XML data, in addition to its traditional relational database management scheme. Viper allows client applications to work with both tabular and XML data structures through the query language of their choice – SQL (including SQL with XML extensions, often referred to as "SQL/XML") or XQuery. As shown in Figure 1, engine-level components within DB2 support queries specified in either language [21].

Figure 1. The architecture of the new DB2 "Viper" release

XML is treated as a first class data type in Viper and a table can be defined to contain a column of type XML. This familiar syntax and usage allows quick adoption and integration with existing relational models and applications. Still, a purely XML-centric developer can simply stick to the XQuery language to manipulate XML data structures within Viper.

To efficiently manage XML data like traditional SQL data types, the XML data model is the fundamental logical data model both used internally by the database and exposed to database users when XML is the data type. Also, the XML data model is the fundamental unit of physical storage. However, it's important to note that the underlying storage mechanism used for a given data type is transparent to the application. In other words, the application doesn't need to explicitly specify which storage mechanism to use or to manage physical aspects of storage, such as splitting portions of XML documents across multiple database pages. It simply enjoys the runtime performance benefits of storing and querying data in a format that's efficient for the target data[22] [23].

Because of its native XML capabilities, Viper is a natural choice for the persistence layer for the XCS.

XCS using PHP and Viper

For the Viper implementation of the XCS, we defined one table with four columns that represent the properties of an XMLContent object that was described above: id, data, about, and attachment.

The columns are defined as follows:

  • id is a unique integer and used as the primary key of the table

  • data is defined as an XML column

  • about is defined as an XML column

  • attachment is defined as a BLOB column

Using Viper, each row in the table represents one XMLContent. Creating the table is the only database administrative task required. Yet, this can also be automated by adding the appropriate implementation-specific code in the API. This one-time enabling of Viper for the XCS by creating the table can then be executed via a programmatic method call. Figure 2 is a visual representation of the database table used for the XCS.

Figure 2. Using DB2 Viper as the persistence layer for the XCS

As we mentioned above, XMLContent is not aware of what the persistence mechanism is, so it can be used with Viper or any other implementation of the XCS without any changes. PHP contains a set of classes that allows the developer to operate on an XML document with the standardized DOM API and is fully object-oriented. Internally, the data and about properties are stored as PHP DOMDocument objects.

PHP also contains a simple XML representation called SimpleXML. It provides a very simple and usable object the can be processed with normal PHP property selectors and array iterators [24]. It is much easier when using simple, small XML documents to work with the SimpleXML class than with the DOMDocument class. For this reason, convenience methods for SimpleXML were added to XMLUTil so that an application may choose to do all its work in SimpleXML, but would also be able to use the XCS.

Since XMLContentStore is required to know about the persistence layer, it will be unique for each implementation. For its use with Viper, we created a child class of XMLContentStore called XMLContentStoreDB2. We implemented the CRUD methods using the PHP CLI driver, named ibm_db2. ibm_db2 provides a set of functions that enable applications to access the IBM DB2 Universal Database, IBM Cloudscape, and Apache Derby [25]. ibm_db2 uses connection, statement, and resultset resources and passes SQL directly to the database. XMLContentStoreDB2 abstracts all these details to make developer interaction with the persistence layer simple and fast.

It is important to note that the Viper implementation was simple because Viper allows us to treat XML as a first-class data type. By allowing us to create columns of XML type, the CRUD operations were easily implemented using simple SQL statements. There was no need to map XML elements to SQL columns or to perform any other type of object-relational mapping.

Coding Samples and Common Applications

In this section, we provide some sample code to showcase some implementation details and the ease of use of the XCS for web applications. Also, a sample list of applications that will benefit from the XCS will be listed.

DB2 Viper Database Definition

To create the XCS database in DB2, the following Data Definition Language (DDL) can be used. This DDL is executed at the DB2 command line or from a script where the commands are placed.

// Viper uses an utf-8 database to support XML
db2 create database xmlrepos USING CODESET utf-8 TERRITORY us;
// create the table
db2 create table xmldata(
   id bigint not null generated always as identity primary key,
   data xml,
   attachment blob(100m),
   about xml
);

The XML data type is available on any utf-8 database in the DB2 Viper release. With the creation of these database objects, DB2 is now ready to be used as an XML Content Store. There are no other DB2 administrative tasks necessary. However, to improve performance when searching, we suggest that indexes be created for the XML data:

create index datatext on xmldata(data) generate key using xmlpattern '//*' as sql varchar(800)
create index dataattr on xmldata(data) generate key using xmlpattern '//@*' as sql varchar(800)

create index abouttext on xmldata(about) generate key using xmlpattern '//*' as sql varchar(800)
create index aboutattr on xmldata(about) generate key using xmlpattern '//@*' as sql varchar(800) 

These are very general indexes on all text nodes and attributes on both data and about. The developer may decide to create indexes on elements or attributes for application-specific data, such as a product id or a last name element where searches may occur often.

XCS Implementation Example

The class XMLContentStoreDB2 is implemented using the ibm_db2 CLI driver. For example, to insert, the application gives the XCS an XMLContent object and the XMLContentStoreDB2 takes care of the underlying details:

/*** Saves the XML data represented by an
* XMLContent object into the persistence layer
*
* @param XMLContent $doc
*
* @return integer
*/
public function insert(XMLContent $doc)
{
   // SELECT FROM INSERT is a DB2 construct that allows
   // an application to immediately select values after
   // an insert. In this case we are retrieving the id
   // generated for the document using the sequence
   // database object
   $insertStr = 'SELECT id FROM FINAL TABLE(' .
                'INSERT INTO ' . 
                'xmldata(data,attachment,about)' .
                'VALUES(?,?,?))';

   // prepare the statement   
   $stmt = db2_prepare($this->conn, $insertStr);
   $xmlDoc = $doc->getDOM();   
    
   // set parameters   
   $paramArray = array($xmlDoc,
                       $doc->getAttachment(),
                       $doc->getAbout()->saveXML());

   // execute    
   db2_execute($stmt, $paramArray);

   // fetch results    
   $idArr = db2_fetch_array($stmt);
   $id = $idArr[0];
   $doc->setId($id);

   // return the id
   return $id;
}

The other CRUD and find methods are implemented in a similar fashion.

Example of an Application’s Use of the XCS

An application can easily manipulate and persist XML data using the XCS. First, instantiate the XMLContentStoreDB2 object, assuming that a connection resource has been previously created:

// Construct the XMLContentStoreDB2 object
// We assume that $conn is a connection resource to a DB2
// database that has already been created.
$xmlstore = new XMLContentStoreDB2($conn);

Perform an insert into the XCS:

// createDocument() is a convenience method in XMLUtil
// to create an XMLContent object
$doc = XMLUtil::createDocument();
// use a utility method to import a file into
// the XMLContent
XMLUtil::importXMLFile(‘data.xml’);
$xmlstore->insert($doc);

Let’s do a find as well. Assume a set of blog entries are stored in the XCS. A sample entry might look like this:

<?xml version="1.0" encoding="windows-1252" ?>
<entry>
   <title>Elvis Lives</title>
   <author>John Doe</author>
   <date>11-22-2003</date>
   <body>I recently wrote a short history of the works of Elvis Presley</body>
</entry>

To perform the search, one might issue the following statements:

$lookingFor = ‘elvis’;
$found = $xmlstore->find($lookingFor);
// iterate over possibly many results
foreach($found as $current) {
   // do something with the result
}

In this case, the above blog entry would be returned because the string “elvis” was a match in the <title> element and the <body> element (though only one match is needed in order to determine search success). In this example, the search was case-insensitive, but the find() method can also take a parameter indicating that the case must also match.

Typical XCS Applications

There are many types of applications that will benefit by using the XCS. Figure 3 demonstrates the basic architecture of an application that uses the XCS.

Figure 3: The architecture of a web application using the XCS

Below we list some specific application examples.

RSS/Atom Feed Aggregator

As shown above, an RSS/Atom Feed Aggregator application is well-suited for the XCS. A typical use case is that a user can input different feed urls and the application can periodically go out and retrieve feed updates and store them in the XCS. These feeds can be displayed, searched, and possibly be published as a “feed of feeds” as well. Some example aggregators include Bloglines [28] and MagpieRSS [27] (though they do not currently use the XCS).

Content Management Systems

A content management system (CMS) is a computer software system for organizing and facilitating collaborative creation of documents and other content. Storing content as XML in the XCS allows easy retrieval and search. about metadata can be used for workflow management and processing. Though the data is stored as XML, an CMS application can export documents as needed in .pdf, .doc, html, etc. format by transforming the XML into the required format. Storing the data as XML will be transparent to the user of the application.

Web Services

Web Services send messages using XML data. Many applications can benefit by storing data to be served in the XCS. As a web service request comes in, the result can be easily retrieved or even composed of several XMLContent objects.

Mashups

A mashup is a website or web application that seamlessly combines content from more than one source into an integrated experience. The content used in mashups typically comes from a third party via a public interface or API [28]. Most of these API return data as XML and often times the data is refreshed using AJAX, which inherently returns XML data. The XCS can be used to maintain and manage this data. By querying and joining the data, interesting scenarios can be created dynamically, in essence creating a “mashup of mashups”.

We have only listed a few applications of the XCS, but essentially, any application that requires the storage and processing of XML data can be easily implemented using the XCS. In the world of Web 2.0, this may mean all applications!

Conclusion

We have defined a data persistence model for XML data named XCS. The XCS provides a mechanism for easily adding persistence to XML applications. An implementation using PHP and DB2 Viper was also discussed. PHP allows rapid web application development and Viper allows easy manipulation and storage of XML data, plus the capability to perform traditional relational database operations as well. Some sample code was presented to demonstrate how the XCS was implemented and how an application written in PHP can use the XCS.

As the volume of XML data generated and used continues to grow in years to come, easy interoperability between disparate systems seems closer than ever. The XCS will be available to enable developers of all skill levels to quickly and easily create the applications that will allow these systems to talk the common language of XML. An IBM DB2 Viper implementation of the XCS further combines powerful XML storage and XQuery technologies with a world-class relational database management system.

Acknowledgements

The author would like to thank Stephen Brodsky (IBM Silicon Valley Lab) for his contributions to this paper and overall mentoring and guidance for the XCS project.

References

[1] Ruby on Rails (http://www.rubyonrails.org/) – Ruby web development framework

[2] Ruby on Rails Active Record – http://wiki.rubyonrails.com/rails/pages/ActiveRecord

[3] Hibernate – http://www.hibernate.org/

[4] Entity EJB – http://java.sun.com/products/ejb/

[5] “Why XML?” by Simon St. Laurent, http://www.webdevelopersjournal.com/articles/why_xml.html

[6] “XML and Relational Storage – Are They Mutually Exclusive?” by George Lapis, Technical Manager, IBM Corporation. Presented at XTech 2005, http://www.idealliance.org/proceedings/xtech05/papers/02-05-01/

[7] Apache Cocoon (http://cocoon.apache.org/) – Java web development framework

[8] Apache Struts (http://struts.apache.org/) – Java J2EE development framework

[9] Eclipse (http://www.eclipse.org/) – Language-independent, but mostly Java-based, development platform and application framework for building software

[10] CakePHP (http://cakephp.org/) – PHP web development framework

[11] Spring (http://www.springframework.org/) – Java J2EE development framework

[12] Zend Framework (http://framework.zend.com/) – PHP web development and web services framework

[13] SDO – http://www-128.ibm.com/developerworks/java/library/j-sdo/

[14] “Introduction to PHP” by Joe Brockmeier, http://www-128.ibm.com/developerworks/linux/library/l-php.html

[15] “PHP” Wikipedia Entry; http://en.wikipedia.org/wiki/PHP

[16] Microsoft ASP .NET (http://www.asp.net/Default.aspx?tabindex=0&tabid=1) – Uses C# and VB .NET languages.

[17] ColdFusion (http://www.macromedia.com/software/coldfusion/?promoid=BINO) – Developed by Macromedia

[18] Java/JSP (http://java.sun.com/products/jsp/) – Sun’s servlet-based technology

[19] PHP Online Manual Entry for Array; http://us2.php.net/manual/en/language.types.array.php

[20] Usage stats for April 2006, Source: Netcraft, http://www.php.net/usage.php

[21] “Get off to a fast start with DB2 Viper” by Cynthia M. Saracco, http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0603saracco/

[22] “Native XML Support inDB2 Universal Database” by Matthias Nicola and Bert Van der Linden; Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005; http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf

[23] “System RX: One Part Relational, One Part XML”, Beyer, Kevin, et. al. ; SIGMOD, June 2005

[24] PHP Online Manual Entry for SimpleXML; http://us2.php.net/manual/en/ref.simplexml.php

[25] PHP Online Manual Entry for ibm_db2; http://us2.php.net/manual/en/ref.ibm-db2.php

[26] Bloglines – http://www.bloglines.com/

[27] MagpieRSS – http://magpierss.sourceforge.net/

[28] “Mashup” Wikipedia Entry; http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)