XTech 2006 news


Markup for Flat-XML Processing

Daniel Parker (Economic Technology, Inc.)
Core technologies St. John 1

As XML technologies come to play an ever-expanding role in mainstream data processing, the need grows for markup languages that convert legacy data to XML. For SQL database management systems, the need is met by the ANSI and ISO SQL/XML standard (http://www.sqlx.org). But there does not appear to be a comparable effort that covers the many varieties of flat files (delimited files, positional files, files of records with repeating groups, EDI files, etc.). The major database and middleware vendors offer proprietary solutions, but there is no standard solution.

This presentation begins with an examination of use cases that a flat-XML markup language must meet. These are drawn from middleware vendor documentation, Michael Rawlins’s published work about legacy data (http://www.awprofessional.com/bookstore/product.asp?isbn=0321154940&rl=1), Bob Lyons’s discussion of his company’s XFlat markup language (http://www.infoloom.com/gcaconfs/WEB/TOC/t0413_.HTM), my own experience in the banking sector, and the use cases submitted by users of the open-source project ServingXML (http://servingxml.sourceforge.net/).

From the use cases, requirements are established for (i) describing flat file records in a very general way, to represent individual records in a canonical XML format, and (ii) mapping a stream of records so represented into an XML document. Particular attention is paid to the fact that the input files can be very large, and that records generally have to be grouped. The need for record-by-record error and discard handling is also addressed.
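
To make these two requirements concrete, here is a minimal Python sketch of the general idea. It is not the markup vocabulary discussed in the presentation and is not taken from ServingXML; the field names, element names, and helper functions are all hypothetical. It assumes a comma-delimited input that is already sorted by the grouping key, reads it one record at a time, sets malformed records aside, and writes each group out as soon as it is complete, so only one group is held in memory at a time.

```python
import csv
import io
from itertools import groupby
from xml.etree import ElementTree as ET

# Hypothetical record layout, chosen only for illustration.
FIELDS = ["account_id", "date", "amount"]


def read_records(lines, discards):
    """Lazily yield one dict per well-formed record.

    Records with the wrong number of fields are appended to `discards`
    together with their record number instead of aborting the run.
    """
    for record_no, row in enumerate(csv.reader(lines), start=1):
        if len(row) != len(FIELDS):
            discards.append((record_no, row))
            continue
        yield dict(zip(FIELDS, row))


def record_to_xml(record):
    """Represent a single record as a flat <record> element."""
    elem = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(elem, name).text = value
    return elem


def write_grouped(records, out):
    """Group consecutive records by account_id and stream them to `out`.

    Assumes the input is already sorted by account_id, because
    itertools.groupby only merges adjacent records.
    """
    out.write("<accounts>")
    for account_id, group in groupby(records, key=lambda r: r["account_id"]):
        parent = ET.Element("account", id=account_id)
        for record in group:
            parent.append(record_to_xml(record))
        out.write(ET.tostring(parent, encoding="unicode"))
    out.write("</accounts>")


def convert(text):
    """Convert delimited text to an XML string plus a list of discards."""
    discards = []
    out = io.StringIO()
    write_grouped(read_records(io.StringIO(text), discards), out)
    return out.getvalue(), discards
```

Calling `convert` on a small input returns the XML document as a string along with the discarded records, each paired with its record number. In a real setting `out` would be an open file rather than an in-memory buffer, and the record description would come from a declarative markup document rather than a hard-coded field list.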

The presentation then sketches out a markup vocabulary that attempts to meet these requirements, and briefly discusses some issues around an open source implementation of that language.

Chair: Eric Prud'hommeaux