A high-performance RDFS store using a Generic Object Model
Considering multiple data sources simultaneously greatly enhances the ability to perform threat analysis and to disrupt terrorist activities. Considering multiple data sources with disparate schemas at once requires semantic alignment, often at a very large scale. In addition to merging concepts, it must be possible to merge at the instance level, since individuals and other entities often occur in many or all of the data sources being considered. This merging, or “federation,” must maintain provenance information so that data can be traced to its source even after it has been brought into federation. These requirements are best met with Semantic Web technologies, namely RDF Schema, selected OWL constructs, and support in RDF query languages for tracking provenance and maintaining trust boundaries.
RDFS is part of the Semantic Web stack. It extends the Resource Description Framework (RDF) with support for schema information, including entailment rules that license certain inferences from data based on an ontology. Unlike the entailments of more expressive ontology languages such as OWL-DL and OWL-Full, the entailments licensed by the RDFS Model Theory (MT) can be computed in advance using forward chaining, a process known as “eager closure.” Each entailment computed during eager closure is supported by one or more existing statements, and these entailments can in turn support further entailments. When these “justification chains” are maintained, a simple algorithm suffices to perform truth maintenance during statement removal, which can occur when a data source is de-merged from federation.
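The eager-closure and truth-maintenance scheme can be illustrated with a minimal Java sketch. It assumes a tiny in-memory store and implements only two RDFS entailment rules: rdfs11 (transitivity of rdfs:subClassOf) and rdfs9 (rdf:type propagated up rdfs:subClassOf). The names RdfsClosure, Triple, computeClosure, and remove are hypothetical. The retraction step, a least-fixpoint check that keeps only statements whose justification chains are grounded in explicit statements, is one simple way to use justification chains and is not necessarily the algorithm used by the store described here.

```java
import java.util.*;

// Hypothetical sketch: eager RDFS closure (rdfs9 and rdfs11 only) with
// justification-based truth maintenance. Not the GOM store's actual code.
class RdfsClosure {
    static final String TYPE = "rdf:type";
    static final String SUB_CLASS_OF = "rdfs:subClassOf";

    record Triple(String s, String p, String o) {}

    final Set<Triple> explicit = new HashSet<>();
    final Set<Triple> closure = new HashSet<>();
    // Every recorded derivation: inferred statement -> sets of antecedents.
    final Map<Triple, Set<List<Triple>>> justifications = new HashMap<>();

    void add(Triple t) {
        explicit.add(t);
        closure.add(t);
        computeClosure();
    }

    // Forward chaining to a fixpoint ("eager closure"). The final pass sees
    // the whole closure, so every rule instance is recorded as a justification.
    void computeClosure() {
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Triple> snapshot = List.copyOf(closure);
            for (Triple a : snapshot) {
                for (Triple b : snapshot) {
                    if (!b.p().equals(SUB_CLASS_OF) || !a.o().equals(b.s())) continue;
                    Triple derived;
                    if (a.p().equals(SUB_CLASS_OF)) {
                        derived = new Triple(a.s(), SUB_CLASS_OF, b.o()); // rdfs11
                    } else if (a.p().equals(TYPE)) {
                        derived = new Triple(a.s(), TYPE, b.o());         // rdfs9
                    } else {
                        continue;
                    }
                    justifications.computeIfAbsent(derived, k -> new HashSet<>())
                                  .add(List.of(a, b));
                    if (closure.add(derived)) changed = true;
                }
            }
        }
    }

    // Retract an explicit statement, then keep only statements that are explicit
    // or have a justification whose antecedents are all still supported (least
    // fixpoint), so circular justifications cannot keep each other alive.
    void remove(Triple t) {
        explicit.remove(t);
        Set<Triple> supported = new HashSet<>(explicit);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Triple, Set<List<Triple>>> e : justifications.entrySet()) {
                if (supported.contains(e.getKey())) continue;
                for (List<Triple> just : e.getValue()) {
                    if (supported.containsAll(just)) {
                        supported.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        closure.retainAll(supported);
        justifications.keySet().retainAll(supported);
        justifications.values().forEach(js -> js.removeIf(j -> !supported.containsAll(j)));
    }

    public static void main(String[] args) {
        RdfsClosure store = new RdfsClosure();
        Triple aliceIsPerson = new Triple("ex:alice", TYPE, "ex:Person");
        store.add(new Triple("ex:Analyst", SUB_CLASS_OF, "ex:Person"));
        store.add(new Triple("ex:alice", TYPE, "ex:Analyst"));
        System.out.println(store.closure.contains(aliceIsPerson)); // true (rdfs9)
        store.remove(new Triple("ex:Analyst", SUB_CLASS_OF, "ex:Person"));
        System.out.println(store.closure.contains(aliceIsPerson)); // false
    }
}
```

Recomputing support over the whole store keeps the sketch short and correctly discards self-supporting cycles (for example, A subClassOf A after a subclass cycle is broken). A store at scale would presumably restrict the check to statements reachable from the retracted one through justifications.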
Maintaining provenance information within an RDF graph is cumbersome, if not impossible, because RDF data is composed of triples that, in practice, cannot be reified: RDF reification is not compatible with RDFS and OWL reasoning. To maintain provenance at the statement level, triples would need to be replaced by quads or quints whose additional positions are reserved for statement and/or context information, similar to the Topic Maps data model. New query languages such as SPARQL may introduce features that support querying the entailments of multiple RDF graphs simultaneously.
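The quad-based alternative can be sketched in a few lines of Java. This is a minimal in-memory illustration, assuming the fourth position holds an identifier for the contributing data source; QuadStore, Quad, sourcesOf, subjects, holds, and removeSource are hypothetical names, and the sketch performs no reasoning and does not use SPARQL.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: statements stored as quads whose fourth position names
// the data source, so provenance survives federation. Not the GOM store's model.
class QuadStore {
    record Quad(String s, String p, String o, String context) {}

    private final Set<Quad> quads = new HashSet<>();

    void add(String s, String p, String o, String context) {
        quads.add(new Quad(s, p, o, context));
    }

    // Provenance lookup: which sources assert the triple (s, p, o)?
    Set<String> sourcesOf(String s, String p, String o) {
        return quads.stream()
                .filter(q -> q.s().equals(s) && q.p().equals(p) && q.o().equals(o))
                .map(Quad::context)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Subjects matching (?, p, o) according to trusted sources only (a trust boundary).
    Set<String> subjects(String p, String o, Set<String> trusted) {
        return quads.stream()
                .filter(q -> q.p().equals(p) && q.o().equals(o) && trusted.contains(q.context()))
                .map(Quad::s)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // The plain triple view: a triple holds while at least one source asserts it.
    boolean holds(String s, String p, String o) {
        return !sourcesOf(s, p, o).isEmpty();
    }

    // De-merging a source removes only the quads that source contributed.
    void removeSource(String context) {
        quads.removeIf(q -> q.context().equals(context));
    }

    public static void main(String[] args) {
        QuadStore store = new QuadStore();
        store.add("ex:bob", "rdf:type", "ex:Person", "src:a");
        store.add("ex:bob", "rdf:type", "ex:Person", "src:b");
        store.add("ex:eve", "rdf:type", "ex:Person", "src:b");
        System.out.println(store.sourcesOf("ex:bob", "rdf:type", "ex:Person")); // [src:a, src:b]
        System.out.println(store.subjects("rdf:type", "ex:Person", Set.of("src:a"))); // [ex:bob]
        store.removeSource("src:b");
        System.out.println(store.holds("ex:eve", "rdf:type", "ex:Person")); // false
        System.out.println(store.holds("ex:bob", "rdf:type", "ex:Person")); // true
    }
}
```

Keying the store on the full quad means the same triple contributed by two sources is stored twice, which is what lets removeSource de-merge one source without disturbing facts that another source still asserts.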
Several architectures are being explored for building high-performance RDF databases. These architectures have emerged from a variety of backgrounds, including Prolog, Datalog, relational databases, XQuery, and “custom” RDF store architectures. In our work we have used an OODBMS framework known as the Generic Object Model (GOM). We selected this architecture for its support for general-purpose application programming and scalable persistence. Because the GOM differs from existing approaches, it has allowed us to explore, by comparison with existing RDF databases, how well various architectures suit Semantic Web applications. Preliminary benchmarks place our store's performance at twice that of Sesame/MySQL on large datasets and an order of magnitude better than that of the Oracle® database.
This presentation will review the GOM architecture and the architecture of the RDFS store, report on our experience implementing and tuning a 100% Java open-source RDFS store, and explore why the GOM is an efficient and effective platform on which to build a Semantic Web-based federation application.