RDF Module Technical Overview

Status: This document is outdated, and reflects the state of RDF in Mozilla before we moved to use of the new layout engine. It remains mostly accurate but specifics of technical details are a little dated.

Last updated: $Id: api.html,v 1.3 1999/03/18 09:07:54 daniel.brickley%bristol.ac.uk Exp $

This Document

This document is a high-level overview of the RDF code in Navigator. You need to understand the material in this document before hacking the Navigator RDF code in ns/modules/rdf/.

If all you want to do is use the RDF C APIs for your own application, you don't need to understand everything given here, but it might be a good idea anyway.

If you are looking for material on how RDF is used in Navigator or on Aurora, look here.

The Basic Idea

We have many different pieces of structured data: bookmarks, history, file systems, document structures, sitemaps, etc. The code that creates, accesses, and manipulates each of them is completely independent, so each has its own storage system, editing and viewing tools, query and manipulation APIs, etc. This is a substantial lost opportunity, because the data models used by these structures overlap considerably: all of them are instances of directed labeled graphs.

So, the basic idea behind RDF is this: if you can manifest yourself via the RDF data model (which is built upon directed labeled graphs), there is a marketplace of services that you can utilize. Some of these services include:

Viewers and Editors for these structures.
Persistent Storage
Query Mechanisms
Inferential services such as type checking and inheritance.
Compositing, i.e., the ability to provide merged views of multiple graphs. Uses of this are described later.
Serialization and transmission via the RDF-XML format.
and many other services that we haven't yet thought about ...

Another way of looking at it is as follows: Just as COM/beans/... allows pieces of code to work together because they manifest a common object model, RDF tries to do the same thing for data and the common data model is built upon that of directed labeled graphs.

What is RDF

There are a couple of very different things meant by the term RDF.

RDF as a data model / data abstraction layer / query language. Directed Labeled Graphs (DLG) are a very general mechanism for representing things, so a wide range of information can be modeled as a DLG. It doesn't matter how the information is stored on disk or transmitted over the wire: if it can be modeled as a DLG, we can make it queryable as a DLG.
RDF as a file format using XML. It would be nice to have a canonical file format to ship snippets of RDF across the wire. This is the RDF File format.

Nodes in RDF DLGs are Resources in the sense of URIs. This means that you can get two different graphs from different sources that reference some of the same nodes. You can superpose the two graphs (making sure that the common nodes are properly aligned) and you have just aggregated the information from the two sources.

This aggregation ability is used throughout RDF for personalization, overriding, etc.

RDF Database

An RDF database is a directed labeled graph. Almost all the RDF APIs take a reference to an RDF Database.

The graph consists of

A set of nodes called RDF_Resource.
In addition to RDF_Resource, nodes can also be char* or int32. The type of a node (i.e., RDF_Resource, char* or int32) is specified as an argument in many of the APIs.
A set of arcs, each labeled with an RDF_Resource and a truth value, i.e., a true/false label.
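The graph described above can be sketched as a pair of C structures. This is only an illustration of the data model: every type and function name below (Node, Arc, make_arc, and so on) is hypothetical and is not taken from the RDF headers, and the sketch assumes that an arc always leaves a resource node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration; not the real RDF declarations. */
typedef enum { NODE_RESOURCE, NODE_STRING, NODE_INT } NodeType;

typedef struct {
    NodeType type;           /* which of the three node kinds this is */
    union {
        const char *uri;     /* NODE_RESOURCE: a resource named by its URI */
        const char *str;     /* NODE_STRING: a char* value */
        int32_t     num;     /* NODE_INT: an int32 value */
    } as;
} Node;

typedef struct {
    Node        source;      /* node the arc leaves */
    const char *label_uri;   /* the arc label is itself a resource */
    Node        target;      /* node the arc points to */
    bool        truth;       /* the true/false label on the arc */
} Arc;

static Node node_resource(const char *uri) {
    Node n; n.type = NODE_RESOURCE; n.as.uri = uri; return n;
}

static Node node_string(const char *s) {
    Node n; n.type = NODE_STRING; n.as.str = s; return n;
}

static Node node_int(int32_t v) {
    Node n; n.type = NODE_INT; n.as.num = v; return n;
}

/* Two nodes are the same only if both their type and their value match. */
static bool node_equal(Node a, Node b) {
    if (a.type != b.type) return false;
    switch (a.type) {
    case NODE_RESOURCE: return strcmp(a.as.uri, b.as.uri) == 0;
    case NODE_STRING:   return strcmp(a.as.str, b.as.str) == 0;
    case NODE_INT:      return a.as.num == b.as.num;
    }
    return false;
}

/* Build an arc; the source is assumed (in this sketch) to be a resource. */
static Arc make_arc(const char *source_uri, const char *label_uri,
                    Node target, bool truth) {
    Arc a;
    assert(source_uri != NULL && label_uri != NULL);
    a.source = node_resource(source_uri);
    a.label_uri = label_uri;
    a.target = target;
    a.truth = truth;
    return a;
}
```

Carrying the node type alongside the value mirrors the way many of the APIs take the node type as an explicit argument.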

An RDF Database is actually an aggregation of an ordered list of RDF data sources, each contributing a portion of the graph. The aggregation of the graphs is defined by simple superposition. The ordering of the data sources specifies a priority: if an arc appears in multiple data sources with different truth values, the arc from the higher-priority data source overrides the others. Each data source is itself identified by a URI.

The following calls are used to create an RDF Database and eventually to dispose of it. The char** argument to RDF_GetDB specifies the URIs of the RDF data sources. Some standard data sources already built into Navigator are:

A file encoded in RDF, MCF, Netscape bookmarks or any of the data file formats that Navigator understands.
FTP directories
Local file systems (A, C, D, etc. drives, Mac Volumes, etc.)
Berkeley DB encodings of RDF
Netscape history format

More data sources can easily be added. We hope this will happen with the help of developers outside Netscape.
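The priority rule for aggregated data sources can be illustrated with a small sketch. It is not the Navigator implementation: the types, the demo: URIs, and the function names are all hypothetical, targets are restricted to strings for brevity, and the sketch assumes that the first data source in the list has the highest priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types; the real RDF structures are not shown in this document. */
typedef struct { const char *source, *label, *target; bool truth; } DemoArc;
typedef struct { const char *uri; const DemoArc *arcs; size_t n_arcs; } DemoSource;
typedef enum { ARC_UNKNOWN, ARC_TRUE, ARC_FALSE } ArcState;

static bool same_arc(const DemoArc *a, const char *s, const char *l, const char *t) {
    return strcmp(a->source, s) == 0 &&
           strcmp(a->label, l) == 0 &&
           strcmp(a->target, t) == 0;
}

/* Walk the sources in priority order (index 0 first, by assumption).
   The first source that mentions the arc decides its truth value. */
static ArcState demo_lookup(const DemoSource *sources, size_t n,
                            const char *s, const char *l, const char *t) {
    size_t i, j;
    assert(sources != NULL || n == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < sources[i].n_arcs; j++) {
            if (same_arc(&sources[i].arcs[j], s, l, t))
                return sources[i].arcs[j].truth ? ARC_TRUE : ARC_FALSE;
        }
    }
    return ARC_UNKNOWN;
}

/* A personal source negates one arc that a lower-priority default source asserts. */
static ArcState demo_override(void) {
    static const DemoArc personal[] = {
        { "demo:toolbar", "demo:child", "demo:site-a", false }
    };
    static const DemoArc defaults[] = {
        { "demo:toolbar", "demo:child", "demo:site-a", true },
        { "demo:toolbar", "demo:child", "demo:site-b", true }
    };
    const DemoSource sources[] = {
        { "demo:personal", personal, 1 },
        { "demo:defaults", defaults, 2 }
    };
    return demo_lookup(sources, 2, "demo:toolbar", "demo:child", "demo:site-a");
}
```

In this sketch demo_override() reports the arc as false, because the personal source is consulted before the defaults. This is the kind of overriding that makes personalization possible without editing the lower-priority source.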

RDF Queries

The RDF Query API is a very standard graph query API. It can be used both to traverse and to edit the graph.

Every RDF Query API takes an RDF database as an argument. Each RDF data source is required to implement a set of RDF Data Source APIs. A database itself implements the RDF Query API by issuing RDF Data Source API queries to its data sources.

The RDF Data Source API is a strict subset of the RDF Query API. This means that an RDF Database can also be an RDF Data source.
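One way to picture a database that is also a data source is a table of function pointers shared by both. The sketch below is an assumption about how such an interface could look, not the actual RDF Data Source API: the DataSource structure, the single has_arc entry point, and all other names are hypothetical, and truth values are left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: one query function plus private state. */
typedef struct DataSource DataSource;
struct DataSource {
    bool (*has_arc)(const DataSource *self,
                    const char *s, const char *l, const char *t);
    const void *state;
};

typedef struct { const char *s, *l, *t; } Triple;
typedef struct { const Triple *items; size_t count; } TripleList;
typedef struct { const DataSource *const *children; size_t count; } Database;

/* A leaf data source backed by a fixed list of triples. */
static bool list_has_arc(const DataSource *self,
                         const char *s, const char *l, const char *t) {
    const TripleList *list;
    size_t i;
    assert(self != NULL && self->state != NULL);
    list = (const TripleList *)self->state;
    for (i = 0; i < list->count; i++) {
        const Triple *x = &list->items[i];
        if (!strcmp(x->s, s) && !strcmp(x->l, l) && !strcmp(x->t, t))
            return true;
    }
    return false;
}

/* A database answers the same query by asking each of its data sources. */
static bool db_has_arc(const DataSource *self,
                       const char *s, const char *l, const char *t) {
    const Database *db;
    size_t i;
    assert(self != NULL && self->state != NULL);
    db = (const Database *)self->state;
    for (i = 0; i < db->count; i++) {
        if (db->children[i]->has_arc(db->children[i], s, l, t))
            return true;
    }
    return false;
}

/* Nest a database inside another database and query the outer one. */
static bool demo_nested(void) {
    static const Triple history[] = {
        { "demo:page", "demo:visited", "demo:today" }
    };
    static const TripleList history_list = { history, 1 };
    static const DataSource history_src = { list_has_arc, &history_list };
    static const DataSource *const inner_children[] = { &history_src };
    static const Database inner = { inner_children, 1 };
    static const DataSource inner_as_source = { db_has_arc, &inner };
    static const DataSource *const outer_children[] = { &inner_as_source };
    static const Database outer = { outer_children, 1 };
    static const DataSource outer_src = { db_has_arc, &outer };
    return outer_src.has_arc(&outer_src, "demo:page", "demo:visited", "demo:today");
}
```

Because the database exposes the same entry point as its data sources, the outer database cannot tell whether a child is a leaf or another database.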

RDF Data Sources

It is easy to expose a new source of data via the RDF APIs. To do this, one provides a wrapper (around that data source) that implements the RDF APIs.

A data source could be a read-write store or a read-only store. It can also be a read-partial-write store, i.e., it can execute only some of the edits presented to it. For example, folder-based file system directories are far less expressive than general RDF graphs. In the more general model, it is possible to make statements like "File001 contains the response to email0017". Neither the file system nor the email system is capable of representing such a statement, so the wrapper for the file system (or the email system) can legally refuse to perform such an edit. If a more general-purpose RDF data source (such as the one based on Berkeley DB) is available, that data source could perform the addition. The user of the query API need not know the difference.
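The refusal behaviour can be sketched as follows. This is a guess at one plausible routing scheme, not a description of the Navigator code: it assumes the database offers an edit to each store in order until one accepts, and the Store structure, the demo: labels, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writable store: a name, an edit hook, and a counter. */
typedef struct Store Store;
struct Store {
    const char *name;
    bool (*try_assert)(Store *self,
                       const char *s, const char *l, const char *t);
    int stored;   /* number of arcs this store has accepted */
};

/* File system wrapper: it can only express "folder contains entry",
   so it refuses every other kind of arc. */
static bool fs_try_assert(Store *self,
                          const char *s, const char *l, const char *t) {
    (void)s; (void)t;
    if (strcmp(l, "demo:contains") != 0)
        return false;               /* legally refuse the edit */
    self->stored++;
    return true;
}

/* General-purpose store: accepts any arc. */
static bool general_try_assert(Store *self,
                               const char *s, const char *l, const char *t) {
    (void)s; (void)l; (void)t;
    self->stored++;
    return true;
}

/* Offer the edit to each store in order (an assumption of this sketch).
   Returns the store that accepted it, or NULL if every store refused. */
static Store *db_assert(Store **stores, size_t n,
                        const char *s, const char *l, const char *t) {
    size_t i;
    assert(stores != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (stores[i]->try_assert(stores[i], s, l, t))
            return stores[i];
    }
    return NULL;
}
```

With a file system store listed first and a general store second, an arc labeled demo:responseTo from File001 to email0017 is refused by the first and accepted by the second, while the caller makes the same db_assert call either way.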