Apache Cassandra Storage Adapter for RDF.rb

This is an RDF.rb plugin that adds support for storing RDF data in the Apache Cassandra distributed database management system.

Features

Limitations

Examples

require 'rdf/cassandra'

Connecting to a Cassandra server running on localhost

repository = RDF::Cassandra::Repository.new

Connecting to specific Cassandra servers

repository = RDF::Cassandra::Repository.new(:servers => "127.0.0.1:9160")

Configuring the Cassandra keyspace and column family

repository = RDF::Cassandra::Repository.new({
  :keyspace      => "MyApplication",  # defaults to "RDF"
  :column_family => "MyRepository",   # defaults to "Resources"
})

Configuration

As of Cassandra 0.6, all keyspaces and column families must be predeclared in storage-conf.xml. Each Cassandra supercolumn family that you use is equivalent to one RDF repository, so declare as many as you are likely to need.

The following configuration snippet matches the default options for constructing an RDF::Cassandra::Repository instance:

<Keyspaces>
  <Keyspace Name="RDF">
    <ColumnFamily Name="Resources"
                  ColumnType="Super"
                  CompareWith="UTF8Type"
                  CompareSubcolumnsWith="BytesType"
                  Comment="RDF data."/>
  </Keyspace>
</Keyspaces>

See etc/storage-conf.xml for a full configuration file example compatible with Cassandra 0.6.
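
If you need several repositories, writing the declarations by hand gets repetitive. The following is a minimal sketch, not part of RDF::Cassandra, that prints one supercolumn family declaration per repository name. The helper names column_family_xml and keyspace_xml are hypothetical, the names passed in are assumed to be plain identifiers that need no XML escaping, and any other keyspace-level settings are left out, as in the snippet above.

```ruby
# Hypothetical helper: renders one <ColumnFamily> declaration with the same
# attributes as the default declaration shown above.
def column_family_xml(name, comment = "RDF data.")
  <<-XML
    <ColumnFamily Name="#{name}"
                  ColumnType="Super"
                  CompareWith="UTF8Type"
                  CompareSubcolumnsWith="BytesType"
                  Comment="#{comment}"/>
  XML
end

# Hypothetical helper: wraps the declarations for several repositories in a
# single <Keyspace> element. Names are interpolated as-is (no escaping).
def keyspace_xml(keyspace, column_families)
  "  <Keyspace Name=\"#{keyspace}\">\n" +
    column_families.map { |name| column_family_xml(name) }.join +
    "  </Keyspace>\n"
end

puts keyspace_xml("MyApplication", ["MyRepository", "MyOtherRepository"])
```

The printed fragment would still have to be pasted into the Keyspaces section of storage-conf.xml by hand.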

Data Model

This storage adapter stores RDF data in a resource-centric manner by mapping RDF subject terms to Cassandra row keys, RDF predicates to Cassandra supercolumns, and RDF object terms to Cassandra columns as follows:

{key     => {supercolumn => {column    => value }}}   # Cassandra terminology
{subject => {predicate   => {object_id => object}}}   # RDF terminology

RDF object terms are stored using their canonical N-Triples serialization and are uniquely identified by the binary SHA-1 fingerprint of that representation.
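
As a rough illustration of that identification scheme, the sketch below derives a column name from an object term using Ruby's standard Digest library. The helper name fingerprint is hypothetical, and the object is assumed to be already serialized as an N-Triples string; the adapter's own code may differ in detail.

```ruby
require 'digest/sha1'

# Hypothetical helper: returns the raw (binary) SHA-1 digest of an RDF object
# term that has already been serialized as an N-Triples string.
def fingerprint(ntriples)
  Digest::SHA1.digest(ntriples)
end

object       = '<http://usefulinc.com/ns/doap#Project>'
column_name  = fingerprint(object)  # 20 raw bytes
column_value = object               # the N-Triples string itself

puts column_name.bytesize            # prints 20
puts column_name.unpack('H*').first  # the same digest as 40 hex characters
```

Because the column name depends only on the serialized object, the same object term always maps to the same column under a given subject and predicate.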

For example, here's how some of RDF.rb's DOAP data would be stored using the RDF::Cassandra data model (the binary object identifiers are shown here in hexadecimal):

{
  "http://rdf.rubyforge.org/" => {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" => {
      "c0b66f5e31ec616497404f044ff0eaa210f21232" => "<http://usefulinc.com/ns/doap#Project>",
    },
    "http://usefulinc.com/ns/doap#developer" => {
      "9d178ddaa88acfec63f812aa270b42291381b4ff" => "<http://ar.to/#self>",
      "908b42dd9d1a3f5ac5ecf9540e1f9a753f444204" => "<http://bhuga.net/#ben>",
      ...
    },
    ...
  },
  "http://ar.to/#self" => {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" => {
      "74a5c03994aacac0a36003afb61aaf7befc438fd" => "<http://xmlns.com/foaf/0.1/Person>",
    },
    "http://xmlns.com/foaf/0.1/name" => {
      "f369f748e964ef2b82160d6389b63fb55949b464" => '"Arto Bendiken"',
    },
    ...
  },
  "http://bhuga.net/#ben" => {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" => {
      "74a5c03994aacac0a36003afb61aaf7befc438fd" => "<http://xmlns.com/foaf/0.1/Person>",
    },
    "http://xmlns.com/foaf/0.1/name" => {
      "97325e589ac0194e74848090181b66b0db310750" => '"Ben Lavender"',
    },
    ...
  },
}
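
The nesting shown above can be reproduced in plain Ruby without a Cassandra server. This is a minimal sketch under stated assumptions: the helper name to_rows is hypothetical, subjects and predicates are given as plain URI strings, objects are given as N-Triples strings, and identifiers are rendered in hexadecimal for readability.

```ruby
require 'digest/sha1'

# Hypothetical helper: groups [subject, predicate, object] triples into the
# {subject => {predicate => {object_id => object}}} layout.
def to_rows(triples)
  rows = Hash.new { |by_subject, s| by_subject[s] = Hash.new { |by_pred, pr| by_pred[pr] = {} } }
  triples.each do |subject, predicate, object|
    # Hex digest used here for readability only.
    rows[subject][predicate][Digest::SHA1.hexdigest(object)] = object
  end
  rows
end

triples = [
  ['http://rdf.rubyforge.org/', 'http://usefulinc.com/ns/doap#developer', '<http://ar.to/#self>'],
  ['http://rdf.rubyforge.org/', 'http://usefulinc.com/ns/doap#developer', '<http://bhuga.net/#ben>'],
  ['http://ar.to/#self',        'http://xmlns.com/foaf/0.1/name',         '"Arto Bendiken"'],
]

rows = to_rows(triples)
puts rows['http://rdf.rubyforge.org/']['http://usefulinc.com/ns/doap#developer'].size  # prints 2
```

In this layout, adding the same triple twice overwrites the same column instead of creating a duplicate, since the column name is derived from the object itself.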

To learn more about Cassandra's data model, read "WTF is a SuperColumn?".

Documentation

http://rdf.rubyforge.org/cassandra/

Dependencies

Installation

The recommended installation method is via RubyGems. To install the latest official release of the RDF::Cassandra gem, do:

% [sudo] gem install rdf-cassandra

Download

To get a local working copy of the development repository, do:

% git clone git://github.com/bendiken/rdf-cassandra.git

Alternatively, you can download the latest development version as a tarball as follows:

% wget http://github.com/bendiken/rdf-cassandra/tarball/master

Authors

License

RDF::Cassandra is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.