Exposing databases as linked data using clojure, compojure and R2RML

The web is increasingly becoming a web of data.

Web APIs are ubiquitous nowadays. Transferring application data between web servers and complex JavaScript applications running inside web browsers, mobile phones, tablets and so on is one of the main problems engineers face in modern web development.

The current state of affairs in the building of web APIs is far from ideal. Web applications read data from a mixture of relational and non-relational data sources and transform it into some kind of transport format, usually JSON.
These data are then retrieved by web clients through a more or less RESTful API and processed in the client application. Some cutting-edge frameworks for building JavaScript applications, like Sproutcore, offer support for a datastore in the web client where JSON-encoded data can be saved and manipulated using their own query language.

Developers not relying on such frameworks must make up their own schemas to process the retrieved API data.

This approach to web API design has many drawbacks. For example, it is hard to mix JSON objects from different applications describing the same kind of information. There is no standard way to link resources inside JSON objects. It is also difficult to write reusable generic libraries to handle this kind of data, or even to write framework code that handles the discovery, querying and manipulation of web data APIs, as the Sproutcore example mentioned above shows.

Some of these problems could be solved using technologies developed as part of the semantic web initiative over the past decade. People have started referring to this pragmatic approach to the semantic web with a new title: Linked Data. The pragmatic approach here means putting less emphasis on the inference and ontological layers of the semantic web and focusing on offering a simple way to expose data on the web, linking resources across different web applications and data sources.

Many interesting technologies are being developed under the linked data moniker or are commonly associated with it, RDFa for instance. Another of these technologies is R2RML: RDB to RDF Mapping Language.

R2RML describes a standard vocabulary to lift relational data to an RDF graph. It also provides a standard mapping for the relational data. This RDF graph can be serialized to some transport format (RDFa, Turtle, RDF/XML) and then retrieved by the client. The client can store the triples of the graph locally and use the standard query language SPARQL to retrieve the information. Data from different applications using the same vocabulary (FOAF, GoodRelations) can be easily mixed and manipulated by the client in the same triple store. Furthermore, links to other resources can be inserted inside the RDF graph, leading to the discovery of additional information.
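
To make the idea of mixing data in one triple store concrete, here is a minimal sketch in plain Clojure. It assumes triples are modeled as [subject predicate object] vectors and a store as a set of them. The names app-a, app-b, store and match, as well as the example IRIs, are hypothetical; a real client would rely on an RDF library and SPARQL rather than on this toy pattern matcher.

```clojure
(require '[clojure.set :as set])

;; Triples published by two hypothetical applications sharing the FOAF vocabulary.
(def app-a
  #{["http://a.example/people/1" "foaf:name"  "Alice"]
    ["http://a.example/people/1" "foaf:knows" "http://b.example/users/bob"]})

(def app-b
  #{["http://b.example/users/bob" "foaf:name" "Bob"]})

;; Merging two graphs is just the union of their triples.
(def store (set/union app-a app-b))

(defn match
  "Returns the triples in `store` that match the pattern [s p o].
   A nil position in the pattern acts as a wildcard."
  [store [s p o]]
  (filter (fn [[ts tp to]]
            (and (or (nil? s) (= s ts))
                 (or (nil? p) (= p tp))
                 (or (nil? o) (= o to))))
          store))

;; All names, regardless of which application published them.
(match store [nil "foaf:name" nil])
```

Because both applications use the same predicate for names, a single pattern retrieves data from both without any application-specific schema.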


clj-r2rml is a small Clojure library that implements R2RML and can be used inside web applications.

Let’s see an example of how R2RML could be used in a web application. Imagine we have the following database table:

+--------+-----------+----------+
| deptno | dname     | loc      |
+--------+-----------+----------+
|     10 | APPSERVER | NEW YORK |
|     11 | APPSERVER | BOSTON   |
+--------+-----------+----------+

We can describe a mapping for this table with the following RDF graph, expressed in Turtle syntax:


# The subject IRI is missing from this listing; <#TriplesMap1> is assumed here.
<#TriplesMap1>
    a rr:TriplesMap;
    rr:logicalTable """
       Select ('_:Department' || deptno) AS deptid
            , deptno
            , dname
            , loc
         from dept
       """;
    rr:class xyz:dept;
    rr:tableGraphIRI xyz:DeptGraph;
    rr:subjectMap [ a rr:BlankNodeMap; 
                          rr:column "deptid" ];
    rr:propertyObjectMap [ rr:property dept:deptno; rr:column "deptno"; rr:datatype xsd:positiveInteger ];
    rr:propertyObjectMap [ rr:property dept:name; rr:column "dname" ];
    rr:propertyObjectMap [ rr:property dept:location; rr:column "loc" ];
    rr:propertyObjectMap [ rr:property dept:COMPANY; rr:constantValue "XYZ Corporation" ];
.

The mapping takes a logical table, produced by the SELECT SQL query, and generates triples for the columns and values of the table according to the rules in the R2RML specification.

Using clj-r2rml, this mapping can be expressed as a Clojure map:

{:logical-table   "select concat('_:Department',deptno) AS deptid, deptno, dname, loc from Dept"
 :class           "xyz:dept"
 :table-graph-iri "xyz:DeptGraph"
 :subject-map     {:column "deptid"}
 :property-object-map [{:property "dept:deptno"
                        :column   "deptno"
                        :datatype "xsd:positiveInteger"}
                       {:property "dept:name"
                        :column   "dname"}
                       {:property "dept:location"
                        :column   "loc"}
                       {:property       "dept:COMPANY"
                        :constant-value "XYZ Corporation"}]}
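
To give an intuition of how a map like the one above can drive triple generation, here is a minimal sketch in plain Clojure that applies a simplified mapping to a single row. It is not how clj-r2rml is implemented: row->triples and dept-mapping are hypothetical names, rows are assumed to be maps from column name to value, literals are modeled as maps with a :value and an optional :datatype, and the logical table query and graph IRI are ignored.

```clojure
;; A simplified mapping using the same keys as the clj-r2rml map.
(def dept-mapping
  {:class       "xyz:dept"
   :subject-map {:column "deptid"}
   :property-object-map [{:property "dept:deptno"
                          :column   "deptno"
                          :datatype "xsd:positiveInteger"}
                         {:property "dept:name"
                          :column   "dname"}
                         {:property       "dept:COMPANY"
                          :constant-value "XYZ Corporation"}]})

(defn row->triples
  "Turns one row (a map of column name to value) into [s p o] vectors."
  [mapping row]
  (let [subject (get row (get-in mapping [:subject-map :column]))]
    (concat
     ;; One triple per property-object map: a column value or a constant.
     (for [{:keys [property column constant-value datatype]}
           (:property-object-map mapping)]
       (let [value (if column (get row column) constant-value)]
         [subject
          property
          (cond-> {:value (str value)}
            datatype (assoc :datatype datatype))]))
     ;; Plus the rdf:type triple derived from :class.
     [[subject "rdf:type" (:class mapping)]])))

(row->triples dept-mapping
              {"deptid" "_:Department10"
               "deptno" 10
               "dname"  "APPSERVER"
               "loc"    "NEW YORK"})
```

Each entry in :property-object-map contributes exactly one triple per row, which is why the generated graph grows linearly with the size of the logical table.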

The mapping data will be used by clj-r2rml to generate the correct RDF graph. This graph can later be exposed behind a RESTful interface using Compojure:

(ns clj-r2rml-test.core
  (:use compojure.core)
  (:use clj-r2rml.core)
  (:require [compojure.route :as route]))


(def *db-spec*
     {:classname   "com.mysql.jdbc.Driver"
      :subprotocol "mysql"
      :user        "root"
      :password    ""
      :subname     "//localhost:3306/rdftests"})

(def *context* (make-context *db-spec* {}))

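(defn- build-query
  "Builds the SELECT for the logical table, optionally restricted to one department."
  ([id]
     (let [base "select concat('_:Department',deptno) AS deptid, deptno, dname, loc from Dept"]
       (if (nil? id)
         base
         ;; Parse the id as an integer so request input is never spliced raw
         ;; into the SQL; a non-numeric id throws NumberFormatException.
         (str base " where deptno=" (Integer/parseInt id))))))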

(def *additional-ns*
     {"dept" "http://test-clj-r2rml#"
      "xyz"  "http://test-clj-r2rml-xyz#"})

(defn- mapping
  ([] (mapping nil))
  ([id]
     [{:logical-table   (build-query id)
       :class           "xyz:dept"
       :table-graph-iri "xyz:DeptGraph"
       :subject-map     {:column "deptid"}
       :property-object-map [{:property "dept:deptno"
                              :column   "deptno"
                              :datatype "xsd:positiveInteger"}
                             {:property "dept:name"
                              :column   "dname"}
                             {:property "dept:location"
                              :column   "loc"}
                             {:property       "dept:COMPANY"
                              :constant-value "XYZ Corporation"}]}]))

(defroutes example
  (GET "/departments/:id" [id]
       (let [triples (:results (run-mapping (mapping id) *context* *additional-ns*))]
         {:status 200
          :headers {"Content-Type" "text/turtle"}
          :body (to-rdf *additional-ns* triples)}))

  (GET "/departments" []
       (let [triples (:results (run-mapping (mapping) *context* *additional-ns*))]
         {:status 200
          :headers {"Content-Type" "text/turtle"}
          :body (to-rdf *additional-ns* triples)}))

  (route/not-found "Page not found"))

We can run this web application directly from the REPL:

REPL started; server listening on localhost:2178.
user=> (use 'clj-r2rml-test.core)      
nil
user=> (use 'ring.adapter.jetty)       
nil
user=> (run-jetty example {:port 8080})
2010-12-09 21:10:13.060::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-12-09 21:10:13.061::INFO:  jetty-6.1.14
2010-12-09 21:10:13.078::INFO:  Started SocketConnector@0.0.0.0:8080

And retrieve the RDF graph for all the departments:

$ curl -X GET http://localhost:8080/departments
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> . 
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . 
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
@prefix dept: <http://test-clj-r2rml#> . 
@prefix xyz:  <http://test-clj-r2rml-xyz#> . 
_:Department10 dept:deptno "10"^^xsd:positiveInteger .
_:Department10 dept:name "APPSERVER" .
_:Department10 dept:location "NEW YORK" .
_:Department10 dept:COMPANY "XYZ Corporation" .
_:Department11 dept:deptno "11"^^xsd:positiveInteger .
_:Department11 dept:name "APPSERVER" .
_:Department11 dept:location "BOSTON" .
_:Department11 dept:COMPANY "XYZ Corporation" .
_:Department10 rdf:type xyz:dept .
_:Department11 rdf:type xyz:dept .

Or for a single department:

$ curl -X GET http://localhost:8080/departments/10
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> . 
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . 
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
@prefix dept: <http://test-clj-r2rml#> . 
@prefix xyz:  <http://test-clj-r2rml-xyz#> . 
_:Department10 dept:deptno "10"^^xsd:positiveInteger .
_:Department10 dept:name "APPSERVER" .
_:Department10 dept:location "NEW YORK" .
_:Department10 dept:COMPANY "XYZ Corporation" .
_:Department10 rdf:type xyz:dept .
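
The Turtle responses above are essentially a list of prefix declarations followed by one line per triple. As a minimal sketch of that serialization step, and not the actual to-rdf function of clj-r2rml, the following plain Clojure code renders [subject predicate object] vectors as Turtle lines. The function names are hypothetical, and the sketch assumes that literals are maps with a :value and an optional :datatype, while resources and blank nodes are strings written out as they are.

```clojure
(require '[clojure.string :as str])

(defn object->turtle
  "Renders an object term: literal maps are quoted, anything else is emitted as is."
  [o]
  (if (map? o)
    (let [{:keys [value datatype]} o
          ;; Escape backslashes and double quotes inside the literal.
          quoted (str \" (str/escape (str value) {\" "\\\"" \\ "\\\\"}) \")]
      (if datatype (str quoted "^^" datatype) quoted))
    o))

(defn triple->turtle
  "Renders one [s p o] vector as a Turtle statement."
  [[s p o]]
  (str s " " p " " (object->turtle o) " ."))

(defn graph->turtle
  "Renders prefix declarations followed by one line per triple."
  [prefixes triples]
  (str/join "\n"
            (concat (for [[prefix iri] prefixes]
                      (str "@prefix " prefix ": <" iri "> ."))
                    (map triple->turtle triples))))

(println
 (graph->turtle {"dept" "http://test-clj-r2rml#"}
                [["_:Department10" "dept:deptno" {:value "10" :datatype "xsd:positiveInteger"}]
                 ["_:Department10" "dept:name" {:value "APPSERVER"}]
                 ["_:Department10" "rdf:type" "xyz:dept"]]))
```

A production serializer would also have to handle language tags, full IRIs in angle brackets and the remaining Turtle escape sequences, which this sketch leaves out.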

clj-r2rml is just a tentative implementation of the recommendation, written with the sole goal of studying the standard. However, if you take a look at the list of planned implementations for R2RML, it should not take long until we see robust implementations (the presence of Alex Miller in the list will hopefully mean a production-ready Clojure/Java implementation in the future).

R2RML, together with other projects like JSON-LD, the building of new libraries to work effectively with semantic technologies in browsers and mobile devices, and the interest in linked data, will hopefully mean a new push for the semantic web and, maybe, better ways of building web APIs.
