Embedding V8 in a Clojure SPARQL library

Using JavaScript in Java applications is a common practice. The easiest way to achieve this integration is the excellent Rhino library from the Mozilla Foundation. This library is an implementation of JavaScript in Java, so interop from both sides is really easy.
Nevertheless, when I needed to reuse, in a different Clojure project, a JavaScript implementation of a SPARQL 1.1 parser I had been working on, I decided to try a different approach and use Google’s V8 JavaScript engine. These are the results of this small experiment.

The V8 project has some instructions about how to embed the engine into a C++ application. V8 exposes JavaScript objects to C++ code through ‘handles’ referencing these objects. Objects can be created and released from C++, but instead of tracking each object individually, they can be grouped into ‘handle scopes’. This mechanism simplifies the memory management of JavaScript objects, since all the objects associated with a scope can be garbage-collected by V8 once the scope is deleted. Additionally, in the same way that a C++ object can be created on the heap or on the stack, V8 handles can be persistent or transient (local).
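The scoping idea can be illustrated without V8 at all. The following is a toy model, not the V8 API: the names ToyHeap and ToyHandleScope are invented for this illustration. Every object allocated through a scope is released in one step when that scope is destroyed, which is roughly what a handle scope does for transient handles.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an engine-managed heap. Hypothetical, not a V8 class.
class ToyHeap {
public:
  // Number of objects currently alive.
  std::size_t live() const { return objects_.size(); }

private:
  friend class ToyHandleScope;
  std::vector<std::unique_ptr<std::string> > objects_;
};

// Toy handle scope: everything allocated through the scope is released
// together when the scope object is destroyed (RAII).
class ToyHandleScope {
public:
  explicit ToyHandleScope(ToyHeap& heap)
      : heap_(heap), mark_(heap.objects_.size()) {}

  // Drop every object created since this scope was opened.
  ~ToyHandleScope() { heap_.objects_.resize(mark_); }

  // Allocate an object owned by the heap and return a transient handle.
  // The handle is only valid while this scope is alive.
  std::string* create(const std::string& value) {
    heap_.objects_.push_back(
        std::unique_ptr<std::string>(new std::string(value)));
    return heap_.objects_.back().get();
  }

private:
  ToyHeap& heap_;
  std::size_t mark_;
};
```

Scopes nest like stack frames in this model: closing an inner scope releases only the objects created inside it, while objects created in the enclosing scope stay alive.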

V8 objects are created and manipulated in a certain ‘execution context’. These contexts are persistent between function calls, and a scope to declare or manipulate additional JS objects can always be retrieved from the context object. C data structures and functions can also be used in JS code using ‘templates’ that wrap the C object in the JS execution context.

As an example, the following code shows the constructor of a C++ class that loads the JS parser code from a file, initializes a persistent execution context, and evaluates the JS script in that context. As a result, the JS parsing function will be available in that V8 execution context.

SparqlParser::SparqlParser() {

  // load the parser js code into a string
  // (requires <fstream>, <sstream> and <string>)
  std::ifstream fin("sparql_parser.js", std::ios::in);
  std::stringstream buffer;
  buffer << fin.rdbuf();
  fin.close();
  std::string script = buffer.str();

  // The context where the parser will be executed
  context = Context::New();
  HandleScope handle_scope;
  Context::Scope context_scope(context);

  // compiles the parser script and runs it, defining the parser function
  Handle<Script> parserScript = Script::Compile(String::New(script.c_str()));
  parserScript->Run();

}

The parse method of the same class builds the call to the JS parsing function, opens a new scope in the persistent execution context, and runs the call:

std::string SparqlParser::parse(std::string query) {
  Context::Scope context_scope(context);
  HandleScope handle_scope;

  // builds the parser function call on the global object
  // (the query must not contain unescaped single quotes)
  std::string query_string = "sparql_query('";
  query_string.append(query);
  query_string.append("');");

  Handle<Script> runnerScript = Script::Compile(String::New(query_string.c_str()));
  Handle<Value> result = runnerScript->Run();

  if (!result.IsEmpty() && !result->IsUndefined()) {
    // ToCString is assumed to be the helper from the V8 sample shell
    // that unwraps a Utf8Value into a const char*
    String::Utf8Value str(result);
    std::string toReturn = std::string(ToCString(str));

    return toReturn;
  } else {
    std::cerr << "Something went wrong" << std::endl;
    // an empty string signals failure; a std::string cannot be built from NULL
    return std::string();
  }
}

The destructor of the class just releases the persistent execution context.
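One caveat with building the call as a string in the parse method: a query that contains a single quote, a backslash, or a line break produces invalid JavaScript, and SPARQL literals use quotes routinely. Below is a minimal sketch of an escaping helper, assuming the query is embedded in a single-quoted JS string literal as in the code above. The names escape_js_single_quoted and build_parser_call are hypothetical and not part of the original code, and the sketch only covers the most common problem characters.

```cpp
#include <string>

// Hypothetical helper: escapes text so that it can be placed between
// single quotes in JavaScript source code.
std::string escape_js_single_quoted(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    const char c = input[i];
    switch (c) {
      case '\\': out += "\\\\"; break;  // backslash starts an escape
      case '\'': out += "\\'";  break;  // single quote would end the literal
      case '\n': out += "\\n";  break;  // raw line breaks are not allowed
      case '\r': out += "\\r";  break;
      default:   out += c;
    }
  }
  return out;
}

// Builds the same call string that the parse method evaluates,
// with the query escaped first.
std::string build_parser_call(const std::string& query) {
  return "sparql_query('" + escape_js_single_quoted(query) + "');";
}
```

With a helper like this, the string concatenation in parse could be replaced by a single call to build_parser_call(query).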

To be able to use this code from a Clojure function, a mechanism to execute native code from Java must be used. The main alternatives are the Java Native Interface (JNI) and the newer Java Native Access (JNA). JNA offers a simpler and cleaner integration with Java code, and is well supported in Clojure and Leiningen thanks to projects like clojure-jna and clj-native. Unfortunately, JNA is oriented towards plain C APIs, and using C++ code like V8 requires writing a C interface layer on top of the C++ classes. Taking this into account, I decided to write a JNI interface consisting of a Java class that wraps the C++ code through the C interface that JNI generates automatically.
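For comparison, the C layer that the JNA route would have required usually takes the form of a few extern "C" functions around an opaque pointer. This is only a sketch of that alternative, with a stub class standing in for the V8-backed parser. Every name in it (StubParser, parser_handle, parser_new, parser_parse, parser_free_string, parser_free) is hypothetical.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Stub standing in for the C++ class; the real one would call into V8.
class StubParser {
public:
  std::string parse(const std::string& query) { return "parsed:" + query; }
};

extern "C" {

// Opaque handle: C (and JNA) callers never see the C++ type.
typedef struct parser_handle parser_handle;

parser_handle* parser_new() {
  return reinterpret_cast<parser_handle*>(new StubParser());
}

// Returns a malloc-allocated, NUL-terminated copy of the result.
// The caller releases it with parser_free_string.
char* parser_parse(parser_handle* handle, const char* query) {
  StubParser* parser = reinterpret_cast<StubParser*>(handle);
  std::string result = parser->parse(query);
  char* copy = static_cast<char*>(std::malloc(result.size() + 1));
  if (copy != NULL) {
    std::memcpy(copy, result.c_str(), result.size() + 1);
  }
  return copy;
}

void parser_free_string(char* text) { std::free(text); }

void parser_free(parser_handle* handle) {
  delete reinterpret_cast<StubParser*>(handle);
}

}  // extern "C"
```

The extra bookkeeping (opaque handles, explicit string ownership) is the cost of the C layer; with JNI, the generated header already fixes the function signatures.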

This is the Java wrapper class:

package sparqltest;

public class SparqlParser {
    static {
        System.loadLibrary("SparqlParserWrapper");
    }

    private native void init();
    private native String parse_query(String query);

    private static SparqlParser parser;

    private SparqlParser() {
        init();
    }

    public String _parse(String query) {
        return parse_query(query);
    }

    // synchronized so that the lazy initialization and the calls into
    // the native parser are never executed by two threads at once
    public static synchronized String parse(String query) {
        if (parser == null) {
            parser = new SparqlParser();
        }
        return parser._parse(query);
    }
}

And this is the implementation of the C interface declared in the generated JNI header, with the usual conversion between wrapped Java types and C types:

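#include <jni.h>
#include <stdio.h>
#include <string>
#include "sparql_parser.h"
#include "sparqltest_SparqlParser.h"

// Created in init() so the parser script is loaded exactly once and the
// persistent context is not released by a temporary object's destructor
static SparqlParser *parser = NULL;

JNIEXPORT void JNICALL Java_sparqltest_SparqlParser_init (JNIEnv *env, jobject obj) {
  if (parser == NULL) {
    parser = new SparqlParser();
  }
}

JNIEXPORT jstring JNICALL Java_sparqltest_SparqlParser_parse_1query (JNIEnv *env, jobject obj, jstring javaQuery) {

  const char *query = env->GetStringUTFChars(javaQuery, 0);
  if (query == NULL) {
    return NULL; // an OutOfMemoryError is already pending in the JVM
  }
  std::string result = parser->parse(std::string(query));
  // the JNI buffer must be released once the query has been copied
  env->ReleaseStringUTFChars(javaQuery, query);

  jstring javaResult = env->NewStringUTF(result.c_str());

  return javaResult;
}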

After compiling and packaging the Java wrapper in a Jar file, we are ready to test the code from Clojure. Before that, the C++ code must have been compiled into a library that will be loaded by the JNI framework. The library must be located somewhere in the load path of JNI, as must the V8 library if we have decided to compile it as a shared library. This path can be configured in the Leiningen project specification using the :native-path keyword (equivalent to using the -Djava.library.path argument in the Java invocation).
Since the C++ code reads the JS parser from disk using a relative path, the file containing the parser must also be located in the directory from which the application is run.
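A missing parser file otherwise only shows up later, as a failed script evaluation, so it can help to fail early when loading it. The following sketch reads a whole file into a std::string and throws if the file cannot be opened. The function name read_file is hypothetical and this is not the loading code used in the constructor shown earlier.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the full contents of `path`, or throws
// std::runtime_error when the file cannot be opened.
std::string read_file(const std::string& path) {
  std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  std::ostringstream contents;
  contents << in.rdbuf();
  return contents.str();
}
```

If a helper like this were called from the JNI layer, the exception would have to be caught in C++ before returning to Java, since C++ exceptions must not propagate across the JNI boundary.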

If all the paths are OK and JNI can find the native libraries, the JS parser can be invoked without problems from Clojure code:

(ns clj-sparql-test.core
  (:import sparqltest.SparqlParser)
  (:use [clojure.contrib.json :only [read-json]]))

(defn parse-sparql [query]
  (read-json (SparqlParser/parse query)))

A sample invocation:

  user> (use 'clj-sparql-test.core)
  nil
  user> (parse-sparql "SELECT * { ?s ?p ?o }")
  [{:token "prologue",
    :base "",
    :prefixes []}
   {:kind "select",
    :token "executableunit",
    :dataset [],
    :projection [{:token "variable", :kind "*"}],
    :pattern {:token "groupgraphpattern",
              :patterns [{:token "basicgraphpattern",
                          :triplesContext 
                                [{:subject {:token "var", :value "s"},
                                  :predicate {:token "var", :value "p"},
                                  :object {:token "var", :value "o"}}]}],
                          :filters []}}]

The code of this experiment, along with more details about building the different components, can be found here.
