visualizing Mahout’s output with Clojure and Incanter

Some Clojure code to visualize clusters built using Apache Mahout implementation of the K-Means clustering algorithm.

The code retrieves the output of the algorithm (clustered-points and centroids) from HDFS, builds a Clojure friendly representation of the output (a map and a couple of lazy-seqs) and finally uses Incanter’s wrapper around JFreeChart to visualize the results.

A sample execution using the output generated by the example from Mahout’s documentation:

(use 'mahout-vis.core)

(bootstrap! "/Users/antonio/Development/tmp/hadoop-0.20.2/
conf/core-site.xml")

(def *results* 
  (k-means-output "output/clusters-9/part-r-00000" 
                  "output/clusteredPoints/part-m-00000"))

(visualize-plots 
  (compute-comps *results* [5 10] [15 50] 
                           {:display-centroids true}))

The output of the previous code are 4 frames displaying the clusters for the components with indices 5,10,15 and 50 of the input data.

Other visualizations can be generated interactively from Clojure’s REPL. Another example of how Clojure can provide an interactive and powerful interface to complex Java systems.

Leave a comment