Visualizing Mahout’s output with Clojure and Incanter

Here is some Clojure code to visualize clusters built with Apache Mahout’s implementation of the K-Means clustering algorithm.

The code retrieves the output of the algorithm (clustered points and centroids) from HDFS, builds a Clojure-friendly representation of that output (a map and a couple of lazy seqs), and finally uses Incanter’s wrapper around JFreeChart to visualize the results.
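To make the "Clojure-friendly representation" step more concrete, here is a minimal sketch in plain Clojure. The data shape (`:centroids`, `:points`, `:cluster`, `:coords`) and the helper names `project` and `points-by-cluster` are assumptions made for illustration only; the actual structures built by mahout-vis may well differ.

```clojure
;; Hypothetical in-memory shape for K-Means output; made-up sample values,
;; not real Mahout data and not necessarily the shape mahout-vis uses.
(def sample-output
  {:centroids {0 [1.0 1.0 0.0]
               1 [8.0 9.0 1.0]}
   :points    [{:cluster 0 :coords [1.2 0.8 0.1]}
               {:cluster 0 :coords [0.9 1.1 0.0]}
               {:cluster 1 :coords [7.9 9.2 1.1]}]})

(defn project
  "Keeps only components i and j of vector v, as an [x y] pair."
  [v i j]
  [(nth v i) (nth v j)])

(defn points-by-cluster
  "Projects every point onto components i and j, then groups the
  resulting [x y] pairs by cluster id."
  [output i j]
  (->> (:points output)
       (map (fn [{:keys [cluster coords]}]
              [cluster (project coords i j)]))
       (group-by first)
       (reduce-kv (fn [m k pairs] (assoc m k (mapv second pairs))) {})))
```

Calling `(points-by-cluster sample-output 0 1)` yields a map from cluster id to the 2D points of that cluster, which is a convenient input for a scatter plot.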

A sample execution using the output generated by the example from Mahout’s documentation:

(use 'mahout-vis.core)

(bootstrap! "/Users/antonio/Development/tmp/hadoop-0.20.2/conf/core-site.xml")

(def *results* 
  (k-means-output "output/clusters-9/part-r-00000" 
                  "output/clusteredPoints/part-m-00000"))

(visualize-plots 
  (compute-comps *results* [5 10] [15 50] 
                           {:display-centroids true}))

The previous code produces 4 frames displaying the clusters for the components with indices 5, 10, 15 and 50 of the input data.
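For readers who want to see what a single frame of this kind involves, the following is a rough sketch that calls Incanter directly. It assumes Incanter is on the classpath, uses made-up sample data rather than Mahout output, and the name `cluster-chart` is hypothetical; it is not how mahout-vis is implemented.

```clojure
(require '[incanter.core :refer [view]]
         '[incanter.charts :refer [scatter-plot add-points]])

;; Made-up sample data: two components of four points, their cluster ids,
;; and the per-cluster means used here as stand-in centroids.
(def xs [1.2 0.9 7.9 8.3])
(def ys [0.8 1.1 9.2 8.7])
(def cluster-ids [0 0 1 1])
(def centroid-xs [1.05 8.1])
(def centroid-ys [0.95 8.95])

(defn cluster-chart
  "Builds a JFreeChart scatter plot with one series per cluster and an
  extra series for the centroids."
  [x-label y-label]
  (-> (scatter-plot xs ys
                    :group-by cluster-ids
                    :x-label x-label
                    :y-label y-label
                    :legend true)
      (add-points centroid-xs centroid-ys :series-label "centroids")))

;; Opens a window with the chart when evaluated at the REPL:
;; (view (cluster-chart "component 5" "component 10"))
```

The `:group-by` option gives each cluster its own series, and therefore its own color, while `add-points` overlays the centroids on the same chart.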

Other visualizations can be generated interactively from Clojure’s REPL. This is another example of how Clojure can provide an interactive and powerful interface to complex Java systems.
