Monte Carlo integration with Clojure and Mahout

Monte Carlo simulations can be used to approximate the value of different mathematical models where computing an analytical solution is an intractable problem or it is impossible to find because of the uncertainty in some variables.

Conceptually Monte Carlo simulations are simple.
They are based in the repeated computation of the model where the value for the random variables are extracted from some known probability distribution.
Finally, the results of all the simulations are averaged to obtain the approximated value.

The two main requirements to run a Monte Carlo simulation, provided you already have a mathematical model of the proble, are a good random number generator and some tool to model the different probability distributions. The Apache Mahout project has implemented these tools. It includes a powerful pseudo-random numbers generator that passes all the DieHard randomness tests. It also includes Java classes that make easy to work with different probability distributions.

This post is a small example of how to use these classes from Clojure as a convenient scripting language to work with Mahout. The example implemented is extracted from this excellent introductory book published by Springer.
It consists of computing the integral for the following function:

  (defn h
    ([x] (pow (+ (cos (* 50 x)) (sin (* 20 x))) 2)))

We can use Incanter to visualize this function:

  (use 'incanter.core)
  (use 'incanter.charts)

  (view  (function-plot h 0 1))

The integral can be computed anallytically obtaining the value 0.965.

The montcarlo approach is based in extracting iid variables from an uniform probability distribution U(0,1), and approximate the value with the function:

  (def *solution* (/ (reduce + *randoms*) *trials*))

Where *randoms* is the collection of randomly generated iid variables and *trials* is the number of variables generated.

In order to generate these values we can use the org.apache.mahout.math.jet.random.Uniform Mahout class:

  (import 'org.apache.mahout.math.jet.random.Uniform)
  (def *unif* (Uniform. (double 0.0)
                        (double 1.0)
                        (mod (.getTime (java.util.Date.)) 10000)))

  (def *trials* 100000)

  (def *randoms* (take *trials* (repeatedly #(h (.nextDouble *unif*)))))

To check that the values generated, we can plot the generated values using Incanter:

  (view (bar-chart (range 0 *trials*) *randoms* ))

The computed value is 0.9651838354718588, very close to the actual solution of 0.965.

We can also compute the running averages and plot them to see how the algorithm converges to the solution:

  (def *averages* (reduce (fn [ac i]
                            (conj ac (/ (+ (last ac) (nth *randoms* i)) (inc i))))
                          (range 0 *trials*)))

  (view  (function-plot (fn [x] (nth *averages* (Math/round (float x)))) 0 *trials*))

Monte Carlo simulations are a simple yet very powerful idea that can be applied to a lots of different of situations. If you want to learn more about Monte Carlo methods, these are some useful resources:

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s