OCR with Clojure, Tesseract and OpenCV

Last week I’ve been playing a little bit with computer vision and OCR. My ultimate goal was to digitalize the pile of bills I’ve stored since I moved to the UK and try to make some sense of the figures.

All my knowledge about computer vision was some previous experience with OpenCV. This wonderful library stores tons of algorithms for image transformation besides some support for utility functions to work with devices and basic UI widgets.

Research on open source OCR software led me to Tesseract, a robust OCR software originally developed by HP and now mantained by Google.

After writting some software for pre-processing bill pictures taken with an iPhone4 camera using OpenCV and then, attempting text capture with Tesseract, I had built a small C++ library that I intended to wrap using JNA so I could experiment easilly from the Clojure REPL trying different ways of preprocessing the image.

At this point I found a Clojure library called Vision by Nurullah Akkaya that wraps a good part of OpenCV in Clojure. Thanks to Vision I could just concentrate in wrapping the OCR parts exchanging pointers between both libraries in Clojure using JNA.

The clojure wrapper for Tesseract can be found at github with the instructions to install it.

Using the library is quite straightforward:

(use 'vision.core)
(use 'clj-tesseract.core)

;; create an API instance
(def *api* (make-tesseract "/tmp"))

We then can upload an image to attempt OCR or just grab one from the webcam using OpenCV/Vision

;; load an image
(def *img* (load-image "numbers_pre.tiff" :grayscale))

;; Webcam capture
(def *capture* (capture-from-cam 0))
(def *frame* (query-frame *capture*))

Once we have the image, we can try to capture text using the capture function.

(println (capture *api* *img*))
3 BALANCE Bus £1.96
CASH £1.96
CHANGE £0.00

(end *api*)

The capture function applies some very basic preprocessing, like converting the image to grayscae and applying a threshold:

Is preprocessed to the following image:

Custom preprocessing can be applied using the capture-no-preproces function.

The result was OK for my use case, although no perfect, but thanks to the power of OpenCV many clever ways of preprocessing the image can be tried to improve text recognition. Furthermore, Tesseract is fully trainable, so another interesting way of improving results for a certain applications is to create a new language for Tesseract and then train the OCR software for that language. Creating a new language can be a little bit tricky, but there are some software that can help creating the required files. Custom languages can be used in clj-tesseract, passing the name of the language as an argument to the make-tesseract function.


2 thoughts on “OCR with Clojure, Tesseract and OpenCV

  1. Great stuff!

    I’m looking at similar things for iPhone apps and I’m just wondering what kind of processing time you’re getting (i.e. how long it takes for the iPhone CPU to process the input image and output the OCR text)?

    For me, quick processing is essential for the app to be competitve, especially with thing like WordLens coming out that does OCR in real time.



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s