Over the last week I’ve been playing a little with computer vision and OCR. My ultimate goal was to digitize the pile of bills I’ve stored since I moved to the UK and try to make some sense of the figures.
All I knew about computer vision came from some previous experience with OpenCV. This wonderful library provides tons of image transformation algorithms, plus utility functions for working with capture devices and basic UI widgets.
Research on open source OCR software led me to Tesseract, a robust OCR engine originally developed by HP and now maintained by Google.
After writing some software that used OpenCV to pre-process bill pictures taken with an iPhone 4 camera and then attempted text capture with Tesseract, I had built a small C++ library. I intended to wrap it using JNA so I could easily experiment from the Clojure REPL with different ways of preprocessing the image.
At this point I found a Clojure library called Vision, by Nurullah Akkaya, that wraps a good part of OpenCV in Clojure. Thanks to Vision I could concentrate on wrapping just the OCR parts, exchanging pointers between both libraries in Clojure using JNA.
The Clojure wrapper for Tesseract can be found on GitHub, along with instructions to install it.
Using the library is quite straightforward:
(use 'vision.core)
(use 'clj-tesseract.core)

;; create an API instance
(def *api* (make-tesseract "/tmp"))
We can then load an image to attempt OCR, or just grab one from the webcam using OpenCV/Vision:
;; load an image
(def *img* (load-image "numbers_pre.tiff" :grayscale))

;; Webcam capture
(def *capture* (capture-from-cam 0))
(def *frame* (query-frame *capture*))
Once we have the image, we can try to capture text using the capture function:
(println (capture *api* *img*))

BASICS UM1 MILK £0.49
BASICS UHI MILK £0.49
CRAMULA15u SUGAR £0.98
3 BALANCE Bus £1.96
CASH £1.96
CHANGE £0.00

(end *api*)
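Since the end goal is to make sense of the figures, the captured text still has to be turned into numbers. The following is only a rough sketch in plain Clojure of how that could look. The names parse-line and parse-receipt are hypothetical helpers and not part of clj-tesseract, the line breaks in the sample text are my assumption about how the output is laid out, and treating the first three entries as the purchased items only holds for this particular receipt.

```clojure
(require '[clojure.string :as str])

(defn parse-line
  "Returns {:label ... :amount ...} when `line` ends in a pound amount
  such as £0.49, or nil when no amount is found. Hypothetical helper."
  [line]
  (when-let [[_ label amount] (re-matches #"(.*?)\s*£(\d+\.\d{2})\s*" line)]
    {:label (str/trim label)
     :amount (bigdec amount)}))   ; BigDecimal avoids float rounding on money

(defn parse-receipt
  "Splits OCR text into lines and keeps only the lines that carry an amount."
  [text]
  (keep parse-line (str/split-lines text)))

;; Sample text copied from the OCR output above (line breaks assumed).
(def sample-text
  (str/join "\n"
            ["BASICS UM1 MILK £0.49"
             "BASICS UHI MILK £0.49"
             "CRAMULA15u SUGAR £0.98"
             "3 BALANCE Bus £1.96"
             "CASH £1.96"
             "CHANGE £0.00"]))

(def entries (parse-receipt sample-text))

;; Assumption for this receipt only: the first three entries are the items.
(def items-total (reduce + (map :amount (take 3 entries))))
```

Comparing items-total against the amount on the balance line is a cheap sanity check: when they disagree, the OCR probably misread one of the prices.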
The capture function applies some very basic preprocessing, like converting the image to grayscale and applying a threshold:
The original picture is preprocessed into the following image:
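To make the idea of "grayscale plus threshold" concrete, here is a toy model of both steps in plain Clojure, working on nested vectors instead of real images. It is only an illustration under my own assumptions: rgb->gray, binarize and the cutoff of 128 are made-up names and values, and this is not how clj-tesseract, Vision or OpenCV implement it internally.

```clojure
;; Toy model of grayscale conversion and binary thresholding.
;; Illustrative only; real images would go through OpenCV instead.

(defn rgb->gray
  "Converts an [r g b] pixel (each 0-255) into one luminance value
  using the common 0.299/0.587/0.114 weighting."
  [[r g b]]
  (long (Math/round (double (+ (* 0.299 r) (* 0.587 g) (* 0.114 b))))))

(defn binarize
  "Maps every gray value above `cutoff` to 255 (white) and all
  other values to 0 (black)."
  [cutoff gray-rows]
  (mapv (fn [row] (mapv #(if (> % cutoff) 255 0) row))
        gray-rows))

;; A 2x2 'image': light paper on the left, dark ink on the right.
(def sample-image
  [[[250 250 250] [30 30 30]]
   [[200 180 160] [90 100 110]]])

(def gray   (mapv #(mapv rgb->gray %) sample-image))
(def binary (binarize 128 gray))
;; binary => [[255 0] [255 0]]
```

After thresholding, every pixel is either pure white or pure black, which removes shading and paper texture and leaves the OCR engine with cleaner glyph shapes. Picking the cutoff is the tricky part, and it is one of the things worth experimenting with from the REPL.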
Custom preprocessing can be applied using the
The result was OK for my use case, although not perfect, but thanks to the power of OpenCV many clever ways of preprocessing the image can be tried to improve text recognition. Furthermore, Tesseract is fully trainable, so another interesting way of improving results for a particular application is to create a new language for Tesseract and then train the OCR software for that language. Creating a new language can be a little tricky, but there is some software that can help create the required files. Custom languages can be used in clj-tesseract by passing the name of the language as an argument to the