Using Tesseract OCR in an iOS App
– to read shelf labels
Optical character recognition is by now a well-established topic in both artificial intelligence and computer vision, and for good reason: OCR can turn physical printed, or even handwritten, text into machine-readable form. A well-known application is reading the license plates of cars.
Museums have been using machine-readable labels on objects or storage shelves in the form of barcodes, QR codes or even RFID tags. Although this works very well, it does require attaching these new labels to objects or shelves. What if a museum currently just has clearly printed text labels to identify objects? Wouldn't it be great if an inventory control program, such as Axiell Move, were able to read existing printed labels as well as machine-readable labels like barcodes?
In our proof-of-concept implementation you can copy and paste the result of the OCR app into Axiell Move; in the future, a direct integration would of course be much better.


Setting Up Tesseract
To test our ideas, we created a simple iOS application in C# (Xamarin) and used Tesseract, an open-source OCR engine, to read some example labels we got from a museum. Tesseract is quite lightweight (training data included) and relatively easy to use. Although Tesseract itself is written in C++, there is an existing C# wrapper for iOS, which can be found on the Tesseract GitHub page.
To interpret printed text you create a Tesseract engine object, which is initialized with a string parameter specifying the language you would like the engine to recognize. After that, you can simply call the 'SetImage' function, which takes care of processing the image and starting the OCR process. When the process is finished, the engine instance exposes a property called 'Text' which contains the interpreted image as a string. Note that 'SetImage' is an asynchronous function and needs to be awaited before trying to access the result.
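As a minimal sketch, assuming the community C# wrapper we used (the class name TesseractApi, its namespace, and the exact signatures below may differ in other bindings):

using System.IO;
using System.Threading.Tasks;
using Tesseract.iOS; // namespace of the C# wrapper (assumed)

public static class LabelReader
{
    // Runs OCR on a captured image and returns the recognized text.
    public static async Task<string> ReadLabelAsync(Stream imageStream)
    {
        var engine = new TesseractApi();
        await engine.Init("eng"); // the language code selects the traineddata file

        // SetImage is asynchronous; await it before reading the result.
        bool success = await engine.SetImage(imageStream);
        return success ? engine.Text : string.Empty;
    }
}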
Tesseract implements the OCR as a neural network and thus needs training data to operate. Open-source '.traineddata' files, which contain pretrained networks for many languages, are available on the Tesseract-OCR GitHub page. Different training data sets can be downloaded, with trade-offs in accuracy, speed and size, ranging from 3.9 to 22.4 MB for the English training data alone.
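The traineddata file has to ship with the app itself. As a quick sanity check during development, you can verify it actually made it into the bundle; a sketch, assuming the data lives in a 'tessdata' resource folder (which is where the wrapper we used expects it):

using System;
using Foundation;

// PathForResource returns null if the English traineddata file
// is missing from the app bundle.
var dataPath = NSBundle.MainBundle.PathForResource("tessdata/eng", "traineddata");
if (dataPath == null)
    Console.WriteLine("eng.traineddata was not bundled with the app.");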
Our application would only be dealing with high-contrast block letters, so the most basic training data turned out to work fine. We encountered some difficulties running newer training data on the (outdated) iOS 10.3. For debugging purposes we wanted the app to work on a physical test device (an iPhone 5), so we had to choose a slightly older data set. However, we ran into more trouble when setting up a live camera feed to the display. Apple has developed multiple ways to achieve a live feed over the years, and while each iteration has been more streamlined than the last, the iOS 10.3 version is a bit messy. So we decided to switch to iOS 12.2 for a more future-proof and readable application. Fortunately, updating the training data is completely hassle-free, and it was easily upgraded to a newer version.
Live Camera View
For a live feed in iOS 12.2 we included the AVFoundation namespace, which lets you set up an AVCaptureSession. The session has an input and an output: the input is created with AVCaptureDeviceInput.FromDevice, which wraps a capture device such as the back camera, while the output delivers the captured frames for display and for the OCR engine.
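A condensed sketch of that setup in Xamarin.iOS (error handling and the frame output are trimmed, and the view controller name is ours):

using AVFoundation;
using Foundation;
using UIKit;

public class CameraViewController : UIViewController
{
    AVCaptureSession session;

    public override void ViewDidLoad()
    {
        base.ViewDidLoad();

        session = new AVCaptureSession { SessionPreset = AVCaptureSession.PresetHigh };

        // Input: the device's default video camera.
        var camera = AVCaptureDevice.GetDefaultDevice(AVMediaTypes.Video);
        var input = AVCaptureDeviceInput.FromDevice(camera, out NSError error);
        if (error == null)
            session.AddInput(input);

        // Show the live feed with a preview layer on top of the view.
        var preview = new AVCaptureVideoPreviewLayer(session) { Frame = View.Bounds };
        View.Layer.AddSublayer(preview);

        session.StartRunning();
    }
}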
To finalize our test application we implemented an alphanumeric character filter, automatic copying of the OCR result to the clipboard, and a slider for selecting which part of the image you wish to read. Much more functionality could be added, from extending the current filter options to connecting to a database and executing search queries directly when the OCR is finished.
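For illustration, the filter and clipboard steps can be this small (a sketch; 'engine' stands for the OCR engine instance from earlier):

using System.Linq;
using UIKit;

// Keep letters, digits and whitespace from the raw OCR output,
// then place the cleaned-up result on the system clipboard.
string raw = engine.Text;
string filtered = new string(raw.Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c)).ToArray());
UIPasteboard.General.String = filtered;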

Source Code
– Demo program, the source code for this 'proof of concept' project: https://github.com/bertdd/ocr-concept
References
– Tesseract OCR GitHub, where you can find all the required download links, including short documentation: https://github.com/tesseract-ocr/
– Apple AVFoundation, info on the classes used to display video and capture images: https://developer.apple.com/documentation/avfoundation/cameras_and_media_capture
– Xamarin UIKit, info on the UI objects used to display live video: https://alm.axiell.com/collections-management-solutions/axiell-collections/