What Is Optical Character Recognition & How Does It Work?


It’s pretty easy to take words on your computer screen and put them on a physical sheet of paper. Just click print and, unless you’ve forgotten to fork out an extortion-level amount of money for a new cartridge, you’ll have fresh, warm, satisfying documents just a few moments later. Going in the opposite direction, getting dead-tree information into your PC, is quite a bit trickier. That’s not because flatbed scanners are especially difficult to operate; it’s because many of them basically just take a picture of the document and save it to your PC. Thanks to file compression and little bits of dust in the scanner, that picture probably won’t look very crisp, and because the scanner doesn’t recognize the individual characters, you can’t edit a clean copy of the document in your favorite word processor.

Fortunately, there are a number of devices out there that offer “Optical Character Recognition,” or OCR, where each character on a page is recognized individually so your papers end up as actual text documents instead of messy JPEGs. But how exactly does that work, and is one kind of optical scanner better than another? Because the whole concept of translating text into electronic signals is pretty broad, there have been lots of different implementations of OCR over the years. In fact, one of the earliest electric OCR devices, the “Optophone,” was invented all the way back in 1914.


The Optophone, a bizarre-looking contraption, relied on the special behavior of selenium, which conducts electricity differently in light and in darkness. As it scanned the words on a page, the Optophone distinguished between the dark ink of the text and the lighter blank spaces, generating tones that corresponded to different letters and making it possible, with some practice, for blind people to read.

Later, in 1931, a machine was developed that could convert printed text into telegraph code, one of the first technologies to translate printed characters into electrical impulses rather than sounds. But it wasn’t until the 1960s and ’70s that OCR began to take a more familiar, modern form, with postal services using OCR to read addresses and software that could recognize many different fonts.

How Does OCR Work?


So, back to the present day: when you scan a document, how exactly does the software know what it’s looking at? The first step is to clean up artifacts so your OCR program can concentrate on the text and nothing else. The software tries to remove dust specks and other stray graphics, straighten the text so its lines run level, and convert any colors or shades of gray in the image to pure black and white, which makes the characters themselves easier to recognize.
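To make the black-and-white conversion concrete, here is a minimal sketch in Python. It assumes the scan is already a small grid of grayscale values (0 = black, 255 = white) and picks the cut-off with Otsu’s method, which is one common way to choose a threshold automatically; the article doesn’t say which method any particular OCR program uses, and dust removal and straightening are not shown. The names `otsu_threshold` and `binarize` are hypothetical.

```python
# Grayscale "scan": 0 = black ink, 255 = white paper, values in between are
# gray shading and smudges picked up by the scanner.
Image = list[list[int]]


def otsu_threshold(pixels: Image) -> int:
    """Pick the gray level that best separates dark ink from light paper (Otsu's method)."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark = 0  # pixels at or below the candidate threshold
    sum_dark = 0     # sum of their gray levels
    for t in range(256):
        weight_dark += histogram[t]
        sum_dark += t * histogram[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: largest when the two groups are most clearly separated.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(pixels: Image) -> Image:
    """Map every pixel to 1 (ink) or 0 (paper) using the automatically chosen threshold."""
    t = otsu_threshold(pixels)
    return [[1 if value <= t else 0 for value in row] for row in pixels]


scan = [
    [250, 240, 30],
    [245, 20, 235],
    [25, 230, 255],
]
print(otsu_threshold(scan))  # 30
print(binarize(scan))        # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

The point of choosing the threshold from the image itself, rather than hard-coding something like 128, is that a faded or unevenly lit page can still be split cleanly into ink and paper.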

The next step is to figure out which characters are on the page. Simpler forms of OCR compare each scanned letter, pixel by pixel, against a known database of fonts and pick the closest match, an approach often called matrix matching. Smarter OCR takes this step further by breaking each character down into its constituent elements, such as lines, curves, and corners, and looking for actual letters with matching features, an approach known as feature extraction. You can think of the difference between these two approaches as similar to the difference between raster and vector images.

OCR software can also use a dictionary so it won’t accidentally spit out nonsense words because of an inaccurate scan. For example, if your scanner sees something like the word “dog” but can’t quite tell whether the middle letter is an O or an A, it can check its dictionary and decide that the word is actually “dog” and not “dag.” Giving OCR software situational information can cut down on errors even further, for instance by telling it to match only digits when it’s reading zip codes on an envelope.
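Both tricks can be sketched in a few lines of Python. The sketch assumes the recognizer reports, for each position, the characters it couldn’t rule out, best guess first. The dictionary check tries every combination and keeps the first real word; the zip-code check simply discards any candidate that isn’t a digit. The word list and function names are hypothetical, and real OCR engines weigh these choices with probabilities rather than a simple first-match rule.

```python
import string
from itertools import product

# Hypothetical word list; a real system would use a full dictionary.
DICTIONARY = {"dog", "cat", "log"}


def resolve_word(candidates: list[str], dictionary: set[str]) -> str:
    """Each string in `candidates` holds the possible letters for one position,
    best guess first. Return the first combination that is a real word, or the
    recognizer's best guess if none is."""
    guesses = ["".join(combo) for combo in product(*candidates)]
    for word in guesses:
        if word in dictionary:
            return word
    return guesses[0]  # no dictionary hit: fall back to the best guess


def read_zip(candidates: list[str]) -> str:
    """Situational constraint: a zip code may only contain digits."""
    digits_only = []
    for options in candidates:
        kept = "".join(ch for ch in options if ch in string.digits)
        # If no candidate is a digit, keep the originals rather than guess.
        digits_only.append(kept or options)
    return "".join(options[0] for options in digits_only)


# The recognizer leans toward "a" for the middle letter, but only "dog" is a word.
print(resolve_word(["d", "ao", "g"], DICTIONARY))  # dog
# "O" vs "0" and "l" vs "1" are classic confusions; the digit rule settles them.
print(read_zip(["9", "O0", "2", "l1", "0"]))       # 90210
```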

Even with these tricks, OCR obviously isn’t perfect, as you’ve probably seen for yourself if you’ve ever used it. But with greater processing power and machine learning techniques that let software pick up on more subtle patterns over time, OCR has become versatile enough to recognize harder-to-read typefaces, inconsistently printed material, and even handwriting. Free cloud OCR services like Google Drive, which can draw on far more machine learning capability than your home PC, have made OCR more accessible than ever.
