Digitizing documents, translating text in images in real time, and reading identity documents are all tasks that require computers not only to see, but also to understand information in the form of characters and symbols.
In the previous post, we discussed what computer vision is and how a computer can mimic human vision. Here, the ability to read involves detecting and recognizing text, a subfield of computer vision called Optical Character Recognition (OCR).
OCR is the process by which a computer program identifies the symbols, characters, or letters of an alphabet in an image and converts them into text data that can be stored.
This is a very useful resource for businesses that manage a lot of information through documents. Digitizing documents via OCR can give a competitive advantage to those who not only scan their documents but also extract the text inside each one, and it improves data preservation as well.
How does it work?
Typically, the process flows as follows:
- Take a picture: An image containing text is required, whether a person takes a photograph or scans a document, or the computer obtains the image itself through a computer vision algorithm.
- Pre-processing: At this point, the image is made as suitable as possible for character recognition. Typically, it is converted to black and white, noise is reduced with some blurring, and any skew is corrected.
- Text recognition: The text in the image is detected and recognized according to the type of OCR selected, producing a set of text data that represents the words and their coordinates.
- Post-processing: The full data set is parsed and organized so it can be interpreted in the context of the image. For example, in an identification document, the person’s name is the text located next to the label “Name”.
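As a rough illustration of two of the steps above, here is a minimal Python sketch using only the standard library. It is not tied to any particular OCR engine: the `Word` structure, the `binarize` and `value_next_to_label` helpers, the fixed threshold of 128, and the sample coordinates are all assumptions made for the example. The first helper shows the black-and-white conversion from pre-processing; the second shows how the last step might use word coordinates to find the value next to a label such as “Name”.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and its bounding box in pixels (hypothetical structure)."""
    text: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def binarize(gray, threshold=128):
    """Pre-processing: map each grayscale pixel (0-255) to black (0) or white (255)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


def value_next_to_label(words, label):
    """Final step: join the words on the same line as `label`, to its right."""
    # Find the label word, ignoring case and a trailing colon.
    anchor = next(
        (w for w in words if w.text.rstrip(":").lower() == label.lower()), None
    )
    if anchor is None:
        return None
    anchor_mid = anchor.y + anchor.height / 2
    same_line = [
        w for w in words
        if w is not anchor
        and w.x >= anchor.x + anchor.width        # starts to the right of the label
        and w.y <= anchor_mid <= w.y + w.height   # spans the label's vertical centre
    ]
    same_line.sort(key=lambda w: w.x)             # restore left-to-right reading order
    return " ".join(w.text for w in same_line) or None


# Hand-written sample standing in for the output of the text recognition step.
words = [
    Word("Name:", 10, 20, 60, 18),
    Word("Jane", 80, 21, 45, 18),
    Word("Doe", 130, 20, 40, 18),
    Word("Country:", 10, 50, 80, 18),
    Word("Spain", 100, 50, 55, 18),
]

print(binarize([[12, 200], [128, 90]]))
print(value_next_to_label(words, "Name"))
```

A real pipeline would likely pick the threshold adaptively and tolerate labels that sit above their value rather than beside it; the same-line rule here is only the simplest case.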