Concept in Definition ABC
Miscellanea / July 04, 2021
By Francisco Cano, May 2014
As we can see, a scanner alone is not enough to recognize letters and typefaces.
OCR, or Optical Character Recognition, is a technology that aims to emulate the human eye: it tries to recognize the characters and the typeface (font) that a document is made of. It requires an ordinary scanner and fairly powerful software. By powerful software we mean software backed by a large, broad database that lets it recognize the different letters and their corresponding typefaces.
Apart from the software, the scanner matters. A very sensitive scanner reads the pixels of the document better, and this sensitivity lets the software make fewer mistakes. Even so, it is quite difficult for the software to avoid errors altogether. A document is placed in the scanner and comes out in Word format, or in whichever format the program lets you choose. After this, you have to correct the document. Once corrected, it can be saved as a PDF to share or archive.
One of the great uses of OCR is book scanning, for example the collection of the national library. Likewise the famous e-book, which can be read anywhere on iPad and Android tablets and on e-book readers.
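The idea of recognizing letters against a database can be illustrated with a toy sketch. It is not how any particular OCR product works: the three 5x3 bitmaps, the `TEMPLATES` table, and the `hamming` and `recognize` helpers are all hypothetical names invented for this example, and real software uses far larger databases and more robust matching.

```python
# Toy "database" of glyph templates: each letter is a 5x3 bitmap ("1" = ink).
# These bitmaps are invented for illustration only.
TEMPLATES = {
    "I": ["111",
          "010",
          "010",
          "010",
          "111"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010",
          "010",
          "010"],
}


def hamming(a, b):
    """Count the pixels that differ between two bitmaps of equal size."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(bitmap):
    """Return the template letter whose bitmap differs least from the input."""
    return min(TEMPLATES, key=lambda letter: hamming(bitmap, TEMPLATES[letter]))


# A scanned "T" with one noisy pixel (an extra dot in the middle row).
noisy_t = ["111",
           "010",
           "011",
           "010",
           "010"]

print(recognize(noisy_t))
```

The more stray pixels a scan contains, the closer a glyph drifts toward the wrong template, which is one way to picture why a cleaner scan leads to fewer recognition mistakes.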
An example of how a book is scanned.
OCR technology has a limitation: it does not work for ancient texts or for texts that have suffered physical deterioration for some reason. Such deterioration is very common in historical documents, which can be up to a thousand years old. Documents on which the years have taken their toll are largely unrecognizable to OCR technology. They are usually archived as high-resolution photographic scans so that the public can admire every detail of a document without damaging it.
The resolution in OCR tells us how much detail the system detects. For clear, well-defined text, the usual choice is 300 dpi (dots per inch), which is configured on the scanner. One inch equals 25.4 millimeters, so 300 dots across such a short length is sufficient. For newspapers and similar material, where the print is small and the paper is always a bit battered, the optimal resolution would be 600 dpi. Scanning at this higher resolution calls for a good scanner, since a common scanner struggles to complete the job at this resolution.
The evolution of OCR depends on improving the system, an improvement that is already underway in a project called IMPACT. This project aims to share information among various state institutions and some companies in order to develop OCR software that meets all the requirements for mass digitization.
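The arithmetic behind these resolutions can be worked through in a short sketch. The A4 page size (210 x 297 mm) is an assumption chosen for the example, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4  # one inch is exactly 25.4 millimeters


def scan_size_pixels(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def dot_pitch_mm(dpi):
    """Return the distance in millimeters between neighboring dots."""
    return MM_PER_INCH / dpi


# Assumed example page: A4, 210 mm x 297 mm.
A4_MM = (210, 297)

for dpi in (300, 600):
    width_px, height_px = scan_size_pixels(*A4_MM, dpi)
    megapixels = width_px * height_px / 1_000_000
    print(f"{dpi} dpi: {width_px} x {height_px} px, "
          f"{megapixels:.1f} megapixels, "
          f"{dot_pitch_mm(dpi):.4f} mm per dot")
```

Doubling the resolution from 300 to 600 dpi doubles the dots along each side and so roughly quadruples the total number of pixels, which may be one reason a common scanner takes much longer at the higher setting.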