Optical character recognition, usually abbreviated to OCR, is the use of computer software to translate images of typewritten text (usually captured by a scanner) into machine-editable text, or to translate pictures of characters into a standard encoding scheme such as ASCII or Unicode. OCR began as a field of research in artificial intelligence and machine vision. Though academic research in the field continues, the focus has shifted to implementing proven techniques.
Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few surviving applications use true optical techniques, the term optical character recognition has been broadened to cover digital character recognition as well.
Early systems required "training" (essentially, the provision of known samples of each character) to read a specific font. "Intelligent" systems that can recognize most fonts with a high degree of accuracy are now common. Some systems can even reproduce formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.
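The idea of "training" on known samples can be illustrated with a toy template matcher. This is only a minimal sketch under stated assumptions, not a description of how any historical system worked: the 3x3 bitmaps, the `TEMPLATES` table, and the function names are all hypothetical, and pixel-wise Hamming distance is just one simple way to compare a scanned glyph against stored samples.

```python
# Hypothetical 3x3 binary glyph samples for one "font" (1 = ink, 0 = background).
# Real systems would use much larger bitmaps and many samples per character.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def hamming_distance(a, b):
    """Count the pixel positions at which two equal-sized bitmaps differ."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def classify(glyph, templates=TEMPLATES):
    """Return the label of the stored sample closest to the glyph."""
    return min(templates, key=lambda label: hamming_distance(glyph, templates[label]))


# A damaged "T" whose lowest stem pixel did not scan.
noisy_t = ("111", "010", "000")
print(classify(noisy_t))
```

Because the matcher only knows the samples it was given, a glyph from a different font would need its own set of templates, which is the limitation the "intelligent" systems mentioned above are meant to overcome.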
The first commercial system was installed at Reader's Digest in 1955; many years later, Reader's Digest donated it to the Smithsonian, where it was put on display. The second system was sold to the Standard Oil Company of California for reading credit card imprints for billing purposes, and many more systems were sold to other oil companies. Other systems sold by IMR during the late 1950s included a bill stub reader for the Ohio Bell Telephone Company and a page scanner for the U.S. Air Force that read typewritten messages and transmitted them by teletype. IBM and others were later licensed on Shepard's OCR patents.
The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in Europe was by the British Post Office, which in 1965 began planning an entire banking system, the National Giro, using OCR technology, a process that revolutionized bill payment systems in the UK. Canada Post has been using OCR systems since 1971.
In postal sorting, an OCR system reads the name and address of the addressee at the first mechanized sorting center and prints a routing bar code on the envelope based on the postal code. Later centers can then sort the letters with less expensive sorters that need only read the bar code. To avoid interfering with the human-readable address field, which can be located anywhere on the letter, the bar code is printed in a special ink that is clearly visible under UV light and looks orange in normal lighting conditions. Envelopes marked with the machine-readable bar code may then be processed.
Recognition of hand printing, cursive handwriting, and even printed or typewritten text in some other scripts (especially those with a very large number of characters) is still the subject of active research.
Early research into the recognition of printed sheet music was performed at the graduate level in the mid-1970s at MIT and other institutions. Successive efforts were made to localize and remove the musical staff lines, leaving the symbols to be recognized and parsed. The first commercial music-scanning product, MIDISCAN, was released in 1991. Several commercial products are now available.
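Locating and removing staff lines can be sketched with a horizontal projection: rows that are almost entirely ink are treated as staff lines, and their pixels are erased unless a symbol crosses them. This is an illustrative heuristic only, not the method used in the research or products mentioned above. The function names, the 0.8 threshold, and the assumption of one-pixel-thick, perfectly horizontal lines are all hypothetical simplifications.

```python
def find_staff_rows(image, threshold=0.8):
    """Return indices of rows whose fraction of ink pixels is at least `threshold`."""
    return [y for y, row in enumerate(image) if sum(row) / len(row) >= threshold]


def remove_staff_lines(image, threshold=0.8):
    """Erase staff-line pixels, keeping those with ink directly above or below.

    Assumes a binary image (1 = ink) and one-pixel-thick horizontal staff lines.
    """
    height = len(image)
    result = [list(row) for row in image]
    for y in find_staff_rows(image, threshold):
        for x in range(len(image[y])):
            ink_above = y > 0 and image[y - 1][x] == 1
            ink_below = y + 1 < height and image[y + 1][x] == 1
            # A pixel with no vertical neighbour belongs only to the staff line.
            if not (ink_above or ink_below):
                result[y][x] = 0
    return result


# Toy image: two staff lines (rows 1 and 4) and a vertical stem crossing row 1.
image = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
for row in remove_staff_lines(image):
    print("".join("#" if pixel else "." for pixel in row))
```

In the toy image the stem survives intact while both staff lines disappear. Real scans are skewed, noisy, and have thicker lines, so a practical system would need considerably more than this.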
For more complex recognition problems, neural networks are commonly used, as they can generally be made indifferent to both affine and non-linear transformations.
A related area is raster-to-vector conversion: converting bitmap images (for example, maps including drawings, text, and map symbols) into vector graphics that are easier to work with.
| Symbol | Name | Hex |
|---|---|---|
| ⑀ | OCR Hook | 0x2440 |
| ⑁ | OCR Chair | 0x2441 |
| ⑂ | OCR Fork | 0x2442 |
| ⑃ | OCR Inverted Fork | 0x2443 |
| ⑄ | OCR Belt Buckle | 0x2444 |
| ⑅ | OCR Bow Tie | 0x2445 |
| ⑆ | OCR Branch Bank Identification | 0x2446 |
| ⑇ | OCR Amount Of Check | 0x2447 |
| ⑈ | OCR Dash | 0x2448 |
| ⑉ | OCR Customer Account Number | 0x2449 |
| ⑊ | OCR Double Backslash | 0x244A |
| | Not Defined | 0x244B |
| | Not Defined | 0x244C |
| | Not Defined | 0x244D |
| | Not Defined | 0x244E |
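The code points in the table can be checked against the Unicode character database that ships with Python. The sketch below uses the standard `unicodedata` module; the helper name `ocr_block_names` and the range bounds are assumptions chosen to mirror the table, and the exact set of assigned code points depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def ocr_block_names(start=0x2440, end=0x244E):
    """Map each code point in the range to its Unicode name, or None if unassigned."""
    return {cp: unicodedata.name(chr(cp), None) for cp in range(start, end + 1)}


# Print the hex value and name only, so the output is safe on ASCII-only terminals.
for code_point, name in ocr_block_names().items():
    print(f"0x{code_point:04X}  {name or 'Not Defined'}")
```

Passing `None` as the second argument to `unicodedata.name` avoids the `ValueError` that would otherwise be raised for the unassigned code points at the end of the range.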
This article is licensed under the GNU Free Documentation License. It uses material from the article "Optical character recognition".