Optical character recognition
===Pre-processing===
OCR software often pre-processes images to improve the chances of successful recognition. Techniques include:<ref name="nicomsoft">{{cite web|url=https://www.nicomsoft.com/optical-character-recognition-ocr-how-it-works/ |title=Optical Character Recognition (OCR) – How it works |publisher=Nicomsoft.com |access-date=2013-06-16}}</ref>
* De-[[Skew (fax)|skewing]]{{spaced ndash}}if the document was not aligned properly when scanned, it may need to be rotated a few degrees clockwise or counterclockwise to make lines of text perfectly horizontal or vertical.
* [[Despeckle|Despeckling]]{{spaced ndash}}removal of positive and negative spots and smoothing of edges.
* Binarization{{spaced ndash}}conversion of an image from color or [[greyscale]] to black-and-white (called a [[binary image]] because there are two colors). Binarization is a simple way of separating the text (or any other desired image component) from the background.<ref name="Sezgin2004">{{cite journal|last1=Sezgin|first1=Mehmet|last2=Sankur|first2=Bulent|date=2004|title=Survey over image thresholding techniques and quantitative performance evaluation|url=http://webdocs.cs.ualberta.ca/~nray1/CMPUT605/track3_papers/Threshold_survey.pdf|journal=Journal of Electronic Imaging|volume=13|issue=1|page=146|bibcode=2004JEI....13..146S|doi=10.1117/1.1631315|archive-url=https://web.archive.org/web/20151016080410/http://webdocs.cs.ualberta.ca/~nray1/CMPUT605/track3_papers/Threshold_survey.pdf|archive-date=October 16, 2015|access-date=2 May 2015}}</ref> It is necessary because most commercial recognition algorithms work only on binary images, which are simpler to process.<ref name="Gupta2007">{{cite journal|last1=Gupta|first1=Maya R.|last2=Jacobson|first2=Nathaniel P.|last3=Garcia|first3=Eric K.|date=2007|title=OCR binarisation and image pre-processing for searching historical documents.|url=http://www.rfai.li.univ-tours.fr/fr/ressources/_dh/DOC/DocOCR/OCRbinarisation.pdf|journal=Pattern Recognition|volume=40|issue=2|page=389|doi=10.1016/j.patcog.2006.04.043|bibcode=2007PatRe..40..389G|archive-url=https://web.archive.org/web/20151016080410/http://www.rfai.li.univ-tours.fr/fr/ressources/_dh/DOC/DocOCR/OCRbinarisation.pdf|archive-date=October 16, 2015|access-date=2 May 2015}}</ref> In addition, the effectiveness of binarization significantly influences the quality of character recognition, so the binarization method must be chosen carefully for a given input: the quality of the binary result depends on the type of image (scanned document, [[scene text]] image, degraded historical document, etc.).<ref name=Trier1995>{{cite journal|last1=Trier|first1=Oeivind Due|last2=Jain|first2=Anil K.|title=Goal-directed evaluation of binarisation methods.|journal=IEEE Transactions on Pattern Analysis and Machine Intelligence|date=1995|volume=17|issue=12|pages=1191–1201|url=http://heim.ifi.uio.no/inf386/trier2.pdf |archive-url=https://web.archive.org/web/20151016080411/http://heim.ifi.uio.no/inf386/trier2.pdf |archive-date=2015-10-16 |url-status=live|access-date=2 May 2015|doi=10.1109/34.476511}}</ref><ref name="Milyaev2013">{{cite book|last1=Milyaev|first1=Sergey|last2=Barinova|first2=Olga|last3=Novikova|first3=Tatiana|last4=Kohli|first4=Pushmeet|last5=Lempitsky|first5=Victor|title=2013 12th International Conference on Document Analysis and Recognition |chapter=Image Binarization for End-to-End Text Understanding in Natural Images |date=2013|url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/mbnlk_icdar2013.pdf |archive-url=https://web.archive.org/web/20171113184347/https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/mbnlk_icdar2013.pdf |archive-date=2017-11-13 |url-status=live |pages=128–132|doi=10.1109/ICDAR.2013.33|isbn=978-0-7695-4999-6|s2cid=8947361|access-date=2 May 2015}}</ref>
* Line removal{{spaced ndash}}cleaning up non-glyph boxes and lines.
* [[Document Layout Analysis|Layout analysis]] or zoning{{spaced ndash}}identification of columns, paragraphs, captions, etc. as distinct blocks. Especially important in [[Column (typography)|multi-column layouts]] and [[Table (information)|tables]].
* Line and word detection{{spaced ndash}}establishment of a baseline for word and character shapes, separating words as necessary.
* Script recognition{{spaced ndash}}in multilingual documents, the script may change at the word level, so the script must be identified before the appropriate OCR can be invoked to handle it.<ref>{{Cite journal |last1=Pati |first1=P.B. |last2= Ramakrishnan |first2=A.G. |title=Word Level Multi-script Identification |date=2008 |journal=Pattern Recognition Letters |volume=29 |issue=9 |pages=1218–1229 |doi=10.1016/j.patrec.2008.01.027|bibcode=2008PaReL..29.1218P }}</ref>
* Character isolation or segmentation{{spaced ndash}}for per-character OCR, multiple characters that are connected due to image artifacts must be separated; single characters that are broken into multiple pieces due to artifacts must be connected.
* Normalization of [[aspect ratio]] and [[Scale (ratio)|scale]]<ref>{{cite web|url=http://blog.damiles.com/2008/11/20/basic-ocr-in-opencv.html |title=Basic OCR in OpenCV {{!}} Damiles |publisher=Blog.damiles.com |access-date=2013-06-16|date=2008-11-20 }}</ref>

Segmentation of [[fixed-pitch font]]s is accomplished relatively simply by aligning the image to a uniform grid, chosen so that its vertical grid lines intersect black areas as rarely as possible. For [[proportional font]]s, more sophisticated techniques are needed because whitespace between letters can sometimes be greater than that between words, and vertical lines can intersect more than one character.<ref name="Tesseract overview" />
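
The grid alignment described above for fixed-pitch fonts can be illustrated with a minimal sketch. It assumes the image has already been binarized into rows of 0 (white) and 1 (black) and contains a single line of text; the function names <code>column_ink</code>, <code>best_grid</code> and <code>split_cells</code>, the pitch search range, and the mean-ink cost are illustrative choices rather than the method of any particular OCR engine.

```python
def column_ink(image):
    """Count black pixels (1s) in each column of a binary image (list of rows)."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def best_grid(image, min_pitch, max_pitch):
    """Return the (pitch, offset) whose vertical grid lines cross the least ink.

    The cost is the mean ink per grid line, so a wide pitch is not preferred
    merely because it places fewer lines. Ties go to the smaller pitch, then
    the smaller offset.
    """
    ink = column_ink(image)
    width = len(ink)
    best = None
    for pitch in range(min_pitch, max_pitch + 1):
        for offset in range(pitch):
            lines = range(offset, width, pitch)
            if not lines:  # offset lies beyond the right edge of the image
                continue
            cost = sum(ink[x] for x in lines) / len(lines)
            candidate = (cost, pitch, offset)
            if best is None or candidate < best:
                best = candidate
    _, pitch, offset = best
    return pitch, offset


def split_cells(image, pitch, offset):
    """Cut the image into character cells at the grid lines.

    Any strip to the left of the first grid line is discarded.
    """
    width = len(image[0])
    cuts = list(range(offset, width, pitch)) + [width]
    return [[row[a:b] for row in image] for a, b in zip(cuts, cuts[1:])]


# Hypothetical input: three 4-pixel-wide glyphs, each preceded by one blank column.
image = [[0, 1, 1, 1, 1] * 3 for _ in range(3)]
pitch, offset = best_grid(image, min_pitch=3, max_pitch=8)
print(pitch, offset)                           # 5 0
print(len(split_cells(image, pitch, offset)))  # 3
```

This exhaustive search only suits fixed-pitch text: with proportional fonts no single pitch fits every character, which is why the more sophisticated techniques mentioned above are needed.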