
ocr

Recognize text using optical character recognition

Description


txt = ocr(I) returns an ocrText object containing optical character recognition information from the input image, I. The object contains recognized text, text location, and a metric indicating the confidence of the recognition result.

Note

The ocr function also recognizes seven-segment digits from images.


txt = ocr(I, roi) recognizes text in I within one or more rectangular regions. The roi input contains an M-by-4 matrix, with M regions of interest.


[___] = ocr(___,Name=Value) specifies options using one or more name-value arguments, in addition to any combination of arguments from previous syntaxes. For example, Language="english" sets English as the language to detect.

Examples


Recognize text in an image of a business card.

     businessCard = imread("businessCard.png");
     ocrResults   = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]

Display the recognized text over the image.

     recognizedText = ocrResults.Text;
     figure;
     imshow(businessCard);
     text(600,150,recognizedText,"BackgroundColor",[1 1 1]);


Read the image into the workspace.

I = imread("handicapSign.jpg");

Define one or more rectangular regions of interest in which to recognize text within the input image.

roi = [360 118 384 560];

You can also use the imrect function to select a region interactively with the mouse. For example:

figure; imshow(I); roi = round(getPosition(imrect));

ocrResults = ocr(I,roi);

Insert the recognized text into the original image.

Iocr = insertText(I,roi(1:2),ocrResults.Text,AnchorPoint="RightTop",FontSize=16);
figure;
imshow(Iocr);


Read an image containing the seven-segment display into the workspace.

I = imread("sevSegDisp.jpg");

Specify the ROI that contains the seven-segment display.

roi = [506 725 1418 626];

To recognize the digits from the seven-segment display, specify the Language argument as "seven-segment".

ocrResults = ocr(I,roi,Language="seven-segment");

Display the recognized digits and detection confidence.

fprintf("Recognized seven-segment digits: ""%s""\nDetection confidence: %0.4f",cell2mat(ocrResults.Words),ocrResults.WordConfidences)
Recognized seven-segment digits: "5405.9"
Detection confidence: 0.7948

Insert the recognized digits into the image.

Iocr = insertObjectAnnotation(I,"rectangle",...
            ocrResults.WordBoundingBoxes,ocrResults.Words,LineWidth=5,FontSize=72);
figure
imshow(Iocr)


Recognize text in an image of a business card, and annotate the image with the word bounding boxes and recognition confidences.

     businessCard = imread("businessCard.png");
     ocrResults   = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]

     Iocr         = insertObjectAnnotation(businessCard,"rectangle", ...
                           ocrResults.WordBoundingBoxes, ...
                           ocrResults.WordConfidences);
     figure; imshow(Iocr);


Recognize text in an image, and then locate and highlight the word "MathWorks" in the image.

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard);
bboxes = locateText(ocrResults,"MathWorks",IgnoreCase=true);
Iocr = insertShape(businessCard,"FilledRectangle",bboxes);
figure; imshow(Iocr);


Input Arguments


Input image, specified as an M-by-N-by-3 truecolor image, an M-by-N 2-D grayscale image, or a binary image. The input image must be real and nonsparse. Before the recognition process, the function converts truecolor or grayscale input images to a binary image using Otsu's thresholding technique. For best OCR results, the height of a lowercase "x", or comparable character, in the input image must be greater than 20 pixels. To improve recognition results, remove any text rotation greater than ±10 degrees from the horizontal or vertical axis.

Data Types: single | double | int16 | uint8 | uint16 | logical

One or more rectangular regions of interest, specified as an M-by-4 matrix. Each row specifies a region of interest within the input image as a four-element vector of the form [x y width height]. The vector specifies the upper-left corner location, [x y], and the size, [width height], of a rectangular region of interest, in pixels. Each rectangle must be fully contained within the input image, I. Before the recognition process, the function uses Otsu's thresholding to convert truecolor and grayscale regions of interest to binary regions. The function returns the text recognized in the rectangular regions as an array of ocrText objects.

To obtain the best results when using ocr to recognize seven-segment digits, specify an ROI enclosing the seven-segment digits in the input image.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: ocr(I,TextLayout="block") sets the text layout to "block".

Input text layout, specified as one of the following:

TextLayout     Text Treatment
"auto"         Treats the text in the image as a "block" if the Language argument is set to "seven-segment". Otherwise, treats the text in the image as a "page".
"page"         Treats the text in the image as a page containing blocks of text.
"block"        Treats the text in the image as a single block of text.
"line"         Treats the text in the image as a single line of text.
"word"         Treats the text in the image as a single word of text.
"character"    Treats the text in the image as a single character.

Use the TextLayout argument to describe the layout of the text within the input image. For example, specify TextLayout as "page" to recognize text from a scanned document that contains a specific format, such as a double column. This setting preserves the reading order in the returned text.

You may get poor results if your input image contains only a few regions of text, or if the text is located in a cluttered scene. If you get poor OCR results, try a different layout that matches the text in your image. If the text is located in a cluttered scene, also try specifying an ROI around the text in your image.
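A minimal sketch of this advice follows; the image file name and ROI values are hypothetical placeholders.

```matlab
% Isolate text in a cluttered scene with an ROI, then match the layout
% (image name and ROI values are hypothetical)
I = imread("streetScene.jpg");
roi = [120 80 300 140];                 % ROI drawn around the text
txt = ocr(I,roi,TextLayout="block");    % treat the ROI contents as one block
disp(txt.Text)
```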

Language to recognize, specified as "english", "japanese", "seven-segment", a character vector, a string scalar, a string array, or a cell array of character vectors.

If you specify the Language as "seven-segment", the ocr function recognizes seven-segment digits in the input image.

You can also install the OCR Language Data Files support package for additional languages, or add a custom language. Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language can reduce accuracy and increase the time it takes to perform OCR.
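For example, a sketch of multiple-language recognition; the image file name is a hypothetical placeholder.

```matlab
% Recognize English and Japanese text in a single pass
% (image name is hypothetical)
I = imread("multilingualSign.png");
txt = ocr(I,Language=["english","japanese"]);
disp(txt.Text)
```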

To specify any of the additional languages contained in the OCR Language Data Files support package, use the language character vector the same way as the built-in languages. You do not need to specify the path.

txt = ocr(img,Language="finnish");

For the available languages, see List of Support Package OCR Languages.

To use your own custom languages, specify the path to the trained data file as the language character vector. You must name the file in the format, <language>.traineddata. The file must be located in a folder named tessdata. For example:

txt = ocr(img,Language="path/to/tessdata/eng.traineddata");
You can load multiple custom languages as a cell array of character vectors:
txt = ocr(img,Language={"path/to/tessdata/eng.traineddata",...
                "path/to/tessdata/jpn.traineddata"});
The containing folder must be the same for all of the files specified in the cell array. In the preceding example, all of the traineddata files in the cell array are contained in the folder "path/to/tessdata". Because the following code points to two different containing folders, it does not work.
txt = ocr(img,Language={"path/one/tessdata/eng.traineddata",...
                "path/two/tessdata/jpn.traineddata"});
Some language files have a dependency on another language. For example, Hindi training depends on English. If you want to use Hindi, the English traineddata file must also exist in the same folder as the Hindi traineddata file. The ocr function only supports traineddata files created using tesseract-ocr 3.02 or the OCR Trainer.

For deployment targets generated by MATLAB® Coder™: the generated ocr executable and the language data file folder must be colocated, and the language data folder must be named tessdata. For example:

  • For English: C:/path/tessdata/eng.traineddata

  • For Japanese: C:/path/tessdata/jpn.traineddata

  • For Seven-segment: C:/path/tessdata/seven_segment.traineddata

  • For custom data files: C:/path/tessdata/customlang.traineddata

  • C:/path/ocr_app.exe

You can copy the English, Japanese, and seven-segment trained data files from:

fullfile(matlabroot,"toolbox","vision","visionutilities","tessdata");

Character subset, specified as a character vector. By default, CharacterSet is set to the empty character vector, "". The empty vector sets the function to search for all characters in the language specified by the Language property. You can set this property to a smaller set of known characters to constrain the classification process.

The ocr function selects the best match from the CharacterSet. Using prior knowledge about the characters in the input image helps to improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, "0123456789", the function attempts to match each character to only digits. However, with this setting, the function can incorrectly recognize a non-digit character as a digit.

If you specify Language as "seven-segment", the ocr function uses the character set "0123456789.:-".
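For example, a sketch of constraining the character set to digits; the image file name is a hypothetical placeholder.

```matlab
% Constrain recognition to numeric digits only
% (image name is hypothetical)
I = imread("utilityMeter.png");
txt = ocr(I,CharacterSet="0123456789");
disp(txt.Text)
```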

Output Arguments


Recognized text and metrics, returned as an ocrText object. The object contains the recognized text, the location of the recognized text within the input image, and metrics indicating the confidence of the results. The confidence values are in the range [0, 1] and represent a percent probability. When you specify an M-by-4 roi, the function returns ocrText as an M-by-1 array of ocrText objects.
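For example, a sketch of indexing the array output; the image file name and ROI values are hypothetical placeholders.

```matlab
% Two regions of interest yield a 2-by-1 array of ocrText objects
% (image name and ROI values are hypothetical)
I = imread("storefront.jpg");
rois = [40 60 200 80; 40 180 200 80];
results = ocr(I,rois);          % 2-by-1 ocrText array
firstText  = results(1).Text;   % text recognized in the first region
secondText = results(2).Text;   % text recognized in the second region
```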

If your OCR results are not what you expect, try one or more of the following options:

  • Increase the image size to 2-to-4 times its original size.

  • If the characters in the image are too close together or their edges are touching, use morphology to thin out and separate the characters.

  • Use binarization to check for nonuniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the binarized result, the image has a potential nonuniform lighting issue. Try top-hat filtering, using the imtophat function, or other techniques that remove nonuniform illumination.

  • Use the roi option to isolate the text. Specify the roi manually or use text detection.

  • If your image looks like a natural scene containing words, such as a street scene, rather than a scanned document, try using an ROI input. Also, you can set the TextLayout argument to "block" or "word".
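The first suggestion can be sketched as follows; the image file name and scale factor are hypothetical placeholders.

```matlab
% Enlarge a low-resolution image before recognition so character height
% exceeds 20 pixels (image name and scale factor are hypothetical)
I = imread("lowResLabel.png");
Ibig = imresize(I,4);      % scale to 2-4 times the original size
txt = ocr(Ibig);
disp(txt.Text)
```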

Limitations

  • The Seven-Segment language cannot be combined with other languages. For example, this syntax is not supported:

    ocr(I,Language=["english","seven-segment"])



Version History

Introduced in R2014a