Why ocr function doesn't recognize the numbers?

Hi,
I have the image below:
I need to capture the numbers through ocr function. Thus, I use this code to do it:
capture = imread('Capture.png');
my_image = imresize(capture, 1.4);
ocrResults = ocr(my_image,'CharacterSet','.0123456789');
recognizedText = ocrResults.Words;
However, ocr function only recognize some numbers. In fact, the output is a cell array 37x1 while I should have 39 rows:
{'18.33'}
{'1423' }
{'1' }
{'6.55' }
{'1' }
{'5.65' }
{'12.54'}
{'14.77'}
{'10.33'}
{'13.79'}
{'12.94'}
{'1255' }
{'1' }
{'1.70' }
{'9.84' }
{'10.71'}
{'9.74' }
{'9.98' }
{'933' }
{'9.00' }
{'7.22' }
{'3.02' }
{'7.45' }
{'7.10' }
{'6.56' }
{'6.28' }
{'5.86' }
{'5.40' }
{'5.01' }
{'4.57' }
{'4.10' }
{'174' }
{'3.39' }
{'3.011'}
{'2.71' }
{'2.33' }
{'2.118'}
Then, many numbers are worng. Please, somone can help me? Thanks!

 Akzeptierte Antwort

Birju Patel
Birju Patel am 17 Jan. 2018
Bearbeitet: Birju Patel am 17 Jan. 2018

4 Stimmen

Hi,
A little bit of pre-processing and using ROIs to specify where the words are will help. By default, OCR uses page layout analysis to determine blocks of text. In this case, the image doesn't look like a normal page of text (like a PDF article for example).
To make it easier for OCR, first you can find the location of the words using regionprops and then pass the location of the words (as bounding boxes) to the OCR function. See the code below and results. They look accurate. You may have to play around more with the pre-processing to make this robust for a collection of different images. But hopefully, this gives you an idea on how to proceed:
capture = imread('Captura.PNG');
% Increase image size by 3x
my_image = imresize(capture, 3);
figure
imshow(my_image)
% Localize words
BW = imbinarize(rgb2gray(my_image));
BW1 = imdilate(BW,strel('disk',6));
s = regionprops(BW1,'BoundingBox');
bboxes = vertcat(s(:).BoundingBox);
% Sort boxes by image height
[~,ord] = sort(bboxes(:,2));
bboxes = bboxes(ord,:);
% Pre-process image to make letters thicker
BW = imdilate(BW,strel('disk',1));
% Call OCR and pass in location of words. Also, set TextLayout to 'word'
ocrResults = ocr(BW,bboxes,'CharacterSet','.0123456789','TextLayout','word');
words = {ocrResults(:).Text}';
words = deblank(words)
words =
39×1 cell array
{'0' }
{'18.33'}
{'0' }
{'14.23'}
{'0' }
{'16.55'}
{'0' }
{'15.65'}
{'12.64'}
{'14.77'}
{'10.83'}
{'13.79'}
{'12.94'}
{'12.55'}
{'0' }
{'11.70'}
{'9.84' }
{'10.71'}
{'9.74' }
{'9.98' }
{'9.33' }
{'9.00' }
{'7.22' }
{'8.02' }
{'7.45' }
{'7.10' }
{'6.56' }
{'6.28' }
{'5.86' }
{'5.40' }
{'5.02' }
{'4.57' }
{'4.10' }
{'3.74' }
{'3.39' }
{'3.00' }
{'2.71' }
{'2.33' }
{'2.08' }

4 Kommentare

Adriano
Adriano am 18 Jan. 2018
Great! It works very very well. Many thanks Birju!
Lucas  Duarte
Lucas Duarte am 22 Jul. 2018
Wonderful. It worked very well.
Robert Cadavos
Robert Cadavos am 26 Apr. 2020
you're my hero man!
It still does not work for me.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by