Language to recognize, specified as the comma-separated pair
consisting of 'Language' and the character vector 'English', 'Japanese',
or a cell array of character vectors. You can also get additional languages
from the Install OCR Language Data Files package, or add a custom language.
Specifying multiple languages enables simultaneous recognition of all the
selected languages. However, selecting more than one language may reduce
the accuracy and increase the time it takes to perform OCR.
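As a minimal sketch of multi-language recognition, the following passes both built-in languages as a cell array. It assumes the sample image businessCard.png that ships with the toolbox is on the path; any image variable works in its place.

```matlab
% Assumption: businessCard.png is the toolbox sample image on the MATLAB path.
img = imread('businessCard.png');

% Recognize English and Japanese text in the same pass.
txt = ocr(img,'Language',{'English','Japanese'});

% The recognized characters are in the Text property of the result.
recognized = txt.Text;
disp(recognized);
```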
To specify any of the additional languages contained
in the Install OCR Language Data Files package, use the language
character vector the same way as for the built-in languages. You do not
need to specify the path.
txt = ocr(img,'Language','Finnish');
List of Support Package OCR Languages
'Afrikaans'
'Albanian'
'AncientGreek'
'Arabic'
'Azerbaijani'
'Basque'
'Belarusian'
'Bengali'
'Bulgarian'
'Catalan'
'Cherokee'
'ChineseSimplified'
'ChineseTraditional'
'Croatian'
'Czech'
'Danish'
'Dutch'
'English'
'Esperanto'
'EsperantoAlternative'
'Estonian'
'Finnish'
'Frankish'
'French'
'Galician'
'German'
'Greek'
'Hebrew'
'Hindi'
'Hungarian'
'Icelandic'
'Indonesian'
'Italian'
'ItalianOld'
'Japanese'
'Kannada'
'Korean'
'Latvian'
'Lithuanian'
'Macedonian'
'Malay'
'Malayalam'
'Maltese'
'MathEquation'
'MiddleEnglish'
'MiddleFrench'
'Norwegian'
'Polish'
'Portuguese'
'Romanian'
'Russian'
'SerbianLatin'
'Slovakian'
'Slovenian'
'Spanish'
'SpanishOld'
'Swahili'
'Swedish'
'Tagalog'
'Tamil'
'Telugu'
'Thai'
'Turkish'
'Ukrainian'
To use your own custom languages, specify the path to the trained data file as the language
character vector. You must name the file in the format <language>.traineddata.
The file must be located in a folder named 'tessdata'. For example:
txt = ocr(img,'Language','path/to/tessdata/eng.traineddata');
You can load multiple custom languages as a cell array of character
vectors:
txt = ocr(img,'Language', ...
{'path/to/tessdata/eng.traineddata',...
'path/to/tessdata/jpn.traineddata'});
The containing folder must always be the same for all the files specified in the cell array. In
the preceding example, all of the traineddata files in the cell array are
contained in the folder 'path/to/tessdata'. Because the following code
points to two different containing folders, it does not work.
txt = ocr(img,'Language', ...
{'path/one/tessdata/eng.traineddata',...
'path/two/tessdata/jpn.traineddata'});
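The same-folder rule can be checked before calling ocr. The following is a minimal sketch, not part of the ocr function itself, that uses fileparts on the failing paths from the preceding example. The variable names are illustrative.

```matlab
% Paths from the preceding (non-working) example.
files = {'path/one/tessdata/eng.traineddata', ...
         'path/two/tessdata/jpn.traineddata'};

% Extract the containing folder of each trained data file.
folders = cellfun(@fileparts, files, 'UniformOutput', false);

% All files must share one containing folder.
sameFolder = numel(unique(folders)) == 1;

% The containing folder itself must be named 'tessdata'.
[~, folderName] = fileparts(folders{1});
namedTessdata = strcmp(folderName, 'tessdata');

if ~(sameFolder && namedTessdata)
    disp('Trained data files must share a single folder named tessdata.');
end
```

For these two paths, sameFolder is false because the files sit under path/one and path/two.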
Some language files have a dependency on another language. For example, Hindi training
depends on English. If you want to use Hindi, the English traineddata file
must also exist in the same folder as the Hindi traineddata file. The ocr function
only supports traineddata files created using tesseract-ocr 3.02 or using the
OCR Trainer.
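One way to catch a missing dependency early is to confirm that both files exist in the folder before using the language. This sketch assumes a hypothetical folder path/to/tessdata and assumes the Hindi and English files are named hin.traineddata and eng.traineddata; adjust both to your setup.

```matlab
% Hypothetical folder that holds the custom trained data files.
tessdataDir = fullfile('path','to','tessdata');

% Assumed file names for Hindi and its English dependency.
required = {'hin.traineddata','eng.traineddata'};

% exist returns 2 when the name refers to a file.
present = cellfun(@(f) exist(fullfile(tessdataDir,f),'file') == 2, required);
missing = required(~present);

if isempty(missing)
    disp('Hindi and its English dependency are both available.');
else
    fprintf('Missing trained data files: %s\n', strjoin(missing, ', '));
end
```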
For deployment targets generated by MATLAB® Coder™:
The generated ocr executable and the language data file folder must be colocated.
The language data file folder must be named tessdata:
For English: C:/path/tessdata/eng.traineddata
For Japanese: C:/path/tessdata/jpn.traineddata
For custom data files: C:/path/tessdata/customlang.traineddata
C:/path/ocr_app.exe
You can copy the English and Japanese trained data files from:
fullfile(matlabroot, 'toolbox','vision','visionutilities','tessdata');
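The copy step can be scripted with copyfile. This is a sketch under stated assumptions: appDir is a hypothetical folder standing in for the location of the generated executable, and the file names eng.traineddata and jpn.traineddata follow the deployment layout shown above.

```matlab
% Source folder that ships with the toolbox.
srcDir = fullfile(matlabroot,'toolbox','vision','visionutilities','tessdata');

% Hypothetical deployment folder; replace with the folder of your executable.
appDir = fullfile(tempdir,'ocr_app');
dstDir = fullfile(appDir,'tessdata');   % the folder must be named tessdata

% Create the destination folder if it does not exist yet.
if ~exist(dstDir,'dir')
    mkdir(dstDir);
end

% Copy the English and Japanese trained data files next to the executable.
copyfile(fullfile(srcDir,'eng.traineddata'), dstDir);
copyfile(fullfile(srcDir,'jpn.traineddata'), dstDir);
```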