
Text Detection and Recognition

Detect and recognize text using image feature detection and description, deep learning, and OCR

Detecting and recognizing text in images is a common task performed in computer vision applications. For example, you can capture video of a road scene from a moving vehicle, recognize signposts in the captured scene, and alert the driver about the signs. The toolbox provides functions to detect and recognize text in multiple languages.

The first step in text recognition is to detect and segment the text regions in an image. To detect text regions, use local image feature detectors and descriptors, or deep learning models pretrained to detect text in complex scenes. The examples in the toolbox demonstrate text detection using blob analysis, the maximally stable extremal regions (MSER) feature detector, and the character region awareness for text detection (CRAFT) deep learning model.

  • Blob analysis works well when the input image is binary, with text regions in the foreground. The method uses statistics of connected regions, such as area and bounding box, to localize and extract text in the image foreground. To binarize an image, use a segmentation approach such as image thresholding.

  • The MSER feature detector works well when the geometric characteristics of the text regions are known in advance. The text regions must also be high-contrast regions with uniform intensity or color. The MSER-based approach uses geometric constraints to filter out non-text regions, so it can detect text in images with both uniform and complex backgrounds.

  • The CRAFT model detects text regions robustly across variations in image background, contrast, and intensity. Use the CRAFT model when segmenting the text regions in an image is difficult. This model requires more computational resources than blob analysis or MSER-based detection.
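The following plain-Python sketch illustrates the blob-analysis idea on a small binary image. It labels 8-connected foreground regions, records each region's area and bounding box, and then applies simple geometric constraints of the kind used to reject non-text regions. It is not the toolbox's vision.BlobAnalysis API, and it does not implement MSER detection. The function names, the 0-based (x, y, width, height) box layout, and the thresholds min_area=3 and max_aspect=3.0 are hypothetical choices for illustration.

```python
from collections import deque


def label_blobs(binary):
    """Label 8-connected foreground (1) regions and return their statistics.

    Each blob is a dict with its pixel "area" and a 0-based bounding box
    "bbox" laid out as (x, y, width, height).
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area, top, bottom, left, right = 0, r, r, c, c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit the 8 neighbors that are in bounds, foreground, and new.
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append({"area": area,
                          "bbox": (left, top, right - left + 1, bottom - top + 1)})
    return blobs


def filter_text_candidates(blobs, min_area=3, max_aspect=3.0):
    """Keep blobs that pass hypothetical area and aspect-ratio constraints."""
    kept = []
    for blob in blobs:
        _, _, width, height = blob["bbox"]
        aspect = max(width, height) / min(width, height)
        if blob["area"] >= min_area and aspect <= max_aspect:
            kept.append(blob)
    return kept


# 1 = foreground: a character-like blob, a long thin stroke, and a noise pixel.
sample = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1],
]
blobs = label_blobs(sample)
kept = filter_text_candidates(blobs)
```

In this sample, the aspect-ratio constraint rejects the thin vertical stroke and the area constraint rejects the isolated pixel. Only the character-like region remains as a text candidate.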

You can perform text segmentation as a preprocessing or postprocessing step to improve the accuracy of text detection. To segment text from an image region, use image segmentation techniques such as image thresholding and clustering. For information about MATLAB® functions for image segmentation, see Image Segmentation. Alternatively, you can use the Color Thresholder and Image Segmenter apps to interactively segment the desired text regions in the image.
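Otsu's method is one common way to choose a global threshold automatically. It picks the gray level that maximizes the variance between the two resulting pixel classes. The sketch below binarizes a small grayscale patch so that text becomes the foreground (value 1), which is the form blob analysis expects. It is written in plain Python and assumes 8-bit integer gray levels. The function names and the text_is_dark flag are hypothetical, and the sketch is not a toolbox function.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form one class and pixels > t form the other.
    Assumes integer gray levels in range(levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_low, sum_low = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue  # no pixels at or below t yet
        if weight_high == 0:
            break  # every pixel is at or below t, so no split is left
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Unnormalized between-class variance: w0 * w1 * (m0 - m1)^2
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray, threshold, text_is_dark=True):
    """Map text pixels to 1 (foreground) and background pixels to 0."""
    if text_is_dark:
        return [[1 if v <= threshold else 0 for v in row] for row in gray]
    return [[1 if v > threshold else 0 for v in row] for row in gray]


# Dark strokes (values near 25) on a light background (values near 200).
gray = [
    [200, 200, 30],
    [210, 25, 190],
    [20, 205, 215],
]
t = otsu_threshold([v for row in gray for v in row])
mask = binarize(gray, t)
```

A single global threshold suits images with roughly even lighting. Scenes with uneven illumination may call for local (adaptive) thresholding or clustering instead.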

The next step is to recognize the text in the detected or segmented regions by using either machine learning (ML)-based classification or optical character recognition (OCR). The ocr function uses the OCR Language Data support files from the Tesseract Open Source OCR Engine. These support files contain pretrained language data for recognizing characters in multiple languages. You can download additional language files by using either the visionSupportPackages function or the Add-On Explorer. For more information on downloading add-ons, see Get and Manage Add-Ons. For instructions on installing and using the OCR Language Data support files from the Tesseract Open Source OCR Engine, see Install OCR Language Data Files.
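To make the ML-based classification route concrete, the sketch below classifies a segmented character with a 1-nearest-neighbor rule. An ML workflow in the toolbox would typically extract features such as HOG (see extractHOGFeatures) and train a classifier. This sketch substitutes raw binary pixels as the feature vector and Hamming distance as the metric, and it uses hypothetical 3x3 training glyphs. It illustrates the general idea only and is unrelated to the ocr function or Tesseract.

```python
# Hypothetical 3x3 binary training glyphs (1 = ink), paired with their labels.
TEMPLATES = [
    ("I", [[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    ("L", [[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
    ("T", [[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
]


def flatten(glyph):
    """Turn a 2-D glyph into a 1-D feature vector of pixel values."""
    return [pixel for row in glyph for pixel in row]


def hamming(a, b):
    """Count the positions where two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_glyph(glyph, templates):
    """Return the label of the nearest training glyph (1-nearest neighbor).

    Ties go to the template that appears first in the list.
    """
    query = flatten(glyph)
    label, _ = min(templates, key=lambda item: hamming(query, flatten(item[1])))
    return label


# A "T" missing its bottom stroke pixel, and an "L" with a short base.
noisy_t = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
print(classify_glyph(noisy_t, TEMPLATES), classify_glyph(noisy_l, TEMPLATES))
```

Each damaged glyph is one pixel away from its intended template and several pixels away from the others. The sketch therefore labels them T and L. Real character sets need richer features and many training samples per class.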


Apps

OCR Trainer: Train an optical character recognition model to recognize a specific set of characters



Functions

vision.BlobAnalysis: Properties of connected regions
detectMSERFeatures: Detect MSER features and return MSERRegions object
detectTextCRAFT: Detect text in images by using CRAFT deep learning model
extractHOGFeatures: Extract histogram of oriented gradients (HOG) features
ocr: Recognize text using optical character recognition
ocrText: Object for storing OCR results
visionSupportPackages: Start installer to download, install, or uninstall Computer Vision Toolbox data


Get Started

Use Optical Character Recognition