Main Content

Image Category Classification Using Bag of Features

This example shows how to use a bag of features approach for image category classification. This technique is also often referred to as bag of words. Visual image categorization is a process of assigning a category label to an image under test. Categories may contain images representing just about anything, for example, dogs, cats, trains, boats.

Load Data

Unzip a collection of images to use for this example.

unzip('MerchData.zip');

Load the image collection using an imageDatastore to help you manage the data. Because imageDatastore operates on image file locations, and therefore does not load all the images into memory, it is safe to use on large image collections.

imds = imageDatastore('MerchData','IncludeSubfolders',true,'LabelSource','foldernames');

You can easily inspect the number of images per category as well as category labels as shown below:

tbl = countEachLabel(imds)
tbl=5×2 table
             Label             Count
    _______________________    _____

    MathWorks Cap               15  
    MathWorks Cube              15  
    MathWorks Playing Cards     15  
    MathWorks Screwdriver       15  
    MathWorks Torch             15  

Note that the labels were derived from directory names used to construct the ImageDatastore, but can be customized by manually setting the Labels property of the ImageDatastore object. Next, display a few of the images to get a sense of the type of images being used.

figure
montage(imds.Files(1:16:end))

Note that for the bag of features approach to be effective, the majority of the object must be visible in the image.

Prepare Data for Training

Separate the sets into training and validation data. Pick 60% of images from each set for the training data and the remainder, 40%, for the validation data. Randomize the split to avoid biasing the results.

[trainingSet, validationSet] = splitEachLabel(imds, 0.6, 'randomize');

The above call returns two imageDatastore objects ready for training and validation tasks.

Create a Visual Vocabulary and Train an Image Category Classifier

Bag of words is a technique adapted to computer vision from the world of natural language processing. Since images do not actually contain discrete words, we first construct a "vocabulary" of extractFeatures features representative of each image category.

This is accomplished with a single call to bagOfFeatures function, which:

  1. extracts SURF features from all images in all image categories

  2. constructs the visual vocabulary by reducing the number of features through quantization of feature space using K-means clustering

bag = bagOfFeatures(trainingSet);
Creating Bag-Of-Features.
-------------------------
* Image category 1: MathWorks Cap
* Image category 2: MathWorks Cube
* Image category 3: MathWorks Playing Cards
* Image category 4: MathWorks Screwdriver
* Image category 5: MathWorks Torch
* Selecting feature point locations using the Grid method.
* Extracting SURF features from the selected feature point locations.
** The GridStep is [8 8] and the BlockWidth is [32 64 96 128].

* Extracting features from 45 images...done. Extracted 141120 features.

* Keeping 80 percent of the strongest features from each category.

* Creating a 500 word visual vocabulary.
* Number of levels: 1
* Branching factor: 500
* Number of clustering steps: 1

* [Step 1/1] Clustering vocabulary level 1.
* Number of features          : 112895
* Number of clusters          : 500
* Initializing cluster centers...100.00%.
* Clustering...completed 47/100 iterations (~0.62 seconds/iteration)...converged in 47 iterations.

* Finished creating Bag-Of-Features

Additionally, the bagOfFeatures object provides an encode method for counting the visual word occurrences in an image. It produced a histogram that becomes a new and reduced representation of an image.

img = readimage(imds, 1);
featureVector = encode(bag, img);
Encoding images using Bag-Of-Features.
--------------------------------------
* Encoding an image...done.
% Plot the histogram of visual word occurrences
figure
bar(featureVector)
title('Visual word occurrences')
xlabel('Visual word index')
ylabel('Frequency of occurrence')

This histogram forms a basis for training a classifier and for the actual image classification. In essence, it encodes an image into a feature vector.

Encoded training images from each category are fed into a classifier training process invoked by the trainImageCategoryClassifier function. Note that this function relies on the multiclass linear SVM classifier from the Statistics and Machine Learning Toolbox™.

categoryClassifier = trainImageCategoryClassifier(trainingSet, bag);
Training an image category classifier for 5 categories.
--------------------------------------------------------
* Category 1: MathWorks Cap
* Category 2: MathWorks Cube
* Category 3: MathWorks Playing Cards
* Category 4: MathWorks Screwdriver
* Category 5: MathWorks Torch

* Encoding features for 45 images...done.

* Finished training the category classifier. Use evaluate to test the classifier on a test set.

The above function utilizes the encode method of the input bag object to formulate feature vectors representing each image category from the trainingSet.

Evaluate Classifier

Now that we have a trained classifier, categoryClassifier, let's evaluate it. As a sanity check, let's first test it with the training set, which should produce near perfect confusion matrix, i.e. ones on the diagonal.

confMatrix = evaluate(categoryClassifier, trainingSet);
Evaluating image category classifier for 5 categories.
-------------------------------------------------------

* Category 1: MathWorks Cap
* Category 2: MathWorks Cube
* Category 3: MathWorks Playing Cards
* Category 4: MathWorks Screwdriver
* Category 5: MathWorks Torch

* Evaluating 45 images...done.

* Finished evaluating all the test sets.

* The confusion matrix for this test set is:


                                                                          PREDICTED
KNOWN                      | MathWorks Cap   MathWorks Cube   MathWorks Playing Cards   MathWorks Screwdriver   MathWorks Torch   
----------------------------------------------------------------------------------------------------------------------------------
MathWorks Cap              | 1.00            0.00             0.00                      0.00                    0.00              
MathWorks Cube             | 0.00            0.89             0.00                      0.00                    0.11              
MathWorks Playing Cards    | 0.00            0.00             1.00                      0.00                    0.00              
MathWorks Screwdriver      | 0.00            0.00             0.00                      1.00                    0.00              
MathWorks Torch            | 0.00            0.00             0.00                      0.00                    1.00              

* Average Accuracy is 0.98.

Next, let's evaluate the classifier on the validationSet, which was not used during the training. By default, the evaluate function returns the confusion matrix, which is a good initial indicator of how well the classifier is performing.

confMatrix = evaluate(categoryClassifier, validationSet);
Evaluating image category classifier for 5 categories.
-------------------------------------------------------

* Category 1: MathWorks Cap
* Category 2: MathWorks Cube
* Category 3: MathWorks Playing Cards
* Category 4: MathWorks Screwdriver
* Category 5: MathWorks Torch

* Evaluating 30 images...done.

* Finished evaluating all the test sets.

* The confusion matrix for this test set is:


                                                                          PREDICTED
KNOWN                      | MathWorks Cap   MathWorks Cube   MathWorks Playing Cards   MathWorks Screwdriver   MathWorks Torch   
----------------------------------------------------------------------------------------------------------------------------------
MathWorks Cap              | 1.00            0.00             0.00                      0.00                    0.00              
MathWorks Cube             | 0.00            0.67             0.17                      0.17                    0.00              
MathWorks Playing Cards    | 0.00            0.00             1.00                      0.00                    0.00              
MathWorks Screwdriver      | 0.00            0.00             0.00                      1.00                    0.00              
MathWorks Torch            | 0.17            0.00             0.00                      0.00                    0.83              

* Average Accuracy is 0.90.
% Compute average accuracy
mean(diag(confMatrix))
ans = 0.9000

You can tune bagOfFeatures hyperparameters and continue evaluating the trained classifier until you are satisfied with the results. Additional statistics can be derived using the rest of arguments returned by the evaluate function. See help for imageCategoryClassifier/evaluate.

Classify Object in Image

You can now apply the newly trained classifier to categorize new images.

img = imread(fullfile('MerchData','MathWorks Cap','Hat_0.jpg'));
figure
imshow(img)

[labelIdx, scores] = predict(categoryClassifier, img);
Encoding images using Bag-Of-Features.
--------------------------------------
* Encoding an image...done.
% Display the string label
categoryClassifier.Labels(labelIdx)
ans = 1x1 cell array
    {'MathWorks Cap'}