Automate Ground Truth Labeling for Semantic Segmentation

This example uses:

This example shows how to use a pretrained semantic segmentation algorithm to segment the sky and road in an image, and use this algorithm to automate ground truth labeling in the Ground Truth Labeler (Automated Driving Toolbox) app.

The Ground Truth Labeler App

Good ground truth data is crucial for developing automated driving algorithms and evaluating their performance. However, creating and maintaining a diverse and high-quality set of annotated driving data requires significant effort. The Ground Truth Labeler (Automated Driving Toolbox) app makes this process easy and efficient. This app includes features to annotate objects as rectangles, lines, or pixel labels. Pixel labeling is a process in which each pixel in an image is assigned a class or category, which can then be used to train a pixel-level segmentation algorithm. Although you can use the app to manually label all your data, this process requires a significant amount of time and resources, especially for pixel labeling. As an alternative, the app also provides a framework to incorporate algorithms to extend and automate the labeling process. You can use the algorithms you create to automatically label entire data sets, and then end with a more efficient, shorter manual verification step. You can also edit the results of the automation step to account for challenging scenarios that the algorithm might have missed.

In this example, you will:

Use a pretrained segmentation algorithm to segment pixels that belong to the categories 'Road' and 'Sky'.
Create an automation algorithm that can be used in the Ground Truth Labeler app to automatically label road and sky pixels.

This ground truth data can then be used to train a new semantic segmentation network, or retrain an existing one.

Create a Road and Sky Detection Algorithm

First, create a semantic segmentation algorithm that segments road and sky pixels in an image. The Semantic Segmentation Using Deep Learning example describes how to train a deep learning network for semantic segmentation. This network has been trained to predict 11 classes of semantic labels including 'Road' and 'Sky'. The performance of these networks depends on how generalizable they are. Applying the networks to situations they did not encounter during training can lead to subpar results. Iteratively introducing custom training data to the learning process can make the network perform better on similar data sets.

Download a network, which was pretrained on the CamVid dataset [1] from the University of Cambridge.

pretrainedURL = "https://ssd.mathworks.com/supportfiles/vision/data/deeplabv3plusResnet18CamVid_v2.zip";
pretrainedFolder = fullfile(tempdir,"pretrainedNetwork");
pretrainedNetworkZip = fullfile(pretrainedFolder,"deeplabv3plusResnet18CamVid_v2.zip"); 
if ~exist(pretrainedNetworkZip,'file')
    mkdir(pretrainedFolder);
    disp("Downloading pretrained network (58 MB)...");
    websave(pretrainedNetworkZip,pretrainedURL);
end
unzip(pretrainedNetworkZip, pretrainedFolder)

Load the pretrained network.

pretrainedNetwork = fullfile(pretrainedFolder,"deeplabv3plusResnet18CamVid_v2.mat");

% Load the semantic segmentation network
data = load(pretrainedNetwork);

Set the classes this network has been trained to classify.

classes = getClassNames()

classes = 11×1 string
    "Sky"
    "Building"
    "Pole"
    "Road"
    "Pavement"
    "Tree"
    "SignSymbol"
    "Fence"
    "Car"
    "Pedestrian"
    "Bicyclist"

Segment an image and display it.

% Load a test image from drivingdata
roadSequenceData = fullfile(toolboxdir('driving'), 'drivingdata', 'roadSequence');
I = imread(fullfile(roadSequenceData, 'f00001.png'));
inputSize = data.net.Layers(1).InputSize;
I = imresize(I,inputSize(1:2));

% Run the network on the image
automatedLabels = semanticseg(I, data.net, Classes=classes);

% Display the labels overlaid on the image, choosing relevant categories
figure
imshow(labeloverlay(I, automatedLabels, IncludedLabels=["Sky", "Road"]));

Figure contains an axes object. The axes object contains an object of type image.

The blue overlay indicates the 'Sky' category, and the green overlay indicates 'Road'.

Integrate Pixel Segmentation Algorithm Into Ground Truth Labeler

Incorporate this semantic segmentation algorithm into the automation workflow of the app by creating a class that inherits from the abstract base class vision.labeler.AutomationAlgorithm. This base class defines the API that the app uses to configure and run the algorithm. The Ground Truth Labeler app provides a convenient way to obtain an initial automation class template. For details, see Create Automation Algorithm for Labeling. The RoadAndSkySegmentation class is based on this template and provides a ready-to-use automation class for pixel label segmentation.

The first set of properties in the RoadAndSkySegmentation class specify the name of the algorithm, provide a brief description of it, and give directions for using it.

    properties(Constant)
        % Name
        %   Character vector specifying name of algorithm.
        Name = 'RoadAndSkySegmentation'

        % Description
        %   Character vector specifying short description of algorithm.
        Description = 'This algorithm uses semanticseg with a pretrained network to annotate roads and sky'

        % UserDirections
        %   Cell array of character vectors specifying directions for
        %   algorithm users to follow in order to use algorithm.
        UserDirections = {...
            ['Automation algorithms are a way to automate manual labeling ' ...
            'tasks. This AutomationAlgorithm automatically creates pixel ', ...
            'labels for road and sky.'], ...
            ['Review and Modify: Review automated labels over the interval ', ...
            'using playback controls. Modify/delete/add ROIs that were not ' ...
            'satisfactorily automated at this stage. If the results are ' ...
            'satisfactory, click Accept to accept the automated labels.'], ...
            ['Accept/Cancel: If results of automation are satisfactory, ' ...
            'click Accept to accept all automated labels and return to ' ...
            'manual labeling. If results of automation are not ' ...
            'satisfactory, click Cancel to return to manual labeling ' ...
            'without saving automated labels.']};

        % ClassNames
        %   String array of all the classes the pretrained network has been
        %   trained on.
        ClassNames = [
            "Sky"
            "Building"
            "Pole"
            "Road"
            "Pavement"
            "Tree"
            "SignSymbol"
            "Fence"
            "Car"
            "Pedestrian"
            "Bicyclist"
            ]
    end

The next section of the RoadAndSkySegmentation class specifies the custom properties needed by the core algorithm. The PretrainedNetwork property holds the pretrained network. The AllCategories property holds the names of all the categories.

    properties
        % PretrainedNetwork saves the SeriesNetwork object that does the semantic
        % segmentation.
        PretrainedNetwork

        % Categories holds the default 'background', 'road', and 'sky'
        % categorical types.
        AllCategories = {'background'};

        % Store names for 'road' and 'sky'.
        RoadName
        SkyName
    end

checkLabelDefinition, the first method defined in RoadAndSkySegmentation, checks that only labels of type PixelLabel are enabled for automation. PixelLabel is the only type needed for semantic segmentation.

    function TF = checkLabelDefinition(~, labelDef)
        isValid = false;

        if (strcmpi(labelDef.Name, 'road') && labelDef.Type == labelType.PixelLabel)
            isValid = true;
            algObj.RoadName = labelDef.Name;
            algObj.AllCategories{end+1} = labelDef.Name;
        elseif (strcmpi(labelDef.Name, 'sky') && labelDef.Type == labelType.PixelLabel)
            isValid = true;
            algObj.SkyName = labelDef.Name;
            algObj.AllCategories{end+1} = labelDef.Name;
        elseif(labelDef.Type == labelType.PixelLabel)
            isValid = true;
        end
    end

The next set of functions control the execution of the algorithm. The vision.labeler.AutomationAlgorithm class includes an interface that contains methods like 'initialize', 'run', and 'terminate' for setting up and running the automation with ease. The initialize function populates the initial algorithm state based on the existing labels in the app. In the RoadAndSkySegmentation class, the initialize function has been customized to load the pretrained semantic segmentation network from tempdir and save it to the PretrainedNetwork property.

    function initialize(algObj, ~, ~)
        % Point to tempdir where pretrainedNetwork was downloaded.
        pretrainedFolder = fullfile(tempdir,'pretrainedNetwork');
        pretrainedNetwork = fullfile(pretrainedFolder,'deeplabv3plusResnet18CamVid_v2.mat'); 
        data = load(pretrainedNetwork);
            
        % Store the network in the 'net' property of this object.
        algObj.PretrainedNetwork = data.net;
    end

Next, the run function defines the core semantic segmentation algorithm of this automation class. run is called for each video frame, and expects the automation class to return a set of labels. The run function in RoadAndSkySegmentation contains the logic introduced previously for creating a categorical matrix of pixel labels corresponding to "Road" and "Sky". This can be extended to any categories the network is trained on, and is restricted to these two for illustration only.

    function autoLabels = run(algObj, I)
        % Resize image to expected network size
        img = imresize(I,algObj.PretrainedNetwork.Layers(1).InputSize(1:2));

        % Setup categorical matrix with categories including road and
        % sky
        autoLabels = categorical(zeros(size(I,1), size(I,2)),0:2,algObj.AllCategories,Ordinal=true);
        
        pixelCat = semanticseg(img,algObj.PretrainedNetwork,Classes=algObj.ClassNames);
        
        % Resize the categorical matrix back to the size of the
        % original input image.
        pixelCatResized = imresize(pixelCat,size(I,1:2));
        if ~isempty(pixelCatResized)
            % Add the selected label at the bounding box position(s)
            autoLabels(pixelCatResized == "Road") = algObj.RoadName;
            autoLabels(pixelCatResized == "Sky") = algObj.SkyName;
        end

    end

This algorithm does not require any cleanup, so the terminate function is empty.

Use the Pixel Segmentation Automation Class in the App

The properties and methods described in the previous section have been implemented in the RoadAndSkySegmentation automation algorithm class file. To use this class in the app:

Create the folder structure +vision/+labeler required under the current folder, and copy the automation class into it.

    mkdir('+vision/+labeler');
    copyfile('RoadAndSkySegmentation.m','+vision/+labeler');

Open the groundTruthLabeler app with custom data to label. For illustration purposes, open the caltech_cordova1.avi video.

    groundTruthLabeler caltech_cordova1.avi

On the left pane, click the Define new ROI label button and define two ROI labels with names Road and Sky, of type Pixel label as shown.

Click Algorithm > Select Algorithm > Refresh list.

Click Algorithm > RoadAndSkySegmentation. If you do not see this option, ensure that the current working folder has a folder called +vision/+labeler, with a file named RoadAndSkySegmentation.m in it.
Click Automate. A new panel opens, displaying directions for using the algorithm.

Click Run. The created algorithm executes on each frame of the video, segmenting "Road" and "Sky" categories. After the run is completed, use the slider or arrow keys to scroll through the video and verify the result of the automation algorithm.

The automation does pretty well in labeling the sky, but it is evident that parts of the ego vehicle itself are marked as "Road". These results indicate that the network has not been previously trained on such data. This workflow allows for making manual corrections to these results, so that an iterative process of training and labeling (sometimes called active learning or human in the loop ) can be used to further refine the accuracy of the network on custom data sets. You can manually tweak the results by using the brush tool in the Label Pixels tab and adding or removing pixel annotations. Other tools like flood fill and smart polygons are also available in the Label Pixels tab and can be used when appropriate.

Once you are satisfied with the pixel label categories for the entire video, click Accept.

Automation for pixel labeling for the video is complete. You can now proceed with labeling other objects of interest, save the session, or export the results of this labeling run.

Conclusion

This example showed how to use a pretrained semantic segmentation network to accelerate labeling of road and sky pixels in the Ground Truth Labeler app using the AutomationAlgorithm interface.

Supporting Functions

function classes = getClassNames()
classes = [
    "Sky"
    "Building"
    "Pole"
    "Road"
    "Pavement"
    "Tree"
    "SignSymbol"
    "Fence"
    "Car"
    "Pedestrian"
    "Bicyclist"
    ];
end

References

[1] Brostow, G. J., J. Fauqueur, and R. Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters. Vol. 30, Issue 2, 2009, pp 88-97.