Automate Ground Truth Labeling for Semantic Segmentation
This example shows how to use a pretrained semantic segmentation algorithm to segment the sky and road in an image, and use this algorithm to automate ground truth labeling in the Ground Truth Labeler (Automated Driving Toolbox) app.
The Ground Truth Labeler App
Good ground truth data is crucial for developing automated driving algorithms and evaluating their performance. However, creating and maintaining a diverse and high-quality set of annotated driving data requires significant effort. The Ground Truth Labeler (Automated Driving Toolbox) app makes this process easy and efficient. This app includes features to annotate objects as rectangles, lines, or pixel labels. Pixel labeling is a process in which each pixel in an image is assigned a class or category, which can then be used to train a pixel-level segmentation algorithm. Although you can use the app to manually label all your data, this process requires a significant amount of time and resources, especially for pixel labeling. As an alternative, the app also provides a framework to incorporate algorithms to extend and automate the labeling process. You can use the algorithms you create to automatically label entire data sets, and then end with a more efficient, shorter manual verification step. You can also edit the results of the automation step to account for challenging scenarios that the algorithm might have missed.
In this example, you will:
Use a pretrained segmentation algorithm to segment pixels that belong to the categories 'Road' and 'Sky'.
Create an automation algorithm that can be used in the Ground Truth Labeler app to automatically label road and sky pixels.
This ground truth data can then be used to train a new semantic segmentation network, or retrain an existing one.
Create a Road and Sky Detection Algorithm
First, create a semantic segmentation algorithm that segments road and sky pixels in an image. The Semantic Segmentation Using Deep Learning example describes how to train a deep learning network for semantic segmentation. This network has been trained to predict 11 classes of semantic labels including 'Road' and 'Sky'. The performance of these networks depends on how generalizable they are. Applying the networks to situations they did not encounter during training can lead to subpar results. Iteratively introducing custom training data to the learning process can make the network perform better on similar data sets.
Download a network, which was pretrained on the CamVid dataset [1] from the University of Cambridge.
pretrainedURL = "https://ssd.mathworks.com/supportfiles/vision/data/deeplabv3plusResnet18CamVid_v2.zip"; pretrainedFolder = fullfile(tempdir,"pretrainedNetwork"); pretrainedNetworkZip = fullfile(pretrainedFolder,"deeplabv3plusResnet18CamVid_v2.zip"); if ~exist(pretrainedNetworkZip,'file') mkdir(pretrainedFolder); disp("Downloading pretrained network (58 MB)..."); websave(pretrainedNetworkZip,pretrainedURL); end unzip(pretrainedNetworkZip, pretrainedFolder)
Load the pretrained network.
pretrainedNetwork = fullfile(pretrainedFolder,"deeplabv3plusResnet18CamVid_v2.mat"); % Load the semantic segmentation network data = load(pretrainedNetwork);
Set the classes this network has been trained to classify.
classes = getClassNames()
classes = 11×1 string
"Sky"
"Building"
"Pole"
"Road"
"Pavement"
"Tree"
"SignSymbol"
"Fence"
"Car"
"Pedestrian"
"Bicyclist"
Segment an image and display it.
% Load a test image from drivingdata roadSequenceData = fullfile(toolboxdir('driving'), 'drivingdata', 'roadSequence'); I = imread(fullfile(roadSequenceData, 'f00001.png')); inputSize = data.net.Layers(1).InputSize; I = imresize(I,inputSize(1:2)); % Run the network on the image automatedLabels = semanticseg(I, data.net, Classes=classes); % Display the labels overlaid on the image, choosing relevant categories figure imshow(labeloverlay(I, automatedLabels, IncludedLabels=["Sky", "Road"]));
The blue overlay indicates the 'Sky' category, and the green overlay indicates 'Road'.
Integrate Pixel Segmentation Algorithm Into Ground Truth Labeler
Incorporate this semantic segmentation algorithm into the automation workflow of the app by creating a class that inherits from the abstract base class vision.labeler.AutomationAlgorithm
. This base class defines the API that the app uses to configure and run the algorithm. The Ground Truth Labeler app provides a convenient way to obtain an initial automation class template. For details, see Create Automation Algorithm for Labeling. The RoadAndSkySegmentation
class is based on this template and provides a ready-to-use automation class for pixel label segmentation.
The first set of properties in the RoadAndSkySegmentation
class specify the name of the algorithm, provide a brief description of it, and give directions for using it.
properties(Constant) % Name % Character vector specifying name of algorithm. Name = 'RoadAndSkySegmentation' % Description % Character vector specifying short description of algorithm. Description = 'This algorithm uses semanticseg with a pretrained network to annotate roads and sky' % UserDirections % Cell array of character vectors specifying directions for % algorithm users to follow in order to use algorithm. UserDirections = {... ['Automation algorithms are a way to automate manual labeling ' ... 'tasks. This AutomationAlgorithm automatically creates pixel ', ... 'labels for road and sky.'], ... ['Review and Modify: Review automated labels over the interval ', ... 'using playback controls. Modify/delete/add ROIs that were not ' ... 'satisfactorily automated at this stage. If the results are ' ... 'satisfactory, click Accept to accept the automated labels.'], ... ['Accept/Cancel: If results of automation are satisfactory, ' ... 'click Accept to accept all automated labels and return to ' ... 'manual labeling. If results of automation are not ' ... 'satisfactory, click Cancel to return to manual labeling ' ... 'without saving automated labels.']}; % ClassNames % String array of all the classes the pretrained network has been % trained on. ClassNames = [ "Sky" "Building" "Pole" "Road" "Pavement" "Tree" "SignSymbol" "Fence" "Car" "Pedestrian" "Bicyclist" ] end
The next section of the RoadAndSkySegmentation
class specifies the custom properties needed by the core algorithm. The PretrainedNetwork
property holds the pretrained network. The AllCategories
property holds the names of all the categories.
properties % PretrainedNetwork saves the SeriesNetwork object that does the semantic % segmentation. PretrainedNetwork % Categories holds the default 'background', 'road', and 'sky' % categorical types. AllCategories = {'background'}; % Store names for 'road' and 'sky'. RoadName SkyName end
checkLabelDefinition
, the first method defined in RoadAndSkySegmentation
, checks that only labels of type PixelLabel
are enabled for automation. PixelLabel
is the only type needed for semantic segmentation.
function TF = checkLabelDefinition(~, labelDef) isValid = false; if (strcmpi(labelDef.Name, 'road') && labelDef.Type == labelType.PixelLabel) isValid = true; algObj.RoadName = labelDef.Name; algObj.AllCategories{end+1} = labelDef.Name; elseif (strcmpi(labelDef.Name, 'sky') && labelDef.Type == labelType.PixelLabel) isValid = true; algObj.SkyName = labelDef.Name; algObj.AllCategories{end+1} = labelDef.Name; elseif(labelDef.Type == labelType.PixelLabel) isValid = true; end end
The next set of functions control the execution of the algorithm. The vision.labeler.AutomationAlgorithm
class includes an interface that contains methods like 'initialize'
, 'run'
, and 'terminate'
for setting up and running the automation with ease. The initialize
function populates the initial algorithm state based on the existing labels in the app. In the RoadAndSkySegmentation
class, the initialize
function has been customized to load the pretrained semantic segmentation network from tempdir
and save it to the PretrainedNetwork
property.
function initialize(algObj, ~, ~) % Point to tempdir where pretrainedNetwork was downloaded. pretrainedFolder = fullfile(tempdir,'pretrainedNetwork'); pretrainedNetwork = fullfile(pretrainedFolder,'deeplabv3plusResnet18CamVid_v2.mat'); data = load(pretrainedNetwork); % Store the network in the 'net' property of this object. algObj.PretrainedNetwork = data.net; end
Next, the run
function defines the core semantic segmentation algorithm of this automation class. run
is called for each video frame, and expects the automation class to return a set of labels. The run
function in RoadAndSkySegmentation
contains the logic introduced previously for creating a categorical matrix of pixel labels corresponding to "Road" and "Sky". This can be extended to any categories the network is trained on, and is restricted to these two for illustration only.
function autoLabels = run(algObj, I) % Resize image to expected network size img = imresize(I,algObj.PretrainedNetwork.Layers(1).InputSize(1:2)); % Setup categorical matrix with categories including road and % sky autoLabels = categorical(zeros(size(I,1), size(I,2)),0:2,algObj.AllCategories,Ordinal=true); pixelCat = semanticseg(img,algObj.PretrainedNetwork,Classes=algObj.ClassNames); % Resize the categorical matrix back to the size of the % original input image. pixelCatResized = imresize(pixelCat,size(I,1:2)); if ~isempty(pixelCatResized) % Add the selected label at the bounding box position(s) autoLabels(pixelCatResized == "Road") = algObj.RoadName; autoLabels(pixelCatResized == "Sky") = algObj.SkyName; end end
This algorithm does not require any cleanup, so the terminate
function is empty.
Use the Pixel Segmentation Automation Class in the App
The properties and methods described in the previous section have been implemented in the RoadAndSkySegmentation
automation algorithm class file. To use this class in the app:
Create the folder structure
+vision/+labeler
required under the current folder, and copy the automation class into it.
mkdir('+vision/+labeler'); copyfile('RoadAndSkySegmentation.m','+vision/+labeler');
Open the
groundTruthLabeler
app with custom data to label. For illustration purposes, open thecaltech_cordova1.avi
video.
groundTruthLabeler caltech_cordova1.avi
On the left pane, click the Define new ROI label button and define two ROI labels with names
Road
andSky
, of typePixel
label
as shown.
Click Algorithm > Select Algorithm > Refresh list.
Click Algorithm > RoadAndSkySegmentation. If you do not see this option, ensure that the current working folder has a folder called
+vision/+labeler
, with a file namedRoadAndSkySegmentation.m
in it.Click Automate. A new panel opens, displaying directions for using the algorithm.
Click Run. The created algorithm executes on each frame of the video, segmenting "Road" and "Sky" categories. After the run is completed, use the slider or arrow keys to scroll through the video and verify the result of the automation algorithm.
The automation does pretty well in labeling the sky, but it is evident that parts of the ego vehicle itself are marked as "Road". These results indicate that the network has not been previously trained on such data. This workflow allows for making manual corrections to these results, so that an iterative process of training and labeling (sometimes called active learning or human in the loop ) can be used to further refine the accuracy of the network on custom data sets. You can manually tweak the results by using the brush tool in the Label Pixels tab and adding or removing pixel annotations. Other tools like flood fill and smart polygons are also available in the Label Pixels tab and can be used when appropriate.
Once you are satisfied with the pixel label categories for the entire video, click Accept.
Automation for pixel labeling for the video is complete. You can now proceed with labeling other objects of interest, save the session, or export the results of this labeling run.
Conclusion
This example showed how to use a pretrained semantic segmentation network to accelerate labeling of road and sky pixels in the Ground Truth Labeler app using the AutomationAlgorithm
interface.
Supporting Functions
function classes = getClassNames() classes = [ "Sky" "Building" "Pole" "Road" "Pavement" "Tree" "SignSymbol" "Fence" "Car" "Pedestrian" "Bicyclist" ]; end
References
[1] Brostow, G. J., J. Fauqueur, and R. Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters. Vol. 30, Issue 2, 2009, pp 88-97.