Preprocess Volumes for Deep Learning

Read Volumetric Data

Supported file formats for volumetric image data include MAT-files, Digital Imaging and Communications in Medicine (DICOM) files, and Neuroimaging Informatics Technology Initiative (NIfTI) files.

Read volumetric image data into an ImageDatastore. Read volumetric pixel label data into a PixelLabelDatastore. When you create the datastore, specify the 'FileExtensions' argument as the file extensions of your data. Specify the ReadFcn property as a function handle that reads data of the file format. For more information, see Datastores for Deep Learning.

The table shows how to create an image or pixel label datastore for each of the supported file formats. The filepath argument specifies the path to the files or folder containing image data. For pixel label images, the additional classNames and pixelLabelID arguments specify the mapping of voxel label values to class names.

Image File Format: MAT

Create Image Datastore:

volds = imageDatastore(filepath, ...
   'FileExtensions','.mat','ReadFcn',@(x) matRead(x));

Create Pixel Label Datastore:

pxds = pixelLabelDatastore(filepath,classNames,pixelLabelID, ...
    'FileExtensions','.mat','ReadFcn',@(x) matRead(x));

matRead is a custom function that you write to read data from a .MAT file. For a sample implementation, see Define Custom Function to Read MAT Files.

Image File Format: DICOM volume in single file

Create Image Datastore:

volds = imageDatastore(filepath, ...
   'FileExtensions','.dcm','ReadFcn',@(x) dicomread(x));

Create Pixel Label Datastore:

pxds = pixelLabelDatastore(filepath,classNames,pixelLabelID, ...
   'FileExtensions','.dcm','ReadFcn',@(x) dicomread(x));

For more information about the DICOM file format, see dicomread.

Image File Format: DICOM volume in multiple files

Create Image Datastore:

Create an ImageDatastore from a collection of DICOM files by following these steps.

  • Aggregate the files into a single study by using the dicomCollection function.

  • Read the DICOM data in the study by using the dicomreadVolume function.

  • Write each volume as a .MAT file.

  • Create an ImageDatastore from the collection of .MAT files.

Create Pixel Label Datastore:

Create a PixelLabelDatastore from a collection of DICOM files by following these steps.

  • Aggregate the files into a single study by using the dicomCollection function.

  • Read the DICOM data in the study by using the dicomreadVolume function.

  • Write each volume as a .MAT file.

  • Create a PixelLabelDatastore from the collection of .MAT files, class names, and pixel label IDs.

For an example of these steps, see Read Multi-File DICOM Volumes.

Image File Format: NIfTI

Create Image Datastore:

volds = imageDatastore(filepath, ...
   'FileExtensions','.nii','ReadFcn',@(x) niftiread(x));

Create Pixel Label Datastore:

pxds = pixelLabelDatastore(filepath,classNames,pixelLabelID, ...
   'FileExtensions','.nii','ReadFcn',@(x) niftiread(x));

For more information about the NIfTI file format, see niftiread.

Define Custom Function to Read MAT Files

To read data from a .MAT file, you must define a custom read function. For example, this code creates a function called matRead that loads volume data from the first variable of a .MAT file. Save the function in a file called matRead.m.

function data = matRead(filename)
% data = matRead(filename) reads the image data in the MAT-file filename

inp = load(filename);
f = fieldnames(inp);
data = inp.(f{1});
end

Adapt the read function to match how your image data is stored in .MAT files.
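When a MAT file stores several variables, relying on the first field can return the wrong one. As a minimal sketch, assuming the volume is saved under a known variable name, the reader can take that name as a second argument. The function name matReadVar and the variable name 'V' are hypothetical and are not part of the original example.

```matlab
function data = matReadVar(filename,varName)
% data = matReadVar(filename,varName) reads the variable varName from the
% MAT-file filename. matReadVar and varName are illustrative names.

% Load only the requested variable to avoid reading unrelated data
inp = load(filename,varName);
if ~isfield(inp,varName)
    error('matReadVar:missingVariable', ...
        'Variable "%s" not found in %s.',varName,filename);
end
data = inp.(varName);
end
```

To use this reader in a datastore, bind the variable name in an anonymous function, for example 'ReadFcn',@(x) matReadVar(x,'V').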

Example: Prepare Datastore Containing Single and Multi-File DICOM Volumes

This example shows how to create an ImageDatastore or PixelLabelDatastore from a set of DICOM files that comprise a 3-D volume.

Specify the directory that contains the DICOM files. The directory can include files that contain a 2-D image, files that contain a complete 3-D volume, and files that contain 2-D slices of a 3-D volume. The datastore will include only the files that contain 3-D data, either as a complete 3-D volume in a single file or as 2-D slices of a 3-D volume.

dicomDir = fullfile(matlabroot,'toolbox','images','imdata');

Gather details about the DICOM files by using the dicomCollection function. This function returns the details as a table, where each row represents a single study. The function aggregates the files of a multi-file DICOM volume into a single study. The file names of a multi-file DICOM volume are listed in a string array in the Filenames variable.

collection = dicomCollection(dicomDir,'IncludeSubfolders',true)

Create a directory to store the processed DICOM volumes.

matFileDir = fullfile(tempdir,'MATFiles');
if ~exist(matFileDir,'dir')
    mkdir(matFileDir)
end

For every study in the collection, get the file names that comprise the study. Try reading the data of the study by using the dicomreadVolume function.

  • If the data is contained in multiple files, then dicomreadVolume runs successfully and returns the complete volume in a single 4-D array. This volume can be included in the datastore.

  • If the data is contained in a single file, then dicomreadVolume does not run successfully. In this case, read the data by using the dicomread function.

    • If dicomread returns a 4-D array, then the study contains a complete 3-D volume that can be included in the datastore.

    • If dicomread returns a 2-D matrix or 3-D array, then the study contains a single 2-D image. In this case, omit the image data from the datastore and move on to the next study in the collection.

For complete volumes returned in a 4-D array, write the data to a .MAT file. This example also writes the absolute file name of the DICOM file, 'dicomFileName', as a second variable. For multi-file DICOM volumes, 'dicomFileName' is a string array of all individual DICOM files.

for idx = 1:numel(collection.Row)
    dicomFileName = collection.Filenames{idx};
    if length(dicomFileName) > 1
        matFileName = fileparts(dicomFileName(1));
        matFileName = split(matFileName,filesep);
        matFileName = replace(strtrim(matFileName(end))," ","_");
    else
        [~,matFileName] = fileparts(dicomFileName);
    end
    matFileName = fullfile(matFileDir,matFileName);
    
    try
        V = dicomreadVolume(collection,collection.Row{idx});
    catch ME
        V = dicomread(dicomFileName);
        if ndims(V)<4
            % Skip files that are not volumes
            continue;
        end
    end
    
    % For multi-file DICOM, dicomFileName is a string array.    
    save(matFileName,'V','dicomFileName');
    
end
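Before building a datastore, it can help to confirm what the loop wrote. The following sketch, which assumes the same matFileDir folder and the variable name V used above, lists the size of each saved volume without loading the data.

```matlab
% Folder of MAT files written by the loop above
matFileDir = fullfile(tempdir,'MATFiles');

matFiles = dir(fullfile(matFileDir,'*.mat'));
for k = 1:numel(matFiles)
    matPath = fullfile(matFiles(k).folder,matFiles(k).name);
    % whos with '-file' reports variable sizes without loading the data
    info = whos('-file',matPath,'V');
    if isempty(info)
        fprintf('%s: no variable V\n',matFiles(k).name);
    else
        fprintf('%s: %s\n',matFiles(k).name,mat2str(info.size));
    end
end
```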

If the volumes represent image data, then create an ImageDatastore from the .MAT files containing the volumes. You can specify the ReadFcn property as the matRead function from Define Custom Function to Read MAT Files.

imdsdicom = imageDatastore(matFileDir,'FileExtensions','.mat', ...
    'ReadFcn',@matRead);

If the volumes represent pixel label data, then create a PixelLabelDatastore from the .MAT files containing the volumes. You can specify the ReadFcn property as the matRead function from Define Custom Function to Read MAT Files. The arguments classNames and pixelLabelID are vectors that specify the mapping of voxel label values to class names.

pxdsdicom = pixelLabelDatastore(matFileDir,classNames,pixelLabelID, ...
    'FileExtensions','.mat','ReadFcn',@(x) matRead(x));

Associate Image and Label Data

To associate volumetric image and label data for semantic segmentation, or two volumetric image datastores for regression, use a randomPatchExtractionDatastore. A random patch extraction datastore extracts corresponding randomly-positioned patches from two datastores. Patching is a common technique to prevent running out of memory when training with arbitrarily large volumes. Specify a patch size that matches the input size of the network and, for memory efficiency, is smaller than the full size of the volume, such as 64-by-64-by-64 voxels.
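As a rough sketch of this pairing, the code below writes two small synthetic volumes and label volumes to temporary folders, then extracts 64-by-64-by-64 patch pairs from them. The folder names, the variable names V and L, the class names, and the patch counts are all assumptions chosen for illustration.

```matlab
% Hypothetical scratch folders for synthetic volumes and labels
imDir = fullfile(tempdir,'patchDemoImages');
lblDir = fullfile(tempdir,'patchDemoLabels');
if ~exist(imDir,'dir'), mkdir(imDir); end
if ~exist(lblDir,'dir'), mkdir(lblDir); end

% Write two synthetic 96-by-96-by-96 volumes with matching label volumes
for k = 1:2
    V = rand(96,96,96,'single');
    L = uint8(V > 0.5);   % label IDs: 0 = background, 1 = foreground
    save(fullfile(imDir,sprintf('vol%d.mat',k)),'V');
    save(fullfile(lblDir,sprintf('lbl%d.mat',k)),'L');
end

% Read the single variable stored in each MAT file
readVol = @(f) getfield(load(f),'V');
readLbl = @(f) getfield(load(f),'L');

volds = imageDatastore(imDir,'FileExtensions','.mat','ReadFcn',readVol);
pxds = pixelLabelDatastore(lblDir,["background" "foreground"],[0 1], ...
    'FileExtensions','.mat','ReadFcn',readLbl);

% Extract 16 random 64-by-64-by-64 patch pairs per volume
patchSize = [64 64 64];
patchds = randomPatchExtractionDatastore(volds,pxds,patchSize, ...
    'PatchesPerImage',16);
patchds.MiniBatchSize = 4;

% Each read returns a table with one row per patch pair
batch = read(patchds);
```

The patch size here is smaller than the 96-voxel volumes, so each patch fits inside the volume while keeping the memory footprint of a mini-batch small.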

You can also use the combine function to associate two datastores. However, associating two datastores using a randomPatchExtractionDatastore has several benefits over combine.

  • randomPatchExtractionDatastore supports parallel training, multi-GPU training, and prefetch reading. Specify parallel or multi-GPU training using the 'ExecutionEnvironment' name-value pair argument of trainingOptions. Specify prefetch reading using the 'DispatchInBackground' name-value pair argument of trainingOptions. Prefetch reading requires Parallel Computing Toolbox™.

  • randomPatchExtractionDatastore inherently supports patch extraction. In contrast, to extract patches from a CombinedDatastore, you must define your own function that crops images into patches, and then use the transform function to apply the cropping operations.

  • randomPatchExtractionDatastore can generate several image patches from one image. One-to-many patch extraction effectively increases the amount of available training data.
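For comparison, the combine route needs a hand-written cropping function. The sketch below assumes that each read of the combined datastore returns a 1-by-2 cell array holding a volume and its label volume of the same size. The function name cropRandomPatchPair is hypothetical.

```matlab
function dataOut = cropRandomPatchPair(data,patchSize)
% Crop the same randomly positioned 3-D patch from a volume and its labels.
% data is assumed to be a 1-by-2 cell array {volume, labels}.

vol = data{1};
lbl = data{2};
sz = [size(vol,1) size(vol,2) size(vol,3)];
if any(patchSize > sz)
    error('cropRandomPatchPair:patchTooLarge', ...
        'Patch size exceeds the volume size.');
end

% Pick a random start index along each dimension so the patch fits
start = arrayfun(@(n,p) randi(n-p+1),sz,patchSize);
stop = start + patchSize - 1;

dataOut = {vol(start(1):stop(1),start(2):stop(2),start(3):stop(3)), ...
           lbl(start(1):stop(1),start(2):stop(2),start(3):stop(3))};
end
```

A possible usage, assuming volds and pxds already exist, is dsCombined = combine(volds,pxds) followed by dsPatches = transform(dsCombined,@(d) cropRandomPatchPair(d,[64 64 64])). This version yields one patch per read, which is the one-to-one limitation that randomPatchExtractionDatastore avoids.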

Preprocess Volumetric Data

Deep learning frequently requires the data to be preprocessed and augmented. For example, you may want to normalize image intensities, enhance image contrast, or add randomized affine transformations to prevent overfitting.

To preprocess volumetric data, use the transform function. transform creates an altered form of a datastore, called a transformed datastore, by transforming the data read by the original (underlying) datastore according to the set of operations you define in a custom function. Image Processing Toolbox™ provides several functions that accept volumetric input. For a full list of functions, see 3-D Volumetric Image Processing (Image Processing Toolbox). You can also preprocess volumetric images using functions in MATLAB® that work on multidimensional arrays.

The custom transformation function must accept data in the format returned by the read function of the underlying datastore.
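As a small illustration of intensity normalization, the sketch below rescales one volume to zero mean and unit variance. It assumes the underlying datastore returns a single numeric array per read (ReadSize of 1). The function name normalizeVolume is hypothetical.

```matlab
function volOut = normalizeVolume(volIn)
% Rescale one volume to zero mean and unit variance.

volOut = single(volIn);
mu = mean(volOut(:));
sigma = std(volOut(:));

% Guard against constant volumes, where the standard deviation is zero
if sigma > 0
    volOut = (volOut - mu)./sigma;
else
    volOut = volOut - mu;
end
end
```

Assuming an image datastore volds, the function could be applied with dsNorm = transform(volds,@normalizeVolume).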

Underlying Datastore

Format of Input to Custom Transformation Function

ImageDatastore

The input to the custom transformation function depends on the ReadSize property.

  • When ReadSize is 1, the transformation function must accept an integer array. The size of the array is consistent with the type of images in the ImageDatastore. For example, a grayscale image has size m-by-n, a truecolor image has size m-by-n-by-3, and a multispectral image with c channels has size m-by-n-by-c.

  • When ReadSize is greater than 1, the transformation function must accept a cell array of image data corresponding to each image in the batch.

For more information, see the read function of ImageDatastore.

PixelLabelDatastore

The input to the custom transformation function depends on the ReadSize property.

  • When ReadSize is 1, the transformation function must accept a categorical matrix.

  • When ReadSize is greater than 1, the transformation function must accept a cell array of categorical matrices.

For more information, see the read function of PixelLabelDatastore.

randomPatchExtractionDatastore

The input to the custom transformation function must be a table with two columns.

For more information, see the read function of randomPatchExtractionDatastore.

randomPatchExtractionDatastore does not support the DataAugmentation property for volumetric data. To apply random affine transformations to volumetric data, you must use transform.

The transform function must return data that matches the input size of the network. The transform function does not support one-to-many observation mappings.

Example: Transform Volumetric Data in Image Datastore

This sample code shows how to transform volumetric data in image datastore volds using an arbitrary preprocessing pipeline defined in the function preprocessVolumetricIMDS. The example assumes that the ReadSize of volds is greater than 1.

dsTrain = transform(volds,@(x) preprocessVolumetricIMDS(x,inputSize));

Define the preprocessVolumetricIMDS function that performs the desired transformations of data read from the underlying datastore. The function must accept a cell array of image data. The function loops through each read image and transforms the data according to this preprocessing pipeline:

  • Randomly rotate the image about the z-axis.

  • Resize the volume to the size expected by the network.

  • Create a noisy version of the image with Gaussian noise.

  • Return the image in a cell array.

function dataOut = preprocessVolumetricIMDS(data,inputSize)
% data is a cell array of volumes. inputSize is a three-element vector
% that specifies the volume size expected by the network.
 
numRows = size(data,1);
dataOut = cell(numRows,1);
 
for idx = 1:numRows
    
    % Perform randomized 90 degree rotation about the z-axis
    vol = imrotate3(data{idx,1},90*(randi(4)-1),[0 0 1]);

    % Resize the volume to the size expected by the network
    dataClean = imresize3(vol,inputSize);
    
    % Add zero-mean Gaussian noise with a normalized variance of 0.01
    dataNoisy = imnoise(dataClean,'gaussian',0,0.01);

    % Return the preprocessed data
    dataOut{idx} = dataNoisy;
    
end
end

Example: Transform Volumetric Data in Random Patch Extraction Datastore

This sample code shows how to transform volumetric data in random patch extraction datastore volds using an arbitrary preprocessing pipeline defined in the function preprocessVolumetricPatchDS. The example assumes that the ReadSize of volds is 1.

dsTrain = transform(volds,@preprocessVolumetricPatchDS);

Define the preprocessVolumetricPatchDS function that performs the desired transformations of data read from the underlying datastore. The function must accept a table. The function transforms the data according to this preprocessing pipeline:

  • Randomly select one of five augmentations.

  • Apply the same augmentation to the data in both columns of the table.

  • Return the augmented image pair in a table.

function dataOut = preprocessVolumetricPatchDS(data)
% data is assumed to be a one-row table whose two variables each hold
% a patch inside a cell.

imgCell = data{1,1};
respCell = data{1,2};
img = imgCell{1};
resp = respCell{1};

% 5 augmentations: nil,rot90,fliplr,flipud,rot90(fliplr)
augType = {@(x) x,@rot90,@fliplr,@flipud,@(x) rot90(fliplr(x))};

rndIdx = randi(5,1);
imgOut = augType{rndIdx}(img);
respOut = augType{rndIdx}(resp);

% Return the preprocessed data
dataOut = table({imgOut},{respOut});

end
