Create Image Datastore Containing Single and Multi-File DICOM Series

Datastores are a convenient way of working with and representing collections of data that are too large to fit in memory at one time, especially in deep learning workflows. Digital Imaging and Communications in Medicine (DICOM) is a standardized medical image file format that can store volumes and image series as a single file or multiple files in a folder. This example shows how to create an image datastore containing DICOM data stored as a mix of single files and multiple files.

Medical Imaging Toolbox™ provides objects and functions that simplify this workflow. To get started, see Create Training Data for 3-D Medical Image Semantic Segmentation (Medical Imaging Toolbox).

Download Data

Download the file from the MathWorks® website and unzip it in the current example directory. This file contains three chest CT volumes stored in the DICOM file format, with each volume stored as a directory of multiple files.

zipFile = matlab.internal.examples.downloadSupportFile("medical","");

Gather DICOM Information

In the DICOM file format, a series corresponds to one scan, such as one MRI or CT volume. The dicomCollection function analyzes the metadata of all DICOM files in a folder, and returns a table in which each row represents one series. For multi-file DICOM volumes, the function aggregates the files into a single series.

Gather the details for the DICOM series in the current example directory, which includes the multi-file chest CT volumes and a multi-frame ultrasound series stored as a single DICOM file.

dicomDir = pwd;
collection = dicomCollection(dicomDir,IncludeSubfolders=true)
collection=4×14 table
             StudyDateTime             SeriesDateTime         PatientName     PatientSex    Modality    Rows    Columns    Channels    Frames          StudyDescription          SeriesDescription                            StudyInstanceUID                                                    SeriesInstanceUID                                       Filenames           
          ____________________    ________________________    ____________    __________    ________    ____    _______    ________    ______    ____________________________    _________________    ________________________________________________________________    __________________________________________________________________    ______________________________

    s1    14-Dec-2018 08:10:05    {[14-Dec-2018 08:14:20]}    ""                 "M"          "CT"      512       512         1         176      "CT CARDIAC CALCIUM SCORING"    "LUNG 30%"           ""    ""    {176×1 string                }
    s2    14-Dec-2018 08:10:05    {[14-Dec-2018 08:14:20]}    ""                 "M"          "CT"      512       512         1          88      "CT CARDIAC CALCIUM SCORING"    "LUNG 2.5 30%"       ""    ""    { 88×1 string                }
    s3    14-Dec-2018 08:10:05    {[14-Dec-2018 08:14:20]}    ""                 "M"          "CT"      512       512         1          88      "CT CARDIAC CALCIUM SCORING"    "STANDARD 30%"       ""    ""    { 88×1 string                }
    s4    30-Jan-1994 11:25:01    {0×0 double            }    "Anonymized"       ""           "US"      430       600         1          10      "Echocardiogram"                "PS LAX MR & AI"     "999.999.3859744"                                                   "999.999.94827453"                                                    {["C:\US-PAL-8-10x-echo.dcm"]}

Create a temporary directory to store the processed DICOM volumes.

matFileDir = fullfile(pwd,"MATFiles");
if ~exist(matFileDir,"dir")
    mkdir(matFileDir)
end

Convert DICOM Volumes to MAT Files

Find all image volumes in the dicomCollection table, and save each volume as a MAT file.

First, loop through each series in the collection.

for idx = 1:size(collection,1)

For the current series, extract the DICOM filenames from the table. If the series contains a multi-file DICOM volume, the filenames are listed as a string array.

    dicomFileName = collection.Filenames{idx};

Adapt the DICOM filenames to specify a single filename for the new MAT file.

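    if length(dicomFileName) > 1
        % Multi-file series: name the MAT file after the containing folder
        matFileName = fileparts(dicomFileName(1));
        matFileName = split(matFileName,filesep);
        matFileName = replace(strtrim(matFileName(end))," ","_");
    else
        % Single-file series: reuse the DICOM filename without its extension
        [~,matFileName] = fileparts(dicomFileName);
    end
    matFileName = fullfile(matFileDir,matFileName);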
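The naming rule above can be tried in isolation. The following is a minimal sketch that uses hypothetical folder and file names (not taken from the downloaded data) to show which MAT file name each kind of series would produce.

```matlab
% Hypothetical multi-file series: three slices in a folder named "LUNG 30".
multiFile = fullfile("data","LUNG 30","img" + (1:3)' + ".dcm");

% Use the containing folder as the name and replace spaces with underscores.
folder = fileparts(multiFile(1));
parts = split(folder,filesep);
multiName = replace(strtrim(parts(end))," ","_");

% Hypothetical single-file series: keep the filename without its extension.
singleFile = "US-PAL-8-10x-echo.dcm";
[~,singleName] = fileparts(singleFile);

disp(multiName)    % LUNG_30
disp(singleName)   % US-PAL-8-10x-echo
```

Because the folder name becomes the MAT file name, two series stored in identically named folders would map to the same file, so this scheme assumes folder names are unique within the collection.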

Read the image data in the current series. Try different read functions that handle multi-file and single file DICOM volumes.

1) Try reading the data by using the dicomreadVolume function.

  • If the data is a multi-file volume, then dicomreadVolume runs successfully and returns the complete volume in a single 4-D array. The four dimensions correspond to [rows,columns,samples,slices], where samples is the number of channels per voxel. You can add this data to the datastore and skip step 2.

  • If the data is contained in a single file, then dicomreadVolume does not run successfully. Move to step 2.

2) Try reading the data by using the dicomread function.

  • If the data is a complete volume, dicomread returns a 4-D array. You can add this data to the datastore.

  • If the data is a single 2-D image, dicomread returns a 2-D matrix or 3-D array. Skip this series and continue to the next series in the collection.
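The dimensionality test in step 2 can be checked on placeholder arrays. This sketch uses zero-filled arrays with assumed sizes in place of real DICOM data, and the helper name isVolume is hypothetical.

```matlab
% Hypothetical helper: treat an array as a volume only if it has 4 dimensions.
isVolume = @(x) ndims(x) >= 4;

grayImage = zeros(430,600);        % single grayscale image, 2-D
rgbImage  = zeros(430,600,3);      % single color image, 3-D
volume    = zeros(430,600,1,10);   % [rows,columns,samples,slices], 4-D

disp(isVolume(grayImage))   % 0
disp(isVolume(rgbImage))    % 0
disp(isVolume(volume))      % 1
```

Note that ndims ignores trailing singleton dimensions, so an array of size [rows,columns,samples,1] reports fewer than four dimensions and is treated as a single image.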

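    try
        % Step 1: succeeds for multi-file volumes
        data = dicomreadVolume(collection,collection.Row{idx});
    catch ME
        % Step 2: fall back to reading a single file
        data = dicomread(dicomFileName);
        if ndims(data)<4
            % Skip files that are not volumes
            continue;
        end
    end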

If the current series is a volume, write the data and the corresponding DICOM filenames to a MAT file.

    save(matFileName,"data","dicomFileName");

End the loop over the series in the collection.

end

Create Image Datastore

Create an imageDatastore from the MAT files containing the volumetric DICOM data. Specify the ReadFcn property as the helper function matRead, which is defined at the end of this example.

imdsdicom = imageDatastore(matFileDir,FileExtensions=".mat", ...
    ReadFcn=@matRead);
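To try the MAT-file datastore pattern without the DICOM data, the sketch below builds two small placeholder volumes in a scratch folder and reads them back one at a time. The folder name, file names, array sizes, and the anonymous reader readData are all assumptions made for illustration. Unlike matRead, readData loads the variable named data explicitly.

```matlab
% Write two stand-in volumes to a scratch folder (hypothetical names).
demoDir = fullfile(tempdir,"imdsMatDemo");
if ~exist(demoDir,"dir")
    mkdir(demoDir)
end
for k = 1:2
    data = k*ones(8,8,1,5);   % [rows,columns,samples,slices]
    save(fullfile(demoDir,"vol" + k + ".mat"),"data");
end

% Reader that returns the variable named "data" from each MAT file.
readData = @(filename) getfield(load(filename),"data");
imds = imageDatastore(demoDir,FileExtensions=".mat",ReadFcn=readData);

% Read each volume in turn and record its size.
volSizes = zeros(0,4);
while hasdata(imds)
    V = read(imds);
    volSizes(end+1,:) = size(V,1:4); %#ok<SAGROW>
end

% Remove the scratch folder and its contents.
rmdir(demoDir,"s")
```

Reading with hasdata and read keeps only one volume in memory at a time, which is the main reason to use a datastore for large collections.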

Visually Check Results

Read the first DICOM volume from the image datastore.

[V,Vinfo] = read(imdsdicom);
[~,VFileName] = fileparts(Vinfo.Filename);

Remove the singleton channel dimension by using the squeeze function, then display the volume by using the volshow function.

V = squeeze(V);
volshow(V);

Supporting Functions

The matRead function loads the first variable from the MAT file specified by filename.

function data = matRead(filename)
    % Load all variables into a struct and return the first one
    inp = load(filename);
    f = fieldnames(inp);
    data = inp.(f{1});
end
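A quick round-trip check of this reading logic, as a sketch: it saves a placeholder array and a hypothetical filename to a scratch MAT file, then reloads the first variable in the same way as matRead. It relies on data being the first field that load returns. Here data is both listed first in the save call and sorts first alphabetically.

```matlab
% Save a stand-in volume and a hypothetical source filename to a scratch file.
tmpFile = fullfile(tempdir,"matReadDemo.mat");
data = rand(4,4,1,3);              % [rows,columns,samples,slices]
dicomFileName = "example.dcm";     % hypothetical filename
save(tmpFile,"data","dicomFileName");

% Same steps as matRead: load into a struct and take the first field.
inp = load(tmpFile);
f = fieldnames(inp);
loaded = inp.(f{1});

disp(isequal(loaded,data))   % 1
delete(tmpFile)
```

If the variable order in a MAT file is not under your control, loading the variable by name is a more robust choice than taking the first field.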
