How do I use imageDatastore together with tall arrays?

Alexander Jensen on 6 Sep 2018
So I'm trying to work with a rather large amount of data (~10 TB) that will by no means fit into memory, even though the computer I'm working on has 256 GB of RAM.
I run into an issue with the following code (just an example while I try to understand how imageDatastore and tall arrays work):
% Please don't mind the variable names!
ds = datastore('/Users/dummyPath/blabla/','Type','image');
K = tall(ds);
D = 0;
for ii = 1:10
D = D + mean(mean(mean(K(ii))));
end
N = gather(D);
OUTPUT:
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 2: Completed in 17 sec
- Pass 2 of 2: Completed in 0 sec
Evaluation 100% complete
Error using tall/mean (line 22)
Argument 1 to MEAN must be one of the following data types: numeric logical duration datetime char.
Learn more about errors encountered during GATHER.
Error in untitled (line 5)
D = mean(mean(mean(K(ii))));
Error in tall/gather (line 50)
[varargout{:}] = iGather(varargin{:});
Error in untitled (line 7)
N = gather(D)
I hope someone can help; I couldn't find anyone else reporting this issue with datastores and tall arrays :/ Using them together could spare me a tedious process (shifting data in and out of memory, etc.).

Accepted Answer

Edric Ellis on 7 Sep 2018
Calling tall on an imageDatastore produces a tall cell array, so you'll almost certainly want to use cell2mat to convert it to a tall numeric array.
There's an additional complexity: tall arrays can be "large" only in the first dimension, so the easiest way to perform per-image computations is to first permute each image so that its dimensions shift up by one, leaving the first dimension free to index the images. An example is probably in order.
imds = imageDatastore(fullfile(matlabroot, 'toolbox', 'images', 'imdata', 'AT*.tif'));
t = tall(imds);
At this point, t is an M×1 tall cell array where each cell is a 480×640 uint8 array. If we use cell2mat at this point, we'll end up with a (M*480)×640 uint8 array. By using permute on each cell prior to calling cell2mat, we can end up instead with an M×480×640 uint8 array.
t2 = cellfun(@(im) permute(im, [3, 1, 2]), t, 'UniformOutput', false);
t2 = cell2mat(t2);
gather(size(t2)) % gets [10 480 640]
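To see the shape change without the image files or a tall array, here is a minimal in-memory sketch. It assumes three small hypothetical 4×5 "images" filled with constants, held in an ordinary 3×1 cell array, as stand-ins for the 480×640 images from the datastore:

```matlab
% Hypothetical stand-ins for images: three 4x5 arrays filled with 1, 2 and 3,
% held in an ordinary (in-memory) 3x1 cell array.
c = {ones(4, 5); 2 * ones(4, 5); 3 * ones(4, 5)};

% Without permute, cell2mat stacks the images vertically: size is [12 5].
naive = cell2mat(c);

% Permuting each image to 1x4x5 moves the image index into dimension 1,
% so cell2mat produces a 3x4x5 array: size is [3 4 5].
c2 = cellfun(@(im) permute(im, [3, 1, 2]), c, 'UniformOutput', false);
stacked = cell2mat(c2);

% Each slice along dimension 1 is now one whole image.
firstImage = squeeze(stacked(1, :, :));   % 4x5, equal to c{1}
```

The same cellfun/permute/cell2mat calls apply to the tall cell array t; only the final gather differs.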
Now we can compute the mean of each image separately:
meanPerImage = mean(mean(t2, 2), 3)
gather(meanPerImage)
Unfortunately, this turns out not to be the most efficient way to compute the mean in this case. It works better to sum each image and divide by its number of elements:
sumPerImage = sum(sum(t2, 2), 3);
numelPerImage = cellfun(@numel, t);
meanPerImage2 = gather(sumPerImage ./ numelPerImage)
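As a quick in-memory sanity check that per-image sums divided by per-image element counts give the same values as the nested means, here is a small sketch. The two 2×3 "images" are hypothetical data, and ordinary arrays stand in for the tall ones:

```matlab
% Hypothetical stand-in images (2x3 each), permuted and stacked as in the
% answer so that dimension 1 indexes the images.
img1 = [1 2 3; 4 5 6];
img2 = [10 20 30; 40 50 60];
c = {img1; img2};
stacked = cell2mat(cellfun(@(im) permute(im, [3, 1, 2]), c, ...
    'UniformOutput', false));             % 2x2x3

% Sum-then-divide version: sums are [21; 210], counts are [6; 6].
sumPerImage   = sum(sum(stacked, 2), 3);
numelPerImage = cellfun(@numel, c);
meanPerImage2 = sumPerImage ./ numelPerImage;   % [3.5; 35]

% Nested-mean version for comparison; gives the same column.
meanPerImage = mean(mean(stacked, 2), 3);
```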
1 Comment
Alexander Jensen on 7 Sep 2018
Looks good, thanks! Do you know, however, whether it's more efficient to process each image individually with MapReduce and a plain datastore (instead of a tall array) if, say, the images do not all have the same dimensions?
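One way to sidestep the equal-size requirement of cell2mat, without claiming anything about its efficiency relative to mapreduce, is to reduce each image to a scalar inside cellfun, the same pattern the answer already uses with cellfun(@numel, t). A minimal in-memory sketch, assuming hypothetical images of different sizes:

```matlab
% Hypothetical stand-in images with different sizes; cell2mat could not
% stack these, but cellfun can still reduce each one to a single number.
c = {ones(4, 5); 2 * ones(3, 7); magic(4)};

% Per-image mean over all elements; double() makes the double-precision
% result explicit regardless of the image class (e.g. uint8).
perImageMean = cellfun(@(im) mean(double(im(:))), c);   % 3x1 double

% With a tall cell array t = tall(imds), the same call would be
%   perImageMean = gather(cellfun(@(im) mean(double(im(:))), t));
```

Here the expected values are 1, 2 and 8.5 (magic(4) sums to 136 over 16 elements).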


Version: R2018a
