How to create a tall datastore from multiple data parts?
I would like to compute PCA on a large amount of data and thus use the tall array feature of recent versions of MATLAB. My data consists of multiple blocks of features gathered from big images, i.e. blocks Di of size (Ni, d).
Let's say I have M such blocks and I want to compute PCA for all of them, i.e. something like
[coeff, score, latent] = pca([D0; D1; D2; ... ; Dn]);
but the data array [D0; D1; D2; ... ; Dn] does not fit into memory (several GB of data). Every block Di fits in memory by itself, but not their concatenation.
What is the best way to generate a datastore from these multiple blocks of data?
Note: I could compute the PCA manually via pcacov, since the covariance matrix can be accumulated as a sum of outer products, which is easy to compute whatever the size of the data matrix, but I have read that pca is more numerically stable.
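For reference, the manual alternative mentioned above could be sketched roughly as follows. This is only a sketch, not the accepted approach; `calculateBlock` and `numBlocks` are hypothetical placeholders for however the blocks are produced, and it assumes each block fits in memory. `pcacov` is the documented MATLAB function for PCA on a covariance matrix.

```matlab
% Hypothetical loader: calculateBlock(ii) returns the ii-th (Ni x d) block.
d = size(calculateBlock(1), 2);

% First pass: accumulate the global mean over all blocks.
total = zeros(1, d);
n = 0;
for ii = 1:numBlocks
    Di = calculateBlock(ii);
    total = total + sum(Di, 1);
    n = n + size(Di, 1);
end
mu = total / n;

% Second pass: accumulate the sum of outer products of centered rows.
C = zeros(d, d);
for ii = 1:numBlocks
    Di = calculateBlock(ii) - mu;   % implicit expansion (R2016b+)
    C = C + Di' * Di;
end

% Eigen-decomposition-based PCA on the sample covariance matrix.
[coeff, latent] = pcacov(C / (n - 1));
```

As noted, working from the explicitly formed covariance matrix can lose precision compared with pca operating on the data itself, which is the motivation for the tall-array route.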
Answers (1)
Rick Amos
on 29 Nov 2016
A datastore can be created from a collection of folders, so the easiest way to achieve this is to place each block of data into its own folder using tall/write. The following code writes the blocks and then creates the datastore:
baseFolder = fullfile(pwd, 'MyFolder');
for ii = 1 : numBlocks
    % Compute one in-memory block and write it to its own subfolder.
    block = calculateBlock(ii);
    subfolder = fullfile(baseFolder, num2str(ii, '%05i'));
    write(subfolder, tall(block));
end
% Create a single datastore spanning all of the per-block subfolders.
wildcardPattern = fullfile(baseFolder, '*');
ds = datastore(wildcardPattern);
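From there, the datastore can be wrapped back into a tall array and passed to pca, which supports tall arrays (Statistics and Machine Learning Toolbox required). A minimal sketch, assuming `ds` was created as above:

```matlab
% Read the written blocks back as one out-of-memory tall array.
tt = tall(ds);

% pca accepts tall arrays; the computation is deferred until gather.
[coeff, score, latent] = pca(tt);

% coeff and latent are small (d-by-d and d-by-1), so bring them into memory.
[coeff, latent] = gather(coeff, latent);
```

score remains a tall array the size of the full data set, so gather it only if it actually fits in memory.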