In memory calculations with tall arrays from different databases

3 Ansichten (letzte 30 Tage)
TOSA2016
TOSA2016 am 16 Jul. 2019
Kommentiert: Guillaume am 18 Jul. 2019
Imagine I have two data bases as (Table and double numbers in them)
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
Also imagine that I created my tall arrays as
X = tall(ds_1);
Y = tall(ds_2);
Now, let's imagine that I trained a model, mdl, with fitlm and I want to use this model to predict from X and Y as
Anwer = predict(mdl, [X,Y]);
The error I receive is this
Error using tall/horzcat (line 23)
Incompatible tall array arguments. The tall arrays must be based on the
same datastore.
How can I solve this problem without gathering the data and just use in memory capabilities?

Antworten (1)

Guillaume
Guillaume am 16 Jul. 2019
Bearbeitet: Guillaume am 16 Jul. 2019
If you have R2019a, you can combine your two datastores.
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
ds_combined = combine(ds_1, ds_2);
Answer = predict(mdl, tall(ds_combined));
In previous versions, I'm not sure that there's a way to do it other than creating your own custom datastore that would keep track of both datastores (essentially recreating the R2019a CombinedDatastore).
  6 Kommentare
TOSA2016
TOSA2016 am 17 Jul. 2019
I also notices that there might be an easier way. What if I make the datastore as
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
I still cannot make a tall array from X1 and X2 as
X_NEW = tall([X1, X2]);
As it gives me the following error.
Error using tall/horzcat (line 21)
Duplicate table variable name: 'Var1'.
Guillaume
Guillaume am 18 Jul. 2019
"The tall array generation from combined datasores is not compatible with parallel compution"
I would recommend raising a service request with matlab then, as they should make it possible to create a combined datastore that has the exact same properties as the source datastores (if they are compatible). I don't have the parallel toolbox, so I'm not sure what these properties are. Since you now have access to the source code of CombinedDatastore (in fullfile(matlaroot, 'toolbox\matlab\datastoreio\+matlab\+io\+datastore')), you could also copy it and make the required modifications.
I'm not sure you will be able to concatenate two tall arrays from the same datastore since by necessity they will have the same variable names, so indeed horizontal concatenation will create duplicate variable names which is not allowed. The only way this could work is if you are allowed to modify the variable names of the tall array. See if this work:
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
X2.Properties.VariableNames = compose('X2Var%d', 1:width(X2));

Melden Sie sich an, um zu kommentieren.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by