Reading multiple excel files with data store when the columns of the files change position

2 Ansichten (letzte 30 Tage)
Hi. I am trying to read hundreds of excel files with data store, but the files vary the position of the columns. Data Store reads the position of the columns in the first file and assumes that it is the same for the rest. when reviewing the data of the last excel files does not match, because its position was different. how do i solve it?
thanks.
  2 Kommentare
Guillaume
Guillaume am 31 Okt. 2019
Unfortunately, it's not something that's supported by the SpreadsheetDatastore. In theory, you would have to write your own custom datastore to handle this but it would be a lot of work.
However, have all the files got the exact same columns just in different position? If so, you could handle it with a TransformedDatastore. If you can attach two sample files (actual content can be rubbish as long as they've got the correct format), I can show you how to do that.
Juan Carlos Pozuelos Buezo
Juan Carlos Pozuelos Buezo am 31 Okt. 2019
Thanks Guillaume, unfortunately the files has diferent comunt of columns. buy, i whant to work with the same columns in all files. The point is that columns are en diferent positions.

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Guillaume
Guillaume am 31 Okt. 2019
This works with the very simple test case I built (of two excel files with columns in different order and the 2nd file with extra columns).
First create your datastore as normal, e.g:
ds = datastore('C:\somewhere\*.xlsx');
ds.SelectedVariableNames = {'Something', 'SomethingElse', 'Whatever'};
Then create a TransformedDatastore with that underlying datastore:
tds = transform(ds, @reorderds, 'IncludeInfo', true);
with the following reorderds.m file:
function [tout, infoin] = reorderds(tin, infoin)
%reorder the variables of a table read by datastore.read so that they match what is actually in the file
%works with a spreadsheetdatastore only
currentvars = tin.Properties.VariableNames; %these names come from the first file in the datastore and may not match the actual names in the file
opts = detectImportOptions(infoin.Filename, 'Sheet', infoin.SheetNames{1}); %so get the actual names with detectimportoptions
actualvars = opts.VariableNames;
[found, where] = ismember(currentvars, actualvars); %find actual location of the variables in the file
assert(all(found), 'Some expected variables not found in %s', infoin.Filename);
tout = tin(:, where); %reorder columns
tout.Properties.VariableNames = tin.Properties.VariableNames; %and set the name back to what they should be
end
and use tds instead of ds. Any read or readall operation with tds will automatically reorder the columns to their true location.
There will be a performance penalty to this since in effect each file is parsed twice, once by the internal datastore read and another time in reorderds by the detectImportOptions.

Weitere Antworten (1)

Roofus Milton
Roofus Milton am 31 Okt. 2019
I would use a cell array populated by readtable then simply reference the colums by name.

Kategorien

Mehr zu Large Files and Big Data finden Sie in Help Center und File Exchange

Produkte

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by