Join several timetables with different variables and times and duplicates

Question

bk am 25 Feb. 2020

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/507450-join-several-timetables-with-different-variables-and-times-and-duplicates

Kommentiert: Guillaume am 26 Feb. 2020

I'm trying to read a bunch of csv files that each have one variable timeseries. Many files can hold the same variable and there are overlaps in the timestamps (duplicates).

My problem is when trying to merge these csv files, which are read to timetables, into a single timetable.

Example code:

times_TT_A = [datetime(2019,01,01) datetime(2019,01,02) datetime(2019,01,03)]';
data_TT_A = [1 2 3]';
TT_A = timetable(times_TT_A, data_TT_A,'VariableNames',{'Variable_ONE'});
times_TT_B = [datetime(2019,01,01) datetime(2019,01,02)]';
data_TT_B = [10 20]';
TT_B = timetable(times_TT_B, data_TT_B,'VariableNames',{'Variable_TWO'});
times_TT_C = [datetime(2019,01,02) datetime(2019,01,03)]';
data_TT_C = [20 30]';
TT_C = timetable(times_TT_C, data_TT_C,'VariableNames',{'Variable_TWO'});

So here there are two different variables and one variable, Variable_TWO, has overlap in timestamps.

I would like to have this result:

    times_TT_A    Variable_ONE    Variable_TWO    
    __________    ____________    _________________  
    2019-01-01         1                  10                 
    2019-01-02         2                  20                     
    2019-01-03         3                 30  

I don't mind a solution with duplicate entries, I can sort those out later, for example this is also fine:

   times_TT_A    Variable_ONE    Variable_TWO    
    __________    ____________    _________________  
    2019-01-01         1                  10                 
    2019-01-02         2                  20                     
    2019-01-03         3                 NaN
    2019-01-02        NaN                 20
    2019-01-03        NaN                 30

The csv files are read in a loop and should then one by one be added to the common timetable like this in pseudocode

timetable_alldata = timetable()
for csvfile in csvfiles
    timetable_csv = readtable(csvfile)
    timetable_alldata = join(timetable_alldata, timetable_csv)
end

This is then the same as calling this using the example data - here using outerjoin:

outerjoin(outerjoin(TT_A,TT_B), TT_C)
ans =
  3×3 timetable
    times_TT_A    Variable_ONE    Variable_TWO_left    Variable_TWO_TT_C
    __________    ____________    _________________    _________________
    2019-01-01         1                  10                  NaN       
    2019-01-02         2                  20                   20       
    2019-01-03         3                 NaN                   30      

I have tried stacking ([A; B]), vertcat, outerjoin and synchronize...but I seem to be stuck here.

Any suggestions how to solve this?

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Lei Hou am 25 Feb. 2020

I have another clarification question. TT_B and TT_C both have the same row: 02-Jan-2019 20. If there are duplicated rows like this, do you want only keep one row?

bk am 25 Feb. 2020

Yes, the variable names need to match - they are not random.

And I only want to keep one row. The duplicated rows are always "true" duplicates with same timestamp and value. Sorting duplicates I guess will be relatively easy, so that can be done afterwards if the merging results in duplicates.

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Lei Hou am 26 Feb. 2020

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/507450-join-several-timetables-with-different-variables-and-times-and-duplicates#answer_417477

In MATLAB Online öffnen

Hi,

I came up with two solutions.

Solution 1: use outerjoint. It is easy to understand but have duplicated rows that need to be merged.

% Make the header of row times of input timetables to be the same
TT_A.Properties.DimensionNames{1} = 'Time';
TT_B.Properties.DimensionNames{1} = 'Time';
TT_C.Properties.DimensionNames{1} = 'Time';
% call outerjoin and merge keys
TT_AB = outerjoin(TT_A,TT_B, ...
    'Keys',['Time' intersect(TT_A.Properties.VariableNames,TT_B.Properties.VariableNames)], ...
    'MergeKeys',true)
TT_ABC = outerjoin(TT_AB,TT_C,...
    'Keys',['Time' intersect(TT_AB.Properties.VariableNames,TT_C.Properties.VariableNames)], ...
    'MergeKeys',true)
TT_ABC =
  4x2 timetable
            Time            Variable_ONE    Variable_TWO
    ____________________    ____________    ____________
    01-Jan-2019 00:00:00          1              10     
    02-Jan-2019 00:00:00          2              20     
    03-Jan-2019 00:00:00        NaN              30     
    03-Jan-2019 00:00:00          3             NaN     

Solution 2: Make use of assignment. This makes the strong assumption that the duplicate rows are really duplicates. So no need to post-process the result.

TTcell = {TT_A TT_B TT_C};
TT = timetable();
for i = 1:length(TTcell)
    TT_i = TTcell{i};
    TT(TT_i.Time,TT_i.Properties.VariableNames) = TT_i;
end
TT =
  3x2 timetable
       Time        Variable_ONE    Variable_TWO
    ___________    ____________    ____________
    01-Jan-2019         1               10     
    02-Jan-2019         2               20     
    03-Jan-2019         3               30     

2 Kommentare
Keine anzeigenKeine ausblenden

bk am 26 Feb. 2020

In MATLAB Online öffnen

Thanks!

I'm gonna go for solution 1 here for now. I find it a bit more readable, and a little faster it seem as well.

Solution two would work great in this case since its always true duplicates - but it has a little caveat when one of the variables have missing data for a timestamp. Matlab seem to insist that zero is a good value to fill tables. Any idea how to set default fill value to NaN, without prepopulating the table?

Example

times_TT_A = [datetime(2019,01,01) datetime(2019,01,02) datetime(2019,01,03)]';
data_TT_A = [1 2 3]';
TT_A = timetable(times_TT_A, data_TT_A,'VariableNames',{'Variable_ONE'});
times_TT_B = [datetime(2019,01,01) datetime(2019,01,02)]';
data_TT_B = [10 20]';
TT_B = timetable(times_TT_B, data_TT_B,'VariableNames',{'Variable_TWO'});
times_TT_C = [datetime(2019,01,02) datetime(2019,01,03) datetime(2019,01,04)]';
data_TT_C = [20 30 40]';
TT_C = timetable(times_TT_C, data_TT_C,'VariableNames',{'Variable_TWO'});
TTcell = {TT_A TT_B TT_C};
TT = timetable();
for i = 1:length(TTcell)
    TT_i = TTcell{i};
    TT(TT_i.Time,TT_i.Properties.VariableNames) = TT_i;
end
TT =
  4×2 timetable
           Time            Variable_ONE    Variable_TWO
    ___________________    ____________    ____________
    2019-01-01 00:00:00         1               10     
    2019-01-02 00:00:00         2               20     
    2019-01-03 00:00:00         3               30     
    2019-01-04 00:00:00         0               40     

Guillaume am 26 Feb. 2020

In MATLAB Online öffnen

Any idea how to set default fill value to NaN, without prepopulating the table?

It's not possible. The default fill value is the default fill value for the datatype, which for numeric types is 0. You get the same behaviour with plain arrays:

x = [];
x(5) = 8;  %x(1:4) gets filled with 0

Melden Sie sich an, um zu kommentieren.

Answer 2

Guillaume am 26 Feb. 2020

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/507450-join-several-timetables-with-different-variables-and-times-and-duplicates#answer_417479

In MATLAB Online öffnen

With your example tables, this is trivially achieved with:

synchronize(TT_A, [TT_B; TT_C])

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Lei Hou am 26 Feb. 2020

Try me solution of using outerjoin, it doesn't require priori knowledge of the variables in the CSV file and their order. Please let me know if my solution doesn't work for you. I'm happy to help you.

Guillaume am 26 Feb. 2020

I must admit I didn't read the whole question. Yes, this won't work for what you want. If Lei's solution doesn't work, then you can always do it the 'old fashioned way' (before timetables and tables existed) with intersect and co. A bit more work but still fairly simple. I can work out the code if it's needed.

Melden Sie sich an, um zu kommentieren.

Join several timetables with different variables and times and duplicates

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Akzeptierte Antwort

2 Kommentare
Keine anzeigenKeine ausblenden

Weitere Antworten (1)

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Siehe auch

Kategorien

Tags

Produkte

Version

Community Treasure Hunt

Join several timetables with different variables and times and duplicates

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Akzeptierte Antwort

2 Kommentare Keine anzeigenKeine ausblenden

Weitere Antworten (1)

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Siehe auch

Kategorien

Tags

Produkte

Version

Community Treasure Hunt

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

2 Kommentare
Keine anzeigenKeine ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden