Unique function is not working as expected

Ariane Fidelia Wettig on 3 Apr 2021
I have a problem with the unique function and can't find the cause.
I import a table from Excel and then call [table,~] = unique(table,'rows');
This worked until recently. The only change I remember making is that I adjusted the time format.
Now my table contains duplicate rows, even after calling [table,~] = unique(table,'rows');.
table = [];
for i = 1:length(files_table_data)
    path_table = fullfile(files_table_data(i).name);
    opts = detectImportOptions(path_table); % excel
    opts.VariableNames(1) = {'Name'};
    opts.VariableNames(2) = {'Date'};
    opts.VariableNames(3) = {'Number'};
    opts = setvartype(opts,{'Date'},'datetime'); % datetime
    opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
    opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
    opts = setvartype(opts,{'Name'},'string');
    opts = setvartype(opts,{'Number'},'double');
    opts.SelectedVariableNames = {'Name','Date','Number'};
    table = [table; readtable(path_table,opts)];
end
[table,~] = unique(table,'rows');
4 Comments
the cyclist on 3 Apr 2021
The last two rows -- as you have written them -- are obviously identical.
If I create a table using exactly what you have written, they will be identical, and the unique command will get rid of one of them. (Unless there is a bug in the unique command, which I doubt.)
This is why uploading the actual data, not just typing what it looks like, is so important to replicating the problem.
Can you create a very small segment of the data that is not sensitive to upload? Perhaps just a few rows, and only the relevant columns? Then test that segment to make sure it still exhibits the bug? Then upload everything we need to replicate that bug.
Ariane Fidelia Wettig on 3 Apr 2021
I think the problem should be visible with the code and the data from the Excel sheet. You might need to adjust the data path so the code can find Test.xlsx.
I needed to change the original time format because I later compare this data with data from other sources that use different time formats.
Thank you!
function test = import_test()
files_test = dir('../data/not_anonymized/Test*');
%% read table
table = [];
for i = 1:length(files_test)
    path_table = fullfile('..','data','not_anonymized', files_test(i).name);
    %fprintf('Starting file %d/%d: %s\n',i,length(files_test),files_test(i).name)
    opts = detectImportOptions(path_table); % excel
    opts.VariableNames(1) = {'Name'};
    opts.VariableNames(2) = {'Date'};
    opts.VariableNames(3) = {'Number'};
    opts = setvartype(opts,{'Date'},'datetime'); % datetime
    opts = setvaropts(opts,{'Date'},'InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
    opts = setvaropts(opts,{'Date'},'DatetimeFormat','yyyy-MM-dd HH:mm');
    opts = setvartype(opts,{'Name'},'string');
    opts = setvartype(opts,{'Number'},'double');
    opts.SelectedVariableNames = {'Name','Date','Number'};
    table = [table; readtable(path_table,opts)];
end
% assign the result to the declared output variable so the function returns it
[test,~] = unique(table,'rows');
end


Accepted Answer

Cris LaPierre on 3 Apr 2021
First, do not name your table variable table. Doing so shadows MATLAB's table function for constructing tables.
When I look at your Excel file, I see that the times are recorded with millisecond precision (three decimal places of seconds). With datetimes, you can change how a value is displayed without losing this level of detail. You have set the display format to 'yyyy-MM-dd HH:mm', but the stored values still contain the seconds and milliseconds that were read in with 'yyyy-MM-dd HH:mm:ss.SSS XXX'.
Therefore, the unique command is working correctly: the times are not identical when you look at the actual data.
If you only want to compare your times to the minute, you must actually modify the data so that it only has minute precision. Perhaps something like this:
% Create a time with seconds and milliseconds, but set display format to
% HH:mm
d=datetime(2021,01,02,3,4,5,123,'Format','yyyy-MM-dd HH:mm')
d = datetime
2021-01-02 03:04
% Change display format so seconds and milliseconds are shown
d.Format = 'dd-MMM-yyyy HH:mm:ss.SSS'
d = datetime
02-Jan-2021 03:04:05.123
% set second and milliseconds to 0.
d.Second = 0
d = datetime
02-Jan-2021 03:04:00.000
For a datetime variable in a table, use dot notation: tbl.Date.Second = 0
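The difference between the display format and the stored value can be checked directly. The following is a minimal sketch with made-up timestamps (the variable names t, tMin, nBefore and nAfter are illustrative and not from the original thread). It also uses dateshift as one possible alternative to setting Second to 0; both approaches are assumed to be acceptable here because only minute precision is wanted.

```matlab
% Two hypothetical timestamps in the same minute that differ only in
% seconds and milliseconds. The display format hides that difference.
t = datetime(2021, 1, 2, 3, 4, [5; 42], [123; 0], ...
    'Format', 'yyyy-MM-dd HH:mm');
disp(t)                          % both rows are displayed as 2021-01-02 03:04

% unique compares the stored values, not the displayed text.
nBefore = numel(unique(t));      % 2: the values still differ

% Truncate the stored values to the start of the minute.
tMin = dateshift(t, 'start', 'minute');
nAfter = numel(unique(tMin));    % 1: the values are now identical

fprintf('unique values before: %d, after: %d\n', nBefore, nAfter);
```

For a table column the same idea would read tbl.Date = dateshift(tbl.Date,'start','minute'), where tbl is a placeholder table name.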
5 Comments
Cris LaPierre on 5 Apr 2021
Here's a simplified version using the test data set you shared.
path_table = "Ariane_Test.xlsx";
opts = detectImportOptions(path_table);%excel
opts = setvartype(opts,'Date','datetime');%datetime
opts = setvaropts(opts,'Date','InputFormat','yyyy-MM-dd HH:mm:ss.SSS XXX','TimeZone','Europe/Zurich');
opts = setvaropts(opts,'Date','DatetimeFormat','yyyy-MM-dd HH:mm.ss.SSS');
opts = setvartype(opts,'Name','string');
test = readtable(path_table,opts);
test.Date.Second = 0
test = 63×3 table
         Name                   Date              Number
    ______________    _______________________    ______

    "338778418867"    2020-10-05 18:34.00.000       3
    "338778418867"    2020-10-05 18:34.00.000     NaN
    "338778419063"    2020-10-05 18:34.00.000       1
    "338778419063"    2020-10-05 18:34.00.000     NaN
    "338778419063"    2020-10-05 18:34.00.000       3
    "338778419063"    2020-10-05 18:34.00.000       3
    "338778419204"    2020-10-05 18:34.00.000       1
    "338778419204"    2020-10-05 18:34.00.000       1
    "338778419204"    2020-10-05 18:34.00.000       3
    "338778419204"    2020-10-05 18:34.00.000       4
    "338045763939"    2020-10-05 17:50.00.000       1
    "378045659745"    2020-10-05 05:00.00.000       2
    "378045659745"    2020-10-05 05:00.00.000       3
    "378045659745"    2020-10-05 05:00.00.000       4
    "378045659760"    2020-10-05 05:00.00.000       1
    "378045659760"    2020-10-05 05:00.00.000       2
testU = unique(test,'rows');
tail(testU)
ans = 8×3 table
         Name                   Date              Number
    ______________    _______________________    ______

    "378045659760"    2020-10-05 05:00.00.000       3
    "378045659760"    2020-10-05 05:00.00.000       4
    "378049751076"    2020-10-04 23:44.00.000       1
    "378049751076"    2020-10-04 23:44.00.000       2
    "378049751076"    2020-10-04 23:44.00.000       3
    "378049751076"    2020-10-04 23:44.00.000       4
    "378049751076"    2020-10-04 23:44.00.000       5
    "378049751076"    2020-10-04 23:44.00.000       6
The last 6 have the same Name and Date, but Number is different.
Ariane Fidelia Wettig on 5 Apr 2021
Thank you very much!
I checked my data again, searched online, and found that unique does indeed treat NaN values as distinct from one another (NaN ~= NaN). Therefore I added:
test.Number(isnan(test.Number),:)=inf(size(test.Number(isnan(test.Number))));
[test,~] = unique(test,'rows');
test.Number(isinf(test.Number),:)=NaN(size(test.Number(isinf(test.Number))));
to my code and now it works! :)
Thank you very much for your help!
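One caveat with the Inf placeholder above is that any genuine Inf values in Number would be turned into NaN afterwards. A possible variant, sketched below on made-up data, builds a separate key table with an explicit missing-value flag and uses the index output of unique to select rows from the original table. The names T, key, ia, Tplain and Tdedup are illustrative and not from the thread.

```matlab
% Hypothetical data: rows 1 and 2 are duplicates whose Number is missing,
% rows 3 and 4 are ordinary duplicates.
Name   = ["A"; "A"; "B"; "B"];
Number = [NaN; NaN; 1; 1];
T = table(Name, Number);

% Plain unique keeps both NaN rows because NaN values count as distinct.
Tplain = unique(T);

% Build a comparison key: flag the missing entries in an extra column and
% replace them with a fixed value so that they compare equal to each other.
key = T;
key.IsMissing = isnan(T.Number);
key.Number(key.IsMissing) = 0;

% ia indexes the first occurrence of each unique key row; use it on the
% original table so the NaN values themselves are preserved.
[~, ia] = unique(key);
Tdedup = T(ia, :);

fprintf('rows: %d original, %d plain unique, %d with NaN-aware key\n', ...
    height(T), height(Tplain), height(Tdedup));
```

Because the flag column distinguishes a missing entry from a real 0, the replacement value cannot collide with actual data.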

Version

R2020b
