readtable does not handle excel dates and times properly

27 views (last 30 days)
I have created a program that reads tables, both CSV and excel. The tables come in different formats, and the program helps the user reformat them to match the formatting of our database. However, I am having trouble reading excel files which have dates and times.
If I open the table in excel, it looks like this.
DATE EVENTNO TIME
30-Apr-2017 84 140158
30-Apr-2017 85 140201
30-Apr-2017 86 140206
30-Apr-2017 87 140211
30-Apr-2017 88 140216
30-Apr-2017 89 140221
30-Apr-2017 90 140226
30-Apr-2017 91 140231
I can look at the format of cells in excel and see that they are the time represents hmmss.
However, when I try to open it in MATLAB, it looks like this
>> opts = detectImportOptions(fname);
>> preview(fname,opt)
>> opts.VariableTypes
ans =
1×3 cell array
{'datetime'} {'double'} {'double'}
'30-Apr-2017 10:01:57' 84 0.584694444444445
'30-Apr-2017 10:02:00' 85 0.584731481481482
'30-Apr-2017 10:02:05' 86 0.584789351851852
'30-Apr-2017 10:02:10' 87 0.584847222222222
'30-Apr-2017 10:02:15' 88 0.584906250000000
'30-Apr-2017 10:02:20' 89 0.584964120370370
'30-Apr-2017 10:02:25' 90 0.585021990740741
'30-Apr-2017 10:02:30' 91 0.585079861111111
The dates have some unexpected times added to them, and the times have values that only come into range if multiplied by 24. Even then it is not a proper time, and I'm not sure how to reconcile it with the date time.
There are actually a lot more columns in my actual tables, and my program does not know what order they will be in or what they will be named, so I cannot hard code a solution.
Is there anyway to automatically detect and fix excel datetime import errors? I prefer to have a single column labeled TIME containing a datetime object.
EDIT: sample file attached
  2 Comments
Russell Shomberg
Russell Shomberg on 3 Dec 2019
Thanks for the response.
I attached a sample file. I am in Eastern Standard Time (GMT-5)

Sign in to comment.

Accepted Answer

Walter Roberson
Walter Roberson on 4 Dec 2019
In your file, the data in the first column is a complete date and time, that has been formatted as date only, and formatted for Cameroon, Cameroon is GMT+1
In your file, the data in the third column is time only, formatted as Custom format hmmss
The times in the thrid column appear to be exactly 6 hours after the times in the first column.
But 6 hours after a time, to be consistent, would be an earlier timezone. If the first column is Cameroon time GMT+1, then the third column would have to correspond to GMT+7, which is an Indo-China timezone -- Central Russia (such as Novosibirsk), western China (such as Nepal), eastern India.
You can use readtable() such as
T = readtable(filename);
T.DATE.TimeZone = 'Africa/Algiers'; %starts with no timezone, slap one onto it. Note: WAT has no DST
T.DATE.TimeZone = 'America/New_York'; %switch to local timezone with potential DST
and ignore the TIME column.

More Answers (1)

Russell Shomberg
Russell Shomberg on 3 Dec 2019
I am using readtable. That is what messes it up. Post readtable, this function I wrote seems to help, but I have not figured out how to autodetect and fix yet.
function T = Fix_Excel_Dates(T)
%Fix Excel Dates
%
%Excel files often come with a DATE and TIME column which do not represent
%correctly. The DATE column has random times added to it, while the TIME
%column is represented as a fractional day. Run this macro to fix this.
%TODO: TIME must be type double
%TODO: DATE must be type datetime
% Convert TIME from fractional days to durations
t = days(T.TIME);
% Remove incorrect times from days
d = dateshift(T.DATE,'start','day');
% Add times to fixed dates
dtg = d+t;
T.DTG = dtg;
T.DTG.Format = 'default';
T.TIME = dtg;
T.TIME.Format = 'HHmmss';
T.DAY = dtg.Day;
T.MONTH = dtg.Month;
T.YEAR = dtg.Year;
T = table2timetable(T,'RowTimes',"DTG");
T = removevars(T,{'DATE'});
end

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by