Converting partial strings using datetime
5 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
I am working with a large CSV datafile (16GB) and one column includes the date and time, for example:
{'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)'}
I'm currently using code like this to import from the CSV:
% formats for importing data from CSV
tsFormats = {'%q'};
% make datastore and tall array object for data
ds = datastore("input.csv", ...
"TreatAsMissing", [".","NA"],'Delimiter', ',',...
"DatetimeLocale","en_GB",'TextscanFormats',tsFormats);
tall_ds = tall(ds);
% add datetime field
tall_ds.Datetime = datetime(tall_ds.timestamp,...
"InputFormat",'eee MMM d yyyy H:mm:ss ''GMT''Z ''(Australian Eastern Standard Time)''',...
'TimeZone','Australia/Sydney');
However, not all of the times are Eastern Standard time (some are Eastern Daylight Time) and in future not all data will be in this locale/timezone (i.e. they might be in Australia/Perth).
The format string I have used results in NaT for Australian Eastern Daylight Time, so I re-do the conversion for all rows that have NaT instead of a valid datetime.
% re-do for NaT datetimes
NaTs = isnat(tall_ds.Datetime); % find NaTs
tall_ds.Datetime(NaTs) = datetime(tall_ds.timestamp(NaTs),...
"InputFormat",'eee MMM d yyyy H:mm:ss ''GMT''Z ''(Australian Eastern Daylight Time)''',...
'TimeZone','Australia/Sydney');
Is there a 'better' way to do this, going forward? Especially if I want to make it resistant to changes in location (e.g. to Perth or elsewhere in the world), and minimise extra passes through the data. (I know MATLAB minimises passes through the data when using gather, but it seems silly to have to do this isnat call.)
Ideally I'd probably just cut the string after the timezone offset using something like extractbefore.
2 Kommentare
Siddharth Bhutiya
am 7 Aug. 2023
I think since you are any way ignoring the text that contains the name, your approach of using extractBefore to trim that part out before conversion seems like the best way to tackle this. Extract the timestamp before "(" and then use a single datetime call to convert all the timestamps. Something like the code below
str = {'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)','Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Daylight Time)'};
str = extractBefore(str,"(");
dt = datetime(str,InputFormat='eee MMM d yyyy H:mm:ss ''GMT''Z',TimeZone='Australia/Sydney')
Akzeptierte Antwort
Stephen23
am 7 Aug. 2023
Bearbeitet: Stephen23
am 7 Aug. 2023
"Especially if I want to make it resistant to changes in location (e.g. to Perth or elsewhere in the world), and minimise extra passes through the data."
Then work in UTC:
C = {'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)','Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Daylight Time)'};
C = regexprep(C,"\s*\(.+$","")
D = datetime(C, "InputFormat","eee MMM d u H:m:s 'GMT'Z", "Timezone","UTC")
Weitere Antworten (0)
Siehe auch
Kategorien
Mehr zu Data Type Conversion finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!