Converting partial strings using datetime

5 Ansichten (letzte 30 Tage)
Will
Will am 7 Aug. 2023
Kommentiert: Will am 8 Aug. 2023
I am working with a large CSV datafile (16GB) and one column includes the date and time, for example:
{'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)'}
I'm currently using code like this to import from the CSV:
% formats for importing data from CSV
tsFormats = {'%q'};
% make datastore and tall array object for data
ds = datastore("input.csv", ...
"TreatAsMissing", [".","NA"],'Delimiter', ',',...
"DatetimeLocale","en_GB",'TextscanFormats',tsFormats);
tall_ds = tall(ds);
% add datetime field
tall_ds.Datetime = datetime(tall_ds.timestamp,...
"InputFormat",'eee MMM d yyyy H:mm:ss ''GMT''Z ''(Australian Eastern Standard Time)''',...
'TimeZone','Australia/Sydney');
However, not all of the times are Eastern Standard time (some are Eastern Daylight Time) and in future not all data will be in this locale/timezone (i.e. they might be in Australia/Perth).
The format string I have used results in NaT for Australian Eastern Daylight Time, so I re-do the conversion for all rows that have NaT instead of a valid datetime.
% re-do for NaT datetimes
NaTs = isnat(tall_ds.Datetime); % find NaTs
tall_ds.Datetime(NaTs) = datetime(tall_ds.timestamp(NaTs),...
"InputFormat",'eee MMM d yyyy H:mm:ss ''GMT''Z ''(Australian Eastern Daylight Time)''',...
'TimeZone','Australia/Sydney');
Is there a 'better' way to do this, going forward? Especially if I want to make it resistant to changes in location (e.g. to Perth or elsewhere in the world), and minimise extra passes through the data. (I know MATLAB minimises passes through the data when using gather, but it seems silly to have to do this isnat call.)
Ideally I'd probably just cut the string after the timezone offset using something like extractbefore.
(Similar to, but distinct from, this previous question.)
  2 Kommentare
Siddharth Bhutiya
Siddharth Bhutiya am 7 Aug. 2023
I think since you are any way ignoring the text that contains the name, your approach of using extractBefore to trim that part out before conversion seems like the best way to tackle this. Extract the timestamp before "(" and then use a single datetime call to convert all the timestamps. Something like the code below
str = {'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)','Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Daylight Time)'};
str = extractBefore(str,"(");
dt = datetime(str,InputFormat='eee MMM d yyyy H:mm:ss ''GMT''Z',TimeZone='Australia/Sydney')
dt = 1×2 datetime array
22-Apr-2019 12:53:23 22-Apr-2019 12:53:23
Will
Will am 8 Aug. 2023
Thanks Siddharth. I was hoping to avoid extra passes through the data, but it seems like there's no way to avoid some pre- or post-processing of the strings before they're converted.

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Stephen23
Stephen23 am 7 Aug. 2023
Bearbeitet: Stephen23 am 7 Aug. 2023
"Especially if I want to make it resistant to changes in location (e.g. to Perth or elsewhere in the world), and minimise extra passes through the data."
Then work in UTC:
C = {'Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Standard Time)','Mon Apr 22 2019 12:53:23 GMT+1000 (Australian Eastern Daylight Time)'};
C = regexprep(C,"\s*\(.+$","")
C = 1×2 cell array
{'Mon Apr 22 2019 12:53:23 GMT+1000'} {'Mon Apr 22 2019 12:53:23 GMT+1000'}
D = datetime(C, "InputFormat","eee MMM d u H:m:s 'GMT'Z", "Timezone","UTC")
D = 1×2 datetime array
22-Apr-2019 02:53:23 22-Apr-2019 02:53:23
  1 Kommentar
Will
Will am 8 Aug. 2023
Thanks Stephen23, nice use of regexp. And I like the idea of working in UTC.
I was trying to do the conversion in one pass through the dataset, but it seems that's not doable. I suppose gather will optimise the second pass to make it as efficient as possible.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Kategorien

Mehr zu Data Type Conversion finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by