Processing a HUGE number of timestamps
Matlab2010
on 13 Nov 2014
Answered: Jan
on 16 Nov 2014
I have a cell array whose elements are time stamps in the format "Mon Apr 01 20:00:00 BST 2013". I have a very large number of these vectors. At the moment, I loop through each value in the vector and apply the function below. This loop takes up 99% of my processing time.
How can I remove the loop?
thanks
function myTimeOut = st_timestampConvert(myTime)
year = strtrim(myTime(end-4:end));
month = strtrim(myTime(5:8));
day = strtrim(myTime(9:10));
time = strtrim(myTime(11:19));
timezone = strtrim(myTime(20:23));
myTimeOut = convert_to_UTC(myTimeOut, timezone); %time zone conversion
myTimeOut = datenum([day '-' month '-' year ' ' time], 'dd-mmm-yyyy HH:MM:SS');
end
Accepted Answer
Guillaume
on 14 Nov 2014
Without R2014b datetime, you can use regexprep to rearrange the bits of the string you want before calling datenum. It's many orders of magnitude faster than a loop, cellfun, or strsplit.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');
On my machine, processing 100k dates with the above two lines takes 2.1 seconds, most of it spent in the datenum call. The regexprep line takes only 0.3 seconds.
There remains the problem of the time zone adjustment (which I believe should have come after the conversion to datenum in your example). Your convert_to_UTC is not part of MATLAB. Hopefully it can operate on cell arrays as well. To extract the time zone:
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');
tout = convert_to_UTC(tout, tzones); %Will this work?
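If convert_to_UTC turns out not to accept cell arrays, a vectorized lookup is one way around it. The following is only a sketch under stated assumptions: the abbreviation table and its offsets (BST taken as British Summer Time, UTC+1) are illustrative and are not the behaviour of the asker's convert_to_UTC, which is not shown in this thread.

```matlab
% Two sample time stamps in the format from the question.
s = {'Mon Apr 01 20:00:00 BST 2013'; 'Tue Jan 01 09:30:00 GMT 2013'};

% Rearrange the strings and convert to serial date numbers (local clock time).
s2   = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');

% Extract the time zone abbreviation that precedes the year.
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');

% ASSUMPTION: hypothetical lookup table of abbreviations and UTC offsets in hours.
abbrevs     = {'GMT'; 'UTC'; 'BST'};
offsetHours = [0; 0; 1];

% Map every abbreviation to its row in the table in one call.
[known, idx] = ismember(tzones, abbrevs);
assert(all(known), 'Unknown time zone abbreviation in input.');

% Subtract the offset (converted to days) to get UTC date numbers.
tUTC = tout(:) - offsetHours(idx) / 24;
```

Because ismember and the subtraction both work on whole arrays, no explicit loop over the time stamps is needed.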
More Answers (2)
Peter Perkins
on 13 Nov 2014
none, I don't know if you have access to R2014b. If you do, consider using the new datetime data type. On a not-so-fast PC, parsing 100,000 strings like yours, with time zones, takes a bit over 2 s. 'BST' presents a potential issue, because the abbreviation is ambiguous. In the UK, it means "British Summer Time", and the following parses the strings using that locale. Hope this helps.
% construct 100k strings
d = datetime(2013,4,1,20,0,0,'TimeZone','Europe/London') + days(randn(100000,1));
s = cellstr(d,'eee MMM dd HH:mm:ss z yyyy','en_UK');
ans =
Tue Apr 02 14:33:32 BST 2013
% parse those strings
tic
d1 = datetime(s,'Format','eee MMM dd HH:mm:ss z yyyy','TimeZone', ...
'Europe/London','Locale','en_UK');
toc
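Since the question asks for UTC values, here is a minimal sketch of how a zoned datetime could be shifted to UTC and, if needed, turned back into a serial date number. It assumes R2014b or later and uses a single hand-built value instead of the parsed array d1; the variable names are illustrative.

```matlab
% One zoned time stamp: 1 April 2013, 20:00 London time (British Summer Time).
d = datetime(2013, 4, 1, 20, 0, 0, 'TimeZone', 'Europe/London');

% Changing the TimeZone property keeps the same instant but shows it in UTC.
dUTC = d;
dUTC.TimeZone = 'UTC';

% Optional: convert to a serial date number for code that still expects datenum.
tnum = datenum(dUTC);
```

The same two steps should apply unchanged to a whole datetime array such as d1.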
Jan
on 16 Nov 2014
I prefer Guillaume's version, but it is not "magnitudes" faster than a loop approach:
function DOut = ConvertCellDate(DIn)
DOut = zeros(size(DIn));
for k = 1:numel(DIn)
Dx = double(DIn{k} - '0'); % For faster conversion of numbers
month = (strfind('JanFebMarAprMayJunJulAugSepOctNovDec', DIn{k}(5:7)) + 2) / 3;
year = Dx(25) * 1000 + Dx(26) * 100 + Dx(27) * 10 + Dx(28);
DOut(k) = datenummx(year, month, Dx(9) * 10 + Dx(10), ...
Dx(12) * 10 + Dx(13), Dx(15) * 10 + Dx(16), ...
Dx(18) * 10 + Dx(19));
end
Phew, this looks cruel and is hard to debug. It takes 2.3 sec on my Matlab 2011b/Win7/32 system, while Guillaume's method takes 1.7 sec.