Why is str2num not recommended when it is faster in certain circumstances?

I have a cell array of millions of strings representing dates with the format "yyyyMMddHHmmss"
I need to convert these to datetimes. After many attempts of various kinds I think I have found an optimal solution.
However, my solution requires that I use str2num instead of str2double, and it results in a nearly 100x increase in speed. This is despite the fact that MATLAB recommends using str2double for "faster performance" and specifically discourages str2num.
In particular, str2double cannot convert a char array and results in Inf, while str2num converts the char array without issue.
Below is an example script.
(P.S. not directly related to this question, but if there is a faster way to convert cell arrays of strings to datetimes, let me know!)
%Make example input data
t1 = datetime(2000,1,1,0,0,0);
t2 = datetime("now");
t = string(datestr(t1:days(1):t2,'yyyymmddHHMMss'));
t_cell = cellstr(t); %<--- This is the "cell array" example data
%Option 1: datetime cell array of strings
% ---> Very slow (~0.75 s)
tic
tcheck1 = datetime(t_cell,'InputFormat','yyyyMMddHHmmss');
toc
%Option 2: str2double
% ---> DOES NOT WORK (results in Inf)
tic
da = char(t_cell);
year1 = str2double(da(:,1:4));
month1 = str2double(da(:,5:6));
day1 = str2double(da(:,7:8));
hour1 = str2double(da(:,9:10));
min1 = str2double(da(:,11:12));
sec1 = str2double(da(:,13:14));
tcheck2 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
%Option 3: str2double with extra conversion from char to string
% ---> About twice as fast as Option #1 (~0.4 s)
tic
da = char(t_cell);
year1 = str2double(string(da(:,1:4)));
month1 = str2double(string(da(:,5:6)));
day1 = str2double(string(da(:,7:8)));
hour1 = str2double(string(da(:,9:10)));
min1 = str2double(string(da(:,11:12)));
sec1 = str2double(string(da(:,13:14)));
tcheck3 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
%Option 4: str2num
% ---> About 100 times faster than Option #1 and #3 (~0.005 s)
tic
da = char(t_cell);
year1 = str2num(da(:,1:4));
month1 = str2num(da(:,5:6));
day1 = str2num(da(:,7:8));
hour1 = str2num(da(:,9:10));
min1 = str2num(da(:,11:12));
sec1 = str2num(da(:,13:14));
tcheck4 = datetime(year1,month1,day1,hour1,min1,sec1);
toc

Accepted Answer

Stephen23 on 5 Jul 2024
Edited: Stephen23 on 5 Jul 2024
"Why is str2num not recommended when it is faster in certain circumstances?"
Because it relies on EVAL. Code called by EVAL is not accelerated by the JIT engine, which is why you get that warning. This slow-down is particularly noticeable in code that repeats a lot (e.g. loops) and may be less noticeable in code that does not. The reliance on EVAL also makes the code liable to unexpected behavior when the input data contains valid commands or function calls (the STR2NUM documentation explicitly warns about this), and makes the code less versatile because EVAL is not supported by e.g. parallel loops or the code compiler. For these reasons experienced users often prefer to avoid STR2NUM.
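To illustrate the EVAL point, here is a minimal sketch (the inputs are made-up examples, not taken from the question's data). STR2NUM evaluates its text as MATLAB code, so an expression or a function name is executed rather than rejected, whereas STR2DOUBLE returns NaN for text it cannot parse as a single number.

```matlab
% STR2NUM evaluates the text, so arithmetic and function names are executed
a = str2num('1+2');     % evaluated as an expression, gives 3
b = str2num('pi');      % calls the function PI

% STR2DOUBLE only parses text that represents a single number
c = str2double('1+2');  % not a single number, gives NaN
d = str2double('pi');   % not a numeric literal, gives NaN

disp([a, b, c, d])
```

With purely numeric text such as the date fields in the question the two functions agree; the difference only matters when the input may contain something other than digits.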
"In particular, str2double cannot convert a char array and results in Inf..."
STR2DOUBLE is not documented to work on character matrices. Its documentation states that the input "str can be a character vector, a cell array of character vectors, or a string array."
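As a small illustration of that input requirement (the two-row matrix is a made-up example): converting the character matrix to a string array or to a cell array of character vectors first gives STR2DOUBLE one element per row.

```matlab
da = ['2000';'2024'];          % 2-by-4 character matrix, one year per row

y1 = str2double(string(da));   % string() yields one string per row
y2 = str2double(cellstr(da));  % cellstr() yields one char vector per row

disp([y1, y2])                 % both columns hold 2000 and 2024
```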
If speed is a high priority for you, it might help to leverage low-level conversion functions:
%Make example input data
t1 = datetime(2000,1,1,0,0,0);
t2 = datetime("now");
%This is the "cell array" example data
t_cell = cellstr(string(datestr(t1:days(1):t2,'yyyymmddHHMMss')));
%Option 1: datetime cell array of strings
% ---> Very slow (~0.75 s)
tic
tcheck1 = datetime(t_cell,'InputFormat','yyyyMMddHHmmss');
toc
Elapsed time is 0.546119 seconds.
%Option 2: str2double
% N/A
%Option 3: str2double with extra conversion from char to string
% ---> About twice as fast as Option #1 (~0.4 s)
tic
da = char(t_cell);
year1 = str2double(string(da(:,1:4)));
month1 = str2double(string(da(:,5:6)));
day1 = str2double(string(da(:,7:8)));
hour1 = str2double(string(da(:,9:10)));
min1 = str2double(string(da(:,11:12)));
sec1 = str2double(string(da(:,13:14)));
tcheck3 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
Elapsed time is 0.243270 seconds.
%Option 4: str2num
% ---> About 100 times faster than Option #1 and #3 (~0.005 s)
tic
da = char(t_cell);
year1 = str2num(da(:,1:4));
month1 = str2num(da(:,5:6));
day1 = str2num(da(:,7:8));
hour1 = str2num(da(:,9:10));
min1 = str2num(da(:,11:12));
sec1 = str2num(da(:,13:14));
tcheck4 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
Elapsed time is 0.043460 seconds.
% Option 5: SSCANF
tic
da = char(t_cell);
M = sscanf(da.','%4d%2d%2d%2d%2d%2d',[6,Inf]).';
tcheck5 = datetime(M);
toc
Elapsed time is 0.013550 seconds.
% Option 6: DOUBLE
tic
da = char(t_cell);
year1 = double(string(da(:,1:4)));
month1 = double(string(da(:,5:6)));
day1 = double(string(da(:,7:8)));
hour1 = double(string(da(:,9:10)));
min1 = double(string(da(:,11:12)));
sec1 = double(string(da(:,13:14)));
tcheck6 = datetime(year1,month1,day1,hour1,min1,sec1);
toc
Elapsed time is 0.013220 seconds.
Mileage will vary depending on your installed MATLAB version and hardware. Chasing down every last millisecond is not always productive if the code will then run on other machines or releases.
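Since single TIC/TOC runs are noisy, one way to compare the options on your own machine is TIMEIT, which calls a function handle several times and reports a typical run time. The following is only a sketch: the helper name parseWithSscanf and the two-element test data are made up for illustration, and the SSCANF call is the one from Option 5.

```matlab
% Made-up sample data in the question's yyyyMMddHHmmss layout
t_cell = {'20000101000000'; '20240705123456'};

% Hypothetical helper wrapping Option 5: SSCANF reads the transposed char
% matrix column by column, i.e. one timestamp after another
parseWithSscanf = @(c) datetime(sscanf(char(c).','%4d%2d%2d%2d%2d%2d',[6,Inf]).');

% Reference result from DATETIME with an explicit input format
ref = datetime(t_cell,'InputFormat','yyyyMMddHHmmss');
out = parseWithSscanf(t_cell);
assert(isequal(out, ref))   % both approaches should agree

% TIMEIT runs each handle repeatedly and returns a time in seconds
tSscanf   = timeit(@() parseWithSscanf(t_cell));
tDatetime = timeit(@() datetime(t_cell,'InputFormat','yyyyMMddHHmmss'));
fprintf('sscanf: %.6f s, datetime: %.6f s\n', tSscanf, tDatetime)
```

To get meaningful numbers, replace the two-element t_cell with the full-size data from the question.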
I have not tried it, but you might find this useful:

More Answers (1)

Umar on 4 Jul 2024
Hi Darcy,
That is a very good catch. You asked: why is str2num not recommended when it is faster in certain circumstances?
My view is that although str2num may offer speed advantages in specific scenarios, its drawbacks in terms of error handling, ambiguity, flexibility, readability, and future compatibility outweigh the performance gains.
You also asked (not directly related) whether there is a faster way to convert cell arrays of strings to datetimes.
One efficient approach is to use the vectorized operations and built-in functions provided by MATLAB. Preallocating memory for the datetime array before conversion can improve performance. This can be achieved by initializing an empty datetime array with the desired size before populating it with converted values. For even faster conversion of large datasets, consider leveraging MATLAB's Parallel Computing Toolbox. By parallelizing the conversion process, you can distribute the workload across multiple cores or workers, significantly reducing processing time.
Hope this answers your question.
