MATLAB functions str2double and strsplit taking long time
Arup Ghosh
on 9 Jan 2018
Edited: Stephen23
on 9 Jan 2018
I have a comma-separated text file like this:
number1, number2, number3, number4
number5, number6, number7, number8
number9, number10, number11, number12
.....................................
_[all the numbers are double]_
I am reading it line by line.
Then splitting the line into parts (',' is the delimiter) and converting the parts into double numbers.
The code is given below:
str2double(strsplit(line,','));
The input file is very large: it has more than 100,000 lines, and each line has more than 200 numbers.
The Profiler shows that the above code takes a long time to execute.
How can I replace the above code so that it executes much faster?
I want to read the file line by line. I do not want to read the whole file into a single matrix using csvread.
Thanks in advance.
2 Comments
Stephen23
on 9 Jan 2018
Why not just use csvread? If the file is comma separated and contains only numbers, then why waste time writing buggy code when csvread already exists?
Accepted Answer
Stephen23
on 9 Jan 2018
Edited: Stephen23
on 9 Jan 2018
The approach of reading each line and then using strsplit and str2double will be slow, because those functions are inherently much more complex than what you require.
method one: sscanf:
fmt = repmat(',%f',1,N); % N == number of columns
fmt = fmt(2:end);
...
while ~feof(fid)
...
S = fgetl(fid);
V = sscanf(S,fmt);
end
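A fuller sketch of the sscanf approach, for illustration only. It assumes every line has the same number of columns; the temporary file, its sample contents, and the rows accumulator are hypothetical and exist only to make the example self-contained. The column count is derived from the first line rather than hard-coded.

```matlab
% Write a small sample file (hypothetical data) so the sketch is self-contained.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n','1.5, 2, 3, 4','5, 6.25, 7, 8','9, 10, 11, -12e-1');
fclose(fid);

fid = fopen(fname,'r');
first = fgetl(fid);                  % peek at the first line
N = numel(strfind(first,',')) + 1;   % number of columns = commas + 1
fmt = repmat(',%f',1,N);
fmt = fmt(2:end);                    % '%f,%f,...,%f'
frewind(fid);                        % go back to the start of the file

rows = zeros(0,N);                   % collected here only for demonstration
while true
    S = fgetl(fid);
    if ~ischar(S), break; end        % fgetl returns -1 at end of file
    V = sscanf(S,fmt);               % N-by-1 column vector of doubles
    rows(end+1,:) = V.';             %#ok<SAGROW> replace with your own processing
end
fclose(fid);
delete(fname);
```

In real use you would process V inside the loop instead of accumulating rows, since growing an array on every iteration is itself slow.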
method two: textscan and blocks of data:
One way to read a large file efficiently is to call textscan inside a loop, reading blocks of numeric data directly as numeric data (importing as char and then converting to numeric is, in general, slow). Use textscan's optional third input to specify the number of rows per block. How to read blocks of data is explained in the MATLAB documentation.
If you know the file format in advance then it is trivial to write a format string to suit. If the number of columns can vary then you can read the first line, calculate the columns to generate the format string, then use frewind to go back to the start of the file and start reading the blocks of data using textscan.
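The block-reading idea described above could look roughly like this. It is a sketch, not a tuned implementation: the temporary file, the block size blockRows, and the running totals are hypothetical choices made for illustration, and the file is assumed to contain only comma-separated numbers with a constant column count.

```matlab
% Write a small sample file (hypothetical data): 7 rows, 3 columns.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%d,%d,%d\n',reshape(1:21,3,[]));
fclose(fid);

fid = fopen(fname,'r');
first = fgetl(fid);                  % read the first line to count columns
N = numel(strfind(first,',')) + 1;
frewind(fid);                        % return to the start of the file
fmt = repmat('%f',1,N);              % one %f per column

blockRows = 3;                       % rows per block (assumed value, tune as needed)
nRows = 0;
total = 0;
while ~feof(fid)
    C = textscan(fid,fmt,blockRows,'Delimiter',',','CollectOutput',true);
    M = C{1};                        % up to blockRows-by-N numeric matrix
    if isempty(M), break; end        % nothing left to read
    nRows = nRows + size(M,1);       % replace with your own processing
    total = total + sum(M(:));
end
fclose(fid);
delete(fname);
```

A larger block size generally means fewer textscan calls and less overhead, at the cost of more memory per block.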
See these threads to see working examples:
method three: datastore:
Depending on your MATLAB version, you might also consider tall arrays, a data type designed for working with data that is too large to fit in memory, or a datastore for working with large files.
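As a rough sketch of the datastore route, assuming a MATLAB release that provides tabularTextDatastore and a file with no header line: the temporary file, the ReadSize value, and the running totals are hypothetical and chosen only to keep the example small.

```matlab
% Write a small sample file (hypothetical data): 7 rows, 3 columns.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%d,%d,%d\n',reshape(1:21,3,[]));
fclose(fid);

% Treat the first line as data, not as variable names.
ds = tabularTextDatastore(fname,'ReadVariableNames',false);
ds.ReadSize = 3;                     % rows returned per read (assumed value)

nRows = 0;
total = 0;
while hasdata(ds)
    T = read(ds);                    % table with up to ReadSize rows
    M = table2array(T);              % numeric matrix for this block
    nRows = nRows + size(M,1);       % replace with your own processing
    total = total + sum(M(:));
end
delete(fname);
```

The datastore handles the file position and format detection itself, so no fopen, frewind, or format string is needed.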