Loading part of a text file (i.e., fileread the first X bytes)
Scott
on 2 Oct 2019
Commented: Walter Roberson
on 2 Oct 2019
I'm using fileread to load data. The problem is that the files are large (several MB) and I only need to load and process the first fraction (say 100 kB) of each file. There are over 1M files, so the computation time wasted on loading all of this "data fat" at the end of the files adds up to several days.
Does anyone know of a way to use fileread (or something similar) so that only part of the file is loaded into MATLAB's memory buffer? With this many files, even saving a fraction of a second per file will make a big difference.
Accepted Answer
Walter Roberson
on 2 Oct 2019
Edited: Walter Roberson
on 2 Oct 2019
You would use fopen(), fread() with a size, then fclose(). You would want to use a "precision" specifier such as '*char'.
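As a minimal sketch of that sequence (the sample file, the variable names, and the count of 11 are made up for illustration; for plain ASCII text one character is one byte):

```matlab
% Create a small sample file so the sketch is self-contained.
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line\ndata 1\ndata 2\n');
fclose(fid);

% Read only the first nChars characters instead of the whole file.
nChars = 11;                            % assumed size of the part you need
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
head = fread(fid, nChars, '*char').';   % fread returns a column; transpose to a row
fclose(fid);
delete(fname);                          % remove the sample file

disp(head)
```

If the file is shorter than the requested count, fread simply returns what is there, so no length check is needed before the call.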
However, if there is a possibility that your files are UTF-8 or UTF-16 encoded, or use some other multibyte character set, then you need to define more clearly what the size is intended to indicate. Is it (say) 100000 bytes that then potentially have to be decoded, or do you want to read 100000 decoded characters?
Also, remember to take line terminators into account in your counting. Does your file use carriage returns as well as linefeeds?
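To illustrate the difference between the two interpretations, here is a rough sketch that assumes the file is UTF-8 (the sample text and the count of 5 are invented; char(233) is 'é', which takes two bytes in UTF-8):

```matlab
% Write a small UTF-8 sample file containing one two-byte character.
sample = ['caf', char(233), ' au lait'];
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, unicode2native(sample, 'UTF-8'), 'uint8');
fclose(fid);

% Interpretation A: the size is a number of raw bytes, decoded afterwards.
fid = fopen(fname, 'r');
raw = fread(fid, 5, '*uint8').';        % exactly 5 bytes from disk
fclose(fid);
txtA = native2unicode(raw, 'UTF-8');    % 5 bytes decode to 4 characters here

% Interpretation B: the size is a number of decoded characters.
% Passing the encoding to fopen lets fread count characters, not bytes.
fid = fopen(fname, 'r', 'n', 'UTF-8');
txtB = fread(fid, 5, '*char').';        % asks for 5 characters
fclose(fid);
delete(fname);
```

Note that with interpretation A a fixed byte count can end in the middle of a multibyte character, in which case the last decoded character will be wrong and may need to be dropped.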
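One way to check this, sketched on a made-up chunk of text rather than a real file, is to count the terminator characters in a sample you have already read:

```matlab
% Sample chunk with Windows-style line endings (CR followed by LF).
chunk = sprintf('a,b\r\n1,2\r\n3,4\r\n');

nCR = sum(chunk == char(13));               % carriage returns
nLF = sum(chunk == char(10));               % linefeeds
usesCRLF = contains(chunk, sprintf('\r\n'));

% Each line end costs 2 bytes with CR+LF, 1 byte with LF alone.
bytesPerLineEnd = 1 + usesCRLF;
fprintf('CR: %d, LF: %d, bytes per line end: %d\n', nCR, nLF, bytesPerLineEnd);
```

If you are sizing the read as "enough bytes for N lines", the extra byte per line is worth including in the estimate.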
2 Comments
Walter Roberson
on 2 Oct 2019
Extracting the beginning of a character vector is not always more efficient, provided the parsing code can tolerate extra characters beyond what you need. But if you are using regexp, be sure to use the lazy quantifier .*? rather than .* or else use the 'dotexceptnewline' option with .* because a plain .* is greedy: it implicitly moves the pointer to the end of the entire stretch of characters and then works backwards to find matches, instead of finding the first match from the current position.
Extracting the beginning of a character vector usually does not cost much and can save you from having to carefully code .*? but when you are counting fractions of a second, it is a small cost that is not strictly necessary. Extracting the beginning before parsing is cleaner programming in most cases, but not always the utmost optimization.
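A small illustration of the difference, using invented sample text and patterns:

```matlab
% Greedy versus lazy matching on text that continues past the part of interest.
txt = sprintf('id=12; id=34; id=56\nmore text');
greedy = regexp(txt, 'id=.*;',  'match', 'once');   % runs on to the LAST ';'
lazy   = regexp(txt, 'id=.*?;', 'match', 'once');   % stops at the FIRST ';'

% By default '.' also matches newlines, so .* can run across lines.
txt2 = sprintf('key=abc\nkey=def\n');
acrossLines = regexp(txt2, 'key=.*', 'match', 'once');
oneLine     = regexp(txt2, 'key=.*', 'match', 'once', 'dotexceptnewline');

disp(greedy)    % id=12; id=34;
disp(lazy)      % id=12;
disp(oneLine)   % key=abc
```

Here acrossLines contains both lines, which is the behaviour that becomes expensive when the text after the part you need is several MB long.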