Loading part of a text file (i.e., fileread the first X bytes)

Question

Scott on 2 Oct 2019


https://de.mathworks.com/matlabcentral/answers/483291-loading-part-of-a-text-file-i-e-fileread-the-first-x-bytes

Commented: Walter Roberson on 2 Oct 2019

I'm using fileread to load data. The problem is that the files are large (several MB), and I only need to load and process the first fraction of each file (say, 100 KB). There are over 1M files, so the computation time wasted on loading all of this "data fat" at the end of the files adds up to several days.

Does anyone know of a way to use fileread (or something similar) that lets you load only part of the file into MATLAB's memory? With this many files, even saving a fraction of a second per file will make a big difference.


Accepted Answer

Walter Roberson on 2 Oct 2019


https://de.mathworks.com/matlabcentral/answers/483291-loading-part-of-a-text-file-i-e-fileread-the-first-x-bytes#answer_394501

Edited: Walter Roberson on 2 Oct 2019

You would use fopen(), then fread() with a size, then fclose(). You would want to use a "precision" specifier such as '*c'.
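A minimal sketch of that open / read-with-a-size / close pattern follows. It is written in Python as a stand-in, not MATLAB: `open(path, "rb")` plays the role of fopen(), `read(nbytes)` the role of fread() with a size, and the `with` block closes the file the way fclose() would. The function name `read_prefix` and the 100,000-byte default are illustrative assumptions.

```python
import os
import tempfile


def read_prefix(path: str, nbytes: int = 100_000) -> bytes:
    """Return at most the first `nbytes` bytes of the file at `path`.

    Only this many bytes are requested; the rest of the file is never read.
    """
    # Binary mode: no decoding and no newline translation, so nbytes is exact.
    with open(path, "rb") as fh:   # analogous to fopen; closed on exit (fclose)
        return fh.read(nbytes)     # analogous to fread with a size; shorter at EOF


if __name__ == "__main__":
    # Build a throwaway 1 MB file to demonstrate on.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 1_000_000)
    try:
        head = read_prefix(tmp.name, 100_000)
        print(len(head))  # 100000
    finally:
        os.remove(tmp.name)
```

Reading in binary mode keeps the limit in bytes; whether those bytes must then be decoded is a separate decision.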

However, if your files might be UTF-8 encoded or use another multibyte character set, you need to define more clearly what the size is meant to indicate. Is it (say) 100000 bytes that then potentially have to be decoded, or do you want to read 100000 decoded characters?
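The two readings of "size" can be contrasted in a short sketch, again in Python rather than MATLAB and assuming UTF-8. `decode_byte_prefix` takes N bytes and uses an incremental decoder so a multibyte character split by the byte limit is held back instead of raising an error; `read_char_prefix` asks for N decoded characters, so the number of bytes consumed depends on the content. Both function names are hypothetical.

```python
import codecs
import os
import tempfile


def decode_byte_prefix(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode the first N *bytes*, dropping a multibyte character cut off at the end."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # final=False buffers an incomplete trailing sequence instead of raising.
    return decoder.decode(raw, final=False)


def read_char_prefix(path: str, nchars: int, encoding: str = "utf-8") -> str:
    """Read the first N decoded *characters*; the byte count varies with content."""
    # newline="" disables newline translation, so "\r\n" stays two characters.
    with open(path, "r", encoding=encoding, newline="") as fh:
        return fh.read(nchars)


if __name__ == "__main__":
    text = "ab\u00e9cd"                   # "abécd": é takes 2 bytes in UTF-8
    data = text.encode("utf-8")          # 6 bytes for 5 characters
    print(decode_byte_prefix(data[:3]))  # 3 bytes cut é in half, prints "ab"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        print(read_char_prefix(tmp.name, 3))  # 3 characters, prints "abé"
    finally:
        os.remove(tmp.name)
```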

Also, remember to take line terminators into account in your counting. Does your file use carriage returns as well as linefeeds?
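As a rough illustration (Python, not MATLAB), the same two lines cost two extra bytes when stored with CRLF instead of LF, and a byte limit can cut a line in half. The hypothetical helper `split_prefix_lines` accepts either terminator and returns the complete lines separately from a possibly truncated tail.

```python
from typing import List, Tuple


def split_prefix_lines(raw: bytes) -> Tuple[List[bytes], bytes]:
    """Split a byte prefix into complete lines plus a possibly partial tail.

    Accepts LF ("\\n") or CRLF ("\\r\\n") terminators. The tail is whatever
    follows the last "\\n"; it may be a line cut off by the byte limit.
    """
    *complete, tail = raw.split(b"\n")  # split always yields at least one item
    # CRLF files leave a trailing "\r" on each complete line; strip it.
    lines = [ln[:-1] if ln.endswith(b"\r") else ln for ln in complete]
    return lines, tail


if __name__ == "__main__":
    lf = b"abc\ndef\n"
    crlf = b"abc\r\ndef\r\n"
    print(len(lf), len(crlf))            # 8 10: same two lines, 2 extra bytes
    print(split_prefix_lines(crlf[:7]))  # ([b'abc'], b'de'): second line truncated
```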

2 Comments

Scott on 2 Oct 2019

Thanks, Walter. I'll see how the run time compares. I could likely also save time by not passing the full block of text to the various parsing functions.

This is helpful, thanks!

Walter Roberson on 2 Oct 2019

Extracting the beginning of a character vector is not necessarily more efficient if the parsing code can tolerate extra characters beyond what you need. But if you are using regexp, be sure to make .* lazy by adding the ? quantifier (that is, use .*?), or use the 'dotexceptnewline' option with .*. A greedy .* first advances to the end of the entire stretch of characters and then works backwards to find a match, instead of finding the first match from the current position.
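The effect can be sketched with Python's `re` module as a stand-in for MATLAB's regexp; this is an analogy, not MATLAB code. Passing `re.DOTALL` makes `.` match newlines, which is assumed here to mirror regexp's default behaviour, while Python's default `.` stops at newlines, roughly like the 'dotexceptnewline' option. The sample text with `key=` fields and padding is made up.

```python
import re

# Made-up sample: two short fields followed by a long tail of "data fat".
text = "key=1\nkey=2\n" + "padding\n" * 1000

# Greedy .* with DOTALL runs to the end of the text, then backtracks
# to the last "\n", so the capture swallows almost everything.
greedy = re.search(r"key=(.*)\n", text, re.DOTALL)

# Lazy .*? stops at the first "\n" after "key=".
lazy = re.search(r"key=(.*?)\n", text, re.DOTALL)

# Without DOTALL, "." cannot cross a newline, so greedy .* stays on one line.
line_only = re.search(r"key=(.*)\n", text)

if __name__ == "__main__":
    print(len(greedy.group(1)))  # thousands of characters
    print(repr(lazy.group(1)))   # '1'
    print(repr(line_only.group(1)))  # '1'
```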

Extracting the beginning of a character vector usually costs little and can save you from having to code .*? carefully, but when you are counting fractions of a second, it is a small cost that is not strictly necessary. Extracting the beginning before parsing is cleaner programming in most cases, but it is not always the utmost optimization.
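The trade-off between slicing first and bounding the parser can be sketched in Python (again a stand-in for MATLAB). `parse_sliced` copies the first `limit` characters before searching; `parse_bounded` passes `limit` as the `endpos` argument of a compiled pattern's `search`, which treats the text as if it ended there without making a copy. The `id=` field and both function names are hypothetical.

```python
import re
from typing import Optional

PATTERN = re.compile(r"id=(\d+)")  # hypothetical field to extract


def parse_sliced(text: str, limit: int) -> Optional[str]:
    """Cleaner: copy the first `limit` characters, then parse the copy."""
    m = PATTERN.search(text[:limit])
    return m.group(1) if m else None


def parse_bounded(text: str, limit: int) -> Optional[str]:
    """No copy: the engine behaves as if the text were `limit` characters long."""
    m = PATTERN.search(text, 0, limit)
    return m.group(1) if m else None


if __name__ == "__main__":
    sample = "id=42 " + "x" * 1_000_000 + " id=7"
    print(parse_sliced(sample, 100_000), parse_bounded(sample, 100_000))  # 42 42
```

Both versions return the same result; the bounded one only avoids the copy, which matters mainly when the per-file savings are measured in fractions of a second.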
