Why do I get "Out of memory." when reading only 16 chars?


Ed Frank on 25 Mar 2020


Direct link to this question:

https://de.mathworks.com/matlabcentral/answers/512803-why-do-i-get-out-of-memory-when-reading-only-16-chars

Answered: Guillaume on 29 Apr 2020

Accepted Answer: Guillaume


Dear Matlab community,

I want to read specific parts from a large (> 20 GB) binary file. However, the command

tmpString=fread(fid,[1,16],'char=>char');

fails with "Out of memory." The command is executed very near the beginning of the file (at an offset of 20 bytes).

Why do I get this error and how can I successfully read in my file?

Thank you for your suggestions,

17 Comments

Walter Roberson on 22 Apr 2020

It turns out that in R2020a, fopen now tries to detect the file's character encoding; see https://www.mathworks.com/help/matlab/ref/fopen.html#btrnibn-1-encodingIn

Historically, the encoding detection that readtable() performs on text examined only the first 10 kilobytes of the file; fopen() might behave differently.

Ed Frank: if you are still interested, could you try reading just one character first, timing the fopen() and that first short fread() separately, and then timing an fread() of the next 79 characters? That would show whether the delay occurs in fopen(), in the first fread() of character data, or whether the file position somehow triggers it.
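A minimal MATLAB sketch of the experiment suggested above, timing fopen(), a one-character fread() and a 79-character fread() separately with tic/toc. To keep it self-contained it first writes a small hypothetical ASCII test file to a temporary path; on such a tiny file all three timings will be close to zero, so point filename at the real large file (and drop the file-writing and delete steps) to reproduce the delay.

```matlab
% Hypothetical demo file so the sketch runs on its own; replace
% filename with the path of the real large binary file.
filename = [tempname '.bin'];
fidw = fopen(filename, 'w');
fwrite(fidw, repmat('0123456789', 1, 10));   % 100 ASCII bytes
fclose(fidw);

t = tic;
fid = fopen(filename, 'r');                  % no encoding specified
assert(fid ~= -1, 'Could not open the file.');
tOpen = toc(t);

t = tic;
c1 = fread(fid, [1, 1], 'char=>char');       % first character read
tFirst = toc(t);

t = tic;
c79 = fread(fid, [1, 79], 'char=>char');     % next 79 characters
tNext = toc(t);

fclose(fid);
delete(filename);                            % remove the demo file

fprintf('fopen: %.4f s, first fread: %.4f s, next 79 chars: %.4f s\n', ...
    tOpen, tFirst, tNext);
```

If tFirst dominates on the real file, the cost sits in the first character read rather than in fopen() itself.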

Guillaume on 29 Apr 2020

Sorry, I've been too busy to follow Answers lately. I did recently have conversations with MathWorks about text file parsing, and R2020a does indeed perform automatic character set detection, which is most likely the issue here. I'll post the details I got from MathWorks support in an answer.


Accepted Answer

Guillaume on 29 Apr 2020


Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/512803-why-do-i-get-out-of-memory-when-reading-only-16-chars#answer_429178

"However what I could think of is that Matlab tries to guess the encoding"

I've discussed this with MathWorks support. Unfortunately, the whole process is not properly documented, which I told them can be a problem.

Indeed, if you open a file without specifying a character encoding, MATLAB will try to guess the file's encoding the first time you do any of the following:

- use any character-reading function, such as fgetl, fgets or fscanf
- use fread with a 'char' or '*char' precision
- ask for the encoding with the multi-output form of fopen
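Since the list above only covers character-reading functions, character precisions in fread and the encoding query, one possible workaround is to read the bytes with a numeric precision and convert them to char yourself. This is an assumption drawn from that list, not something MathWorks confirmed in this thread. The sketch writes a small hypothetical demo file (20 padding bytes followed by 16 ASCII characters) so it runs on its own.

```matlab
% Hypothetical demo file: 20 zero bytes, 16 ASCII characters, 2 binary bytes.
filename = [tempname '.bin'];
fidw = fopen(filename, 'w');
fwrite(fidw, uint8([zeros(1, 20), double('HEADER_TEXT_1234'), 255 254]));
fclose(fidw);

fid = fopen(filename, 'r');                  % no encoding specified
assert(fid ~= -1, 'Could not open the file.');
fseek(fid, 20, 'bof');                       % skip the 20-byte prefix
raw = fread(fid, [1, 16], 'uint8=>uint8');   % numeric read: raw bytes only
fclose(fid);
delete(filename);                            % remove the demo file

% char() maps each byte to the code point with the same value:
% correct for ASCII text, Latin-1 interpretation for bytes above 127.
tmpString = char(raw);
disp(tmpString)
```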

I haven't been given the full details of the character set detection, but it does read the whole file, which can indeed be a problem for large files. If any byte sequence in the file is not valid UTF-8, the algorithm uses heuristics to check whether the file is in a CJK encoding, and if that doesn't match either, it assumes the local (native) encoding.
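To illustrate the first step described above, here is a simplified UTF-8 structure check. It is not MathWorks' algorithm, which is undocumented; it only checks lead and continuation bytes and does not reject overlong 3- or 4-byte forms or surrogates. It shows why typical binary data fails almost immediately, while confirming that data is valid UTF-8 requires looking at every byte, which is one plausible reason a detector would scan the whole file.

```matlab
% Simplified UTF-8 structure check (illustration only, not MathWorks' code).
samples = {uint8('plain ASCII'), ...   % ASCII only: valid UTF-8
           uint8([195 169]), ...       % U+00E9 encoded as UTF-8: valid
           uint8([0 1 2 255 128 7])};  % typical binary bytes: invalid
isUtf8 = false(size(samples));

for k = 1:numel(samples)
    bytes = samples{k};
    n = numel(bytes);
    pos = 1;
    ok = true;
    while ok && pos <= n
        b = bytes(pos);
        if b < 128                    % 0xxxxxxx: single ASCII byte
            len = 1;
        elseif b >= 194 && b <= 223   % C2-DF: lead byte of a 2-byte sequence
            len = 2;
        elseif b >= 224 && b <= 239   % E0-EF: lead byte of a 3-byte sequence
            len = 3;
        elseif b >= 240 && b <= 244   % F0-F4: lead byte of a 4-byte sequence
            len = 4;
        else                          % stray continuation byte or invalid lead
            ok = false;
            break
        end
        if pos + len - 1 > n          % sequence truncated at end of data
            ok = false;
        else                          % continuation bytes must be 10xxxxxx
            cont = bytes(pos+1:pos+len-1);
            ok = all(cont >= 128 & cont <= 191);
        end
        pos = pos + len;
    end
    isUtf8(k) = ok;
end
disp(isUtf8)   % expected: 1 1 0
```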

To prevent this autodetection from taking place, you have to specify an encoding when you fopen the file. If you don't know which encoding your binary file uses, I'd suggest 'US-ASCII'. As mentioned in the comments, a binary file is unlikely to store its text as UTF-8 unless it prefixes the text with a length.
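A minimal sketch of the suggested fix: pass 'US-ASCII' as the fourth (encodingIn) argument of fopen, after the permission and the machine format ('n' for native byte order), then read the 16 characters at offset 20 as in the question. It also queries the encoding in use with the multi-output form of fopen; with an explicit encoding this presumably does not trigger detection. The demo file is hypothetical; substitute your own file. Bytes above 127 are not valid US-ASCII, so any non-ASCII text in the file may not decode as expected.

```matlab
% Hypothetical demo file: 20 zero bytes, 16 ASCII characters, 3 binary bytes.
filename = [tempname '.bin'];
fidw = fopen(filename, 'w');
fwrite(fidw, uint8([zeros(1, 20), double('HEADER_TEXT_1234'), 0 1 2]));
fclose(fidw);

% fopen(filename, permission, machinefmt, encodingIn)
fid = fopen(filename, 'r', 'n', 'US-ASCII');
assert(fid ~= -1, 'Could not open the file.');

[~, ~, ~, enc] = fopen(fid);                   % encoding actually in use
fseek(fid, 20, 'bof');                         % offset from the question
tmpString = fread(fid, [1, 16], 'char=>char');
fclose(fid);
delete(filename);                              % remove the demo file

fprintf('Encoding: %s, text: %s\n', enc, tmpString);
```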

Unfortunately, it's not easy to go back to the pre-R2020a behaviour of automatically using the native encoding, whatever it is, because R2020a has lost the ability to easily get the local encoding. On the other hand, relying on the native encoding when reading a binary file is asking for trouble.
