Handling blanks (whitespace) with textscan

Massimo Bassan on 5 Nov 2017
Commented: Massimo Bassan on 7 Nov 2017
I must read many text files like this:
920700292298339754786266784325021305647353906 47869473536 1585320 87242843 5824282 1674 106 125015 0 787 112311111 21A
920700292298356219105414709712071946390548771 42521474942 1825320101142892 8319982 1674 115 18438 -7 1957 632311110021B
920700292298356802105424709712071917177569918 42071444587 1965320101142892 8319491 1674 89 18438 -7 1957 612311110021B
...
I used both textread and textscan, but both have trouble handling the blanks.
Each line (thousands per file) is a data point of 130 characters with some 25 fields in it, e.g.:
formatt = '%7u %2u %3u %12u64 %4u %4u %7d %6d%12u64 %7d %4u %5u %4u %3u %5d %6d %5u %8d%6d %4d %1c %4u %11c';
Cdata = textscan(fid, formatt);
The blanks represent either trailing zeros (for which 'MultipleDelimitersAsOne' works OK) or missing fields, which I should read as zeros. In the latter case every method I tried has failed: any combination of 'Whitespace','0' or 'EmptyValue','0' or 'Delimiter',''.
I would be happy just to convert all blanks to '0', but I cannot find a way to do it short of using an external editor, file by file. Any suggestions?
3 Comments
Jeremy Hughes on 6 Nov 2017
Hi Massimo,
This one is a bit weird: it's almost fixed-width, but when I look at these columns I don't see fixed width. What I see is a number of fixed-width fields followed by space-delimited fields (maybe I'm misinterpreting the format), and line 1 has fewer characters than lines 2 and 3.
I'm also having trouble understanding how the format relates to the data: the format appears to describe more characters than the lines contain. Line 1 is 118 characters long, whereas the format suggests there are at least 130.
textscan tries to consume all delimiters and whitespace that appear outside of a "field". The %u, %d, and %f field types all scan numbers and end when the number has been read completely. (There's a picture in the documentation, under "Algorithms", that explains what that means.) The catch is that %6d will try to use up to 6 characters, but if it finishes reading a number before that, it will stop and move on to the next field.
(E.g. '%6d%2d' reads '1234 5' as 1234 and 5.)
This might be why things appear to be working but give odd results.
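To make the point above concrete, here is a minimal sketch of that example, assuming the format in it is meant as '%6d%2d'. The variable names are only illustrative.

```matlab
% '%6d' may use up to 6 characters, but it stops as soon as the number ends.
% The blank after '1234' is then skipped as whitespace, so the 5 lands in
% the second field rather than being read as part of the first six columns.
C = textscan('1234 5', '%6d%2d');
first  = C{1};   % 1234
second = C{2};   % 5
```

In other words, the width in a numeric specifier is an upper bound, not a fixed column width, which is why blank fields shift the fields that follow them.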
If you need to replace the spaces with zeros, you can do that with some simple code:
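c = fileread(filename);      % read the whole file as one character vector
c = replace(c,' ','0');      % turn every blank into a zero
fid = fopen(filename,'w');   % note: this overwrites the original file
fprintf(fid,'%s',c);         % '%s' writes c as data, not as a format string
fclose(fid);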
Alternatively, you can call textscan with 'Whitespace','','Delimiter','' and use only %##c field specifiers. This gives you the text of each field as a character array, which you then convert to numbers with another function (not ideal).
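The same idea (take each field as raw characters, then convert) can also be sketched without textscan, by slicing fixed column ranges out of the text. The two sample lines and the 3+2+4 column layout below are made up for illustration; they are not the real widths of the files in question.

```matlab
% Two made-up lines with a hypothetical 3+2+4 column layout.
% The middle field of the second line is blank.
txt = sprintf('%s\n', '123 7 456', '789    12');

% Split into lines and drop the empty piece after the final newline.
lines = regexp(txt, '\r?\n', 'split');
lines = lines(~cellfun(@isempty, lines));
chars = char(lines);                 % one row per line

widths = [3 2 4];                    % hypothetical field widths
stops  = cumsum(widths);
starts = stops - widths + 1;

vals = zeros(numel(lines), numel(widths));
for k = 1:numel(widths)
    field = cellstr(chars(:, starts(k):stops(k)));  % raw text of column k
    vals(:, k) = str2double(field);                 % blank text gives NaN
end
vals(isnan(vals)) = 0;               % treat missing fields as zero
```

Because every field is cut at fixed positions, a blank field cannot shift its neighbours. The price is the extra text-to-number conversion step.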
A more advanced option would be to use readtable with matlab.io.text.FixedWidthImportOptions. This takes some setup, but gives you more control over field widths and output datatypes. I tried to set that up for an example, but since the format doesn't line up with the actual text, I didn't know the right field widths to use. It applies the exact width of each field and tries to convert that block to a number, whereas textscan scans the field using up to the number of characters specified.
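As a rough sketch of what that setup might look like, the following uses fixedWidthImportOptions (available from R2017a) on a small made-up file. The 3+2+4 layout and the sample lines are assumptions for illustration only; the real field widths would have to come from the actual file specification.

```matlab
% Write a small made-up fixed-width file (hypothetical 3+2+4 layout).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '123 7 456', '789    12');
fclose(fid);

% Describe the layout: three numeric columns with exact widths.
opts = fixedWidthImportOptions('NumVariables', 3);
opts.VariableWidths = [3 2 4];            % hypothetical widths
opts.DataLines = [1 Inf];                 % data starts on the first line
opts = setvartype(opts, 'double');        % import every column as a number
opts = setvaropts(opts, 'FillValue', 0);  % fill missing (blank) fields with 0

T = readtable(fname, opts);
delete(fname);                            % clean up the temporary file
```

With these settings the blank middle field of the second line should come back as 0 rather than NaN, because each field is cut at its exact width before conversion.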
Massimo Bassan on 7 Nov 2017
Thank you both for the good suggestions. I will try them and let you know...
Sorry for the formatting issue: I guarantee there are 130 characters per line, and I attach now a short sample file.
I had tried character substitution with:
Cdata = textscan(fid, '%130c','endofline','\n');
C2=strrep(Cdata,' ','0');
but it did not work, would not change the blanks into zeros.
Indeed, fileread (+ strrep) does work! Converting the result to a cell array then lets me parse it easily into my 25 fields:
c = fileread(filename);
c = strrep(c,' ','0');
cc = textscan(c,formatt);   % parse the zero-filled text into a cell array
Thanks again! (But I do agree these are rather miserable workarounds, and something better should be provided.)


Answers (0)
