How to use textscan to read my 2nd column and ignore the string or non numerical values?

29 Ansichten (letzte 30 Tage)
Hello, I have a huge data file and I was wondering if anyone could help me use textscan to only read the 2nd column but also ignore the strings. The file has this sort of format and the data keeps going.
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
energy
1.0000E-01 5.44426E-10 0.4254
1.2475E-01 1.71665E-10 0.8055
1.4950E-01 8.51003E-11 0.8817
1.7426E-01 2.09602E-10 0.6570
1.9901E-01 2.62823E-10 0.4473
2.2376E-01 3.18821E-11 0.7145
2.4851E-01 4.37107E-11 0.6539
2.7327E-01 1.34258E-10 0.6703
2.9802E-01 1.31857E-10 0.6663
3.2277E-01 4.53330E-11 0.9459
3.4752E-01 9.37144E-13 0.9914
3.7228E-01 2.99698E-10 0.9950
3.9703E-01 7.03990E-18 0.9950
4.2178E-01 5.24669E-16 0.9950
4.4653E-01 4.01338E-29 0.9950
4.7129E-01 0.00000E+00 0.0000
4.9604E-01 0.00000E+00 0.0000
5.2079E-01 0.00000E+00 0.0000
5.4554E-01 0.00000E+00 0.0000
5.7030E-01 0.00000E+00 0.0000
5.9505E-01 0.00000E+00 0.0000
6.1980E-01 9.12419E-29 0.9950
6.4455E-01 0.00000E+00 0.0000
6.6931E-01 2.43906E-25 0.9950
6.9406E-01 0.00000E+00 0.0000
7.1881E-01 0.00000E+00 0.0000
7.4356E-01 0.00000E+00 0.0000
7.6832E-01 1.86537E-12 0.9950
7.9307E-01 0.00000E+00 0.0000
8.1782E-01 2.11385E-11 0.9950
8.4257E-01 0.00000E+00 0.0000
8.6733E-01 0.00000E+00 0.0000
8.9208E-01 0.00000E+00 0.0000
9.1683E-01 0.00000E+00 0.0000
9.4158E-01 9.73682E-13 0.9950
9.6634E-01 0.00000E+00 0.0000
9.9109E-01 0.00000E+00 0.0000
1.0158E+00 0.00000E+00 0.0000
1.0406E+00 0.00000E+00 0.0000
1.0653E+00 4.94059E-40 0.9950
1.0901E+00 0.00000E+00 0.0000
1.1149E+00 0.00000E+00 0.0000
1.1396E+00 0.00000E+00 0.0000
1.1644E+00 0.00000E+00 0.0000
1.1891E+00 0.00000E+00 0.0000
1.2139E+00 0.00000E+00 0.0000
1.2386E+00 0.00000E+00 0.0000
1.2634E+00 0.00000E+00 0.0000
1.2881E+00 0.00000E+00 0.0000
1.3129E+00 0.00000E+00 0.0000
1.3376E+00 6.03842E-10 0.9950
1.3624E+00 0.00000E+00 0.0000
1.3871E+00 0.00000E+00 0.0000
1.4119E+00 0.00000E+00 0.0000
1.4366E+00 0.00000E+00 0.0000
1.4614E+00 0.00000E+00 0.0000
1.4861E+00 0.00000E+00 0.0000
1.5109E+00 0.00000E+00 0.0000
1.5356E+00 0.00000E+00 0.0000
1.5604E+00 0.00000E+00 0.0000
1.5851E+00 0.00000E+00 0.0000
1.6099E+00 0.00000E+00 0.0000
1.6347E+00 0.00000E+00 0.0000
1.6594E+00 0.00000E+00 0.0000
1.6842E+00 0.00000E+00 0.0000
1.7089E+00 0.00000E+00 0.0000
1.7337E+00 0.00000E+00 0.0000
1.7584E+00 0.00000E+00 0.0000
1.7832E+00 0.00000E+00 0.0000
1.8079E+00 0.00000E+00 0.0000
1.8327E+00 0.00000E+00 0.0000
1.8574E+00 0.00000E+00 0.0000
1.8822E+00 0.00000E+00 0.0000
1.9069E+00 0.00000E+00 0.0000
1.9317E+00 0.00000E+00 0.0000
1.9564E+00 0.00000E+00 0.0000
1.9812E+00 0.00000E+00 0.0000
2.0059E+00 0.00000E+00 0.0000
2.0307E+00 0.00000E+00 0.0000
2.0554E+00 0.00000E+00 0.0000
2.0802E+00 0.00000E+00 0.0000
2.1050E+00 0.00000E+00 0.0000
2.1297E+00 0.00000E+00 0.0000
2.1545E+00 0.00000E+00 0.0000
2.1792E+00 0.00000E+00 0.0000
2.2040E+00 0.00000E+00 0.0000
2.2287E+00 0.00000E+00 0.0000
2.2535E+00 0.00000E+00 0.0000
2.2782E+00 0.00000E+00 0.0000
2.3030E+00 0.00000E+00 0.0000
2.3277E+00 0.00000E+00 0.0000
2.3525E+00 0.00000E+00 0.0000
2.3772E+00 0.00000E+00 0.0000
2.4020E+00 0.00000E+00 0.0000
2.4267E+00 0.00000E+00 0.0000
2.4515E+00 0.00000E+00 0.0000
2.4762E+00 0.00000E+00 0.0000
2.5010E+00 0.00000E+00 0.0000
2.5257E+00 0.00000E+00 0.0000
2.5505E+00 0.00000E+00 0.0000
2.5752E+00 0.00000E+00 0.0000
2.6000E+00 0.00000E+00 0.0000
total 2.58911E-09 0.3011
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
uncollided photon flux
energy
1.0000E-01 7.06645E-15 0.9950
1.2475E-01 0.00000E+00 0.0000
1.4950E-01 0.00000E+00 0.0000
  4 Kommentare
John Vargas
John Vargas am 22 Aug. 2018
I am sorry, in this file, I had already removed the first two lines of strings which are:
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01 energy

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Star Strider
Star Strider am 22 Aug. 2018
Your file is not easy to import. I’ve been working on this for a while.
Try this:
fidi = fopen('M1output1.txt','rt');
k1 = 1;
while ~feof(fidi)
C = textscan(fidi, '%*f%f%*f', 'HeaderLines',2, 'CollectOutput',true, 'CommentStyle',{' total', ' energy'});
M = cell2mat(C);
if isempty(M)
break
end
D{k1,:} = M;
fseek(fidi, 0, 0);
k1 = k1 + 1
end
fclose(fidi);
Column2 = cell2mat(D);
The textscan format descriptor reads only Column 2, ignoring the other two columns.
The loop is necessary because you have several text lines that interrupt the ordinary file reading process, so every time textscan encounters text where it expects a numeric value, it stops, snd it is necessary to use fseek to re-start it and read the next block of numbers. Your file also does not have a valid ‘end-of-file’ indicator, so to keep the loop from becoming infinite, it is necessary to test to see if the input is empty. If it is, then the loop breaks and file reading stops. Since this does not occur until all the data have been read, no data are lost.
The cell2mat call at the end concatenates the cells in ‘D’ into a single vector.
  3 Kommentare
Star Strider
Star Strider am 22 Aug. 2018
As always, my pleasure.
Yes. The asterisk between the ‘%’ and the type descriptor (here ‘f’) tells textscan to ignore that column. So to read all columns:
'%f%f%f'
to read only the first column and ignore the last two:
'%f%*f%*f'
or in a more readable (and still valid) form:
'%f %*f %*f'
and so for any other combinations you want to import or exclude.
Walter Roberson
Walter Roberson am 22 Aug. 2018
Star Strider used a format of
'%*f%f%*f'
that says to skip the first number, read and record the second, and skip the third number.
So use %*f for any column you want to skip, and %f for any column you want to read in.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (4)

Jeremy Hughes
Jeremy Hughes am 22 Aug. 2018
Bearbeitet: Jeremy Hughes am 22 Aug. 2018
I suggest trying
opts = detectImportOptions(filename)
T = readtable(filename,opts)
Also, if you want to ignore those lines:
opts = detectImportOptions(filename)
opts.ImportErrorRule = 'omitrow'
T = readtable(filename,opts)

Yuvaraj Venkataswamy
Yuvaraj Venkataswamy am 22 Aug. 2018

Walter Roberson
Walter Roberson am 22 Aug. 2018
You can use the CommentStyle option of textscan, specifying {'detector', 'energy'} . This will ignore the x, y, z coordinates on those lines.

jonas
jonas am 22 Aug. 2018
Here's another approach with regexprep and textscan
%%Read and remove annoying intermediate headers
str=fileread('File.txt');
str=regexprep(str,'(total|detector).*?energy','');
%%Read 2nd col
num=textscan(str,'%*f%f%*f');
out=cell2mat(num)

Kategorien

Mehr zu Large Files and Big Data finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by