readtable skipping some rows

104 Ansichten (letzte 30 Tage)
Andrea Cecilia
Andrea Cecilia am 7 Jan. 2020
Kommentiert: Cris LaPierre am 8 Jan. 2020
Hello,
I am reading a dataset with
opts = detectImportOptions('ML-04.csv');
opts.MissingRule = 'fill';
opts.ExtraColumnsRule = 'ignore';
data = readtable(strcat('ML-04.csv'), opts);
what I obtain is
>> data(1:30,:)
ans =
30×23 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23
_________ __________ ____________________ _________ ____ ____ ____ ____ ____ ______ _____ _______ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:20'} 14.1 15.6 13.1 13.4 96 1019.9 3.2 {'NE' } 45 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:25'} 14.1 15.6 13.1 13.3 95 1019.9 4.8 {'NE' } 45 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:30'} 14.1 15.6 13.1 13.3 95 1019.9 3.2 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:35'} 14 15.6 13.1 13.2 95 1019.9 3.2 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:40'} 13.9 15.6 13.1 13.1 95 1019.9 3.2 {'ENE'} 67.5 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:45'} 13.8 15.6 13.1 12.8 94 1019.9 3.2 {'ENE'} 67.5 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:50'} 13.7 15.6 13.1 12.7 94 1019.8 1.6 {'ENE'} 67.5 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'02:55'} 13.7 15.6 13.1 12.7 94 1019.7 3.2 {'ENE'} 67.5 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:00'} 13.7 15.6 13.1 12.7 94 1019.6 1.6 {'ENE'} 67.5 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:05'} 13.6 15.6 13.1 12.7 94 1019.7 3.2 {'ENE'} 67.5 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:10'} 13.6 15.6 13.1 12.5 93 1019.6 1.6 {'NE' } 45 4.8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:15'} 13.6 15.6 13.1 12.5 93 1019.7 1.6 {'ENE'} 67.5 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:20'} 13.6 15.6 13.1 12.4 93 1019.8 3.2 {'NE' } 45 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:25'} 13.6 15.6 13.1 12.4 93 1019.7 3.2 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:30'} 13.6 15.6 13.1 12.4 93 1019.7 1.6 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:35'} 13.6 15.6 13.1 12.4 93 1019.7 1.6 {'NE' } 45 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:40'} 13.6 15.6 13.1 12.3 92 1019.6 3.2 {'NE' } 45 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:45'} 13.6 15.6 13.1 12.3 92 1019.7 3.2 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:50'} 13.6 15.6 13.1 12.3 92 1019.7 3.2 {'ENE'} 67.5 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'03:55'} 13.6 15.6 13.1 12.4 93 1019.7 3.2 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:00'} 13.5 15.6 13.1 12.2 92 1019.7 3.2 {'NE' } 45 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:05'} 13.5 15.6 13.1 12.2 92 1019.7 3.2 {'ENE'} 67.5 6.4 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:10'} 13.5 15.6 13.1 12.4 93 1019.6 1.6 {'ENE'} 67.5 4.8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:15'} 13.4 15.6 13.1 12.3 93 1019.6 1.6 {'ENE'} 67.5 4.8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:20'} 13.3 15.6 13.1 12.2 93 1019.6 1.6 {'ENE'} 67.5 4.8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:25'} 13.2 15.6 13.1 12.1 93 1019.6 0 {'NE' } 45 3.2 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:30'} 13.1 15.6 13.1 12 93 1019.8 3.2 {'NE' } 45 8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:35'} 13.1 15.6 13.1 11.9 93 1019.9 1.6 {'NE' } 45 4.8 8 0 0 0 0 0 0 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:40'} 13 15.6 13 11.9 93 1019.9 4.8 {'NE' } 45 8 8 0 0 0 0 0 2 0 65
{'ML-04'} 1.5594e+09 01-Jun-0019 00:00:00 {'04:45'} 13 15.6 13 11.9 93 1020 4.8 {'NE' } 45 11.3 11.3 0 0 0 0 0 5 0 65
the original file (attached) is
ML-04 1559336400 31/05/19 22:00 16.2 20.9 13.3 14.6 90 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559336700 31/05/19 22:05 16.2 20.9 13.3 14.5 90 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559337000 31/05/19 22:10 16.1 20.9 13.3 14.4 90 1020.6 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559337300 31/05/19 22:15 16.0 20.9 13.3 14.5 91 1020.6 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559337600 31/05/19 22:20 15.9 20.9 13.3 14.5 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559337900 31/05/19 22:25 15.8 20.9 13.3 14.4 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559338200 31/05/19 22:30 15.7 20.9 13.3 14.3 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559338500 31/05/19 22:35 15.6 20.9 13.3 14.1 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559338800 31/05/19 22:40 15.6 20.9 13.3 14.1 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559339100 31/05/19 22:45 15.6 20.9 13.3 14.1 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559339400 31/05/19 22:50 15.6 20.9 13.3 14.1 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559339700 31/05/19 22:55 15.6 20.9 13.3 14.1 91 1020.5 0.0 N/A 0.0 19.3 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559340000 31/05/19 23:00 15.6 15.6 15.6 14.1 91 1020.5 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559340300 31/05/19 23:05 15.5 15.6 15.5 14.0 91 1020.6 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559340600 31/05/19 23:10 15.6 15.6 15.5 14.1 91 1020.6 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559340900 31/05/19 23:15 15.4 15.6 15.4 14.1 92 1020.6 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559341200 31/05/19 23:20 15.3 15.6 15.3 14.0 92 1020.5 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559341500 31/05/19 23:25 15.2 15.6 15.2 13.9 92 1020.5 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559341800 31/05/19 23:30 15.2 15.6 15.1 13.9 92 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559342100 31/05/19 23:35 15.2 15.6 15.1 14.1 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559342400 31/05/19 23:40 15.1 15.6 15.1 13.8 92 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559342700 31/05/19 23:45 14.9 15.6 14.8 13.6 92 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559343000 31/05/19 23:50 14.8 15.6 14.8 13.7 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559343300 31/05/19 23:55 14.7 15.6 14.7 13.6 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559343600 01/06/19 00:00 14.7 15.6 14.6 13.5 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559343900 01/06/19 00:05 14.7 15.6 14.6 13.6 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559344200 01/06/19 00:10 14.7 15.6 14.6 13.5 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559344500 01/06/19 00:15 14.8 15.6 14.6 13.7 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559344800 01/06/19 00:20 14.8 15.6 14.6 13.9 94 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559345100 01/06/19 00:25 14.8 15.6 14.6 13.7 93 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559345400 01/06/19 00:30 15.0 15.6 14.6 14.0 94 1020.3 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559345700 01/06/19 00:35 15.1 15.6 14.6 14.0 93 1020.3 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559346000 01/06/19 00:40 15.0 15.6 14.6 13.9 93 1020.2 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559346300 01/06/19 00:45 14.8 15.6 14.6 13.7 93 1020.3 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559346600 01/06/19 00:50 14.7 15.6 14.6 13.6 93 1020.3 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559346900 01/06/19 00:55 14.6 15.6 14.6 13.4 93 1020.3 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559347200 01/06/19 01:00 14.4 15.6 14.4 13.5 94 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559347500 01/06/19 01:05 14.2 15.6 14.2 13.3 94 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559347800 01/06/19 01:10 13.9 15.6 13.9 13.0 94 1020.4 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559348100 01/06/19 01:15 13.8 15.6 13.8 12.8 94 1020.3 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559348400 01/06/19 01:20 13.8 15.6 13.7 12.8 94 1020.2 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559348700 01/06/19 01:25 13.7 15.6 13.7 12.8 94 1020.1 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559349000 01/06/19 01:30 13.7 15.6 13.7 12.8 94 1020.1 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559349300 01/06/19 01:35 13.5 15.6 13.5 12.6 94 1020.1 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559349600 01/06/19 01:40 13.4 15.6 13.4 12.5 94 1020.1 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559349900 01/06/19 01:45 13.2 15.6 13.2 12.2 94 1020.0 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559350200 01/06/19 01:50 13.1 15.6 13.1 12.3 95 1020.0 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559350500 01/06/19 01:55 13.4 15.6 13.1 12.8 96 1020.0 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559350800 01/06/19 02:00 13.8 15.6 13.1 13.2 96 1020.0 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559351100 01/06/19 02:05 13.4 15.6 13.1 12.7 95 1019.9 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559351400 01/06/19 02:10 13.6 15.6 13.1 12.8 95 1019.9 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559351700 01/06/19 02:15 13.9 15.6 13.1 13.3 96 1019.9 0.0 N/A 0.0 0.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559352000 01/06/19 02:20 14.1 15.6 13.1 13.4 96 1019.9 3.2 NE 45 8.0 8.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559352300 01/06/19 02:25 14.1 15.6 13.1 13.3 95 1019.9 4.8 NE 45 8.0 8.0 0 0.0 0.0 0 0 0 0.0 65
ML-04 1559352600 01/06/19 02:30 14.1 15.6 13.1 13.3 95 1019.9 3.2 NE 45 6.4 8.0 0 0.0 0.0 0 0 0 0.0 65
As you can see, readtable skips the first lines, and the table starts from the "01/06/19 02:20" line. It is curious that this line is exactly the first in which the 12th column has a different value from "N/A". I suppose that in some way it skips the rows where in that column there is "N/A" value. But why?
Besides, I don't need that column and I could not import it at all, but I don't know how to skip some specific columns with readtable.
Can you help me?

Antworten (2)

Jeremy Hughes
Jeremy Hughes am 7 Jan. 2020
detectImportOptions detecting Space as the delimiter, (correctly I think) and is looking at the "N/A" and "NE 45" and treating those as differently formatted lines and skipping the earlier ones, as there are fewer variables.
You can avoid this header line detection by passing 'NumHeaderLines',0 into detectImportOptions. However, there's a difference between the number of fields on those lines when using Space as delimiter.
From what I can see, is that "NE 45" is probably meant to be one field, and "N/A" is meant to mean that there's no data for that field on that row. This is unusual since the data is being parsed as two fields when it's there, and one when it's not. That's going to cause problems for any delimited reading method. It's a perfectly human readable file that's going to be a pain to parse automatically.
Typically, this kind of file is actually fixed-width, but that's not the case for your file.
I don't like to recommend modifying data files, but if you insert a " 0 " after each "N/A" in that column, it would work better. If you have any control over how the file is generated, put double-quotes around the "NE 45" type data and things will also smooth out.
The only other option is to read using something like textscan, which can read a single line and then allow you to make decisions based on that line.

Cris LaPierre
Cris LaPierre am 7 Jan. 2020
I'd suggest trying to import your file interactively first to see if you can understand what is happening. You can do this using the Import Data tool. Once you have it working as you want, you can then generate the corresponding code.
I discovered another issue in your file. At row 53 (1/6/2019 and 2:20), you gain a column, at least as far as the import options are concerned. If you inspect the output you are getting, you'll see Var12 and Var13 are perhaps considered a single entry (NE 45, ENE 67). However, it is getting split into 2 columns, causing the bump in columns. I'm not sure what is correct here since I'm unfamiliar with the data.
You'll have to find a way to correct for this.
If possible, I'd suggest trying to create the file using a delimiter other that space or tab. If you had commas, for example, then each column, including empty columns, would be clearly defined and the input would properly handle rows with the cardinal direction w/ accompanying degrees is specified.
In the meantime, a simple fix is to open the file in a text editor and replace each instance of " N/A " with " N/A N/A " (attached).
  2 Kommentare
Jeremy Hughes
Jeremy Hughes am 8 Jan. 2020
Ha! We must have both been typing at the same time.
Cris LaPierre
Cris LaPierre am 8 Jan. 2020
Must have! Glad to see we identified the same issue!

Melden Sie sich an, um zu kommentieren.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by