Good morning,
I can't find a way to write a proper script to import data as a table.
Suppose you have many folders, each containing many csv, xlsx or txt files, and you want to produce a single output file ('pathOut/database.xlsx'). Within one folder, appending one file to the next is easy.
The problem is that I'd like to write a script (actually an app with App Designer) that can load files structured in different ways: for example, some files start with 30 rows of meaningless text that I want to skip, while files in another folder have 3 meaningless rows.
In App Designer I've created numeric Edit Fields that let the user specify the data start row, the column delimiter and the header-line row. However, detectImportOptions does not assign the right variable types; in addition, in some cases, although I tell it that delimiter=',', readtable honours that for the data but not for the header line: the resulting table is mostly char columns, with a mangled header in which the comma (the delimiter) is converted to '_'.
I attach two different examples here (Input_type_1.xlsx was converted to xlsx because this webpage does not support .dat files). I don't need to load both of them in the same run, but I'd like to create an app that can read both while requiring as little input as possible (e.g. data start row, column delimiter and header-line row).
So, how can I configure detectImportOptions to do that? (I have the MATLAB R2018b release.)
Thank you for your suggestions and time!
Luca

8 comments

Ive J
Ive J on 12 Jan 2021
I read your question a couple of times, but I couldn't figure it out clearly. So, you have different delimited files with different numbers of comment rows (junk rows you don't want to read). Can you share the part of your script where you've tried to handle these differences? And what are the similarities between these files?
Also, share chunks of your data so we can see what exactly you are dealing with (instead of converting to XLSX, just compress them).
Hello Ive J,
thank you for your answer. Unfortunately, I don't have my script here with me (it's on another laptop). The idea is:
pathIn='C:\Users\path\Input_folder\Folder_1\';
list=dir(pathIn);
list=list(~ismember({list.name},{'.','..'})); %not sure about this right now: it's to remove '.' and '..' from the listing
delimiter=','; %as input in App Designer
startdatarow=41; %for the Input_type_1 file (attached here), but the user enters it in App Designer
headerrow=38; %for the Input_type_1 file (attached here), but the user enters it in App Designer
for nfile=1:length(list)
fileName=fullfile(pathIn,list(nfile).name); %concatenate pathIn and each file name at each loop
opt=detectImportOptions(fileName);
% can't remember how I wrote that opt has to consider delimiter as 'Delimiter' and the other two input values
DataTemp=readtable(fileName,opt);
if nfile==1
Data=DataTemp;
else
Data=[Data; DataTemp]; %append data
end
clear fileName opt DataTemp
end
When pathIn='C:\Users\path\Input_folder\Folder_2\', the files are totally different. The user can input a different delimiter, startdatarow and headerrow, but the VariableTypes and VariableNames differ too.
In a perfect world (mainly if I were a good developer), detectImportOptions would work this out by itself, but it doesn't. So what should I write to improve the readtable call?
Am I explaining it better?
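In outline, the missing piece is something like this (untested sketch; variable names as in the pseudocode above, and I'm assuming detectImportOptions accepts 'Delimiter' as a name-value pair):

```matlab
% Untested sketch: feed the user inputs straight into the import options
% ('delimiter', 'headerrow', 'startdatarow' come from the app edit fields).
opt = detectImportOptions(fileName, 'FileType','text', 'Delimiter',delimiter);
opt.VariableNamesLine = headerrow;    % line that holds the column names
opt.DataLines = [startdatarow Inf];   % first data line through end of file
DataTemp = readtable(fileName, opt);
```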
dpb
dpb on 12 Jan 2021
Edited: dpb on 12 Jan 2021
The second should be no problem: as shown, it has no header lines and seems regular. There are no variable names in it, however, so those will either have to be supplied or left as the default 'VarN'.
The first is an ill-formed spreadsheet file, however -- it is all text in the first column, not a spreadsheet at all. That is undoubtedly a significant factor in parsing it; you've seriously misled detectImportOptions.
If these two are representative of the population, the thing to do for the first is to read line-by-line to find the header line with its distinctive string. You can then use that to set the VariableNamesLine position in the opt object.
However, the extraneous line between it and the real data is going to be an issue -- if that is not a constant then you also need to continue to scan to find where the real data do begin so you can also set the starting data line in the opt object; I'm not confident it will be able to do that on its own volition.
If you do that, then you should have a reasonable chance of reading the files; the better solution, of course, would be to have the files created originally with some discipline and with the thought of wanting to read them later.
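A rough sketch of that scan, assuming the header line can be recognized by a distinctive substring (here 'DataOrder=', as in the sample file) and that the extraneous lines between the header and the data don't themselves contain the delimiter:

```matlab
% Untested sketch: locate the header line and the first data line by scanning.
fid = fopen(fileName,'r');
nLine = 0; hdrLine = 0; dataLine = 0;
while dataLine == 0
    tline = fgetl(fid);
    if ~ischar(tline), break, end               % end of file
    nLine = nLine + 1;
    if hdrLine == 0 && contains(tline,'DataOrder=')
        hdrLine = nLine;                        % the distinctive header line
    elseif hdrLine > 0 && any(tline==',')
        dataLine = nLine;                       % first delimited line after it
    end
end
fclose(fid);
opt = detectImportOptions(fileName,'Delimiter',',');
opt.VariableNamesLine = hdrLine;                % set positions in the opt object
opt.DataLines = [dataLine Inf];
```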
dpb
dpb on 12 Jan 2021
>> optType1=detectImportOptions('Input_type_1.xlsx')
optType1 =
SpreadsheetImportOptions with properties:
Sheet Properties:
Sheet: ''
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'No_important_text'}
VariableTypes: {'char'}
SelectedVariableNames: {'No_important_text'}
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
PreserveVariableNames: false
Range Properties:
DataRange: 'A2' (Start Cell)
VariableNamesRange: 'A1'
RowNamesRange: ''
VariableUnitsRange: ''
VariableDescriptionsRange: ''
To display a preview of the table, use preview
>>
As shown, the ill-formed spreadsheet file is total gibberish to detectImportOptions by default; with everything being just text in the first column, it can't be expected to know any better.
Hello dpb, thank you for your answer.
The second type of file is not a problem and my code works fine. I attached it in order to illustrate what I meant by "different types of input". My problem consists in reading every kind of data file -- for example, "Input_type_1".
Here is my code:
%Load data insert as input directly by user:
RigaStartDati=app.RigastartdatiEditField.Value; %Data start row: in input
RigaHeader=app.RigaheaderEditField.Value; %header row: in input
SepColChar=app.CarattereEditField.Value; %delimiter: input character if not from app.SeparatorecolonneButtonGroup
lista=app.lista; %list of files in selected folder
pathIn=app.pathIn; %folder path
%Delimiter
switch app.SeparatorecolonneButtonGroup.SelectedObject.Text
case 'Punto e virgola'
delimiter=';';
case 'Virgola'
delimiter=',';
case 'Tab'
delimiter='\t';
case 'Spazio'
delimiter=' ';
otherwise %app.SepColChar
delimiter=SepColChar;
end
if RigaStartDati==0 %if user does not indicate the first data row, the first row is selected
RigaStartDati=1;
end
%Start reading
count=1; %
for i=1:length(lista)
D{count,1}=lista(i).name;
Filename=strcat(pathIn,'/',D{count,1});
filename=char(Filename);
opt=detectImportOptions(filename);
opt.Delimiter=delimiter;
opt.DataLines(1,1)=RigaStartDati;
if RigaHeader~=0
opt.VariableNamesLine=RigaHeader;
end
datatemp=readtable(filename,opt);
if count==1
Data=datatemp;
else Data=[Data;datatemp];
end
clear Filename filename datatemp
count=count+1;
end
fileNameOutput=strcat([pathIn '/Database.xlsx']);
Inserting as initial input (running the app):
RigaStartDati=41;
RigaHeader=38;
When i=1,
opt =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'ISO-8859-1'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'DataOrder_DateTime_yyyy_mm_ddHh', 'mm', 'ss__Status_Stage1_Stage2_Stage3_Stage4_Stage5_Stage6_Stage7_Sta'}
VariableTypes: {'char', 'double', 'char'}
SelectedVariableNames: {'DataOrder_DateTime_yyyy_mm_ddHh', 'mm', 'ss__Status_Stage1_Stage2_Stage3_Stage4_Stage5_Stage6_Stage7_Sta'}
VariableOptions: Show all 3 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLines: [41 Inf]
VariableNamesLine: 38
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
As you can see:
VariableNames: {'DataOrder_DateTime_yyyy_mm_ddHh', 'mm', 'ss__Status_Stage1_Stage2_Stage3_Stage4_Stage5_Stage6_Stage7_Sta'}
VariableTypes: {'char', 'double', 'char'}
Why is it like that? The header cell was like this:
DataOrder=DateTime(yyyy/mm/dd hh:mm:ss),Status,Stage1,Stage2,Stage3,Stage4,Stage5,Stage6,Stage7,Stage8,Stage9,Stage10,Stage11,Stage12,Stage13,Stage14,MISC,ChargerI,ChargerU,TrapU,ImpactorLowPressure,AmbientPressure,AuxChannel1,AuxChannel2,AuxChannel3,AuxChannel4,AuxChannel5,AuxChannel6,T1,T2,T3,T4,Reserved,CAL,Stage1,Stage2,Stage3,Stage4,Stage5,Stage6,Stage7,Stage8,Stage9,Stage10,Stage11,Stage12,Stage13,Stage14,CON,Concentration value,COM,UserComment,PM10,PM2.5,PM1,PN10,PN2.5,PN1
What should I change to allow a correct reading?
Thank you for your time!
dpb
dpb on 12 Jan 2021
Edited: dpb on 13 Jan 2021
"What should I change to allow a correct reading?"
The file, maybe?
The biggest problem is one you are creating for yourself by building a spreadsheet file with all the data in the first column as text, not in separate columns as any respectable spreadsheet would be.
It's unreasonable to expect that file to be parsed as is...if you can't cause the files to be written correctly in the first place, then I would suggest writing a macro to do the column split inside Excel first and resave the file, then import that into MATLAB.
I'm no VBA/COM guru, it's probable you could create that code to do that as COM that you could call from MATLAB.
Luca D'Angelo
Luca D'Angelo on 12 Jan 2021
Well, that file is the output of an instrument. I have several thousand of those files, and that's why I'm trying to write an app: so that I don't waste months splitting columns. Also, I learned to use MATLAB, not VBA/COM.
I wrote a specific script for that kind of file that works, but I wanted to write an app able to load every kind of output file that I obtain from the instruments I normally use.
I could convert that script into a function and call it from my app. That could be a solution, even if a little inelegant.
I'll keep trying and maybe I'll do that.
Thank you.
dpb
dpb on 13 Jan 2021
Did the instrument name that as a .xlsx on creation, or was that done later?
It doesn't make sense to have done it this way; who's the manufacturer of the instrument, and what is the instrument itself?
I've used a lot of test gear over 40 years and never saw this symptom from any of them...


Accepted Answer

dpb
dpb on 13 Jan 2021
Edited: dpb on 13 Jan 2021

0 votes

"When the instrument creates its output, this is in .dat."
AHA! So you created the spreadsheet problem yourself, not the instrument manufacturer. They did write a CSV file (albeit with non-quoted, space-containing string variables). If you don't lie to detectImportOptions, it seems to work just fine...
The one thing I've discovered in the past is that it (detectImportOptions) needs the clue about the delimiter on input, rather than your trying to only patch the default import options object afterwards. That's owing to the non-quoted strings in the file -- so there's still an issue there that the vendor could have done better -- but the file is not completely broken the way your one-column spreadsheet original is.
optType1DAT=detectImportOptions('Input_type_1-1.dat','delimiter',',');
tType1_DAT=readtable('Input_type_1-1.dat',optType1DAT,'ReadVariableNames',1);
This returned
tType1_DAT =
17×58 table
DataOrder_DateTime_yyyy_mm_ddHh_mm_ss_ Status Stage1 Stage2 Stage3 Stage4 Stage5 Stage6 Stage7 Stage8 Stage9 Stage10 Stage11 Stage12 Stage13 Stage14 MISC ChargerI ChargerU TrapU ImpactorLowPressure AmbientPressure AuxChannel1 AuxChannel2 AuxChannel3 AuxChannel4 AuxChannel5 AuxChannel6 T1 T2 T3 T4 Reserved CAL Stage1_1 Stage2_1 Stage3_1 Stage4_1 Stage5_1 Stage6_1 Stage7_1 Stage8_1 Stage9_1 Stage10_1 Stage11_1 Stage12_1 Stage13_1 Stage14_1 CON ConcentrationValue COM UserComment PM10 PM2_5 PM1 PN10 PN2_5 PN1
______________________________________ _________________ ______ ______ ______ ______ ______ ______ ______ ______ ______ _______ _______ _______ _______ _______ ________ ________ ________ _____ ___________________ _______________ ___________ ___________ ___________ ___________ ___________ ___________ ______ ______ _____ _____ ________ _______ ________ ________ ________ ________ ________ ________ ________ ________ ________ _________ _________ _________ _________ _________ _______ __________________ _______ ___________ _______ _______ _______ _____ _____ _____
{'2019/03/10 00:00:43.1'} {'CbTlsfpAvzij0'} 44.39 78.19 75.01 76.31 96.02 157 117.6 60.55 19.55 14.59 12.38 16.15 8.71 18.38 {'MISC'} 0.9955 3997 19.99 40.77 1013 0.01602 0.01556 0.01564 0.01556 0.01556 0.01579 30.58 170.2 22.57 34.34 0 {'CAL'} 42550 22090 8786 4306 3160 2768 1358 315 26.64 4.705 2.177 4.434 0.1254 1.105 {'CON'} 29010 {'COM'} {0×0 char} 0.07477 0.01382 0.01214 29010 29010 29010
{'2019/03/10 00:01:43.1'} {'CbTlsfpAvzij0'} 46.4 87.02 81.06 79.99 99.32 162.5 120.8 62.23 20.2 15.06 12.86 16.96 8.786 18.99 {'MISC'} 0.9954 3821 19.99 40.84 1012 0.0161 0.01556 0.01564 0.01556 0.01549 0.01564 -39.32 -12.55 22.56 34.34 0 {'CAL'} 44470 24660 9494 4498 3255 2857 1389 320.6 26.34 4.384 2.053 4.61 0.04668 1.109 {'CON'} 30820 {'COM'} {0×0 char} 0.07461 0.01398 0.0124 30820 30820 30820
{'2019/03/10 00:02:43.1'} {'CbTlsfpAvzij0'} 43.44 79.87 77.16 77.11 96.31 157.9 118.2 60.78 19.91 14.62 12.33 16.71 8.317 18.69 {'MISC'} 0.9956 3907 19.99 40.67 1012 0.01602 0.01556 0.01579 0.01564 0.01564 0.01556 427 -261.7 22.54 33.93 0 {'CAL'} 41640 22610 9069 4356 3169 2786 1366 316.5 28.11 4.736 2.125 4.792 0.1119 1.152 {'CON'} 28870 {'COM'} {0×0 char} 0.07704 0.01391 0.01227 28870 28870 28870
{'2019/03/10 00:03:43.1'} {'CbTlsfpAvzij0'} 40.22 71.18 72.52 73.55 91.59 148.5 112.2 58.56 18.44 13.91 11.56 15.96 7.723 17.17 {'MISC'} 0.9954 3940 19.99 40.79 1011 0.01602 0.01564 0.01572 0.01564 0.01549 0.01572 -14.4 18.25 22.54 34.34 0 {'CAL'} 38550 20110 8561 4177 3025 2622 1302 308.6 25.86 4.973 2.101 4.736 0.1009 1.051 {'CON'} 26610 {'COM'} {0×0 char} 0.07071 0.01334 0.01171 26610 26610 26610
{'2019/03/10 00:04:43.1'} {'CbTlsfpAvzij0'} 37.01 60.24 65.23 68.88 85.76 137.8 105.8 56.72 17.11 13.35 10.73 15.07 7.242 16.31 {'MISC'} 0.9956 3929 19.99 40.71 1010 0.01595 0.01564 0.01579 0.01556 0.01556 0.01572 -213.3 208.3 22.52 34.34 0 {'CAL'} 35480 16950 7718 3937 2846 2439 1236 304.6 24.7 5.663 2.161 4.652 0.1681 1.049 {'CON'} 24060 {'COM'} {0×0 char} 0.07086 0.01286 0.0112 24050 24050 24050
{'2019/03/10 00:05:43.1'} {'CbTlsfpAvzij0'} 39.07 66.93 69.13 71.23 87.19 139.7 105.9 58.12 17.44 14.01 11.34 15.86 7.529 17.62 {'MISC'} 0.9957 3867 19.99 40.9 1011 0.01602 0.01572 0.01572 0.01549 0.01556 0.01564 -433.9 342.5 22.52 34.34 0 {'CAL'} 37440 18880 8159 4051 2877 2464 1226 309.9 23.85 5.902 2.296 4.878 0.1038 1.155 {'CON'} 25580 {'COM'} {0×0 char} 0.07627 0.013 0.01123 25580 25580 25580
{'2019/03/10 00:06:43.1'} {'CbTlsfpAvzij0'} 41.39 69.27 71.15 70.84 83.48 134.6 103.6 54.26 19.96 13.12 11.44 15.05 7.471 17.05 {'MISC'} 0.9957 3838 19.99 40.97 1012 0.01602 0.01549 0.01572 0.01564 0.01556 0.01556 -254.7 180.4 22.52 34.34 0 {'CAL'} 39670 19520 8380 4001 2725 2356 1191 281.2 33.06 4.008 2.156 4.231 0.08179 1.039 {'CON'} 26660 {'COM'} {0×0 char} 0.07019 0.01272 0.01107 26660 26660 26660
{'2019/03/10 00:07:43.1'} {'CbTlsfpAvzij0'} 39.5 64.87 68.21 69.03 81.26 128.6 99.68 54.06 18.43 13.09 11.07 14.66 7.613 16.4 {'MISC'} 0.9956 3941 19.99 40.7 1011 0.01602 0.01572 0.01572 0.01572 0.01556 0.01572 -78.19 -352.2 22.51 33.93 0 {'CAL'} 37860 18260 8048 3913 2660 2252 1147 284.1 28.94 4.809 2.209 4.215 0.1711 1.009 {'CON'} 25410 {'COM'} {0×0 char} 0.06858 0.01241 0.0107 25410 25410 25410
{'2019/03/10 00:08:43.1'} {'CbTlsfpAvzij0'} 40.91 66.71 68.9 69.02 80.86 126.8 98.06 54.47 18.36 13.48 11.35 15.17 7.67 17.16 {'MISC'} 0.9954 3990 19.99 41 1011 0.01595 0.01556 0.01572 0.01549 0.01556 0.01556 189.9 -292.1 22.51 34.34 0 {'CAL'} 39210 18770 8108 3897 2636 2212 1122 285.5 27.88 5.112 2.324 4.423 0.09481 1.08 {'CON'} 26120 {'COM'} {0×0 char} 0.07162 0.01234 0.01055 26120 26110 26110
{'2019/03/10 00:09:43.1'} {'CbTlsfpAvzij0'} 40.82 66.16 68.03 68.24 79.47 123.3 95.87 53.86 18.04 13.45 11.1 14.93 7.63 17.48 {'MISC'} 0.9955 3996 19.99 40.77 1011 0.0161 0.01579 0.01579 0.01564 0.01564 0.01572 -428.6 -293 22.5 34.34 0 {'CAL'} 39120 18610 7998 3851 2587 2146 1094 282.3 27.06 5.253 2.167 4.316 0.1383 1.13 {'CON'} 25980 {'COM'} {0×0 char} 0.07459 0.01199 0.01032 25980 25980 25980
{'2019/03/10 00:10:43.1'} {'CbTlsfpAvzij0'} 40 57.94 61.19 65.07 76.05 117.6 92.68 53.06 17.28 13.14 10.72 14.47 7.244 16.4 {'MISC'} 0.9954 3947 19.99 40.88 1012 0.01602 0.01564 0.01579 0.01572 0.01564 0.01572 -254.4 -261.5 22.49 34.34 0 {'CAL'} 38340 16180 7166 3684 2483 2050 1063 281.8 26.33 5.635 2.253 4.283 0.1049 1.045 {'CON'} 24670 {'COM'} {0×0 char} 0.0692 0.01181 0.01007 24670 24670 24670
{'2019/03/10 00:11:43.1'} {'CbTlsfpAvzij0'} 45.92 69.13 66.5 68.72 79.25 122.3 95.41 55.17 18.82 14.36 11.98 15.9 8.304 18.2 {'MISC'} 0.9954 3868 19.99 40.99 1012 0.01602 0.01564 0.01579 0.01549 0.01556 0.01556 -400.8 -398.6 22.47 34.34 0 {'CAL'} 44020 19360 7712 3845 2557 2113 1078 286.7 27.61 5.783 2.494 4.574 0.108 1.132 {'CON'} 28140 {'COM'} {0×0 char} 0.07431 0.01224 0.01032 28140 28140 28140
{'2019/03/10 00:12:43.1'} {'CbTlsfpAvzij0'} 51.3 78.16 70.87 71.26 80.85 124.2 96.79 55.37 20.16 14.67 12.53 16.59 8.705 18.7 {'MISC'} 0.9955 3848 19.99 40.89 1011 0.01602 0.01564 0.01572 0.01564 0.01564 0.01564 442.9 279.8 22.47 34.34 0 {'CAL'} 49170 21900 8156 3943 2578 2126 1078 280.1 28.96 4.744 2.175 4.501 0.08361 1.075 {'CON'} 31170 {'COM'} {0×0 char} 0.07079 0.01201 0.01033 31170 31170 31170
{'2019/03/10 00:13:43.1'} {'CbTlsfpAvzij0'} 45.04 66.49 64.36 68.23 78.32 120 94.31 54.52 18.61 13.92 11.52 15.44 8.108 17.54 {'MISC'} 0.9955 3933 19.99 40.74 1012 0.01602 0.01579 0.01579 0.01572 0.01556 0.01572 101.2 65.98 22.46 33.93 0 {'CAL'} 43170 18590 7459 3829 2532 2074 1068 284.3 27.8 5.451 2.228 4.409 0.1309 1.075 {'CON'} 27480 {'COM'} {0×0 char} 0.07106 0.01195 0.01023 27480 27480 27480
{'2019/03/10 00:14:43.1'} {'CbTlsfpAvzij0'} 46.31 68.89 65.76 69.64 79.68 122.2 95.59 55.48 18.76 14.11 11.7 15.6 7.816 17.66 {'MISC'} 0.9955 3932 19.99 40.33 1011 0.0161 0.01564 0.01564 0.01549 0.01564 0.01556 -324.1 49.78 22.45 33.93 0 {'CAL'} 44390 19270 7611 3903 2572 2109 1080 288.6 27.19 5.299 2.163 4.369 0.07646 1.058 {'CON'} 28270 {'COM'} {0×0 char} 0.06982 0.01202 0.01033 28270 28270 28270
{'2019/03/10 00:15:43.1'} {'CbTlsfpAvzij0'} 47.62 71.44 67.48 70.54 80.14 123 95.99 55.35 19.31 14.27 11.85 16.11 8.374 17.77 {'MISC'} 0.9955 3887 19.99 40.75 1011 0.0161 0.01564 0.01556 0.01556 0.01564 0.01572 -306 192.3 22.44 33.93 0 {'CAL'} 45640 20000 7803 3944 2578 2118 1080 285.4 28.34 5.152 2.077 4.553 0.08317 1.04 {'CON'} 29070 {'COM'} {0×0 char} 0.06907 0.01195 0.01036 29070 29070 29070
{'2019/03/10 00:16:43.1'} {'CbTlsfpAvzij0'} 47.44 78.03 72.56 72.99 83 125.3 96.54 57.36 18.18 14.7 12.12 16.24 8.249 17.5 {'MISC'} 0.9955 3937 19.99 40.47 1011 0.01617 0.01572 0.01564 0.01572 0.01564 0.01556 -216.1 -342.7 22.44 34.34 0 {'CAL'} 45470 21960 8427 4077 2669 2155 1080 296.5 22.45 5.36 2.074 4.494 0.06754 0.9752 {'CON'} 29760 {'COM'} {0×0 char} 0.06514 0.01189 0.01029 29760 29760 29760
>>
Go ahead and finish giving it the help it needs by setting the variable types and the date import format in the import options object to match.
I'd rename the first column's variable there as well, and probably turn the Status variable into categorical.
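For instance (untested sketch; the first variable name and the timestamp format here are guesses from the sample shown above):

```matlab
% Untested sketch: patch the options object returned above, then re-read.
opt = optType1DAT;
opt.VariableNames{1} = 'DateTime';              % replace the mangled first name
opt = setvartype(opt, 'DateTime', 'datetime');
opt = setvaropts(opt, 'DateTime', ...
      'InputFormat','yyyy/MM/dd HH:mm:ss.S');   % matches 2019/03/10 00:00:43.1
opt = setvartype(opt, 'Status', 'categorical');
tType1_DAT = readtable('Input_type_1-1.dat', opt);
```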

3 comments

dpb
dpb on 13 Jan 2021
Edited: dpb on 13 Jan 2021
It's certainly possible that the comment text in a real file may have numeric data included in it, or otherwise be formatted such that detectImportOptions can't tell for sure where the real data start. That could explain why some files work and other apparently similar ones don't.
If that turns out to be the case, almost certainly the reliable solution will be the one shown earlier of parsing the file to find the header line explicitly.
It would be a Good Thing™ if the vendor had preceded the comment lines with a comment indicator -- don't suppose you've left that detail out of the sample file, too, by any chance, have you?
Luca D'Angelo
Luca D'Angelo on 16 Jan 2021
I still can't understand why it works, since it seems more or less the same as what I tried at the beginning, but... thank you so much!!! :)
dpb
dpb on 16 Jan 2021
Edited: dpb on 16 Jan 2021
"...it seems to be more or less the same..."
It's NOT the same as in any that I saw you post, no. "More or less" doesn't cut it in programming; details are not only important, they can be critical.
Here the critical difference is what I outlined above -- I called detectImportOptions with the 'delimiter',',' optional named parameter defined on input to it rather than relying on the default delimiter detection routine internal to the function and then patching the default returned import object.
As noted, from empirical evidence this is a particularly important additional aid to the function in parsing the file when a CSV file contains non-quoted strings with embedded spaces; it is nearly impossible for a generic routine to identify that condition on its own.
As the last comment notes, it's a pretty good bet you'll still have random occasions where it fails depending on just what the extraneous lines can contain.


More Answers (2)

dpb
dpb on 13 Jan 2021

0 votes

Save the file as a .csv instead of .xlsx. You may have to coerce Excel not to double-quote the strings; I presume there's a way, but I use Excel as little as possible, so that's an "exercise for the student".
Following that, it's pretty straightforward...
optType1CSV=detectImportOptions('Input_type_1.csv','Delimiter',',');
optType1CSV.VariableNamesLine=optType1CSV.DataLines(1)-3; % if this is variable, takes more work
tType1=readtable('Input_type_1.csv',optType1CSV,'ReadVariableNames',1);
tType1.Date=datetime(tType1.Date,'InputFormat','yyyy/MM/dd hh:mm:ss.S');
The result of this for your sample file is:
tType1 =
8×58 table
Date Status Stage1 Stage2 Stage3 Stage4 Stage5 Stage6 Stage7 Stage8 Stage9 Stage10 Stage11 Stage12 Stage13 Stage14 MISC ChargerI ChargerU TrapU ImpactorLowPressure AmbientPressure AuxChannel1 AuxChannel2 AuxChannel3 AuxChannel4 AuxChannel5 AuxChannel6 T1 T2 T3 T4 Reserved CAL Stage1_1 Stage2_1 Stage3_1 Stage4_1 Stage5_1 Stage6_1 Stage7_1 Stage8_1 Stage9_1 Stage10_1 Stage11_1 Stage12_1 Stage13_1 Stage14_1 CON ConcentrationValue COM UserComment PM10 PM2_5 PM1 PN10 PN2_5 PN1
____________________ _________________ ______ ______ ______ ______ ______ ______ ______ ______ ______ _______ _______ ________ _______ _______ ________ ________ ________ _____ ___________________ _______________ ___________ ___________ ___________ ___________ ___________ ___________ ______ ______ _____ _____ ________ _______ _________ ________ ________ ________ ________ ________ ________ ________ ________ _________ _________ _________ _________ _________ _______ __________________ _______ ___________ _______ _______ _______ _____ _____ _____
26-Feb-2019 00:30:21 {'CbTlsfpAvzij0'} 106.2 72.35 121.2 155.9 259.9 379.2 340.7 231.8 68.89 24.7 10.26 7.802 19.51 12.76 {'MISC'} 0.9982 3437 19.99 39.96 1021 0.01602 0.01564 0.01564 0.01556 0.01556 0.01564 101.5 -50.8 24.83 29.85 0 {'CAL'} 1.019e+05 18730 14200 8973 8909 6805 4143 1391 163.6 1.231 0 0.0107 0.8538 0.02063 {'CON'} 58360 {'COM'} {0×0 char} 0.05119 0.04368 0.04368 58360 58360 58360
26-Feb-2019 00:31:21 {'CbTlsfpAvzij0'} 100.7 66.72 118.6 152.5 252.8 372.3 329.9 226.5 69.89 24.14 12.06 4.145 20.63 8.241 {'MISC'} 0.9981 3551 19.99 40.05 1020 0.01602 0.01564 0.01572 0.01549 0.01556 0.01572 -423.7 -28.2 24.8 30.67 0 {'CAL'} 96650 17200 13960 8805 8681 6693 4015 1362 174.8 6.267 0.1556 0.6002 1.995 0.1764 {'CON'} 55490 {'COM'} {0×0 char} 0.06779 0.04344 0.04332 55490 55490 55490
26-Feb-2019 00:32:21 {'CbTlsfpAvzij0'} 106.9 78.78 123.5 154.4 249.9 365.3 327.2 223.4 70.82 24.62 12.62 7.961 20.35 8.58 {'MISC'} 0.9982 3507 19.99 40.47 1020 0.01595 0.01556 0.01579 0.01556 0.01549 0.01556 189.9 -244.5 24.79 31.08 0 {'CAL'} 1.025e+05 20630 14440 8850 8525 6532 3961 1333 171.8 2.635 0 0.006366 0.8741 0 {'CON'} 59010 {'COM'} {0×0 char} 0.04922 0.04255 0.04255 59010 59010 59010
26-Feb-2019 00:33:21 {'CbTlsfpAvzij0'} 107.5 79.25 123.9 152.4 247.5 359 320.1 221.6 67.33 26 10.5 4.261 22.98 20.01 {'MISC'} 0.9975 3547 19.99 40.27 1020 0.01579 0.01556 0.01579 0.01549 0.01556 0.01564 -177.6 -111.7 24.79 30.67 0 {'CAL'} 1.032e+05 20750 14490 8712 8436 6410 3865 1322 158.1 3.178 0 0 1.659 0.3021 {'CON'} 59220 {'COM'} {0×0 char} 0.0704 0.0414 0.0414 59220 59220 59220
26-Feb-2019 00:34:21 {'CbTlsfpAvzij0'} 96.54 59.88 111.2 144.4 246.7 362 323.7 223.2 63.95 23.99 7.791 -0.06385 23.19 10.1 {'MISC'} 0.9967 3551 19.99 40.19 1020 0.01595 0.01564 0.01572 0.01564 0.01564 0.01564 -420.4 -402.7 24.79 31.49 0 {'CAL'} 92620 15270 13080 8344 8493 6520 3952 1349 152.6 2.434 0 0 2.366 0 {'CON'} 52840 {'COM'} {0×0 char} 0.05984 0.04178 0.04178 52840 52840 52840
26-Feb-2019 00:35:21 {'CbTlsfpAvzij0'} 99.33 63.98 112.1 144.5 251.1 369 328.9 227.8 61.62 25.13 6.76 -3.935 27.56 3.491 {'MISC'} 0.9967 3464 19.99 39.86 1020 0.01595 0.01572 0.01579 0.01564 0.01572 0.01564 -405.6 -22.27 24.81 31.89 0 {'CAL'} 95300 16420 13130 8316 8639 6643 4012 1376 140.8 3.313 0 0 3.761 0 {'CON'} 54390 {'COM'} {0×0 char} 0.07067 0.04189 0.04189 54390 54390 54390
26-Feb-2019 00:36:21 {'CbTlsfpAvzij0'} 92.43 58.33 104.8 135.6 238.9 351.7 312 220.1 58.7 24.7 5.016 -6.357 28.83 -0.1046 {'MISC'} 0.9969 3450 19.99 40.11 1020 0.0161 0.01564 0.01572 0.01549 0.01556 0.01564 10.44 -369.9 24.83 32.3 0 {'CAL'} 88670 14920 12300 7818 8235 6341 3811 1336 135.5 4.209 0 0 4.619 0 {'CON'} 50630 {'COM'} {0×0 char} 0.07547 0.04022 0.04022 50630 50630 50630
26-Feb-2019 00:37:21 {'CbTlsfpAvzij0'} 93.91 53.83 104 134.8 235.2 343.8 306 216.3 57.03 24.13 4.745 -7.297 28.45 -0.4211 {'MISC'} 0.9969 3627 19.99 40.47 1019 0.01595 0.01556 0.01572 0.01556 0.01549 0.01549 -312.5 -352.6 24.85 32.3 0 {'CAL'} 90080 13540 12200 7768 8099 6191 3733 1310 129.5 3.907 0 0 4.487 0 {'CON'} 50720 {'COM'} {0×0 char} 0.07352 0.03927 0.03927 50720 50720 50720
>>
Moral: quit trying to pound a square peg into a round hole; fix the underlying problem of the badly constructed file.

1 comment

Since this website didn't permit uploading a .dat file, in my second comment I attached a zip file containing both input types. When the instrument creates its output, it is a .dat; in my zip file it was converted to .txt.
I've tried to attach it again from my Mac. There doesn't seem to be a big difference.
The header-line row (Data=DataOrder=DateTime(yyyy/mm/dd hh:mm:ss),Status,Stage1,Stage2,Stage3,Stage4,Stage5,Stage6,...,Concentration value,...) has the problem that there is a space after "yyyy/mm/dd". Since I forced comma as the delimiter, I thought that wouldn't be a problem. I could try to solve this by setting two delimiter characters, i.e. space and comma (as done here: https://it.mathworks.com/matlabcentral/answers/466503-multiple-delimiters-for-readtable), and then merging the first and second columns into one datetime.
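If the two-delimiter read works, the merge could look something like this (untested sketch; it assumes the split leaves the date as text in column 1 and the time as a duration in column 2, as the opt output further down suggests):

```matlab
% Untested sketch: rejoin the split date/time after reading with {' ',','}.
Data.DateTime = datetime(Data{:,1},'InputFormat','yyyy/MM/dd') + Data{:,2};
Data(:,1:2) = [];                                % drop the two partial columns
Data = movevars(Data,'DateTime','Before',1);     % put the timestamp first
```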
pathIn='/path/Input_type_1/';
lista=dir(pathIn);
lista=lista(~ismember({lista.name},{'.','..','.DS_Store'})); %remove unreadable entries
NumFile=length(lista);
delimiter=',';
RigaStartDati=41; %as custom input
count=1; %counter for assigning rows and columns
i=1;
D{count,1}=lista(i).name;
Filename=strcat(pathIn,'/','Input_type_1-1.csv');
filename=char(Filename);
opt=detectImportOptions(filename,'Delimiter',{',',' '});
opt.DataLines(1,1)=RigaStartDati;
RigaHeader=38; %as custom input
opt
opt =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {' ' ','}
Whitespace: '\b\t'
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'ISO-8859-1'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'DataOrder_DateTime_yyyy_mm_dd', 'hh_mm_ss_', 'Status' ... and 57 more}
VariableTypes: {'char', 'duration', 'char' ... and 57 more}
SelectedVariableNames: {'DataOrder_DateTime_yyyy_mm_dd', 'hh_mm_ss_', 'Status' ... and 57 more}
VariableOptions: Show all 60 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLines: [41 Inf]
VariableNamesLine: 38
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
To display a preview of the table, use preview
This works, but since one of the headers is "Concentration value", the space splits it and then shifts the data from column 51 onward.
In the end, the problem was not the file format: even though I set comma as the delimiter, the space in the header line causes problems.
If you have another idea, I will thank you even more than I already have ;)


dpb
dpb on 13 Jan 2021

0 votes

c=readcell('Input_type_1.xlsx'); % read as cellstr array
c=c(~cellfun(@(c)all(ismissing(c)),c,'UniformOutput',1)); % get rid of missing lines
c=c(find(contains(c,'DataOrder=')):end) % and everything before the data header line
c=c(contains(c,',')); % and anything else not comma delimited
writecell(c,'tempType1.csv',"QuoteStrings",0) % write to a temporary file
tType1=readtable('tempType1.csv','delimiter',',','ReadVariableNames',1)
This is about 99% there with only minor cleanup left...
>> readtable('tempType1.csv','delimiter',',','ReadVariableNames',1)
ans =
8×58 table
DataOrder_DateTime_yyyy_mm_ddHh_mm_ss_ Status Stage1 Stage2 Stage3 Stage4 Stage5 Stage6 Stage7 Stage8 Stage9 Stage10 Stage11 Stage12 Stage13 Stage14 MISC ChargerI ChargerU TrapU ImpactorLowPressure AmbientPressure AuxChannel1 AuxChannel2 AuxChannel3 AuxChannel4 AuxChannel5 AuxChannel6 T1 T2 T3 T4 Reserved CAL Stage1_1 Stage2_1 Stage3_1 Stage4_1 Stage5_1 Stage6_1 Stage7_1 Stage8_1 Stage9_1 Stage10_1 Stage11_1 Stage12_1 Stage13_1 Stage14_1 CON ConcentrationValue COM UserComment PM10 PM2_5 PM1 PN10 PN2_5 PN1
______________________________________ _________________ ______ ______ ______ ______ ______ ______ ______ ______ ______ _______ _______ ________ _______ _______ ________ ________ ________ _____ ___________________ _______________ ___________ ___________ ___________ ___________ ___________ ___________ ______ ______ _____ _____ ________ _______ _________ ________ ________ ________ ________ ________ ________ ________ ________ _________ _________ _________ _________ _________ _______ __________________ _______ ___________ _______ _______ _______ _____ _____ _____
{'2019/02/26 12:30:21.8'} {'CbTlsfpAvzij0'} 106.2 72.35 121.2 155.9 259.9 379.2 340.7 231.8 68.89 24.7 10.26 7.802 19.51 12.76 {'MISC'} 0.9982 3437 19.99 39.96 1021 0.01602 0.01564 0.01564 0.01556 0.01556 0.01564 101.5 -50.8 24.83 29.85 0 {'CAL'} 1.019e+05 18730 14200 8973 8909 6805 4143 1391 163.6 1.231 0 0.0107 0.8538 0.02063 {'CON'} 58360 {'COM'} NaN 0.05119 0.04368 0.04368 58360 58360 58360
{'2019/02/26 12:31:21.8'} {'CbTlsfpAvzij0'} 100.7 66.72 118.6 152.5 252.8 372.3 329.9 226.5 69.89 24.14 12.06 4.145 20.63 8.241 {'MISC'} 0.9981 3551 19.99 40.05 1020 0.01602 0.01564 0.01572 0.01549 0.01556 0.01572 -423.7 -28.2 24.8 30.67 0 {'CAL'} 96650 17200 13960 8805 8681 6693 4015 1362 174.8 6.267 0.1556 0.6002 1.995 0.1764 {'CON'} 55490 {'COM'} NaN 0.06779 0.04344 0.04332 55490 55490 55490
{'2019/02/26 12:32:21.8'} {'CbTlsfpAvzij0'} 106.9 78.78 123.5 154.4 249.9 365.3 327.2 223.4 70.82 24.62 12.62 7.961 20.35 8.58 {'MISC'} 0.9982 3507 19.99 40.47 1020 0.01595 0.01556 0.01579 0.01556 0.01549 0.01556 189.9 -244.5 24.79 31.08 0 {'CAL'} 1.025e+05 20630 14440 8850 8525 6532 3961 1333 171.8 2.635 0 0.006366 0.8741 0 {'CON'} 59010 {'COM'} NaN 0.04922 0.04255 0.04255 59010 59010 59010
...
The remaining cleanup is to convert the first column to datetime and set its variable name.
More elegant would be to then use detectImportOptions on the cleaned-up file; you can then set the time import format, etc., in the resulting options object.
NB: if this format is consistent, you don't need to call detectImportOptions every time; create a library of import options objects, one for each file type, and save them as .mat files to reload and use where needed.
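For example (file and variable names illustrative):

```matlab
% Untested sketch of the options library. Build once per instrument/file type:
opt = detectImportOptions('Input_type_1-1.dat','Delimiter',',');
save('optsType1.mat','opt');
% ...then in the app, reload instead of re-detecting every time:
S = load('optsType1.mat');
tType1 = readtable(fileName, S.opt);
```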
Any of the three ways gets around the problem your instrument vendor has created... you can't use their files in Excel as is, either. Something seems wrong in the setup; it doesn't make sense that they would have done this deliberately.

Version: R2018b
Asked: 12 Jan 2021
Edited: dpb on 16 Jan 2021
