Find and extract column values from a text file

Hi all,
I have a txt file and a part of it, it's like this below:
==============================================================================================================
Allestimento: Nr. 1 1.5 GSE T4 DOHC DDCT MHEV (P2 48V) LONGITUDE FWD EMEA Base 215/65 R16
Accessori :
==============================================================================================================
========================== = PESI = ============================ = ASSETTI = ===========================
condizione di carico asse asse asse K scuotim fle ant freqPro W scuotim fle pos freqPro beccheg
ant[kg] pos[kg] tot[kg] [mm] ant[mm] [mm/100kg] ant[Hz] [mm] pos[mm] [mm/100kg] pos[Hz] [gradi]
---------------------- ------- ------- ------- ------- ------- ---------- ------- ------- ------- ---------- ------- --------
Standard 0 887 528 1415 -14.7 8 36.4 1.3 -21.6 -13.8 44.3 1.58 9'
Standard A 894 555 1449 -13.6 9.1 36.2 1.3 -15.6 -7.8 44.2 1.54 2'
Assetto di Disegno 844 590 1434 -22.7 0 37.2 1.33 -7.8 0 43.9 1.49 0°-19'
Teorico di Progetto 973 685 1658 0.7 23.4 35.3 1.26 12.8 20.6 43.1 1.38 0°-16'
1 Persone + 0kg 927 591 1518 -7.5 15.2 35.9 1.28 -7.6 0.2 43.9 1.49 0° 0'
2 Persone + 0kg 960 628 1588 -1.6 21.1 35.3 1.27 0.4 8.2 43.6 1.44 0°-2'
3 Persone + 30kg 969 719 1688 0 22.7 35.2 1.26 20 27.8 42.8 1.35 0°-26'
5 Persone + 50kg 992 856 1848 4 26.7 35.1 1.25 40.2 48 17.4 1.92 0°-48'
4 Persone + 30kg 982 776 1758 2.2 24.9 35.4 1.25 32.3 40.1 42.2 1.3 0°-40'
0 Persone + 147kg 947 649 1596 -4 18.7 35.9 1.27 4.9 12.7 43.4 1.42 0°-12'
This txt file is "periodic", so this kind of table is present more than one time with different values. From this file, I have to extract the last column ('beccheg', see the underlined values) of all of these tables and put them in differents arrays (to be then sorted and to let me take only some values from them, always periodically: in particular the 'Standard A' value).
I started like this:
clear all;
fid = fopen('520MCA.lis','r');
text = textscan(fid,'%s','Delimiter','');
text = text{1};
fid = fclose(fid);
beccheggio = regexp(text,'Standard A[\s\.=]+(\d+){2}[\s\.=]+(\d+){3}[\s\.=]+(\d+){4}[\s\.=]+[-]+(\d+)','tokens')
beccheggio = [beccheggio{:}];
beccheggio = str2double([beccheggio{:}]).';
Then I blocked myself because of two reasons:
1) the regular expression to be matched, to reach the last value of the 'Standard A' row of each table, could be too long, complicated and not equal for each table
2) there is not just one row, within the period of the file, which started with 'Standard A'.
I hope someone can help me.
Thanks,
Mattia

 Akzeptierte Antwort

Mathieu NOE
Mathieu NOE am 14 Okt. 2020

0 Stimmen

So , hopefully now a solution that works
I put the excel output file as well, that includes min and max values for beccheg already computed
now I hope that the file structure will not change too much...
i prefered to stick with low level string functions. I asume another "expert" will find out that there are alternatives with the latest matlab versions. For my fun I still develop for R12 backward compatibility (if possible)
enjoy

4 Kommentare

Hello Mathieu,
wonderful, it seems to work!!! I don't know how to repay you, thank you so much! :-)
Hope to not bothering you, I have to ask two things:
1) In the file excel file which coming out, I need the three values of 'beccheg', the StandardA value and then the minimum and the maximum of the same column (so three different values in three different columns).
2) I have a lot of txt files like the one I sent you for example: I change the name of the file and I extract the data from another txt file, but how can I put these other rows in the same excel file created at the first step? Obviously after the other rows.
Thank you very much again,
Mattia
Hello Mathieu,
in addition to what I wrote above, another comment. In the script, you wrote:
" % NB : the type of vehicule is repeated 5 times ( x n_sections ) = 40 items + at the very end of the file ,
% there is a summary of references files used that also list the 1 to n_sections type of vehicules
% we will use then the last n_sections cells created above for the final titles"
What if the numbers of items is not 40? I sent you by email others .txt files, maybe it's only sufficient to generalize the code.
Thanks,
Mattia
Hello Mattia
glad it works !
ok I think the "upgrades" are doable without much hard work. I'll have a look tomorrow
please send me on my email a couple of other data files so I can test the new code
ciao
And now the final version !
enjoy !
I have further refined the program and even simplified it a bit .
So the main script is : assetti1E.m
And it calls the function : retrieve_data.m
The major change is that I do not longer need my “stop line” calculation in the first portion of the code
If there is important thinks to know is to see the lines 96 to 98 in retrieve_data.m
% job for beccheg data retrieval
nb_lines_max = 20; % assuming table of data will never exceed this size
offset_start = 4;

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

Mathieu NOE
Mathieu NOE am 13 Okt. 2020

0 Stimmen

hello
see below :
fid = fopen('data.txt');
tline = fgetl(fid);
k = 0;
start_line = 9;
while ischar(tline)
k = k+1;
if k > start_line-1 % the extraction start when row index is 9
[m,n] = size(tline);
ind_stop = n;
ind_start = ind_stop-8; % the beccheg value must be written on max 8 character length (taken from the length of "--------" line above
beccheg_string = tline(ind_start:ind_stop);
% find index of "°" mark
ind_deg_separator = findstr(beccheg_string,'°');
Deg = str2num(beccheg_string(1:ind_deg_separator-1));
Minutes = str2num(beccheg_string(1+ind_deg_separator:end-1));
% conversion to degrees (D/M/S => deg
degrees(k+1-start_line,:) = Deg+Minutes/60;
end
tline = fgetl(fid);
end
fclose(fid);
degrees % left uncommented to check values in command window
%save or export as ascii or csv file : degrees

7 Kommentare

Gloria Longo
Gloria Longo am 14 Okt. 2020
Bearbeitet: Gloria Longo am 14 Okt. 2020
Hi Mathieu,
thanks for your quick answer.
Unfortunately, I got this error:
Unable to perform assignment because the indices on the left side are not compatible with the size of
the right side.
Error in assetti1 (line 20)
degrees(k+1-start_line,:) = Deg+Minutes/60;
I tried to modify the code without success.
I have also another question: with this code, are you assuming that the "beccheg" column is always at the same position? This could be true but I don't understand why start_line is initally equal to 9.
If we solve this problem, I have another step of this exercise :-) Hope you can help me!
Thanks,
Mattia
Hello Mattia
no problem
could you please send me a couple of your original data txt files - maybe the error is due I copy paste your txt from this matlab central page
Ciao
Hello Mathieu,
Thanks. I couldn't attach the file here since it's not a proper a .txt file (I got an error from Matlab portal, extension is not valid), so I uploaded it on Google Drive (file).
Meanwhile, I'll prepare the 2nd question.
Thanks again,
Mattia
Hello Mathieu,
below my 2nd question. In the file shared before, there are also a lot of tables like this:
punto Std.0 Std.A AdD TdP 1P + 7P + 14P + 14P + 14P + 7P + QUOTE IN INPUT
0kg 0kg 0kg 120kg 505kg 60kg x [mm] z [mm]
---------------------------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ --------------- ------
%% ANABBAGLIANTE ANT 1186 1182 1190 1176 1179 1176 1167 1168 1154 1176 -362 805
%% LUCE DI POSIZ. POST P3450 (M) 1476 1476 1418 1454 1475 1454 1433 1425 1416 1450 4423 945
%% LUCE DI POSIZ. POST P4035 (L) 1495 1494 1429 1471 1494 1471 1449 1440 1430 1466 5008 945
%% LUCE DI POSIZ. POST P4035+(XL) 1505 1505 1435 1481 1505 1481 1458 1448 1439 1476 5358 945
%% FILO INF.TRAVERSA ANTERIORE 210 207 206 199 204 199 188 189 175 199 256 -190
%% FILO INF.CAMBIO - M40 201 197 201 190 194 190 180 182 167 191 -38 -189
BORLOTTO ANT 798 794 797 787 791 787 777 778 764 787 0 405
BORLOTTO POST L2 837 836 792 817 834 817 799 793 782 814 3450 336
BORLOTTO POST L3/L4 855 854 802 834 853 834 814 807 797 830 4035 336
TESTA VITE P2 ANTERIORE 209 206 204 197 203 197 186 187 173 197 367 -194
ASSE VITE P1 POSTERIORE L2 336 334 300 318 332 318 301 297 285 316 2735 -142
ASSE VITE POSTERIORE SPECIAL 347 345 306 328 344 328 311 305 294 325 3085 -142
ASSE VITE P1 POSTERIORE L3/L4 354 353 310 335 351 335 317 311 300 332 3320 -142
LDW 1888 1884 1883 1876 1881 1876 1865 1865 1852 1876 338 1484
GRADINO DI ACCESSO 460 457 450 447 454 447 436 435 422 447 730 45
SOGLIA DI CARICO PLS 513 510 487 497 508 497 482 479 467 495 1935 60
SOGLIA DI CARICO POST (M) 586 585 530 564 584 564 544 536 526 560 4260 60
SOGLIA DI CARICO POST (L) 604 604 541 581 603 581 559 550 541 576 4850 60
UREA EU (IPOTESI FL) 274 272 256 260 269 260 247 245 232 259 1380 -161
UREA USA (IPOTESI FL) 220 217 204 206 215 206 193 192 179 205 1217 -210
MRR 374 370 385 365 366 365 358 361 346 367 -892 10
CAMER SCAM 1908 1905 1903 1896 1902 1896 1886 1886 1872 1896 354 1505
In addition to the "beccheg" values we are trying to extract (see table of the previous post), I have to read the 2nd value of the CAMERA SCAM row (see value underlined) for each tables in the file.
Last but not least: these values ('beccheg' and 'CAMERA SCAM') should be associated to the title of the pages where these tables are (Allestimento: Nr. 1 1.5 GSE T4 DOHC DDCT MHEV (P2 48V) LONGITUDE FWD EMEA Base 215/65 R16). Seeing the line 'Allestimento', I have to read all the words, in this case starting from 1.5.
After all, I have to create an excel file where each row should contain in order:
  • the name of the 'Allestimento'
  • the 2nd value of the CAMERA SCAM row (see value underlined)
  • the last value of the 'Standard A' row / 'becchegg' column (see previous post): actually, as I wrote before, I need all the column also to extract the minimum and the maximum value of this column
I should create this row in the excel file for each the tables contained in the txt you imported.
Hope all it's clear, at your disposal if not.
Thank you very much again,
Mattia
OK
could you try again to send me your data files - but change the extension to txt first so that it will not be rejected by the Matlab portal
sorry, I have no acces to google drive at the office - blocked by internet security firewall
if it's not too big you can send it per email at : mathieu.noe@hutchinson.com
Hello Mathieu,
I tried to change the extension but this didn't solve the problem: I just sent you an email with the file attached, this comes from schianom@hotmail.it.
Thanks a lot,
Mattia

Melden Sie sich an, um zu kommentieren.

Kategorien

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by