Parsing data from complicated text files
Michael Browne on 22 Mar 2021
Edited: Michael Browne on 24 Mar 2021
I have about 20 years of text files that contain the records of individual tests (about 8 GB of plain text, about 4,000 individual files). Each file has this format:
********************************************************************************
Test Data Report
Station ID:                     [Test Station ID Number]
Station Part Number:            [Test Station Part Number]
Station Serial Number:          [Test Station Serial Number]
Test Procedure Number:          [Test Procedure Number]   [Test Procedure Revision]
Operation:                      [colloquial test]
Serial Number of test subject:  [Serial Number + plus some other info about the test]
Date:                           [Day, Month Date, year]
Time:                           [11:00:03 AM]
Operator:                       [Operator Name]
Number of Results:              [NNNN]
Test Result:                    [Passed/Failed]
********************************************************************************
--------------------------------------------------------------------------------
MEASUREMENT               LL      READING           UL        UNITS      STATUS
--------------------------------------------------------------------------------
Enter Testing Time:                                                         Done
--------------------------------------------------------------------------------
08:00
--------------------------------------------------------------------------------
FOE, CAL:                                                                 Passed
--------------------------------------------------------------------------------
CALIBRATION IS VALID
--------------------------------------------------------------------------------
Test Start Time:                                                             Done
--------------------------------------------------------------------------------
11:00:33 AM
--------------------------------------------------------------------------------
Group Meas Init:                                                          Passed
--------------------------------------------------------------------------------
Datapoint_01            LL         Measured         UL         Units      Passed
Datapoint_02            LL         Measured         UL         Units      Passed
Datapoint_03            LL         Measured         UL         Units      Passed
Datapoint_04            LL         Measured         UL         Units      Passed
Datapoint_05            LL         Measured         UL         Units      Passed
Datapoint_06            LL         Measured         UL         Units      Passed
Datapoint_07            LL         Measured         UL         Units      Passed
Datapoint_08            LL         Measured         UL         Units      Passed
Datapoint_09            LL         Measured         UL         Units      Passed
Datapoint_10            LL         Measured         UL         Units      Passed
Datapoint_11            LL         Measured         UL         Units      Passed
Datapoint_12            LL         Measured         UL         Units      Passed
Datapoint_13            LL         Measured         UL         Units      Passed
Datapoint_14            LL         Measured         UL         Units      Passed
Datapoint_15            LL         Measured         UL         Units      Passed
Datapoint_16            LL         Measured         UL         Units      Passed
Datapoint_17            LL         Measured         UL         Units      Passed
Datapoint_18            LL         Measured         UL         Units      Passed
Datapoint_19            LL         Measured         UL         Units      Passed
Datapoint_20            LL         Measured         UL         Units      Passed
Datapoint_21            LL         Measured         UL         Units      Passed
Datapoint_22            LL         Measured         UL         Units      Passed
Datapoint_23            LL         Measured         UL         Units      Passed
Datapoint_24            LL         Measured         UL         Units      Passed
Datapoint_25            LL         Measured         UL         Units      Passed
Datapoint_26            LL         Measured         UL         Units      Passed
Datapoint_27            LL         Measured         UL         Units      Passed
Datapoint_28                       Measured         UL         Units      Passed
Datapoint_29                       Measured                    Units      Passed
--------------------------------------------------------------------------------
Group Meas Ramp:                                                          Passed
--------------------------------------------------------------------------------
Datapoint_01            LL         Measured         UL         Units      Passed
Datapoint_02            LL         Measured         UL         Units      Passed
Datapoint_03            LL         Measured         UL         Units      Passed
Datapoint_04            LL         Measured         UL         Units      Passed
Datapoint_05            LL         Measured         UL         Units      Passed
Datapoint_06            LL         Measured         UL         Units      Passed
Datapoint_07            LL         Measured         UL         Units      Passed
Datapoint_08            LL         Measured         UL         Units      Passed
Datapoint_09            LL         Measured         UL         Units      Passed
Datapoint_10            LL         Measured         UL         Units      Passed
Datapoint_11            LL         Measured         UL         Units      Passed
Datapoint_12            LL         Measured         UL         Units      Passed
Datapoint_13            LL         Measured         UL         Units      Passed
Datapoint_14            LL         Measured         UL         Units      Passed
Datapoint_15            LL         Measured         UL         Units      Passed
Datapoint_16            LL         Measured         UL         Units      Passed
Datapoint_17            LL         Measured         UL         Units      Passed
Datapoint_18            LL         Measured         UL         Units      Passed
Datapoint_19            LL         Measured         UL         Units      Passed
Datapoint_20            LL         Measured         UL         Units      Passed
Datapoint_21            LL         Measured         UL         Units      Passed
Datapoint_22            LL         Measured         UL         Units      Passed
Datapoint_23            LL         Measured         UL         Units      Passed
Datapoint_24            LL         Measured         UL         Units      Passed
Datapoint_25            LL         Measured         UL         Units      Passed
Datapoint_26            LL         Measured         UL         Units      Passed
Datapoint_27            LL         Measured         UL         Units      Passed
Datapoint_28                       Measured         UL         Units      Passed
Datapoint_29                       Measured                    Units      Passed
--------------------------------------------------------------------------------
Time (after meas):                                                          Done
--------------------------------------------------------------------------------
11:01:16 AM
--------------------------------------------------------------------------------
Group Meas Ramp:                                                          Passed
--------------------------------------------------------------------------------
Datapoint_01            LL         Measured         UL         Units      Passed
Datapoint_02            LL         Measured         UL         Units      Passed
Datapoint_03            LL         Measured         UL         Units      Passed
Datapoint_04            LL         Measured         UL         Units      Passed
Datapoint_05            LL         Measured         UL         Units      Passed
Datapoint_06            LL         Measured         UL         Units      Passed
Datapoint_07            LL         Measured         UL         Units      Passed
Datapoint_08            LL         Measured         UL         Units      Passed
Datapoint_09            LL         Measured         UL         Units      Passed
Datapoint_10            LL         Measured         UL         Units      Passed
Datapoint_11            LL         Measured         UL         Units      Passed
Datapoint_12            LL         Measured         UL         Units      Passed
Datapoint_13            LL         Measured         UL         Units      Passed
Datapoint_14            LL         Measured         UL         Units      Passed
Datapoint_15            LL         Measured         UL         Units      Passed
Datapoint_16            LL         Measured         UL         Units      Passed
Datapoint_17            LL         Measured         UL         Units      Passed
Datapoint_18            LL         Measured         UL         Units      Passed
Datapoint_19            LL         Measured         UL         Units      Passed
Datapoint_20            LL         Measured         UL         Units      Passed
Datapoint_21            LL         Measured         UL         Units      Passed
Datapoint_22            LL         Measured         UL         Units      Passed
Datapoint_23            LL         Measured         UL         Units      Passed
Datapoint_24            LL         Measured         UL         Units      Passed
Datapoint_25            LL         Measured         UL         Units      Passed
Datapoint_26            LL         Measured         UL         Units      Passed
Datapoint_27            LL         Measured         UL         Units      Passed
Datapoint_28                       Measured         UL         Units      Passed
Datapoint_29                       Measured                    Units      Passed
--------------------------------------------------------------------------------
Time (after meas):                                                          Done
--------------------------------------------------------------------------------
11:01:37 AM
--------------------------------------------------------------------------------
At the moment, the only things I care about are:
- Whether a failure occurred or not
- When that failure occurred

I will likely want to perform other analyses on the data in the future, but for now this will suffice. I want to go through each report, determine whether a failure occurred, record when it occurred, and then plot all the failures as a histogram over time so that I can see whether tests typically fail after particular lengths of time.
I have a fair amount of experience working with data once it is in MATLAB, but I am much less experienced with importing data, especially this kind of batch importing. Is there a simple way to do this, or am I essentially just using something like textscan() or fscanf() in a loop?
3 Comments
dpb on 23 Mar 2021
Well, we still don't have a file to test with, nor does the text you posted contain a case that fails. If you expect somebody to write code, you have to do your part and give them what they need; otherwise you get the result of the other poster's wasted time and effort: code that doesn't work because what he was given wasn't sufficient and his best guess at the format apparently wasn't correct.
In general, however, the idea would be to use readcell to import each file into a cell array, use contains or regexp to find the rows with the key words or phrases wanted, and then parse those lines, using the positions of the group headers to work out which rows belong to which group.
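A rough sketch of that idea follows, under a few assumptions that are not from the thread: the report is held as a string array with one element per line (as `readlines` would return in R2020b or later, used here in place of `readcell`), the sample rows and numbers are invented, and the failure time is taken to be the first `hh:mm:ss AM/PM` stamp after a failed group header.

```matlab
% Stand-in for one report (invented rows); in practice something like
%   txt = readlines("report.txt");   % readlines requires R2020b or later
txt = [
    "Group Meas Init:                        Passed"
    "Datapoint_01    0.0    1.2    5.0    V    Passed"
    "Group Meas Ramp:                        Failed"
    "Datapoint_01    0.0    7.3    5.0    V    Failed"
    "Time (after meas):                        Done"
    "--------"
    "11:01:16 AM"
    "Group Meas Ramp:                        Passed"
    "Time (after meas):                        Done"
    "--------"
    "11:01:37 AM"];

txt = strtrim(txt);                                  % drop stray padding
isGroup  = startsWith(txt, "Group Meas");            % group header rows
isFailed = isGroup & endsWith(txt, "Failed");        % headers of failed groups
% rows that consist only of a time stamp such as 11:01:16 AM
isTime = ~cellfun(@isempty, ...
    regexp(cellstr(txt), '^\d{1,2}:\d{2}:\d{2} [AP]M$', 'once'));

failRows  = find(isFailed);
failTimes = strings(numel(failRows), 1);
rowIdx    = (1:numel(txt))';
for n = 1:numel(failRows)
    % first time stamp that follows the failed header (assumption, see above)
    j = find(isTime & rowIdx > failRows(n), 1);
    if ~isempty(j)
        failTimes(n) = txt(j);
    end
end
```

Working on logical masks keeps the header positions available, so rows between two `isGroup` entries can later be assigned to the group that precedes them.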
Accepted Answer
More Answers (1)
Mathieu NOE on 23 Mar 2021
Hello,
Here is my two cents: code to import the required data. The function returns the time values (as text) and the number of failures. I tested it with two dummy files: one is your original data, and in the second I changed the last section to create a Failed condition and added another failed case with a different time value, just to check that my code correctly detects both failures.
Filename_in = 'data2.txt';
[Time_init,Time_end,fail_count] = extract_data(Filename_in);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Time_init,Time_end,fail_count] = extract_data(Filename)
% Returns the report start time (char), the time stamps logged after each
% failed 'Group Meas Ramp' section (cell array), and the failure count.
fid = fopen(Filename);
if fid == -1
    error('Could not open %s', Filename);
end
tline = fgetl(fid);
% initialization
k = 0;              % index of the line currently being examined
fail_count = 0;     % number of failures found so far
Time_init = '';
Time_end = {};      % stays empty if the report contains no failure
line_fail_ind = 0;  % index of the 'Time (after meas)' line after a failure
fail_flag = 0;      % 1 while waiting for the time stamp of a failed group
while ischar(tline)
    k = k+1;
    % store initial Time value (start time) from the header line 'Time: [...]'
    if startsWith(strtrim(tline),'Time:') && contains(tline,'[')
        % extractBetween returns a cell array for char input, so convert it
        Time_init = strtrim(char(extractBetween(tline,'[',']')));
    end
    % then search for a 'Failed' status on a 'Group Meas Ramp' header line
    if contains(tline,'Group Meas Ramp') && contains(tline,'Failed')
        fail_flag = 1;
    end
    if fail_flag == 1 && contains(tline,'Time (after meas)')
        line_fail_ind = k;
    end
    % time of failure: the time stamp sits two lines below the
    % 'Time (after meas)' line (a dashed separator lies in between)
    if fail_flag == 1 && k == line_fail_ind + 2
        fail_count = fail_count+1;
        Time_end{fail_count} = tline;
        fail_flag = 0; % reset fail_flag
    end
    tline = fgetl(fid);  % read the next line; fgetl returns -1 at end of file
end
fclose(fid);
end
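To get from these outputs to the histogram asked for in the question, one option is to convert the time stamps to `datetime` and subtract. The sketch below is only an illustration: the input values are made up, the `hh:mm:ss a` format and the `en_US` locale are assumptions about how the reports were written, and wrapping negative differences by 24 hours assumes no test runs longer than a day.

```matlab
% Values in the shape returned by extract_data (start time plus a cell
% array of failure time stamps); the numbers themselves are made up.
Time_init = '11:00:03 AM';
Time_end  = {'11:01:16 AM', '11:01:37 AM'};

fmt = 'hh:mm:ss a';   % 12-hour clock with AM/PM marker (assumed format)
t0 = datetime(Time_init, 'InputFormat', fmt, 'Locale', 'en_US');
t1 = datetime(Time_end,  'InputFormat', fmt, 'Locale', 'en_US');
elapsed = t1 - t0;    % duration array, one element per failure

% A test that runs past midnight would come out negative; wrap it by a day.
neg = elapsed < 0;
elapsed(neg) = elapsed(neg) + hours(24);

histogram(seconds(elapsed));
xlabel('Time to failure (s)');
ylabel('Number of failures');
```

For the whole archive, the same computation could sit inside a loop over `dir(fullfile(folder,'*.txt'))`, appending each file's `elapsed` values to one array before calling `histogram` once at the end.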
3 Comments
Mathieu NOE on 23 Mar 2021
Hi,
Would you be able to copy and paste the section of data that doesn't seem to work 100% with my code?
dpb on 24 Mar 2021
Is this one test per file?
Is the Group Meas Init: section of interest? There is no time after it; an ending time is given only after the "Ramp" sections. I presume that if the INIT fails, the rest of the test is aborted and consequently there is no file?
We need all the ground rules...