Data extraction from .txt file

Question

Mate 2u on 31 Mar 2012


https://de.mathworks.com/matlabcentral/answers/34130-data-extraction-from-txt-file

Answered: D. Ali on 27 Apr 2019

I have a LARGE data set of minute data in the following format:

10/21/2002,0601,0.97360,0.97360,0.97360,0.97360,0,0
10/21/2002,0602,0.97360,0.97360,0.97360,0.97360,0,0
10/21/2002,0603,0.97350,0.97350,0.97340,0.97350,0,0
10/21/2002,0604,0.97340,0.97340,0.97340,0.97340,0,0
10/21/2002,0605,0.97330,0.97330,0.97330,0.97330,0,0
10/21/2002,0606,0.97300,0.97310,0.97300,0.97310,0,0
10/21/2002,0607,0.97290,0.97290,0.97290,0.97290,0,0
10/21/2002,0608,0.97280,0.97280,0.97260,0.97260,0,0
10/21/2002,0609,0.97270,0.97270,0.97260,0.97260,0,0

This goes on till 2012.

I need two programs:

1. Select a time period that I need. So let's say 0600-0630; the program would then store, IN ORDER, all the 0600-0630 rows in one matrix in MATLAB.
2. Select dates and a time period. So let's say I want 10/21/2002 0500-0700 and 17/21/2002 0500-0700.

Basically, I want programs that will let me extract any time period and date period from the data. It is a huge data set, so remember that the program can't run for too long.

Looking forward to some replies. Thanks in advance.


Answer 1 (accepted answer)

per isakson on 1 Apr 2012


https://de.mathworks.com/matlabcentral/answers/34130-data-extraction-from-txt-file#answer_42793


Check whether this approach is fast enough. SSCANF used to be faster than DATEVEC at converting date strings when all lines have an identical format and contain only numbers. That might no longer be the case.

I copied your data to cssm.dat.

    fid = fopen( 'cssm.dat', 'r' );
    cac = textscan( fid, '%s%s%f%f%f%f%f%f', 'Delimiter', ',' );
    sts = fclose( fid );
    % date and time strings -> columns [year month day hour minute second]
    vec = zeros( numel( cac{1,1} ), 6 );
    [ vec(:,1), vec(:,2), vec(:,3), ~, ~, ~ ] = datevec( cac{1,1}, 'mm/dd/yyyy' );
    [ ~, ~, ~,  vec(:,4), vec(:,5), ~ ]       = datevec( cac{1,2}, 'HHMM' );
    dat = [cac{1,3:end}];                       % the six numeric columns
    % keep the rows from 06:m1 to 06:m2 (inclusive), here 06:04-06:07
    m1  = 4;
    m2  = 7;
    is_sel = vec(:,4)==6 & vec(:,5)>=m1 & vec(:,5)<=m2;
    d06 = dat( is_sel, : );
    v06 = vec( is_sel, : );

Use the same approach for your second case.

Rounding errors should not be a problem in this example, since DATEVEC returns flints (floating-point integers), so the == and >= comparisons are exact.
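For the second case (chosen dates plus a time window), here is a minimal sketch of one way to combine the two conditions. It assumes the `vec` layout [year month day hour minute second] used above; the sample rows, the stand-in `dat` column and the names `t1`, `t2`, `day_list` are made up for illustration. Converting hour and minute to minutes since midnight lets the window cross an hour boundary, e.g. 0500-0700.

```matlab
% Hypothetical sample rows in the vec layout: [year month day hour minute second]
vec = [ 2002 10 21 4 59 0
        2002 10 21 5  0 0
        2002 10 21 6 30 0
        2002 10 21 7  1 0
        2002 10 22 5 30 0
        2002 10 23 6  0 0 ];
dat = ( 1:size(vec,1) ).';                 % stand-in for the price columns

% time window 05:00-07:00 inclusive, as minutes since midnight
t1  = 5*60 + 0;
t2  = 7*60 + 0;
tod = 60*vec(:,4) + vec(:,5);
in_time = tod >= t1 & tod <= t2;

% chosen dates (hypothetical), compared as whole-day serial numbers
day_num  = datenum( vec(:,1), vec(:,2), vec(:,3) );
day_list = datenum( [2002; 2002], [10; 10], [21; 23] );
in_day   = ismember( day_num, day_list );

is_sel = in_day & in_time;
d_sel  = dat( is_sel, : );
v_sel  = vec( is_sel, : );
```

For a contiguous date range instead of a list, `in_day` could be something like `day_num >= datenum(2002,10,21) & day_num <= datenum(2002,10,22)`.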

FSCANF seems to be about twice(?) as fast (with 64-bit R2011a):

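    fid = fopen( 'cssm.dat', 'r' );
    % month/day/year,HHMM,six numbers -> 11 values per line
    num = fscanf( fid, '%2u/%2u/%4u,%2u%2u,%f,%f,%f,%f,%f,%f', [11,inf] );
    sts = fclose( fid );
    num = transpose( num );                 % one row per line
    vec = zeros( size(num,1), 6 );          % [year month day hour minute second]
    vec(:,1:5) = num(:,[3,1,2,4,5]);        % file order is month/day/year
    dat = num(:,6:end);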

However, this requires a bit more care and a file that adheres to the format.

The code above assumes that all data fits in memory.
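If the file does not fit in memory, one option is to let TEXTSCAN read a fixed number of lines per call (its third argument) and keep only the selected rows of each block. This is a sketch under stated assumptions, not a tuned recipe for the full file: it writes a small stand-in file via TEMPNAME, uses a hypothetical block size of 4 lines, and compares the HHMM field as a number (0604 becomes 604), which preserves the time order within a day.

```matlab
% Stand-in file so the sketch runs on its own (use cssm.dat for real data)
fname = [ tempname, '.dat' ];
fid = fopen( fname, 'w' );
fprintf( fid, '10/21/2002,%04d,0.97360,0.97360,0.97360,0.97360,0,0\n', 601:609 );
fclose( fid );

fmt   = '%s%s%f%f%f%f%f%f';
block = 4;                                % lines per call; much larger in practice
keep  = zeros( 0, 6 );                    % selected numeric rows
fid = fopen( fname, 'r' );
while ~feof( fid )
    cac = textscan( fid, fmt, block, 'Delimiter', ',' );
    if isempty( cac{1} ), break, end      % nothing left to read
    hhmm   = str2double( cac{2} );        % '0605' -> 605
    is_sel = hhmm >= 604 & hhmm <= 607;   % 06:04 to 06:07, inclusive
    dat    = [ cac{3:end} ];
    keep   = [ keep; dat(is_sel,:) ];     %#ok<AGROW> only selected rows are kept
end
fclose( fid );
delete( fname );
```

Only the selected rows stay in memory; the block size trades memory against the number of TEXTSCAN calls.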

Surprisingly, the code below is another 20% faster than the FSCANF version (with 64-bit R2011a). Besides the speed, I see three advantages with this code: header lines are easy to handle; it is easier to debug when the text file is "corrupted"; and one additional line of code will handle a comma as decimal separator.

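    fid = fopen( 'cssm.dat', 'r' );
    cac = textscan( fid, '%s', 'Whitespace','', 'Delimiter','\n' );   % one cell per line
    sts = fclose( fid );
    str = transpose( char( cac{:} ) );          % one column per line (assumes equal-length lines)
    str = cat( 1, str, repmat( ',', 1, size(str,2) ) );   % end each line with ','
    num = sscanf( str, '%2u/%2u/%4u,%2u%2u,%f,%f,%f,%f,%f,%f,', [11,inf] );
    num = transpose( num );                     % one row per line
    vec = zeros( size(num,1), 6 );              % [year month day hour minute second]
    vec(:,1:5) = num(:,[3,1,2,4,5]);            % file order is month/day/year
    dat = num(:,6:end);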
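A sketch of the "one additional line" for a comma as decimal separator, under the assumption (hypothetical, not from the question) that such a file separates its fields with ';' instead. The replacement runs on the char matrix before SSCANF, and `str(:).'` makes the column-by-column reading order explicit. Because the dates are stored as mm/dd/yyyy, the first three values per line are month, day and year, so a [year month day ...] vector takes columns [3,1,2,4,5].

```matlab
% Hypothetical lines: ';' between fields and ',' as decimal separator
txt_lines = { '10/21/2002;0601;0,97360;0,97360;0,97360;0,97360;0;0'
              '10/21/2002;0603;0,97350;0,97350;0,97340;0,97350;0;0' };
str = transpose( char( txt_lines ) );                 % one column per line
str = cat( 1, str, repmat( ';', 1, size(str,2) ) );   % end each line with ';'
str( str == ',' ) = '.';                              % the extra line: ',' -> '.'
num = sscanf( str(:).', '%2u/%2u/%4u;%2u%2u;%f;%f;%f;%f;%f;%f;', [11,inf] );
num = transpose( num );                               % one row per line
vec = zeros( size(num,1), 6 );                        % [year month day hour minute second]
vec(:,1:5) = num(:,[3,1,2,4,5]);                      % file order is month/day/year
dat = num(:,6:end);
```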


Answer 2

Mate 2u on 1 Apr 2012


https://de.mathworks.com/matlabcentral/answers/34130-data-extraction-from-txt-file#answer_42831

Hi, I used your first code... great code, and fast. It lets me change m1 and m2 to get the respective minutes. Great work.

Now, is there a way to get the program to extract the time periods for certain days of the week, let's say Mondays? But bear in mind that historically some Mondays were bank holidays.

So essentially, the last part I need is a program that lets me choose the day and the times.


Answer 3

per isakson on 4 Apr 2012


https://de.mathworks.com/matlabcentral/answers/34130-data-extraction-from-txt-file#answer_43244


    isMonday = ( weekday( datenum( vec ) ) == 2 );   % every Monday

However, bank holidays are a problem. As far as I know, MATLAB itself doesn't have a holiday calendar; I guess the Financial Toolbox has one.

Bank holidays vary between countries. Google finds, e.g., https://www.gov.uk/bank-holidays
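A minimal sketch that combines the Monday test with a time window and a holiday list. Since there is no built-in list of bank holidays, `bank_hols` is a hypothetical list you would fill in yourself (here only Easter Monday, 9 April 2012, a bank holiday in England and Wales); the sample rows are made up, and HHMM is compared as a number.

```matlab
% Hypothetical sample rows: [year month day hour minute second]
vec = [ 2012 4  2 6 10 0      % Monday
        2012 4  3 6 10 0      % Tuesday
        2012 4  9 6 10 0      % Easter Monday (bank holiday)
        2012 4 16 6 10 0      % Monday
        2012 4 16 7 10 0 ];   % Monday, outside the time window

day_num   = datenum( vec(:,1), vec(:,2), vec(:,3) );
is_monday = weekday( day_num ) == 2;           % 1 = Sunday, 2 = Monday
bank_hols = datenum( 2012, 4, 9 );             % hypothetical list, fill in yourself
is_open   = ~ismember( day_num, bank_hols );   % drop listed holidays
hhmm      = 100*vec(:,4) + vec(:,5);           % e.g. 06:10 -> 610
in_time   = hhmm >= 600 & hhmm <= 630;         % 0600-0630 inclusive
is_sel    = is_monday & is_open & in_time;
```

Changing the `== 2` test (or using `ismember` on a set of weekday numbers) selects other days of the week.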

=====

See also the File Exchange (FEX) contribution "Error-tolerant parsing of newline-delimited data".

1 comment

Mate 2u on 4 Apr 2012

Thank you very much


Answer 4

Kevin Lamb on 30 Oct 2015


https://de.mathworks.com/matlabcentral/answers/34130-data-extraction-from-txt-file#answer_197975

Edited: Kevin Lamb on 10 Nov 2015

So I was presented with a few dozen log files that were not uniform in size, structure, or number of test runs, and were in the form of:

header header blah
header
beginning of first repetition of datalog
blah blah number blah blah number
blah blah blah
blah blah blah blah
number you want is: 50 second number you want is: 100 bla bla bla
blah blah blah
end of first repetition of data log
beginning of second repetition of datalog
... so on

My solution was as follows:

    % directory holding the log files
    folder     = 'C:\Work\...\matlab\Log_Folder';
    file_names = dir( folder );
    file_names = { file_names(~[file_names.isdir]).name };   % drop '.', '..' and subfolders
    finaldataparse = cell( numel( file_names ), 1 );
    for index2 = 1:numel( file_names )
        fileID = fopen( fullfile( folder, file_names{index2} ), 'r' );
        % parse data: one cell per line of the log file
        logfile = textscan( fileID, '%s', 'Delimiter', '\n', 'MultipleDelimsAsOne', 1 );
        fclose( fileID );
        Parse1 = strfind( logfile{1}, 'number you want is' );
        Parse2 = find( ~cellfun( 'isempty', Parse1 ) );      % lines with the key phrase
        Parse3 = logfile{1}(Parse2);
        Parse4 = cell( 14, numel( Parse3 ) );
        for index1 = 1:numel( Parse3 )
            % split the line into words; words 5 and 11 are the numbers
            Parse4(:,index1) = textscan( Parse3{index1}, '%s %s %s %s %f %s %s %s %s %s %f %s %s %s', ...
                                         'Delimiter', ' ', 'MultipleDelimsAsOne', 1 )';
        end
        % allocate data to output: the second number of each matching line
        finaldataparse(index2,:) = { [Parse4{11,:}]' };
    end

This code steps through the log files, reads every line of each file into a cell, and searches each cell for the key phrase "number you want is". The second TEXTSCAN call then splits the matching lines into words to extract the numbers of interest; from there it was fairly easy to put them in a usable form and do the data analysis.
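As an alternative to splitting every matching line into a fixed number of words, a regular expression can capture the number that follows each occurrence of the key phrase, so lines with a different word count still parse. A minimal sketch, assuming the phrase is always followed by ':' and a plain decimal number; the log lines are made-up stand-ins for the layout above.

```matlab
% Made-up log lines in the layout described above
logfile = { 'header header blah'
            'number you want is: 50 second number you want is: 100 bla bla bla'
            'blah blah blah'
            'number you want is: 7 second number you want is: 2.5 bla' };
% one token per match: the number right after the key phrase
tok = regexp( logfile, 'number you want is:\s*([-+]?\d*\.?\d+)', 'tokens' );
hit     = ~cellfun( 'isempty', tok );    % lines that contain the phrase
matched = tok( hit );
out     = zeros( numel( matched ), 2 );  % assumes two numbers per matching line
for k = 1:numel( matched )
    m = matched{k};                      % cell of matches, each a 1-by-1 cell
    out(k,:) = str2double( [ m{:} ] );   % token strings -> numbers
end
```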

With the uniform data file you have, the TEXTSCAN option 'CollectOutput' would probably be useful.
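A minimal sketch of 'CollectOutput' on two of the question's sample lines (read from a string here instead of a file): consecutive columns of the same class are merged, so the two %s fields come back as one N-by-2 cell array and the six %f fields as one N-by-6 double matrix.

```matlab
% Two sample lines from the question, held in a string instead of a file
str = sprintf( '%s\n', ...
    '10/21/2002,0601,0.97360,0.97360,0.97360,0.97360,0,0', ...
    '10/21/2002,0603,0.97350,0.97350,0.97340,0.97350,0,0' );
cac = textscan( str, '%s%s%f%f%f%f%f%f', 'Delimiter', ',', 'CollectOutput', true );
txt = cac{1};                 % N-by-2 cell: date and time strings
dat = cac{2};                 % N-by-6 double: the numeric columns
```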


Answer 5

D. Ali on 27 Apr 2019


https://de.mathworks.com/matlabcentral/answers/34130-data-extraction-from-txt-file#answer_372566

annotations sdb4.txt

I have a similar question: I need to extract all MCAP samples, with the times they occurred, into a separate file and plot them if possible.

I attached the file.

