How can I arrange my output from regexp stored in multiple cells in a for loop?

Question

Linus Dock on 18 Oct 2016

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/307831-how-can-i-arrange-my-output-from-regexp-stored-in-multiple-cells-in-a-for-loop

Commented: Guillaume on 20 Oct 2016

Hi, I am using regexp to extract and match data from a text string with this code:

tokens = regexp(DATALow, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens'); %find RVR in DATALow
SortTokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %sort RVR as vertical cells

My output is stored in cells within a cell like this:

[]
[]
[]
<4x2 cell>
<4x2 cell>
<4x2 cell>
[]
[]

Each non-empty output cell contains data that looks like this:

'R01L'  '1500'
'R19R'  '1500'
'R01R'  '1300'
'R19L'  '1500'

But the output cells vary in shape, and the result can also look like this:

[]
<1x2 cell>
<1x2 cell>
[]

My goal is to extract the data with a for-loop that takes the size of each output cell into account and stores the result in a cell, using this code:

NoRUNWAY=ones(1,length(SortTokens));        %vector of ones, preallocated for speed
for j=1:length(SortTokens)                  %for all data in the cell
    NoRws=length(SortTokens{j,1});          %count the length of each row
    if NoRws>0                              %if larger than zero
        NoRUNWAY(j)=NoRws;                  %set the number to the length of the row
    end
end
isemp = cellfun('isempty', tokens);         %find all empty cells in tokens
for l=1:length(SortTokens)
    RWYnum=NoRUNWAY(l);
    for k=1:RWYnum
        tempRUNWAY = cellfun(@(x) x{k,1}, SortTokens(~isemp), 'uni', 0);
        tempRVR = cellfun(@(x) x{k,2}, SortTokens(~isemp), 'uni', 0);
        RVR = nan(size(SortTokens));
        RVR(~isemp) = cellfun(@str2num, tempRVR);
        RVRnan=isnan(RVR);
        RVRnanx=find(RVRnan);
        RVR(RVRnanx)=9999;
        RWYcell{1,k}=tempRUNWAY(1);
        RVRcell{1,k}=RVR;
    end
end

The largest output cell is of size

<4x2 cell>

I would like to store the data in a new cell with four columns and ultimately compare these values with some other measurements.

Does this make sense? These are measurements of Runway Visual Range (RVR) at multiple runways from different airports, and I would like to compare them with the meteorological visibility for the same airports. The data I am using, called DATALow, looks like this:

'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'
'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'
'METAR ESNS 010250Z AUTO 00000KT 0050 R10/0550V0800N R28/0175N FG VV000 10/09 Q1012'
'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'
'METAR ESNS 010350Z AUTO 00000KT 0100 R10/0250N R28/0250N FG VV001 10/10 Q1012'
'METAR ESNS 010420Z AUTO VRB02KT 0150 R10/0300N R28/0275N FG VV001 11/11 Q1012'
'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'

I also just realized that my regexp is missing most of the RVR groups, because it only matches runway designators of the form:

R19L/

which is not the case for most of the airports. Can someone please help with this?


Linus Dock on 18 Oct 2016

I might as well give you more of the DATALow just in case:

'METAR ESNS 010020Z AUTO VRB01KT 0050 R10/0375V0550N R28/0150V0325N FG VV000 09/08 Q1011'
'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'
'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'
'METAR ESNS 010250Z AUTO 00000KT 0050 R10/0550V0800N R28/0175N FG VV000 10/09 Q1012'
'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'
'METAR ESNS 010350Z AUTO 00000KT 0100 R10/0250N R28/0250N FG VV001 10/10 Q1012'
'METAR ESNS 010420Z AUTO VRB02KT 0150 R10/0300N R28/0275N FG VV001 11/11 Q1012'
'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'
'METAR ESNY 010120Z AUTO 29004KT 0150 R30/0500N FG VV001 12/12 Q1010'
'METAR ESNY 010150Z AUTO 28003KT 0200 R30/0450 FG VV001 12/12 Q1010'
'METAR ESNY 010220Z AUTO 31001KT 0200 R30/0450N FG VV001 12/12 Q1011'
'METAR ESNY 010250Z AUTO 26002KT 0150 R30/0350N FG VV001 12/12 Q1011'
'METAR ESNY 010320Z AUTO 28004KT 0350 R30/0700N FG VV001 12/12 Q1011'
'METAR ESNY 010350Z AUTO 29004KT 0500 R30/0750 FG VV001 12/12 Q1011'
'METAR ESNY 010420Z AUTO 30004KT 1300 R30/1000VP1500U VV002 12/12 Q1012 REFG'
'METAR ESNY 010450Z AUTO 29006KT 1200 R30/0800VP1500D VV001 12/12 Q1012 REFG'
'METAR ESNY 010520Z AUTO 30005KT 1000 R30/0750VP1500D VV001 12/12 Q1012 REFG'
'METAR ESNY 012120Z AUTO 28001KT 1300 R30/1100VP1500 SKC 11/11 Q1018 REFG'
'METAR ESPA 010250Z 31003KT 0300 R14/0450N R32/0250N FG VV010 11/11 Q1012'
'METAR ESPA 010320Z 36002KT 0250 R14/0400V0600N R32/0375N FG VV001 12/12 Q1012'
'METAR ESPA 010350Z VRB02KT 0300 R14/0500N R32/0700N FG OVC001 12/12 Q1012'
'METAR ESPA 010420Z VRB02KT 0500 R14/0750V1500U R32/P1500N FG OVC001 13/13 Q1013'
'METAR ESMK 022250Z AUTO 07004KT 0200 R01/0350VP2000 R19/P2000D FG NCD 12/12 Q1017'
'METAR ESMK 022320Z AUTO 06004KT 0100 R01/0350 R19/P2000D NCD 12/11 Q1017'
'METAR ESMK 022350Z AUTO 06003KT 0050 R01/0300N R19/1200VP2000D FG NCD 12/11 Q1017'
'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'
'METAR ESMK 060120Z AUTO 18002KT 0150 R01/0450V0750 R19/P2000D FG FEW067/// SCT079/// 12/11 Q1008'
'METAR ESMK 060150Z AUTO 35003KT 320V030 0300 R01/0800V1800 R19/P2000U BCFG FEW072/// 11/11 Q1008'
'METAR ESMK 060220Z AUTO VRB02KT 1200 R01/P2000D R19/P2000D FEW001/// SCT004/// 12/12 Q1008'
'METAR ESMK 060250Z AUTO VRB02KT 0300 R01/P2000 R19/P2000D FEW091/// 12/11 Q1007'
'METAR ESMK 060320Z AUTO 19003KT 0550 R01/0400V0750 R19/P2000U NCD 12/12 Q1007'
'METAR ESDF 072050Z AUTO 21002KT 0300 R01/1000VP1500N R19/P1500N FG FEW100/// 13/12 Q1003'
'METAR ESDF 072120Z AUTO 00000KT 1000 R01/0750VP1500N R19/P1500N BKN100/// 13/12 Q1003'
'METAR ESDF 072150Z AUTO 22005KT 0350 R01/P1500N R19/P1500D FG SCT043/// BKN110/// 13/12 Q1003'
'METAR COR ESGP 070247Z 00000KT 0100 R01/0400 R19/0350V0800 BCFG SCT100 13/12 Q1001'
'METAR ESGP 070250Z 00000KT 0100 R01/0400 R19/0350V0800 FG SCT100 13/12 Q1001'
'METAR ESGP 070320Z 00000KT 0150 R01/0800V1600 R19/0400 BCFG FEW001 11/10 Q1001'
'METAR ESGP 070350Z 00000KT 0100 R01/0375 R19/0225V0350 BCFG NSC 11/09 Q1001'
'METAR ESMK 070250Z AUTO 11002KT 0600 R01/0900V1700D R19/P2000U BR FEW009/// SCT012/// 15/14 Q1001'
'METAR ESMK 070320Z AUTO VRB01KT 0600 R01/0650V1200 R19/P2000N BR FEW008/// SCT010/// 14/14 Q1001'
'METAR ESMK 072220Z AUTO 07001KT 0600 R01/0800V1500 R19/P2000D BR FEW074/// SCT110/// 14/13 Q1003'
'METAR ESMK 072250Z AUTO VRB02KT 1300 R01/P2000D R19/P2000D BCFG SCT100/// 13/13 Q1003'
'METAR ESMK 072320Z AUTO 20002KT 0500 R01/1000VP2000D R19/P2000D SCT090/// BKN100/// 13/13 Q1003'
'METAR ESMK 072350Z AUTO 19002KT 0300 R01/0900 R19/P2000D FG FEW041/// BKN089/// 13/13 Q1003'
'METAR ESMT 072050Z AUTO 06003KT 0700 R01/P2000N R19/0900V1900U FG VV000 14/13 Q1003'
'METAR ESNU 072350Z 00000KT 0800 R14/P1500N R32/P1500N MIFG NSC 03/02 Q1006'
'METAR ESTA 070150Z AUTO 15004KT 1500 BR FEW007/// SCT010/// BKN030/// 15/14 Q1001'
'METAR ESTL 070320Z AUTO 20004KT 0350 R11R/0800VP1500N R29L/0800VP1500N FG SKC 13/13 Q1002'
'METAR ESMK 080050Z AUTO VRB01KT 0600 R01/P2000D R19/P2000U FEW025/// BKN041/// 13/12 Q1002'

Linus Dock on 20 Oct 2016

I think I solved it:

tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})*[A-Z]*[^ ]*?\>', 'tokens');

Guillaume on 20 Oct 2016

You cannot create a regular expression (even a dynamic one) that would match the smaller of the two numerical groups if both are present. You would have to return both groups and select the minimum afterward.

I believe the following would suit:

%the regexp now returns three tokens per match, the last token of each match may be empty
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4})[A-Z]?(\d{4})?[A-Z]?\>', 'tokens');  
tokens  = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);  %concatenate all pairs of each row vertically
alltokens = vertcat(tokens{:});  %concatenate it all regardless of row, note that this removes empty rows
allvalues = str2double(alltokens(:, [2 3])); %convert RVR tokens to number. If only one RVR per match, the second token is converted to NaN
minvalues = min(allvalues, [], 2);

If you are using an old version of MATLAB where min does not ignore NaNs by default, replace the NaNs with Inf before the call to min:

allvalues(isnan(allvalues)) = inf;

or use nanmin if the appropriate toolbox is installed.
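To make the NaN handling concrete, here is a minimal sketch on a made-up token array (the two rows are hypothetical, not taken from DATALow). It uses the Inf replacement, so it should behave the same whether or not min skips NaN:

```matlab
% Hypothetical tokens: runway id, first RVR value, optional second RVR value.
% The second row has only one value, so its last token is empty.
alltokens = {'R10', '0550', '1300'; ...
             'R28', '0175', ''};

% str2double works directly on a cell array; the empty token becomes NaN.
allvalues = str2double(alltokens(:, [2 3]));

% Replace NaN by Inf so that it can never be selected as the minimum.
allvalues(isnan(allvalues)) = Inf;

% Row-wise minimum: the smaller of the two values, or the only value present.
minvalues = min(allvalues, [], 2);
```

For these two rows, minvalues should come out as [550; 175].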

Answer 1

Guillaume on 18 Oct 2016

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/307831-how-can-i-arrange-my-output-from-regexp-stored-in-multiple-cells-in-a-for-loop#answer_239558

Edited: Guillaume on 18 Oct 2016

As per my comment on the question, changing the regex to take the optional letter into account is not a problem.

To produce your output, I believe the following would work:

tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens'); %find RVR in DATALow
tokens  = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);  %concatenate all pairs of each row vertically
alltokens = vertcat(tokens{:});  %concatenate it all regardless of row, note that this removes empty rows
allvalues = str2double(alltokens(:, 2)); %convert RVR value from string to number. str2double is a lot safer than str2num and can work on cell arrays
destcol = repelem((1:numel(tokens))', cellfun(@(c) size(c, 1), tokens)); %find column destination for each row of alltokens and allvalues
[runway, ~, destrow] = unique(alltokens(:, 1));   %get unique runway id and row destination for each row of alltokens and allvalues
visibility = nan(numel(runway), numel(tokens));  %initialise output matrix.
%visibility = zeros(numel(runway), numel(tokens)) + 9999; %if you want 9999 instead
visibility(sub2ind(size(visibility), destrow, destcol)) = allvalues;

If I remember correctly, you're using an old version of MATLAB, which may not have repelem, in which case you can use:

repelem = @(v, r) cell2mat(arrayfun(@(n, r) repmat(n, 1, r), v, r, 'UniformOutput', false)')';

for this particular case.

edit: new more versatile regex
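The placement step (destcol, destrow, sub2ind) is easier to follow on a tiny example. The sketch below skips the regexp and starts from made-up inputs: rowsPerReport stands in for cellfun(@(c) size(c, 1), tokens), and the three token rows are hypothetical. It assumes a release that has repelem.

```matlab
% Hypothetical situation: three reports, with 2, 0 and 1 RVR matches respectively.
rowsPerReport = [2; 0; 1];

% All matches stacked vertically; the empty report contributes no rows.
alltokens = {'R10', '0550'; ...
             'R28', '0500'; ...
             'R10', '0200'};
allvalues = str2double(alltokens(:, 2));

% Report index of each stacked row: report 1 twice, report 2 never, report 3 once.
destcol = repelem((1:numel(rowsPerReport))', rowsPerReport);

% Unique runway ids (sorted) and, for each stacked row, the index of its runway.
[runway, ~, destrow] = unique(alltokens(:, 1));

% One row per runway, one column per report; NaN where nothing was reported.
visibility = nan(numel(runway), numel(rowsPerReport));
visibility(sub2ind(size(visibility), destrow, destcol)) = allvalues;
```

Here runway is {'R10'; 'R28'} and visibility is [550 NaN 200; 500 NaN NaN]: the second column stays NaN because the second report had no match, and R28 has no value in the third report.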

Linus Dock on 18 Oct 2016

Awesome Guillaume! This is just what I needed. Just one more thing, I'm using this expression instead:

tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\w(\d{4})|(\d{4})?:((\d{4})?[A-Z]+)?(\d{4})[A-Z]|(\d{4}))', 'tokens');

But I have some strings like this as mentioned above:

'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'

I'm getting an unwanted extra 'P' before my '2000', like this:

'R01'  'P2000'
'R19'  'P2000'

Otherwise it does exactly what I want!
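The extra 'P' presumably appears because the \w that matches the prefix letter sits inside the outer capturing group, so it becomes part of the token. A possible sketch, not the expression from the answer above: match an optional prefix letter outside the token. The class [PM] is an assumption here (P as seen in the data, M as the corresponding "below minimum" prefix), and only the first four-digit value after the slash is captured.

```matlab
% The example string from this comment.
str = 'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008';

% Optional prefix letter is matched but not captured; only the digits form the token.
expr = '\<(R\d{2}[A-Z]?)/[PM]?(\d{4})';
tokens = regexp(str, expr, 'tokens');

% Stack the {runway, value} pairs vertically, as in the answer above.
pairs = vertcat(tokens{:});
```

For this string, pairs should be {'R01', '2000'; 'R19', '2000'}, without the 'P'.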

Answer 2

Andrei Bobrov on 18 Oct 2016

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/307831-how-can-i-arrange-my-output-from-regexp-stored-in-multiple-cells-in-a-for-loop#answer_239554

    tokens = regexp(DATALow, '\<(R\d{2})/(\d{4})[A-Z]+(?:(?:\d{4})[A-Z])?\>', 'tokens');
    out = cellfun(@(x)cat(1,x{:}),tokens,'un',0);

Linus Dock on 19 Oct 2016

Also, '0023' in this string is wrongly extracted:

'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023'

That group contains information about the runway condition and braking action, which I'm not interested in for the moment.

Thank you!

Andrei Bobrov on 19 Oct 2016

tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4,})[A-Z]*(?:(?:\d{4})[A-Z])?\>|(?:\<BECMG\>).*(\<\d{4}\>)', 'tokens');
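As an alternative sketch for skipping the runway-state group (R01/790023) from the comment above: if one assumes that an RVR value is always exactly four digits, a negative lookahead can reject any match where a fifth digit follows. The expression and the two-element test array below are illustrative assumptions (the second string is one of the ESNY reports from the question), not a drop-in replacement for the expression above.

```matlab
% One string with a runway-state group, one with a plain RVR group.
data = {'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023'; ...
        'METAR ESNY 010150Z AUTO 28003KT 0200 R30/0450 FG VV001 12/12 Q1010'};

% (?!\d) fails the match when the four captured digits are followed by another digit,
% so the six-digit group '790023' is not mistaken for an RVR value.
expr = '\<(R\d{2}[A-Z]?)/[PM]?(\d{4})(?!\d)';
tokens = regexp(data, expr, 'tokens');

% Same stacking step as in the answers: one N-by-2 cell (or empty) per report.
pairs = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);
```

With these two strings, pairs{1} should be empty and pairs{2} should be {'R30', '0450'}.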
