
How can I arrange my output from regexp stored in multiple cells in a for loop?

Hi, I am using regexp to extract and match data from a text string with this code:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens'); %find RVR in DATALow
SortTokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %sort RVR as vertical cells
My output is stored in cells within a cell like this:
[]
[]
[]
<4x2 cell>
<4x2 cell>
<4x2 cell>
[]
[]
The output cells contain the data that look like this:
'R01L' '1500'
'R19R' '1500'
'R01R' '1300'
'R19L' '1500'
But the output cells are of different shape and can look like this as well:
[]
<1x2 cell>
<1x2 cell>
[]
My goal is to extract the data with a for-loop that takes the size of each output cell into consideration and stores the result in a cell, using this code:
NoRUNWAY=ones(1,length(SortTokens)); %preallocate a vector of ones for speed
for j=1:length(SortTokens) %for all data in the cell
    NoRws=length(SortTokens{j,1}); %count the length of each row
    if NoRws>0 %if larger than zero
        NoRUNWAY(j)=NoRws; %set the number to the length of the row
    end
end
isemp = cellfun('isempty', tokens); %find all empty cells in tokens
for l=1:length(SortTokens);
    RWYnum=NoRUNWAY(l);
    for k=1:RWYnum
        tempRUNWAY = cellfun(@(x) x{k,1}, SortTokens(~isemp), 'uni', 0);
        tempRVR = cellfun(@(x) x{k,2}, SortTokens(~isemp), 'uni', 0);
        RVR = nan(size(SortTokens));
        RVR(~isemp) = cellfun(@str2num, tempRVR);
        RVRnan=isnan(RVR);
        RVRnanx=find(RVRnan);
        RVR(RVRnanx)=9999;
        RWYcell{1,k}=tempRUNWAY(1);
        RVRcell{1,k}=RVR;
    end
end
The largest output cell is of size
<4x2 cell>
I would like to store the data in a new cell with four columns and ultimately compare these values with some other measurements.
Does this make sense? These are measurements of Runway Visual Range (RVR) at multiple runways from different airports, and I would like to compare them with the meteorological visibility for the same airports. The data I am using, called DATALow, looks like this:
'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'
'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'
'METAR ESNS 010250Z AUTO 00000KT 0050 R10/0550V0800N R28/0175N FG VV000 10/09 Q1012'
'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'
'METAR ESNS 010350Z AUTO 00000KT 0100 R10/0250N R28/0250N FG VV001 10/10 Q1012'
'METAR ESNS 010420Z AUTO VRB02KT 0150 R10/0300N R28/0275N FG VV001 11/11 Q1012'
'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'
And I just realized that my regexp is missing most of the RVR groups because it only matches runway designators of the form:
R19L/
which is not the case for most of the Airports. Can someone please help with this?
15 Comments
Linus Dock on 20 Oct 2016
I think I solved it:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})*[A-Z]*[^ ]*?\>', 'tokens');
Guillaume on 20 Oct 2016
You cannot create a regular expression (even a dynamic one) that would match the smaller of the two numerical groups if both are present. You would have to return both groups and select the minimum afterward.
I believe the following would suit:
%the regexp now returns three tokens per match, the last token of each match may be empty
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4})[A-Z]?(\d{4})?[A-Z]?\>', 'tokens');
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %concatenate all pairs of each row vertically
alltokens = vertcat(tokens{:}); %concatenate it all regardless of row, note that this removes empty rows
allvalues = str2double(alltokens(:, [2 3])); %convert RVR tokens to number. If only one RVR per match, the second token is converted to NaN
minvalues = min(allvalues, [], 2);
If you are using an old version of MATLAB where min does not ignore NaNs by default, replace the NaNs with Inf before the call to min:
allvalues(isnan(allvalues)) = inf;
or use nanmin if the appropriate toolbox (Statistics Toolbox) is installed.
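As a small illustration of the NaN handling described above, here is a sketch with hand-typed numbers. The values are RVR figures taken from the sample reports in the question, and the names fallback and minfallback are only illustrative:

```matlab
% Two RVR tokens per match; the second is NaN when the group holds one value
allvalues = [550 1300;   % from R10/0550V1300N
             200 NaN];   % from R10/0200N
minvalues = min(allvalues, [], 2);   % recent versions skip NaN here

% Fallback for versions where min does not skip NaN: swap NaN for Inf first
fallback = allvalues;
fallback(isnan(fallback)) = Inf;
minfallback = min(fallback, [], 2);
```

Both columns should come out as 550 and 200. One difference to keep in mind: a row consisting only of NaN stays NaN with the first approach but becomes Inf with the fallback.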


Accepted Answer

Guillaume on 18 Oct 2016
Edited: Guillaume on 18 Oct 2016
As per my comment on the question, changing the regex to take the optional letter into account is not a problem.
To produce your output, I believe the following would work:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens'); %find RVR in DATALow
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %concatenate all pairs of each row vertically
alltokens = vertcat(tokens{:}); %concatenate it all regardless of row, note that this removes empty rows
allvalues = str2double(alltokens(:, 2)); %convert RVR value from string to number. str2double is a lot safer than str2num and can work on cell arrays
destcol = repelem((1:numel(tokens))', cellfun(@(c) size(c, 1), tokens)); %find column destination for each row of alltokens and allvalues
[runway, ~, destrow] = unique(alltokens(:, 1)); %get unique runway id and row destination for each row of alltokens and allvalues
visibility = nan(numel(runway), numel(tokens)); %initialise output matrix.
%visibility = zeros(numel(runway), numel(tokens)) + 9999; %if you want 9999 instead
visibility(sub2ind(size(visibility), destrow, destcol)) = allvalues;
If I remember correctly, you're using an old version of MATLAB, which may not have repelem, in which case you can define:
repelem = @(v, r) cell2mat(arrayfun(@(n, r) repmat(n, 1, r), v, r, 'UniformOutput', false)')';
for this particular case.
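To see how destrow and destcol place the values, here is a toy sketch that uses hand-written data instead of regexp output. The three reports, their counts and the variable names ids, vals and counts are hypothetical; it needs repelem or the replacement defined above:

```matlab
% Hypothetical flattened tokens: report 1 has two runways, report 2 one, report 3 none
ids    = {'R10'; 'R28'; 'R10'};   % stands in for alltokens(:, 1)
vals   = [550; 500; 200];         % stands in for allvalues
counts = [2; 1; 0];               % matches per report, i.e. size(c, 1) of each cell

destcol = repelem((1:numel(counts))', counts);   % report index for each row
[runway, ~, destrow] = unique(ids);              % runway index for each row

visibility = nan(numel(runway), numel(counts));  % one row per runway, one column per report
visibility(sub2ind(size(visibility), destrow, destcol)) = vals;
```

Here runway is {'R10'; 'R28'} and visibility ends up as [550 200 NaN; 500 NaN NaN]: reports without a value for a given runway simply keep their NaN.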
edit: new more versatile regex
1 Comment
Linus Dock on 18 Oct 2016
Awesome Guillaume! This is just what I needed. Just one more thing, I'm using this expression instead:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\w(\d{4})|(\d{4})?:((\d{4})?[A-Z]+)?(\d{4})[A-Z]|(\d{4}))', 'tokens');
But I have some strings like this, as mentioned above:
'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'
I'm getting an unwanted extra 'P' before my '2000', like this:
'R01' 'P2000'
'R19' 'P2000'
Otherwise it does exactly what I want!
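One possible way to drop the prefix, sketched on the ESMK string above: keep an optional P or M (the METAR markers for above maximum and below minimum) outside the capturing group. The simplified pattern below is only an illustration. It captures just the first four-digit value after the slash, so it is not a drop-in replacement for the longer expression:

```matlab
str = 'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008';
% [PM]? consumes the prefix letter, so only the digits land in the second token
tokens = regexp(str, '\<(R\d{2}[A-Z]?)/[PM]?(\d{4})', 'tokens');
pairs = vertcat(tokens{:});   % one row per match: runway id, RVR digits
```

With this string, pairs should hold 'R01' '2000' and 'R19' '2000'.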


More Answers (1)

Andrei Bobrov on 18 Oct 2016
tokens = regexp(DATALow, '\<(R\d{2})/(\d{4})[A-Z]+(?:(?:\d{4})[A-Z])?\>', 'tokens');
out = cellfun(@(x)cat(1,x{:}),tokens,'un',0);
6 Comments
Linus Dock on 19 Oct 2016
Also, '0023' in this string is wrongly extracted:
'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023'
That group contains information about the runway condition and braking action, which I'm not interested in at the moment.
Thank you!
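A possible guard against such six-digit runway state groups, as a sketch: a negative lookahead (?!\d) after the four digits rejects the match whenever more digits follow. The pattern below is a simplified illustration, not one of the expressions posted in this thread, and the names pat and rvr are hypothetical:

```matlab
str = 'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023';
pat = '\<(R\d{2}[A-Z]?)/[PM]?(\d{4})(?!\d)';   % four digits not followed by another digit

tokens = regexp(str, pat, 'tokens');           % no match: 7900 is followed by a digit
rvr = regexp('R28/0375N FG', pat, 'tokens');   % an ordinary RVR group still matches
```

Here tokens should come back empty, while rvr{1} should hold 'R28' and '0375'.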
Andrei Bobrov on 19 Oct 2016
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4,})[A-Z]*(?:(?:\d{4})[A-Z])?\>|(?:\<BECMG\>).*(\<\d{4}\>)', 'tokens');

