How to use lazy quantifiers in look ahead?

Question

Serbring on 8 Sep 2024

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/2151129-how-to-use-lazy-quantifiers-in-look-ahead

Edited: Serbring on 8 Sep 2024

test.mat

Hi all,

I need to develop a regular expression for extracting a few numbers from HTML code. In particular, the numbers must be followed by the unit (lbs) and not preceded by the strings "front", "rear" or "drawbar". The HTML tags are what make this complicated for me. I am attaching a workspace containing two strings with the HTML code of the pages from which I need to extract those numbers. From the web1 variable I need to extract only the numbers 2840 and 4630, while from the web2 variable I need to extract the numbers 13338 and 23149.

I probably have to use a lazy quantifier, so I tried regexp(HTMLtext,'(?<!([fF]ront | [rR]ear | [dD]rawbar).*?)\d+(?=\s*lbs)','match'), without success.

4 comments (2 older comments not shown)

Stephen23 on 8 Sep 2024

Edited: Stephen23 on 8 Sep 2024

"I have uploaded a couple of sample HTML codes in the starting message."

Please save the messages in a MAT file and upload that by clicking the paperclip button.

Your explanation is inconsistent with your example. You wrote "the numbers must be followed by the units (lbs) and not by the string "front", "rear" and "drawbar""

I am guessing you mean per line (or more strictly within the parent <tr></tr> tags). Why then are 1940, 4409, etc. not bold? (They are definitely not "followed" by the words you specified.)

Or did you mean "preceded by" rather than "followed by"?

If you want a regular expression to work reliably then you need to define its use case very precisely.

Serbring on 8 Sep 2024

Thanks again. I have attached a .mat file with two strings and updated the initial message. You are right, the numbers must not be preceded by the terms "front", "rear" and "drawbar". Hopefully it is clearer now.


Answer 1

Stephen23 on 8 Sep 2024

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/2151129-how-to-use-lazy-quantifiers-in-look-ahead#answer_1512829

Edited: Stephen23 on 8 Sep 2024

test.mat

Regular expressions are the wrong tool for this. It might be possible with some effort, but personally I would just use the correct tools from the start:

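% Load the two HTML strings from the attached MAT file
S = load('test.mat');
w1 = S.web1;
w2 = S.web2;
% Parse the HTML (htmlTree, findElement and extractHTMLText are in Text Analytics Toolbox)
t1 = htmlTree(w1);
t2 = htmlTree(w2);
% Collect all table cells (<td> elements)
td1 = findElement(t1,'td');
td2 = findElement(t2,'td');
% Get the plain text of each cell
tx1 = extractHTMLText(td1);
tx2 = extractHTMLText(td2);
% Indices of the cells whose text contains the unit
ix1 = find(contains(tx1,'lbs'));
ix2 = find(contains(tx2,'lbs'));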

Text 1:

hdr = tx1(ix1-1)
hdr = 4x1 string array
    "Shipping:"
    "Max capacity:"
    "Max front axle:"
    "Max rear axle:"
val = tx1(ix1)
val = 4x1 string array
    "2840 lbs..."
    "4630 lbs..."
    "1940 lbs..."
    "3527 lbs..."

Text 2:

hdr = tx2(ix2-1)
hdr = 3x1 string array
    ""
    "Max capacity:"
    "Max Drawbar:"
val = tx2(ix2)
val = 3x1 string array
    "13338 lbs..."
    "23149 lbs..."
    "4409 lbs..."

Then filter for what you want (rather than for what you do not want) using some basic text tools, e.g. CONTAINS.
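As a minimal sketch of that filtering step, the snippet below rebuilds hdr and val by hand from the output displayed above, so that it runs without test.mat. The names unwanted, keep1, keep2, num1 and num2 are illustrative and not part of the original answer, and the sketch assumes that each value starts with the number, followed by a single space and "lbs".

```matlab
% Headers and values copied from the output displayed above (assumption:
% in practice these would come from tx1(ix1-1), tx1(ix1), etc.)
hdr1 = ["Shipping:"; "Max capacity:"; "Max front axle:"; "Max rear axle:"];
val1 = ["2840 lbs..."; "4630 lbs..."; "1940 lbs..."; "3527 lbs..."];
hdr2 = [""; "Max capacity:"; "Max Drawbar:"];
val2 = ["13338 lbs..."; "23149 lbs..."; "4409 lbs..."];

% Words that disqualify a row when they appear in its header cell
unwanted = ["front", "rear", "drawbar"];

% Keep the rows whose header contains none of the unwanted words
keep1 = ~contains(hdr1, unwanted, 'IgnoreCase', true);
keep2 = ~contains(hdr2, unwanted, 'IgnoreCase', true);

% Take the text in front of " lbs" and convert it to a number
num1 = str2double(extractBefore(val1(keep1), " lbs"));
num2 = str2double(extractBefore(val2(keep2), " lbs"));
```

With the data shown, num1 holds 2840 and 4630, and num2 holds 13338 and 23149, which are the numbers requested in the question. The empty header in the second text contains none of the unwanted words, so its row is kept.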

1 comment

Serbring on 8 Sep 2024

Edited: Serbring on 8 Sep 2024

Thank you. I did not know these (new to me) functions for parsing HTML code, and I have been using regular expressions for such things for years. This will change everything in some of my activities. You saved me. Thank you so much!!!
