How to use lazy quantifiers in look ahead?
Serbring
on 8 Sep 2024
Edited: Serbring
on 8 Sep 2024
Hi all,
I need to develop a regular expression for extracting a few numbers from HTML code. In particular, the numbers must be followed by the units (lbs) and not preceded by the strings "front", "rear", or "drawbar". The HTML tags are what make this complicated for me. I am attaching a workspace with two strings containing the HTML code of the pages from which I need to extract those numbers. From the web1 variable I need to extract only the numbers 2840 and 4630, while from the web2 variable I need to extract the numbers 13338 and 23149.
I probably have to use a lazy quantifier. I tried regexp(HTMLtext,'(?<!([fF]ront | [rR]ear | [dD]rawbar).*?)\d+(?=\s*lbs)','match'), with no success.
4 Comments
Stephen23
on 8 Sep 2024
Edited: Stephen23
on 8 Sep 2024
"I have uploaded a couple of sample HTML codes in the starting message."
Please save the messages in a MAT file and upload that by clicking the paperclip button.
Your explanation is inconsistent with your example. You wrote "the numbers must be followed by the units (lbs) and not by the string "front", "rear" and "drawbar""
I am guessing you mean per line (or, more strictly, within the parent <tr></tr> tags). Why then are 1940, 4409, etc. not bold? (They are definitely not "followed" by the words you specified.)
Or did you mean "preceded by" rather than "followed by"?
If you want a regular expression to work reliably, then you need to define its use case very precisely.
Accepted Answer
Stephen23
on 8 Sep 2024
Edited: Stephen23
on 8 Sep 2024
Regular expressions are the wrong tool for this. It might be possible with some effort, but personally I would just use the correct tools from the start:
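% Load the two HTML strings from the uploaded MAT file.
S = load('test.mat');
w1 = S.web1;
w2 = S.web2;
% Parse the HTML (htmlTree requires Text Analytics Toolbox).
t1 = htmlTree(w1);
t2 = htmlTree(w2);
% Collect every table cell and extract its visible text.
td1 = findElement(t1,'td');
td2 = findElement(t2,'td');
tx1 = extractHTMLText(td1);
tx2 = extractHTMLText(td2);
% Indices of the cells whose text contains the unit.
ix1 = find(contains(tx1,'lbs'));
ix2 = find(contains(tx2,'lbs'));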
Text 1:
hdr = tx1(ix1-1)
val = tx1(ix1)
Text 2:
hdr = tx2(ix2-1)
val = tx2(ix2)
Then filter for what you want (rather than for what you do not want) using some basic text tools, e.g. CONTAINS.
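As a rough illustration of that last filtering step, here is a minimal sketch. The header and value strings below are made-up stand-ins (the real cell text in test.mat is not reproduced here), and the keyword "weight" is only an assumed example of "what you want"; substitute whatever your actual headers contain.

```matlab
% Hypothetical stand-ins for the hdr/val string arrays obtained above.
% The real header wording in the HTML pages may differ.
hdr = ["Front axle"; "Operating weight"; "Rear lift"; "Ballasted weight"];
val = ["1940 lbs"; "2840 lbs"; "4409 lbs"; "4630 lbs"];

% Keep the rows whose header mentions the wanted keyword
% ("weight" is an assumption made for this example).
keep = contains(hdr, "weight", 'IgnoreCase', true);

% Take the text in front of the unit and convert it to a number.
num = str2double(extractBefore(val(keep), "lbs"));
```

With these made-up inputs, num should come out as [2840; 4630]. The same two lines can be applied to the hdr/val pair from each page.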