How do I use regexp to extract text between numbers
4 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
I have a string that I extracted from a pdf
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
How can I use the function regexp to extract all the descriptions between the numbers to put them into a 16x1 matrix. So the end product I want will be a 16x1 string that looks like
- Receptacles, general purpose.
- Receptacles with integral GFCI.
- USB Charger receptacles.
- AFCI receptacles.
- Twist-locking receptacles.
- Isolated-ground receptacles.
- Tamper-resistant receptacles.
- Weather-resistant receptacles.
- Pendant cord-connector devices.
- Cord and plug sets.
- Wall box dimmers.
- Wall box dimmer/sensors.
- Wall box occupancy/vacancy sensors.
- Toggle Switches.
- Floor service outlets.
- Associated device plates.
I also have this line of code
parts = regexp(str,'^\d*+.*$','dotexceptnewline','lineanchors');
which finds the index of each number in the string. I think I could then use all the index values to write a for loop to extract the text that is in between the text
4 Kommentare
Rik
am 9 Nov. 2019
Is this the exact text of your char array? Or are there actually some char(10) in there?
Antworten (2)
per isakson
am 9 Nov. 2019
Bearbeitet: per isakson
am 9 Nov. 2019
"So the end product I want will be a 16x1 string that looks like" I'm not sure exactly how understand your requirement.
The problem is the delimiter that looks a bit like the character on my ENTER key ( ↵). After copy&paste from your question the hex number of that character is \x21B5.
Try
%%
z = regexp( str, "\x21B5+", 'split' );
z = strtrim( z );
z( isstring(z) & strlength(z)==0 ) = [];
%%
% z = regexp( z, "(?<=\d+\.\x20).+$", 'match', 'once' ); % removes the numbers
out = reshape( z, [],1 );
%%
fprintf( 1, '%s\n', out );
outputs in the command window
1. Receptacles, general purpose.
2. Receptacles with integral GFCI.
3. USB Charger receptacles.
4. AFCI receptacles.
5. Twist-locking receptacles.
6. Isolated-ground receptacles.
....
and
>> out(1:4)
ans =
4×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."
0 Kommentare
JESUS DAVID ARIZA ROYETH
am 9 Nov. 2019
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
parts = regexp(str,'\d+\. +[.\w,-/\s]+\.','match')'
parts =
16×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."
"5. Twist-locking receptacles."
"6. Isolated-ground receptacles."
"7. Tamper-resistant receptacles."
"8. Weather-resistant receptacles."
"9. Pendant cord-connector devices."
"10. Cord and plug sets."
"11. Wall box dimmers."
"12. Wall box dimmer/sensors."
"13. Wall box occupancy/vacancy sensors."
"14. Toggle Switches."
"15. Floor service outlets."
"16. Associated device plates."
0 Kommentare
Siehe auch
Kategorien
Mehr zu String Parsing finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!