How do I use regexp to extract text between numbers

Ean Hendrickson on 9 Nov 2019
Edited: per isakson on 9 Nov 2019
I have a string that I extracted from a PDF:
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
How can I use the function regexp to extract all the descriptions between the numbers and put them into a 16x1 array? The end product I want is a 16x1 string array that looks like this:
  1. Receptacles, general purpose.
  2. Receptacles with integral GFCI.
  3. USB Charger receptacles.
  4. AFCI receptacles.
  5. Twist-locking receptacles.
  6. Isolated-ground receptacles.
  7. Tamper-resistant receptacles.
  8. Weather-resistant receptacles.
  9. Pendant cord-connector devices.
  10. Cord and plug sets.
  11. Wall box dimmers.
  12. Wall box dimmer/sensors.
  13. Wall box occupancy/vacancy sensors.
  14. Toggle Switches.
  15. Floor service outlets.
  16. Associated device plates.
I also have this line of code
parts = regexp(str,'^\d*+.*$','dotexceptnewline','lineanchors');
which finds the index of each number in the string. I think I could then use those index values in a for loop to extract the text in between.
  4 Comments
Rik on 9 Nov 2019
Is this the exact text of your char array? Or are there actually some char(10) in there?
Ean Hendrickson on 9 Nov 2019
This is the exact text I extracted from a PDF. There should be no char(10) in there. I used extractFileText, strfind and extractBetween to get the above text.


Answers (2)

per isakson on 9 Nov 2019
Edited: per isakson on 9 Nov 2019
"So the end product I want will be a 16x1 string that looks like" I'm not sure exactly how to understand your requirement.
The problem is the delimiter, which looks a bit like the character on my ENTER key (↵). After copying and pasting from your question, the hex code of that character is 21B5, written \x21B5 in a regular expression.
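One way to confirm which character separates the items, and whether any real line feeds (char(10)) are present, is to look at the numeric code points. This is a minimal sketch on a shortened sample string, built with char(8629) so that it does not depend on file encoding; the variable names are illustrative only.

```matlab
% Shortened sample; char(8629) is the arrow character U+21B5.
arrow = char(8629);
s = string([arrow '1. Receptacles, general purpose. ' arrow '2. Toggle Switches. ' arrow]);

codes = double(char(s));      % numeric code point of every character
delim = dec2hex(codes(1));    % hex code of the first character, as a char vector
hasLF = any(codes == 10);     % true only if real line feeds are present

disp(delim)
disp(hasLF)
```

If hasLF were true, the text would contain actual newlines and an approach based on line anchors could apply; here the separator is an ordinary symbol character, so it has to be named explicitly in the pattern.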
Try
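%% Split on runs of the delimiter (U+21B5) and drop the empty pieces
z = regexp( str, "\x21B5+", 'split' );
z = strtrim( z );
z( isstring(z) & strlength(z)==0 ) = [];
%% Optional: uncomment the next line to strip the leading numbers
% z = regexp( z, "(?<=\d+\.\x20).+$", 'match', 'once' ); % removes the numbers
out = reshape( z, [],1 );   % column vector
%% Print one entry per line
fprintf( 1, '%s\n', out );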
outputs in the command window
1. Receptacles, general purpose.
2. Receptacles with integral GFCI.
3. USB Charger receptacles.
4. AFCI receptacles.
5. Twist-locking receptacles.
6. Isolated-ground receptacles.
....
and
>> out(1:4)
ans =
4×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."
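If only the descriptions are wanted, the number prefixes can be stripped after splitting. This is a minimal sketch on a shortened sample, using regexprep in place of the commented-out lookbehind line above; the names arrow and desc are illustrative and not part of the original answer.

```matlab
% Shortened sample built with char(8629), the arrow character U+21B5.
arrow = char(8629);
str = string([arrow arrow '1. Receptacles, general purpose. ' arrow ...
              '2. Receptacles with integral GFCI. ' arrow ...
              '10. Cord and plug sets. ' arrow arrow]);

% Split on runs of the delimiter, trim, and drop empty pieces.
z = strtrim(regexp(str, "\x21B5+", 'split'));
z(strlength(z) == 0) = [];

% Remove a leading "N. " prefix from each entry and return a column vector.
desc = regexprep(z, "^\d+\.\s*", "");
desc = reshape(desc, [], 1);

disp(desc)
```

Anchoring the pattern with ^ means that only the prefix at the start of each entry is removed, so digits that appear later in a description are left alone.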

JESUS DAVID ARIZA ROYETH on 9 Nov 2019
str = "↵↵↵1. Receptacles, general purpose. ↵2. Receptacles with integral GFCI. ↵3. USB Charger receptacles. ↵4. AFCI receptacles. ↵5. Twist-locking receptacles. ↵6. Isolated-ground receptacles. ↵7. Tamper-resistant receptacles. ↵8. Weather-resistant receptacles. ↵9. Pendant cord-connector devices. ↵10. Cord and plug sets. ↵11. Wall box dimmers. ↵12. Wall box dimmer/sensors. ↵13. Wall box occupancy/vacancy sensors. ↵14. Toggle Switches. ↵15. Floor service outlets. ↵16. Associated device plates. ↵↵"
parts = regexp(str,'\d+\. +[.\w,-/\s]+\.','match')'
parts =
16×1 string array
"1. Receptacles, general purpose."
"2. Receptacles with integral GFCI."
"3. USB Charger receptacles."
"4. AFCI receptacles."
"5. Twist-locking receptacles."
"6. Isolated-ground receptacles."
"7. Tamper-resistant receptacles."
"8. Weather-resistant receptacles."
"9. Pendant cord-connector devices."
"10. Cord and plug sets."
"11. Wall box dimmers."
"12. Wall box dimmer/sensors."
"13. Wall box occupancy/vacancy sensors."
"14. Toggle Switches."
"15. Floor service outlets."
"16. Associated device plates."
