Use regexp to extract an index

Question


trans_setup_2=ASSIGN {
# Blower speed in rpm
  variable = SPEED
  value = 1234
}
ASSIGN {
# Resulting time increment
  variable = TIME_INCREMENT
  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE
}

trans_setup_2 is a cell array created from the text file containing the above text.

I would like to extract the index of ASSIGN { # Blower speed in rpm (only the first occurrence) along with the index of the string value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE.

I have tried the following:

ex='(?=ASSIGN).*|(?<=value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_B).*' 
ones=~cellfun(@isempty,regexp(trans_setup_2, ex, 'match'))

How should I adapt the 'ex' search pattern to get exactly the first occurrence of ASSIGN and then value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE?

Please help me with this.
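Below is a minimal sketch of one way to get both indices. It assumes trans_setup_2 holds one line of the file per cell; the sample cell array is made up from the posted extract. A single regular expression cannot pick "only the first" match across cells by itself. The sketch therefore builds one logical mask per pattern with regexp and lets find(..., 1) choose the first hit. The start line is taken to be the first ASSIGN line that is immediately followed by the "# Blower speed" comment, which is an assumption about the file layout.

```matlab
% Sample lines standing in for trans_setup_2 (assumption: one text line per cell)
trans_setup_2 = {
    'ASSIGN {'
    '# Blower speed in rpm'
    '  variable = SPEED'
    '  value = 1234'
    '}'
    'ASSIGN {'
    '# Resulting time increment'
    '  variable = TIME_INCREMENT'
    '  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'
    '}'
    };

% Logical masks: true for every line that matches the pattern
isAssign = ~cellfun(@isempty, regexp(trans_setup_2, '^\s*ASSIGN\s*\{', 'once'));
isBlower = ~cellfun(@isempty, regexp(trans_setup_2, '^\s*#\s*Blower speed', 'once'));
isValue  = ~cellfun(@isempty, regexp(trans_setup_2, ...
    '^\s*value\s*=\s*60\s*/\s*SPEED\s*/\s*NR_BLADES\s*/\s*NR_TIME_STEPS_PER_BLADE\s*$', 'once'));

% Start: first ASSIGN line that is immediately followed by the "# Blower speed" comment
startIdx = find(isAssign(1:end-1) & isBlower(2:end), 1);
% End: first matching value line at or after the start
endIdx = startIdx - 1 + find(isValue(startIdx:end), 1);

block = trans_setup_2(startIdx:endIdx);   % lines to copy into the other file
```

The sketch has no error handling. If no line matches, startIdx is empty, so real code should check it before indexing.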

7 Comments

Nit C on 13 Sep 2021

Edited: Nit C on 13 Sep 2021

trans_setup_2.txt

@Stephen @Walter Roberson, sorry for the late reply. Thanks for taking the time to answer my question.

Attached is a sample text file (the actual file is more than 2000 lines long). I need to search it and extract the indices of certain text and words in it, so that I can copy the text from one index up to a target index and insert it into another input text file.

E.g., I would like to find the first occurrence of 'ASSIGN' along with ' # Blower ...

ASSIGN {
# Blower speed in rpm

In my main text file the word ASSIGN occurs more than 10 times. So I am interested in getting a start index at the first ASSIGN and an end index at the line value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE, so that I get the range of lines to copy into another text file.

Similarly, I would like to find the index at which each of the following text patterns occurs in my file, e.g.

Text pattern to search:

MESH_MOTION( "wheel-_3D_of_fluid-wheel" ) {
type = rotation
}

and

Another text pattern to search:

SIMPLE_BOUNDARY_CONDITION( "wall__fluid-wheel" ) {
shape = three_node_triangle
element_set = "fluid-wheel-_3D_of_fluid-wheel"

My problem is that I have a cell array with many rows in which to look for the pattern. I have tried many combinations of regular expressions to build the pattern, but I am not getting the exact text.

Thanks (sorry for my English)
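For the MESH_MOTION and SIMPLE_BOUNDARY_CONDITION headers, a more general pattern can capture both the keyword and the quoted name. The same expression then finds every block header. The sketch below assumes one line per cell and assumes each header is written as KEYWORD( "name" ) { on a single line. The sample lines are copied from the patterns above.

```matlab
% Sample lines (assumption: one text line per cell, as in trans_setup_2)
txt = {
    'MESH_MOTION( "wheel-_3D_of_fluid-wheel" ) {'
    '  type = rotation'
    '}'
    'SIMPLE_BOUNDARY_CONDITION( "wall__fluid-wheel" ) {'
    '  shape = three_node_triangle'
    '  element_set = "fluid-wheel-_3D_of_fluid-wheel"'
    '}'
    };

% Generic block header: KEYWORD( "name" ) {  -> tokens {KEYWORD, name}
hdr = regexp(txt, '^\s*(\w+)\(\s*"([^"]*)"\s*\)\s*\{', 'tokens', 'once');

hdrIdx   = find(~cellfun(@isempty, hdr));                           % header line indices
keywords = cellfun(@(t) t{1}, hdr(hdrIdx), 'UniformOutput', false); % e.g. 'MESH_MOTION'
names    = cellfun(@(t) t{2}, hdr(hdrIdx), 'UniformOutput', false); % text inside the quotes

% Look up a specific block by keyword (and optionally name)
meshIdx = hdrIdx(find(strcmp(keywords, 'MESH_MOTION') & ...
                      strcmp(names, 'wheel-_3D_of_fluid-wheel'), 1));
bcIdx   = hdrIdx(find(strcmp(keywords, 'SIMPLE_BOUNDARY_CONDITION'), 1));
```

Matching on \w+ instead of a fixed keyword means one regexp call serves every block type. The keyword and name are then compared with plain strcmp.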

Walter Roberson on 13 Sep 2021

Consider the extract you posted above,

trans_setup_2=ASSIGN {
# Blower speed in rpm
  variable = SPEED
  value = 1234
}
ASSIGN {
# Resulting time increment
  variable = TIME_INCREMENT
  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE
}

exactly what output would you like from this? When you talk about "index", do you mean that trans_setup_2 is a character vector and you want the value J such that trans_setup_2(J) is the '1' at the start of '1234', and the value K such that trans_setup_2(K) is the '6' at the start of '60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'? Or do you need start and end indices: J1 and J2 such that trans_setup_2(J1:J2) is '1234', and K1 and K2 such that trans_setup_2(K1:K2) is '60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'?

Or do you want the text '1234' and '60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE' extracted, without caring about the indices into trans_setup_2 at which they occur?
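If the character-index reading is the intended one, regexp can report the positions directly. The sketch below assumes the text is a single character vector, such as fileread returns. The 'tokenExtents' output gives the [start end] of each captured group, which corresponds to J1/J2 and K1/K2 above.

```matlab
% The posted extract as one character vector, e.g. what fileread would return
txt = sprintf(['ASSIGN {\n# Blower speed in rpm\n  variable = SPEED\n  value = 1234\n}\n' ...
               'ASSIGN {\n# Resulting time increment\n  variable = TIME_INCREMENT\n' ...
               '  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE\n}\n']);

% Capture the rest of each "value = ..." line; return token positions and token text
[ext, tok] = regexp(txt, 'value = ([^\r\n]+)', 'tokenExtents', 'tokens');

% ext{k} is the [start end] pair of the captured text in match k
J1 = ext{1}(1);  J2 = ext{1}(2);   % txt(J1:J2) is '1234'
K1 = ext{2}(1);  K2 = ext{2}(2);   % txt(K1:K2) is the TIME_INCREMENT expression
```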

In the cases where the extracted text is a valid number, do you want the saved value automatically converted to double precision?

Do you need a table as output,

SPEED    TIME_INCREMENT
1234     '60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'

Do you need a struct,

struct('SPEED', 1234, 'TIME_INCREMENT', '60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE')

Or do you need something else?
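For the struct form, here is a sketch that reads each variable/value pair with named tokens. Values that are plain numbers are converted with str2double, and anything else is kept as text. It assumes each block has exactly one "variable = ..." line followed by one "value = ..." line.

```matlab
% The posted extract as one character vector (sample text)
txt = sprintf(['ASSIGN {\n# Blower speed in rpm\n  variable = SPEED\n  value = 1234\n}\n' ...
               'ASSIGN {\n# Resulting time increment\n  variable = TIME_INCREMENT\n' ...
               '  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE\n}\n']);

% One struct element per ASSIGN block, fields 'variable' and 'value' (both text)
pairs = regexp(txt, 'variable = (?<variable>\S+)\s+value = (?<value>[^\r\n]+)', 'names');

% Build struct('SPEED', 1234, 'TIME_INCREMENT', '60 / ...'): plain numbers become double
out = struct();
for k = 1:numel(pairs)
    num = str2double(pairs(k).value);            % NaN if the value is not a plain number
    if isnan(num)
        out.(pairs(k).variable) = pairs(k).value; % keep expressions as text
    else
        out.(pairs(k).variable) = num;
    end
end
```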

"I want as output, the following variables: X. X should be a struct array, one entry per block. Each entry should have a field named 'variable' that should be a categorical, and a field named 'value' that should be ..."

Walter Roberson on 13 Sep 2021

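trans_setup_path = fullfile('D:\timpts', 'trans_setup_2.txt');
S = fileread(trans_setup_path);                      % whole file as one character vector
S = regexp(S, '^ASSIGN\s', 'split', 'lineanchors');  % split at every line that starts with ASSIGN
S = regexprep(S, '^\{', 'ASSIGN {');                 % restore the "ASSIGN " consumed by the split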

Now S should be a cell array of character vectors. The first one should start with

trans_setup_2=ASSIGN {

and the others should start with

ASSIGN {

and each of them should be an exact copy of a {} block of text.
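To check the splitting step without the file, the same two calls can be run on an in-memory copy of the posted extract. This is a sketch; sprintf is used only to build the sample text with newlines.

```matlab
% In-memory copy of the posted extract (stands in for fileread(trans_setup_path))
S = sprintf(['trans_setup_2=ASSIGN {\n# Blower speed in rpm\n  variable = SPEED\n  value = 1234\n}\n' ...
             'ASSIGN {\n# Resulting time increment\n  variable = TIME_INCREMENT\n' ...
             '  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE\n}\n']);
% Split wherever a line begins with "ASSIGN" plus one whitespace character
S = regexp(S, '^ASSIGN\s', 'split', 'lineanchors');
% Pieces after the first now begin with "{"; put the consumed "ASSIGN " back
S = regexprep(S, '^\{', 'ASSIGN {');
```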

You probably do not need to know the line numbers to copy: you have the blocks of text right there, so you can copy out of the blocks.

You can parse each block,

vals = regexp(S, 'variable = (?<variable>\S+).*value = (?<value>[^\r\n]+)', 'names');

and that should get you a cell array with one struct per block, each with fields 'variable' and 'value'. You can search those for the variable names you are looking for to decide whether you want to copy the block.
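Here is a sketch of the selection step on two sample blocks; wanted is a hypothetical variable name. Because S is a cell array, regexp returns a cell array here, one struct per block, so the test runs through cellfun.

```matlab
% Two blocks as produced by the split (sample text)
S = {sprintf('ASSIGN {\n# Blower speed in rpm\n  variable = SPEED\n  value = 1234\n}\n'), ...
     sprintf('ASSIGN {\n# Resulting time increment\n  variable = TIME_INCREMENT\n  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE\n}\n')};

% vals{k} holds the struct for block S{k} (empty if the block has no match)
vals = regexp(S, 'variable = (?<variable>\S+).*value = (?<value>[^\r\n]+)', 'names');

wanted = 'TIME_INCREMENT';   % hypothetical: the variable whose block you want to copy
keep = cellfun(@(v) ~isempty(v) && strcmp(v(1).variable, wanted), vals);
selected = S(keep);          % blocks to write to the other file
```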

Copying the block is

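wanted = {'SPEED', 'TIME_INCREMENT'};   % hypothetical: variables whose blocks you want
outfid = fopen('output.txt', 'w');      % hypothetical output file
number_of_blocks_written = 0;
for K = 1 : numel(S)
   if isempty(vals{K}) || ~ismember(vals{K}(1).variable, wanted)
      continue;                         % not a block you want to copy
   end
   if number_of_blocks_written > 0
      fprintf(outfid, '\n');            % separator between blocks, none before the first
   end
   fwrite(outfid, S{K});
   number_of_blocks_written = number_of_blocks_written + 1;
end
fclose(outfid);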

The check on whether to write \n is there to avoid writing extra newlines: a newline has probably been eaten by the process of finding the lines that begin with ASSIGN.

Nit C on 15 Sep 2021

@Walter Roberson, thanks. This solved my problem.


Answer 1

Mathieu NOE on 13 Sep 2021

Hello,

my two cents: a suggestion using readlines and working on strings.

This simple code can be expanded or modified according to what you need.

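rr = readlines('trans_setup_2.txt');        % string array, one element per line
rr_strip = strip(rr, 'left');               % remove leading blanks
a = find(strcmp(rr_strip, 'ASSIGN {'), 1);  % first ASSIGN line
b = find(strcmp(rr_strip, 'value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE'));
b = b(find(b > a, 1));                      % first matching value line after that ASSIGN
text_extract1 = rr(a:b);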

1 Comment

Nit C on 13 Sep 2021

@Mathieu NOE thanks.

I had used strcmp, but I am interested in going with 'regexp' because I have many selections of text to make based on general patterns and keywords instead of exact text.
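Here is a sketch of the readlines approach with regular expressions in place of exact strings, so the same loop can serve several selections. The sample string array stands in for readlines('trans_setup_2.txt'), and the start/end pattern pairs are illustrative assumptions. Each selection runs from the first start-pattern line to the first end-pattern line after it. There is no handling for a pattern that never matches; a(1) would then error.

```matlab
% Stand-in for rr = readlines('trans_setup_2.txt') (hypothetical sample content)
rr = ["ASSIGN {"; "# Blower speed in rpm"; "  variable = SPEED"; "  value = 1234"; "}"; ...
      "ASSIGN {"; "# Resulting time increment"; "  variable = TIME_INCREMENT"; ...
      "  value = 60 / SPEED / NR_BLADES / NR_TIME_STEPS_PER_BLADE"; "}"; ...
      "MESH_MOTION( ""wheel-_3D_of_fluid-wheel"" ) {"; "  type = rotation"; "}"];
lines_c = cellstr(rr);

% Helper: indices of all lines matching a regular expression
matchIdx = @(pat) find(~cellfun(@isempty, regexp(lines_c, pat, 'once')));

% Each row: {start pattern, end pattern}
selections = {
    '^\s*ASSIGN\s*\{',       '^\s*value\s*=\s*60\s*/.*NR_TIME_STEPS_PER_BLADE\s*$'
    '^\s*MESH_MOTION\(\s*"', '^\s*\}\s*$'
    };

extracts = cell(size(selections, 1), 1);
for k = 1:size(selections, 1)
    a = matchIdx(selections{k, 1});
    a = a(1);                      % first start line
    b = matchIdx(selections{k, 2});
    b = b(find(b > a, 1));         % first end line after it
    extracts{k} = rr(a:b);         % lines a..b inclusive
end
```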
