Best solution to finding repeating characters on a line.

Question

Matthew Worker on 13 Jul 2021

Direct link to this question: https://de.mathworks.com/matlabcentral/answers/877228-best-solution-to-finding-repeating-characters-on-a-line

Commented: Rena Berman on 26 Sep 2023

I am looking for any instances of two characters (e/d) being repeated in a row greater than or equal to 10 times. I just want to either print to the command line every line on which this occurs, or stop and print the location every time it is detected. Basically, I am trying to find where e and d show up grouped together ten or more times in a large data file. For example:

asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs

asseefadfefeeedddeeedddasdfsdf

asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs

asseefadfefeeedddeeedddasdfsdf

The script would then print out line 2 and line 4 in the command line.

Thank you for your help

1 Comment

Rena Berman on 26 Sep 2023

(Answers Dev) Restored edit


Answer 1

Stephen23 on 13 Jul 2021

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/877228-best-solution-to-finding-repeating-characters-on-a-line#answer_745433

Edited: Stephen23 on 13 Jul 2021

inp = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs';'asseefadfefaaadddaaadddasdfsdf';'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs';'asseefadfefaaadddaaadddasdfsdf'};
rgx = '(.)(??$1*)(.?)(??[$1$2]*)';
spl = regexp(inp,rgx,'match');
idx = cellfun(@(c)any(cellfun(@numel,c)>9),spl);
find(idx)
ans = 2×1
     2
     4
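To see how the dynamic expression in this answer splits a line, the sketch below applies the same pattern to one of the example lines and lists the length of every piece. As far as I can tell, each piece is one character, its repeats, an optional second character, and then any mix of those two. A piece of 10 or more characters therefore marks a qualifying line. The variable names are illustrative, and the sketch assumes the line holds only letters, as in the question.

```matlab
% Example line taken from the answer's input (uses 'a' where the question uses 'e').
str = 'asseefadfefaaadddaaadddasdfsdf';
% (.)          first character, captured as token 1
% (??$1*)      dynamic expression: further copies of token 1
% (.?)         an optional second character, captured as token 2
% (??[$1$2]*)  dynamic expression: any mix of tokens 1 and 2
rgx = '(.)(??$1*)(.?)(??[$1$2]*)';
pieces = regexp(str, rgx, 'match');   % consecutive, non-overlapping pieces
lens = cellfun(@numel, pieces);       % length of each piece
disp(pieces)
disp(lens)
```

Because `(.)` can start a match at any character, the pieces cover the whole line end to end. Checking `any(lens > 9)` per line is exactly what the `cellfun` call in the answer does.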

5 Comments (3 older comments not shown)

Walter Roberson on 13 Jul 2021

The bold text does not represent repetitions this time, unless you mean repetition between lines. In the previous example there were two halves, with the second being the same as the first.

If the task is to find places where there is a string of at least 10 d or e characters then

'[de]{10,}'

can find that, and the 'once' option, isempty, and the indexing from my Answer give you the rest. It just depends on your having used readlines() on the file.
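Here is a minimal sketch of that combination. It prints the line number and text of every matching line. A small demonstration file written with tempname stands in for the real data file. The file is split with fileread and regexp instead of readlines, so it does not depend on the release having readlines; swap in readlines if it is available.

```matlab
% Write a demonstration file with the example lines from the question.
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', ...
    'asseefadfefeeedddeeedddasdfsdf', ...
    'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', ...
    'asseefadfefeeedddeeedddasdfsdf');
fclose(fid);

% Read the file and split it into a cell array with one line per cell.
txtLines = regexp(fileread(fname), '\r?\n', 'split');

% With 'once', lines without a run of 10 or more d/e characters give ''.
hit = ~cellfun(@isempty, regexp(txtLines, '[de]{10,}', 'once'));
lineNumbers = find(hit);

% Print each matching line with its line number.
for k = lineNumbers
    fprintf('Line %d: %s\n', k, txtLines{k});
end
delete(fname);
```

For the four example lines this reports lines 2 and 4, matching what the question expects.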

Stephen23 on 13 Jul 2021

Edited: Rena Berman on 22 Sep 2023

Matthew Worker: are the specific characters known in advance? Or do you want to detect them automatically? (i.e. detect any two characters that are repeated more than 10 times contiguously)

Are there any particular patterns that you need to include/exclude? (e.g. does 10 'e' characters in a row count, or does the sequence have to include at least one 'd' character?).
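Suppose the answer to the second question is that a run must contain at least one 'd' and at least one 'e'. One possible approach (a sketch, not something this thread settled on) is to collect the candidate runs with '[de]{10,}', then keep only those that contain both characters. The test string is made up for illustration.

```matlab
% Made-up test string: a run of 12 'e' only, then a mixed d/e run of 12.
str = ['x' repmat('e', 1, 12) 'y' 'dddeeedddeee'];
runs = regexp(str, '[de]{10,}', 'match');   % all runs of 10+ d/e characters
% Keep runs that contain at least one 'd' and at least one 'e'.
mixed = cellfun(@(r) any(r == 'd') && any(r == 'e'), runs);
disp(runs(mixed))
```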


Answer 2

Walter Roberson on 13 Jul 2021

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/877228-best-solution-to-finding-repeating-characters-on-a-line#answer_745428

You say "10 or over", so is it correct that the program needs to find all possible patterns? For example,

'adadadadaaaadadadadaaa'
ans = 'adadadadaaaadadadadaaa'

(length 22) should be located if it exists?

S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf'}
S = 1×2 cell array
    {'asseefadfefaaadddaaadddasdfsdf'}    {'asseeadadadadaaaadadadadaaadfsdf'}
matches = regexp(S, '([ad]{5,})\1', 'match');
celldisp(matches)
 
matches{1}{1} =
 
aaadddaaaddd
 
 
matches{2}{1} =
 
adadadadaaaadadadadaaa
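The \1 in that pattern is a backreference. It must match exactly the text that the first group captured, so the pattern finds a stretch of five or more a/d characters that is immediately repeated. Asking regexp for 'tokens' instead of 'match' returns the repeated unit itself. Here is a short sketch using the second example string:

```matlab
str = 'adadadadaaaadadadadaaa';
% ([ad]{5,}) captures a unit of 5+ a/d characters; \1 requires an identical copy.
tok = regexp(str, '([ad]{5,})\1', 'tokens', 'once');
disp(tok{1})   % the unit that is repeated
```

Here the unit is the 11-character first half, `adadadadaaa`.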
 

5 Comments (3 older comments not shown)

Matthew Worker on 13 Jul 2021

@Walter Roberson Two specific characters (d/e); sorry, I should have gone more in depth. The problem is already solved, as I can convert the data to an array. But if you know a way of going through a block of text data (in a txt file) and reporting the lines where it is detected, I would be very appreciative.

Thank you for your previous answer, as it does give a good explanation.

Walter Roberson on 14 Jul 2021

Example of reading from file:

%create a file for demonstration purposes only
tname = [tempname() '.txt'];
fid = fopen(tname, 'w');
T = regexprep('asseefadfefaaadddaaadddasdfsdf\nasseeadadadadaaaadadadadaaadfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\n', 'a', 'e');
fprintf(fid, T);
fclose(fid);
%okay, main function
filename = tname;
S = readlines(filename);
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
matches = 4×1 string array
    "esseefedfefeeedddeeedddesdfsdf"
    "esseeededededeeeededededeeedfsdf"
    "esseefedfefeeedddeeedddesdfsdf"
    "esseefedfefeeedddeeedddesdfsdf"
%alternative without readlines
S = regexp(fileread(filename), '\r?\n', 'split');
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
matches = 1×4 cell array
    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseeededededeeeededededeeedfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}
%alternative without splitting
S = fileread(filename);
matches = regexp(S, '^.*[de]{10}.*$', 'match', 'dotexceptnewline', 'lineanchors');
matches
matches = 1×4 cell array
    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseeededededeeeededededeeedfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}    {'esseefedfefeeedddeeedddesdfsdf'}
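The question also mentions printing the location of each occurrence. One way to do that assumes the lines are already in a cell array, for example from the split shown above. regexp can then return the 'start' and 'end' indices of every run, which give the column range on each line. The four lines here are the examples from the question.

```matlab
% Example lines from the question; in practice these would come from the file.
txtLines = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', ...
            'asseefadfefeeedddeeedddasdfsdf', ...
            'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', ...
            'asseefadfefeeedddeeedddasdfsdf'};
% Start and end column of every run of 10 or more d/e characters, per line.
[startIdx, endIdx] = regexp(txtLines, '[de]{10,}', 'start', 'end');
for k = 1:numel(txtLines)
    for m = 1:numel(startIdx{k})
        fprintf('Line %d, columns %d-%d\n', k, startIdx{k}(m), endIdx{k}(m));
    end
end
```

For these lines it should print `Line 2, columns 12-23` and `Line 4, columns 12-23`.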
