How to see if characters are present in a string array.

I am trying to write some code that takes a short amino acid sequence, e.g. 'GSA', and searches through a string array of sequences to find the number and indices of the matches. I would like it to ignore the order of the characters: as long as each character is present, I would like to count it as a hit.
Here is the code I have so far. InputSeq is the sequence I want to search for, and AAseq is the string array of sequences I am searching through.
InputSeq = "GSA";
AAseq = ["SGD"; "SGS"; "SGA"; "SGV"; "SGS"; "SGA"; "SGD"; "SGS"; "SGS"; "SGY"; "SGD"; "SGS"; "SGI"]; % ... more sequences
result = ismember(InputSeq, AAseq)
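This kind of works, but it only registers a match when an element of AAseq is exactly "GSA", so a reordered sequence such as "SGA" is missed. It also returns a single true/false value rather than the number and indices of the matches.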

Accepted Answer

Stephen23 on 3 Dec 2021
Edited: Stephen23 on 3 Dec 2021
Assuming that every string element contains exactly the same number of characters as A, you can do this easily with basic logical operations on character arrays:
A = "GSA";
B = ["SGD";"SGS";"SGA";"SGV";"SGS";"SGA";"SGD";"SGS";"SGS";"SGY";"SGD";"SGS";"SGI"]
B = 13×1 string array
X = all(sort(char(A))==sort(char(B),2),2)
X = 13×1 logical array
0 0 1 0 0 1 0 0 0 0 0 0 0
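The question also asks for the number and the indices of the matches. Both follow directly from the logical vector X via find and nnz. A minimal sketch reusing the same A and B; the names idx, nMatches and matches are illustrative only:

```matlab
A = "GSA";
B = ["SGD";"SGS";"SGA";"SGV";"SGS";"SGA";"SGD";"SGS";"SGS";"SGY";"SGD";"SGS";"SGI"];

% True where B(k) has exactly the same characters as A, in any order
X = all(sort(char(A)) == sort(char(B), 2), 2);

idx = find(X);        % indices of the matching elements of B
nMatches = nnz(X);    % number of matches
matches = B(idx);     % the matching sequences themselves
```

For this B, idx is [3; 6] and nMatches is 2.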
Or without sorting:
X = all(any(char(A)==permute(char(B),[1,3,2]),3),2)
X = 13×1 logical array
0 0 1 0 0 1 0 0 0 0 0 0 0
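The two versions are not equivalent when characters repeat. The sorted comparison requires the same characters with the same counts, while the any/permute version only checks that each character of A appears somewhere in the element, however often. A small sketch with hypothetical inputs chosen to show the difference:

```matlab
A = "GGS";                    % hypothetical query with a repeated G
B = ["SGG"; "GSA"; "GSS"];    % hypothetical sequences

% Exact match of the character multiset: same characters, same counts, any order
Xsorted = all(sort(char(A)) == sort(char(B), 2), 2);

% Presence only: every character of A occurs somewhere in B(k)
Xpresent = all(any(char(A) == permute(char(B), [1, 3, 2]), 3), 2);
```

Here Xsorted is [1; 0; 0] and Xpresent is [1; 1; 1]. For the question's input "GSA", which has no repeated letters and the same length as every element, both versions give the same result.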

More Answers (1)

Walter Roberson on 2 Dec 2021
You could use multiple contains() tests.
But I suggest that instead you do something like
ismember(sort(char(InputSeq)), cellfun(@sort, cellstr(AAseq), 'UniformOutput', false))
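The "multiple contains() tests" idea could look like the following sketch, which loops over the characters of InputSeq and ANDs the results. Like the any/permute version, it only checks that each character is present, not how many times, but it does not require the elements of AAseq to have the same length. The names queryChars, hit and idx are illustrative:

```matlab
InputSeq = "GSA";
AAseq = ["SGD";"SGS";"SGA";"SGV";"SGS";"SGA";"SGD";"SGS";"SGS";"SGY";"SGD";"SGS";"SGI"];

queryChars = char(InputSeq);
hit = true(size(AAseq));    % start with every element as a candidate
for k = 1:numel(queryChars)
    % keep only the elements that also contain the k-th query character
    hit = hit & contains(AAseq, queryChars(k));
end

idx = find(hit);            % indices of the hits, here [3; 6]
```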
Walter Roberson on 2 Dec 2021
ismember(cellfun(@sort, cellstr(AAseq), 'UniformOutput', false), sort(char(InputSeq)))
You could also use strcmp().
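With strcmp, the sorted query can be compared against every sorted element at once. strcmp returns one logical value per cell, so find gives the indices directly, and because each element is sorted separately the elements need not all have the same length. A sketch with a hypothetical mixed-length AAseq:

```matlab
InputSeq = "GSA";
AAseq = ["SGA"; "GS"; "ASGG"; "AGS"; "SGD"];   % hypothetical, mixed lengths

sortedQuery = sort(char(InputSeq));            % 'AGS'
sortedSeqs  = cellfun(@sort, cellstr(AAseq), 'UniformOutput', false);

% Same characters with the same counts, in any order
hit = strcmp(sortedSeqs, sortedQuery);
idx = find(hit);                               % here [1; 4]
```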
