Using Recursive function to calculate all possible peptide combinations

Question

Joseph Mills am 22 Sep. 2017

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/357969-using-recursive-function-to-calculate-all-possible-peptide-combinations

Beantwortet: Jeremy Huard am 12 Jun. 2024

Hello,

I'm trying to write a recursive function in MATLAB with input 'n' terms that calculates all possible peptides formed with the amino acids used being 'ACDEFGHIKLMNPQRSTVYW'. So if there are n characters in the sequence, there are 20^n combinations that could be produced from those characters. I'm trying to write a recursive function that would incorporate this. Any ideas?

Thank you

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Walter Roberson am 22 Sep. 2017

1
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/357969-using-recursive-function-to-calculate-all-possible-peptide-combinations#answer_282766

In MATLAB Online öffnen

There is a general computation pattern I refer to as the odometer pattern

possible_values = {'Z_Z', -3, 'V7', 'G', 'C', 119, 'Sr', 'Q', '2J'};
num_diff_vals = length(possible_values);
NumPlaces = 5;   %or as needed
num_possibilities = num_diff_vals^NumPlaces;
results = cell(num_possibilities, NumPlaces);
odo = zeros(1, NumPlaces);
for K = 1 : num_possibilities
  results(K,:) = possible_values( odo + 1 );    %note: adding 1 to each position
  for P = NumPlaces : -1 : 1
    odo(P) = odo(P) + 1;
    if odo(P) == num_diff_vals
      odo(P) = 0;
    else
      break;
    end
  end
end

The variable odo increases by 1 in the final digit each time. If that would take it past the maximum value, then it is reset to the first value and the next digit to the left is incremented. If that digit would overflow, zero it and increment the next digit to the left, and so on. At any given time, the odometer can be used as a vector index into the array of permitted values such as peptide letters.

This general structure has the same effect as if you had NumPlaces nested loops.

In the special case where all of the valid values are single characters, then you can use a character vector instead of a cell array to store the possible values, and a character array to store the results, with a minor change to how you initialize the results array.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Answer 2

James Tursa am 22 Sep. 2017

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/357969-using-recursive-function-to-calculate-all-possible-peptide-combinations#answer_282743

First, for even a moderate value of n, the number of possible combinations will be extremely large. Generating them as a list could take a very, very, long time and will likely use more memory than you have. Even if you could generate the list, it would take a long, long, time to process it in any algorithm you are envisioning.

Second, using a recursive function to generate this extremely long list would probably be the absolute slowest way to do it (unless you also mixed in some hard disk access which would make it even slower).

What size n are you thinking of using? What are you intending to do with this downstream in your code?

6 Kommentare
4 ältere Kommentare anzeigen4 ältere Kommentare ausblenden

Joseph Mills am 22 Sep. 2017

In MATLAB Online öffnen

Here's what I've got so far. The first for loop correctly finds the first ones, basically just all of the letters by itself. But there's an issue in the second for loop that I can't find

function [peptide] = aminoAcid(n)
%n is the amount of amino acids in the protein or peptide. This can also be
%referenced as the amount of characters. The output is the sequence for the
%protein or peptide.
AA = 'ACDEFGHIKLMNPQRSTVYW';
for i=1:length(AA)
        if n==1
            peptide(i,1)=AA(i);
        end
end
for i=1:n
    for j=1:length(AA)
        for k=1:length(AA)
            peptide(k,n)=[AA(j),aminoAcid(n-1)]
        end
    end
end

James Tursa am 22 Sep. 2017

Bearbeitet: James Tursa am 22 Sep. 2017

In MATLAB Online öffnen

So, if you want to stick with the loops, here are some suggestions:

For the first column to start with, you don't need a loop at all. Just set the initial result to be the AA as a column vector. I.e.,

peptide = AA(:);

Then in the next loop, since you already have that first column, run the loop from i=2:n instead of i=1:n

Inside that loop, you will need to replicate the current peptide matrix prior to tacking on your AA(j) characters. You need to replicate it by the number of elements of AA (that is, each current row of peptide gets each AA element tacked on). That k loop then needs to run over the rows of the peptide matrix, appending your AA(j) characters. There is no need to call aminoAcid again in a recursive fashion. E.g., making these changes:

function [peptide] = aminoAcid(n)
%n is the amount of amino acids in the protein or peptide. This can also be
%referenced as the amount of characters. The output is the sequence for the
%protein or peptide.
AA = 'ACDEFGHIKLMNPQRSTVYW';
peptide = AA(:);  % <-- set the first column directly, no need for a loop
m = numel(AA);  % <-- remember the number of elements of AA
for i=2:n  % <-- start the loop at 2
    p = size(peptide,1);  % <-- remember the current number of rows of peptide
    peptide = repmat(peptide,m,1);  % <-- replicate peptide by row for each element of AA
    z = 1;  % <-- start the generic counter
    for j=1:m  % <-- for each element of AA
        for k=1:p  % <-- for each row of peptide (prior to replicating it)
            peptide(z,i) = AA(j);  % <-- set the new character
            z = z + 1;  % <-- go to next row of peptide
        end
    end
end

Of course, much of this could be vectorized. But I was trying to retain as much of your looping logic as I could. The main things you were missing were replicating peptide at each of the outer loop iterations, and then noting that one of your inner loops should be using the number of rows of peptide instead of length(AA) for the indexing limit.

Melden Sie sich an, um zu kommentieren.

Answer 3

Jeremy Huard am 12 Jun. 2024

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/357969-using-recursive-function-to-calculate-all-possible-peptide-combinations#answer_1470831

In MATLAB Online öffnen

R2023a introduced the function combinations, which can be used on most datatypes including strings.

Your code could then look like the following:

AA = "ACDEFGHIKLMNPQRSTVYW";
% extract individual letters
letters = extract(AA,lettersPattern(1))
letters = 20x1 string array
    "A"
    "C"
    "D"
    "E"
    "F"
    "G"
    "H"
    "I"
    "K"
    "L"
    "M"
    "N"
    "P"
    "Q"
    "R"
    "S"
    "T"
    "V"
    "Y"
    "W"
% generate all sequences of length 1 to 3
N = 3;
% preallocate a string array to store the results
Ncombinations = sum(numel(letters).^(1:N));
allcombinations = strings(Ncombinations,1);
% generate combinations
for i=1:N
    repeatedletters = repmat({letters},1,i);
    allcombinations_temp = combinations(repeatedletters{:});
    idx = sum(numel(letters).^(1:(i-1)))+1 : sum(numel(letters).^(1:i));
    allcombinations(idx) = join(allcombinations_temp{:,:},"",2);
end
allcombinations(17:24)
ans = 8x1 string array
    "T"
    "V"
    "Y"
    "W"
    "AA"
    "AC"
    "AD"
    "AE"

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Using Recursive function to calculate all possible peptide combinations

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Antworten (3)

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

6 Kommentare
4 ältere Kommentare anzeigen4 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Community Treasure Hunt

Using Recursive function to calculate all possible peptide combinations

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Antworten (3)

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

6 Kommentare 4 ältere Kommentare anzeigen4 ältere Kommentare ausblenden

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

6 Kommentare
4 ältere Kommentare anzeigen4 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden