The number of occurrences of each character of one string in another

I have a string of more than 100 characters (a protein sequence in FASTA format), such as
'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH'
(shortened here for simplicity), and I want to find out whether or not it is hydrophobic. To do that, I have to count the occurrences of each character in the set 'A C F I L M P V W Y' (the hydrophobic amino acids) in my FASTA string. Given how long FASTA strings can be, is there an easy way to do this with MATLAB string functions?

Accepted Answer

Azzi Abdelmalek on 28 Dec 2014
Edited: Azzi Abdelmalek on 28 Dec 2014
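str = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
% Hydrophobic residues as a column cell array
p = {'A' 'C' 'F' 'I' 'L' 'M' 'P' 'V' 'W' 'Y'}';
% Pair each residue with the number of times it occurs in str
out = [p cellfun(@(x) nnz(ismember(str, x)), p, 'UniformOutput', false)]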
  2 Comments
hiva on 29 Dec 2014
Thanks a lot. I guess this will work well for many similar cases in my code that are supposed to work the same way (it is feature extraction, and there are lots of features). It also shows me how much I don't know about MATLAB. Thanks.
Stephen23 on 30 Dec 2014
Edited: Stephen23 on 30 Dec 2014
This could be simplified and sped up by using arrayfun instead of cellfun and dropping the ismember call:
>> t = 'ACFILMPVWY';
>> arrayfun(@(x)sum(str==x), t)
ans =
6 2 4 6 13 2 7 7 1 7


More Answers (4)

Peter Perkins on 29 Dec 2014
Another possibility:
>> s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
>> t = 'ACFILMPVWY';
>> n = hist(double(s),1:90);
>> n(t)
ans =
6 2 4 6 13 2 7 7 1 7
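
The same lookup-table idea can also be sketched without hist, which the current MATLAB documentation no longer recommends. The version below is an illustrative alternative rather than part of the original answer: it tallies character codes with accumarray, and it keeps the table size of 90 from the code above on the assumption that the sequence contains only uppercase letters (code 90 is 'Z').

```matlab
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
t = 'ACFILMPVWY';
% Tally how often each character code 1..90 occurs (assumes uppercase letters only)
n = accumarray(double(s(:)), 1, [90 1]).';
% Look up the tallies at the codes of the target characters
counts = n(double(t))
```

Any character with a code above 90 (for example a lowercase letter) would make accumarray error out here, so the table size would need to grow if the input is not guaranteed to be uppercase.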

Luuk van Oosten on 24 Jan 2015
Edited: Luuk van Oosten on 24 Jan 2015
I reckon you are using the Bioinformatics Toolbox. In that case you can probably use:
All = aacount(SEQ)
where SEQ is of course your sequence of interest (MEQNGLDHDSRSSIDTTINDTQKTFLEF....), and then
nr_A = All.A
nr_C = All.C
nr_F = All.F
etc. (you get the idea)
to get the counts of your hydrophobic residues. Sum these and you have your hydrophobic score. You might want to normalize this score by dividing it by the total number of amino acids in the sequence.
Of course, you can write a loop to calculate the hydrophobic score for every sequence in your FASTA file.
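
The scoring step described above (sum the hydrophobic counts, then divide by the sequence length) can be sketched in base MATLAB, without the toolbox. This is a minimal illustration under stated assumptions, not the toolbox approach itself: the cell array seqs and the other variable names are hypothetical, and the second sequence is a made-up toy example.

```matlab
% Hypothetical collection of sequences; replace with those read from your FASTA file
seqs = {'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH', ...
        'AAKKLLDD'};   % made-up toy sequence
hydrophobic = 'ACFILMPVWY';
score = zeros(1, numel(seqs));
for k = 1:numel(seqs)
    s = seqs{k};
    % Fraction of residues that belong to the hydrophobic set
    score(k) = nnz(ismember(s, hydrophobic)) / numel(s);
end
score
```

Using ismember against the whole set gives the summed count directly, so the per-residue counts are only needed if each one is a feature in its own right.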

Shoaibur Rahman on 28 Dec 2014
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
numA = sum(s=='A')
numC = sum(s=='C')
numF = sum(s=='F')
numI = sum(s=='I')
numL = sum(s=='L')
numM = sum(s=='M')
numP = sum(s=='P')
numV = sum(s=='V')
numW = sum(s=='W')
numY = sum(s=='Y')

Stephen23 on 30 Dec 2014
Edited: Stephen23 on 30 Dec 2014
A neat solution using bsxfun:
>> s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
>> t = 'ACFILMPVWY';
>> sum(bsxfun(@eq,s.',t))
ans =
6 2 4 6 13 2 7 7 1 7
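
In MATLAB R2016b and later, implicit expansion should allow the same comparison to be written without bsxfun. The sketch below is an equivalent formulation under that version assumption. The explicit dimension argument to sum is an added safeguard, so that a one-character s is still summed down the columns.

```matlab
s = 'MEQNGLDHDSRSSIDTTINDTQKTFLEFRSYTQLSEKLASSSSYTAPPLNEDGPKGVASAVSQGSESVVSWTTLTHVYSILGAYGGPTCLYPTATYFLMGTSKGCVLIFNYNEHLQTILVPTLSEDPSIH';
t = 'ACFILMPVWY';
% s.' is numel(s)-by-1 and t is 1-by-10, so == expands to a numel(s)-by-10 logical matrix
counts = sum(s.' == t, 1)
```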
  1 Comment
hiva on 30 Dec 2014
Edited: hiva on 30 Dec 2014
Wow! Just wonderful. It works really well. Thanks a lot.
