Random amino acid sequence generation with a given amino acid count of a specified sequence
2 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Sk. Hassan
am 8 Jun. 2022
Bearbeitet: Tim DeFreitas
am 22 Jun. 2022
I have a sequence
>sp|Q7RTX1|TS1R1_HUMAN Taste receptor type 1 member 1 OS=Homo sapiens OX=9606 GN=TAS1R1 PE=2 SV=1
MLLCTARLVGLQLLISCCWAFACHSTESSPDFTLPGDYLLAGLFPLHSGCLQVRHRPEVT
LCDRSCSFNEHGYHLFQAMRLGVEEINNSTALLPNITLGYQLYDVCSDSANVYATLRVLS
LPGQHHIELQGDLLHYSPTVLAVIGPDSTNRAATTAALLSPFLVPMISYAASSETLSVKR
QYPSFLRTIPNDKYQVETMVLLLQKFGWTWISLVGSSDDYGQLGVQALENQATGQGICIA
FKDIMPFSAQVGDERMQCLMRHLAQAGATVVVVFSSRQLARVFFESVVLTNLTGKVWVAS
EAWALSRHITGVPGIQRIGMVLGVAIQKRAVPGLKAFEEAYARADKKAPRPCHKGSWCSS
NQLCRECQAFMAHTMPKLKAFSMSSAYNAYRAVYAVAHGLHQLLGCASGACSRGRVYPWQ
LLEQIHKVHFLLHKDTVAFNDNRDPLSSYNIIAWDWNGPKWTFTVLGSSTWSPVQLNINE
TKIQWHGKDNQVPKSVCSSDCLEGHQRVVTGFHHCCFECVPCGAGTFLNKSDLYRCQPCG
KEEWAPEGSQTCFPRTVVFLALREHTSWVLLAANTLLLLLLLGTAGLFAWHLDTPVVRSA
GGRLCFLMLGSLAAGSGSLYGFFGEPTRPACLLRQALFALGFTIFLSCLTVRSFQLIIIF
KFSTKVPTFYHAWVQNHGAGLFVMISSAAQLLICLTWLVVWTPLPAREYQRFPHLVMLEC
TETNSLGFILAFLYNGLLSISAFACSYLGKDLPENYNEAKCVTFSLLFNFVSWIAFFTTA
SVYDGKYLPAANMMAGLSSLSSGFGGYFLPKCYVILCRPDLNSTEHFQASIQDYTRRCGS
T
I wish to get a set of 10000 sequences (in fasta format) having identical amino acid counts as in the above sequence.
I could not use properly the randseq function in getting what I need.
Any help would be highly appreciated.
0 Kommentare
Akzeptierte Antwort
Tim DeFreitas
am 8 Jun. 2022
Bearbeitet: Tim DeFreitas
am 22 Jun. 2022
If you want exactly the same amino acid counts, then you want to randomly shuffle the input sequence, which can be done with randperm:
[~, sequences] = fastaread('pf00002.fa');
targetSeq = sequences{1}; % Select specific sequence from FA file.
randomSeqs = cell(1,10000);
for i=1:numel(randomSeqs)
randomSeqs{i} = targetSeq(:, randperm(numel(targetSeq)));
end
0 Kommentare
Weitere Antworten (1)
Sam Chak
am 8 Jun. 2022
Hi @Sk. Hassan
I'm no expert in this, but it is possible to generate a sequence of numbers that associate with the Roman alphabet.
Here is a simple script that you can modify to generate a long sequence of characters like the PASTA spaghetti form.
function amino = generateAmino(n)
ASCII_L = 65;
ASCII_U = 90;
C = round((ASCII_U - ASCII_L).*rand(n, 1) + ASCII_L);
amino = char(C');
end
On the Command Window:
S1 = generateAmino(60)
S1 =
'TGNRWYODEGVGUGXJFGPMJVPOXHTTKOCBNTXDOMAIEUINEPHQRTLCGXEVNZCL'
You can then use a for loop to generate the number of sequences that you want.
numOfSeq = 10000;
for i = 1:numOfSeq
S(i,:) = generateAmino(60);
end
S
That's the basic idea. You may need to modify the script to select certain Roman alphabets in the ASCII chart.
0 Kommentare
Siehe auch
Kategorien
Mehr zu QSP, PKPD, and Systems Biology finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!