generate all variations on a 20-mer, that are 1 to 4 mismatches away
Shlomo Geva
on 23 May 2023
Edited: Shlomo Geva
on 22 Jun 2023
We consider a kmer: an arbitrary DNA sequence of length k, consisting only of the letters {A,C,G,T}.
For instance, 'ACTGGTCATTTGGGCTGGTA'. Let's call it a kernel.
We need to generate from the kernel an array of all unique kmers, each of which differs from the kernel in at least 1 and at most n positions.
n is typically a small number - 5 at most.
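As a quick size check on the problem stated above, here is a small sketch (not part of the original post) that counts how many unique kmers exist at each distance. It assumes only substitutions among the four letters, so a kmer at exactly m mismatches is fixed by choosing m positions and one of the 3 other letters at each of them. The variable names counts and total are made up for illustration.

```matlab
k = 20;   % length of the kernel
n = 4;    % maximum number of mismatches

% Variants at exactly m mismatches: choose m positions, then one of the
% 3 other letters at each chosen position.
counts = arrayfun(@(m) nchoosek(k, m) * 3^m, 1:n);
total  = sum(counts);

fprintf('%d ', counts);
fprintf('-> total %d\n', total);   % prints: 60 1710 30780 392445 -> total 424995
```

So for a 20-mer and n = 4, any solution has to produce 424,995 rows of 20 characters.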
I wrote a solution, but it is a bit slow - it takes about 1.7 sec to generate all variations on a 20-mer, that are at most 4 mismatches away.
A much faster MATLAB solution would be very useful, without going down the rabbit hole of implementing a MEX file.
Thanks!
2 Comments
Walter Roberson
on 23 May 2023
"I wrote a solution, but it is a bit slow - it takes about 1.7 sec to generate all variations on a 20-mer, that are at most 4 mismatches away."
And how many seconds of your life are you prepared to dedicate to making the function faster? What is the estimated total number of times you expect your code will be executed before your program falls out of use?
Accepted Answer
Matt J
on 23 May 2023
Edited: Matt J
on 23 May 2023
Using blkColon from this FEX download,
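kmer='ACTGGTCATTTGGGCTGGTA';
k=length(kmer);
n=4;
tic;
v=nchoosek(1:k,n);   % every choice of n positions, one per row
clear c
[c{1:n}]=ndgrid('AGCT');
c=reshape( cat(n+1,c{:}),[],n);   % all 4^n letter combinations for the chosen positions
p=height(c);
idx=repmat( any( (1:k)==permute(v,[2,3,1]) ,1) ,p,1,1);   % p-by-k-by-height(v) mask of the chosen positions
Kmers=repmat( kmer ,p,1,height(v));   % one copy of the kernel per combination and position set
Kmers(idx)=repmat( c,1,1,size(idx,3));   % overwrite the chosen positions
Kmers=blkColon(Kmers,[1,k]);   % rearrange to one kmer per row (blkColon is the FEX function mentioned above)
Kmers(all(kmer==Kmers,2),:)=[]; %the result, with copies of the kernel removed
% Rows can repeat, because a combination may leave some chosen positions
% unchanged; unique(Kmers,'rows') would deduplicate if needed.
toc;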
Elapsed time is 0.164970 seconds.
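For comparison, here is a sketch of a variant that needs no File Exchange download and produces each kmer exactly once. It is not the code from the answer above, and it has not been timed. Instead of trying all 4 letters at the chosen positions, it applies a cyclic shift of 1 to 3 within the alphabet, which can never give back the original letter. The names alphabet, base, blocks, off, posAll, offAll and lin are made up for illustration, and the implicit expansion it relies on assumes R2016b or later.

```matlab
kmer = 'ACTGGTCATTTGGGCTGGTA';   % kernel from the question
n = 4;                           % maximum number of mismatches
alphabet = 'ACGT';
k = numel(kmer);
[~, base] = ismember(kmer, alphabet);   % kernel as indices 1..4

blocks = cell(n, 1);
for m = 1:n                             % exactly m mismatches per pass
    pos = nchoosek(1:k, m);             % every set of m positions, one per row
    P = size(pos, 1);
    S = 3^m;
    % Row j of off holds m shifts in 1..3 (base-3 digits of j-1, plus 1).
    off = mod(floor((0:S-1).' ./ 3.^(0:m-1)), 3) + 1;
    % Pair every position set with every shift tuple.
    posAll = pos(repelem((1:P).', S), :);
    offAll = off(repmat((1:S).', P, 1), :);
    V = repmat(base, P*S, 1);           % one copy of the kernel per pair
    lin = sub2ind(size(V), repmat((1:P*S).', 1, m), posAll);
    % A cyclic shift by 1..3 in a 4-letter alphabet always changes the letter.
    V(lin) = mod(V(lin) - 1 + offAll, 4) + 1;
    blocks{m} = alphabet(V);            % back to characters
end
Kmers = vertcat(blocks{:});             % one kmer per row, no repeats
```

Because every row comes from a distinct pair of position set and shift tuple, no call to unique is needed. For the 20-mer with n = 4 the result has 424,995 rows, which is the sum of nchoosek(20,m)*3^m for m = 1 to 4.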
3 Comments
More Answers (0)