Sequence Distance

Talha on 20 Jul 2011
I am confused about how MATLAB arrives at its answers for the various distance methods, and my boss wants to know how the values are computed.
I set MATLAB to display results as fractions. When I analyze two sequences of the same length, the denominator of the fraction is the length of the sequences (for example, if both amino acid sequences are 327 residues long, the answer has a denominator of 327). That made sense until I analyzed two amino acid sequences of different lengths, one 369 amino acids long and the other 379 amino acids long. The answer was 209/398. I don't understand where the denominator of 398 comes from (I specifically asked for the p-distance). Typing "help seqpdist" does not give a very clear explanation of how the p-distance is computed.
Can someone please help me out? I would greatly appreciate it!

Answers (1)

Lucio Cetto on 20 Jul 2011
When you compare sequences, it is common to first align them using a dynamic programming algorithm. SEQPDIST uses NWALIGN to align every pair of sequences and then takes the distance measure from each pairwise alignment.
Consider:
seqpdist({'AACGT','AAGT','AAT'},'alpha','nt','square',1,'method','p-dist')
The alignment between 1 and 2 is 'AACGT' and 'AA-GT' => 1/5
The alignment between 1 and 3 is 'AACGT' and 'AA--T' => 2/5
The alignment between 2 and 3 is 'AAGT' and 'AA-T' => 1/4
HTH
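
To make the arithmetic concrete, here is a minimal sketch that reproduces the three fractions from the alignments quoted in the answer. It does not call SEQPDIST or NWALIGN: the aligned strings are typed in by hand, and `pdistAligned` is a hypothetical helper name, not a toolbox function. It assumes that a gap column counts as a differing site and that the denominator is the full alignment length, gaps included, which is what the numbers above suggest. Under that assumption, a denominator of 398 for sequences of lengths 369 and 379 would simply be the length of their gapped alignment.

```matlab
% p-distance of two ALREADY-ALIGNED sequences of equal length.
% Assumption: gap columns ('-') count as differing sites, and the
% denominator is the alignment length (gaps included).
% pdistAligned is a hypothetical helper, not part of any toolbox.
pdistAligned = @(a, b) sum(a ~= b) / numel(a);

% Alignments copied from the answer (not recomputed here).
d1 = pdistAligned('AACGT', 'AA-GT');   % one differing column out of 5
d2 = pdistAligned('AACGT', 'AA--T');   % two differing columns out of 5
d3 = pdistAligned('AAGT',  'AA-T');    % one differing column out of 4

disp([d1 d2 d3])
```

How SEQPDIST treats gap sites may depend on its options, so treat this as an illustration of where the denominator comes from rather than a reimplementation of the function.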
