nt2aa
Convert nucleotide sequence to amino acid sequence
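Description
SeqAA = nt2aa(SeqNT) converts the nucleotide sequence SeqNT to the amino acid sequence SeqAA. By default, the function uses the standard genetic code and the first reading frame.
SeqAA = nt2aa(SeqNT,Name=Value) specifies additional options using one or more name-value arguments. For example, Frame="all" returns the translations for all three reading frames.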
Examples
Convert Nucleotide Sequence to Amino Acid Sequence
Generate a random DNA sequence of 30 nucleotides. Because the sequence is random, your result differs from the one shown here.
ntSeq = randseq(30)
ntSeq = 'TTATGACGTTATTCTACTTTGATTGTGCGA'
Convert the DNA sequence to an amino acid sequence using the standard genetic code.
aaSeq = nt2aa(ntSeq)
aaSeq = 'L*RYSTLIVR'
Generate amino acid sequences for all three reading frames using the yeast mitochondrial genetic code.
aaSeq = nt2aa(ntSeq,Frame="all",GeneticCode=3)
aaSeq = 3x1 cell
{'LWRYSTLIVR'}
{'YDVITTWLC' }
{'MTLFYFDCA' }
Input Arguments
SeqNT — Nucleotide sequence
character vector | string scalar | row vector of integers | structure
Nucleotide sequence, specified as one of the following:
- Character vector or string scalar consisting of the characters A, C, G, T, and U, and the ambiguous characters R, Y, K, M, S, W, B, D, H, V, and N.
- Row vector of integers specifying a nucleotide sequence. For information on valid integers, see Mapping Nucleotide Integers to Letter Codes.
- Structure that contains a nucleotide sequence in the Sequence field. The fastaread, fastqread, emblread, getembl, genbankread, and getgenbank functions return structures with a Sequence field.
Note
A hyphen is valid only if the codon it belongs to represents a gap, that is, the codon consists entirely of hyphens, as in ACT---TGA. Do not use a sequence with hyphens if you specify "all" for Frame.
Example: SeqAA = nt2aa("CGACTT") converts the nucleotide sequence to the amino acid sequence 'RL'.
Data Types: double | char | string | struct
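The hyphen rule and the structure input described above can be sketched as follows. This is a minimal illustration that assumes Bioinformatics Toolbox is installed; the variable names (gapped, rec, fromStruct) are hypothetical and chosen only for this example.

```matlab
% A codon made entirely of hyphens is treated as a gap and maps to '-'.
gapped = nt2aa('ACT---TGA');     % ACT -> T, --- -> -, TGA -> *  gives 'T-*'

% A structure with a Sequence field can be passed in place of the sequence.
% Here the structure is built by hand; fastaread and similar functions
% return structures with the same field.
rec.Sequence = 'CGACTT';
fromStruct = nt2aa(rec);         % same translation as nt2aa('CGACTT'), 'RL'
```

A sequence such as 'AC-TGA', where a hyphen sits inside an otherwise ordinary codon, does not satisfy the rule in the note.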
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: SeqAA = nt2aa("CGACTT",Frame=2)
Frame — Reading frame
1 (default) | 2 | 3 | "all"
Reading frame, specified as 1, 2, 3, or "all". If you specify "all", the function outputs a 3-by-1 cell array containing the amino acid sequences for all three reading frames.
Example: SeqAA = nt2aa("AAGACT",Frame=3) converts the nucleotide sequence to an amino acid sequence using the third reading frame.
Data Types: double | char | string
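To make the relationship between the single-frame values and "all" concrete, the following sketch translates the sequence from the Examples section one frame at a time and then in a single call. It assumes Bioinformatics Toolbox is installed; the variable names are illustrative only.

```matlab
ntSeq = 'TTATGACGTTATTCTACTTTGATTGTGCGA';   % sequence from the Examples section

% Translate each reading frame separately.
frame1 = nt2aa(ntSeq, Frame=1);             % starts at nucleotide 1
frame2 = nt2aa(ntSeq, Frame=2);             % starts at nucleotide 2
frame3 = nt2aa(ntSeq, Frame=3);             % starts at nucleotide 3

% Translate all three frames in one call; the result is a 3-by-1 cell array.
allFrames = nt2aa(ntSeq, Frame="all");

% Each cell holds the translation of the corresponding frame.
sameResult = isequal(allFrames, {frame1; frame2; frame3});
```

Use a single frame when you know where the coding region starts, and "all" when you need to compare candidate frames, for example to find the one without premature stop codons.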
GeneticCode — Genetic code number or name
1 (default) | integer | character vector | string scalar
Genetic code number or name, specified as an integer, character vector, or string scalar. This table lists valid genetic code numbers and names.
Genetic Code Number | Genetic Code Name |
---|---|
1 | Standard |
2 | Vertebrate Mitochondrial |
3 | Yeast Mitochondrial |
4 | Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma |
5 | Invertebrate Mitochondrial |
6 | Ciliate, Dasycladacean, and Hexamita Nuclear |
9 | Echinoderm Mitochondrial |
10 | Euplotid Nuclear |
11 | Bacterial and Plant Plastid |
12 | Alternative Yeast Nuclear |
13 | Ascidian Mitochondrial |
14 | Flatworm Mitochondrial |
15 | Blepharisma Nuclear |
16 | Chlorophycean Mitochondrial |
21 | Trematode Mitochondrial |
22 | Scenedesmus Obliquus Mitochondrial |
23 | Thraustochytrium Mitochondrial |
Tip
If you specify a code name, you can truncate it to its first two letters.
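As a brief sketch of the number and name forms, the following translates the sequence from the Examples section with the yeast mitochondrial code specified three ways. It assumes Bioinformatics Toolbox is installed; the variable names are illustrative, and the truncated form relies on the tip above.

```matlab
ntSeq = 'TTATGACGTTATTCTACTTTGATTGTGCGA';   % sequence from the Examples section

% Select the yeast mitochondrial code by number, by full name,
% and by the two-letter truncation described in the tip.
byNumber = nt2aa(ntSeq, GeneticCode=3);
byName   = nt2aa(ntSeq, GeneticCode="Yeast Mitochondrial");
byPrefix = nt2aa(ntSeq, GeneticCode="Ye");

% For comparison, the standard code reads TGA as a stop codon (*),
% whereas the yeast mitochondrial code reads it as tryptophan (W).
standard = nt2aa(ntSeq);                    % 'L*RYSTLIVR'
```

Specifying the number is the more compact form; the name can make scripts easier to read.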
This table shows the nucleotide codon to amino acid mapping for the standard genetic code.
Amino Acid Name | Amino Acid Code | Nucleotide Codon |
---|---|---|
Alanine | A | GCT GCC GCA GCG |
Arginine | R | CGT CGC CGA CGG AGA AGG |
Asparagine | N | AAT AAC |
Aspartic acid (Aspartate) | D | GAT GAC |
Cysteine | C | TGT TGC |
Glutamine | Q | CAA CAG |
Glutamic acid (Glutamate) | E | GAA GAG |
Glycine | G | GGT GGC GGA GGG |
Histidine | H | CAT CAC |
Isoleucine | I | ATT ATC ATA |
Leucine | L | CTT CTC CTA CTG† TTA TTG† († indicates an alternative start codon for the standard genetic code as defined here.) |
Lysine | K | AAA AAG |
Methionine | M | ATG |
Phenylalanine | F | TTT TTC |
Proline | P | CCT CCC CCA CCG |
Serine | S | TCT TCC TCA TCG AGT AGC |
Threonine | T | ACT ACC ACA ACG |
Tryptophan | W | TGG |
Tyrosine | Y | TAT TAC |
Valine | V | GTT GTC GTA GTG |
Asparagine or Aspartic acid (Aspartate) | B | Random codon from D and N |
Glutamine or Glutamic acid (Glutamate) | Z | Random codon from E and Q |
Unknown amino acid (any amino acid) | X | Random codon |
Translation stop | * | TAA TAG TGA |
Gap of indeterminate length | - | --- |
Unknown character (any character or symbol not in table) | ? | ??? |
Example: SeqAA = nt2aa("ACGTTA",GeneticCode=2) converts the nucleotide sequence using the vertebrate mitochondrial genetic code.
Data Types: double | char | string
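The codon table above can also be inspected programmatically. The sketch below uses geneticcode, listed under See Also, and assumes that it returns a structure whose field names are codons and whose values are amino acid letters; treat that layout as an assumption to check against the geneticcode documentation. The variable names are illustrative.

```matlab
% Structure describing the standard genetic code (assumed layout:
% one field per codon, holding the single-letter amino acid code).
map = geneticcode(1);

codons = {'GCT', 'TGG', 'TAA'};   % alanine, tryptophan, and a stop codon
for k = 1:numel(codons)
    c = codons{k};
    % Compare the table lookup with the translation from nt2aa.
    fprintf('%s -> %s (nt2aa gives %s)\n', c, map.(c), nt2aa(c));
end
```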
AlternativeStartCodons — Flag to translate alternative start codons
false (default) | true
Flag to translate alternative start codons, specified as true or false. When true, if the first codon of a sequence is a known alternative start codon, the function translates the codon to methionine (M). When false, the function translates the alternative start codon to its corresponding amino acid.
For more information on alternative start codons, visit https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1.
Example: SeqAA = nt2aa("TTGATC",AlternativeStartCodons=true) converts the first codon to methionine (M) instead of leucine (L).
Data Types: logical
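The effect of the flag can be sketched by translating the same sequence with both settings. This assumes Bioinformatics Toolbox is installed; the variable names are illustrative, and the flag is set explicitly in every call so the result does not depend on the default.

```matlab
seq = "TTGATC";   % TTG is an alternative start codon in the standard code

% With the flag on, a leading alternative start codon becomes methionine.
asStart   = nt2aa(seq, AlternativeStartCodons=true);     % 'MI'

% With the flag off, the codon keeps its usual meaning, leucine.
asLeucine = nt2aa(seq, AlternativeStartCodons=false);    % 'LI'

% Only the first codon is affected: an internal TTG is still leucine.
internal  = nt2aa("ATCTTG", AlternativeStartCodons=true); % 'IL'
```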
ACGTOnly — Flag to control the behavior of ambiguous nucleotides
true (default) | false
Flag to control the behavior of ambiguous nucleotides (R, Y, K, M, S, W, B, D, H, V, and N), specified as true or false. If you specify true, the function produces an error if any ambiguous nucleotides are present. If you specify false, the function tries to resolve any ambiguities. If it cannot, the function returns X for the affected codon.
Data Types: logical
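The two behaviors can be sketched with a sequence that mixes a resolvable and an unresolvable codon. This assumes Bioinformatics Toolbox is installed; the variable names are illustrative. GCN is ambiguous in its third position, but all four GCx codons encode alanine, whereas NNN carries no information.

```matlab
seq = 'GCNNNN';   % GCN can be resolved to alanine; NNN cannot be resolved

% Strict behavior: ambiguous nucleotides produce an error.
try
    nt2aa(seq, ACGTOnly=true);
    errored = false;
catch err
    errored = true;
    disp(err.message)   % show why the translation was rejected
end

% Relaxed behavior: resolve what can be resolved, return X otherwise.
relaxed = nt2aa(seq, ACGTOnly=false);   % expected 'AX'
```

The strict setting is useful for catching low-quality input early, while the relaxed setting lets you translate sequences that contain a few uncalled bases.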
Output Arguments
SeqAA — Amino acid sequence
character vector | row vector of integers | cell array
Amino acid sequence, returned as one of the following:
- If SeqNT is a character vector or string scalar, then the function returns a character vector.
- If SeqNT is a row vector of integers, then the function returns a row vector of integers. For information on valid integers, see Mapping Amino Acid Letter Codes to Integers.
- If SeqNT is a structure, then the function returns SeqAA with the same data type as the Sequence field, either a character vector or a row vector of integers.
If you set Frame to "all", the function returns a 3-by-1 cell array.
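The mapping from input type to output type can be checked with a short sketch. It assumes Bioinformatics Toolbox is installed and uses nt2int and int2aa from the same toolbox to build the integer input and to read the integer output; the variable names are illustrative.

```matlab
fromChar   = nt2aa('CGACTT');                   % character vector in
fromString = nt2aa("CGACTT");                   % string scalar in
% Cast to double so the input matches the data types listed for SeqNT.
fromInts   = nt2aa(double(nt2int('CGACTT')));   % row vector of integers in
allFrames  = nt2aa('CGACTT', Frame="all");      % all three reading frames

class(fromChar)      % character vector out
class(fromString)    % character vector out, even for string input
int2aa(fromInts)     % integer result mapped back to letters
class(allFrames)     % cell array out
```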
Version History
Introduced before R2006a
See Also
aa2nt | aminolookup | baselookup | codonbias | dnds | dndsml | geneticcode | isotopicdist | revgeneticcode | seqviewer