codoncount
Count codons in nucleotide sequence
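Syntax
Codons = codoncount(SeqNT)
[Codons,CodonArray] = codoncount(SeqNT)
___ = codoncount(SeqNT,Name=Value)
Description
Codons = codoncount(SeqNT) counts the codons in the nucleotide sequence SeqNT and returns the counts in the structure Codons.
[Codons,CodonArray] = codoncount(SeqNT) also returns CodonArray, a 4-by-4-by-4 array containing the raw count data for each codon.
___ = codoncount(SeqNT,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, to bundle ambiguous nucleotide characters, set Ambiguous to "bundle".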
Examples
Count Codons in a Nucleotide Sequence
SeqNT = randseq(1000);
Codons = codoncount(SeqNT)
Codons = struct with fields:
AAA: 11
AAC: 5
AAG: 8
AAT: 6
ACA: 6
ACC: 7
ACG: 4
ACT: 7
AGA: 6
AGC: 9
AGG: 5
AGT: 2
ATA: 6
ATC: 4
ATG: 4
ATT: 6
CAA: 3
CAC: 5
CAG: 7
CAT: 10
CCA: 5
CCC: 4
CCG: 8
CCT: 5
CGA: 7
CGC: 6
CGG: 5
CGT: 5
CTA: 4
CTC: 7
CTG: 4
CTT: 5
GAA: 5
GAC: 6
GAG: 5
GAT: 4
GCA: 3
GCC: 2
GCG: 8
GCT: 5
GGA: 6
GGC: 7
GGG: 10
GGT: 4
GTA: 2
GTC: 6
GTG: 5
GTT: 2
TAA: 2
TAC: 4
TAG: 1
TAT: 4
TCA: 6
TCC: 2
TCG: 5
TCT: 5
TGA: 4
TGC: 1
TGG: 5
TGT: 8
TTA: 6
TTC: 1
TTG: 8
TTT: 5
Count the codons in the second frame for the reverse complement of the sequence.
r2Codons = codoncount(SeqNT,Frame=2,Reverse=true)
r2Codons = struct with fields:
AAA: 5
AAC: 2
AAG: 5
AAT: 6
ACA: 8
ACC: 4
ACG: 5
ACT: 2
AGA: 5
AGC: 5
AGG: 5
AGT: 7
ATA: 4
ATC: 4
ATG: 10
ATT: 6
CAA: 8
CAC: 5
CAG: 4
CAT: 4
CCA: 5
CCC: 10
CCG: 5
CCT: 5
CGA: 5
CGC: 8
CGG: 8
CGT: 4
CTA: 1
CTC: 5
CTG: 7
CTT: 8
GAA: 1
GAC: 6
GAG: 7
GAT: 4
GCA: 1
GCC: 7
GCG: 6
GCT: 9
GGA: 2
GGC: 2
GGG: 4
GGT: 7
GTA: 4
GTC: 6
GTG: 5
GTT: 5
TAA: 6
TAC: 2
TAG: 4
TAT: 6
TCA: 4
TCC: 6
TCG: 7
TCT: 6
TGA: 6
TGC: 3
TGG: 5
TGT: 6
TTA: 2
TTC: 5
TTG: 3
TTT: 11
Create a heat map of the codons from the original sequence and overlay a grid that groups the synonymous codons according to the standard genetic code.
codoncount(SeqNT,Figure=true)
[Figure: heat map of the codon counts for SeqNT, overlaid with a grid that groups synonymous codons according to the standard genetic code.]
Input Arguments
SeqNT
— Nucleotide sequence
string scalar | character vector | integer row vector | structure
Nucleotide sequence, specified as a string scalar, character vector, integer row vector, or structure.
- To specify SeqNT as a string scalar or character vector, see Mapping Nucleotide Letter Codes to Integers for valid letter codes.
- To specify SeqNT as an integer row vector, see Mapping Nucleotide Integers to Letter Codes for valid integers.
- To specify SeqNT as a structure, the structure must contain a Sequence field, such as returned by fastaread, fastqread, emblread, getembl, genbankread, or getgenbank.
Example: "ACGT"
Example: 1:4
Data Types: string | char | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | struct
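The three input forms can be compared side by side. The following is a minimal sketch, assuming Bioinformatics Toolbox is installed and that the integer mapping is A=1, C=2, G=3, T=4; the variable names are illustrative only.

```matlab
% Sketch: the same six-nucleotide sequence in three input forms.
% Assumes Bioinformatics Toolbox; variable names are hypothetical.
seqText = "AGTAGT";              % string scalar (a character vector also works)
seqInts = [1 3 4 1 3 4];         % integer row vector, assuming A=1, C=2, G=3, T=4
seqStruct.Sequence = 'AGTAGT';   % structure with a Sequence field

fromText   = codoncount(seqText);
fromInts   = codoncount(seqInts);
fromStruct = codoncount(seqStruct);

% All three calls should describe the same two AGT codons.
sameCounts = isequal(fromText, fromInts, fromStruct)
agtCount   = fromText.AGT
```

A structure returned by a reader such as fastaread can be passed the same way, because it already carries a Sequence field.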
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: codoncount(SeqNT,Ambiguous="prorate")
Frame
— Reading frame in nucleotide sequence
1 (default) | 2 | 3
Reading frame in nucleotide sequence, specified as 1, 2, or 3.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
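To see how the reading frame shifts the codon boundaries, a short sequence is easier to follow than a random one. This is a sketch, assuming Bioinformatics Toolbox and a release that accepts Name=Value syntax; the helper total is a hypothetical convenience, not part of the toolbox.

```matlab
% Sketch: count codons of one short sequence in each reading frame.
seq = 'ATGAAATTT';

f1 = codoncount(seq);            % reads ATG, AAA, TTT
f2 = codoncount(seq, Frame=2);   % reads TGA, AAT; the trailing TT is incomplete
f3 = codoncount(seq, Frame=3);   % reads GAA, ATT; the trailing T is incomplete

% Hypothetical helper: total number of codons counted in a result structure.
total = @(c) sum(cell2mat(struct2cell(c)));

framesTotal = [total(f1) total(f2) total(f3)]
```

Frames 2 and 3 yield fewer codons here because the leftover nucleotides at the end do not form a complete triplet.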
Reverse
— Whether to return the reverse complement sequence
false or 0 (default) | true or 1
Whether to count codons in the reverse complement of the sequence, specified as a numeric or logical 0 (false) or 1 (true).
Data Types: logical | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
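One way to check what Reverse does is to build the reverse complement explicitly and count its codons. The sketch below assumes Bioinformatics Toolbox and uses seqrcomplement, a toolbox function that this page does not otherwise mention; the variable names are illustrative.

```matlab
% Sketch: compare Reverse=true with an explicit reverse complement.
seq = 'ATGAAATTT';

rcCounts = codoncount(seq, Reverse=true);

% Build the reverse complement by hand and count its codons.
rcSeq        = seqrcomplement(seq);   % 'AAATTTCAT'
manualCounts = codoncount(rcSeq);

matches = isequal(rcCounts, manualCounts)
```

If the two structures match, the option is equivalent to counting codons in the reverse-complemented sequence, read in the first frame.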
Ambiguous
— Ambiguous value treatment
"ignore" (default) | "bundle" | "prorate" | "warn"
Ambiguous value treatment, specified as "ignore", "bundle", "prorate", or "warn". The choices are described in this list.
- "ignore": Skips codons containing ambiguous characters.
- "bundle": Counts codons containing ambiguous characters and reports the total count in the Ambiguous field of the Codons output structure.
- "prorate": Counts codons containing ambiguous characters and distributes them proportionately in the appropriate codon fields containing standard nucleotide characters. For example, the counts for the codon ART are distributed evenly among the AAT and AGT fields.
- "warn": Skips codons containing ambiguous characters and displays a warning.
Data Types: string | char
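The options above can be compared on a sequence with a single ambiguous codon. This sketch assumes Bioinformatics Toolbox and follows the ART example from the list; the sequence and variable names are illustrative, and the comments restate the documented behavior rather than verified output.

```matlab
% Sketch: one ambiguous codon (ART) followed by one standard codon (AAT).
seq = 'ARTAAT';   % R stands for A or G

ignored  = codoncount(seq);                        % default "ignore"
bundled  = codoncount(seq, Ambiguous="bundle");
prorated = codoncount(seq, Ambiguous="prorate");

ignoredAAT   = ignored.AAT                   % only the standard AAT codon is counted
bundledCount = bundled.Ambiguous             % ambiguous codons tallied in their own field
proratedPair = [prorated.AAT prorated.AGT]   % ART is shared between AAT and AGT
```

With "prorate" the counts are no longer guaranteed to be integers, which matters if downstream code assumes whole numbers.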
Figure
— Whether to display heat map of the codon counts
false or 0 (default) | true or 1
Whether to display heat map of the codon counts, specified as a numeric or logical 0 (false) or 1 (true).
Data Types: logical | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
GeneticCode
— Genetic code number or name
1 (default) | integer scalar | string scalar | character vector
Genetic code number or name, specified as an integer scalar, string scalar, or character vector. The value for GeneticCode comes from the table Genetic Code. You can also specify "None".
Tip
If you specify GeneticCode as a code name, you can truncate it to the first two letters of the name.
Example: 2
Example: "Yeast Mitochondrial"
Example: "Mo"
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | char
Output Arguments
Codons
— Codon counts
structure
Codon counts, returned as a structure. The Codons structure contains fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, and TTT), and each field holds the number of times that codon occurs in SeqNT.
- Codons that contain the U character are combined with the corresponding codons containing the T character.
- If the sequence contains gaps indicated by a hyphen (-), then codons containing gaps are ignored.
- If the sequence contains unrecognized characters, then codons containing these characters are ignored, and this warning message appears:
Warning: Unknown symbols appear in the sequence. These will be ignored.
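The handling of U characters and gaps described above can be checked on two short sequences. This is a sketch, assuming Bioinformatics Toolbox; the sequences and variable names are illustrative.

```matlab
% Sketch: U is folded into T, and codons that contain gaps are skipped.
rnaCounts = codoncount('AUGAUG');      % RNA input; AUG is counted as ATG
gapCounts = codoncount('ATG---ATG');   % the middle codon consists of gaps

rnaATG = rnaCounts.ATG
gapATG = gapCounts.ATG

% Total number of codons counted in the gapped sequence.
gapTotal = sum(cell2mat(struct2cell(gapCounts)))
```

Summing the fields as in the last line is a quick way to confirm how many codons contributed to the result.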
CodonArray
— Raw count data for codons
4-by-4-by-4 array
Raw count data for codons, returned as a 4-by-4-by-4 array. The three dimensions correspond to the three positions in the codon, and the indices to each element are represented by 1, 2, 3, and 4 for A, C, G, and T, respectively. For example, the element (2,3,4) in the array contains the number of CGT codons.
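The index convention can be applied programmatically by mapping each letter of a codon to its position in 'ACGT'. The sketch below assumes Bioinformatics Toolbox; the lookup through ismember is one possible approach, not something the function itself provides.

```matlab
% Sketch: look up one codon in CodonArray and compare with the structure.
[Codons, CodonArray] = codoncount('CGTCGTAAA');

% Map the letters C, G, T to their positions in 'ACGT'.
[~, idx] = ismember('CGT', 'ACGT');     % idx is [2 3 4]

cgtFromArray  = CodonArray(idx(1), idx(2), idx(3))
cgtFromStruct = Codons.CGT
```

Both lookups should agree; the array form is convenient when you want to sum or slice over a codon position, for example sum(CodonArray, 3) to collapse the third position.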
Version History
Introduced before R2006a
See Also
aacount | basecount | baselookup | codonbias | dimercount | nmercount | ntdensity | seqcomplement | seqreverse | seqwordcount