codoncount

Count codons in nucleotide sequence

Syntax

Codons = codoncount(SeqNT)

[Codons,CodonArray] = codoncount(SeqNT)

___ = codoncount(SeqNT,Name=Value)

Description

Codons = codoncount(SeqNT) counts the codons in SeqNT, a nucleotide sequence, and returns the codon counts in Codons, a MATLAB® structure containing fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, and TTT).

[Codons,CodonArray] = codoncount(SeqNT) returns CodonArray, a 4-by-4-by-4 array containing the raw count data for each codon.

___ = codoncount(SeqNT,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in previous syntaxes. For example, to bundle ambiguous nucleotide characters, set Ambiguous to "bundle".

Examples

Count Codons in a Nucleotide Sequence

SeqNT = randseq(1000);
Codons = codoncount(SeqNT)

Codons = struct with fields:
    AAA: 11
    AAC: 5
    AAG: 8
    AAT: 6
    ACA: 6
    ACC: 7
    ACG: 4
    ACT: 7
    AGA: 6
    AGC: 9
    AGG: 5
    AGT: 2
    ATA: 6
    ATC: 4
    ATG: 4
    ATT: 6
    CAA: 3
    CAC: 5
    CAG: 7
    CAT: 10
    CCA: 5
    CCC: 4
    CCG: 8
    CCT: 5
    CGA: 7
    CGC: 6
    CGG: 5
    CGT: 5
    CTA: 4
    CTC: 7
    CTG: 4
    CTT: 5
    GAA: 5
    GAC: 6
    GAG: 5
    GAT: 4
    GCA: 3
    GCC: 2
    GCG: 8
    GCT: 5
    GGA: 6
    GGC: 7
    GGG: 10
    GGT: 4
    GTA: 2
    GTC: 6
    GTG: 5
    GTT: 2
    TAA: 2
    TAC: 4
    TAG: 1
    TAT: 4
    TCA: 6
    TCC: 2
    TCG: 5
    TCT: 5
    TGA: 4
    TGC: 1
    TGG: 5
    TGT: 8
    TTA: 6
    TTC: 1
    TTG: 8
    TTT: 5

Count the codons in the second frame for the reverse complement of the sequence.

r2Codons = codoncount(SeqNT,Frame=2,Reverse=true)

r2Codons = struct with fields:
    AAA: 5
    AAC: 2
    AAG: 5
    AAT: 6
    ACA: 8
    ACC: 4
    ACG: 5
    ACT: 2
    AGA: 5
    AGC: 5
    AGG: 5
    AGT: 7
    ATA: 4
    ATC: 4
    ATG: 10
    ATT: 6
    CAA: 8
    CAC: 5
    CAG: 4
    CAT: 4
    CCA: 5
    CCC: 10
    CCG: 5
    CCT: 5
    CGA: 5
    CGC: 8
    CGG: 8
    CGT: 4
    CTA: 1
    CTC: 5
    CTG: 7
    CTT: 8
    GAA: 1
    GAC: 6
    GAG: 7
    GAT: 4
    GCA: 1
    GCC: 7
    GCG: 6
    GCT: 9
    GGA: 2
    GGC: 2
    GGG: 4
    GGT: 7
    GTA: 4
    GTC: 6
    GTG: 5
    GTT: 5
    TAA: 6
    TAC: 2
    TAG: 4
    TAT: 6
    TCA: 4
    TCC: 6
    TCG: 7
    TCT: 6
    TGA: 6
    TGC: 3
    TGG: 5
    TGT: 6
    TTA: 2
    TTC: 5
    TTG: 3
    TTT: 11

Create a heat map of the codons from the original sequence and overlay a grid that groups the synonymous codons according to the standard genetic code.

codoncount(SeqNT,Figure=true)

AAA - 11     AAC -  5     AAG -  8     AAT -  6     
ACA -  6     ACC -  7     ACG -  4     ACT -  7     
AGA -  6     AGC -  9     AGG -  5     AGT -  2     
ATA -  6     ATC -  4     ATG -  4     ATT -  6     
CAA -  3     CAC -  5     CAG -  7     CAT - 10     
CCA -  5     CCC -  4     CCG -  8     CCT -  5     
CGA -  7     CGC -  6     CGG -  5     CGT -  5     
CTA -  4     CTC -  7     CTG -  4     CTT -  5     
GAA -  5     GAC -  6     GAG -  5     GAT -  4     
GCA -  3     GCC -  2     GCG -  8     GCT -  5     
GGA -  6     GGC -  7     GGG - 10     GGT -  4     
GTA -  2     GTC -  6     GTG -  5     GTT -  2     
TAA -  2     TAC -  4     TAG -  1     TAT -  4     
TCA -  6     TCC -  2     TCG -  5     TCT -  5     
TGA -  4     TGC -  1     TGG -  5     TGT -  8     
TTA -  6     TTC -  1     TTG -  8     TTT -  5

Figure: heat map of the codon counts, with a grid grouping the synonymous codons according to the standard genetic code.

Input Arguments

`SeqNT` — Nucleotide sequence
string scalar | character vector | integer row vector | structure

Nucleotide sequence, specified as a string scalar, character vector, integer row vector, or structure.

To specify SeqNT as a string scalar or character vector, see Mapping Nucleotide Letter Codes to Integers for valid letter codes.
To specify SeqNT as an integer row vector, see Mapping Nucleotide Integers to Letter Codes for valid integers.
To specify SeqNT as a structure, the structure must contain a Sequence field, such as returned by fastaread, fastqread, emblread, getembl, genbankread, or getgenbank.

Example: "ACGT"

Example: 1:4
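
As a minimal sketch of the accepted input forms, the following counts the same six-nucleotide sequence supplied as a character vector, a string scalar, an integer row vector, and a structure. The variable names and the hand-built structure s are illustrative assumptions; in practice such a structure would typically come from a reader such as fastaread.

```matlab
% Same sequence in four accepted input forms (all names are illustrative).
seqChar = 'ACGTAC';               % character vector
seqStr  = "ACGTAC";               % string scalar
seqInt  = [1 2 3 4 1 2];          % integers 1-4 map to A, C, G, T
s.Sequence = seqChar;             % hand-built structure with a Sequence field

cChar   = codoncount(seqChar);
cStr    = codoncount(seqStr);
cInt    = codoncount(seqInt);
cStruct = codoncount(s);

% Each call reads the codons ACG and TAC, so the results should be identical.
sameCounts = isequal(cChar,cStr,cInt,cStruct)
```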

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: codoncount(SeqNT,Ambiguous="prorate")

`Frame` — Reading frame in nucleotide sequence
`1` (default) | `2` | `3`

Reading frame in nucleotide sequence, specified as 1, 2, or 3.
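
The effect of Frame can be sketched on a short made-up sequence. This assumes the usual reading-frame convention, in which frame 2 starts counting at the second nucleotide; the sequence and variable names are illustrative only.

```matlab
seq = 'AATGCCCTTT';                  % illustrative 10-nucleotide sequence

f1 = codoncount(seq);                % frame 1 reads AAT, GCC, CTT
f2 = codoncount(seq,Frame=2);        % frame 2 reads ATG, CCC, TTT

% Under the stated assumption, frame 2 matches dropping the first nucleotide.
f2Manual = codoncount(seq(2:end));
framesAgree = isequal(f2,f2Manual)
```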

`Reverse` — Whether to count codons in the reverse complement sequence
`false` or `0` (default) | `true` or `1`

Whether to count codons in the reverse complement of the sequence, specified as a numeric or logical 0 (false) or 1 (true).
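
A small sketch of Reverse, assuming it is equivalent to counting codons on the output of seqrcomplement (a separate Bioinformatics Toolbox function). The sequence is made up for illustration.

```matlab
seq = 'ATGCCC';                              % illustrative sequence

rc = codoncount(seq,Reverse=true);           % counts on the reverse complement

% seqrcomplement returns the reverse complement, here 'GGGCAT'.
rcManual = codoncount(seqrcomplement(seq));
reverseAgrees = isequal(rc,rcManual)
```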

`Ambiguous` — Ambiguous value treatment
`"ignore"` (default) | `"bundle"` | `"prorate"` | `"warn"`

Ambiguous value treatment, specified as "ignore", "bundle", "prorate", or "warn". The choices are described in this list.

"ignore" — Skips codons containing ambiguous characters.
"bundle" — Counts codons containing ambiguous characters and reports the total count in the Ambiguous field of the Codons output structure.
"prorate" — Counts codons containing ambiguous characters and distributes them proportionately in the appropriate codon fields containing standard nucleotide characters. For example, the counts for the codon ART are distributed evenly among the AAT and AGT fields.
"warn" — Skips codons containing ambiguous characters and displays a warning.

Data Types: string | char
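
The options above can be compared on a short made-up sequence containing one ambiguous codon, ART, where R stands for A or G. This is a sketch only; the values noted in the comments follow from the option descriptions above rather than from a recorded run.

```matlab
seq = 'AAAARTAGT';                                % codons AAA, ART, AGT

ignored  = codoncount(seq);                       % default "ignore": ART is skipped
bundled  = codoncount(seq,Ambiguous="bundle");    % ART counted in the Ambiguous field
prorated = codoncount(seq,Ambiguous="prorate");   % ART split between AAT and AGT

bundled.Ambiguous      % expected 1, the single ambiguous codon
prorated.AGT           % expected 1.5, one AGT plus half of ART
prorated.AAT           % expected 0.5, the other half of ART
```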

`Figure` — Whether to display heat map of the codon counts
`false` or `0` (default) | `true` or `1`

Whether to display heat map of the codon counts, specified as a numeric or logical 0 (false) or 1 (true).

`GeneticCode` — Genetic code number or name
`1` (default) | integer scalar | string scalar | character vector

Genetic code number or name, specified as an integer scalar, string scalar, or character vector. The value for GeneticCode comes from the table Genetic Code. You can also specify "None".

Tip

If you specify GeneticCode as a code name, you can truncate to the first two letters of the name.

Example: 2

Example: "Yeast Mitochondrial"

Example: "Mo"

Output Arguments

`Codons` — Codon counts
structure

Codon counts, returned as a structure. The Codons structure contains fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, and TTT), and the values count the codons of corresponding type from SeqNT.

Codons that contain the U character are combined with the corresponding codons containing the T character.
If the sequence contains gaps indicated by a hyphen (-), then codons containing gaps are ignored.
If the sequence contains unrecognized characters, then codons containing these characters are ignored, and this warning message appears:
```
Warning: Unknown symbols appear in the sequence. These will be ignored.
```
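
The first two rules can be illustrated with two short made-up sequences, one written with U and one containing a gap. This is a minimal sketch; the sequences and variable names are illustrative.

```matlab
rna    = codoncount('AUGUUU');       % U is counted as T: ATG and TTT
gapped = codoncount('ATG-TGATG');    % codons ATG, -TG, ATG; the gapped codon is skipped

rna.ATG        % 1
rna.TTT        % 1
gapped.ATG     % 2
```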

`CodonArray` — Raw count data for codons
4-by-4-by-4 array

Raw count data for codons, returned as a 4-by-4-by-4 array. The three dimensions correspond to the three positions in the codon, and the indices to each element are represented by 1, 2, 3, and 4 for A, C, G, and T, respectively. For example, the element (2,3,4) in the array contains the number of CGT codons.
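
The indexing scheme can be checked against the Codons structure on a short made-up sequence. The last line, which converts counts to relative frequencies, is an illustrative use of the array and not part of codoncount itself.

```matlab
seq = 'CGTCGTAAA';                       % illustrative: CGT twice, AAA once

[codons,codonArray] = codoncount(seq);

nCGT = codonArray(2,3,4);                % C = 2, G = 3, T = 4
nAAA = codonArray(1,1,1);                % A = 1 in all three positions

% The array and the structure hold the same counts.
arrayMatchesStruct = (nCGT == codons.CGT) && (nAAA == codons.AAA)

% Relative codon frequencies: divide by the total number of counted codons.
freq = codonArray / sum(codonArray(:));
```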

Version History

Introduced before R2006a
