codoncount

Count codons in nucleotide sequence

    Description

    Codons = codoncount(SeqNT) counts the codons in SeqNT, a nucleotide sequence, and returns the codon counts in Codons, a MATLAB® structure containing fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, and TTT).

    [Codons,CodonArray] = codoncount(SeqNT) returns CodonArray, a 4-by-4-by-4 array containing the raw count data for each codon.

    ___ = codoncount(SeqNT,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in the previous syntaxes. For example, to bundle ambiguous nucleotide characters, set Ambiguous to "bundle".

    Examples

    Count the codons in a random nucleotide sequence of length 1000.

    SeqNT = randseq(1000);
    Codons = codoncount(SeqNT)
    Codons = struct with fields:
        AAA: 11
        AAC: 5
        AAG: 8
        AAT: 6
        ACA: 6
        ACC: 7
        ACG: 4
        ACT: 7
        AGA: 6
        AGC: 9
        AGG: 5
        AGT: 2
        ATA: 6
        ATC: 4
        ATG: 4
        ATT: 6
        CAA: 3
        CAC: 5
        CAG: 7
        CAT: 10
        CCA: 5
        CCC: 4
        CCG: 8
        CCT: 5
        CGA: 7
        CGC: 6
        CGG: 5
        CGT: 5
        CTA: 4
        CTC: 7
        CTG: 4
        CTT: 5
        GAA: 5
        GAC: 6
        GAG: 5
        GAT: 4
        GCA: 3
        GCC: 2
        GCG: 8
        GCT: 5
        GGA: 6
        GGC: 7
        GGG: 10
        GGT: 4
        GTA: 2
        GTC: 6
        GTG: 5
        GTT: 2
        TAA: 2
        TAC: 4
        TAG: 1
        TAT: 4
        TCA: 6
        TCC: 2
        TCG: 5
        TCT: 5
        TGA: 4
        TGC: 1
        TGG: 5
        TGT: 8
        TTA: 6
        TTC: 1
        TTG: 8
        TTT: 5
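
Because Codons is an ordinary structure, its counts can be collected into a numeric vector for further analysis. The following is a minimal sketch, assuming the default treatment of ambiguous characters so that the structure holds only the 64 codon fields. The short sequence and the variable names (names, counts, freq, topCodon) are illustrative and are not part of codoncount.

```matlab
% Illustrative 15-nucleotide sequence: ATG AAA TTT ATG TAA
SeqNT = 'ATGAAATTTATGTAA';
Codons = codoncount(SeqNT);

names  = fieldnames(Codons);             % 64-by-1 cell array of codon names
counts = cell2mat(struct2cell(Codons));  % 64-by-1 vector of counts, same order
total  = sum(counts);                    % number of complete codons counted
freq   = counts / total;                 % relative frequency of each codon

% Most frequent codon in this sequence
[~, iMax] = max(counts);
topCodon = names{iMax};
```

Here ATG occurs twice among five codons, so topCodon is 'ATG' and its relative frequency is 0.4.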
    
    

    Count the codons in the second frame for the reverse complement of the sequence.

    r2Codons = codoncount(SeqNT,Frame=2,Reverse=true)
    r2Codons = struct with fields:
        AAA: 5
        AAC: 2
        AAG: 5
        AAT: 6
        ACA: 8
        ACC: 4
        ACG: 5
        ACT: 2
        AGA: 5
        AGC: 5
        AGG: 5
        AGT: 7
        ATA: 4
        ATC: 4
        ATG: 10
        ATT: 6
        CAA: 8
        CAC: 5
        CAG: 4
        CAT: 4
        CCA: 5
        CCC: 10
        CCG: 5
        CCT: 5
        CGA: 5
        CGC: 8
        CGG: 8
        CGT: 4
        CTA: 1
        CTC: 5
        CTG: 7
        CTT: 8
        GAA: 1
        GAC: 6
        GAG: 7
        GAT: 4
        GCA: 1
        GCC: 7
        GCG: 6
        GCT: 9
        GGA: 2
        GGC: 2
        GGG: 4
        GGT: 7
        GTA: 4
        GTC: 6
        GTG: 5
        GTT: 5
        TAA: 6
        TAC: 2
        TAG: 4
        TAT: 6
        TCA: 4
        TCC: 6
        TCG: 7
        TCT: 6
        TGA: 6
        TGC: 3
        TGG: 5
        TGT: 6
        TTA: 2
        TTC: 5
        TTG: 3
        TTT: 11
    
    

    Create a heat map of the codons from the original sequence and overlay a grid that groups the synonymous codons according to the standard genetic code.

    codoncount(SeqNT,Figure=true)
    AAA - 11     AAC -  5     AAG -  8     AAT -  6     
    ACA -  6     ACC -  7     ACG -  4     ACT -  7     
    AGA -  6     AGC -  9     AGG -  5     AGT -  2     
    ATA -  6     ATC -  4     ATG -  4     ATT -  6     
    CAA -  3     CAC -  5     CAG -  7     CAT - 10     
    CCA -  5     CCC -  4     CCG -  8     CCT -  5     
    CGA -  7     CGC -  6     CGG -  5     CGT -  5     
    CTA -  4     CTC -  7     CTG -  4     CTT -  5     
    GAA -  5     GAC -  6     GAG -  5     GAT -  4     
    GCA -  3     GCC -  2     GCG -  8     GCT -  5     
    GGA -  6     GGC -  7     GGG - 10     GGT -  4     
    GTA -  2     GTC -  6     GTG -  5     GTT -  2     
    TAA -  2     TAC -  4     TAG -  1     TAT -  4     
    TCA -  6     TCC -  2     TCG -  5     TCT -  5     
    TGA -  4     TGC -  1     TGG -  5     TGT -  8     
    TTA -  6     TTC -  1     TTG -  8     TTT -  5     
    

    Figure: heat map of the codon counts, with a grid grouping the synonymous codons.

    Input Arguments

    SeqNT

    Nucleotide sequence, specified as a string scalar, character vector, integer row vector, or structure.

    Example: "ACGT"

    Example: 1:4

    Data Types: string | char | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | struct

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: codoncount(SeqNT,Ambiguous="prorate")

    Frame

    Reading frame in the nucleotide sequence, specified as 1, 2, or 3.

    Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
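
As a small illustration of how the reading frame shifts the codon boundaries, the sketch below counts the same sequence in all three frames. The nine-nucleotide sequence and the variable names f1, f2, and f3 are made up for this example, and it assumes that trailing nucleotides that do not form a complete codon are not counted.

```matlab
% Illustrative sequence, positions 1..9: A A A C C C G G G
SeqNT = 'AAACCCGGG';

f1 = codoncount(SeqNT, Frame=1);  % reads AAA, CCC, GGG
f2 = codoncount(SeqNT, Frame=2);  % reads AAC, CCG; the trailing GG is incomplete
f3 = codoncount(SeqNT, Frame=3);  % reads ACC, CGG; the trailing G is incomplete
```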

    Reverse

    Whether to count the codons in the reverse complement of the sequence, specified as a numeric or logical 0 (false) or 1 (true).

    Data Types: logical | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
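
One way to check what Reverse does is to compare it against counting an explicitly reverse-complemented sequence. This is a sketch that assumes Reverse=true is equivalent to passing the output of seqrcomplement (another Bioinformatics Toolbox function) to codoncount. The sequence and the variable names are illustrative.

```matlab
SeqNT = 'AAACCC';
rc = seqrcomplement(SeqNT);                   % reverse complement: 'GGGTTT'

viaOption = codoncount(SeqNT, Reverse=true);  % let codoncount reverse-complement
viaManual = codoncount(rc);                   % reverse-complement first, then count

same = isequal(viaOption, viaManual);         % expected to be true under the assumption above
```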

    Ambiguous

    Ambiguous value treatment, specified as "ignore", "bundle", "prorate", or "warn". The choices are described in this list.

    • "ignore" — Skips codons containing ambiguous characters.

    • "bundle" — Counts codons containing ambiguous characters and reports the total count in the Ambiguous field of the Codons output structure.

    • "prorate" — Counts codons containing ambiguous characters and distributes them proportionately in the appropriate codon fields containing standard nucleotide characters. For example, the counts for the codon ART are distributed evenly among the AAT and AGT fields.

    • "warn" — Skips codons containing ambiguous characters and displays a warning.

    Data Types: string | char
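
The options listed above can be compared on a short sequence that contains one ambiguous codon. This sketch uses the ART example from the list (R stands for A or G). The sequence and the variable names ign, bun, and pro are illustrative, and the comments restate the behavior described above rather than adding new guarantees.

```matlab
% ART is ambiguous (R = A or G); AAT is a standard codon
SeqNT = 'ARTAAT';

ign = codoncount(SeqNT, Ambiguous="ignore");   % ART is skipped
bun = codoncount(SeqNT, Ambiguous="bundle");   % ART is counted in bun.Ambiguous
pro = codoncount(SeqNT, Ambiguous="prorate");  % ART is split evenly between AAT and AGT
```

With "prorate", the counts are no longer integers: the single ART codon contributes 0.5 to each of AAT and AGT.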

    Figure

    Whether to display a heat map of the codon counts, specified as a numeric or logical 0 (false) or 1 (true).

    Data Types: logical | double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    GeneticCode

    Genetic code number or name, specified as an integer scalar, string scalar, or character vector. The value for GeneticCode comes from the table Genetic Code. You can also specify "None".

    Tip

    If you specify GeneticCode as a code name, you can truncate to the first two letters of the name.

    Example: 2

    Example: "Yeast Mitochondrial"

    Example: "Mo"

    Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | char
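
The sketch below passes GeneticCode in the forms shown in the examples above. It assumes that GeneticCode is used together with Figure=true to choose how the heat map grid groups synonymous codons, and that "None" omits that grouping. The sequence length and variable names are illustrative.

```matlab
SeqNT = randseq(300);   % random sequence, so the counts differ on every run

% Genetic code given by number
byNumber = codoncount(SeqNT, Figure=true, GeneticCode=2);

% Genetic code given by name
byName = codoncount(SeqNT, Figure=true, GeneticCode="Yeast Mitochondrial");

% No genetic code
noCode = codoncount(SeqNT, Figure=true, GeneticCode="None");
```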

    Output Arguments

    Codons

    Codon counts, returned as a structure. The Codons structure contains fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, and TTT), and each field value is the number of times that codon occurs in SeqNT.

    • Codons that contain the U character are combined with the corresponding codons containing the T character.

    • If the sequence contains gaps indicated by a hyphen (-), then codons containing gaps are ignored.

    • If the sequence contains unrecognized characters, then codons containing these characters are ignored, and this warning message appears:

      Warning: Unknown symbols appear in the sequence. These will be ignored.
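
The first two rules above can be seen on very short inputs. This is a minimal sketch with made-up sequences; the variable names rna and gapped are illustrative.

```matlab
% RNA input: AUG is reported in the ATG field
rna = codoncount('AUGAUG');

% Gapped input: the codons are ATG, -TG, ATG; the middle one contains a gap
gapped = codoncount('ATG-TGATG');
```

In both cases the ATG field is 2: the two AUG codons are combined under ATG, and the gapped codon -TG is not counted.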

    CodonArray

    Raw count data for codons, returned as a 4-by-4-by-4 array. The three dimensions correspond to the three positions in the codon, and the indices to each element are represented by 1, 2, 3, and 4 for A, C, G, and T, respectively. For example, the element (2,3,4) in the array contains the number of CGT codons.
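
The index mapping described above can be applied programmatically. The following sketch looks up one codon in the array and compares it with the matching structure field; the sequence, the use of ismember to build the indices, and the variable names are illustrative choices, not part of codoncount.

```matlab
% Illustrative sequence: CGT CGT AAA
SeqNT = 'CGTCGTAAA';
[Codons, CodonArray] = codoncount(SeqNT);

% Map each letter of the codon to its index in the order A, C, G, T
codon = 'CGT';
[~, idx] = ismember(codon, 'ACGT');              % idx is [2 3 4]

fromArray   = CodonArray(idx(1), idx(2), idx(3)); % count taken from the array
fromStruct  = Codons.(codon);                     % same count via a dynamic field name
totalCodons = sum(CodonArray(:));                 % total number of codons counted
```

Both lookups give 2 for CGT here, and the array sums to 3, the number of complete codons in the sequence.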

    Version History

    Introduced before R2006a