basecount

Count nucleotides in sequence

Syntax

NTStruct = basecount(SeqNT)

NTStruct = basecount(SeqNT,Name=Value)

Description

NTStruct = basecount(SeqNT) returns the number of each type of base in SeqNT.

NTStruct = basecount(SeqNT,Name=Value) uses additional options specified by one or more name-value arguments.

Examples

Count Nucleotides in Sequence

Count the bases in a DNA sequence and return the results in a structure.

bases = basecount('TAGCTGGCCAAGCGAGCTTG')

bases = struct with fields:
    A: 4
    C: 5
    G: 7
    T: 4

Get the number of adenine (A) bases.

bases.A

ans = 
4

Create a bar graph comparing the number of each nucleotide.

basecount('TAGCTGGCCAAGCGAGCTTG',Chart="bar")

Figure contains an axes object. The axes object contains an object of type bar.

ans = struct with fields:
    A: 4
    C: 5
    G: 7
    T: 4

Count the bases in a DNA sequence containing ambiguous characters (R, Y, K, M, S, W, B, D, H, V, or N), listing each of them in a separate field.

basecount('ABCDGGCCAAGCGAGCTTG',Ambiguous="individual")

ans = struct with fields:
    A: 4
    C: 5
    G: 6
    T: 2
    R: 0
    Y: 0
    K: 0
    M: 0
    S: 0
    W: 0
    B: 1
    D: 1
    H: 0
    V: 0
    N: 0

Input Arguments

`SeqNT` — Nucleotide sequence
character vector | string scalar | row vector of integers | structure

Nucleotide sequence, specified as one of the following.

Character vector or string scalar consisting of the characters A, C, G, T, and U, and ambiguous characters R, Y, K, M, S, W, B, D, H, V, and N.
Row vector of integers specifying a nucleotide sequence. For information on valid integers, see Mapping Nucleotide Integers to Letter Codes.
Structure that contains a nucleotide sequence in the Sequence field. The fastaread, fastqread, emblread, getembl, genbankread, and getgenbank functions return structures with a Sequence field.

Example: NTStruct = basecount('CGACTT') counts the number of times each nucleotide occurs in the sequence.

Data Types: double | char | string | struct
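
The input forms listed above can be compared side by side. The following is a minimal sketch, assuming Bioinformatics Toolbox is installed. It uses nt2int to build the integer form (cast to double to match the data types listed above) and a hand-made structure in place of one returned by a reader function such as fastaread. All variable names are illustrative.

```matlab
% Supply the same sequence in each supported form.
seqChar = 'CGACTT';                    % character vector
seqStr  = "CGACTT";                    % string scalar
seqInt  = double(nt2int(seqChar));     % row vector of integer codes
seqStruct.Sequence = seqChar;          % structure with a Sequence field

countsChar   = basecount(seqChar);
countsStr    = basecount(seqStr);
countsInt    = basecount(seqInt);
countsStruct = basecount(seqStruct);

% Per the description above, all four calls should return the same counts.
sameCounts = isequal(countsChar, countsStr, countsInt, countsStruct)
```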

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: NTStruct = basecount("ACGGTC",Ambiguous="individual")
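
As a hedged illustration of the rule above, the sketch below passes the same two options in a different order and also in the older comma-separated form. The Name=Value syntax requires MATLAB R2021a or later. The sequence and variable names are made up for the example.

```matlab
seq = "ACG-RTN-A";   % illustrative sequence with gaps and ambiguous characters

% Name=Value syntax (R2021a and later); the order of the pairs should not matter.
c1 = basecount(seq, Ambiguous="bundle", Gaps=true);
c2 = basecount(seq, Gaps=true, Ambiguous="bundle");

% Comma-separated name-value pairs, which also work in earlier releases.
c3 = basecount(seq, 'Ambiguous', 'bundle', 'Gaps', true);

sameResult = isequal(c1, c2, c3)
```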

`Ambiguous` — Method for counting ambiguous nucleotide characters
"`ignore`" (default) | "`bundle`" | "`prorate`" | "`individual`" | "`warn`"

Method for counting ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N), specified as one of the following.

"ignore" — basecount skips ambiguous characters.
"bundle" — basecount counts ambiguous characters and reports the total count in the Ambiguous field.
"prorate" — basecount counts ambiguous characters and distributes each count evenly among the fields of the unambiguous nucleotides that the character can represent. For example, the count for the character R is distributed evenly between the A and G fields.
"individual" — basecount counts ambiguous characters and reports them in individual fields.
"warn" — basecount skips ambiguous characters and displays a warning.

Example: NTStruct = basecount("CGRTTMSA",Ambiguous="bundle") reports the total number of ambiguous characters in the Ambiguous field of NTStruct.

Data Types: char | string
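
To see how the counting methods differ, the sketch below runs one sequence through four of them. It is only an illustration based on the descriptions above, and the variable names are arbitrary. With "prorate", the A, C, G, and T fields can hold fractional values.

```matlab
seq = "CGRTTMSA";    % R, M, and S are ambiguous characters

ignored  = basecount(seq);                           % default: ambiguous characters are skipped
bundled  = basecount(seq, Ambiguous="bundle");       % total reported in the Ambiguous field
prorated = basecount(seq, Ambiguous="prorate");      % shares added to the A, C, G, and T fields
perChar  = basecount(seq, Ambiguous="individual");   % one field per ambiguous character

% Total number of ambiguous characters found.
numAmbiguous = bundled.Ambiguous

% Sum of the prorated counts across the four unambiguous fields.
proratedTotal = prorated.A + prorated.C + prorated.G + prorated.T
```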

`Gaps` — Flag to count or ignore gaps
`false` (default) | `true`

Flag to count or ignore gaps, specified as true or false. Gaps are indicated by a hyphen (-).

If you set this option to true, then basecount counts the gaps and reports the total count in the Gaps field.

Data Types: logical
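
A minimal sketch of gap counting, using a made-up aligned fragment that contains three hyphens:

```matlab
aligned = "ACG--TTA-C";   % illustrative aligned fragment with three gaps

% Count gaps in addition to nucleotides.
withGaps = basecount(aligned, Gaps=true);

% The gap total is reported in the Gaps field.
hasGapsField = isfield(withGaps, 'Gaps')
numGaps = withGaps.Gaps
```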

`Chart` — Type of chart
`"pie"` | `"bar"`

Type of chart to display the proportions of nucleotides, specified as "pie" or "bar".

Data Types: char | string

Output Arguments

`NTStruct` — Nucleotide counts
structure

Nucleotide counts, returned as a structure containing the fields A, C, G, and T. Uracil nucleotides (U) are added to the T field. Additional fields can be present, depending on the values of Ambiguous and Gaps.
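
The returned structure can be used directly in further calculations. The sketch below uses an illustrative RNA fragment to show uracil being tallied in the T field. It then derives a GC fraction from the fields and lists which fields are present when both Ambiguous and Gaps are set. The GC calculation is ordinary arithmetic on the fields, not a feature of basecount.

```matlab
rna = 'AUGGCUUAG';              % illustrative RNA fragment with three U bases
counts = basecount(rna);        % U is added to the T field

numUracil = counts.T

% GC fraction computed from the returned fields.
total = counts.A + counts.C + counts.G + counts.T;
gcFraction = (counts.G + counts.C) / total

% List the fields that are present for a given choice of options.
extended = basecount(rna, Ambiguous="bundle", Gaps=true);
extendedFields = fieldnames(extended)
```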

Version History

Introduced before R2006a
