seq2regexp
Convert sequence with ambiguous characters to regular expression
Syntax
RegExp = seq2regexp(Seq)
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)
Input Arguments
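| Argument | Description |
|---|---|
| Seq | Either of the following: a character vector or string of codes specifying a nucleotide or amino acid sequence, or a MATLAB structure containing a Sequence field. |
| AlphabetValue | Character vector or string specifying the sequence alphabet. Choices are 'NT' (default) for nucleotide sequences or 'AA' for amino acid sequences. |
| AmbiguousValue | Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. |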
Output Arguments
| Argument | Description |
|---|---|
| RegExp | Character vector of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes. |
Description
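RegExp = seq2regexp(Seq) converts the ambiguous nucleotide or amino acid symbols in a sequence into regular expression format using IUB/IUPAC codes.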
RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example:
- If Seq = 'ACGTK' and AmbiguousValue is true, the MATLAB® software returns ACGT[GTK], which contains the unambiguous characters G and T and the ambiguous character K.
- If Seq = 'ACGTK' and AmbiguousValue is false, the MATLAB software returns ACGT[GT], which contains only the unambiguous characters.
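The choice of AmbiguousValue matters when the returned pattern is later passed to regexp and the sequence being searched itself contains ambiguity codes. The sketch below hard-codes the two patterns listed above for Seq = 'ACGTK', so it runs without calling seq2regexp; the target sequence and the variable names are made up for illustration.

```matlab
% Patterns as documented above for Seq = 'ACGTK' (hard-coded here)
patternAmb   = 'ACGT[GTK]';   % AmbiguousValue = true
patternUnamb = 'ACGT[GT]';    % AmbiguousValue = false

% Hypothetical target: a G at position 7 and an ambiguous K at position 14
target = 'TTACGTGCCACGTK';

% regexp returns the start index of each match
idxAmb   = regexp(target, patternAmb)     % [3 10]
idxUnamb = regexp(target, patternUnamb)   % 3 (the match ending in K is missed)
```

With ambiguous characters kept in the character class, the pattern also matches the stretch of the target that ends in K.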
Nucleotide Conversion
| Nucleotide Code | Nucleotide | Conversion | 
|---|---|---|
| A | Adenosine | A | 
| C | Cytosine | C | 
| G | Guanine | G | 
| T | Thymidine | T | 
| U | Uridine | U | 
| R | Purine | [AG] | 
| Y | Pyrimidine | [TC]  | 
| K | Keto | [GT]  | 
| M | Amino | [AC] | 
| S | Strong interaction (3 H bonds) | [GC] | 
| W | Weak interaction (2 H bonds) | [AT] | 
| B | Not A | [CGT] | 
| D | Not C | [AGT] | 
| H | Not G | [ACT] | 
| V | Not T or U | [ACG] | 
| N | Any nucleotide | [ACGT]  | 
| - | Gap of indeterminate length | - | 
| ? | Unknown | ? | 
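The nucleotide table above can be sketched in plain MATLAB without the Bioinformatics Toolbox. This is not the seq2regexp source: the variable names are hypothetical, and the rule used when AmbiguousValue is true (also accept every ambiguity code whose bases are a subset of the current code's bases) is an assumption, chosen because it reproduces the documented outputs for 'ACGTK' and 'ACWTMAN'. Any character that is not one of the eleven ambiguity codes (A, C, G, T, U, -, ?) is passed through unchanged.

```matlab
% Hypothetical sketch of the nucleotide conversion, not the toolbox code.
codes = 'RYKMSWBDHVN';                          % ambiguity codes, table order
bases = {'AG', 'TC', 'GT', 'AC', 'GC', 'AT', ...
         'CGT', 'AGT', 'ACT', 'ACG', 'ACGT'};   % bases each code stands for
seq = 'ACWTMAN';
includeAmbiguous = true;                        % plays the role of AmbiguousValue

re = '';
for ch = upper(seq)
    idx = find(codes == ch, 1);
    if isempty(idx)
        % Not an ambiguity code: copy the character as is
        re = [re ch];
        continue
    end
    members = bases{idx};
    if includeAmbiguous
        % Assumption: append each ambiguity code whose bases all lie in
        % bases{idx} (for example, W adds only W; N adds every code)
        for k = 1:numel(codes)
            if all(ismember(bases{k}, bases{idx}))
                members = [members codes(k)];
            end
        end
    end
    re = [re '[' members ']'];
end
disp(re)   % AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN]
```

Setting includeAmbiguous to false yields AC[AT]T[AC]A[ACGT], the second documented example.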
Amino Acid Conversion
| Amino Acid Code | Amino Acid | Conversion | 
|---|---|---|
| B | Asparagine or Aspartic acid (Aspartate) | [DN] | 
| Z | Glutamine or Glutamic acid (Glutamate) | [EQ] | 
| X | Any amino acid | [ARNDCQEGHILKMFPSTWYV] | 
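The amino acid table above can be sketched as a lookup from each ambiguity code to a character class. This is a minimal illustration that covers only the three listed codes and only AmbiguousValue = false; the map, the peptide 'MKBZX', and the target 'MKDEA' are hypothetical, and the sketch does not claim to reproduce seq2regexp for other inputs.

```matlab
% Hypothetical lookup for the amino acid table (AmbiguousValue = false)
aaMap = containers.Map({'B', 'Z', 'X'}, ...
                       {'[DN]', '[EQ]', '[ARNDCQEGHILKMFPSTWYV]'});
seq = 'MKBZX';   % made-up peptide containing each ambiguity code once

parts = cell(1, numel(seq));
for k = 1:numel(seq)
    if isKey(aaMap, seq(k))
        parts{k} = aaMap(seq(k));   % ambiguity code -> character class
    else
        parts{k} = seq(k);          % standard residue kept as is
    end
end
re = [parts{:}]                     % MK[DN][EQ][ARNDCQEGHILKMFPSTWYV]

% The pattern matches a concrete peptide that resolves each code
hit = regexp('MKDEA', re)           % 1
```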
Examples
- Convert a nucleotide sequence to a regular expression:
  seq2regexp('ACWTMAN')
  ans = AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN]
- Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression:
  seq2regexp('ACWTMAN', 'ambiguous', false)
  ans = AC[AT]T[AC]A[ACGT]
Version History
Introduced before R2006a
See Also
restrict | seqwordcount | regexp | regexpi