fastaread
Read data from FASTA file
Syntax
fastaStruct = fastaread(file)
fastaStruct = fastaread(file,Name=Value)
[header,sequence] = fastaread(___)
Description
fastaStruct = fastaread(file) returns the sequence data from the input FASTA file as a structure.
fastaStruct = fastaread(file,Name=Value) uses additional options specified by one or more name-value arguments. For example, seqdata = fastaread(fastafile,IgnoreGaps=true) removes any gap symbol (- or .) from the sequences.
[header,sequence] = fastaread(___) returns the sequence data as separate variables: header and sequence. You can specify any of the input argument combinations in the previous syntaxes. If the file contains multiple sequences, header and sequence are cell arrays of sequence header and nucleotide or amino acid sequence information.
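As a minimal sketch of the two output forms, the following writes a small two-entry FASTA file to a temporary location and reads it back both ways. The file contents and the entry names seqA and seqB are made up for illustration, and the sketch assumes that Bioinformatics Toolbox is installed and that the release supports Name=Value syntax where used.

```matlab
% Write a hypothetical two-entry FASTA file to a temporary location.
tmpFile = [tempname '.fa'];
fid = fopen(tmpFile,'w');
fprintf(fid,'>seqA first example entry\nACGTACGT\n');
fprintf(fid,'>seqB second example entry\nGGCCTTAA\n');
fclose(fid);

% Single output: a structure array with one element per entry.
fastaStruct = fastaread(tmpFile);

% Two outputs: cell arrays, because the file holds more than one entry.
[header,sequence] = fastaread(tmpFile);

% Both forms carry the same data.
disp(fastaStruct(2).Header)
disp(sequence{2})

delete(tmpFile);   % clean up the temporary file
```

The structure form keeps each header paired with its sequence, while the two-output form is convenient when you only need to loop over the sequences.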
Examples
Read sequence data from FASTA files
Read the nucleotide sequence information of the human p53 tumor gene.
p53nt = fastaread("p53nt.txt")
p53nt = struct with fields:
Header: 'gi|8400737|ref|NM_000546.2| Homo sapiens tumor protein p53 (Li-Fraumeni syndrome) (TP53), mRNA'
Sequence: 'ACTTGTCATGGCGACTGTCCAGCTTTGTGCCAGGAGCCTCGCAGGGGTTGATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTTGAGCTTCTCAAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTCCGGGGACACTTTGCGTTCGGGCTGGGAGCGTGCTTTCCACGACGGTGACACGCTTCCCTGGATTGGCAGCCAGACTGCCTTCCGGGTCACTGCCATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCCCCTCTGAGTCAGGAAACATTTTCAGACCTATGGAAACTACTTCCTGAAAACAACGTTCTGTCCCCCTTGCCGTCCCAAGCAATGGATGATTTGATGCTGTCCCCGGACGATATTGAACAATGGTTCACTGAAGACCCAGGTCCAGATGAAGCTCCCAGAATGCCAGAGGCTGCTCCCCGCGTGGCCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCTGCACCAGCCCCCTCCTGGCCCCTGTCATCTTCTGTCCCTTCCCAGAAAACCTACCAGGGCAGCTACGGTTTCCGTCTGGGCTTCTTGCATTCTGGGACAGCCAAGTCTGTGACTTGCACGTACTCCCCTGCCCTCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCTGTGCAGCTGTGGGTTGATTCCACACCCCCGCCCGGCACCCGCGTCCGCGCCATGGCCATCTACAAGCAGTCACAGCACATGACGGAGGTTGTGAGGCGCTGCCCCCACCATGAGCGCTGCTCAGATAGCGATGGTCTGGCCCCTCCTCAGCATCTTATCCGAGTGGAAGGAAATTTGCGTGTGGAGTATTTGGATGACAGAAACACTTTTCGACATAGTGTGGTGGTGCCCTATGAGCCGCCTGAGGTTGGCTCTGACTGTACCACCATCCACTACAACTACATGTGTAACAGTTCCTGCATGGGCGGCATGAACCGGAGGCCCATCCTCACCATCATCACACTGGAAGACTCCAGTGGTAATCTACTGGGACGGAACAGCTTTGAGGTGCGTGTTTGTGCCTGTCCTGGGAGAGACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCCCCAGGGAGCACTAAGCGAGCACTGCCCAACAACACCAGCTCCTCTCCCCAGCCAAAGAAGAAACCACTGGATGGAGAATATTTCACCCTTCAGATCCGTGGGCGTGAGCGCTTCGAGATGTTCCGAGAGCTGAATGAGGCCTTGGAACTCAAGGATGCCCAGGCTGGGAAGGAGCCAGGGGGGAGCAGGGCTCACTCCAGCCACCTGAAGTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGACAGAAGGGCCTGACTCAGACTGACATTCTCCACTTCTTGTTCCCCACTGACAGCCTCCCACCCCCATCTCTCCCTCCCCTGCCATTTTGGGTTTTGGGTCTTTGAACCCTTGCTTGCAATAGGTGTGCGTCAGAAGCACCCAGGACTTCCATTTGCTTTGTCCCGGGGCTCCACTGAACAAGTTGGCCTGCACTGGTGTTTTGTTGTGGGGAGGAGGATGGGGAGTAGGACATACCAGCTTAGATTTTAAGGTTTTTACTGTGAGGGATGTTTGGGAGATGTAAGAAATGTTCTTGCAGTTAAGGGTTAGTTTACAATCAGCCACATTCTAGGTAGGTAGGGGCCCACTTCACCGTACTAACCAGGGAAGCTGTCCCTCATGTTGAATTTTCTCTAACTTCAAGGCCCATATCTGTGAAATGCTGGCATTTGCACCTACCTCACAGAGTGCATTGTGAGGGTTAATGAAATAATGTACATCTGGCCTTGAAACCACCTTTTATTACATGGGGTCTAAAACTTGACCCCCTTGAGGGTGCCTGTTCCCTCTCCCTCTCCCTGTTGGCTGGTGGGTTGGTA
GTTTCTACAGTTGGGCAGCTGGTTAGGTAGAGGGAGTTGTCAAGTCTTGCTGGCCCAGCCAAACCCTGTCTGACAACCTCTTGGTCGACCTTAGTACCTAAAAGGAAATCTCACCCCATCCCACACCCTGGAGGATTTCATCTCTTGTATATGATGATCTGGATCCACCAAGACTTGTTTTATGCTCAGGGTCAATTTCTTTTTTCTTTTTTTTTTTTTTTTTTCTTTTTCTTTGAGACTGGGTCTCGCTTTGTTGCCCAGGCTGGAGTGGAGTGGCGTGATCTTGGCTTACTGCAGCCTTTGCCTCCCCGGCTCGAGCAGTCCTGCCTCAGCCTCCGGAGTAGCTGGGACCACAGGTTCATGCCACCATGGCCAGCCAACTTTTGCATGTTTTGTAGAGATGGGGTCTCACAGTGTTGCCCAGGCTGGTCTCAAACTCCTGGGCTCAGGCGATCCACCTGTCTCAGCCTCCCAGAGTGCTGGGATTACAATTGTGAGCCACCACGTGGAGCTGGAAGGGTCAACATCTTTTACATTCTGCAAGCACATCTGCATTTTCACCCCACCCTTCCCCTCCTTCTCCCTTTTTATATCCCATTTTTATATCGATCTCTTATTTTACAATAAAACTTTGCTGCCA'
Read the amino acid sequence information of p53 protein.
p53aa = fastaread("p53aa.txt")
p53aa = struct with fields:
Header: 'gi|8400738|ref|NP_000537.2| tumor protein p53 [Homo sapiens]'
Sequence: 'MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD'
Read a block of entries, from the 5th through the 10th sequence, from a FASTA file, removing the gap symbols from each sequence.
pf2 = fastaread('pf00002.fa',BlockRead=[5 10],IgnoreGaps=true)
pf2 = 6×1 struct array with fields:
Header
Sequence
Input Arguments
file
— Name of FASTA file or sequence information
character vector | character array | string scalar
Name of a FASTA-formatted file or sequence information, specified as a character vector, character array, or string scalar.
Specify one of the following:
- A file name, a path and file name, or a URL pointing to a file. The referenced file must be a FASTA-formatted ASCII text file. If you specify only a file name, that file must be on the MATLAB® search path or in the current folder.
- A MATLAB character array that contains the text of a FASTA-formatted file.
A FASTA-formatted file begins with a right angle bracket (>) and a single line description. Following this description is the sequence information as a series of lines. Sequences must use the standard IUB/IUPAC amino acid and nucleotide letter codes.
For a list of codes, see aminolookup and baselookup.
Data Types: char | string
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: seqdata = fastaread(fastafile,TrimHeaders=true,TimeOut=10)
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: seqdata = fastaread(fastafile,'TrimHeaders',true,'TimeOut',10)
IgnoreGaps
— Flag to remove gap symbols
false or 0 (default) | true or 1
Flag to remove any gap symbols (- or .) from the sequences, specified as a logical 1 (true) or 0 (false).
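The effect of IgnoreGaps can be sketched with a small aligned sequence. The file contents and the entry name aligned_seq below are hypothetical, and the sketch assumes Bioinformatics Toolbox and a release that supports Name=Value syntax (R2021a or later).

```matlab
% Write a hypothetical single-entry FASTA file whose sequence contains gaps.
tmpFile = [tempname '.fa'];
fid = fopen(tmpFile,'w');
fprintf(fid,'>aligned_seq\nAC-GT..ACG-T\n');
fclose(fid);

withGaps = fastaread(tmpFile);                   % gap symbols kept (default)
noGaps   = fastaread(tmpFile,IgnoreGaps=true);   % '-' and '.' removed

disp(withGaps.Sequence)
disp(noGaps.Sequence)

delete(tmpFile);   % clean up the temporary file
```

Removing gaps changes the sequence length, so keep the default when the positions in an alignment matter.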
BlockRead
— Sequence entry or blocks to read from input file
positive integer | 1-by-2 vector of positive integers
Sequence entry or block of entries to read from an input file with multiple sequences, specified as a positive integer or 1-by-2 vector of positive integers.
Specify a scalar positive integer n to read the nth entry in the file.
Specify a two-element vector [m1 m2] to read a block of entries starting at entry m1 and ending at entry m2. Use Inf for m2 to read all entries in the file starting at m1.
Data Types: double
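The three forms of BlockRead can be sketched as follows. The four-entry file and the entry names entry1 through entry4 are made up for illustration, and the sketch assumes Bioinformatics Toolbox and a release that supports Name=Value syntax (R2021a or later).

```matlab
% Write a hypothetical FASTA file with four short entries.
tmpFile = [tempname '.fa'];
fid = fopen(tmpFile,'w');
for k = 1:4
    fprintf(fid,'>entry%d\n%s\n',k,repmat('ACGT',1,k));
end
fclose(fid);

second = fastaread(tmpFile,BlockRead=2);         % only the 2nd entry
middle = fastaread(tmpFile,BlockRead=[2 3]);     % entries 2 through 3
rest   = fastaread(tmpFile,BlockRead=[3 Inf]);   % entry 3 to the end of the file

disp(second.Header)
disp({middle.Header})
disp({rest.Header})

delete(tmpFile);   % clean up the temporary file
```

Reading in blocks is mainly useful for large multi-sequence files, where loading every entry at once may use more memory than needed.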
TrimHeaders
— Flag to trim header after first white space
false or 0 (default) | true or 1
Flag to trim the header after the first white space, specified as a logical 1 (true) or 0 (false). White space characters include a space (char(32)) and a tab (char(9)).
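A short sketch of header trimming follows. The header text and the entry name seq1 are hypothetical, and the sketch assumes Bioinformatics Toolbox and a release that supports Name=Value syntax (R2021a or later).

```matlab
% Write a hypothetical entry whose header has an identifier and a description.
tmpFile = [tempname '.fa'];
fid = fopen(tmpFile,'w');
fprintf(fid,'>seq1 hypothetical protein, partial\nMEEPQSDPSV\n');
fclose(fid);

full    = fastaread(tmpFile);                     % whole header line kept
trimmed = fastaread(tmpFile,TrimHeaders=true);    % text after first white space dropped

disp(full.Header)
disp(trimmed.Header)

delete(tmpFile);   % clean up the temporary file
```

Trimming is handy when the first token of the header is an identifier that you want to use as a lookup key.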
TimeOut
— Connection time out to read from remote EMBL-EBI file
5 (default) | positive scalar
Connection timeout, in seconds, for reading from a remote EMBL-EBI file, specified as a positive scalar.
Data Types: double
Output Arguments
fastaStruct
— Sequence data
structure
Sequence data, returned as a structure. The structure contains the following fields:
Field | Description |
---|---|
Header | Header information. |
Sequence | Single letter-code representation of a nucleotide or amino acid sequence. |
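When the file holds several entries, the output is a structure array, and the fields can be collected across elements. The sketch below uses a made-up three-entry file with the hypothetical entry names s1, s2, and s3, and assumes Bioinformatics Toolbox is installed.

```matlab
% Write a hypothetical FASTA file with three entries of different lengths.
tmpFile = [tempname '.fa'];
fid = fopen(tmpFile,'w');
fprintf(fid,'>s1\nACGT\n>s2\nACGTACGT\n>s3\nACGTACGTACGT\n');
fclose(fid);

s = fastaread(tmpFile);

% Gather the fields of every element into cell arrays.
headers = {s.Header};
seqLengths = cellfun(@length,{s.Sequence});   % number of letters per entry

disp(headers)
disp(seqLengths)

delete(tmpFile);   % clean up the temporary file
```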
header
— Sequence header information
character vector | cell array of character vectors
Sequence header information, returned as a character vector or cell array of character vectors.
Data Types: char | cell
sequence
— Single letter-coded nucleotide or amino acid sequences
character vector | cell array of character vectors
Single letter-coded nucleotide or amino acid sequences, returned as a character vector or cell array of character vectors.
Data Types: char | cell
Version History
Introduced before R2006a
See Also
aminolookup | baselookup | BioIndexedFile | emblread | fastainfo | fastawrite | fastqinfo | fastqread | fastqwrite | genbankread | genpeptread | multialignread | saminfo | samread | seqprofile | seqviewer