getembl

Retrieve sequence information from the EMBL database

Syntax

EMBLData = getembl(AccessionNumber)
EMBLData = getembl(..., 'ToFile', ToFileValue, ...)
EMBLSeq = getembl(..., 'SequenceOnly', SequenceOnlyValue, ...)
EMBLSeq = getembl(..., 'TimeOut', TimeOutValue, ...)

Input Arguments

AccessionNumber Unique identifier for a sequence record. Enter a unique combination of letters and numbers.
ToFileValue Character vector specifying a file name or a path and file name to which to save the data. If you specify only a file name, the file is stored in the current folder.
SequenceOnlyValue Controls whether to retrieve only the sequence, without the metadata. Choices are true or false (default).
TimeOutValue Connection timeout in seconds, specified as a positive scalar. The default value is 5.

Output Arguments

EMBLData MATLAB® structure with fields corresponding to EMBL data.
EMBLSeq MATLAB character vector representing the sequence.

Description

getembl retrieves information from the European Molecular Biology Laboratory (EMBL) database for nucleotide sequences. This database is maintained by the European Bioinformatics Institute (EBI). For more details about the EMBL database, see

EMBLData = getembl(AccessionNumber) searches for the accession number in the EMBL database (https://www.ebi.ac.uk/) and returns EMBLData, a MATLAB structure with fields corresponding to the EMBL two-character line type codes. Each line type code is stored as a separate element in the structure.

EMBLData contains the following fields.

Field
Identification
Accession
SequenceVersion
DateCreated
DateUpdated
Description
Keyword
OrganismSpecies
OrganismClassification
Organelle
Reference
DatabaseCrossReference
Comments
Assembly
Feature
BaseCount
Sequence
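The following is a minimal sketch of how the fields listed above might be inspected once a record has been retrieved. It uses only the field names from the list and the accession number from the Examples section. The type and layout of each field's contents are not specified here, so the code displays the values with disp rather than assuming a particular format.

```matlab
% Retrieve the full record (requires a network connection to the EMBL database).
emblout = getembl('X00558');

% List the field names actually present in the returned structure.
disp(fieldnames(emblout));

% Display a few individual fields; their exact contents depend on the record.
disp(emblout.Accession);
disp(emblout.Description);

% The Sequence field holds the nucleotide sequence; report its length.
fprintf('Sequence length: %d\n', numel(emblout.Sequence));
```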

EMBLData = getembl(..., 'PropertyName', PropertyValue, ...) calls getembl with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

EMBLData = getembl(..., 'ToFile', ToFileValue, ...) saves the information to an EMBL-formatted file. ToFileValue is a character vector specifying a file name or a path and file name to which to save the data. If you specify only a file name, the file is stored in the current folder.

Tip

Read an EMBL-formatted file back into the MATLAB software using the emblread function.
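As a rough illustration of the tip above, the sketch below saves a record with 'ToFile' and then reads it back with emblread. The output location (a fresh file in the temporary folder) is a hypothetical choice made for this example, so that the code does not depend on how getembl treats a file that already exists.

```matlab
% Hypothetical output location: a new, uniquely named file in the temp folder.
outFile = [tempname '.txt'];

% Retrieve the record and save it to an EMBL-formatted file.
getembl('X00558', 'ToFile', outFile);

% Read the saved file back into a MATLAB structure.
fromFile = emblread(outFile);
disp(fromFile.Accession);
```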

EMBLSeq = getembl(..., 'SequenceOnly', SequenceOnlyValue, ...) controls whether to retrieve only the sequence, without the metadata. Choices are true or false (default).

EMBLSeq = getembl(..., 'TimeOut', TimeOutValue, ...) sets the connection timeout (in seconds) to retrieve data from the EMBL database.
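The property pairs described above can be combined in one call. The sketch below shows one possible way to do so, with a longer timeout and a try/catch block around the request. The timeout value of 15 seconds is an arbitrary choice for illustration, and the specific error raised on a failed connection is not documented here, so the catch block only prints whatever message it receives.

```matlab
try
    % Property names are case insensitive and can appear in any order.
    Seq = getembl('X00558', 'timeout', 15, 'SequenceOnly', true);
    fprintf('Retrieved %d bases.\n', numel(Seq));
catch err
    % The error identifier and message depend on the failure (assumption:
    % a timeout or connection problem surfaces as a MATLAB error).
    fprintf('Retrieval failed: %s\n', err.message);
end
```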

Examples

Retrieve data for the rat liver apolipoprotein A-I.

emblout = getembl('X00558')

Retrieve data for the rat liver apolipoprotein A-I and save it to the file rat_protein.txt in the folder c:\project. If you specify a file name without a path, the file is stored in the current folder.

emblout = getembl('X00558','ToFile','c:\project\rat_protein.txt')

Retrieve only the sequence for the rat liver apolipoprotein A-I.

Seq = getembl('X00558','SequenceOnly',true)
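Because 'SequenceOnly' returns a character vector, the result can be processed with ordinary MATLAB operations. The sketch below is one illustrative follow-up, not part of the getembl interface: it computes the sequence length and the fraction of G and C bases using only base MATLAB functions. The variable names are hypothetical.

```matlab
% Retrieve only the sequence as a character vector.
Seq = getembl('X00558', 'SequenceOnly', true);

% Count bases, comparing in upper case so the result does not depend on
% the letter case of the returned sequence.
seqUpper = upper(Seq);
seqLength = numel(seqUpper);
gcCount = sum(seqUpper == 'G' | seqUpper == 'C');
gcFraction = gcCount / seqLength;

fprintf('Length: %d bases, GC fraction: %.3f\n', seqLength, gcFraction);
```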

Version History

Introduced before R2006a