BioMap Class
Superclasses: BioRead
Contain sequence, quality, alignment, and mapping data
Description
The BioMap class contains data from short-read sequences, including sequence headers, read sequences, quality scores for the sequences, and data about how each sequence aligns to a given reference. This data is typically obtained from a high-throughput sequencing instrument.
Construct a BioMap object from short-read sequence data. Each element in the object has a sequence, header, quality score, and alignment/mapping information associated with it. Use the object properties and methods to explore, access, filter, and manipulate all or a subset of the data before analyzing or viewing the data.
Construction
BioMapobj = BioMap constructs BioMapobj, which is an empty BioMap object.
BioMapobj = BioMap(File) constructs BioMapobj, a BioMap object, from File, a SAM- or BAM-formatted file whose reads are ordered by start position in the reference sequence. The data remains in the source file, and the BioMap object accesses it using one or two auxiliary index files. For a SAM-formatted file, MATLAB® uses or creates one index file that must have the same name as the source file, but with an .idx extension. For a BAM-formatted file, MATLAB uses or creates two index files that must have the same name as the source file, but with *.bai and *.linearindex extensions.
If the index files are not found in the same folder as the source file, the BioMap constructor function creates the index files in that folder.
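As a minimal sketch, assuming the ex1.sam sample file that ships with Bioinformatics Toolbox is on the MATLAB path and that its folder is writable (so the .idx index file can be created there), constructing a file-backed object could look like this. The variable name bm is illustrative.

```matlab
% Construct a file-backed BioMap object from a position-sorted SAM file.
% Assumption: ex1.sam (a Bioinformatics Toolbox sample file) is on the path.
bm = BioMap('ex1.sam');

% The reads stay in ex1.sam; the object reads them through the index file.
fprintf('Number of reads: %d\n', bm.NSeqs);

% Show the reference names catalogued for this object.
disp(bm.SequenceDictionary);
```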
When you pass in an unordered BAM-formatted file, the constructor automatically orders the file and writes the data to an ordered file with the same base name and extension, but with .ordered inserted before the extension. The constructor then indexes the new file and uses it to instantiate the new BioMap object.
Note
Because the data remains in the source file and is accessed using the index files:
Do not delete the source file (SAM or BAM).
Do not delete the index files (*.idx, *.bai, or *.linearindex).
You cannot modify BioMapobj properties.
Tip
To determine the number of reference sequences included in your source file, use the saminfo or baminfo function. Use SAMtools to check whether the reads in your source file are ordered by position in the reference sequence and to reorder them if needed.
BioMapobj = BioMap(Struct) constructs BioMapobj, a BioMap object, from Struct, a MATLAB structure containing sequence and alignment information, such as returned by the samread or bamread function. The data from Struct remains in memory, which lets you modify the BioMapobj properties.
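A minimal sketch of the structure-based route, assuming ex1.sam is on the MATLAB path and that samread returns one structure element per read; the variable names are illustrative:

```matlab
% Read the SAM file into a MATLAB structure array (one element per read).
% Assumption: ex1.sam is on the MATLAB path.
samStruct = samread('ex1.sam');

% Build an in-memory BioMap object from the structure.
bmMem = BioMap(samStruct);

% Each structure element becomes one read in the object.
fprintf('%d reads held in memory\n', bmMem.NSeqs);
```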
BioMapobj = BioMap(___,'Name',Value) constructs the BioMap object using any of the previous input arguments and additional options, specified as the following name-value pair arguments.
BioMapobj = BioMap(___,'SelectReference',SelectRefValue) selects one or more references when the source data contains sequences mapped to more than one reference. By default, the constructor includes all of the references in the header dictionary of the source file. When the header dictionary is not available, the constructor defaults to including all reference names found in the source data. SelectRefValue is a character vector, string, string vector, or cell array of character vectors. By using this option, you can prevent the BioMap constructor from creating auxiliary index files for references that you will not use in your analysis. If any reads mapped to selected references are paired and BioMapobj is written to a file, the reference sequences of the mates are also included in the file header.
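For example, assuming ex1.sam is on the MATLAB path and contains reads mapped to two references named seq1 and seq2, a sketch that keeps only the seq1 reads might be:

```matlab
% Keep only reads mapped to the reference named 'seq1'.
% Assumption: ex1.sam contains reads mapped to 'seq1' and 'seq2'.
bmSeq1 = BioMap('ex1.sam', 'SelectReference', 'seq1');
fprintf('%d reads mapped to seq1\n', bmSeq1.NSeqs);
```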
BioMapobj = BioMap(File,'InMemory',InMemoryValue) specifies whether to place the data in memory or leave the data in the source file. Leaving the data in the source file and accessing it via an index file is more memory efficient, but does not let you modify properties of BioMapobj.
Choices are true or false (default). If the first input argument is not a file name, then this name-value pair argument is ignored, and the data is automatically placed in memory.
Tip
Set the 'InMemory' name-value pair argument to true if you want to modify the properties of BioMapobj.
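A minimal sketch of that tip, assuming ex1.sam is on the MATLAB path; the new Name text is an arbitrary illustrative value:

```matlab
% Load the reads into memory so that properties can be modified.
% Assumption: ex1.sam is on the MATLAB path.
bmMem = BioMap('ex1.sam', 'InMemory', true);

% Assign a new description through dot indexing.
bmMem.Name = 'ex1 reads (in memory)';
disp(bmMem.Name);
```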
BioMapobj = BioMap(___,'IndexDir',IndexDirValue) specifies the path to the folder where the index files (*.idx, *.bai, or *.linearindex) either exist or will be created.
Tip
Use the 'IndexDir' name-value pair argument if you do not have write access to the folder where the source file is located.
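As a sketch, assuming ex1.sam is on the MATLAB path, the index can be redirected to MATLAB's temporary folder (returned by tempdir), which is usually writable:

```matlab
% Create or reuse the index file in a writable folder instead of the
% folder that holds the source file.
% Assumption: ex1.sam is on the MATLAB path.
bmIdx = BioMap('ex1.sam', 'IndexDir', tempdir);
fprintf('Index folder: %s\n', tempdir);
```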
BioMapobj = BioMap(___,'Sequence',SequenceValue) constructs BioMapobj, a BioMap object, from SequenceValue, which contains the letter representations of nucleotide sequences. This name-value pair works only if the data is read into memory.
BioMapobj = BioMap(___,'Header',HeaderValue) constructs BioMapobj, a BioMap object, from HeaderValue, which contains header text for nucleotide sequences. This name-value pair works only if the data is read into memory.
BioMapobj = BioMap(___,'Quality',QualityValue) constructs BioMapobj, a BioMap object, from QualityValue, which contains the ASCII representation of per-base quality scores for nucleotide sequences. This name-value pair works only if the data is read into memory.
BioMapobj = BioMap(___,'Reference',ReferenceValue) constructs BioMapobj, a BioMap object, and sets the Reference property to ReferenceValue, which contains the names of the reference sequences. This name-value pair works only if the data is read into memory.
BioMapobj = BioMap(___,'Signature',SignatureValue) constructs BioMapobj, a BioMap object, from SignatureValue, which contains information describing the alignment of each read sequence with the reference sequence. This name-value pair works only if the data is read into memory.
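Each signature is a CIGAR string. The following core-MATLAB sketch is not the toolbox's own implementation, and the CIGAR string and start position are made-up values. It shows how the reference span of one alignment, and therefore its last aligned reference position, follows from the CIGAR operations that consume reference bases (M, D, N, =, and X):

```matlab
% Hypothetical read: CIGAR string and 1-based start position.
cigar    = '3M1I2M2D4M';
startPos = 10;

% Split the CIGAR string into (length, operation) pairs.
tok  = regexp(cigar, '(\d+)([MIDNSHP=X])', 'tokens');
lens = cellfun(@(t) str2double(t{1}), tok);   % operation lengths
ops  = cellfun(@(t) t{2}, tok);               % operation letters

% Only these operations advance along the reference sequence.
consumesRef = ismember(ops, 'MDN=X');

% Last reference position covered by the alignment.
stopPos = startPos + sum(lens(consumesRef)) - 1;
fprintf('Alignment covers reference positions %d to %d\n', startPos, stopPos);
```

Here the reference-consuming operations contribute 3 + 2 + 2 + 4 = 11 bases, so the alignment ends at position 20; the inserted base (1I) does not advance along the reference.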
BioMapobj = BioMap(___,'Start',StartValue) constructs BioMapobj, a BioMap object, from StartValue, a vector of positive integers specifying the position in the reference sequence where the alignment of each read sequence starts. This name-value pair works only if the data is read into memory.
BioMapobj = BioMap(___,'Flag',FlagValue) constructs BioMapobj, a BioMap object, from FlagValue, a vector of positive integers indicating the bit-wise information for the status of the 11 flags specified by the SAM format specification. These flags describe different sequencing and alignment aspects of the read sequences. This name-value pair works only if the data is read into memory.
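Because each flag value packs several bits, individual flags can be tested with bitand. This core-MATLAB sketch uses made-up flag values and the SAM specification's bit meanings 0x1 (read is paired) and 0x10 (read is reverse-complemented):

```matlab
% Hypothetical SAM flag values for four reads.
flags = uint16([0; 16; 99; 147]);

% Test individual bits: 0x1 = paired, 0x10 = reverse strand.
isPaired  = bitand(flags, uint16(1))  > 0;
isReverse = bitand(flags, uint16(16)) > 0;

disp(table(flags, isPaired, isReverse));
```

For these values, 99 and 147 are paired, and 16 and 147 are on the reverse strand.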
BioMapobj = BioMap(___,'MappingQuality',MappingQualityValue) constructs BioMapobj, a BioMap object, from MappingQualityValue, a vector of positive integers specifying the mapping quality for each read sequence. This name-value pair works only if the data is read into memory.
BioMapobj = BioMap(___,'MatePosition',MatePositionValue) constructs BioMapobj, a BioMap object, from MatePositionValue, a vector of nonnegative integers specifying the mate position for each read sequence. This name-value pair works only if the data is read into memory.
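Taken together, these name-value pairs let you build an object entirely in memory, without a source file. A minimal sketch follows; the two 8-base reads, the reference name chr1, and every other value are made up for illustration, and the code assumes one entry per read for each argument:

```matlab
% Two hypothetical 8-base reads aligned to a reference named 'chr1'.
bmNew = BioMap( ...
    'Sequence',       {'ACGTACGT'; 'TTGACCAA'}, ...
    'Header',         {'read1'; 'read2'}, ...
    'Quality',        {'IIIIIIII'; 'IIIIIIII'}, ...
    'Reference',      {'chr1'; 'chr1'}, ...
    'Signature',      {'8M'; '8M'}, ...
    'Start',          [1; 5], ...
    'Flag',           [0; 16], ...
    'MappingQuality', [60; 60], ...
    'MatePosition',   [0; 0]);

fprintf('%d reads constructed in memory\n', bmNew.NSeqs);
```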
Input Arguments
File | Character vector or string specifying a SAM- or BAM-formatted file whose reads are ordered by start position in the reference sequence. |
Struct | MATLAB structure containing sequence and alignment information, such as returned by the samread or bamread function. |
SelectRefValue | Character vector, string, string vector, or cell array of character vectors specifying the names of the reference sequences in the source data to include in BioMapobj. |
InMemoryValue | Logical specifying whether to place the data in memory or leave the data in the source file. Leaving the data in the source file and accessing it via an index file is more memory efficient, but does not let you modify properties of the BioMap object. Default: false |
IndexDirValue | Character vector or string specifying the path to the folder where the index files either exist or will be created. Default: Folder where File is located. |
SequenceValue | String vector or cell array of character vectors containing the letter representations of nucleotide sequences. This information populates the Sequence property of BioMapobj. |
QualityValue | String vector or cell array of character vectors containing the ASCII representation of per-base quality scores for nucleotide sequences. This information populates the Quality property of BioMapobj. |
HeaderValue | String vector or cell array of character vectors containing header text for nucleotide sequences. This information populates the Header property of BioMapobj. |
NameValue | Character vector or string describing the BioMap object. |
ReferenceValue | String vector or cell array of character vectors containing the names of the reference sequences. This information populates the object's Reference property. |
SignatureValue | String vector or cell array of character vectors containing information describing the alignment of each read sequence with the reference sequence. This information populates the object's Signature property. |
StartValue | Vector of positive integers specifying the position in the reference sequence where the alignment of each read sequence starts. This information populates the object's Start property. |
FlagValue | Vector of positive integers indicating the bit-wise information for the status of the 11 flags specified by the SAM format specification. These flags describe different sequencing and alignment aspects of the read sequences. This information populates the object's Flag property. |
MappingQualityValue | Vector of positive integers specifying the mapping quality for each read sequence. This information populates the object's MappingQuality property. |
MatePositionValue | Vector of nonnegative integers specifying the mate position for each read sequence. This information populates the object's MatePosition property. |
Properties
Flag | Flags associated with all read sequences represented in the BioMap object. Vector of positive integers such that there is an integer for each read sequence in the object. Each integer indicates the bit-wise information that specifies the status of the 11 flags described by the SAM format specification. These flags describe different sequencing and alignment aspects of a read sequence. A one-to-one relationship exists between the number and order of elements in Flag and the read sequences in the object. |
Header | Headers associated with all read sequences represented in the BioMap object. Cell array of character vectors, such that there is a header for each read sequence in the object. Headers can be empty. A one-to-one relationship exists between the number and order of elements in Header and the read sequences in the object. |
MatePosition | Positions of the mates for all read sequences represented in the BioMap object. Vector of nonnegative integers such that there is an integer for each read sequence in the object. Each integer indicates the position of the corresponding mate sequence, relative to the reference sequence. A one-to-one relationship exists between the number and order of elements in MatePosition and the read sequences in the object. |
MappingQuality | Mapping quality scores associated with all read sequences represented in the BioMap object. Vector of integers, such that there is a mapping quality score for each read sequence in the object. A one-to-one relationship exists between the number and order of elements in MappingQuality and the read sequences in the object. |
Name | Description of the BioMap object. Character vector describing the BioMap object. |
NSeqs | Number of sequences in the BioMap object. This information is read-only. |
Quality | Per-base quality scores associated with all read sequences represented in the BioMap object. Cell array of character vectors, such that there is a quality for each read sequence in the object. Each quality is an ASCII representation of per-base quality scores for a read sequence. Quality can be an empty character vector. A one-to-one relationship exists between the number and order of elements in Quality and the read sequences in the object. |
Reference | Reference sequences in the BioMap object. The reference sequences are the sequences against which the read sequences are aligned. |
Sequence | Read sequences in the BioMap object. Cell array of character vectors containing the letter representations of the read sequences. |
SequenceDictionary | Cell array of character vectors that catalogs the names of the references available in the BioMap object. This information is read-only. |
Signature | Alignment information associated with all read sequences represented in the BioMap object. Cell array of CIGAR-formatted character vectors, such that there is alignment information for each read sequence in the object. Each character vector represents how a read sequence aligns to the reference sequence. Signatures can be empty character vectors. A one-to-one relationship exists between the number and order of elements in Signature and the read sequences in the object. |
Start | Start positions of all aligned read sequences represented in the BioMap object. Vector of integers, such that there is a start position for each read sequence in the object. Each integer specifies the start position of the aligned read sequence with respect to the position numbers in the reference sequence. A one-to-one relationship exists between the number and order of elements in Start and the read sequences in the object. |
Methods
filterByFlag | Filter sequence reads by SAM flag |
getAlignment | Construct alignment represented in BioMap object |
getBaseCoverage | Return base-by-base alignment coverage of reference sequence in BioMap object |
getCompactAlignment | Construct compact alignment represented in BioMap object |
getCounts | Return count of read sequences aligned to reference sequence in BioMap object |
getFlag | Retrieve read sequence flags from BioMap object |
getIndex | Return indices of read sequences aligned to reference sequence in BioMap object |
getInfo | Retrieve information for single element of BioMap object |
getMappingQuality | Retrieve sequence mapping quality scores from BioMap object |
getReference | Retrieve reference sequence from BioMap object |
getSignature | Retrieve signature (alignment information) from BioMap object |
getStart | Retrieve start positions of aligned read sequences from BioMap object |
getStop | Compute stop positions of aligned read sequences from BioMap object |
getSummary | Print summary of BioMap object |
setFlag | Set read sequence flags for BioMap object |
setMappingQuality | Set sequence mapping quality scores for BioMap object |
setReference | Set name of reference sequence for BioMap object |
setSignature | Set signature (alignment information) for BioMap object |
setStart | Set start positions of aligned read sequences in BioMap object |
Inherited Methods
combine | Combine two objects |
get | Retrieve property of object |
getHeader | Retrieve sequence headers from object |
getQuality | Retrieve sequence quality information from object |
getSequence | Retrieve sequences from object |
getSubsequence | Retrieve partial sequences from object |
getSubset | Retrieve subset of elements from object |
set | Set property of object |
setHeader | Update header information of reads |
setQuality | Update quality information |
setSequence | Update read sequences |
setSubsequence | Update partial sequences |
setSubset | Update elements of object |
write | Write contents of BioRead or BioMap object to file |
Copy Semantics
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB Programming Fundamentals documentation.
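As an illustration of value semantics, the sketch below assumes ex1.sam is on the MATLAB path and loads it into memory so that properties can be assigned; the variable names and the Name text are illustrative:

```matlab
% Assumption: ex1.sam is on the MATLAB path.
original = BioMap('ex1.sam', 'InMemory', true);

% Assignment creates an independent copy (value semantics).
copyObj = original;
copyObj.Name = 'modified copy';

% The original object keeps its own Name.
disp(original.Name);
disp(copyObj.Name);
```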
Indexing
BioMap objects support dot (.) indexing to extract, assign, and delete data.
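A minimal sketch of dot indexing for extraction, assuming ex1.sam is on the MATLAB path; each property is copied to a local variable before it is indexed:

```matlab
% Assumption: ex1.sam is on the MATLAB path.
bm = BioMap('ex1.sam');

seqs    = bm.Sequence;    % cell array of read sequences
starts  = bm.Start;       % vector of start positions
cigars  = bm.Signature;   % cell array of CIGAR-formatted signatures

firstSeq   = seqs{1};
firstStart = starts(1);
firstCigar = cigars{1};
fprintf('%s starts at %d (%s)\n', firstSeq, firstStart, firstCigar);
```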
Examples
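The sketch below combines several of the methods listed above. It assumes ex1.sam is on the MATLAB path and contains reads mapped to a reference named seq1:

```matlab
% Assumption: ex1.sam is on the MATLAB path and includes reference 'seq1'.
bm = BioMap('ex1.sam', 'SelectReference', 'seq1');

getSummary(bm);              % print a summary of the object

starts = getStart(bm);       % start positions of the aligned reads
stops  = getStop(bm);        % stop positions computed from the signatures
fprintf('Reads span positions %d to %d of seq1\n', min(starts), max(stops));
```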
See Also
BioIndexedFile | BioRead | saminfo | samread | baminfo | bamread | align2cigar | cigar2align