samread

Read data from SAM file

    Description

    SAMStruct = samread(File) reads a SAM-formatted file and returns the data in a MATLAB® array of structures.

    [SAMStruct,HeaderStruct] = samread(File) returns the alignment and header data in two separate variables.

    ___ = samread(File,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes.

    Examples

    Read the header information and the alignment data from the ex1.sam file included with Bioinformatics Toolbox™, and return the information in two separate variables.

    [data, header] = samread("ex1.sam");

    Read a block of entries (5 through 10), excluding the tags, from the ex1.sam file, and then return the information in an array of structures.

    data = samread("ex1.sam",BlockRead=[5 10],Tags=false);

    Input Arguments

    File

    File name, path to a SAM-formatted file, or the text of a SAM-formatted file, specified as a string or character vector. If you specify only a file name, that file must be on the MATLAB search path or in the current folder.

    Data Types: char | string

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: BlockRead=[5 10]

    Tags

    Indicator to read the optional tags in addition to the first 11 fields of each alignment in the SAM-formatted file, specified as true or false.

    Data Types: logical

    ReadGroup

    Read group ID for which to read alignment records, specified as a string or character vector. By default, samread reads records from all read groups.

    To list the read groups (if present), request the HeaderStruct output and view its ReadGroup field.

    Data Types: char | string
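
    The lookup described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes that each element of the ReadGroup field stores its identifier in a field named ID (an assumption, check the field names in your own header), and it falls back to reading all records when the file defines no read groups.

    ```matlab
    % Read the header only once; a single record keeps the call cheap.
    [~, header] = samread("ex1.sam", BlockRead=1);

    if isfield(header, "ReadGroup")
        % ASSUMPTION: the identifier is stored in a field named ID.
        groupIDs = {header.ReadGroup.ID};
        % Read only the records that belong to the first read group.
        data = samread("ex1.sam", ReadGroup=groupIDs{1});
    else
        % No read groups in this file, so read every record.
        data = samread("ex1.sam");
    end
    ```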

    BlockRead

    Entry or block of entries to read from a SAM-formatted file containing multiple sequences, specified as a positive integer or a two-element vector.

    To read the Nth entry in the file, specify N. To read a block of entries starting at entry M1 and ending at entry M2, specify the vector [M1 M2]. To read all remaining entries in the file starting at entry M1, specify [M1 Inf].

    Data Types: double
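
    The three forms of this argument might look like this in practice. The sketch uses the ex1.sam file from the examples above; the variable names are illustrative only.

    ```matlab
    % Read only the 7th alignment record.
    oneRecord = samread("ex1.sam", BlockRead=7);

    % Read records 5 through 10, inclusive.
    block = samread("ex1.sam", BlockRead=[5 10]);

    % Read from record 5 to the end of the file.
    remaining = samread("ex1.sam", BlockRead=[5 Inf]);
    ```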

    Output Arguments

    SAMStruct

    Sequence alignment and mapping information from a SAM-formatted file, returned as an N-by-1 array of structures. Here, N is the number of alignment records stored in the SAM-formatted file. Each structure contains these fields:

    Field    Description
    QueryName

    Name of read sequence (if unpaired) or name of sequence pair (if paired).

    Tip

    You can use this information to populate the Header property of the BioMap object.

    Flag

    Integer indicating the bit-wise information that specifies the status of each of 11 flags described by the SAM format specification.

    Tip

    You can use the bitget function to determine the status of a specific SAM flag.

    ReferenceName

    Name of the reference sequence.

    Position

    Position (one-based offset) of the forward reference sequence where the left-most base of the alignment of the read sequence starts.

    MappingQuality

    Integer specifying the mapping quality score for the read sequence.

    CigarString

    CIGAR-formatted character vector representing how the read sequence aligns with the reference sequence.

    MateReferenceName

    Name of the reference sequence associated with the mate. If this name is the same as ReferenceName, then this value is =. If there is no mate, then this value is *.

    MatePosition

    Position (one-based offset) of the forward reference sequence where the left-most base of the alignment of the mate of the read sequence starts.

    InsertSize

    The number of base positions between the read sequence and its mate, when both are mapped to the same reference sequence. Otherwise, this value is 0.

    Sequence

    Character vector containing the letter representations of the read sequence. It is the reverse-complement if the read sequence aligns to the reverse strand of the reference sequence.

    Quality

    Character vector containing the ASCII representation of the per-base quality score for the read sequence. The quality score is reversed if the read sequence aligns to the reverse strand of the reference sequence.

    Tags

    List of applicable SAM tags and their values.
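
    As a sketch of the bitget tip for the Flag field, the following decodes two flag bits for a small block of records. The bit positions come from the SAM format specification (0x4, read unmapped, is bit 3; 0x10, read on the reverse strand, is bit 5); the variable names are illustrative.

    ```matlab
    % Read the first 10 records without the optional tags.
    data = samread("ex1.sam", BlockRead=[1 10], Tags=false);

    % Collect the Flag value of every record into one row vector.
    flags = [data.Flag];

    % bitget counts bits from 1 at the least significant end.
    isUnmapped = logical(bitget(flags, 3));   % SAM flag 0x4
    isReverse  = logical(bitget(flags, 5));   % SAM flag 0x10

    % Names of the reads that align to the reverse strand.
    reverseNames = {data(isReverse).QueryName};
    ```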

    HeaderStruct

    Header information for the SAM-formatted file, returned as a structure with these fields:

    Field    Description

    Header*

    Structure containing the file format version, sort order, and group order.

    SequenceDictionary*

    Structure containing the:

    • Sequence name

    • Sequence length

    • Genome assembly identifier

    • MD5 checksum of sequence

    • URI of sequence

    • Species

    ReadGroup*

    Structure containing the:

    • Read group identifier

    • Sample

    • Library

    • Description

    • Platform unit

    • Predicted median insert size

    • Sequencing center

    • Date

    • Platform

    Program*

    Structure containing the:

    • Program name

    • Version

    • Command line

    * These structures and their fields appear in the output structure only if they are present in the SAM file. The information in these structures depends on the information present in the SAM file.
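
    Because these header fields are optional, code that consumes the header may want to check for them first. A minimal sketch, using the four field names listed above:

    ```matlab
    % Read the header; a single record keeps the call cheap.
    [~, header] = samread("ex1.sam", BlockRead=1);

    % Report which of the optional header structures this file provides.
    optionalFields = ["Header" "SequenceDictionary" "ReadGroup" "Program"];
    for name = optionalFields
        if isfield(header, name)
            fprintf("%s: present\n", name);
        else
            fprintf("%s: not in this file\n", name);
        end
    end
    ```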

    Tips

    • Use the saminfo function to investigate the size and content of a SAM-formatted file before using the samread function to read the file contents into a MATLAB array of structures.

    • If your SAM-formatted file is too large to read using available memory, try one of the following:

      • Use the BlockRead parameter with the samread function to read a subset of entries.

      • Create a BioIndexedFile object from the SAM-formatted file, then access the entries using methods of the BioIndexedFile class.

    • Use the SAMStruct output argument that samread returns to create a BioMap object, which lets you explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or viewing the data.
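
    The tips above could be combined roughly as follows. This is a sketch under stated assumptions, not a prescribed workflow: it assumes that saminfo accepts the NumOfReads option and reports the count in a NumReads field, that the index file for BioIndexedFile may be written to the folder returned by tempdir, and that the file contains at least 10 records.

    ```matlab
    % Resolve the full path of the example file on the MATLAB search path.
    samFile = which("ex1.sam");

    % Inspect the file before reading it (ASSUMPTION: NumOfReads option
    % adds a NumReads field to the returned structure).
    info = saminfo(samFile, NumOfReads=true);
    fprintf("File contains %d reads\n", info.NumReads);

    % Option 1: read only a subset of entries with BlockRead.
    subset = samread(samFile, BlockRead=[1 10]);

    % Option 2: index the file and pull entries on demand.
    % The index file is written to tempdir rather than next to the source.
    indexed = BioIndexedFile('SAM', samFile, tempdir);
    firstThree = read(indexed, 1:3);

    % Build a BioMap object from the structures that samread returned.
    bm = BioMap(subset);
    ```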

    Version History

    Introduced in R2010a