CuffMergeOptions

Option set for cuffmerge

Description

A CuffMergeOptions object contains options for the cuffmerge function, which merges cufflinks transcript assemblies [1].

Creation

Syntax

cuffmergeOpt = CuffMergeOptions

cuffmergeOpt = CuffMergeOptions(Name,Value)

cuffmergeOpt = CuffMergeOptions(S)

Description

example

cuffmergeOpt = CuffMergeOptions creates a CuffMergeOptions object with the default property values.

CuffMergeOptions requires the Cufflinks Support Package for the Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

cuffmergeOpt = CuffMergeOptions(Name,Value) sets the object properties using one or more name-value pair arguments. Enclose each property name in quotes. For example, cuffmergeOpt = CuffMergeOptions('NumThreads',8) specifies eight parallel threads.

cuffmergeOpt = CuffMergeOptions(S) specifies optional parameters using a string or character vector S.

Input Arguments

expand all

`S` — `cuffmerge` options
string | character vector

cuffmerge options, specified as a string or character vector. S must be in the original cuffmerge option syntax (prefixed by one or two dashes).

Example: '--num-thread 5'

Properties

expand all

`ExtraCommand` — Additional commands
`""` (default) | string | character vector

The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB^® properties.

When the software converts the original flags to MATLAB properties, it stores any unrecognized flags in this property.

Example: '--library-type fr-secondstrand'

Data Types: char | string

`IncludeAll` — Flag to use all object properties
`false` (default) | `true`

Flag to include all the object properties with the corresponding default values when converting to the original options syntax, specified as true or false. You can convert the properties to the original syntax prefixed by one or two dashes (such as '-d 100 -e 80') by using getCommand. The default value false means that when you call getCommand(optionsObject), it converts only the specified properties. If the value is true, getCommand converts all available properties, with default values for unspecified properties, to the original syntax.

Note

If you set IncludeAll to true, the software converts all available properties, using default values for unspecified properties. The only exception is when the default value of a property is NaN, Inf, [], '', or "". In this case, the software does not translate the corresponding property.

Example: true

Data Types: logical

`MinIsoformFraction` — Minimum abundance of isoform to be included in merged assembly
`0.5` (default) | scalar between `0` and `1`

Minimum abundance of an isoform to be included in the merged assembly, specified as a scalar between 0 and 1. This value is expressed as a percentage of the most abundant (major) isoform.

Example: 0.4

Data Types: double

`NumThreads` — Number of parallel threads to use
`1` (default) | positive integer

Number of parallel threads to use, specified as a positive integer. Threads are run on separate processors or cores. Increasing the number of threads generally improves the runtime significantly, but increases the memory footprint.

Example: 4

Data Types: double

`OutputDirectory` — Directory to store analysis results
current directory (`"./"`) (default) | string | character vector

Directory to store analysis results, specified as a string or character vector.

Example: "./AnalysisResults/"

Data Types: char | string

`ReferenceGTF` — Name of optional reference annotation GTF file
string | character vector

Name of an optional reference annotation GTF file to be included in the combined assembly, specified as a string or character vector.

Example: "ref.gtf"

Data Types: char | string

`ReferenceSequence` — Name of directory or FASTA file containing genomic sequences
string | character vector

Name of a directory or FASTA file containing genomic DNA sequences for the reference, specified as a string or character vector.

If you specify a directory, it must contain one FASTA file per contig. In other words, the directory must contain one FASTA file per reference chromosome, and each file must be named after the chromosome and have a .fa or .fasta extension.
If you specify a FASTA file, it must contain all the reference sequences.

The function uses the provided sequences to improve transfrag classification and exclude artifacts.

Example: "allrefs.fasta"

Data Types: char | string

`Version` — Supported version
string

This property is read-only.

Supported version of the original cufflinks software, returned as a string.

Example: "2.2.1"

Data Types: string

Object Functions

`getCommand`	Translate object properties to original options syntax
`getOptionsTable`	Return table with all properties and equivalent options in original syntax

Examples

collapse all

Create CuffMergeOptions Object

Create a CuffMergeOptions object with the default values.

opt = CuffMergeOptions;

Create an object using name-value pairs.

opt2 = CuffMergeOptions('OutputDirectory',"./merged",'MinIsoformFraction',0.1)

Create an object by using the original syntax.

opt3 = CuffMergeOptions('-o ./merged --min-isoform-fraction 0.1')

Assemble Transcriptome and Perform Differential Expression Testing

Create a CufflinksOptions object to define cufflinks options, such as the number of parallel threads and the output directory to store the results.

cflOpt = CufflinksOptions;
cflOpt.NumThreads = 8;
cflOpt.OutputDirectory = "./cufflinksOut";

The SAM files provided for this example contain aligned reads for Mycoplasma pneumoniae from two samples with three replicates each. The reads are simulated 100bp-reads for two genes (gyrA and gyrB) located next to each other on the genome. All the reads are sorted by reference position, as required by cufflinks.

sams = ["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam",...
        "Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"];

Assemble the transcriptome from the aligned reads.

[gtfs,isofpkm,genes,skipped] = cufflinks(sams,cflOpt);

gtfs is a list of GTF files that contain assembled isoforms.

Compare the assembled isoforms using cuffcompare.

stats = cuffcompare(gtfs);

Merge the assembled transcripts using cuffmerge.

mergedGTF = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput');

mergedGTF reports only one transcript. This is because the two genes of interest are located next to each other, and cuffmerge cannot distinguish two distinct genes. To guide cuffmerge, use a reference GTF (gyrAB.gtf) containing information about these two genes. If the file is not located in the same directory that you run cuffmerge from, you must also specify the file path.

gyrAB = which('gyrAB.gtf');
mergedGTF2 = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput2',...
			'ReferenceGTF',gyrAB);

Calculate abundances (expression levels) from aligned reads for each sample.

abundances1 = cuffquant(mergedGTF2,["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...
                        'OutputDirectory','./cuffquantOutput1');
abundances2 = cuffquant(mergedGTF2,["Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"],...
                        'OutputDirectory','./cuffquantOutput2');

Assess the significance of changes in expression for genes and transcripts between conditions by performing the differential testing using cuffdiff. The cuffdiff function operates in two distinct steps: the function first estimates abundances from aligned reads, and then performs the statistical analysis. In some cases (for example, distributing computing load across multiple workers), performing the two steps separately is desirable. After performing the first step with cuffquant, you can then use the binary CXB output file as an input to cuffdiff to perform statistical analysis. Because cuffdiff returns several files, specify the output directory is recommended.

isoformDiff = cuffdiff(mergedGTF2,[abundances1,abundances2],...
                      'OutputDirectory','./cuffdiffOutput');

Display a table containing the differential expression test results for the two genes gyrB and gyrA.

readtable(isoformDiff,'FileType','text')

ans =

  2×14 table

        test_id            gene_id        gene              locus             sample_1    sample_2    status     value_1       value_2      log2_fold_change_    test_stat    p_value    q_value    significant
    ________________    _____________    ______    _______________________    ________    ________    ______    __________    __________    _________________    _________    _______    _______    ___________

    'TCONS_00000001'    'XLOC_000001'    'gyrB'    'NC_000912.1:2868-7340'      'q1'        'q2'       'OK'     1.0913e+05    4.2228e+05          1.9522           7.8886      5e-05      5e-05        'yes'   
    'TCONS_00000002'    'XLOC_000001'    'gyrA'    'NC_000912.1:2868-7340'      'q1'        'q2'       'OK'     3.5158e+05    1.1546e+05         -1.6064          -7.3811      5e-05      5e-05        'yes'

You can use cuffnorm to generate normalized expression tables for further analyses. cuffnorm results are useful when you have many samples and you want to cluster them or plot expression levels for genes that are important in your study. Note that you cannot perform differential expression analysis using cuffnorm.

Specify a cell array, where each element is a string vector containing file names for a single sample with replicates.

alignmentFiles = {["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...
                  ["Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"]}
isoformNorm = cuffnorm(mergedGTF2, alignmentFiles,...
                      'OutputDirectory', './cuffnormOutput');

Display a table containing the normalized expression levels for each transcript.

readtable(isoformNorm,'FileType','text')

ans =

  2×7 table

      tracking_id          q1_0          q1_2          q1_1          q2_1          q2_0          q2_2   
    ________________    __________    __________    __________    __________    __________    __________

    'TCONS_00000001'    1.0913e+05         78628    1.2132e+05    4.3639e+05    4.2228e+05    4.2814e+05
    'TCONS_00000002'    3.5158e+05    3.7458e+05    3.4238e+05    1.0483e+05    1.1546e+05    1.1105e+05

Column names starting with q have the format: conditionX_N, indicating that the column contains values for replicate N of conditionX.

References

[1] Trapnell, Cole, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. “Transcript Assembly and Quantification by RNA-Seq Reveals Unannotated Transcripts and Isoform Switching during Cell Differentiation.” Nature Biotechnology 28, no. 5 (May 2010): 511–15.

Version History

Introduced in R2019a

CuffMergeOptions

Description

Creation

Syntax

Description

Input Arguments

`S` — `cuffmerge` options
string | character vector

Properties

`ExtraCommand` — Additional commands
`""` (default) | string | character vector

`IncludeAll` — Flag to use all object properties
`false` (default) | `true`

`MinIsoformFraction` — Minimum abundance of isoform to be included in merged assembly
`0.5` (default) | scalar between `0` and `1`

`NumThreads` — Number of parallel threads to use
`1` (default) | positive integer

`OutputDirectory` — Directory to store analysis results
current directory (`"./"`) (default) | string | character vector

`ReferenceGTF` — Name of optional reference annotation GTF file
string | character vector

`ReferenceSequence` — Name of directory or FASTA file containing genomic sequences
string | character vector

`Version` — Supported version
string

Object Functions

Examples

Create CuffMergeOptions Object

Assemble Transcriptome and Perform Differential Expression Testing

References

Version History

See Also

Topics

External Websites

CuffMergeOptions

Description

Creation

Syntax

Description

Input Arguments

S — cuffmerge options string | character vector

Properties

ExtraCommand — Additional commands "" (default) | string | character vector

IncludeAll — Flag to use all object properties false (default) | true

MinIsoformFraction — Minimum abundance of isoform to be included in merged assembly 0.5 (default) | scalar between 0 and 1

NumThreads — Number of parallel threads to use 1 (default) | positive integer

OutputDirectory — Directory to store analysis results current directory ("./") (default) | string | character vector

ReferenceGTF — Name of optional reference annotation GTF file string | character vector

ReferenceSequence — Name of directory or FASTA file containing genomic sequences string | character vector

Version — Supported version string

Object Functions

Examples

Create CuffMergeOptions Object

Assemble Transcriptome and Perform Differential Expression Testing

References

Version History

See Also

Topics

External Websites

`S` — `cuffmerge` options
string | character vector

`ExtraCommand` — Additional commands
`""` (default) | string | character vector

`IncludeAll` — Flag to use all object properties
`false` (default) | `true`

`MinIsoformFraction` — Minimum abundance of isoform to be included in merged assembly
`0.5` (default) | scalar between `0` and `1`

`NumThreads` — Number of parallel threads to use
`1` (default) | positive integer

`OutputDirectory` — Directory to store analysis results
current directory (`"./"`) (default) | string | character vector

`ReferenceGTF` — Name of optional reference annotation GTF file
string | character vector

`ReferenceSequence` — Name of directory or FASTA file containing genomic sequences
string | character vector

`Version` — Supported version
string