CuffQuantOptions
Option set for cuffquant
Description
A CuffQuantOptions object contains options to run the cuffquant function, which quantifies gene and transcript expression data [1].
Creation
Syntax
cuffquantOpt = CuffQuantOptions
cuffquantOpt = CuffQuantOptions(Name,Value)
cuffquantOpt = CuffQuantOptions(S)
Description
cuffquantOpt = CuffQuantOptions creates a CuffQuantOptions object with default property values.
CuffQuantOptions requires the Cufflinks Support Package for the Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.
cuffquantOpt = CuffQuantOptions(Name,Value) sets the object properties using one or more name-value pair arguments. Enclose each property name in quotes. For example, cuffquantOpt = CuffQuantOptions('NumThreads',8) specifies to use eight parallel threads.
cuffquantOpt = CuffQuantOptions(S) specifies optional parameters using a string or character vector S.
Input Arguments
S
— Cuffquant
options
string | character vector
Cuffquant options, specified as a string or character vector. S must be in the cuffquant option syntax (prefixed by one or two dashes).
Example: '--seed 5'
Properties
EffectiveLengthCorrection
— Flag to normalize fragment counts
true (default) | false
Flag to normalize fragment counts to fragments per kilobase per million mapped reads (FPKM), specified as true or false.
Example: false
Data Types: logical
ExtraCommand
— Additional commands
""
(default) | string | character vector
Additional commands, specified as a string or character vector. The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.
When the software converts the original flags to MATLAB properties, it stores any unrecognized flags in this property.
Example: '--library-type fr-secondstrand'
Data Types: char | string
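The following is a minimal sketch of how ExtraCommand might be used, assuming the Cufflinks Support Package is installed. It relies only on the property names and flags shown on this page; the variable names are illustrative, and the exact text returned by getCommand is not shown because it may differ between releases.

```matlab
% Pass a flag that has no corresponding MATLAB property through ExtraCommand.
opt = CuffQuantOptions;
opt.NumThreads = 4;
opt.ExtraCommand = '--library-type fr-secondstrand';

% Translate the object to the native option syntax to inspect the result.
cmd = getCommand(opt)

% Going the other way: a flag that the object does not recognize is
% expected to be stored in ExtraCommand, while -p maps to NumThreads.
optFromString = CuffQuantOptions('-p 4 --library-type fr-secondstrand');
optFromString.NumThreads
optFromString.ExtraCommand
```

Inspecting ExtraCommand after constructing the object from a string is a quick way to see which flags were not mapped to properties.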
FragmentBiasCorrection
— Name of FASTA file with reference transcripts to detect bias
string | character vector
Name of the FASTA file with reference transcripts to detect bias in fragment counts, specified as a string or character vector. Library preparation can introduce sequence-specific bias into RNA-Seq experiments. Providing reference transcripts improves the accuracy of the transcript abundance estimates.
Example: "bias.fasta"
Data Types: char | string
FragmentLengthMean
— Expected mean fragment length in base pairs
200 (default) | positive integer
Expected mean fragment length, specified as a positive integer. The default value is 200 base pairs. The function can learn the fragment length mean for each SAM file. Using this option is not recommended for paired-end reads.
Example: 100
Data Types: double
FragmentLengthSD
— Expected standard deviation for fragment length distribution
80 (default) | positive scalar
Expected standard deviation for the fragment length distribution, specified as a positive scalar. The default value is 80 base pairs. The function can learn the fragment length standard deviation for each SAM file. Using this option is not recommended for paired-end reads.
Example: 70
Data Types: double
IncludeAll
— Flag to use all object properties
false (default) | true
Flag to include all the object properties with the corresponding default values when converting to the original options syntax, specified as true or false. You can convert the properties to the original syntax prefixed by one or two dashes (such as '-d 100 -e 80') by using getCommand. The default value false means that when you call getCommand(optionsObject), it converts only the specified properties. If the value is true, getCommand converts all available properties, with default values for unspecified properties, to the original syntax.
Note
If you set IncludeAll to true, the software converts all available properties, using default values for unspecified properties. The only exception is when the default value of a property is NaN, Inf, [], '', or "". In this case, the software does not translate the corresponding property.
Example: true
Data Types: logical
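The effect of IncludeAll can be seen by comparing the output of getCommand before and after setting the flag. This is a sketch that assumes the Cufflinks Support Package is installed; the property values are illustrative, and the returned command text is not reproduced here.

```matlab
% Set two properties explicitly; all others keep their defaults.
opt = CuffQuantOptions('FragmentLengthMean',100,'FragmentLengthSD',70);

% IncludeAll is false by default: only the properties set above are converted.
cmdSpecified = getCommand(opt)

% With IncludeAll set to true, properties left at their defaults are
% converted as well, except those whose default is NaN, Inf, [], '', or "".
opt.IncludeAll = true;
cmdAll = getCommand(opt)
```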
LengthCorrection
— Flag to correct by transcript length
true (default) | false
Flag to correct by the transcript length, specified as true or false. Set this value to false only when the fragment count is independent of the feature size, such as for small RNA libraries with no fragmentation and for 3' end sequencing, where all fragments have the same length.
Example: false
Data Types: logical
MaskFile
— Name of GTF or GFF file containing transcripts to ignore
string | character vector
Name of the GTF or GFF file containing transcripts to ignore during analysis, specified as a string or character vector. Some examples of transcripts to ignore include annotated rRNA transcripts, mitochondrial transcripts, and other abundant transcripts. Ignoring these transcripts improves the robustness of the abundance estimates.
Example: 'excludes.gtf'
Data Types: char | string
MaxBundleFrags
— Maximum number of fragments to include for each locus before skipping
500000 (default) | positive integer
Maximum number of fragments to include for each locus before skipping new fragments, specified as a positive integer. Skipped fragments are marked with the status HIDATA in the file skipped.gtf.
Example: 400000
Data Types: double
MaxFragAlignments
— Maximum number of aligned reads to include for each fragment
Inf (default) | positive integer
Maximum number of aligned reads to include for each fragment before skipping new reads, specified as a positive integer. Inf, the default value, sets no limit on the maximum number of aligned reads.
Example: 1000
Data Types: double
MaxMLEIterations
— Maximum number of iterations for maximum likelihood estimation
5000 (default) | positive integer
Maximum number of iterations for the maximum likelihood estimation of abundances, specified as a positive integer.
Example: 4000
Data Types: double
MinAlignmentCount
— Minimum number of alignments in locus to perform significance testing
10 (default) | positive integer
Minimum number of alignments required in a locus to perform the significance testing for differences between samples, specified as a positive integer.
Example: 8
Data Types: double
MultiReadCorrection
— Flag to improve abundance estimation using rescue method
false (default) | true
Flag to improve abundance estimation for reads mapped to multiple genomic positions using the rescue method, specified as true or false. If the value is false, the function divides multimapped reads uniformly to all mapped positions. If the value is true, the function uses additional information, including gene abundance estimation, inferred fragment length, and fragment bias, to improve transcript abundance estimation.
The rescue method is described in [2].
Example: true
Data Types: logical
NumThreads
— Number of parallel threads to use
1 (default) | positive integer
Number of parallel threads to use, specified as a positive integer. Threads are run on separate processors or cores. Increasing the number of threads generally improves the runtime significantly, but increases the memory footprint.
Example: 4
Data Types: double
OutputDirectory
— Directory to store analysis results
current directory ("./") (default) | string | character vector
Directory to store analysis results, specified as a string or character vector.
Example: "./AnalysisResults/"
Data Types: char | string
Seed
— Seed for random number generator
0 (default) | nonnegative integer
Seed for the random number generator, specified as a nonnegative integer. Setting a seed value ensures the reproducibility of the analysis results.
Example: 10
Data Types: double
Version
— Supported version
string
This property is read-only.
Supported version of the original cufflinks software, returned as a string.
Example: "2.2.1"
Data Types: string
Object Functions
getCommand | Translate object properties to original options syntax |
getOptionsTable | Return table with all properties and equivalent options in original syntax |
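As a brief illustration of the object functions listed above, the sketch below calls getOptionsTable on an options object. It assumes the Cufflinks Support Package is installed; the Seed value is illustrative, and the column layout of the returned table is not shown because it is not described on this page.

```matlab
% Create an options object with one nondefault property.
opt = CuffQuantOptions('Seed',10);

% List the properties alongside the equivalent options in the original syntax.
tbl = getOptionsTable(opt)
```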
Examples
Create CuffQuantOptions Object
Create a CuffQuantOptions object with the default values.
opt = CuffQuantOptions;
Create an object using name-value pairs.
opt2 = CuffQuantOptions('NumThreads',4,'MinAlignmentCount',50)
Create an object by using the original syntax.
opt3 = CuffQuantOptions('-p 4 --min-alignment-count 50')
Assemble Transcriptome and Perform Differential Expression Testing
Create a CufflinksOptions object to define cufflinks options, such as the number of parallel threads and the output directory to store the results.
cflOpt = CufflinksOptions;
cflOpt.NumThreads = 8;
cflOpt.OutputDirectory = "./cufflinksOut";
The SAM files provided for this example contain aligned reads for Mycoplasma pneumoniae from two samples with three replicates each. The reads are simulated 100bp-reads for two genes (gyrA and gyrB) located next to each other on the genome. All the reads are sorted by reference position, as required by cufflinks.
sams = ["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam",...
    "Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"];
Assemble the transcriptome from the aligned reads.
[gtfs,isofpkm,genes,skipped] = cufflinks(sams,cflOpt);
gtfs is a list of GTF files that contain assembled isoforms.
Compare the assembled isoforms using cuffcompare.
stats = cuffcompare(gtfs);
Merge the assembled transcripts using cuffmerge.
mergedGTF = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput');
mergedGTF reports only one transcript. This is because the two genes of interest are located next to each other, and cuffmerge cannot distinguish two distinct genes. To guide cuffmerge, use a reference GTF (gyrAB.gtf) containing information about these two genes. If the file is not located in the same directory that you run cuffmerge from, you must also specify the file path.
gyrAB = which('gyrAB.gtf');
mergedGTF2 = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput2',...
    'ReferenceGTF',gyrAB);
Calculate abundances (expression levels) from aligned reads for each sample.
abundances1 = cuffquant(mergedGTF2,["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...
    'OutputDirectory','./cuffquantOutput1');
abundances2 = cuffquant(mergedGTF2,["Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"],...
    'OutputDirectory','./cuffquantOutput2');
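The same quantification step could also be driven by CuffQuantOptions objects instead of name-value pairs. The sketch below assumes that cuffquant accepts an options object as its third input, in the same way that cufflinks accepts cflOpt earlier in this example; check the cuffquant reference page before relying on that. The NumThreads and Seed values and the names cqOpt1 and cqOpt2 are illustrative.

```matlab
% Build one options object per sample so that each run writes to its own directory.
cqOpt1 = CuffQuantOptions('NumThreads',4,'Seed',10, ...
    'OutputDirectory',"./cuffquantOutput1");
cqOpt2 = CuffQuantOptions('NumThreads',4,'Seed',10, ...
    'OutputDirectory',"./cuffquantOutput2");

% Run the quantification only when mergedGTF2 from the earlier steps exists.
% Passing the options object as the third input is an assumption.
if exist('mergedGTF2','var')
    abundances1 = cuffquant(mergedGTF2, ...
        ["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],cqOpt1);
    abundances2 = cuffquant(mergedGTF2, ...
        ["Myco_2_1.sam","Myco_2_2.sam","Myco_2_3.sam"],cqOpt2);
end
```

Keeping the settings in objects makes it easier to apply identical thread counts and seeds to every sample while varying only the output directory.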
Assess the significance of changes in expression for genes and transcripts between conditions by performing the differential testing using cuffdiff.
The cuffdiff function operates in two distinct steps: the function first estimates abundances from aligned reads, and then performs the statistical analysis. In some cases (for example, distributing computing load across multiple workers), performing the two steps separately is desirable. After performing the first step with cuffquant, you can then use the binary CXB output file as an input to cuffdiff to perform statistical analysis. Because cuffdiff returns several files, specifying the output directory is recommended.
isoformDiff = cuffdiff(mergedGTF2,[abundances1,abundances2],...
    'OutputDirectory','./cuffdiffOutput');
Display a table containing the differential expression test results for the two genes gyrB and gyrA.
readtable(isoformDiff,'FileType','text')
ans = 2×14 table
    test_id             gene_id          gene      locus                      sample_1    sample_2    status    value_1       value_2       log2_fold_change_    test_stat    p_value    q_value    significant
    ________________    _____________    ______    _______________________    ________    ________    ______    __________    __________    _________________    _________    _______    _______    ___________
    'TCONS_00000001'    'XLOC_000001'    'gyrB'    'NC_000912.1:2868-7340'    'q1'        'q2'        'OK'      1.0913e+05    4.2228e+05    1.9522               7.8886       5e-05      5e-05      'yes'
    'TCONS_00000002'    'XLOC_000001'    'gyrA'    'NC_000912.1:2868-7340'    'q1'        'q2'        'OK'      3.5158e+05    1.1546e+05    -1.6064              -7.3811      5e-05      5e-05      'yes'
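As a quick sanity check on the results above, the log2_fold_change_ column can be reproduced from value_1 and value_2, assuming the fold change is reported as value_2 relative to value_1. The numbers below are copied from the displayed table, so the agreement is limited by display rounding.

```matlab
value1 = [1.0913e+05; 3.5158e+05];   % value_1 for gyrB and gyrA
value2 = [4.2228e+05; 1.1546e+05];   % value_2 for gyrB and gyrA

% Log2 ratio of the second condition to the first.
log2FC = log2(value2 ./ value1)      % approximately [1.952; -1.606]
```

A positive value (gyrB) indicates higher expression in sample q2, and a negative value (gyrA) indicates higher expression in sample q1.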
You can use cuffnorm to generate normalized expression tables for further analyses. cuffnorm results are useful when you have many samples and you want to cluster them or plot expression levels for genes that are important in your study. Note that you cannot perform differential expression analysis using cuffnorm.
Specify a cell array, where each element is a string vector containing file names for a single sample with replicates.
alignmentFiles = {["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...
    ["Myco_2_1.sam", "Myco_2_2.sam", "Myco_2_3.sam"]}
isoformNorm = cuffnorm(mergedGTF2, alignmentFiles,...
    'OutputDirectory', './cuffnormOutput');
Display a table containing the normalized expression levels for each transcript.
readtable(isoformNorm,'FileType','text')
ans = 2×7 table
    tracking_id         q1_0          q1_2          q1_1          q2_1          q2_0          q2_2
    ________________    __________    __________    __________    __________    __________    __________
    'TCONS_00000001'    1.0913e+05    78628         1.2132e+05    4.3639e+05    4.2228e+05    4.2814e+05
    'TCONS_00000002'    3.5158e+05    3.7458e+05    3.4238e+05    1.0483e+05    1.1546e+05    1.1105e+05
Column names starting with q have the format: conditionX_N, indicating that the column contains values for replicate N of conditionX.
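The conditionX_N naming convention can be unpacked programmatically, for example to group replicate columns by condition before plotting. The sketch below uses only base MATLAB functions and the column names from the table above; the variable names are illustrative.

```matlab
% Column names copied from the cuffnorm output table, in displayed order.
colNames = ["q1_0","q1_2","q1_1","q2_1","q2_0","q2_2"];

% Split each name at the underscore into condition and replicate index.
parts = split(colNames(:),"_");        % 6-by-2 string array
condition = parts(:,1);                % "q1" or "q2"
replicate = str2double(parts(:,2));    % 0, 1, or 2

% Summarize the mapping from column name to condition and replicate.
layout = table(colNames(:),condition,replicate, ...
    'VariableNames',{'column','condition','replicate'})
```

Note that the replicate columns are not necessarily in numeric order, so sorting by the parsed replicate index may be useful before comparing samples.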
References
[1] Trapnell, Cole, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. “Transcript Assembly and Quantification by RNA-Seq Reveals Unannotated Transcripts and Isoform Switching during Cell Differentiation.” Nature Biotechnology 28, no. 5 (May 2010): 511–15.
[2] Mortazavi, Ali, Brian A Williams, Kenneth McCue, Lorian Schaeffer, and Barbara Wold. “Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq.” Nature Methods 5, no. 7 (July 2008): 621–28. https://doi.org/10.1038/nmeth.1226.
Version History
Introduced in R2019a
See Also
CufflinksOptions | cuffcompare | cuffdiff | cuffmerge | cuffnorm | cuffquant | cuffgffread | cuffgtf2sam