SeqFilterOptions

Contain options to filter sequences

Since R2023a

Description

A SeqFilterOptions object contains options to filter sequences based on a specified criterion. This object is used as the value of the Options property of the bioinfo.pipeline.block.SeqFilter block.

Creation

Description

optionsObj = bioinfo.pipeline.options.SeqFilterOptions creates a SeqFilterOptions object with default property values.

optionsObj = bioinfo.pipeline.options.SeqFilterOptions(Name=Value) sets properties using one or more name-value arguments. Name is the property name and Value is the property value. For example, optionsObj = bioinfo.pipeline.options.SeqFilterOptions(Threshold=[5 15]) filters out sequences that contain more than five low-quality bases, where a base is low quality if its score is less than 15.
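The following is a minimal sketch of how the options object from the example above might be attached to a filter block. It relies on the Options property mentioned in the Description section. Creating the block with no arguments and assigning Options afterward is an assumption here, not something this page states.

```matlab
% Requires Bioinformatics Toolbox (R2023a or later).
% Keep sequences with at most 5 bases whose quality score is below 15.
optionsObj = bioinfo.pipeline.options.SeqFilterOptions(Threshold=[5 15]);

% Assumption: the block can be created without arguments and its
% Options property can be assigned afterward.
filterBlock = bioinfo.pipeline.block.SeqFilter;
filterBlock.Options = optionsObj;
```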

Properties

Criterion to filter sequences, specified as one of the following options. Specify only one filtering criterion per function call.

  • 'MaxNumberLowQualityBases' – applies a maximum threshold on the number of low-quality bases allowed.

  • 'MaxPercentLowQualityBases' – applies a maximum threshold on the percentage of low-quality bases allowed.

  • 'MeanQuality' – applies a minimum threshold on the average base quality across each sequence.

  • 'MinLength' – applies a minimum threshold on the sequence length.

Use this name-value pair argument together with 'Threshold' to specify the appropriate threshold value. Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or two-element vector. See the 'Threshold' option for the default values. If you do not specify 'Threshold', then the function uses the default threshold value of the specified method. For each filtering criterion, the function uses the base quality encoding format specified by the 'Encoding' name-value pair argument.

Threshold value for the filtering criterion, specified as a scalar or vector. Use this name-value pair to define the threshold value for the filtering criterion specified by 'Method'.

Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or two-element vector. If you do not specify 'Threshold', then the function uses the default threshold value of the corresponding method. For each filtering criterion, the function uses the encoding format of the base quality specified by the 'Encoding' name-value pair argument.

'Threshold' value and default 'Threshold' value for each 'Method':

  • 'MaxNumberLowQualityBases' – Two-element vector [V1 V2]. V1 is a nonnegative integer that specifies the maximum number of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a number of low-quality bases greater than V1 is filtered out and not saved in the output file. Default: [0 10]

  • 'MaxPercentLowQualityBases' – Two-element vector [V1 V2]. V1 is a scalar between 0 and 100 that specifies the maximum percentage of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a percentage of low-quality bases greater than V1 is filtered out and not saved in the output file. Default: [0 10]

  • 'MeanQuality' – Positive scalar that specifies the minimum threshold on the average base quality across each sequence. Any sequence with average base quality less than this value is filtered out. Default: 0

  • 'MinLength' – Nonnegative integer that specifies the minimum threshold on the sequence length allowed. Any sequence with length less than this value is filtered out. Default: 1
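To make the threshold semantics concrete, the sketch below evaluates each of the four criteria on one hypothetical read whose quality scores have already been decoded to numbers. It uses only base MATLAB and illustrates the rules described above; it is not the toolbox implementation. The score vector and the threshold values are made up for illustration.

```matlab
% Hypothetical decoded per-base quality scores for a single read.
q = [30 32 8 12 35 9 40 38];

% 'MaxNumberLowQualityBases' with Threshold = [V1 V2] = [2 10]:
% a base is low quality if its score is less than V2.
numLow = sum(q < 10);                    % two bases (scores 8 and 9)
keepByCount = numLow <= 2;               % kept: count does not exceed V1

% 'MaxPercentLowQualityBases' with Threshold = [20 10].
pctLow = 100 * sum(q < 10) / numel(q);   % 25 percent
keepByPercent = pctLow <= 20;            % filtered out: 25 > 20

% 'MeanQuality' with Threshold = 25.
keepByMean = mean(q) >= 25;              % kept: mean is 25.5

% 'MinLength' with Threshold = 10.
keepByLength = numel(q) >= 10;           % filtered out: length is 8
```

In each case the sequence is removed only when it falls strictly on the wrong side of the threshold, which is why a read with exactly V1 low-quality bases is kept.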

Size of the sliding window to apply the filtering criterion to a sequence, specified as a positive integer. The size of the window corresponds to the number of bases that the function uses at one time to apply the criterion. If any window fails the criterion, the whole sequence is discarded.

The default is Inf, that is, the filtering criterion is applied to the whole sequence.
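The effect of the window size can be sketched in base MATLAB as follows. The example applies a mean-quality criterion to every window of a hypothetical read and discards the read if any window fails. The page does not say how the window advances, so a step of one base is an assumption, as are the scores and the threshold.

```matlab
% Hypothetical decoded quality scores and settings.
q = [30 32 8 12 9 35 40 38];
windowSize = 4;      % use Inf to evaluate the whole read at once
minMean = 20;        % mean-quality threshold applied to each window

% Clamp the window to the read length so that Inf means "whole read".
w = min(windowSize, numel(q));

% Assumption: the window slides one base at a time.
numWindows = numel(q) - w + 1;
windowMean = zeros(1, numWindows);
for k = 1:numWindows
    windowMean(k) = mean(q(k:k+w-1));
end

% The read is kept only if every window passes the criterion.
keepRead = all(windowMean >= minMean);

% For comparison: the same criterion applied to the whole read.
keepWholeRead = mean(q) >= minMean;
```

Here the whole read passes (mean 25.5), but the second window has a mean of 15.25, so the windowed check discards the read.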

Base quality encoding format, specified as a character vector or string.
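As background on why the encoding matters, the sketch below decodes a hypothetical quality string into numeric scores under a Phred+33 encoding, where each score is the character code minus 33. The choice of Phred+33 is an assumption for illustration. This page does not list the supported encoding names, so none are shown.

```matlab
% Hypothetical quality string from a FASTQ record.
qualityString = 'II5+#';

% Assumption: Phred+33 encoding (score = ASCII code - 33).
offset = 33;
scores = double(qualityString) - offset;   % [40 40 20 10 2]

% The same string decoded with a different offset yields different
% scores, so thresholds are meaningful only with the right encoding.
lowQuality = scores < 15;                  % [false false false true true]
```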

Suffix to use in the output file name, specified as a character vector or string. It is inserted after the input file name and before the file extension. The default is '_filtered'.
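The naming rule can be illustrated with base MATLAB string operations. The input file name below is hypothetical, and the sketch builds only the file name, not the output folder.

```matlab
% Hypothetical input file name and the default suffix.
inputFile = "reads.fastq";
suffix = "_filtered";

% Split the name from its extension, then insert the suffix between them.
[~, baseName, extension] = fileparts(inputFile);
outputFile = baseName + suffix + extension;   % "reads_filtered.fastq"
```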

Whether to consider the input files as pairs for paired-end sequence data, specified as true or false.

If true, the input files are read as pairs, and the sequence data is maintained in sync between the files. That is, if a sequence is filtered out in the first file, the corresponding sequence in the paired file is also filtered out.

Whether to save singleton sequences in a separate output file, specified as true or false. To set this to true, the 'PairedFiles' option must also be set to true.

A singleton sequence is a sequence that passes the filtering criterion while its corresponding sequence in the paired file does not. If true, singleton sequences are saved in a separate file with the suffix '_singleton'. The default is false, meaning that only sequences that pass the filtering criterion in both input files of a given pair are saved in the output files.
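The pairing and singleton rules can be sketched with logical vectors in base MATLAB. Each element stands for one read pair, and the pass/fail values are hypothetical results of applying the filtering criterion to each file.

```matlab
% Hypothetical per-read filter results for the two files of a pair.
passFile1 = [true true  false true];
passFile2 = [true false false true];

% Reads written to the main outputs: both mates must pass.
keptPairs = passFile1 & passFile2;          % [true false false true]

% Singletons: one mate passes while its partner fails.
singletonsFile1 = passFile1 & ~passFile2;   % [false true false false]
singletonsFile2 = passFile2 & ~passFile1;   % [false false false false]
```

With the singleton option left at false, the second read of the first file is dropped even though it passed on its own, because its mate failed.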

Version History

Introduced in R2023a