affyinvarsetnorm
Perform rank invariant set normalization on probe intensities from multiple Affymetrix CEL or DAT files
Syntax
NormData = affyinvarsetnorm(Data)
[NormData, MedStructure] = affyinvarsetnorm(Data)
... affyinvarsetnorm(..., 'Baseline', BaselineValue, ...)
... affyinvarsetnorm(..., 'Thresholds', ThresholdsValue, ...)
... affyinvarsetnorm(..., 'StopPercentile', StopPercentileValue, ...)
... affyinvarsetnorm(..., 'RayPercentile', RayPercentileValue, ...)
... affyinvarsetnorm(..., 'Method', MethodValue, ...)
... affyinvarsetnorm(..., 'Showplot', ShowplotValue, ...)
Arguments
Data | Matrix of intensity values where each row corresponds to a perfect match (PM) probe and each column corresponds to an Affymetrix® CEL or DAT file. (Each CEL or DAT file is generated from a separate chip. All chips should be of the same type.) |
MedStructure | Structure of each column's intensity median before and after normalization, and the index of the column chosen as the baseline. |
BaselineValue | Property to control the selection of the column index N from Data to be used as the baseline column. Default is the index of the column whose median intensity is the median of all the columns. |
ThresholdsValue | Property to set the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship. Specify a 1-by-2 vector [LT, HT] with values between 0 and 1. Default is [0.05, 0.005]. |
StopPercentileValue | Property to stop the iteration process when the number of data points in the invariant set reaches N percent of the total number of data points. Note: If you do not use this property, the iteration process continues until no more data points are eliminated. |
RayPercentileValue | Property to select the N percentage of the highest ranked invariant set of data points to fit a straight line through, while the remaining data points are fitted to a running median curve. Default is 1.5. |
MethodValue | Property to select the smoothing method used to normalize the data. Enter 'lowess' or 'runmedian'. Default is 'lowess'. |
ShowplotValue | Property to control the plotting of two pairs of scatter plots (before and after normalization). The first pair plots baseline data versus data from a specified column (chip) from the matrix Data. The second is a pair of M-A scatter plots. Enter 'all', a number, a vector of numbers, or a range of numbers. |
Description
NormData = affyinvarsetnorm(Data) normalizes the values in each column (chip) of probe intensities in Data to a baseline reference, using the invariant set method. NormData is a matrix of normalized probe intensities from Data.
Specifically, affyinvarsetnorm:
Selects a baseline index, typically the column whose median intensity is the median of all the columns.
For each column, determines the proportional rank difference (prd) for each pair of ranks, RankX and RankY, from the sample column and the baseline reference.
prd = abs(RankX - RankY)
For each column, determines the invariant set of data points by selecting data points whose proportional rank differences (prd) are below threshold, which is a predetermined threshold for a given data point (defined by the ThresholdsValue property). It repeats the process until either no more data points are eliminated, or a predetermined percentage of data points is reached.
The invariant set consists of the data points with prd < threshold.
For each column, uses the invariant set of data points to calculate the lowess or running median smoothing curve, which is used to normalize the data in that column.
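The selection steps above can be sketched for a single sample column as follows. This is an illustrative sketch in plain MATLAB, not the function's internal code. The data are simulated, and two details are assumptions that this page does not state: the rank difference is divided by the number of candidate points so that it is comparable to thresholds between 0 and 1, and the per-point threshold is interpolated linearly over the average rank. The 'StopPercentile' stopping rule is left out.

```matlab
% Simulated data (hypothetical, not real CEL intensities)
rng(0);                                        % reproducible random numbers
baseline = 2.^(6 + 8*rand(1000, 1));           % baseline chip intensities
sample   = 1.3 * baseline .* 2.^(0.2*randn(1000, 1));  % scaled, noisy chip

LT = 0.05;  HT = 0.005;                        % documented default thresholds
idx = (1:numel(baseline))';                    % start with every probe

while true
    n = numel(idx);
    % Rank the current candidates within each chip
    [~, ox] = sort(sample(idx));    rankX = zeros(n, 1);  rankX(ox) = 1:n;
    [~, oy] = sort(baseline(idx));  rankY = zeros(n, 1);  rankY(oy) = 1:n;

    % Assumption: scale by n so prd lies between 0 and 1
    prd = abs(rankX - rankY) / n;

    % Assumption: threshold varies linearly from LT (lowest average rank)
    % to HT (highest average rank)
    avgRank = (rankX + rankY) / 2;
    thr = LT + (HT - LT) * (avgRank - 1) / (n - 1);

    keep = prd < thr;
    if all(keep), break; end                   % no point eliminated: stop
    idx = idx(keep);                           % shrink the candidate set
    if numel(idx) < 2, break; end              % guard against an empty set
end

fprintf('%d of %d probes remain in the invariant set\n', ...
    numel(idx), numel(baseline));
```

The vector idx then holds the row indices that would be passed to the smoothing step.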
[NormData, MedStructure] = affyinvarsetnorm(Data) also returns a structure of the index of the column chosen as the baseline and each column's intensity median before and after normalization.
Note
If Data contains NaN values, then NormData will also contain NaN values at the corresponding positions.
... affyinvarsetnorm(..., 'PropertyName', PropertyValue, ...) calls affyinvarsetnorm with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
... affyinvarsetnorm(..., 'Baseline', BaselineValue, ...) lets you select the column index N from Data to be the baseline column. Default is the index of the column whose median intensity is the median of all the columns.
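The default baseline rule can be illustrated with a small hand-made matrix. This is a sketch of the stated rule only. The data are hypothetical, and taking the middle element of the sorted medians (rounding up for an even number of columns) is an assumption, since this page does not say how ties or even column counts are handled.

```matlab
% Hypothetical 4-probe, 3-chip intensity matrix
Data = [100 220 150;
        120 260 170;
         90 200 140;
        110 240 160];

colMedians = median(Data, 1);          % median intensity of each column
[~, order] = sort(colMedians);         % columns ordered by median
% Assumption: pick the middle column of the ordering
baselineIdx = order(ceil(numel(order) / 2));

fprintf('Column medians: %s, baseline column: %d\n', ...
    mat2str(colMedians), baselineIdx);
```

Here the column medians are 105, 230, and 155, so column 3 holds the median of the medians.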
... affyinvarsetnorm(..., 'Thresholds', ThresholdsValue, ...) sets the thresholds for the lowest average rank and the highest average rank, which are used to determine the invariant set. The rank invariant set is a set of data points whose proportional rank difference is smaller than a given threshold. The threshold for each data point is determined by interpolating between the threshold for the lowest average rank and the threshold for the highest average rank. Select these two thresholds empirically to limit the spread of the invariant set, but allow enough data points to determine the normalization relationship.
ThresholdsValue is a 1-by-2 vector [LT, HT], where LT is the threshold for the lowest average rank and HT is the threshold for the highest average rank. Values must be between 0 and 1. Default is [0.05, 0.005].
... affyinvarsetnorm(..., 'StopPercentile', StopPercentileValue, ...) stops the iteration process when the number of data points in the invariant set reaches N percent of the total number of data points. Default is 1.
Note
If you do not use this property, the iteration process continues until no more data points are eliminated.
... affyinvarsetnorm(..., 'RayPercentile', RayPercentileValue, ...) selects the N percentage of the highest ranked invariant set of data points to fit a straight line through, while the remaining data points are fitted to a running median curve. The final running median curve is a piecewise linear curve. Default is 1.5.
... affyinvarsetnorm(..., 'Method', MethodValue, ...) selects the smoothing method for normalizing the data. When MethodValue is 'lowess', affyinvarsetnorm uses the lowess method. When MethodValue is 'runmedian', affyinvarsetnorm uses the running median method. Default is 'lowess'.
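To give a feel for what a running median normalization curve does, the sketch below fits one through a set of invariant points and applies it to a whole column. It is a rough illustration, not the function's implementation: the data are simulated, the invariant-set indices are a placeholder (every fifth probe), the window length of 11 is arbitrary, and the straight-line fit controlled by 'RayPercentile' is omitted. It uses movmedian, which requires MATLAB R2016a or later.

```matlab
% Simulated baseline and sample chips (hypothetical data)
rng(2);
baseline = 2.^(6 + 8*rand(500, 1));
sample   = 1.4 * baseline .* 2.^(0.05*randn(500, 1));

invIdx = (1:5:500)';                   % placeholder invariant-set indices

% Order the invariant points by sample intensity, then smooth the
% matching baseline values with a running median
[xs, order] = sort(sample(invIdx));
ys = movmedian(baseline(invIdx(order)), 11);

% interp1 needs distinct x values
[xs, iu] = unique(xs);
ys = ys(iu);

% Map every sample intensity through the piecewise linear curve
normSample = interp1(xs, ys, sample, 'linear', 'extrap');

fprintf('Median before: %.1f, after: %.1f, baseline: %.1f\n', ...
    median(sample), median(normSample), median(baseline));
```

After the mapping, the median of the normalized column should sit close to the baseline median, which is the kind of before-and-after comparison MedStructure reports.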
... affyinvarsetnorm(..., 'Showplot', ShowplotValue, ...) plots two pairs of scatter plots (before and after normalization). The first pair plots baseline data versus data from a specified column (chip) from the matrix Data. The second is a pair of M-A scatter plots, which plot M (ratio between baseline and sample) versus A (the average of the baseline and sample). When ShowplotValue is 'all', affyinvarsetnorm plots a pair of scatter plots for each column or chip. When ShowplotValue is a number, a vector of numbers, or a range of numbers, affyinvarsetnorm plots a pair of scatter plots for the indicated column numbers (chips).
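The M and A quantities can be computed by hand as in the sketch below. The data are simulated, and working on a log2 scale is an assumption based on common M-A plot practice; this page does not state which scale the function uses.

```matlab
% Simulated baseline and sample chips (hypothetical data)
rng(3);
baseline = 2.^(6 + 8*rand(500, 1));
sample   = 1.4 * baseline .* 2.^(0.1*randn(500, 1));

% Assumption: log2 scale for both quantities
M = log2(baseline ./ sample);                  % log ratio, baseline vs. sample
A = (log2(baseline) + log2(sample)) / 2;       % average log intensity

plot(A, M, '.');
xlabel('A (average log2 intensity)');
ylabel('M (log2 ratio)');
title('M-A plot before normalization');
```

Because the simulated sample is about 1.4 times brighter than the baseline, the cloud is centered near M = -log2(1.4) rather than zero; after a successful normalization it would be centered near zero.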
Examples
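The following is a minimal usage sketch, assuming Bioinformatics Toolbox is installed. The intensity matrix is simulated rather than read from CEL files, so the variable names and scaling factors are illustrative only. The calls use only the syntaxes and properties listed on this page.

```matlab
% Simulate 5000 PM probes on 4 chips that share a common signal
% (hypothetical data; real input would come from CEL files)
rng(1);
signal = 2.^(6 + 8*rand(5000, 1));
Data = (signal * [1.0 1.2 0.8 1.5]) .* 2.^(0.1*randn(5000, 4));

% Default normalization, also returning the median summary structure
[NormData, MedStructure] = affyinvarsetnorm(Data);
disp(MedStructure)

% Use chip 1 as the baseline and the running median smoother
NormData2 = affyinvarsetnorm(Data, 'Baseline', 1, 'Method', 'runmedian');

% Plot the before and after scatter plots for chips 2 and 3
affyinvarsetnorm(Data, 'Showplot', [2 3]);
```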
References
[1] Li, C., and Wong, W.H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2(8): research0032.1-0032.11.
[2] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823–6834.
Version History
Introduced in R2006a
See Also
affyread | celintensityread | mainvarsetnorm | malowess | manorm | quantilenorm | rmabackadj | rmasummary