mspeaks

Convert raw peak data to peak list (centroided data)

Syntax

Peaklist = mspeaks(X, Intensities)
[Peaklist, PFWHH] = mspeaks(X, Intensities)
[Peaklist, PFWHH, PExt] = mspeaks(X, Intensities)
mspeaks(X, Intensities, ...'Base', BaseValue, ...)
mspeaks(X, Intensities, ...'Levels', LevelsValue, ...)
mspeaks(X, Intensities, ...'NoiseEstimator', NoiseEstimatorValue, ...)
mspeaks(X, Intensities, ...'Multiplier', MultiplierValue, ...)
mspeaks(X, Intensities, ...'Denoising', DenoisingValue, ...)
mspeaks(X, Intensities, ...'PeakLocation', PeakLocationValue, ...)
mspeaks(X, Intensities, ...'FWHHFilter', FWHHFilterValue, ...)
mspeaks(X, Intensities, ...'OverSegmentationFilter', OverSegmentationFilterValue, ...)
mspeaks(X, Intensities, ...'HeightFilter', HeightFilterValue, ...)
mspeaks(X, Intensities, ...'ShowPlot', ShowPlotValue, ...)
mspeaks(X, Intensities, ...'Style', StyleValue, ...)

Description

Peaklist = mspeaks(X, Intensities) finds relevant peaks in raw, noisy peak signal data, and creates Peaklist, a two-column matrix, containing the separation-axis value and intensity for each peak. X is a vector of separation-unit values for a set of signals with peaks. Intensities is a matrix of intensity values for a set of peaks that share the same separation-unit range.

[Peaklist, PFWHH] = mspeaks(X, Intensities) returns PFWHH, a two-column matrix indicating the left and right locations of the full width at half height (FWHH) markers for each peak. For any peak not resolved at FWHH, mspeaks returns the peak shape extents instead. When Intensities includes multiple signals, then PFWHH is a cell array of matrices.

[Peaklist, PFWHH, PExt] = mspeaks(X, Intensities) returns PExt, a two-column matrix indicating the left and right locations of the peak shape extents determined after wavelet denoising. When Intensities includes multiple signals, then PExt is a cell array of matrices.

mspeaks(X, Intensities, ...'PropertyName', PropertyValue, ...) calls mspeaks with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks. Each PropertyName is case insensitive. These property name/property value pairs are as follows:

mspeaks(X, Intensities, ...'Base', BaseValue, ...) specifies the wavelet base.

mspeaks(X, Intensities, ...'Levels', LevelsValue, ...) specifies the number of levels for the wavelet decomposition.

mspeaks(X, Intensities, ...'NoiseEstimator', NoiseEstimatorValue, ...) specifies the method to estimate the threshold, T, to filter out noisy components in the first high-band decomposition (y_h).

mspeaks(X, Intensities, ...'Multiplier', MultiplierValue, ...) specifies the threshold multiplier constant.

mspeaks(X, Intensities, ...'Denoising', DenoisingValue, ...) controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.

mspeaks(X, Intensities, ...'PeakLocation', PeakLocationValue, ...) specifies the proportion of the peak height to use to select the points used to compute the centroid separation-axis value of the respective peak. PeakLocationValue must be a value ≥ 0 and ≤ 1. Default is 1.0.

mspeaks(X, Intensities, ...'FWHHFilter', FWHHFilterValue, ...) specifies the minimum full width at half height (FWHH), in separation units, for reported peaks. Peaks with FWHH below this value are excluded from the output list Peaklist.

mspeaks(X, Intensities, ...'OverSegmentationFilter', OverSegmentationFilterValue, ...) specifies the minimum distance, in separation units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. Increase this filter value to join oversegmented peaks into a single peak.

mspeaks(X, Intensities, ...'HeightFilter', HeightFilterValue, ...) specifies the minimum height for reported peaks. Peaks with heights below this value are excluded from the output list Peaklist.

mspeaks(X, Intensities, ...'ShowPlot', ShowPlotValue, ...) controls the display of a plot of the original and the smoothed signal, with the peaks included in the output matrix Peaklist marked.

mspeaks(X, Intensities, ...'Style', StyleValue, ...) specifies the style for marking the peaks in the plot.
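
The property pairs listed above can be combined in a single call. The following is a minimal sketch, assuming the Bioinformatics Toolbox is installed; the synthetic signal and the particular filter values are illustrative assumptions, not recommended settings.

```matlab
% Synthetic signal: three Gaussian peaks plus noise (illustrative data only).
rng(0);                                   % reproducible noise
X = (0:0.1:100)';                         % separation axis, column vector
Y = 50*exp(-(X-20).^2/2) + 80*exp(-(X-50).^2/4) + 30*exp(-(X-75).^2/2);
Y = Y + randn(size(X));                   % additive noise

% Request all three outputs and combine several name-value pairs.
% The values 10 and 2 are arbitrary choices for this synthetic signal.
[P, PFWHH, PExt] = mspeaks(X, Y, ...
    'HeightFilter', 10, ...               % minimum height for reported peaks
    'OverSegmentationFilter', 2, ...      % minimum distance between neighboring peaks
    'ShowPlot', false);                   % suppress the plot

disp(P)                                   % one row per peak: [location, intensity]
```

Because `Y` holds a single signal here, `P`, `PFWHH`, and `PExt` are plain two-column matrices rather than cell arrays.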

mspeaks finds peaks in data from any separation technique that produces signal data, such as spectroscopy, nuclear magnetic resonance (NMR), electrophoresis, chromatography, or mass spectrometry.

Input Arguments

`X`	Vector of separation-unit values for a set of signals with peaks. The number of elements in the vector equals the number of rows in the matrix `Intensities`. The separation unit can quantify wavelength, frequency, distance, time, or m/z depending on the instrument that generates the signal data.
`Intensities`	Matrix of intensity values for a set of peaks that share the same separation-unit range. Each row corresponds to a separation-unit value, and each column corresponds to either a set of signals with peaks or a retention time. The number of rows equals the number of elements in vector `X`.
`BaseValue`	Integer from `2` to `20` that specifies the wavelet base. Default: `4`
`LevelsValue`	Integer from `1` to `12` that specifies the number of levels for the wavelet decomposition. Default: `10`
`NoiseEstimatorValue`	Character vector, string, or scalar that specifies the method to estimate the threshold, `T`, to filter out noisy components in the first high-band decomposition (`y_h`). Choices are: `mad` (default), the median absolute deviation, which calculates `T = sqrt(2*log(n)) * mad(y_h) / 0.6745`, where `n` is the number of rows in the `Intensities` matrix; `std`, the standard deviation, which calculates `T = std(y_h)`; or a positive real value.
`MultiplierValue`	Positive real value that specifies the threshold multiplier constant. Default: `1.0`
`DenoisingValue`	Controls the use of wavelet denoising to smooth the signal. Choices are `true` (default) or `false`. Tip: If your data was previously smoothed, for example, with the `mslowess` or `mssgolay` function, you do not need to use wavelet denoising. Set this property to `false`.
`PeakLocationValue`	Value that specifies the proportion of the peak height to use to select the points to compute the centroid separation-axis value of the respective peak. The value must be `≥ 0` and `≤ 1`. Default: `1.0`. Note: When `PeakLocationValue` = `1.0`, the peak location is at the maximum of the peak. When `PeakLocationValue` = `0`, `mspeaks` computes the peak location with all the points from the closest minimum to the left of the peak to the closest minimum to the right of the peak.
`FWHHFilterValue`	Positive real value that specifies the minimum full width at half height (FWHH), in separation units, for reported peaks. Peaks with FWHH below this value are excluded from the output list `Peaklist`. Default: `0`
`OverSegmentationFilterValue`	Positive real value that specifies the minimum distance, in separation units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. Increase this filter value to join oversegmented peaks into a single peak. Default: `0`
`HeightFilterValue`	Positive real value that specifies the minimum height for reported peaks. Default: `0`
`ShowPlotValue`	Controls the display of a plot of the original signal and the smoothed signal, with the peaks included in the output matrix `Peaklist` marked. Choices are `true`, `false`, or `I`, an integer specifying the index of a spectrum in `Intensities`. If set to `true`, the first spectrum in `Intensities` is plotted. Default: `false` when you specify return values, `true` when you do not.
`StyleValue`	Character vector or string specifying the style for marking the peaks in the plot. Choices are: `'peak'` (default) — Places a marker at the peak crest. `'exttriangle'` — Draws a triangle using the peak crest and the extents. `'fwhhtriangle'` — Draws a triangle using the peak crest and the FWHH points. `'extline'` — Places a marker at the peak crest and vertical lines at the extents. `'fwhhline'` — Places a marker at the peak crest and a horizontal line at FWHH.
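
The two threshold formulas given for `NoiseEstimatorValue` can be evaluated by hand. The sketch below uses only base MATLAB; the vector `y_h` is a hypothetical stand-in for the first high-band decomposition (which `mspeaks` computes internally), and `n` is taken as its length on the assumption that the undecimated transform keeps one coefficient per row of `Intensities`.

```matlab
% Hypothetical stand-in for the first high-band decomposition y_h.
y_h = [0.4; -1.1; 0.3; 2.0; -0.6; 0.9; -0.2; 1.5];
n   = numel(y_h);        % assumed equal to the number of rows in Intensities

% Median absolute deviation, written out so no toolbox function is needed.
madValue = median(abs(y_h - median(y_h)));

% 'mad' estimator: T = sqrt(2*log(n)) * mad(y_h) / 0.6745
T_mad = sqrt(2*log(n)) * madValue / 0.6745;

% 'std' estimator: T = std(y_h)
T_std = std(y_h);

fprintf('T (mad) = %.4f, T (std) = %.4f\n', T_mad, T_std);
```

The median absolute deviation is spelled out explicitly because it is the median form, not a mean absolute deviation, that the formula calls for.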

Output Arguments

`Peaklist`	Two-column matrix where each row corresponds to a peak. The first column contains separation-unit values (indicating the location of peaks along the separation axis). The second column contains intensity values. When `Intensities` includes multiple signals, then `Peaklist` is a cell array of matrices, each containing a peak list.
`PFWHH`	Two-column matrix indicating the left and right locations of the full width at half height (FWHH) markers for each peak. For any peak not resolved at FWHH, `mspeaks` returns the peak shape extents instead. When `Intensities` includes multiple signals, then `PFWHH` is a cell array of matrices.
`PExt`	Two-column matrix indicating the left and right locations of the peak shape extents determined after wavelet denoising. When `Intensities` includes multiple signals, then `PExt` is a cell array of matrices.
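
Since the outputs are matrices for one signal and cell arrays for several, downstream code often has to handle both shapes. The following sketch uses made-up numbers arranged in the shapes described above (they are not results from `mspeaks`) to show how peak counts and widths at half height might be derived.

```matlab
% Hypothetical outputs for two signals: one two-column matrix per signal.
P     = {[20.0 50.2; 50.1 79.8], [20.1 48.9]};   % [location, intensity]
PFWHH = {[18.8 21.2; 48.3 51.9], [18.9 21.3]};   % [left, right] FWHH markers

% Number of peaks found in each signal.
numPeaks = cellfun(@(p) size(p, 1), P);

% Width at half height of every peak: right marker minus left marker.
widths = cellfun(@(w) w(:,2) - w(:,1), PFWHH, 'UniformOutput', false);

% A single-signal result is a plain matrix; wrap it in a cell so the
% same cellfun-based code works in both cases.
Psingle = [20.0 50.2];
if ~iscell(Psingle)
    Psingle = {Psingle};
end
```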

Examples

Load a MAT-file, included with the Bioinformatics Toolbox™ software, that contains two mass spectrometry data variables, MZ_lo_res and Y_lo_res. MZ_lo_res is a vector of m/z values for a set of spectra. Y_lo_res is a matrix of intensity values for a set of mass spectra that share the same m/z range.
```
load sample_lo_res
```
Adjust the baseline of the eight spectra stored in Y_lo_res.
```
YB = msbackadj(MZ_lo_res,Y_lo_res);
```
Convert the raw mass spectrometry data to a peak list by finding the relevant peaks in each spectrum.
```
P = mspeaks(MZ_lo_res,YB);
```
Plot the third spectrum in YB, the matrix of baseline-corrected intensity values, with the detected peaks marked.
```
P = mspeaks(MZ_lo_res,YB,'SHOWPLOT',3);
```
Smooth the signal using the mslowess function. Then convert the smoothed data to a peak list by finding relevant peaks and plot the third spectrum.
```
YS = mslowess(MZ_lo_res,YB,'SHOWPLOT',3);
```
```
P = mspeaks(MZ_lo_res,YS,'DENOISING',false,'SHOWPLOT',3);
```
Use the cellfun function to keep only the peaks with m/z values greater than 2000 in each of the eight peak lists in output P. Then plot the peaks of the third spectrum (in red) over its smoothed signal (in blue).
```
Q = cellfun(@(p) p(p(:,1)>2000,:),P,'UniformOutput',false);
figure
plot(MZ_lo_res,YS(:,3),'b',Q{3}(:,1),Q{3}(:,2),'rx')
xlabel('Mass/Charge (M/Z)')
ylabel('Relative Intensity')
axis([0 20000 -5 95])
```

Algorithms

mspeaks converts raw peak data to a peak list (centroided data) by:

Smoothing the signal using undecimated wavelet transform with Daubechies coefficients
Assigning peak locations
Estimating noise
Eliminating peaks that do not satisfy specified criteria
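
The four steps above can be illustrated with a simplified sketch in base MATLAB. This is not the `mspeaks` implementation: a moving average stands in for the undecimated wavelet transform, a fixed height threshold stands in for the noise estimate, and the centroid rule is an assumed reading of the `PeakLocation` description. All variable names and values are hypothetical.

```matlab
% Synthetic, noise-free signal with two Gaussian peaks (illustrative only).
x = (0:0.5:40)';
y = 10*exp(-(x-10).^2/4) + 3*exp(-(x-28).^2/4);

% Step 1 (stand-in): smooth with a 3-point moving average.
% mspeaks uses an undecimated wavelet transform here instead.
ys = conv(y, ones(3,1)/3, 'same');

% Step 2: candidate peaks are local maxima of the smoothed signal.
idx = find(ys(2:end-1) > ys(1:end-2) & ys(2:end-1) >= ys(3:end)) + 1;

% Assign each peak a centroid location from the points around the crest
% that stay above peakLocation * height (assumed interpretation).
peakLocation = 0.6;
loc = zeros(numel(idx), 1);
for k = 1:numel(idx)
    h  = ys(idx(k));
    lo = idx(k);
    hi = idx(k);
    % Walk outward while still above the cutoff and still descending,
    % so the window never crosses the nearest minimum on either side.
    while lo > 1 && ys(lo-1) >= peakLocation*h && ys(lo-1) <= ys(lo)
        lo = lo - 1;
    end
    while hi < numel(ys) && ys(hi+1) >= peakLocation*h && ys(hi+1) <= ys(hi)
        hi = hi + 1;
    end
    loc(k) = sum(x(lo:hi).*ys(lo:hi)) / sum(ys(lo:hi));  % weighted centroid
end

% Step 3 (stand-in): a fixed threshold replaces the noise estimate.
minHeight = 5;

% Step 4: eliminate peaks that do not satisfy the height criterion.
keep = ys(idx) >= minHeight;
peaklist = [loc(keep), ys(idx(keep))];
```

With `peakLocation` set to 1, the window in this sketch shrinks to the crest itself, which matches the documented behavior of placing the peak at its maximum.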

References

[1] Morris, J.S., Coombes, K.R., Koomen, J., Baggerly, K.A., and Kobayashi, R. (2005) Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 21:9, 1764–1775.

[2] Yasui, Y., Pepe, M., Thompson, M.L., Adam, B.L., Wright, G.L., Qu, Y., Potter, J.D., Winget, M., Thornquist, M., and Feng, Z. (2003) A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics 4:3, 449–463.

[3] Donoho, D.L., and Johnstone, I.M. (1995) Adapting to unknown smoothness via wavelet shrinkage. J. Am. Statist. Assoc. 90, 1200–1224.

[4] Strang, G., and Nguyen, T. (1996) Wavelets and Filter Banks (Wellesley, MA: Wellesley-Cambridge Press).

[5] Coombes, K.R., Tsavachidis, S., Morris, J.S., Baggerly, K.A., Hung, M.C., and Kuerer, H.M. (2005) Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 5:16, 4107–4117.

Version History

Introduced in R2007a