
Data Preprocessing for Condition Monitoring and Predictive Maintenance

Data preprocessing is the second stage of the workflow for predictive maintenance algorithm development.

Data preprocessing is often necessary to clean the data and convert it into a form from which you can extract condition indicators. Data preprocessing can include:

  • Outlier and missing-value removal, offset removal, and detrending.

  • Noise reduction, such as filtering or smoothing.

  • Transformations between time and frequency domain.

  • More advanced signal processing such as short-time Fourier transforms and transformations to the order domain.

You can perform data preprocessing on arrays or tables of measured or simulated data that you manage with Predictive Maintenance Toolbox™ ensemble datastores, as described in Data Ensembles for Condition Monitoring and Predictive Maintenance. Generally, you preprocess your data before analyzing it to identify a promising condition indicator, a quantity that changes in a predictable way as system performance degrades. (See Condition Indicators for Monitoring, Fault Detection, and Prediction.) There can be some overlap between the steps of preprocessing and identifying condition indicators. Typically, though, preprocessing results in a cleaned or transformed signal, on which you perform further analysis to condense the signal information into a condition indicator.
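As a minimal illustration of that two-step flow, the following Python sketch (standard library only, not MATLAB) removes a constant sensor offset from three hypothetical vibration records and then condenses each cleaned record into one number, its RMS value. The records, the sample rate, and the choice of RMS as the indicator are assumptions made for illustration.

```python
import math

def remove_offset(x):
    """Preprocessing: subtract the mean so the signal is centered on zero."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def rms(x):
    """Condense a signal into one number: its root-mean-square value."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 1000  # sample rate in Hz (assumed)
# Hypothetical records: a 2.0 sensor offset plus a 50 Hz vibration whose
# amplitude grows as the simulated machine degrades.
records = [
    [2.0 + amp * math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
    for amp in (1.0, 1.5, 2.0)
]

# Preprocess each record, then reduce it to a scalar condition indicator.
indicator = [rms(remove_offset(x)) for x in records]
print([round(v, 3) for v in indicator])  # [0.707, 1.061, 1.414]
```

Without the offset removal, the RMS would be dominated by the constant 2.0 offset and would change much less, in relative terms, as the vibration amplitude grows.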

Understanding your machine and the kind of data you have can help determine what preprocessing methods to use. For example, if you are filtering noisy vibration data, knowing what frequency range is most likely to display useful features can help you choose preprocessing techniques. Similarly, it might be useful to transform gearbox vibration data to the order domain, which is used for rotating machines when the rotational speed changes over time. However, that same preprocessing would not be useful for vibration data from a car chassis, which is a rigid body.
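To make the order-domain idea concrete, here is a minimal Python sketch of angular resampling. It assumes the shaft speed profile is already known (for example, from a tachometer or an RPM estimate) and that linear interpolation is accurate enough; the function names and the run-up profile are hypothetical, and this is not the MATLAB implementation.

```python
import bisect
import math

def revolutions_from_rpm(rpm, fs):
    """Cumulative shaft angle in revolutions, by trapezoidal integration of rpm/60."""
    rev = [0.0]
    for a, b in zip(rpm, rpm[1:]):
        rev.append(rev[-1] + (a + b) / 2 / 60 / fs)
    return rev

def angular_resample(x, rev, samples_per_rev):
    """Linearly interpolate x at uniform angle steps of 1/samples_per_rev revolutions."""
    out = []
    for k in range(int(rev[-1] * samples_per_rev) + 1):
        target = k / samples_per_rev  # desired shaft angle in revolutions
        i = bisect.bisect_left(rev, target)
        if i == 0:
            out.append(x[0])
        elif i >= len(rev):
            out.append(x[-1])
        else:
            w = (target - rev[i - 1]) / (rev[i] - rev[i - 1])
            out.append(x[i - 1] + w * (x[i] - x[i - 1]))
    return out

# Hypothetical run-up: shaft speed ramps linearly from 600 to 1800 rpm over 2 s,
# and the vibration contains a component at order 3 (3 cycles per revolution).
fs = 5000  # sample rate in Hz (assumed)
t = [n / fs for n in range(2 * fs + 1)]
rpm = [600 + 600 * ti for ti in t]
rev = revolutions_from_rpm(rpm, fs)
x = [math.sin(2 * math.pi * 3 * r) for r in rev]

xa = angular_resample(x, rev, samples_per_rev=32)
```

In the resampled signal, each 32 samples span one revolution, so the order-3 component completes exactly three cycles every 32 samples even though its frequency in hertz triples during the run-up.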

Basic Preprocessing

MATLAB® includes many functions that are useful for basic preprocessing of data in arrays or tables. These include functions for:

  • Data cleaning, such as fillmissing and filloutliers. Data cleaning uses various techniques for finding, removing, and replacing bad or missing data.

  • Smoothing data, such as smoothdata and movmean. Use smoothing to eliminate unwanted noise or high variance in data.

  • Detrending data, such as detrend. Removing a trend from the data lets you focus your analysis on the fluctuations in the data about the trend. While some trends can be meaningful, others are due to systematic effects, and some types of analyses yield better insight once you remove them. Removing offsets is another, similar type of preprocessing.

  • Scaling or normalizing data, such as rescale. Scaling changes the bounds of the data, and can be useful, for example, when you are working with data in different units.
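The following Python sketch (standard library only) applies the four kinds of operations in the list to a short hypothetical record. The helper names are hypothetical and only loosely analogous to fillmissing with linear interpolation, movmean, detrend, and rescale; in particular, missing samples are marked with None rather than NaN, and the end-of-record handling is an assumption rather than MATLAB's behavior.

```python
def fill_missing_linear(x):
    """Replace None entries by linear interpolation between the nearest valid
    neighbors; leading or trailing gaps take the nearest valid value.
    Assumes at least one valid sample."""
    valid = [i for i, v in enumerate(x) if v is not None]
    out = list(x)
    for i, v in enumerate(x):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = x[right]
        elif right is None:
            out[i] = x[left]
        else:
            w = (i - left) / (right - left)
            out[i] = x[left] + w * (x[right] - x[left])
    return out

def moving_mean(x, k):
    """Centered moving average over k samples (k odd); the window shrinks at the ends."""
    h = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - h): i + h + 1]
        out.append(sum(window) / len(window))
    return out

def detrend_linear(x):
    """Subtract the least-squares straight line fitted against the sample index."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (v - x_mean) for i, v in enumerate(x))
    slope = sxy / sxx
    return [v - (x_mean + slope * (i - t_mean)) for i, v in enumerate(x)]

def rescale(x, lo=0.0, hi=1.0):
    """Map the data linearly onto [lo, hi]."""
    x_min, x_max = min(x), max(x)
    return [lo + (v - x_min) * (hi - lo) / (x_max - x_min) for v in x]

raw = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]  # hypothetical record with two dropouts
filled = fill_missing_linear(raw)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
smoothed = moving_mean(filled, 3)   # [1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 7.5]
residual = detrend_linear(filled)   # all close to 0: the record is a pure linear trend
scaled = rescale(filled)            # runs from 0.0 to 1.0
```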

Another common type of preprocessing is to extract a useful portion of the signal and discard other portions. For instance, you might discard the first five seconds of a signal, which capture a start-up transient, and retain only the data from steady-state operation. For an example that performs this kind of preprocessing, see Using Simulink to Generate Fault Data.
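A minimal Python sketch of this kind of cropping follows, assuming a uniformly sampled signal and a known transient duration. The helper name and the synthetic record are hypothetical.

```python
import math

def drop_transient(x, fs, t_skip):
    """Return only the samples recorded after the first t_skip seconds."""
    return x[int(round(t_skip * fs)):]

fs = 100  # sample rate in Hz (assumed)
t = [n / fs for n in range(10 * fs)]  # 10 s record
# Hypothetical record: a decaying start-up transient on top of a steady 1 Hz vibration.
x = [5.0 * math.exp(-ti) + math.sin(2 * math.pi * ti) for ti in t]

steady = drop_transient(x, fs, t_skip=5.0)
print(len(steady))  # 500 samples, covering t = 5.00 s to 9.99 s
```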

For more information on basic preprocessing commands in MATLAB, see Preprocessing Data.

Filtering

Filtering is another way to remove noise or unwanted components from a signal. Filtering is helpful when you know what frequency range in the data is most likely to display useful features for condition monitoring or prediction. The basic MATLAB function filter lets you filter a signal with a transfer function. You can use designfilt to generate filters for use with filter, such as bandpass, highpass, and lowpass filters, and other common filter forms. For more information about using these functions, see Digital and Analog Filters.
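MATLAB's filter(b, a, x) evaluates a difference equation normalized by the leading denominator coefficient. The Python sketch below reproduces that recursion with zero initial conditions. The 10-tap moving-average coefficients are a hand-picked assumption standing in for a designfilt design: at fs = 1000 Hz, averaging over exactly one period of a 100 Hz interference cancels it, while a slow 2 Hz component passes nearly unchanged.

```python
import math

def lfilter(b, a, x):
    """Apply the transfer function B(z)/A(z) to x through the difference equation
    a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], zero initial conditions."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

fs = 1000  # sample rate in Hz (assumed)
M = 10     # moving-average length: its spectral nulls fall at multiples of fs/M = 100 Hz
b, a = [1 / M] * M, [1.0]

t = [n / fs for n in range(500)]
signal = [math.sin(2 * math.pi * 2 * ti) for ti in t]         # slow 2 Hz feature
noise = [0.5 * math.sin(2 * math.pi * 100 * ti) for ti in t]  # 100 Hz interference
y = lfilter(b, a, [s + v for s, v in zip(signal, noise)])
```

The same function also handles recursive (IIR) filters, for example b = [1.0] and a = [1.0, -0.5], whose impulse response halves at every sample.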

If you have a Wavelet Toolbox™ license, you can use wavelet tools for more complex filter approaches. For instance, you can divide your data into subbands, process the data in each subband separately, and recombine them to construct a modified version of the original signal. For more information about such filters, see Filter Banks (Wavelet Toolbox). You can also use the Signal Processing Toolbox™ function emd to separate a mixed signal into components with different time-frequency behavior.
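To show the split, process, and recombine pattern, here is a Python sketch of a one-level Haar filter bank, a deliberately simple stand-in for the wavelet filter banks in Wavelet Toolbox. The detail (highpass) subband is soft-thresholded to suppress sample-to-sample noise before the subbands are recombined; the threshold value and the test signal are assumptions.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter coefficient

def haar_split(x):
    """Analysis bank: lowpass (approximation) and highpass (detail) subbands,
    each half the length of the even-length input."""
    approx = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx, detail

def haar_merge(approx, detail):
    """Synthesis bank: recombine the two subbands into a full-length signal."""
    x = []
    for c, d in zip(approx, detail):
        x.extend([S * (c + d), S * (c - d)])
    return x

def soft_threshold(d, thr):
    """Shrink every coefficient toward zero by thr; values within thr become 0."""
    return [math.copysign(max(abs(v) - thr, 0.0), v) for v in d]

# Hypothetical signal: a step from 1.0 to 3.0 plus alternating +/-0.1 noise.
clean = [1.0] * 8 + [3.0] * 8
noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]

approx, detail = haar_split(noisy)
denoised = haar_merge(approx, soft_threshold(detail, 0.2))
```

Without any processing, haar_merge(*haar_split(x)) returns x up to rounding error, so every change to the output comes from what you do to the subbands.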

Time-Domain Preprocessing

Predictive Maintenance Toolbox and Signal Processing Toolbox provide functions that let you study and characterize vibrations in mechanical systems in the time domain. Use these functions for preprocessing or for extracting condition indicators. For example:

  • tsa — Remove noise coherently with time-synchronous averaging and analyze wear using envelope spectra. The example Using Simulink to Generate Fault Data uses time-synchronous averaging to preprocess vibration data.

  • tsadifference — Remove the regular signal, the first-order sidebands and other specific sidebands with their harmonics from a time-synchronous averaged (TSA) signal.

  • tsaregular — Isolate the known signal from a TSA signal by removing the residual signal and specific sidebands.

  • tsaresidual — Isolate the residual signal from a TSA signal by removing the known signal components and their harmonics.

  • ordertrack — Use order analysis to analyze and visualize spectral content occurring in rotating machinery. Track and extract orders and their time-domain waveforms.

  • rpmtrack — Track and extract the RPM profile from a vibration signal by computing the RPM as a function of time.

  • envspectrum — Compute an envelope spectrum. The envelope spectrum removes the high-frequency sinusoidal components from the signal and focuses on the lower-frequency modulations. The example Rolling Element Bearing Fault Diagnosis uses an envelope spectrum for such preprocessing.
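Time-synchronous averaging works because components locked to shaft rotation repeat identically in every revolution, while unsynchronized noise averages toward zero. The Python sketch below assumes constant speed, an integer number of samples per revolution, and a record that starts at a revolution mark. The tsa function instead takes tachometer pulse times, so treat this only as an illustration of the principle; the gear signal and noise level are hypothetical.

```python
import math
import random

def time_synchronous_average(x, samples_per_rev):
    """Average consecutive revolutions sample by sample. Assumes constant shaft speed,
    an integer number of samples per revolution, and that x starts at a revolution mark."""
    n_rev = len(x) // samples_per_rev
    return [sum(x[r * samples_per_rev + k] for r in range(n_rev)) / n_rev
            for k in range(samples_per_rev)]

spr = 64      # samples per shaft revolution (assumed)
n_rev = 100   # number of revolutions recorded (assumed)
rng = random.Random(0)
# Hypothetical gear-mesh signal: 8 cycles per revolution (an 8-tooth gear) plus
# unit-variance noise that is not synchronized with the shaft.
periodic = [math.sin(2 * math.pi * 8 * k / spr) for k in range(spr)]
x = [periodic[n % spr] + rng.gauss(0.0, 1.0) for n in range(spr * n_rev)]

avg = time_synchronous_average(x, spr)
err = math.sqrt(sum((p - q) ** 2 for p, q in zip(avg, periodic)) / spr)
# Averaging N revolutions reduces uncorrelated noise RMS by about sqrt(N): here from ~1 to ~0.1.
```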

For more information on these and related functions, see Vibration Analysis.

Frequency-Domain (Spectral) Preprocessing

For vibrating or rotating systems, fault development can be indicated by changes in frequency-domain behavior such as the changing of resonant frequencies or the presence of new vibrational components. Signal Processing Toolbox provides many functions for analyzing such spectral behavior. Often these are useful as preprocessing before performing further analysis for extracting condition indicators. Such functions include:

  • pspectrum — Compute the power spectrum, time-frequency power spectrum, or power spectrogram of a signal. The spectrogram contains information about how the power distribution changes with time. The example Multi-Class Fault Detection Using Simulated Data performs data preprocessing using pspectrum.

  • envspectrum — Compute an envelope spectrum. A fault that causes a repeating impulse or pattern will impose amplitude modulation on the vibration signal of the machinery. The envelope spectrum removes the high-frequency sinusoidal components from the signal and focuses on the lower-frequency modulations. The example Rolling Element Bearing Fault Diagnosis uses an envelope spectrum for such preprocessing.

  • orderspectrum — Compute an average order-magnitude spectrum.

  • modalfrf — Estimate the frequency-response function of a system from excitation and response signals.
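A fault that repeats at a fixed rate modulates a higher-frequency resonance, so the repetition rate shows up in the spectrum of the signal's envelope rather than in its raw spectrum. The Python sketch below computes an envelope spectrum from the magnitude of the analytic signal, which it builds by zeroing the negative frequencies of a direct DFT. The signal parameters are assumptions, and the sketch omits the bandpass prefiltering often applied in practice.

```python
import cmath
import math

def dft(x):
    """Direct O(n^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def envelope(x):
    """Magnitude of the analytic signal, formed by zeroing negative frequencies."""
    n = len(x)
    h = [0.0] * n
    h[0] = 1.0
    for k in range(1, (n + 1) // 2):
        h[k] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    z = idft([Xk * hk for Xk, hk in zip(dft(x), h)])
    return [abs(v) for v in z]

def envelope_spectrum(x, fs):
    """One-sided magnitude spectrum (scaled by 1/n) of the mean-removed envelope."""
    env = envelope(x)
    mean = sum(env) / len(env)
    E = dft([v - mean for v in env])
    n = len(x)
    half = n // 2 + 1
    return [k * fs / n for k in range(half)], [abs(E[k]) / n for k in range(half)]

fs, n = 1000, 400  # sample rate in Hz and record length (assumed)
# Hypothetical bearing-like signal: a 200 Hz resonance whose amplitude is
# modulated at a 10 Hz fault repetition rate.
x = [(1 + 0.5 * math.cos(2 * math.pi * 10 * i / fs)) * math.cos(2 * math.pi * 200 * i / fs)
     for i in range(n)]

freqs, mag = envelope_spectrum(x, fs)
peak = freqs[max(range(len(mag)), key=mag.__getitem__)]
print(peak)  # 10.0
```

The raw spectrum of x contains only lines at 190, 200, and 210 Hz; the 10 Hz repetition rate appears directly only after demodulation.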

For more information on these and related functions, see Vibration Analysis.

Time-Frequency Preprocessing

Signal Processing Toolbox includes functions for analyzing systems whose frequency-domain behavior changes with time. Such analysis is called time-frequency analysis, and is useful for analyzing and detecting transient or changing signals associated with changes in system performance. These functions include:

  • spectrogram — Compute a spectrogram using a short-time Fourier transform. The spectrogram describes the time-localized frequency content of a signal and its evolution over time. The example Condition Monitoring and Prognostics Using Vibration Signals uses spectrogram to preprocess signals and help identify potential condition indicators.

  • hht — Compute the Hilbert spectrum of a signal. The Hilbert spectrum is useful for analyzing signals that comprise a mixture of signals whose spectral content changes in time. This function computes the spectrum of each component in the mixed signal, where the components are determined by empirical mode decomposition.

  • emd — Compute the empirical mode decomposition of a signal. This decomposition describes the mixture of signals analyzed in a Hilbert spectrum, and can help you separate a mixed signal to extract a component whose time-frequency behavior changes as system performance degrades. You can use emd to generate the inputs for hht.

  • kurtogram — Compute the time-localized spectral kurtosis, which characterizes a signal by differentiating stationary Gaussian signal behavior from nonstationary or non-Gaussian behavior in the frequency domain. As preprocessing for other tools such as envelope analysis, spectral kurtosis can supply key inputs such as the optimal frequency band. (See pkurtosis.) The example Rolling Element Bearing Fault Diagnosis uses spectral kurtosis for preprocessing and extraction of condition indicators.
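The following Python sketch computes a short-time Fourier transform magnitude and reports the dominant frequency in each frame. It shows how a spectrogram exposes a change in frequency content that a single spectrum of the whole record would blur together. The Hann window, frame length, and hop size are assumptions, not the defaults of spectrogram.

```python
import cmath
import math

def stft_magnitude(x, frame_len, hop):
    """One-sided, Hann-windowed short-time Fourier transform magnitude.
    Returns a list of (start_index, spectrum) pairs, one per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                            for n in range(frame_len)))
                    for k in range(frame_len // 2 + 1)]
        frames.append((start, spectrum))
    return frames

fs = 1000  # sample rate in Hz (assumed)
# Hypothetical signal whose vibration frequency jumps from 50 Hz to 150 Hz at t = 0.5 s.
x = [math.sin(2 * math.pi * (50 if n < 500 else 150) * n / fs) for n in range(1000)]

frame_len = 100  # 0.1 s frames, giving 10 Hz frequency bins
frames = stft_magnitude(x, frame_len, hop=100)
peaks = [max(range(len(spec)), key=spec.__getitem__) * fs / frame_len for _, spec in frames]
for (start, _), f in zip(frames, peaks):
    print(f"t = {start / fs:.1f} s: dominant frequency {f:.0f} Hz")
```

Shorter frames localize the change more precisely in time but widen the frequency bins; this trade-off is the main design choice when you configure a spectrogram.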

For more information on these and related functions, see Time-Frequency Analysis.

Related Topics