A changepoint is a sample or time instant at
which some statistical property of a signal changes abruptly. The property in question
can be the mean of the signal, its variance, or a spectral characteristic, among
others.
To find a signal changepoint, findchangepts
employs a parametric
global method. The function:
Chooses a point and divides the signal into two sections.
Computes an empirical estimate of the desired statistical property for each
section.
At each point within a section, measures how much the property deviates from
the empirical estimate. Adds the deviations for all points.
Adds the deviations section-to-section to find the total residual
error.
Varies the location of the division point until the total residual error
attains a minimum.
The procedure is clearest when the chosen statistic is the mean. In that case,
findchangepts
minimizes the total residual error from the "best"
horizontal level for each section. Given a signal x1,
x2, …,
xN, and the subsequence mean and variance
where the sum of squares
findchangepts
finds k such that
is smallest. This result can be generalized to incorporate other
statistics. findchangepts
finds k such that
is smallest, given the section empirical estimate
χ and the deviation measurement Δ.
Minimizing the residual error is equivalent to maximizing the log likelihood. Given a
normal distribution with mean μ and variance
σ2, the log-likelihood for
N independent observations is
If 'Statistic'
is specified as 'mean'
,
the variance is fixed and the function uses
as obtained previously.
If 'Statistic'
is specified as 'std'
,
the mean is fixed and the function uses
If 'Statistic'
is specified as 'rms'
,
the total deviation is the same as for 'std'
but with the
mean set to zero:
If 'Statistic'
is specified as
'linear'
, the function uses as total deviation the sum of
squared differences between the signal values and the predictions of the
least-squares linear fit through the values. This quantity is also known as the
error sum of squares, or SSE.
The best-fit line through xm,
xm+1, …,
xn is
and the SSE is
Signals of interest often have more than one changepoint. Generalizing the procedure
is straightforward when the number of changepoints is known. When the number is unknown,
you must add a penalty term to the residual error, since adding changepoints always
decreases the residual error and results in overfitting. In the extreme case, every
point becomes a changepoint and the residual error vanishes.
findchangepts
uses a penalty term that grows linearly with the
number of changepoints. If there are K changepoints to be found, then
the function minimizes
where k0 and kK are respectively the first and the last sample of the signal.
The proportionality constant, denoted by β and specified
in 'MinThreshold'
, corresponds to a fixed penalty added for
each changepoint. findchangepts
rejects adding additional
changepoints if the decrease in residual error does not meet the threshold. Set
'MinThreshold'
to zero to return all possible
changes.
If you do not know what threshold to use or have a rough idea of the number
of changepoints in the signal, specify 'MaxNumChanges'
instead. This option gradually increases the threshold until the function finds
fewer changes than the specified value.
To perform the minimization itself, findchangepts
uses an
exhaustive algorithm based on dynamic programming with early abandonment.