DriftDiagnostics
Diagnostics information of batch drift detection
Description
A DriftDiagnostics object stores the diagnostics information after performing permutation testing for batch drift detection.
Creation
Create a DriftDiagnostics object by using detectdrift to test for drift between baseline and target data sets.
Properties
Baseline
— Baseline data set
numeric array | categorical array | table
This property is read-only.
Baseline data set, specified as a numeric array, categorical array, or a table.
Data Types: double | categorical | table
CategoricalVariables
— Indices of categorical variables in data
numeric array | []
This property is read-only.
Indices of categorical variables in data, specified as a numeric array. If the data does not have any categorical variables, then this property is empty ([]).
Data Types: double
ConfidenceIntervals
— 95% confidence interval bounds for estimated p-values
two-row matrix of positive scalar values from 0 to 1 | NaN
This property is read-only.
95% confidence interval bounds for estimated p-values for each variable, specified as a 2-by-k matrix of scalar values from 0 to 1, where k is the number of variables. The rows of ConfidenceIntervals correspond to the lower and upper bounds of the confidence intervals, respectively.
If you set 'EstimatePValues' to false in the call to detectdrift, then the function does not compute the confidence interval bounds and the ConfidenceIntervals property contains NaNs instead.
Data Types: double
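As a sanity check on the example output later on this page, the bounds reported there for a p-value of 0.001 (2.5317e-05 and 0.0055589) are numerically consistent with an exact binomial (Clopper-Pearson) interval for 1 exceedance in 1000 permutations. This page does not state which interval method detectdrift uses or how many permutations it ran, so both the method and the counts below are assumptions. Treat this as a sketch for building intuition, not as a description of the implementation.

```matlab
% Assumed counts: 1 exceedance out of 1000 permutations (gives p = 0.001).
x = 1;
n = 1000;
alpha = 0.05;                                  % 95% confidence level

% Clopper-Pearson bounds, written with the inverse regularized incomplete
% beta function from base MATLAB (no toolbox call is needed here).
lowerBound = betaincinv(alpha/2, x, n - x + 1);
upperBound = betaincinv(1 - alpha/2, x + 1, n - x);

fprintf('p = %.3f, CI = [%.5g, %.5g]\n', x/n, lowerBound, upperBound)
```

If your own ConfidenceIntervals values do not match a computation like this, that only means the assumed counts or method differ from what detectdrift actually did.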
DriftStatus
— Drift status for each variable
string array
This property is read-only.
Drift status for each variable, specified as a string array with these possible values:
Drift Status | Condition |
---|---|
Drift | Upper < DriftThreshold |
Warning | DriftThreshold < Lower < WarningThreshold or DriftThreshold < Upper < WarningThreshold |
Stable | Lower > WarningThreshold |
Lower and Upper are the lower and upper confidence interval bounds for an estimated p-value.
Data Types: string
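The conditions in the table can be written out as a few logical comparisons. The sketch below applies them to three confidence intervals copied from the example summary on this page (x1, x12, and x15). The threshold values 0.05 and 0.1 are assumptions chosen for illustration, and the variable names are hypothetical; in practice, read the thresholds from the DriftThreshold and WarningThreshold properties.

```matlab
driftThreshold = 0.05;     % assumed value for illustration
warningThreshold = 0.1;    % assumed value for illustration

% Confidence interval bounds for x1, x12, and x15 from the example summary
lower = [2.5317e-05 0.22247 0.076629];
upper = [0.0055589  0.27702 0.1138];

% Entries stay "" when none of the three tabulated conditions holds
status = strings(1,numel(lower));
status(upper < driftThreshold) = "Drift";
isWarning = (lower > driftThreshold & lower < warningThreshold) | ...
            (upper > driftThreshold & upper < warningThreshold);
status(isWarning) = "Warning";
status(lower > warningThreshold) = "Stable";

disp(status)
```

With these inputs, the statuses agree with the DriftStatus column that the example reports for the same three variables.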
DriftThreshold
— Threshold to determine drift status
scalar value from 0 to 1
This property is read-only.
Threshold to determine drift status, specified as a scalar value from 0 to 1. If the upper bound of the confidence interval for the estimated p-value is below DriftThreshold, then the drift status is 'Drift'.
Data Types: double
Metrics
— List of metrics
string array
This property is read-only.
List of metrics that detectdrift uses to quantify the difference between baseline and target data for each variable during permutation testing, specified as a string array.
Data Types: string
MetricValues
— Metric values for each variable
row vector
This property is read-only.
Metric values for the corresponding variables, specified as a row vector with the number of columns equal to the number of variables specified for drift detection. The metric corresponding to each variable is stored in the Metrics property.
Data Types: double
MultipleTestCorrection
— Multiple hypothesis testing correction
'Bonferroni' | 'FalseDiscoveryRate'
This property is read-only.
Multiple hypothesis testing correction, specified as either 'Bonferroni' or 'FalseDiscoveryRate'.
If you set 'EstimatePValues' to false in the call to detectdrift, then the function ignores the MultipleTestCorrection name-value argument.
Data Types: string
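For intuition, the textbook Bonferroni correction multiplies each p-value by the number of tests and caps the result at 1. The sketch below applies that rule to several p-values from the example summary on this page. It is a generic illustration only and makes no claim about how detectdrift combines the corrected values into MultipleTestDriftStatus.

```matlab
% p-values for x1, x8, x9, x11, x12, x14, and x15 from the example summary
p = [0.001 0.863 0.726 0.496 0.249 0.574 0.094];

k = numel(p);                 % number of simultaneous tests in this sketch
pAdjusted = min(1, k*p);      % Bonferroni-adjusted p-values, capped at 1

disp(pAdjusted)
```

The adjustment grows with the number of variables tested, which is why a corrected overall status is more conservative than any single per-variable status.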
MultipleTestDriftStatus
— Drift status for overall data
'Drift' | 'Warning' | 'Stable'
This property is read-only.
Drift status for the overall data, which detectdrift estimates using the multiple test correction method in MultipleTestCorrection. Multiple test corrections provide a conservative estimate of drift status when the testing is done for multiple variables.
If you set 'EstimatePValues' to false in the call to detectdrift, then the function does not populate MultipleTestDriftStatus.
Data Types: string
NumPermutations
— Number of permutation tests performed for each variable
array of integer values
This property is read-only.
Number of permutation tests that detectdrift performs for each variable to determine the drift status for that variable, specified as an array of integer values.
If you set 'EstimatePValues' to false in the call to detectdrift, then NumPermutations is a row vector of ones, corresponding to the baseline and target data as you provide them, and the metric values are the initial computations using baseline and target data for each variable.
Data Types: double
PermutationResults
— Permutation testing results for each variable
table
This property is read-only.
Permutation testing results for each variable, specified as a k-by-1 table, where k is the number of variables. Each row corresponds to one variable and holds a 1-by-1 cell array containing the metric values in a vector of size equal to the number of permutations for that variable. To access the metric values for the second variable, for example, use DDiagnostics.PermutationResults{2,1}{1,1}.
If you set 'EstimatePValues' to false in the call to detectdrift, then PermutationResults holds only the initial metric values for each variable.
You can visualize the test results using plotPermutationResults.
Data Types: table
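The indexing pattern described above can be tried on a small synthetic data set. The sketch below assumes Statistics and Machine Learning Toolbox is installed; the data and the variable name permMetrics are made up for illustration.

```matlab
rng("default")                                  % for reproducibility
baseline = randn(200,2);                        % synthetic data, two variables
target = [randn(200,1), randn(200,1) + 0.5];    % second variable is shifted

DDiagnostics = detectdrift(baseline,target);

% Metric values from permutation testing for the second variable:
% {2,1} selects the table cell, {1,1} unwraps the stored vector.
permMetrics = DDiagnostics.PermutationResults{2,1}{1,1};

fprintf("Stored metric values: %d\n", numel(permMetrics))
fprintf("NumPermutations(2):   %d\n", DDiagnostics.NumPermutations(2))
```

Printing both counts side by side is a quick way to confirm the relationship between PermutationResults and NumPermutations on your release.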
PValues
— Estimated p-value for each variable
vector of scalar values from 0 to 1
This property is read-only.
Estimated p-value for each variable, specified as a vector of scalar values from 0 to 1.
If you set 'EstimatePValues' to false in the call to detectdrift, then PValues is a vector of NaNs.
Data Types: double
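To make the idea of a permutation-based p-value concrete, here is a minimal generic sketch in base MATLAB. It pools the two samples, reshuffles them repeatedly, and counts how often the reshuffled metric is at least as large as the initial one. The metric (absolute difference of means), the number of permutations, and the simple proportion estimator are all assumptions for illustration; this page does not specify the estimator detectdrift uses, and its default metrics differ.

```matlab
rng(0)                                         % fixed seed for repeatability
baseline = randn(100,1);                       % synthetic baseline sample
target = randn(100,1) + 1;                     % synthetic target, shifted mean

metricFcn = @(a,b) abs(mean(a) - mean(b));     % stand-in metric (assumption)
initialMetric = metricFcn(baseline,target);    % metric on the data as given

numPerm = 500;                                 % assumed number of permutations
pooled = [baseline; target];
nB = numel(baseline);
permMetrics = zeros(numPerm,1);
for i = 1:numPerm
    idx = randperm(numel(pooled));             % shuffle pooled observations
    permMetrics(i) = metricFcn(pooled(idx(1:nB)), pooled(idx(nB+1:end)));
end

% Proportion of permuted metrics at least as large as the initial metric
pEstimate = mean(permMetrics >= initialMetric);
fprintf("Estimated p-value: %.3f\n", pEstimate)
```

A small estimate means that the observed difference would rarely arise if both samples came from the same distribution, which is the reasoning behind a 'Drift' decision.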
Target
— Target data set
numeric array | categorical array | table
This property is read-only.
Target data set, specified as a numeric array, categorical array, or a table.
Data Types: single | double | categorical | table
VariableNames
— Variables specified for drift detection
string array
This property is read-only.
Variables specified for drift detection in the call to detectdrift, specified as a string array.
Data Types: string
WarningThreshold
— Threshold to determine warning status
scalar value from 0 to 1
This property is read-only.
Threshold to determine warning versus drift status, specified as a scalar value from 0 to 1.
Data Types: double
Object Functions
ecdf | Compute empirical cumulative distribution function (ecdf) for baseline and target data specified for drift detection |
histcounts | Compute histogram bin counts for specified variables in baseline and target data for drift detection |
plotDriftStatus | Visualize p-values and confidence intervals |
plotEmpiricalCDF | Visualize empirical cumulative distribution function (ecdf) of a variable specified for drift detection |
plotHistogram | Visualize histogram for a variable in drift detection |
plotPermutationResults | Plot histogram of permutation results for a variable |
summary | Summary table for DriftDiagnostics object |
Examples
Test and Examine Drift Status
Load the sample data.
load humanactivity
For details on the data set, enter Description at the command line.
Assign the first 250 observations as baseline data and the next 250 as target data for variables 1 to 15.
baseline = feat(1:250,1:15);
target = feat(251:500,1:15);
Test for drift on all variables.
DDiagnostics = detectdrift(baseline,target);
Display a summary of the test results.
summary(DDiagnostics)
Multiple Test Correction Drift Status: Drift

       DriftStatus    PValue      ConfidenceInterval
       ___________    ______    ________________________
x1     "Drift"        0.001     2.5317e-05    0.0055589
x2     "Drift"        0.001     2.5317e-05    0.0055589
x3     "Drift"        0.001     2.5317e-05    0.0055589
x4     "Drift"        0.001     2.5317e-05    0.0055589
x5     "Drift"        0.001     2.5317e-05    0.0055589
x6     "Drift"        0.001     2.5317e-05    0.0055589
x7     "Drift"        0.001     2.5317e-05    0.0055589
x8     "Stable"       0.863     0.84012       0.88372
x9     "Stable"       0.726     0.69722       0.75344
x10    "Drift"        0.001     2.5317e-05    0.0055589
x11    "Stable"       0.496     0.46456       0.52746
x12    "Stable"       0.249     0.22247       0.27702
x13    "Drift"        0.001     2.5317e-05    0.0055589
x14    "Stable"       0.574     0.54267       0.60489
x15    "Warning"      0.094     0.076629      0.1138
The summary table shows the drift status and the estimated p-value for each variable tested for drift. It also lists the 95% confidence interval bounds for the p-values.
Plot the drift status for variables x10 to x15.
plotDriftStatus(DDiagnostics,Variables=(10:15))
Compute the ecdf values for variables x13 and x15.
E = ecdf(DDiagnostics,Variables=["x13","x15"])
E=2×3 table
x F_Baseline F_Target
______________ ______________ ______________
x13 {501x1 double} {501x1 double} {501x1 double}
x15 {501x1 double} {501x1 double} {501x1 double}
x contains the common domain over which ecdf computes the empirical cumulative distribution function for baseline and target data of a variable. Access the common domain for x13.
E.x{1}
ans = 501×1
0.0420
0.0420
0.0423
0.0424
0.0424
0.0425
0.0425
0.0426
0.0426
0.0426
⋮
Access the ecdf values for x15 in the baseline data.
E.F_Baseline{2}
ans = 501×1
0
0
0.0040
0.0080
0.0080
0.0080
0.0080
0.0080
0.0120
0.0120
⋮
Plot the ecdf values for variables x13 and x15.
tiledlayout(1,2)
ax1 = nexttile;
plotEmpiricalCDF(DDiagnostics,ax1,Variable="x13")
ax2 = nexttile;
plotEmpiricalCDF(DDiagnostics,ax2,Variable="x15")
You can also visualize the permutation test results for a variable. Plot permutation results for variable x13.
figure
plotPermutationResults(DDiagnostics,Variable="x13")
The plot also shows the metric threshold value with a straight line. Based on the histogram of metric values obtained during permutation testing, the probability of a metric value exceeding the threshold value would be very small if the baseline and target data for variable x13 had the same distribution. The plot also displays the estimated p-value, 0.001, and the drift status decision, Drift, right below the plot title.
Compute Metrics without Estimating p-Values
Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for target data.
rng('default') % For reproducibility
baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)];
target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
Compute the initial metrics for all variables between the baseline and target data without estimating p-values.
DDiagnostics = detectdrift(baseline,target,EstimatePValues=false)
DDiagnostics =
  DriftDiagnostics

           VariableNames: ["x1" "x2" "x3"]
    CategoricalVariables: []
                 Metrics: ["Wasserstein" "Wasserstein" "Wasserstein"]
            MetricValues: [0.2022 0.3468 0.0559]

  Properties, Methods
detectdrift computes only the initial metric value for each variable using the baseline and target data. The properties associated with permutation testing and p-value estimation are either empty or contain NaNs.
summary(DDiagnostics)
          MetricValue       Metric
          ___________    _____________
    x1      0.20215      "Wasserstein"
    x2      0.34676      "Wasserstein"
    x3     0.055922      "Wasserstein"
The summary function displays only the metrics used and the initial metric value for each of the specified variables.
plotDriftStatus and plotPermutationResults do not produce plots and return warning messages. plotEmpiricalCDF and plotHistogram plot the ecdf and the histogram, respectively, for the first variable by default. They both return NaN for the p-value and drift status associated with the variable.
plotEmpiricalCDF(DDiagnostics)
plotHistogram(DDiagnostics)
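You can confirm the behavior described in this example by inspecting the properties directly. The sketch below uses synthetic data and assumes Statistics and Machine Learning Toolbox is installed; the object name DD is arbitrary.

```matlab
rng("default")                         % for reproducibility
baseline = randn(100,3);               % synthetic baseline, three variables
target = randn(100,3) + 0.2;           % synthetic target with a small shift

% Skip permutation testing and compute only the initial metrics
DD = detectdrift(baseline,target,EstimatePValues=false);

disp(all(isnan(DD.PValues)))           % p-values are not estimated
disp(DD.NumPermutations)               % documented above as a row vector of ones
disp(DD.MetricValues)                  % one initial metric value per variable
```

Checking PValues with isnan before plotting or reporting is a simple guard for code that may receive objects created with EstimatePValues=false.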
See Also
detectdrift | ecdf | histcounts | plotDriftStatus | plotEmpiricalCDF | plotHistogram | plotPermutationResults | summary