# ecdf

Compute empirical cumulative distribution function (ecdf) for baseline and target data specified for drift detection

## Syntax

``E = ecdf(DDiagnostics)``
``E = ecdf(DDiagnostics,Variables=variables)``

## Description


`E = ecdf(DDiagnostics)` returns the table `E`, which stores the ecdf values for all the variables specified for drift detection in the call to `detectdrift`. `ecdf` returns `NaN` values for categorical variables.


`E = ecdf(DDiagnostics,Variables=variables)` returns the table `E` for the variables specified by `variables`.

## Examples


Generate baseline and target data with two variables, where the distribution parameters of the second variable change for target data.

```
rng('default') % For reproducibility
baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1)];
target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1)];
```

Perform permutation testing for any drift between the baseline and target data.

`DDiagnostics = detectdrift(baseline,target)`
```
DDiagnostics = 
  DriftDiagnostics

              VariableNames: ["x1"    "x2"]
       CategoricalVariables: []
                DriftStatus: ["Stable"    "Drift"]
                    PValues: [0.2850 0.0030]
        ConfidenceIntervals: [2x2 double]
    MultipleTestDriftStatus: "Drift"
             DriftThreshold: 0.0500
           WarningThreshold: 0.1000

  Properties, Methods
```

Compute the ecdf values for both variables.

`E = ecdf(DDiagnostics)`
```
E=2×3 table
                x             F_Baseline         F_Target   
          ______________    ______________    ______________

    x1    {201x1 double}    {201x1 double}    {201x1 double}
    x2    {201x1 double}    {201x1 double}    {201x1 double}
```

`E` is a table with two rows and three columns. The two rows correspond to the two variables. For each variable, `ecdf` computes the ecdf values over a common domain for baseline and target data. It stores the common domain for each variable in the column `x`, the ecdf values for baseline data in the column `F_Baseline`, and the ecdf values for target data in the column `F_Target`.
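To make the idea of a common domain concrete, here is a minimal sketch in base MATLAB that evaluates two empirical cdfs on one shared grid. The toy data values are hypothetical, and the grid construction (the sorted union of both samples) is an assumption made for illustration. It is not necessarily how `ecdf` builds its domain; the output above, for instance, holds 201 points for 200 observations.

```matlab
% Toy data (hypothetical values, chosen only for illustration)
baseline = [0.2; 1.5; 0.7; 2.1];
target   = [1.0; 2.5; 1.8; 3.0];

% Common domain: every observed value from either sample, sorted
x = sort([baseline; target]);

% Fraction of each sample that is <= each grid point.
% x is 8-by-1 and baseline' is 1-by-4, so the comparison expands to 8-by-4.
F_Baseline = mean(baseline' <= x, 2);
F_Target   = mean(target'   <= x, 2);

disp(table(x, F_Baseline, F_Target))
```

Because both curves share the same `x`, they can be compared point by point or drawn on the same axes without interpolation.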

Access the ecdf values for variable 2 in baseline data.

`E.F_Baseline{2}`
```
ans = 201×1

         0
    0.0100
    0.0100
    0.0200
    0.0300
    0.0400
    0.0500
    0.0600
    0.0700
    0.0800
      ⋮
```

Plot the empirical cumulative distribution function values of baseline and target data for variable `x2`.

```
stairs(E.x{2},E.F_Baseline{2},LineWidth=1.5)
hold on
stairs(E.x{2},E.F_Target{2},LineWidth=1.5)
title('ECDF for x2')
xlabel('x2')
ylabel('Empirical CDF')
legend('Baseline','Target',Location='east')
hold off
```

The plot of the ecdf values also shows the drift in the distribution of the target data for variable `x2`.
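The visual gap between the two curves can also be summarized numerically. The sketch below rebuilds the same data and computes, for each variable, the largest vertical distance between the baseline and target ecdf values, which works because both are stored on the same domain. The variable name `maxGap` is introduced here for illustration, and this informal summary is not necessarily the metric that `detectdrift` used to compute the p-values.

```matlab
rng('default') % For reproducibility
baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1)];
target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1)];
DDiagnostics = detectdrift(baseline,target);
E = ecdf(DDiagnostics);

% Largest vertical distance between the two ecdf curves, one value per row of E
maxGap = cellfun(@(fb,ft) max(abs(fb - ft)),E.F_Baseline,E.F_Target);
disp(maxGap)
```

A larger value in the second row than in the first would be consistent with the drift status reported for `x2`.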

Copyright 2021 The MathWorks, Inc.

Load the sample data.

`load humanactivity`

For details on the data set, enter `Description` at the command line.

Assign the first 1000 observations as baseline data and the next 1000 as target data.

```
baseline = feat(1:1000,:);
target = feat(1001:2000,:);
```

Test for drift on all variables.

`DDiagnostics = detectdrift(baseline,target);`

Compute the ecdf values for only the first five variables.

`E = ecdf(DDiagnostics,Variables=[1:5])`
```
E=5×3 table
                 x              F_Baseline          F_Target    
          _______________    _______________    _______________

    x1    {2001x1 double}    {2001x1 double}    {2001x1 double}
    x2    {2001x1 double}    {2001x1 double}    {2001x1 double}
    x3    {2001x1 double}    {2001x1 double}    {2001x1 double}
    x4    {2001x1 double}    {2001x1 double}    {2001x1 double}
    x5    {2001x1 double}    {2001x1 double}    {2001x1 double}
```

Access the ecdf values for the third variable in baseline data.

`E.F_Baseline{3}`
```
ans = 2001×1

         0
         0
         0
         0
         0
         0
    0.0010
    0.0020
    0.0030
    0.0040
      ⋮
```

Plot the empirical cumulative distribution function values of baseline and target data for variable `x3`.

```
stairs(E.x{3},E.F_Baseline{3},LineWidth=1.5)
hold on
stairs(E.x{3},E.F_Target{3},LineWidth=1.5)
title('ECDF for x3')
xlabel('x3')
ylabel('Empirical CDF')
legend('Baseline','Target',Location='southeast')
hold off
```

The ecdf plot shows the drift in the target data for variable `x3`.

## Input Arguments


`DDiagnostics`: Diagnostics of the permutation testing for drift detection, specified as a `DriftDiagnostics` object returned by `detectdrift`.

`Variables`: List of variables for which to compute the ecdf values, specified as a string array, a cell array of character vectors, or a list of integer indices.

Example: `Variables=["x1","x3"]`

Example: `Variables=[1,3]`

Data Types: `single` | `double` | `char` | `string`
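As a brief illustration of the two ways to specify `Variables`, the following sketch selects the same two variables once by name and once by index. The data is synthetic and the names `EByName` and `EByIndex` are introduced here for illustration. The sketch assumes the default variable names `x1`, `x2`, `x3` that `detectdrift` assigns to matrix input, as seen in the examples above.

```matlab
rng('default') % For reproducibility
baseline = normrnd(0,1,100,3);
target = normrnd(0,1,100,3);
DDiagnostics = detectdrift(baseline,target);

% Select the first and third variables by name, then by index
EByName  = ecdf(DDiagnostics,Variables=["x1","x3"]);
EByIndex = ecdf(DDiagnostics,Variables=[1,3]);

% Check whether both calls return the same table
disp(isequal(EByName,EByIndex))
```

Selecting by name tends to be more robust when the column order of the data might change, while indices are convenient for ranges such as `1:5`.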

## Output Arguments


`E`: ecdf values for all variables specified for drift detection in the call to `detectdrift`, or for only the variables specified by `Variables`, returned as a table with the following columns.

| Column name | Description |
| --- | --- |
| `x` | Common domain over which to evaluate the empirical cdf |
| `F_Baseline` | ecdf values for the baseline data |
| `F_Target` | ecdf values for the target data |

For each variable in `E`, the columns hold `x` and the ecdf values in cell arrays. To access the values, you can index into the table; for example, to obtain the ecdf values for the second variable in baseline data, use `E.F_Baseline{2,1}`.

## Version History

Introduced in R2022a