Empirical cumulative distribution function (cdf) plot
cdfplot(x) creates an empirical cumulative distribution function (cdf) plot for the data in x. For a value t in x, the empirical cdf F(t) is the proportion of the values in x that are less than or equal to t.
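The definition above can be checked by hand. The following is a minimal sketch that uses a small, hypothetical data vector (not taken from this page) and only base MATLAB functions. It computes F(t) directly as a proportion rather than by calling cdfplot.

```matlab
% Hypothetical sample data, chosen only for illustration
x = [3 1 4 1 5 9 2 6];

% Empirical cdf at a single value t: proportion of values in x <= t
t = 4;
F_t = mean(x <= t);      % 5 of the 8 values are <= 4, so F_t is 0.625

% Empirical cdf evaluated at every distinct value in x
xs = unique(x);                          % sorted distinct values
F = arrayfun(@(v) mean(x <= v), xs);     % nondecreasing, ends at 1
```

The vector F contains the heights of the steps that an empirical cdf plot of this data would show at the values in xs.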
h = cdfplot(x) returns a handle of the empirical cdf plot line object. Use h to query or modify properties of the object after you create it. For a list of properties, see Line Properties.
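As a brief illustration of working with the returned handle, the sketch below draws the plot and then changes and reads back line properties with set and get. The sample data and the chosen property values are hypothetical.

```matlab
rng('default')           % For reproducibility
x = randn(50,1);         % Hypothetical sample; any numeric vector works

h = cdfplot(x);          % h is the handle of the plotted line

% Modify properties of the line after it has been created
set(h,'LineWidth',2,'LineStyle','--')

% Query a property of the line
c = get(h,'Color');      % 1-by-3 RGB triplet
```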
Compare Empirical cdf to Theoretical cdf
Plot the empirical cdf of a sample data set and compare it to the theoretical cdf of the underlying distribution of the sample data set. In practice, a theoretical cdf can be unknown.
Generate a random sample data set from the extreme value distribution with a location parameter of 0 and a scale parameter of 3.
rng('default') % For reproducibility
y = evrnd(0,3,100,1);
Plot the empirical cdf of the sample data set and the theoretical cdf on the same figure.
cdfplot(y)
hold on
x = linspace(min(y),max(y));
plot(x,evcdf(x,0,3))
legend('Empirical CDF','Theoretical CDF','Location','best')
hold off
The plot shows the similarity between the empirical cdf and the theoretical cdf.
Alternatively, you can use the ecdf function. The ecdf function also plots the 95% confidence intervals estimated by using Greenwood's formula. For details, see Algorithms.
ecdf(y,'Bounds','on')
hold on
plot(x,evcdf(x,0,3))
grid on
title('Empirical CDF')
legend('Empirical CDF','Lower Confidence Bound','Upper Confidence Bound','Theoretical CDF','Location','best')
hold off
Test for Standard Normal Distribution
Perform the one-sample Kolmogorov-Smirnov test by using kstest. Confirm the test decision by visually comparing the empirical cumulative distribution function (cdf) to the standard normal cdf.
Load the examgrades data set. Create a vector containing the first column of the exam grade data.
load examgrades
test1 = grades(:,1);
Test the null hypothesis that the data comes from a normal distribution with a mean of 75 and a standard deviation of 10. Use these parameters to center and scale each element of the data vector, because kstest tests for a standard normal distribution by default.
x = (test1-75)/10;
h = kstest(x)
h = logical
   0
The returned value of h = 0 indicates that kstest fails to reject the null hypothesis at the default 5% significance level.
Plot the empirical cdf and the standard normal cdf for a visual comparison.
cdfplot(x)
hold on
x_values = linspace(min(x),max(x));
plot(x_values,normcdf(x_values,0,1),'r-')
legend('Empirical CDF','Standard Normal CDF','Location','best')
The figure shows the similarity between the empirical cdf of the centered and scaled data vector and the cdf of the standard normal distribution.
x — Input data
Input data, specified as a numeric vector.
h — Handle of plot line object
chart line object
Handle of the empirical cdf plot line object, returned as a chart line object. Use h to query or modify properties of the object after you create it. For a list of properties, see Line Properties.
stats — Summary statistics
Summary statistics for the data in x, returned as a structure with the following fields:
Field     Description
min       Minimum value
max       Maximum value
mean      Sample mean
median    Sample median (50th percentile)
std       Sample standard deviation
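The sketch below shows one way to request the summary statistics together with the plot handle and to compare two of them with values computed directly from the data. The sample data is hypothetical, and the field names median and std are assumed to be the names of the sample median and sample standard deviation fields described above.

```matlab
rng('default')               % For reproducibility
x = randn(100,1);            % Hypothetical sample data

% Request the plot handle and the summary statistics structure
[h,stats] = cdfplot(x);

% Display all fields of the structure
disp(stats)

% Compare with statistics computed directly from the data
% (field names median and std are assumptions; see the lead-in)
medianDiff = abs(stats.median - median(x));
stdDiff    = abs(stats.std - std(x));
```

Both differences are expected to be zero or within floating-point rounding error.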
cdfplot is useful for examining the distribution of a sample data set. You can overlay a theoretical cdf on the plot created by cdfplot to compare the empirical distribution of the sample to the theoretical distribution. For an example, see Compare Empirical cdf to Theoretical cdf.
The kstest, kstest2, and lillietest functions compute test statistics derived from an empirical cdf. cdfplot is useful in helping you to understand the output from these functions. For an example, see Test for Standard Normal Distribution.
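To make the link between a test statistic and the empirical cdf concrete, the following sketch computes the one-sample Kolmogorov-Smirnov statistic by hand as the largest vertical distance between the empirical cdf of a sample and the standard normal cdf, and compares it with the third output of kstest. The sample data is hypothetical, and the hand computation is an illustration of the statistic, not a description of how kstest is implemented internally.

```matlab
rng('default')                 % For reproducibility
x = randn(50,1);               % Hypothetical sample (column vector)

% Statistic reported by kstest (third output argument)
[hTest,p,ksstat] = kstest(x);

% Hand computation: largest gap between the empirical cdf steps
% and the standard normal cdf, checked on both sides of each step
xs  = sort(x);
n   = numel(xs);
Fth = normcdf(xs);                      % Theoretical cdf at the data
dPlus  = max((1:n)'/n - Fth);           % Empirical cdf above theoretical
dMinus = max(Fth - (0:n-1)'/n);         % Empirical cdf below theoretical
D = max(dPlus,dMinus);

gap = abs(D - ksstat);                  % Expected to be at rounding level
```

On a cdfplot of x with the standard normal cdf overlaid, D corresponds to the largest vertical separation between the two curves.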
You can use the ecdf function to find the empirical cdf values and create an empirical cdf plot. The ecdf function enables you to indicate censored data and compute the confidence bounds for the estimated cdf values.
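A minimal sketch of this alternative follows. It uses a hypothetical sample and a hypothetical censoring indicator (the first ten observations are arbitrarily marked as censored), requests the cdf values and confidence bounds from ecdf, and plots them as stairs.

```matlab
rng('default')                     % For reproducibility
y = evrnd(0,3,100,1);              % Hypothetical sample data

% Hypothetical censoring indicator: true marks a censored observation
cens = false(size(y));
cens(1:10) = true;

% Empirical cdf values f at points xv, with lower and upper bounds
[f,xv,flo,fup] = ecdf(y,'Censoring',cens);

% Plot the estimate and its confidence bounds as step functions
stairs(xv,f)
hold on
stairs(xv,flo,'r:')
stairs(xv,fup,'r:')
legend('Empirical CDF','Lower Confidence Bound','Upper Confidence Bound','Location','best')
hold off
```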
Introduced before R2006a