Bootstrap sampling


bootstat = bootstrp(nboot,bootfun,d1,...)
[bootstat,bootsam] = bootstrp(...)
bootstat = bootstrp(...,'Name',Value)


bootstat = bootstrp(nboot,bootfun,d1,...) draws nboot bootstrap data samples, computes statistics on each sample using bootfun, and returns the results in the matrix bootstat. nboot must be a positive integer. bootfun is a function handle specified with @. Each row of bootstat contains the results of applying bootfun to one bootstrap sample. If bootfun returns a matrix or array, then this output is converted to a row vector for storage in bootstat.
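As a minimal sketch of the row-vector conversion described above, the following assumes a hypothetical data vector y and a hypothetical bootfun that returns a 2-by-2 matrix. Neither comes from this page's examples.

```matlab
rng default                          % for reproducibility
y = exprnd(5,100,1);                 % hypothetical data: 100 exponential draws

% Hypothetical bootfun returning a 2-by-2 matrix of summary statistics
bootfun = @(x) [min(x) max(x); mean(x) std(x)];

bootstat = bootstrp(50,bootfun,y);   % 50 bootstrap samples
size(bootstat)                       % one row per sample, four columns
```

Each 2-by-2 result occupies one row of four elements in bootstat, so the output has nboot rows.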

The third and later input arguments (d1,...) are data (scalars, column vectors, or matrices) used to create inputs to bootfun. bootstrp creates each bootstrap sample by sampling with replacement from the rows of the non-scalar data arguments (these must have the same number of rows). Scalar data arguments are not resampled; bootstrp passes them to bootfun unchanged.
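The handling of scalar data can be illustrated with a short sketch. It assumes a hypothetical data vector y and uses trimmean, whose second input is a scalar trimming percentage. The variable names are illustrative only.

```matlab
rng default                          % for reproducibility
y = exprnd(5,100,1);                 % non-scalar data: rows are resampled
pct = 10;                            % scalar data: not resampled

% For each bootstrap sample ystar, bootstrp evaluates trimmean(ystar,pct)
m = bootstrp(200,@trimmean,y,pct);
size(m)                              % 200-by-1
```

Only y is resampled here. The value of pct is the same in every call to bootfun.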

[bootstat,bootsam] = bootstrp(...) returns an n-by-nboot matrix of bootstrap indices, bootsam. Each column in bootsam contains indices of the values that were drawn from the original data sets to constitute the corresponding bootstrap sample. For example, if d1,... each contain 16 values, and nboot = 4, then bootsam is a 16-by-4 matrix. The first column contains the indices of the 16 values drawn from d1,..., for the first of the four bootstrap samples, the second column contains the indices for the second of the four bootstrap samples, and so on. (The bootstrap indices are the same for all input data sets.) To get the output samples bootsam without applying a function, set bootfun to empty ([]).
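The 16-value case described above might be reproduced as follows. This is a sketch that assumes a hypothetical data vector d holding the integers 1 through 16, so that each drawn value equals its own index.

```matlab
rng default                          % for reproducibility
d = (1:16)';                         % hypothetical data: 16 values

% Empty bootfun: return only the bootstrap indices
[~,bootsam] = bootstrp(4,[],d);
size(bootsam)                        % 16-by-4, one column per bootstrap sample

% The first bootstrap sample is the data indexed by the first column
firstSample = d(bootsam(:,1));
```

The same columns of bootsam index every non-scalar data argument, so a single index matrix is enough to rebuild each sample.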

bootstat = bootstrp(...,'Name',Value) uses additional arguments specified by one or more Name,Value pair arguments. The name-value pairs must appear after the data arguments. The available name-value pairs:

  • 'Weights' — Observation weights. The weights value must be a vector of nonnegative numbers with at least one positive element. The number of elements in weights must be equal to the number of rows in non-scalar input arguments to bootstrp. To obtain one bootstrap replicate, bootstrp samples N out of N with replacement using these weights as multinomial sampling probabilities.

  • 'Options' — The value is a structure that contains options specifying whether to compute bootstrap iterations in parallel, and specifying how to use random numbers during the bootstrap sampling. Create the options structure with statset. Applicable statset parameters are:

    • 'UseParallel' — If true and if Parallel Computing Toolbox™ is installed, compute bootstrap iterations in parallel. If Parallel Computing Toolbox is not installed, computation occurs in serial mode. Default is false, meaning serial computation.

    • UseSubstreams — Set to true to compute in parallel in a reproducible fashion. Default is false. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'.

    • Streams — A RandStream object or cell array of such objects. If you do not specify Streams, bootstrp uses the default stream or streams. If you choose to specify Streams, use a single object except when both of the following are true:

      • UseParallel is true

      • UseSubstreams is false

      In that case, use a cell array the same size as the parallel pool.
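The name-value pairs listed above could be used as in the following sketch. The data vector y, the weight vector w, and the replicate counts are hypothetical choices made for illustration. The code assumes that observations with zero weight are never drawn, which follows from using the weights as sampling probabilities.

```matlab
rng default                          % for reproducibility
y = (1:10)';                         % hypothetical data

% 'Weights': the first five observations have probability zero
w = [zeros(5,1); 0.2*ones(5,1)];
[~,bootsam] = bootstrp(100,@mean,y,'Weights',w);
min(bootsam(:))                      % expected to be 6 or larger

% 'Options': reproducible computation using a stream that supports substreams.
% If Parallel Computing Toolbox is not installed, this runs in serial mode.
s = RandStream('mlfg6331_64');
opts = statset('UseParallel',true,'UseSubstreams',true,'Streams',s);
m = bootstrp(200,@mean,y,'Options',opts);
```

Because UseSubstreams is true, a single RandStream object is passed rather than a cell array of streams.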



This example shows how to compute a correlation coefficient standard error using bootstrap resampling of the sample data.

Load a data set containing the LSAT scores and law-school GPA for 15 students. These 15 data points are resampled to create 1000 different data sets, and the correlation between the two variables is computed for each data set.

load lawdata
rng default  % For reproducibility
[bootstat,bootsam] = bootstrp(1000,@corr,lsat,gpa);

Display the first 5 bootstrapped correlation coefficients.

bootstat(1:5,:)


Display the indices of the data selected for the first 5 bootstrap samples.

bootsam(:,1:5)
ans = 15×5

    13     3    11     8    12
    14     7     1     7     4
     2    14     5    10     8
    14    12     1    11    11
    10    15     2    12    14
     2    10    13     5    15
     5     1    11    11     9
     9    13     5    10     3
    15    15    15     3     3
    15    11     1     2     4
      ⋮


The histogram shows the variation of the correlation coefficient across all the bootstrap samples. The sample minimum is positive, indicating that the relationship between LSAT score and GPA is not accidental.
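A histogram like the one described above could be produced along the following lines. This is a sketch that repeats the earlier sampling step so that it runs on its own. The figure and histogram calls are an assumption about how the plot was made, since the plotting command does not appear in this text.

```matlab
load lawdata
rng default                          % for reproducibility
bootstat = bootstrp(1000,@corr,lsat,gpa);

figure
histogram(bootstat)                  % distribution of the bootstrapped correlations
xlabel('Correlation coefficient')
ylabel('Count')

minCorr = min(bootstat);             % the sample minimum referred to in the text
```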

Finally, compute a bootstrap standard error for the estimated correlation coefficient.

se = std(bootstat)
se = 0.1285

This example shows how to estimate the kernel density of bootstrapped means.

Compute a sample of 100 bootstrapped means of random samples taken from the vector y.

rng default;  % For reproducibility
y = exprnd(5,100,1);
m = bootstrp(100,@mean,y);

Plot an estimate of the density of these bootstrapped means.

[fi,xi] = ksdensity(m);  % Density estimate fi evaluated at the points xi
plot(xi,fi)

This example shows how to compute and plot the means and standard deviations of 100 bootstrapped samples from a data vector.

Compute a sample of 100 bootstrapped means and standard deviations of random samples taken from the vector y.

rng('default')  % For reproducibility
y = exprnd(5,100,1);
stats = bootstrp(100,@(x)[mean(x) std(x)],y);

Plot the bootstrap estimate pairs.
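One way to draw this plot is sketched below. It repeats the sampling step so that it runs on its own. The marker style and axis labels are illustrative assumptions, not taken from this page.

```matlab
rng('default')                       % for reproducibility
y = exprnd(5,100,1);
stats = bootstrp(100,@(x)[mean(x) std(x)],y);

% Column 1 holds the bootstrapped means, column 2 the standard deviations
plot(stats(:,1),stats(:,2),'o')
xlabel('Bootstrapped mean')
ylabel('Bootstrapped standard deviation')
```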


This example shows how to estimate the standard errors for a coefficient vector in a linear regression by bootstrapping the residuals.

Load the sample data.

load hald

Perform a linear regression and compute the residuals.

x = [ones(size(heat)),ingredients];
y = heat;
b = regress(y,x);
yfit = x*b;
resid = y - yfit;

Estimate the standard errors by bootstrapping the residuals.

se = std(bootstrp(...
se = 1×5

   56.1752    0.5940    0.5815    0.5989    0.5691
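The residual bootstrap above can be sketched end to end as follows. The number of replicates (nboot = 1000) and the form of the anonymous function are assumptions made for illustration, because the call shown above is truncated. The sketch is therefore not guaranteed to reproduce the printed values.

```matlab
load hald
x = [ones(size(heat)),ingredients];  % design matrix with an intercept column
y = heat;
b = regress(y,x);
yfit = x*b;
resid = y - yfit;

rng default                          % for reproducibility
nboot = 1000;                        % assumed number of bootstrap replicates

% Resample the residuals, add them to the fitted values, and refit
bootb = bootstrp(nboot,@(bootr) regress(yfit+bootr,x),resid);
se = std(bootb)                      % one standard error per coefficient
```

Each row of bootb holds the five refitted coefficients for one resampled residual vector, so std returns a 1-by-5 vector.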


Introduced before R2006a