Representing Sampling Distributions Using Markov Chain Samplers

For probability distributions that are complex, or are not in the list of supported distributions in Random Number Generation, you might need more advanced methods for generating samples than the methods described in Common Pseudorandom Number Generation Methods. Such distributions arise, for example, in Bayesian data analysis and in the large combinatorial problems of Markov chain Monte Carlo (MCMC) simulations. An alternative is to construct a Markov chain with a stationary distribution equal to the target sampling distribution, using the states of the chain to generate random numbers after an initial burn-in period in which the state distribution converges to the target.

Using the Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm draws samples from a distribution that is only known up to a constant. Random numbers are generated from a distribution with a probability density function that is equal to or proportional to a proposal function.

To generate random numbers:

Assume an initial value x(t).
Draw a sample, y(t), from a proposal distribution q(y|x(t)).
Accept y(t) as the next sample x(t + 1) with probability r(x(t),y(t)), and keep x(t) as the next sample x(t + 1) with probability 1 – r(x(t),y(t)), where:

$r (x, y) = m i n {\frac{f (y)}{f (x)} \frac{q (x | y)}{q (y | x)}, 1}$
Increment t → t + 1, and repeat steps 2 and 3 until you get the desired number of samples.

Generate random numbers using the Metropolis-Hastings method with the mhsample function. To produce quality samples efficiently with the Metropolis-Hastings algorithm, it is crucial to select a good proposal distribution. If it is difficult to find an efficient proposal distribution, use slice sampling (slicesample) or Hamiltonian Monte Carlo (hmcSampler) instead.

Using Slice Sampling

In instances where it is difficult to find an efficient Metropolis-Hastings proposal distribution, the slice sampling algorithm does not require an explicit specification. The slice sampling algorithm draws samples from the region under the density function using a sequence of vertical and horizontal steps. First, it selects a height at random from 0 to the density function f (x). Then, it selects a new x value at random by sampling from the horizontal “slice” of the density above the selected height. A similar slice sampling algorithm is used for a multivariate distribution.

If a function f(x) proportional to the density function is given, then do the following to generate random numbers:

Assume an initial value x(t) within the domain of f(x).
Draw a real value y uniformly from (0, f(x(t))), thereby defining a horizontal “slice” as S = {x: y < f(x)}.
Find an interval I = (L, R) around x(t) that contains all, or much of the “slice” S.
Draw the new point x(t + 1) within this interval.
Increment t → t + 1 and repeat steps 2 through 4 until you get the desired number of samples.

Slice sampling can generate random numbers from a distribution with an arbitrary form of the density function, provided that an efficient numerical procedure is available to find the interval I = (L,R), which is the “slice” of the density.

Generate random numbers using the slice sampling method with the slicesample function.

Using Hamiltonian Monte Carlo

Metropolis-Hastings and slice sampling can produce MCMC chains that mix slowly and take a long time to converge to the stationary distribution, especially in medium-dimensional and high-dimensional problems. Use the gradient-based Hamiltonian Monte Carlo (HMC) sampler to speed up sampling in these situations.

To use HMC sampling, you must specify log f(x) (up to an additive constant) and its gradient. You can use a numerical gradient, but this leads to slower sampling. All sampling variables must be unconstrained, meaning that log f(x) and its gradient are well-defined for all real x. To sample constrained variables, transform these variables into unconstrained ones before using the HMC sampler.

The HMC sampling algorithm introduces a random “momentum vector” z and defines a joint density of z and the “position vector” x as P(x,z) = f(x)g(z). The goal is to sample from this joint distribution and then to ignore the values of z — the marginal distribution of x has the desired density f(x).

The HMC algorithm assigns a Gaussian density with covariance matrix M (the “mass matrix”) to z:

$g (z) \propto \exp (- \frac{1}{2} z^{T} M^{- 1} z)$

Then, it defines an “energy function” as

$E (x, z) = - \log f (x) + \frac{1}{2} z^{T} M^{- 1} z = U (x) + K (z)$

with U(x) = – log f(x) the “potential energy” and K(z) = z^TM^-1z/2 the “kinetic energy”. The joint density is given by P(x,z) ∝ exp{-E(x,z)}.

To generate random samples, the HMC algorithm:

Assumes an initial value x of the position vector.
Generates a sample of the momentum vector: z ∼ g(z).
Evolves the state (x, z) for some amount of fictitious time τ to a new state (x’,z’) using the “equations of motion”:

$\frac{d z}{d τ} = - \frac{\partial U}{\partial x}$

$\frac{d x}{d τ} = \frac{\partial K}{\partial z}$
If the equations of motion could be solved exactly, the energy (and hence the density) would remain constant: E(x,z) = E(x’,z’). In practice, the equations of motions must be solved numerically (usually using so-called leapfrog integration) and the energy is not conserved.
Accepts x’ as the next sample with probability p_acc = min(1, exp{E(x,z) – E(x’,z’)}), and keeps x as the next sample with probability 1 – p_acc.
Repeats steps 2 through 4 until it has generated the desired number of samples.

To use HMC sampling, create a sampler using the hmcSampler function. After creating a sampler, you can compute MAP (maximum-a-posteriori) point estimates, tune the sampler, draw samples, and check convergence diagnostics. For an example of this workflow, see Bayesian Linear Regression Using Hamiltonian Monte Carlo.