## Gaussian Mixture Models

A Gaussian mixture model (GMM) is composed of k multivariate normal density components, where k is a positive integer. Each component has a d-dimensional mean (d is a positive integer), a d-by-d covariance matrix, and a mixing proportion. Mixing proportion j is the fraction of the population that component j accounts for, j = 1,...,k; the mixing proportions are nonnegative and sum to 1.
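In symbols, the description above corresponds to the following density. The notation ($w_j$ for mixing proportion j, $\mu_j$ for the component mean, $\Sigma_j$ for the component covariance matrix) is an assumption made here for illustration and does not appear on the original page.

$$
p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad w_j \ge 0,
\qquad \sum_{j=1}^{k} w_j = 1,
$$

$$
\mathcal{N}(x \mid \mu_j, \Sigma_j)
= \frac{1}{\sqrt{(2\pi)^{d} \det \Sigma_j}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j) \right),
\qquad x \in \mathbb{R}^{d}.
$$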

You can create a Gaussian mixture distribution object (a `gmdistribution` object) using either `gmdistribution` or `fitgmdist`. Use `gmdistribution` to create a fully specified GMM object by specifying the component means, covariances, and mixing proportions. Use `fitgmdist` to fit a GMM object to an n-by-d data matrix X by specifying the number of mixture components k. The rows of X correspond to observations (examples), and the columns correspond to predictors (features or attributes). By default, `fitgmdist` fits a full covariance matrix for each component, and the matrices differ among components (that is, they are unshared).
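To make the idea of a fully specified mixture concrete, the following is a minimal Python sketch for the univariate case (d = 1), using only the standard library. It is not the `gmdistribution` object: the class name `Mixture1D`, its fields, and its methods are hypothetical, and variances stand in for covariance matrices because d = 1.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mixture1D:
    """Hypothetical fully specified univariate Gaussian mixture (d = 1)."""
    means: tuple
    variances: tuple
    proportions: tuple

    def __post_init__(self):
        # Every component needs a mean, a variance, and a mixing proportion.
        k = len(self.means)
        if k == 0 or len(self.variances) != k or len(self.proportions) != k:
            raise ValueError("need one mean, variance, and proportion per component")
        if any(v <= 0 for v in self.variances):
            raise ValueError("variances must be positive")
        if any(p < 0 for p in self.proportions) or abs(sum(self.proportions) - 1.0) > 1e-9:
            raise ValueError("proportions must be nonnegative and sum to 1")

    def pdf(self, x):
        """Mixture density at x: proportion-weighted sum of normal densities."""
        return sum(
            p * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for m, v, p in zip(self.means, self.variances, self.proportions)
        )

    def sample(self, n, rng):
        """Draw n values: pick a component by proportion, then draw from it."""
        k = len(self.means)
        draws = []
        for _ in range(n):
            j = rng.choices(range(k), weights=self.proportions)[0]
            draws.append(rng.gauss(self.means[j], math.sqrt(self.variances[j])))
        return draws


# Illustrative parameter values, chosen only for this sketch.
gm = Mixture1D(means=(0.0, 5.0), variances=(1.0, 0.25), proportions=(0.7, 0.3))
data = gm.sample(500, random.Random(0))  # n = 500 observations, d = 1
```

Specifying every parameter up front, as here, is the counterpart of creating the object directly; fitting instead starts from data and only the number of components k.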

`fitgmdist` fits GMMs to data using the iterative Expectation-Maximization (EM) algorithm. Starting from initial values for the component means, covariance matrices, and mixing proportions, the EM algorithm alternates between these two steps:

1. For each observation, the algorithm computes posterior probabilities of component memberships. You can think of the result as an n-by-k matrix, where element (i,j) contains the posterior probability that observation i is from component j. This is the E-step of the EM algorithm.

2. Using the component-membership posterior probabilities as weights, the algorithm estimates the component means, covariance matrices, and mixing proportions by applying maximum likelihood. This is the M-step of the EM algorithm.
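The two steps can be sketched in code. The following is a minimal Python sketch for univariate data (d = 1), using only the standard library. It is not the `fitgmdist` implementation: the function names, the `min_var` safeguard, the log-likelihood stopping rule, and the starting values are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, means, variances, props):
    """E-step: return the n-by-k posterior matrix and the log-likelihood.

    Assumes no density underflows to zero for every component at once.
    """
    post, loglik = [], 0.0
    for x in data:
        joint = [p * normal_pdf(x, m, v) for m, v, p in zip(means, variances, props)]
        total = sum(joint)
        post.append([w / total for w in joint])  # row i: P(component j | x_i)
        loglik += math.log(total)
    return post, loglik


def m_step(data, post, min_var=1e-6):
    """M-step: posterior-weighted maximum likelihood estimates.

    min_var is an assumed safeguard against collapsing variances.
    """
    n, k = len(data), len(post[0])
    means, variances, props = [], [], []
    for j in range(k):
        nj = sum(row[j] for row in post)  # effective count for component j
        mean = sum(row[j] * x for row, x in zip(post, data)) / nj
        var = sum(row[j] * (x - mean) ** 2 for row, x in zip(post, data)) / nj
        means.append(mean)
        variances.append(max(var, min_var))
        props.append(nj / n)
    return means, variances, props


def fit_em(data, means, variances, props, tol=1e-8, max_iter=500):
    """Alternate E and M steps until the log-likelihood gain drops below tol."""
    prev = -math.inf
    for iteration in range(1, max_iter + 1):
        post, loglik = e_step(data, means, variances, props)
        if loglik - prev < tol:
            break
        means, variances, props = m_step(data, post)
        prev = loglik
    return means, variances, props, loglik, iteration


# Synthetic data from two normal components (illustrative values only).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 0.5) for _ in range(200)]
means, variances, props, loglik, n_iter = fit_em(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
```

In the multivariate case the variances become d-by-d covariance matrices and the density evaluation needs a determinant and a linear solve, but the structure of the two steps is the same.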

The algorithm repeats these two steps until convergence. The likelihood surface is complex, so the algorithm might converge to a local optimum rather than the global one, and which local optimum it reaches can depend on the initial conditions. `fitgmdist` has several options for choosing initial conditions, including random component assignments for the observations and the k-means++ algorithm.
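As an illustration of one of the initialization strategies named above, here is a hedged Python sketch of k-means++ seeding for univariate data. The function name is hypothetical, and the sketch shows the general seeding idea only; it is not how `fitgmdist` implements its option.

```python
import random


def kmeans_pp_centers(data, k, rng):
    """Pick k initial centers from data by k-means++ seeding (d = 1)."""
    # First center: chosen uniformly at random from the data.
    centers = [rng.choice(data)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((x - c) ** 2 for c in centers) for x in data]
        if sum(d2) == 0:
            raise ValueError("fewer than k distinct data values")
        # Next center: sampled with probability proportional to d2, so
        # points far from the existing centers are favored.
        centers.append(rng.choices(data, weights=d2)[0])
    return centers


# Illustrative data with three visible groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0]
seeds = kmeans_pp_centers(points, 3, random.Random(0))
```

The selected seeds could serve as starting component means for EM. Because the seeding is random, different seeds can lead EM to different local optima, which is one reason to compare several runs.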

`fitgmdist` returns a fitted `gmdistribution` model object. The object contains properties that store the estimation results, which include the estimated parameters, convergence information, and the Akaike and Bayesian information criteria (AIC and BIC). You can use dot notation to access the properties.
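For orientation, the two information criteria follow the standard definitions AIC = 2·NLL + 2m and BIC = 2·NLL + m·ln(n), where NLL is the negative log-likelihood, m is the number of free parameters, and n is the number of observations. The Python sketch below applies them under the assumption of full, unshared covariance matrices; the function names are hypothetical, the parameter count is the usual one for that structure and is not taken from the original page, and the negative log-likelihood passed in is a placeholder input rather than a result from any real fit.

```python
import math


def n_free_params(k, d):
    """Free parameters for k components with full, unshared covariances.

    (k - 1) mixing proportions, k * d mean entries, and
    d * (d + 1) / 2 distinct covariance entries per component.
    """
    return (k - 1) + k * d + k * d * (d + 1) // 2


def information_criteria(neg_loglik, n_obs, k, d):
    """Return (AIC, BIC); smaller values indicate a preferred model."""
    m = n_free_params(k, d)
    aic = 2.0 * neg_loglik + 2.0 * m
    bic = 2.0 * neg_loglik + m * math.log(n_obs)
    return aic, bic


# Placeholder inputs for illustration only.
aic, bic = information_criteria(neg_loglik=1234.5, n_obs=500, k=2, d=2)
```

Comparing these values across fits with different k is a common way to choose the number of components, since both criteria penalize the extra parameters that a larger k introduces.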

Once you have a fitted GMM, you can use it to cluster query data. Clustering with a GMM is sometimes considered a soft clustering method: rather than assigning each point to exactly one cluster, the model gives each data point a posterior probability of belonging to each cluster. For more information on clustering with GMM, see Cluster Using Gaussian Mixture Models.
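The difference between soft and hard assignment can be sketched as follows. This is an illustrative Python example for univariate data with made-up parameter values; the function names are hypothetical and do not correspond to the methods of the `gmdistribution` object.

```python
import math


def posteriors(x, means, variances, props):
    """Soft assignment: posterior probability of each component given x."""
    joint = [
        p * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for m, v, p in zip(means, variances, props)
    ]
    total = sum(joint)
    return [w / total for w in joint]


def hard_label(post):
    """Hard assignment: index of the component with the largest posterior."""
    return max(range(len(post)), key=post.__getitem__)


# Illustrative parameters standing in for a fitted two-component model.
means, variances, props = [0.0, 5.0], [1.0, 0.25], [0.7, 0.3]

for x in (0.2, 3.4, 5.1):
    post = posteriors(x, means, variances, props)
    print(x, [round(p, 3) for p in post], "-> cluster", hard_label(post))
```

Points near a component mean receive a posterior close to 1 for that component, while points between the components receive intermediate values; the hard label discards that uncertainty.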
