## Multiple Correspondence Analysis Based on the Burt Matrix.

version 1.3.0.0 (10.3 KB) by Antonio Trujillo-Ortiz

### Antonio Trujillo-Ortiz

multiple correspondence analysis, correspondence analysis, categorical analysis, graphical procedure

Updated 03 Jan 2009

Editor's Note: Popular File 2009

The statistical fundamentals of Correspondence Analysis (CA) are presented in the CORRAN and MCORRAN1 m-files available on this FEX author's page. CA can be extended to more than two categorical variables, in which case it is called Multiple Correspondence Analysis (MCA). CA and MCA are graphical techniques for representing the information in a two-way or higher-order multiway contingency table. Such tables contain the counts (frequencies) of items for a cross-classification of the categorical variables (Rencher, 2000).

Karl Pearson (1913) developed the antecedent of CA used by Procter & Gamble (Horst 1935). R.A. Fisher (1940) named the approach 'reciprocal averaging' because it reciprocally averages row and column percents in table data until they are reconciled. Since reciprocal averaging was inefficient, Europeans such as Mosaier (1946) and Benzecri (1969) related table data with computer programs for principal component (factor) analysis. Burt (1953) developed MCA (homogeneity analysis) of a binary indicator matrix.

Here, MCA is applied to the Burt matrix (B), the matrix of all two-way cross-tabulations of the categorical variables. The Burt matrix has a square block on the diagonal for each variable (a diagonal block holding the frequencies of the categories of that variable) and a rectangular off-diagonal block for each pair of variables (the two-way contingency table for that pair). The dual eigenanalysis, or Singular Value Decomposition (SVD), yields the squares of the singular values, that is, the principal inertias.
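The block structure described above can be illustrated with a minimal Python sketch (the m-file itself is MATLAB and expects the Burt matrix as input; this is not the author's code). The function name `burt_matrix` and the toy records are hypothetical. Each record switches on one indicator column per variable, and accumulating the pairwise co-occurrences is equivalent to forming Z'Z from the indicator matrix Z.

```python
def burt_matrix(records):
    """Return (labels, B) for a list of equal-length tuples of category labels.

    Hypothetical helper: B is the matrix of all two-way cross-tabulations.
    """
    n_vars = len(records[0])
    # Sorted categories of each variable, kept in variable order.
    levels = [sorted({rec[q] for rec in records}) for q in range(n_vars)]
    labels = [(q, lev) for q in range(n_vars) for lev in levels[q]]
    col = {lab: j for j, lab in enumerate(labels)}
    size = len(labels)
    B = [[0] * size for _ in range(size)]
    for rec in records:
        # Indicator columns that are "on" for this record (one per variable).
        on = [col[(q, rec[q])] for q in range(n_vars)]
        for a in on:
            for b in on:
                B[a][b] += 1
    return labels, B


# Toy data: three categorical variables observed on five items (made up).
records = [
    ("m", "young", "yes"),
    ("f", "young", "no"),
    ("f", "old", "yes"),
    ("m", "old", "yes"),
    ("f", "young", "yes"),
]
labels, B = burt_matrix(records)
for lab, row in zip(labels, B):
    print(lab, row)
```

The diagonal blocks come out diagonal (category frequencies), the off-diagonal blocks are the pairwise contingency tables, and the matrix is symmetric.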

The so-called 'percentage of inertia problem' can be improved by using the adjusted inertias procedure, or eigenvalue correction. The adjusted inertias are calculated only for each singular value that is >= 1/(number of variables). They are expressed as a percentage of the average off-diagonal inertia, which can be calculated either directly from the off-diagonal tables in the Burt matrix or from the total inertia of the Burt matrix. The adjusted solution not only considerably improves the measure of fit, but also removes the inconsistency that arises when analysing the Burt matrix. This inconsistency is due to artificial dimensions added because one categorical variable is coded with several columns. As a consequence, the inertia (i.e., variance) of the solution space is artificially inflated and therefore the percentage of inertia explained by the first dimension is severely underestimated.
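The adjustment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the m-file's code: it uses the formulas as given in Greenacre (2006), with Q the number of variables, J the total number of categories, and λ_k the principal inertias of the Burt matrix. The adjusted inertia is (Q/(Q-1))² (√λ_k - 1/Q)² for √λ_k >= 1/Q, and the average off-diagonal inertia is Q/(Q-1) × (total inertia of B - (J-Q)/Q²). The function name `adjusted_inertias` and the input values are hypothetical.

```python
import math


def adjusted_inertias(burt_inertias, n_vars, n_cats):
    """Adjusted inertias and their percentages of the average off-diagonal inertia.

    burt_inertias: all non-trivial principal inertias of the Burt matrix
                   (squared singular values), J - Q of them.
    n_vars: number of categorical variables (Q).
    n_cats: total number of categories over all variables (J).
    """
    Q, J = n_vars, n_cats
    total = sum(burt_inertias)
    # Average off-diagonal inertia obtained from the total inertia of B.
    avg_off = Q / (Q - 1) * (total - (J - Q) / Q**2)
    scale = (Q / (Q - 1)) ** 2
    # Only singular values at or above 1/Q are adjusted; the rest are dropped.
    adjusted = [
        scale * (math.sqrt(lam) - 1 / Q) ** 2
        for lam in burt_inertias
        if math.sqrt(lam) >= 1 / Q
    ]
    percents = [100 * a / avg_off for a in adjusted]
    return adjusted, percents, avg_off


# Made-up check with Q = 2 variables of 3 categories each (J = 6).
adjusted, percents, avg_off = adjusted_inertias([0.49, 0.36, 0.16, 0.09], 2, 6)
print(adjusted, percents, avg_off)
```

With only two variables the adjustment recovers the principal inertias of the simple CA of the single off-diagonal table, so the percentages add up to 100. With more than two variables they generally sum to less than 100.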

A complete explanation of the statistical fundamentals is found in Greenacre (2006).

An MCA yields only row or column coordinates, and each point represents a category (attribute) of one of the variables.

Syntax: function mcorran2(X)

Input:
X - Data matrix = Burt matrix. Size: total categories x total categories, taken over all categorical variables (more than two variables).

Outputs:
Complete Multiple Correspondence Analysis
The adjusted inertias table is given by default
Pair-wise Dimensions Plots. For the vertical and horizontal lines we use the hline.m and vline.m files kindly published on FEX by Brandon Kuczenski [http://www.mathworks.com/matlabcentral/fileexchange/1039]. For connecting lines to the origin we use the plot2org file published on FEX by Jos [http://www.mathworks.com/matlabcentral/fileexchange/11337]