Machine learning models are often referred to as “black boxes” because their representations of knowledge are not intuitive and, as a result, it’s difficult to understand how they work. Interpretable machine learning refers to techniques that overcome the black-box nature of most machine learning algorithms. By revealing how various features contribute (or do not contribute) to predictions, you can validate that the model is using the right evidence for its predictions and find model biases that were not apparent during training.
Practitioners seek model interpretability primarily for three reasons:
- Guidelines: “Black box” models conflict with many corporate technology best practices and personal preferences.
- Validation: It’s valuable to understand where or why predictions go wrong and run “what-if” scenarios to improve model robustness and eliminate bias.
- Regulations: Model interpretability is required to comply with government regulations for sensitive applications, such as in finance, public health, and transportation.
Interpretable machine learning addresses these concerns and increases trust in the models in situations where explanations for predictions are important or required by regulation.
Interpretable machine learning works on three levels:
- Local: Explaining the factors behind an individual prediction, such as why a loan application was rejected
- Cohort: Demonstrating how a model makes predictions for a specific population or group within a training or test data set, such as why a group of manufactured products was classified as faulty
- Global: Understanding how a machine learning model works over an entire training or test data set, such as which factors a model considers when classifying radiology images
Some machine learning models, such as linear regression and decision trees, are inherently interpretable. However, interpretability often comes at the expense of power and accuracy.
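For instance, the following minimal sketch (using the built-in fisheriris data set, chosen here only for illustration) fits a shallow decision tree whose split rules can be read directly:

```matlab
% Minimal sketch: an inherently interpretable model.
% Uses the fisheriris data set shipped with Statistics and Machine Learning Toolbox.
load fisheriris                                  % meas (features), species (labels)
tree = fitctree(meas, species, ...
    'PredictorNames', {'SepalLength','SepalWidth','PetalLength','PetalWidth'}, ...
    'MaxNumSplits', 4);                          % keep the tree small and readable
view(tree, 'Mode', 'graph')                      % inspect the split rules directly
```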
Using MATLAB® for machine learning, you can apply techniques to interpret and explain popular, highly accurate machine learning models that aren’t inherently interpretable.
Local Interpretable Model-Agnostic Explanations (LIME): Approximate a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree, and use it as a surrogate to explain how the original (complex) model works. Figure 2 below illustrates the three main steps of applying LIME.
Figure 2: How to obtain Local Interpretable Model-Agnostic Explanations (LIME).
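The steps in Figure 2 can be carried out with MATLAB’s lime function. Below is a minimal sketch, assuming a classification ensemble trained on the built-in fisheriris data set (both chosen only for illustration):

```matlab
% Minimal sketch: LIME for a single prediction (Statistics and Machine Learning Toolbox).
load fisheriris
blackbox = fitcensemble(meas, species);          % a model that is not inherently interpretable
queryPoint = meas(10, :);                        % the prediction we want to explain
explainer = lime(blackbox, meas);                % step 1: create the LIME explainer
explainer = fit(explainer, queryPoint, 2);       % step 2: fit a simple surrogate with 2 important predictors
figure
plot(explainer)                                  % step 3: visualize the local explanation
```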
Partial Dependence (PDP) and Individual Conditional Expectation (ICE) Plots: Examine the effect of one or two predictors on the prediction by averaging the model’s output over the observed values of the remaining features (PDP), or by plotting one curve per observation (ICE).
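A minimal sketch using plotPartialDependence, assuming a regression ensemble trained on the built-in carsmall data set (chosen only for illustration), might look like this:

```matlab
% Minimal sketch: PDP and ICE plots (Statistics and Machine Learning Toolbox).
load carsmall                                    % built-in data set: Weight, Horsepower, MPG, ...
X = [Weight, Horsepower];
mdl = fitrensemble(X, MPG, 'PredictorNames', {'Weight','Horsepower'});
figure
plotPartialDependence(mdl, 'Weight')             % PDP: average effect of Weight on predicted MPG
figure
plotPartialDependence(mdl, 'Weight', 'Conditional', 'absolute')  % ICE: one curve per observation
```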
You can use MATLAB for other popular interpretability methods, including:
- Permuted Predictor Importance: Shuffle the values of a predictor in a test or training data set and measure the resulting change in the model’s prediction error. The larger the increase in error caused by shuffling a predictor’s values, the more important that predictor is (a hand-rolled sketch follows this list).
- Shapley Value: Derived from cooperative game theory, the Shapley value is the average marginal contribution of a specific feature over all possible “coalitions,” i.e., combinations of features. Evaluating all feature combinations generally takes a long time, so in practice Shapley values are approximated by applying Monte Carlo simulation (a sketch using MATLAB’s shapley function follows this list).
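For bagged tree ensembles, MATLAB provides oobPermutedPredictorImportance; for other models, permuted predictor importance can be computed with a short loop. The sketch below is a hand-rolled illustration (not a built-in function), assuming a classification ensemble on the built-in fisheriris data set:

```matlab
% Minimal sketch: permuted predictor importance computed by hand.
load fisheriris
names = {'SepalLength','SepalWidth','PetalLength','PetalWidth'};
mdl = fitcensemble(meas, species, 'PredictorNames', names);
baseError = loss(mdl, meas, species);            % baseline error (use a held-out set in practice)
importance = zeros(1, numel(names));
for j = 1:numel(names)
    Xperm = meas;
    Xperm(:, j) = Xperm(randperm(size(meas, 1)), j);        % shuffle one predictor's values
    importance(j) = loss(mdl, Xperm, species) - baseError;  % increase in error = importance
end
figure
bar(importance); xticklabels(names); ylabel('Increase in classification error')
```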
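A minimal sketch of MATLAB’s shapley function, again assuming a classification ensemble on the fisheriris data set, follows:

```matlab
% Minimal sketch: Shapley values for one query point (Statistics and Machine Learning Toolbox).
load fisheriris
blackbox = fitcensemble(meas, species);
explainer = shapley(blackbox, meas);             % create the explainer from the training data
explainer = fit(explainer, meas(10, :));         % compute Shapley values for one observation
figure
plot(explainer)                                  % bar chart of each feature's contribution
```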
| | Local | Cohort | Global |
| --- | --- | --- | --- |
| What's explained | Individual prediction | Model behavior on a subset of the population | Model behavior “anywhere” |
| Use cases | An individual prediction goes wrong; a prediction seems counterintuitive; what-if analysis | Protection against bias; validating the outcome for a particular group | Demonstrating how the model works; comparing different models for deployment |
| Applicable interpretability methods | LIME; local decision tree; Shapley values | Global methods on a subset of the data | PDP/ICE; global decision tree; feature importance |
Interpretability methods have their own limitations, and a best practice is to be aware of those limitations as you apply these techniques to your use cases. Interpretability tools help you understand why a machine learning model makes the predictions that it does, which is a key part of verifying and validating applications of AI. Certification bodies are currently working on frameworks for certifying AI for sensitive applications such as autonomous transportation and medicine.