Abstract:
To explain black-box machine learning models, one popular approach is to compute feature importance using the Shapley value, a solution concept for distributing the payoff among the players of a coalitional game. Despite their popularity, Shapley-based feature importance methods suffer from interpretation issues in the presence of correlated features and out-of-distribution perturbed data points. Kumar et al. (2020b) introduce Shapley residuals to quantify the information lost in the computation of Shapley values, e.g., feature interactions in the model; meanwhile, Slack et al. (2020b) introduce Bayesian SHAP to quantify the variance of Shapley value estimates. With these uncertainty measures, an important question arises: how do practitioners interpret Shapley values, and does that interpretation match the underlying machine learning model? This thesis aims to answer that question and contributes to the literature by (1) introducing a linear mental model that represents how practitioners interpret Shapley values, and (2) introducing deviation, a measure of how much this mental model deviates from the machine learning model. Our experiments show that this deviation measure is related to Shapley values and sampling variance, but also captures another aspect of the model: its nonlinearity.
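For reference, the Shapley value mentioned above assigns to each player (feature) its marginal contribution averaged over all coalitions. A standard statement of the definition, using the usual game-theoretic notation ($N$ the set of players, $v$ the coalition value function; this notation is generic rather than taken from the thesis), is
\[
  \phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
    \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr).
\]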