Abstract:
In this thesis, I study a game-theory-inspired approach to explaining how features in the input data affect the output of machine learning models without access to the models' internal workings. This approach assigns a number, called the Shapley value, to each feature to indicate its contribution to the model output, but it has several limitations and sources of uncertainty. I investigate three sources of uncertainty in Shapley values and corresponding methods to quantify them: Shapley residuals to capture information missing from the game-theoretic representation, the standard error of the mean to quantify sampling error in Shapley value estimation, and Bayesian SHAP to measure statistical variation in the SHAP explanation model. I aim to investigate and decompose the cause of each type of error and to evaluate their combined effect on the trustworthiness of Shapley explanations for real-world models. My goal is to make machine learning models more interpretable to humans so that we can gain meaningful knowledge from them.