Improvement on the interpretability of convolutional neural network for population genetics
Haverford College. Department of Computer Science
Population genetics is the study of genetic variation within populations and of the evolutionary forces that explain this variation. Studies in this field are usually based on simulated genomic data in matrix form. Many existing methods, such as likelihood-based statistical inference and support vector machines (SVMs), can only operate on summary statistics of the simulated matrices, which loses information and reduces accuracy. Convolutional neural networks (CNNs), which can process raw genomic data directly and perform stably, have outperformed these methods on many population genetic problems, such as detecting natural selection. However, because the internal architecture of a CNN is complex, it is usually difficult for researchers to understand what their models are learning and why the models make particular decisions on given inputs. To enhance the interpretability of CNN models, we look into two techniques that have been applied successfully in fields where the use of CNNs is more mature than in population genetics. The first is SincNet, an intrinsically interpretable CNN design from speech recognition, which uses band-pass filters to limit the number of learnable parameters and thereby improve interpretability. The second is the saliency map, a post-hoc interpretation technique that visualizes the importance of each input unit to the final decision and has been widely applied in computer vision and natural language processing. Finally, we propose two approaches for adapting these techniques to the study of natural selection.
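To make the band-pass idea concrete, the following is a minimal sketch of a sinc-based band-pass kernel of the kind the abstract attributes to SincNet. It shows only the generic filter, in which the whole kernel is determined by two cutoff frequencies rather than by one free weight per tap; it does not reproduce the adaptation to genomic matrices proposed in this work. The function names, the kernel size of 101, the Hamming window, and the use of normalized frequencies (cycles per sample) are all assumptions made for illustration.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low, f_high, kernel_size=101):
    """Build a band-pass kernel from two cutoffs (hypothetical helper).

    The kernel is the difference of two ideal low-pass filters, so the
    only quantities a network would need to learn are f_low and f_high.
    Frequencies are in cycles per sample, between 0 and 0.5.
    """
    if not 0.0 <= f_low < f_high <= 0.5:
        raise ValueError("need 0 <= f_low < f_high <= 0.5")
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd and at least 3")

    half = (kernel_size - 1) // 2
    kernel = []
    for i in range(kernel_size):
        n = i - half  # tap position, centred on zero
        ideal = 2 * f_high * sinc(2 * f_high * n) - 2 * f_low * sinc(2 * f_low * n)
        # Hamming window (an assumption here) to smooth the truncation.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain(kernel, freq):
    """Magnitude of the kernel's frequency response at one frequency."""
    half = (len(kernel) - 1) // 2
    re = sum(k * math.cos(2 * math.pi * freq * (i - half)) for i, k in enumerate(kernel))
    im = sum(k * math.sin(2 * math.pi * freq * (i - half)) for i, k in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    kernel = sinc_bandpass(0.1, 0.2)
    for f in (0.0, 0.15, 0.4):
        print(f"gain at {f:.2f} cycles/sample: {gain(kernel, f):.4f}")
```

Because each filter is described by just two numbers, a trained filter can be read directly as the frequency band it passes, which is the source of the interpretability gain described above.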