Learning Natural Selection with Convolutional Neural Networks
Haverford College. Department of Computer Science
Tri-College users only until 2022-01-01, afterwards Open Access.
Convolutional Neural Networks (CNNs) are one of the most efficient approaches to analyzing population genetics data and drawing conclusions about a species' evolution. Population genetics is one way to study biological evolution at the genetic level. Genetic differences between populations encode a large amount of information, and decoding all of it can be computationally costly. Fortunately, machine learning is well suited to processing large data sets efficiently, especially high-dimensional ones, and to drawing insightful conclusions from them. The key to applying CNNs to genetic data is to treat the data (alignments of DNA sequences from different individuals in the same population) as images and to identify patterns in those images. The original CNN architecture has been modified to exploit unique properties of genetic data such as exchangeability: the order in which individuals appear in the alignment carries no information. The second half of this thesis is devoted to modifying the current CNN design to study the tomato. The process of plant domestication is often challenging and takes a long time. To this day, it involves many unclear and under-explored intermediate stages that may hold important information about how domestication traits evolved. Our research builds on previous CNN designs that have already been shown to work on population genetics data. Our model is adapted from the OnePop model by Sara Mathieson. Starting from an initial design, we experimented with different data-processing approaches and trained the model using various combinations of parameters.
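To make the "alignment as image" idea and exchangeability more concrete, here is a minimal sketch in Python with NumPy. It is not the OnePop model or the architecture used in this thesis. The function name `exchangeable_conv_layer`, the 0/1 allele encoding, the single 1-D filter, and the mean pooling are all illustrative assumptions. The same filter scans every row (individual) of the alignment along the site axis, and the per-row features are then averaged across rows, so shuffling the individuals leaves the output unchanged.

```python
import numpy as np


def exchangeable_conv_layer(alignment, kernel, bias=0.0):
    """Hypothetical permutation-invariant convolution over an alignment.

    alignment: (n_individuals, n_sites) array, assumed encoded as 0/1
               (e.g. ancestral/derived allele at each site).
    kernel:    (k,) filter applied along the sites of each row separately.
    Returns a (n_sites - k + 1,) feature vector that does not depend on
    the order of the rows.
    """
    k = kernel.shape[0]
    # Sliding windows along the site axis: shape (n_individuals, n_sites - k + 1, k).
    windows = np.lib.stride_tricks.sliding_window_view(alignment, k, axis=1)
    # Apply the same filter to every row (cross-correlation, as CNN layers do), then ReLU.
    per_row = np.maximum(windows @ kernel + bias, 0.0)
    # Symmetric pooling over individuals: the mean ignores row order.
    return per_row.mean(axis=0)


# Usage: a random 10 x 50 alignment, and the same alignment with its rows shuffled.
rng = np.random.default_rng(0)
alignment = rng.integers(0, 2, size=(10, 50)).astype(float)
kernel = np.array([1.0, -1.0, 0.5])

out = exchangeable_conv_layer(alignment, kernel)
perm = rng.permutation(alignment.shape[0])
out_perm = exchangeable_conv_layer(alignment[perm], kernel)
print(np.allclose(out, out_perm))  # True
```

Averaging across individuals makes the layer insensitive to row order; summing or taking the maximum would work the same way. A standard 2-D convolution whose kernel spans several rows, followed by flattening, would not have this property.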