Learning Natural Selection with Convolutional Neural Networks

Date
2021
Department
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Tri-College users only until 2022-01-01, afterwards Open Access.
Abstract
Convolutional Neural Networks (CNNs) are one of the most efficient approaches for analyzing population genetics data and drawing conclusions about a species' evolution. Population genetics is one approach to studying biological evolution at the genetic level. Genetic differences between populations encode a great deal of information, and decoding all of it can be computationally costly. Fortunately, machine learning is known for its ability to process large data sets, especially high-dimensional ones, efficiently and to draw insightful conclusions from them. The key to applying CNNs to genetic data is to treat the data (alignments of DNA sequences from different individuals in the same population) as images and to identify patterns in those images. Modifications have been made to the original CNN architecture to exploit unique properties of genetic data such as exchangeability. The second half of this thesis is devoted to modifying the current CNN design to study the tomato. The process of plant domestication is often challenging and takes a long time. To this day, the process involves many unclear and under-explored intermediate stages that potentially hold important information about how domestication traits evolved. Our research builds on previous CNN designs that have already been shown to work on population genetics data. Our model is adapted from the OnePop model by Sara Mathieson. Starting from an initial idea of the design, we experimented with different data-processing approaches and trained the model using various combinations of parameters.
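To illustrate what exchangeability means for an alignment treated as an image, here is a minimal NumPy sketch. It is not the thesis's model or the OnePop architecture: the function name `exchangeable_features`, the random 0/1 alignment, the filter width, and the choice of a mean over individuals are all assumptions made for this example. The idea shown is only that applying the same 1-D filter to every row (individual) and then combining rows with a symmetric function gives features that do not depend on the order of the individuals.

```python
import numpy as np

def exchangeable_features(alignment, kernel):
    """Hypothetical helper: alignment is an (individuals, sites) 0/1 matrix,
    kernel is a 1-D filter shared by every individual."""
    n_sites = alignment.shape[1]
    width = kernel.shape[0]
    n_windows = n_sites - width + 1
    # Slide the same filter along the sites of every row (valid positions only).
    per_row = np.stack(
        [alignment[:, i:i + width] @ kernel for i in range(n_windows)],
        axis=1,
    )
    per_row = np.maximum(per_row, 0.0)  # ReLU non-linearity
    # The mean over rows is symmetric, so row order cannot affect the result.
    return per_row.mean(axis=0)

rng = np.random.default_rng(0)
alignment = rng.integers(0, 2, size=(8, 20)).astype(float)  # toy data, not real genotypes
kernel = rng.normal(size=5)                                  # assumed filter width of 5

features = exchangeable_features(alignment, kernel)
shuffled = alignment[rng.permutation(alignment.shape[0])]    # reorder the individuals
same = np.allclose(features, exchangeable_features(shuffled, kernel))
print(features.shape, same)
```

Other symmetric reductions (for example a maximum or a sum over individuals) would give the same order-invariance; the mean is used here only for simplicity.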