Deep Networks for Population Genetics Data Generation

Date
2020
Department
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Tri-College users only
Abstract
As population genetic databases such as the 1000 Genomes dataset (Consortium et al. 2015) grow in size every day, it becomes increasingly challenging to make sense of the large flow of genetic information. Recent work that applies machine learning to population genetics relies heavily on simulated data derived from collected real data, because machine learning is very promising for large-scale data. The main motivation is that machine learning can help researchers identify the patterns behind evolutionary processes (e.g., natural selection, mutation, and migration) and thus broaden knowledge of how to maintain biodiversity within and between species. However, researchers worry that the simulated data do not actually match the real data, so that predictions from the trained machine learning model neither reflect nor explain evolutionary events in the real world. To address this problem, a new population genetic data simulation framework based on generative adversarial neural nets is introduced. Generative Adversarial Nets (GANs) are a framework for training generative models through an adversarial process in which the generative model and the discriminator model are trained simultaneously. During each training session, the generative model improves at producing simulated data that resemble the real data and tries to fool the discriminator model into believing the simulated data are real, whereas the discriminator model improves at distinguishing real data from simulated data. In this work, we examine previous approaches to simulating real data, the use of GANs for data generation outside the field of population genetics, and how GANs can be applied to real population genetic data to generate data that are as close to genuine as possible.
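
The alternating training described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the one-dimensional affine generator, the logistic discriminator, the Gaussian stand-in for "real" data, and all function names and hyperparameters below are assumptions chosen only to keep the example small and dependency-free. The generator update uses the common non-saturating loss, -log D(G(z)).

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminate(d, x):
    """Probability that x is real under a toy logistic discriminator d = (w, c)."""
    w, c = d
    return sigmoid(w * x + c)


def generate(g, z):
    """Toy affine generator g = (a, b) mapping noise z to a sample."""
    a, b = g
    return a * z + b


def d_loss(d, real, fake):
    """Discriminator loss: -mean log D(x) - mean log(1 - D(fake))."""
    eps = 1e-12  # guards against log(0)
    loss_real = -sum(math.log(discriminate(d, x) + eps) for x in real) / len(real)
    loss_fake = -sum(math.log(1.0 - discriminate(d, f) + eps) for f in fake) / len(fake)
    return loss_real + loss_fake


def g_loss(g, d, noise):
    """Non-saturating generator loss: -mean log D(G(z))."""
    eps = 1e-12
    return -sum(math.log(discriminate(d, generate(g, z)) + eps) for z in noise) / len(noise)


def d_step(d, real, fake, lr):
    """One gradient-descent step on the discriminator; returns new (w, c)."""
    w, c = d
    # d/ds of -log(sigmoid(s)) is sigmoid(s) - 1; d/ds of -log(1 - sigmoid(s)) is sigmoid(s).
    gw = sum((discriminate(d, x) - 1.0) * x for x in real) / len(real)
    gc = sum(discriminate(d, x) - 1.0 for x in real) / len(real)
    gw += sum(discriminate(d, f) * f for f in fake) / len(fake)
    gc += sum(discriminate(d, f) for f in fake) / len(fake)
    return (w - lr * gw, c - lr * gc)


def g_step(g, d, noise, lr):
    """One gradient-descent step on the generator with the discriminator held fixed."""
    a, b = g
    w, _ = d
    ga = gb = 0.0
    for z in noise:
        common = (discriminate(d, generate(g, z)) - 1.0) * w  # chain rule through D
        ga += common * z
        gb += common
    return (a - lr * ga / len(noise), b - lr * gb / len(noise))


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on a synthetic Gaussian target."""
    rng = random.Random(seed)
    g, d = (1.0, 0.0), (0.1, 0.0)
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # stand-in for real data
        fake = [generate(g, rng.gauss(0.0, 1.0)) for _ in range(batch)]
        d = d_step(d, real, fake, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = g_step(g, d, noise, lr)
    return g, d
```

Calling train() returns the final generator and discriminator parameters. On real genotype data both models would be neural networks trained with an automatic-differentiation library, and convergence is not guaranteed even in this toy setting.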