Understanding Convolutional Neural Networks Applied to COVID-19
Haverford College. Department of Computer Science
With the great advances in DNA sequencing at the end of the 20th century, researchers have gained access to larger and far more complex genetic datasets. They have tried to infer facts about human evolution, such as natural selection, migration, and mutation, from these complex data. However, traditional models that summarize the data using only a few statistics may ignore important information in today's large datasets. These methods also usually involve likelihood calculations, which require a separate analysis for each problem and focus on a single aspect of the data. To make better use of these population-scale genetic datasets, we introduce a new way of analyzing genetic data using supervised machine learning, specifically Convolutional Neural Networks (CNNs). In this literature review, we look at two papers in depth, each using a different CNN architecture. Both studies show great potential for applying CNNs in genetics, with accuracy comparable to traditional methods and good scalability to a wide range of related problems. However, as this is an emerging field, open problems and gaps remain. First, even though the reported statistics show that CNNs perform well on genetic datasets, we still lack a thorough understanding of what CNNs learn from these datasets and how they learn it. Second, current research generally focuses on applying CNNs to simulated human evolutionary data. I propose that we could also apply CNNs to other species, such as viruses, whose much shorter generation times mean there would be real data to test and verify against. A good choice would be the virus that causes COVID-19, considering its wide spread across the world and the urgent need to develop COVID-19 vaccines.