Learning Hierarchical Structure with LSTMs
Haverford College, Department of Computer Science
Recurrent Neural Networks (RNNs) are successful at modeling language because their internal memory lets them recognize patterns over inputs of arbitrary length. However, the information kept in that memory decays over time due to the vanishing gradient problem. Long Short-Term Memory (LSTM) units mitigate this problem with forget gates, which help reserve memory for only the most important data. The LSTM has thus become very popular in natural language processing (NLP) because it can model context: compared to earlier models used in NLP, LSTMs excel at language modeling. However, some aspects of their success in the field have surprised researchers. Their apparent ability to model syntax suggests that they use learning mechanisms we do not yet fully understand. Research has been done on LSTMs and language syntax in an effort to advance the field, yet an exhaustive account of how the inside of an LSTM works and what needs improving has yet to be compiled. Here, we draw on previous research and a set of final experiments to provide a clear picture of how LSTMs model hierarchical syntax.
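To make the gating mechanism concrete, the following is a minimal sketch of a single LSTM time step in NumPy. All dimensions, weight names, and the stacked-gate layout are illustrative assumptions, not the configuration used in any particular experiment; the point is that the forget gate `f` multiplies the previous cell state, so the cell-state update is additive and gradients can flow through `f` without vanishing as quickly as in a plain RNN.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenated [h_prev; x] vector to the four stacked
    gate pre-activations (forget, input, candidate, output).
    This layout is a common convention, assumed here for illustration.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])        # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2*H])      # input gate: how much new information to write
    g = np.tanh(z[2*H:3*H])    # candidate values to write into the cell
    o = sigmoid(z[3*H:4*H])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g     # additive update: memory persists where f is near 1
    h = o * np.tanh(c)
    return h, c

# Toy usage with hypothetical dimensions: hidden size 4, input size 3.
rng = np.random.default_rng(0)
H, X = 4, 3
W = rng.standard_normal((4 * H, H + X)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(X), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Note that when `f` saturates near 1 and `i` near 0, `c` is carried forward unchanged, which is how the unit can preserve information across long spans of input.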