Browsing by Subject "Computational linguistics"
- Author Gender Classification from Blogs: A Comparative Analysis of N-Grams-Based Author Attribution Techniques (2019)
  Kay, Sujin; Kumar, Deepak
  This thesis explores n-gram-based gender classification using various n-gram types, sizes, and feature sets. It expands on previous research by including a non-binary gender category. First, a state-of-the-art n-gram analysis using a simple dissimilarity measure was replicated, reaching a peak accuracy of 71%. To improve on this result, a formal feature selection and extraction process was then performed. This secondary analysis yielded a lower overall peak accuracy of 61%, but accuracy for the non-binary and female categories reached 99–100%. Both results are comparable to findings from previous research.
- Building a Linguistics based Loss Function for Dialogue Generation (2020)
  St. Clair, Jack; Chandlee, Jane
  This paper investigates different loss functions used in machine learning for various natural language processing (NLP) tasks. These loss functions have proven their worth in machine translation, but they have been shown to be inadequate for dialogue generation. This paper therefore proposes additions to these loss functions that incorporate more linguistic information, with the goal of improving dialogue generation and bringing machine learning models closer to producing human-like dialogue.
- Declaration, Childhood Understanding, and the Contents of Natural Language (2013)
  Duncan, Robin; Napoli, Donna Jo, 1948-
  We may reasonably expect that children as young as three years old understand the difference between socially constructed facts and bare facts of nature. This assumption is reasonable because children at this age do understand the rules for the correct performance of speech acts and the scope of normative rules. Declarations, as defined by John Searle, constitute a class of speech acts that bears aptly on socially constructed facts but not on bare facts. Thus, experiments that demonstrate children's understanding of the rules governing the apt use of declarations could be taken to demonstrate an understanding of the distinction between the two types of facts.
- Grapheme to Phoneme Conversion: Using Input Strictly Local Finite State Transducers (2019)
  Morgan, Gregory M.; Chandlee, Jane
  This thesis explores the many methods of grapheme-to-phoneme (G2P) conversion, including dictionary look-up, rule-based approaches, and probabilistic approaches such as Joint Sequence Models (JSM), Recurrent Neural Networks (RNN), and weighted finite-state transducers (WFST), and it discusses methods for letter-to-phoneme alignment. I then explain Strictly Local languages and functions and their previous application in an Input Strictly Local FST Learning Algorithm. Finally, I propose a further application of this algorithm by adapting it to G2P conversion. My results indicate that while the algorithm had some success learning G2P, future work implementing a probabilistic model will be needed to improve accuracy.
- Sentiment Analysis of Egyptian Arabic in Social Media (2014)
  Abdalkader, Mohamed; Kumar, Deepak; Darwish, Manar
  Sentiment analysis is an emerging area of application, fueled by increasing public participation in online social media. Much work has been done on sentiment analysis in English, while less has been done on other languages such as Mandarin and Arabic. Arabic is spoken by hundreds of millions of people in over twenty countries. Modern Standard Arabic (MSA) is used online mostly by newspapers and other official sources, whereas social media and blogs written by individuals are typically in Dialectal Arabic (DA). My senior thesis explores ways to increase the accuracy of automated sentiment analysis in Egyptian Arabic by exploiting features specific to Arabic. I found that the baseline algorithm's most frequent error is classifying tweets that carry a sentiment as neutral. Using Minimum Edit Distance (MED) and the ISRI Arabic stemmer, I was able to reduce the baseline algorithm's error by 31% without adding any new entries to the lexicon. This approach handles not only different morphological forms but also misspellings and informal writing. Although I cannot compare it empirically with other authors' results because I use a different data set, my approach reaches an accuracy of 78%, an improvement of 14.7% over the baseline.
- The Effect of Linguistic Typology on Transfer Learning of Morphology (2020)
  Roe, Conor Stuart; Chandlee, Jane
  In Task 1 of the SIGMORPHON 2019 shared task, multiple teams attempted for the first time to leverage transfer learning to build more accurate models of natural language morphology from small amounts of target-language data, with the goal of boosting modeling resources for low-resource languages. It was found that transfer learning could aid the development of computational models for low-resource languages and that it was most effective between genealogically related languages. This study extends those findings by testing a much larger number of unrelated language pairs, systematically comparing two model architectures, and examining relationships between model performance and selected typological similarities of the source and target languages. It was found that transfer learning can still afford substantial benefits when the source and target languages are unrelated, and that it is most beneficial when they have similar sets of morphologically inflected categories and similar patterns of fusion between those categories, whereas similarity in inflection shape does not predict transfer learning efficacy. This information can be used to select source languages when leveraging transfer learning to improve computational resources for low-resource target languages, especially those without closely related high-resource languages.
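The Kay and Kumar abstract mentions replicating an n-gram analysis with "a simple dissimilarity measure" but does not name it. As an illustration only, the sketch below assumes the Common N-Gram (CNG) dissimilarity over character trigram profiles; the function names, the trigram size, and the top_k = 500 profile cut-off are assumptions, not details taken from the thesis. Each class (for example female, male, non-binary) gets one profile, and a new text is assigned to the class whose profile is least dissimilar.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile(text: str, n: int = 3, top_k: int = 500) -> dict:
    """Keep the top_k most frequent n-grams as relative frequencies (assumed cut-off)."""
    counts = char_ngrams(text.lower(), n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.most_common(top_k)}


def cng_dissimilarity(p1: dict, p2: dict) -> float:
    """CNG dissimilarity: squared relative frequency differences over both profiles."""
    total = 0.0
    for gram in set(p1) | set(p2):
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        # f1 + f2 > 0 because gram occurs in at least one profile.
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def classify(text: str, class_profiles: dict, n: int = 3, top_k: int = 500) -> str:
    """Return the label whose profile is least dissimilar to the text's profile."""
    p = profile(text, n, top_k)
    return min(class_profiles, key=lambda label: cng_dissimilarity(p, class_profiles[label]))
```

In use, each class profile would be built from the concatenated training posts for that label, e.g. `{label: profile(" ".join(posts)) for label, posts in training.items()}`, where `training` is a hypothetical mapping from gender label to a list of blog posts.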
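Morgan's thesis adapts an Input Strictly Local (ISL) FST learning algorithm to G2P. The learning algorithm itself is not sketched here; the code below only illustrates what applying an ISL transducer looks like, assuming k = 3 and a small hand-written table of toy English spelling rules. `RULES`, `rewrite`, and `isl_transduce` are hypothetical names, and the rules are illustrative, not learned output. The state is the last two input symbols, and each grapheme's output is emitted one step late so that it can depend on one symbol of left context and one of right context.

```python
BOUNDARY = "#"

# Hypothetical toy rules: (left, grapheme, right) -> output string.
# None is a wildcard; more specific keys are tried first (see rewrite).
RULES = {
    ("s", "h", None): "",    # 'h' after 's' is absorbed into /ʃ/
    (None, "s", "h"): "ʃ",   # 's' before 'h' -> /ʃ/
    (None, "c", "e"): "s",   # 'c' before 'e' -> /s/
    (None, "c", None): "k",  # any other 'c' -> /k/
}


def rewrite(left: str, sym: str, right: str) -> str:
    """Output for sym given one symbol of left context and one of right context."""
    for key in ((left, sym, right), (left, sym, None),
                (None, sym, right), (None, sym, None)):
        if key in RULES:
            return RULES[key]
    return sym  # default: copy the grapheme unchanged


def isl_transduce(word: str) -> str:
    """Apply an ISL transducer with k = 3: the state is the last two input symbols."""
    state = (BOUNDARY, BOUNDARY)
    out = []
    # A trailing boundary supplies right context for the final grapheme.
    for x in word + BOUNDARY:
        left, mid = state
        if mid != BOUNDARY:
            # Output for `mid` is emitted once its right neighbour x is known.
            out.append(rewrite(left, mid, x))
        state = (mid, x)
    return "".join(out)
```

For example, `isl_transduce("ship")` returns `"ʃip"` and `isl_transduce("cat")` returns `"kat"`. Because every output depends only on a bounded window of input symbols, the mapping is Input Strictly Local; a learner of the kind the thesis adapts would induce such a mapping from aligned data rather than from a hand-written table.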
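Abdalkader's abstract reports using Minimum Edit Distance so that misspelled or informally written tokens can still match lexicon entries. The sketch below shows that idea under stated assumptions: the two-word lexicon (حلو "nice" as +1, وحش "bad" as -1), the maximum distance of 1, and the sum-of-polarities decision rule are all hypothetical, and the ISRI stemming step is omitted.

```python
# Toy lexicon (hypothetical): Egyptian Arabic word -> polarity.
LEXICON = {
    "حلو": 1,   # "nice" (positive)
    "وحش": -1,  # "bad" (negative)
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def lexicon_polarity(token: str, lexicon: dict, max_dist: int = 1):
    """Polarity of the nearest lexicon entry within max_dist edits, else None."""
    if not lexicon:
        return None
    best = min(lexicon, key=lambda entry: edit_distance(token, entry))
    return lexicon[best] if edit_distance(token, best) <= max_dist else None


def classify_tweet(tokens: list, lexicon: dict = LEXICON) -> str:
    """Sum the polarities of matched tokens and map the sign to a label."""
    score = 0
    for token in tokens:
        polarity = lexicon_polarity(token, lexicon)
        if polarity is not None:
            score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Here the elongated form حلوو is one edit away from حلو and so counts as positive, which is the kind of informal spelling a strict dictionary lookup would miss and a tweet would then be labeled neutral. A stemmer could be applied before lookup; NLTK, for instance, provides an `ISRIStemmer` class in `nltk.stem.isri`. It is left out so the sketch has no third-party dependencies.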