Browsing by Author "Washington, Jonathan"
- escaLATING CAPITALIZATION: Tumblr Bloggers' Use of Multiple Letter Cases Within a Single Word (2019) Bienz, Sierra; Fernald, Theodore B.; Washington, Jonathan
- Incorporating Phonological Knowledge into a Computational Model for Language Family Homeland Identification (2018) Shen, Ziting; Washington, Jonathan
  Several computational models have been proposed which hypothesize the geographical homeland of a language family in a quantitative manner. Aiming at identifying the homeland accurately for the world's language families, especially those that have not been well-studied, we examine and propose modifications to one of these models, the ASJP model. Specifically, we incorporate more phonological information in the linguistic distance measurement. In the original ASJP model, the linguistic distance is calculated through Levenshtein distance. In the modified model, we apply a technique similar to the ALINE algorithm to assign weights to the feature changes in the Levenshtein distance calculation. The weights are chosen based on a priori knowledge about frequencies of types of phonological change. The model will be tested on the Indo-European family in the future, and the results will be compared to current major Indo-European homeland theories, i.e. the Steppe Theory and the Anatolian Hypothesis.
- Nivkh and Sakha Language Ideologies: Their Causes and What They Mean for Language Revitalization (2022) Charter, Dylan; Washington, Jonathan
  This thesis explores existing research into Nivkh and Sakha language ideologies in order to elucidate the need for investigations into the causes of language ideologies in future research as well as the importance of incorporating youth and rural perspectives into such research. It is argued that the causes of ideologies must be understood in order to design and implement effective language maintenance and revitalization programs. I then present the findings of my own language ideology interviews conducted with young Sakha and Nivkh consultants and lay out five major factors that help shape their ideologies. I conclude that Nivkh and Sakha revitalization programs should be determined by the communities themselves and will likely be most successful if they aim for bilingualism and highlight personal and spiritual factors for studying the indigenous language.
- The Problem With Mumble Rap: Stigmatization of Variant Production in Contemporary Mainstream Hip-Hop (2018) Abraham, Elsher; Washington, Jonathan
  In this thesis I will argue that the term "Mumble Rap" fails to function as an accurate descriptor of a new generation of mainstream American hip-hop artists, instead being used to mainly disparage its artists, sounds, and ideologies. In doing so, I will attempt to refute the unfair criticisms of those that do not care for this new wave of hip-hop. I will show that Mumble Rap is not used to describe any sort of linguistic property and that this perception of so-called mumbling is simply a phonetic phenomenon that is fairly common throughout any given language. By using language stigmatization models, explanations for the misguided usage of the term and the criticisms of naysayers will be offered. Additionally, I will question the importance of intelligibility in hip-hop music by offering different means of extracting semantic value from an utterance. Ultimately, the unfair stigmatization of these artists will be made clear.
- Tokenization of Japanese Text: Using a Morphological Transducer (2018) Hanlon, Clare; Washington, Jonathan
  Word segmenters comprise a vital step in the methodology of natural language processing. In languages such as English, which already necessitate word delimiters such as spaces, this task is trivial. However, in non-segmented languages such as Japanese and Chinese, a translator must accurately identify every word in a sentence before or as they attempt to parse it, and to do that requires a method of finding word boundaries without the aid of word delimiters. Much has been done in this field for the case of Chinese, as Chinese is a highly isolating language which makes the task of identifying morphological units almost isomorphic to the task of identifying syntactic units. As such, many functional Chinese Word Segmenter models already exist. Japanese, on the other hand, is a synthetic language that utilizes both inflectional and agglutinative morphology, and so the tasks of identifying morphological units and syntactic units are more separate. However, much work has also been done in the field of mapping inflected Japanese words to their root form, a process known as transduction. In this paper, I modify an existing Chinese Word Segmenter to incorporate an existing Japanese transducer into its segmentation process: specifically, the transducer's ability to detect the validity of a combination of characters is used in parallel with dynamic programming's ability to compute all possible combinations of characters in a string to find the overall number of valid tokens in a given input string. Testing shows that this approach does indeed give valid results; furthermore, its ability to return information about the grammatical tags of each token suggests that further extensions of the program could not only tokenize the text, but also infer information about its syntactic meaning in the clause.
- Towards Effective Machine Translation For A Low-Resource Agglutinative Language: Karachay-Balkar (2022) Rice, Enora; Washington, Jonathan; Grissom, Alvin
  Neural machine translation (NMT) is often heralded as the most effective approach to machine translation due to its success on language pairs with large parallel corpora. However, neural methods produce less than ideal results on low-resource languages when their performance is evaluated using accuracy metrics like the Bilingual Evaluation Understudy (BLEU) score. One alternative to NMT is rule-based machine translation (RBMT), but it too has drawbacks. Furthermore, little research has been done to compare the two approaches on criteria beyond their respective accuracies. This thesis evaluates RBMT and NMT systems holistically based on efficacy, ethicality, and utility to low-resource language communities. Using the language Karachay-Balkar as a case study, the latter half of this thesis investigates how two free and open-source machine translation packages, Apertium (rule-based) and JoeyNMT (neural), might support community-driven machine translation development. While neither platform is found to be ideal, this thesis finds that Apertium is more conducive to a community-driven machine translation development process than JoeyNMT when evaluated on the criteria of efficiency, accessibility, ease of deployment, and interpretability.
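
The homeland-identification abstract above (Shen; Washington) describes replacing the uniform substitution cost in Levenshtein distance with weights based on phonological features. The sketch below illustrates that general idea only. The feature table, the two weights, and the function names are hypothetical values chosen for illustration; they are not taken from the thesis, from ASJP, or from ALINE.

```python
def weighted_levenshtein(a, b, sub_cost, indel_cost=1.0):
    """Edit distance in which the substitution cost depends on the symbol pair."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca with cb
            ))
        prev = curr
    return prev[-1]


# Hypothetical feature table: (voicing, place of articulation).
FEATURES = {
    "p": ("voiceless", "labial"),
    "b": ("voiced", "labial"),
    "t": ("voiceless", "alveolar"),
    "d": ("voiced", "alveolar"),
    "k": ("voiceless", "velar"),
    "g": ("voiced", "velar"),
}

# Hypothetical weights: a voicing change is assumed cheaper than a place change.
WEIGHTS = (0.3, 0.6)


def feature_sub_cost(x, y):
    """Substitution cost as the summed weight of differing features, capped at 1."""
    if x == y:
        return 0.0
    fx, fy = FEATURES.get(x), FEATURES.get(y)
    if fx is None or fy is None:
        return 1.0  # symbol not in the table: fall back to the unit cost
    cost = sum(w for w, p, q in zip(WEIGHTS, fx, fy) if p != q)
    return min(cost, 1.0)


if __name__ == "__main__":
    print(weighted_levenshtein("pat", "bat", feature_sub_cost))
    print(weighted_levenshtein("pat", "kat", feature_sub_cost))
```

Under these assumed weights, a pair of words differing only in voicing comes out closer than a pair differing in place of articulation, whereas plain Levenshtein distance would score both pairs identically.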
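
The Japanese tokenization abstract above (Hanlon; Washington) combines a validity check on character sequences with dynamic programming over all possible splits of a string. A minimal sketch of that combination follows. The small `LEXICON` set and the `is_valid` function are hypothetical stand-ins for the morphological transducer, and the code is not the segmenter described in the thesis.

```python
from functools import lru_cache

# Hypothetical toy lexicon standing in for a transducer's accept/reject decision.
LEXICON = {"東", "京", "都", "東京", "京都"}


def is_valid(token):
    """Stand-in validity check; a real system would query a transducer here."""
    return token in LEXICON


def segmentations(text, is_valid, max_len=None):
    """Return every split of text into tokens that all pass is_valid."""
    n = len(text)
    limit = max_len or n

    @lru_cache(maxsize=None)
    def from_index(i):
        # Memoized: all valid segmentations of the suffix text[i:].
        if i == n:
            return ((),)
        results = []
        for j in range(i + 1, min(n, i + limit) + 1):
            token = text[i:j]
            if is_valid(token):
                for rest in from_index(j):
                    results.append((token,) + rest)
        return tuple(results)

    return [list(s) for s in from_index(0)]


if __name__ == "__main__":
    for seg in segmentations("東京都", is_valid):
        print(" | ".join(seg))
```

Memoizing on the start index means each suffix is segmented only once, although the number of returned segmentations can still grow quickly for ambiguous input. A string with no valid split yields an empty list.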