Browsing by Author "Kumar, Deepak"
- Author Gender Classification from Blogs: A Comparative Analysis of N-Grams-Based Author Attribution Techniques (2019) Kay, Sujin; Kumar, Deepak. This thesis explores n-gram-based gender classification using various n-gram types, sizes, and feature sets. The study expands on previous research by including a non-binary gender category. First, a state-of-the-art n-gram analysis using a simple dissimilarity measure was replicated, and peak accuracy reached 71%. Seeking to improve this result, a formal feature selection and extraction process was performed. This secondary analysis yielded a lower peak accuracy of 61% overall, but non-binary and female-specific accuracy reached 99–100%. Both results are comparable to findings from previous research.
- Exploring the Role of Emojis in Tweets for Authorship Attribution (2019) Ellison, Kennedy; Chandlee, Jane; Kumar, Deepak. Authorship attribution research has long focused primarily on determining authorship of books or other large texts (Mosteller and L. Wallace, 1963; Gamon, 2004). Only recently have scholars turned to using authorship attribution on short texts or tweets (Eder, 2010; Schwartz et al., 2013; Mikros and Perifanos, 2013). Given the rise of emoji use, this research explores whether emojis are a useful linguistic feature for authorship attribution of tweets. An emoji-rich dataset was created since none existed at the time of this research. A Naive Bayes classifier was used as the authorship attribution model. The baseline feature set, consisting of commonly used authorship attribution features, was augmented with emoji-rich features to perform authorship attribution of tweets. My results show that targeting emojis in the feature set yields a relative increase in accuracy of at least 30% (raising the accuracy from 65% to 85%).
- Foundations and Applications of Authorship Attribution Analysis (2019) Shukla, Rohan; Kumar, Deepak. Authorship attribution is the process of identifying the author of a given work. This thesis surveys the history and foundations of authorship attribution, and then analyzes multiple machine learning methods that are used frequently in this field. In the classic authorship attribution problem, a text with unknown authorship is assigned an author from a set of candidate authors for whom documents of irrefutable authorship exist. Prior to the 1960s, authorship attribution was a linguistics-focused field in which linguistic experts would determine the authors of unknown texts. In 1964, the analysis of ‘The Federalist Papers’ by Mosteller and Wallace was the first statistically driven approach to authorship attribution. This study marked the beginning of authorship attribution as a computational field rather than a linguistics field. The modern approach to authorship attribution involves selecting a set of linguistic features from the texts at hand and then applying a machine learning method to that feature set to classify authorship. This thesis analyzes multiple machine learning methods used for this purpose. Principal Components Analysis (PCA) is a popular unsupervised learning method that considers each text’s feature set as a vector in a multivariate vector space and has had success in authorship attribution. Support Vector Machines (SVMs) are a powerful supervised learning technique that creates a linear classifier used to attribute authorship. SVMs have outperformed all other analytical techniques used in authorship attribution. Due to the plethora of electronic texts that exist, authorship attribution has extensive applications in many different fields, with current research focusing primarily on developing application-specific methodologies.
- Natural Language Interaction with Robots (2007) Walker, Alden; Dougherty, John P.; Kumar, Deepak. Natural language communication with robots has obvious uses in almost all areas of life. Computer-based natural language interaction is an active area of research in Computational Linguistics and AI. While there have been several NL systems built for specific computer applications, NL interaction with robots remains largely unexplored. Our research focuses on implementing a natural language interpreter for commands and queries given to a small mobile robot. Our goal is to implement a complete system for natural language understanding in this domain. The system consists of two main parts: a parser for the subset of English our robot is to understand, and a semantic analyzer used to extract meaning from the natural language. By using such a system we will be able to demonstrate that a mobile robot is capable of understanding NL commands and queries and responding to them appropriately.
- Natural Language Processing and Translation using Augmented Transition Networks and Semantic Networks (2003) Ramos, Juan; Kumar, Deepak. The problem of computers understanding and communicating with humans using natural languages such as English is a complicated task with many details to examine and explore. The goal of this project, then, is to examine some of the established data structures and methods used to enable computers to understand and generate natural language. In an attempt to contribute some original material, we will also consider how a computer might be able to translate sentences between English and Spanish. The techniques covered in this paper are well-established data structures and methods for parsing and generating natural language sentences. In particular, we will pay close attention to the augmented transition network model (ATN) and semantic networks. The ATN data structure is a powerful mechanism for interpreting natural language constructs, most notably due to its ability to both parse and generate language with a single network. Extending the ATN structure slightly will also allow for our goal of language translation. The semantic network model will assist in this endeavor by representing the input data as a network of entity nodes connected by labeled arcs that represent the relationship between nodes. This model abstracts the input into a form independent from the source and target languages, facilitating the task of translation immensely. Finally, we will provide a demonstration of how SNePS, a LISP-based system that incorporates ATNs and semantic networks, translates a simple set of sentences using the techniques described.
- Sentiment Analysis of Egyptian Arabic in Social Media (2014) Abdalkader, Mohamed; Kumar, Deepak; Darwish, Manar. Sentiment analysis is an emerging area of application fueled by the increase of public participation in online social media. Much work has been done on sentiment analysis in English, while less work has been done on other languages like Mandarin and Arabic. Arabic is spoken by hundreds of millions of people in over twenty countries. Modern Standard Arabic (MSA) is used online mostly by newspapers and other official sources. However, social media and blogs used by individuals are typically in Dialect Arabic (DA). My Senior Thesis work has focused on exploring ways to increase the accuracy of automated sentiment analysis in Egyptian Arabic by using the specific features of Arabic. I found that the baseline algorithm makes most of its mistakes by classifying tweets that carry a sentiment as neutral. Using Minimum Edit Distance (MED) and the ISRI Arabic stemmer, I was able to decrease the error of the baseline algorithm by 31% without having to add any new entries to the lexicon. My approach has allowed me to overcome not only the challenge of different morphological forms but also misspellings and informal writing. While I cannot empirically compare it to results by other authors, as I am using a different data set, my approach reaches an accuracy of 78%, an improvement of 14.7% over the baseline.
- Thesis Shmesis: Representing Reduplication with Directed Graphs (2004) Coleman, Jason; Raimy, Eric; Kumar, Deepak. This thesis studies the linguistic phenomenon of reduplication. I will address two problems involving reduplication: the generation of reduplicative word forms and the recognition of reduplicative patterns. I will show how to represent reduplication using an augmented directed graph, that is, a standard digraph augmented with additional properties. I will then restate the problems of generation and recognition in terms of the reduplication graphs. While at first glance reduplication appears to be something that could only be of interest to linguists, after some formalization the problem becomes one that is of interest to computer scientists as well.
- Using Adaptive Learning Algorithms to Make Complex Strategic Decisions (2011) Seralathan, Ashanthi Meena; Blank, Douglas; Kumar, Deepak; Lindell, Steven. Traditionally, artificial intelligence (AI) algorithms have not been built on particularly adaptive principles. Systems were created using complex collections of rules written specifically for the purpose at hand, and their flexibility depended wholly on what the programmer incorporated within the rules. In response, this thesis examines many different algorithms for decision-making, particularly for playing chess. It surveys a number of techniques for creating a chess-playing system, and then begins a modified implementation of a genetic-algorithm-inspired algorithm that uses Population Dynamics to train a system to rank board states in a game of chess; the modified version includes more genes than the original algorithm. While still a work in progress, the system has already demonstrated some advantages over other algorithms for learning evaluation functions for chess (such as the flexibility of the algorithm), and further work could show whether a chess system built using a modified version of Population Dynamics can reach a skill level comparable to that of other chess systems, or even human players.
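
The "Sentiment Analysis of Egyptian Arabic in Social Media" entry above mentions using Minimum Edit Distance (MED) to handle misspellings without adding lexicon entries. The following is only a minimal sketch of how such a lookup might work, not the thesis author's implementation: the function names, the `max_distance` threshold, and the toy English lexicon are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_entry(token: str, lexicon: Iterable[str],
                  max_distance: int = 1) -> Optional[str]:
    """Return the closest lexicon word within max_distance, else None.

    Hypothetical helper: the threshold value is an assumption, not a
    figure taken from the thesis.
    """
    best, best_d = None, max_distance + 1
    for word in lexicon:
        d = edit_distance(token, word)
        if d < best_d:
            best, best_d = word, d
    return best


# Toy sentiment lexicon (illustrative only; the thesis used an Arabic lexicon).
TOY_LEXICON = {"good": 1, "bad": -1}

match = nearest_entry("goood", TOY_LEXICON)
polarity = TOY_LEXICON[match] if match is not None else 0
```

In this sketch a misspelled token is mapped to its nearest lexicon word before its polarity is looked up, and a token with no close match falls back to neutral.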