Author Gender Classification from Blogs: A Comparative Analysis of N-Grams-Based Author Attribution Techniques
Haverford College. Department of Computer Science
This thesis explores n-grams-based gender classification using various n-gram types, sizes, and feature sets. It extends previous research by including a non-binary gender category. First, a state-of-the-art n-grams analysis using a simple dissimilarity measure was replicated, reaching a peak accuracy of 71%. To improve on this result, a formal feature selection and extraction process was then performed. This second analysis yielded a lower overall peak accuracy of 61%, but accuracy for the non-binary and female categories reached 99–100%. Both results are comparable to findings from previous research.