Author Gender Classification from Blogs: A Comparative Analysis of N-Grams-Based Author Attribution Techniques

Date
2019
Department
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Open Access
Abstract
This thesis explores n-gram-based gender classification using various n-gram types, sizes, and feature sets. It expands on previous research by including a non-binary gender category. First, a state-of-the-art n-gram analysis using a simple dissimilarity measure was replicated, reaching a peak accuracy of 71%. A formal feature selection and extraction process was then performed in an attempt to improve this result. This secondary analysis yielded a lower overall peak accuracy of 61%, but accuracy on the non-binary and female categories reached 99–100%. Both results are comparable to findings from previous research.
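The abstract does not name the dissimilarity measure that was replicated. One widely used simple measure for n-gram author profiling is the Common N-Grams (CNG) dissimilarity of Kešelj et al. (2003), which sums the squared relative difference of normalized n-gram frequencies over the union of two profiles. The sketch below assumes that measure, character n-grams, and a nearest-class-profile decision rule. The function names, the defaults n=3 and top_k=500, and the labels in the usage example are hypothetical illustrations, not the thesis's actual implementation or settings.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=500):
    """Build a profile: the top_k most frequent character n-grams,
    each mapped to its frequency normalized by the total n-gram count.
    (n and top_k are illustrative defaults, not values from the thesis.)"""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.most_common(top_k)}


def cng_dissimilarity(p1, p2):
    """Assumed CNG-style dissimilarity: sum over the union of n-grams of
    ((f1 - f2) / ((f1 + f2) / 2)) ** 2. Every n-gram in the union has a
    positive frequency in at least one profile, so the denominator is never 0."""
    score = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score


def classify(text, training, n=3, top_k=500):
    """Return the label whose combined training profile is least dissimilar
    to the query text's profile. `training` maps label -> list of documents."""
    class_profiles = {
        label: ngram_profile(" ".join(docs), n, top_k)
        for label, docs in training.items()
    }
    query = ngram_profile(text, n, top_k)
    return min(
        class_profiles,
        key=lambda label: cng_dissimilarity(query, class_profiles[label]),
    )


if __name__ == "__main__":
    # Hypothetical toy corpus; real blog data would replace these strings.
    training = {
        "female": ["example blog post text for one class"],
        "male": ["another example blog post for a second class"],
        "non-binary": ["a third example blog post for the third class"],
    }
    print(classify("an unseen blog post to classify", training))
```

Building one profile per class (rather than per document) keeps the decision rule a single nearest-profile lookup; a per-document nearest-neighbour variant would be an equally plausible reading of "a simple dissimilarity measure."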