Author Gender Classification from Blogs: A Comparative Analysis of N-Grams-Based Author Attribution Techniques
Date
2019
Department
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Open Access
Abstract
This thesis explores n-grams-based gender classification using various n-gram
types, sizes, and feature sets. It expands on previous research by including
a non-binary gender category. First, a state-of-the-art n-grams analysis using a simple
dissimilarity measure was replicated, reaching a peak accuracy of 71%. To improve
on this result, a formal feature selection and extraction process was then performed.
This second analysis yielded a lower overall peak accuracy of 61%, but non-binary
and female-specific accuracy reached 99–100%. Both results are comparable to findings
from previous research.
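
The general idea of classifying text by comparing n-gram profiles with a dissimilarity measure can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state which dissimilarity measure, n-gram type, or profile size was used, so the relative-difference formula, the character trigrams, the `top_k` cutoff, and all function names below are assumptions chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=500):
    """Normalized frequencies of the top_k most common character n-grams.

    n and top_k are illustrative defaults, not values from the thesis.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.most_common(top_k)}


def dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over both profiles.

    Assumed measure: one common choice for comparing n-gram profiles.
    An n-gram missing from a profile is treated as having frequency 0.
    """
    score = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile.
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score


def classify(text, class_profiles, n=3, top_k=500):
    """Return the label whose profile is least dissimilar to the text's."""
    profile = ngram_profile(text, n, top_k)
    return min(
        class_profiles,
        key=lambda label: dissimilarity(profile, class_profiles[label]),
    )
```

In use, one profile would be built per class from that class's training text, and each unseen text would be assigned to the class with the smallest dissimilarity score. Word-level or part-of-speech n-grams could be substituted by changing how `grams` is built.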