Indirect Discrimination in the Age of Big Data

Date
2016
Department
Haverford College. Department of Economics
Haverford College. Department of Computer Science
Type
Thesis
Language
eng
Access Restrictions
Tri-College users only
Abstract
Rapid advances in the use of big data to make automated decisions may result in indirect discrimination. For example, Larson et al. (2015) find that the Princeton Review charges different prices by zip code, resulting in disproportionately high prices in predominantly Asian zip codes. In an economic sense, this "indirect discrimination" has no inherent unfairness, as we expect the firm to maximize profits by exploiting consumers' willingness to pay, even if the outcome disproportionately affects certain groups. However, Larson et al. and the press coverage of their findings suggest that Princeton Review prices exhibit an unfair level of indirect discrimination. My regression analysis finds that the level of indirect discrimination against the Asian population in Princeton Review prices is not significantly different from that in retail gasoline prices, even though gasoline price variation is not typically perceived as unfair. I use behavioral economic theory to explain this result.

In the second part of this thesis, I further develop and apply a technique that serves both to remove indirect discrimination and to identify the most important features in machine learning models of automated decisions. I use the technique to further explore the differences between Princeton Review prices and retail gasoline prices. I also contribute an analysis of the technique on recidivism data, in which all the features are categorical, and on sex offender data, in which surname and address are used to predict race to simulate the problem of hidden data.
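The thesis itself is access-restricted, so the sketches below are illustrative only. The abstract's first claim rests on a regression comparison of demographic price gradients across two products. A minimal version of such a test, assuming a pooled zip-code data set with hypothetical column names (price_z, asian_share, product), could look like the following; it is not the author's actual specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pooled data: one row per zip code per product, with
# `price_z` (price standardized within product), `asian_share` (zip-code
# Asian population share), and `product` ("princeton_review" or "gasoline").
# File and column names are assumptions, not from the thesis.
df = pd.read_csv("prices_by_zip.csv")

# The interaction term asks whether the demographic price gradient for one
# product differs from the other's; an insignificant interaction coefficient
# is consistent with the abstract's finding of similar levels of indirect
# discrimination in the two price schedules.
model = smf.ols("price_z ~ asian_share * C(product)", data=df).fit(
    cov_type="HC1"  # heteroskedasticity-robust standard errors
)
print(model.summary())
```

The abstract does not name the removal technique developed in the second part. As a purely illustrative stand-in, one published approach in this family is the disparate-impact "repair" of Feldman et al. (2015), which makes a numeric feature's distribution identical across groups while preserving within-group rank order. The sketch below implements that idea; the function and column names are hypothetical, and the thesis's own technique may differ.

```python
import numpy as np
import pandas as pd

def repair_feature(df, feature, group, lam=1.0):
    """Move `feature` toward its pooled distribution within each `group`,
    preserving within-group ranks, so that `group` can no longer be
    inferred from the feature (lam=1.0 fully repairs; lam=0.0 is a no-op)."""
    repaired = df[feature].astype(float).copy()
    pooled = np.sort(df[feature].to_numpy(dtype=float))
    for _, idx in df.groupby(group).groups.items():
        vals = df.loc[idx, feature].to_numpy(dtype=float)
        q = pd.Series(vals).rank(pct=True).to_numpy()  # within-group quantiles
        target = np.quantile(pooled, q)                # same quantiles, pooled
        repaired.loc[idx] = (1.0 - lam) * vals + lam * target
    return repaired
```

Retraining a model with one feature repaired at a time and ranking features by the resulting drop in predictive accuracy is one way a removal procedure can double as a feature-importance measure, in the spirit of what the abstract describes. Repairing categorical features, as in the recidivism data mentioned above, would require a different mapping than the quantile trick used here.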