Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the hate speech detection task with four inferred author demographic factors: race/ethnicity, gender, age and country. The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate the inferred demographic labels using the crowdsourcing platform Figure Eight. We measure the performance of four popular document classifiers and evaluate the fairness and bias of these baseline classifiers with respect to the author-level demographic attributes.