Large Language Models (LLMs) are increasingly used for data annotation in NLP tasks, particularly for subjective tasks like hate speech detection where human annotation is expensive and potentially traumatic. However, the reliability of LLM-generated annotations, especially in the context of demographic biases and model explanations, remains understudied. This paper presents a comprehensive evaluation of LLM annotation reliability for sexism detection.
Using a Generalized Linear Mixed Model (GLMM) approach, we examine annotation variability in sexism detection tasks. Our findings reveal that demographic factors account for only a minor fraction (8%) of the observed variance, with tweet content being the dominant factor. We also find that persona-based prompting often fails to improve, and sometimes degrades, performance relative to baseline models.
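To make the variance decomposition concrete, the sketch below fits a logistic mixed model with crossed random effects for tweets and annotators, using statsmodels' Bayesian binomial mixed GLM as one possible implementation. The file name and column names (label, gender, age_group, tweet_id, annotator_id) are hypothetical placeholders, not the paper's actual schema or model specification.

```python
# Minimal sketch: variance decomposition for binary sexism labels with
# crossed random effects (tweets and annotators). Column names are
# hypothetical; the paper's exact model specification may differ.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# annotations.csv: one row per (tweet, annotator) pair
#   label              -> 0/1 sexism annotation
#   gender, age_group  -> annotator demographics (fixed effects)
#   tweet_id, annotator_id -> grouping factors (random effects)
df = pd.read_csv("annotations.csv")

model = BinomialBayesMixedGLM.from_formula(
    "label ~ C(gender) + C(age_group)",        # demographic fixed effects
    {"tweet": "0 + C(tweet_id)",               # random intercept per tweet
     "annotator": "0 + C(annotator_id)"},      # random intercept per annotator
    df,
)

# Variational Bayes fit; the summary reports posterior means of the fixed
# effects and of the variance component parameters, which indicate how much
# variability is attributable to tweet content versus individual annotators.
result = model.fit_vb()
print(result.summary())
```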
Through Explainable AI analysis, we demonstrate that model predictions rely heavily on content-specific tokens related to sexism rather than on correlates of demographic characteristics. This work recommends prioritizing content-driven explanations and robust annotation protocols over demographic persona simulation for achieving fairness in NLP systems.
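As one way to reproduce this kind of token-level evidence, the sketch below computes SHAP attributions for a Hugging Face text-classification pipeline. The checkpoint name and example tweet are stand-ins, not the models or data used in the paper.

```python
# Minimal sketch: token-level attributions for a sexism/hate classifier via SHAP.
# The checkpoint and example text are illustrative placeholders.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-roberta-base-hate",  # stand-in classifier
    top_k=None,                                    # return scores for all labels
)

explainer = shap.Explainer(clf)   # wraps the pipeline with a text masker
texts = ["Example tweet to explain."]
shap_values = explainer(texts)

# Inspect which tokens push the prediction toward the positive class;
# content-bearing tokens are expected to dominate over demographic correlates.
print(shap_values[0].data)     # the tokens
print(shap_values[0].values)   # per-token attribution scores (per label)
```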
A comprehensive framework for evaluating LLM annotation reliability that considers both accuracy and consistency across demographic groups.
Advanced statistical modeling to quantify the impact of demographic factors on LLM annotation quality and identify systematic biases.
Analysis of the quality and consistency of LLM-generated explanations, revealing discrepancies between stated reasoning and actual predictions.
Concrete recommendations for using LLMs in annotation tasks, including strategies for bias mitigation and quality control.
Analysis of sexism detection tasks using established datasets for examining annotation variability.
Testing generative AI models as annotators, including persona-based prompting compared against baseline models (a prompting sketch follows this list).
Generalized Linear Mixed Model (GLMM) to examine annotation variability and quantify the impact of different factors.
Explainable AI techniques to understand which features drive model predictions in sexism detection.
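To illustrate what persona-based prompting versus a plain baseline prompt can look like in practice, the sketch below uses an OpenAI-style chat-completions client. The model name, persona wording, and helper function are hypothetical choices for illustration, not the exact prompts or models evaluated in the paper.

```python
# Minimal sketch: baseline vs. persona-conditioned annotation prompts.
# Model name and prompt wording are illustrative, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotate(tweet: str, persona: str | None = None) -> str:
    """Ask the model for a binary sexism label, optionally under a persona."""
    system = "You are an annotator. Answer only 'sexist' or 'not sexist'."
    if persona:
        system = f"You are {persona}. " + system
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Tweet: {tweet}\nLabel:"},
        ],
    )
    return response.choices[0].message.content.strip()

tweet = "Example tweet to annotate."
baseline_label = annotate(tweet)                                     # no persona
persona_label = annotate(tweet, persona="a 30-year-old woman from Spain")
print(baseline_label, persona_label)
```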
@inproceedings{mohammadi2025assessing,
title={Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation},
author={Mohammadi, Hadi and Shahedi, Tina and Mosteiro Romero, Pablo and Poesio, Massimo and Bagheri, Ayoub and Giachanou, Anastasia},
booktitle={Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)},
year={2025},
organization={Association for Computational Linguistics}
}