Sexism, discrimination based on a person's sex or gender, is increasingly prevalent on social media platforms, where it often manifests as hate speech targeting individuals or groups because of their gender. While machine learning models can detect such content, their "black box" nature obscures their decision-making, making it difficult for users to understand why a given post is flagged as sexist.
This paper addresses the need for transparency in automated sexism detection by proposing an approach that combines multiple BERT architectures with a CNN framework. By integrating explainability techniques such as LIME and SHAP, our approach reveals which words and phrases most strongly indicate sexist content, exposing the model's reasoning rather than only its verdict.
Through evaluation on the EXIST 2021 dataset, we demonstrate that Sexism Scores derived from SHAP values correlate with model performance: texts with higher Sexism Scores are identified as sexist more reliably. This highlights the efficacy of an explainability-driven approach that moves beyond binary classification to a deeper understanding of the detection process.
A new approach combining multiple BERT architectures with a CNN framework and integrating explainability techniques (LIME, SHAP, attention visualization) for sexism detection.
Definition of Sexism Scores based on both model predictions and explainability outputs, moving beyond binary classification to a deeper understanding of the detection process (see the illustrative sketch below).
Demonstration that SHAP values correlate with model performance, with texts having higher Sexism Scores being more reliably identified as sexist.
A fully implemented system with code and guidelines for deployment, making our approach accessible to practitioners and researchers.
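The exact Sexism Score formula is defined in the paper; the sketch below is only one illustrative possibility (an assumption, not the published definition), weighting the model's predicted probability by the share of SHAP attribution magnitude that points toward the sexist class:

```python
import numpy as np

def sexism_score(pred_prob: float, shap_values: np.ndarray) -> float:
    """Hypothetical Sexism Score: the predicted probability of the 'sexist'
    class, weighted by the fraction of total SHAP attribution magnitude that
    pushes the prediction toward that class. Illustrative only; the paper
    defines its own scoring scheme."""
    positive_mass = shap_values[shap_values > 0].sum()
    total_mass = np.abs(shap_values).sum()
    agreement = positive_mass / total_mass if total_mass > 0 else 0.0
    return float(pred_prob * agreement)

# A confident prediction whose token attributions mostly point the same way
print(sexism_score(0.9, np.array([0.30, 0.15, -0.05, 0.02])))  # ~0.81
```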
Text cleaning, normalization, and feature extraction tailored for social media content.
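As a minimal sketch of this kind of social-media cleaning (the concrete rules in the pipeline may differ; `clean_tweet` is an illustrative name):

```python
import re

def clean_tweet(text: str) -> str:
    """Normalize a social-media post: strip URLs and user mentions, keep
    hashtag words (only the '#' symbol is dropped), collapse whitespace,
    and lowercase for uncased models."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"#", "", text)              # keep hashtag words
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lower()

print(clean_tweet("Check this out https://t.co/x  @user #Sexism is real"))
# -> "check this out sexism is real"
```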
Training multiple classifiers with different architectures to compare performance and explainability trade-offs.
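A minimal PyTorch sketch of one such architecture, a CNN head over BERT token embeddings; the filter counts and kernel sizes here are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertCNNClassifier(nn.Module):
    """BERT encoder followed by 1D convolutions over the token embeddings.
    Hyperparameters are illustrative, not the paper's exact setup."""
    def __init__(self, model_name="bert-base-uncased", num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        tokens = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = tokens.transpose(1, 2)
        # Max-pool each feature map over the sequence dimension
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))
```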
Applying LIME for local explanations and SHAP for global feature importance analysis.
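A sketch of both explainers applied to a text classifier; `predict_proba` is a stand-in for a batched wrapper around the trained model, returning an (n, 2) array of class probabilities for a list of strings:

```python
import numpy as np
import shap
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Stand-in scorer; replace with a wrapper around the real classifier."""
    p_sexist = np.array([0.8 if "kitchen" in t.lower() else 0.1 for t in texts])
    return np.column_stack([1 - p_sexist, p_sexist])

texts = ["women belong in the kitchen", "great match last night"]

# Local explanation for a single post with LIME
lime_explainer = LimeTextExplainer(class_names=["non-sexist", "sexist"])
exp = lime_explainer.explain_instance(texts[0], predict_proba, num_features=5)
print(exp.as_list())  # [(word, weight), ...] for the 'sexist' class

# Token-level SHAP attributions, aggregable into global feature importance
shap_explainer = shap.Explainer(predict_proba, shap.maskers.Text())
shap_values = shap_explainer(texts)
```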
Creating intuitive visualizations that highlight important words and their contribution to the prediction.
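SHAP and LIME ship built-in text renderers (e.g. `shap.plots.text`, LIME's `show_in_notebook`); as a library-agnostic sketch, a bar chart over (word, weight) pairs such as LIME's `as_list()` output:

```python
import matplotlib.pyplot as plt

def plot_word_contributions(pairs, title="Word contributions to 'sexist' prediction"):
    """Horizontal bar chart of (word, weight) pairs: positive weights push
    the prediction toward the sexist class, negative ones away from it."""
    words, weights = zip(*sorted(pairs, key=lambda p: p[1]))
    colors = ["crimson" if w > 0 else "steelblue" for w in weights]
    plt.barh(words, weights, color=colors)
    plt.axvline(0, color="black", linewidth=0.8)
    plt.title(title)
    plt.tight_layout()
    plt.show()

# Example with made-up weights, e.g. taken from LIME's exp.as_list()
plot_word_contributions([("women", 0.31), ("should", 0.12), ("the", -0.02)])
```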
@article{mohammadi2024transparent,
  title={A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction},
  author={Mohammadi, Hadi and Giachanou, Anastasia and Bagheri, Ayoub},
  journal={Applied Sciences},
  volume={14},
  number={19},
  pages={8620},
  year={2024},
  publisher={MDPI}
}