Sexism, discrimination based on a person's sex or gender, is increasingly prevalent on social media platforms, where it often manifests as hate speech targeting individuals or groups because of their gender. While machine learning models can detect such content, their "black box" nature obscures their decision-making, making it difficult for users to understand why a given post is flagged as sexist.
This paper addresses the need for transparency in automated sexism detection by proposing an approach that combines multiple BERT architectures with a CNN framework. By integrating explainability techniques such as LIME and SHAP, our approach reveals which words and phrases most strongly indicate sexist content, exposing the model's reasoning rather than only its verdict.
Through evaluation on the EXIST 2021 dataset, we demonstrate that Sexism Scores derived from SHAP values correlate with model performance: texts with higher Sexism Scores are identified as sexist more reliably. This highlights the efficacy of an explainability-driven approach that moves beyond binary classification to a deeper understanding of the detection process.
A new approach combining multiple BERT architectures with a CNN framework and integrating explainability techniques (LIME, SHAP, attention visualization) for sexism detection.
Definition of Sexism Scores based on both model predictions and explainability outputs, moving beyond binary classification to a deeper understanding of the detection process (see the illustrative sketch below).
Demonstration that SHAP values correlate with model performance, with texts having higher Sexism Scores being more reliably identified as sexist.
A fully implemented system with code and guidelines for deployment, making our approach accessible to practitioners and researchers.
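The exact Sexism Score formula is defined in the paper; the sketch below is only one illustrative possibility (an assumption, not the published definition), weighting the model's predicted probability by the share of SHAP attribution magnitude that points toward the sexist class:

```python
import numpy as np

def sexism_score(pred_prob: float, shap_values: np.ndarray) -> float:
    """Hypothetical Sexism Score: the predicted probability of the 'sexist'
    class, weighted by the fraction of total SHAP attribution magnitude that
    pushes the prediction toward that class. Illustrative only; the paper
    defines its own scoring scheme."""
    positive_mass = shap_values[shap_values > 0].sum()
    total_mass = np.abs(shap_values).sum()
    agreement = positive_mass / total_mass if total_mass > 0 else 0.0
    return float(pred_prob * agreement)

# A confident prediction whose token attributions mostly point the same way
print(sexism_score(0.9, np.array([0.30, 0.15, -0.05, 0.02])))  # ~0.81
```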
Text cleaning, normalization, and feature extraction tailored for social media content.
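As a minimal sketch of this kind of social-media cleaning (the concrete rules in the pipeline may differ; `clean_tweet` is an illustrative name):

```python
import re

def clean_tweet(text: str) -> str:
    """Normalize a social-media post: strip URLs and user mentions, keep
    hashtag words (only the '#' symbol is dropped), collapse whitespace,
    and lowercase for uncased models."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"#", "", text)              # keep hashtag words
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lower()

print(clean_tweet("Check this out https://t.co/x  @user #Sexism is real"))
# -> "check this out sexism is real"
```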
Training multiple classifiers with different architectures to compare performance and explainability trade-offs.
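A minimal PyTorch sketch of one such architecture, a CNN head over BERT token embeddings; the filter counts and kernel sizes here are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertCNNClassifier(nn.Module):
    """BERT encoder followed by 1D convolutions over the token embeddings.
    Hyperparameters are illustrative, not the paper's exact setup."""
    def __init__(self, model_name="bert-base-uncased", num_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        tokens = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = tokens.transpose(1, 2)
        # Max-pool each feature map over the sequence dimension
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))
```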
Applying LIME for local explanations and SHAP for global feature importance analysis.
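A sketch of both explainers applied to a text classifier; `predict_proba` is a stand-in for a batched wrapper around the trained model, returning an (n, 2) array of class probabilities for a list of strings:

```python
import numpy as np
import shap
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Stand-in scorer; replace with a wrapper around the real classifier."""
    p_sexist = np.array([0.8 if "kitchen" in t.lower() else 0.1 for t in texts])
    return np.column_stack([1 - p_sexist, p_sexist])

texts = ["women belong in the kitchen", "great match last night"]

# Local explanation for a single post with LIME
lime_explainer = LimeTextExplainer(class_names=["non-sexist", "sexist"])
exp = lime_explainer.explain_instance(texts[0], predict_proba, num_features=5)
print(exp.as_list())  # [(word, weight), ...] for the 'sexist' class

# Token-level SHAP attributions, aggregable into global feature importance
shap_explainer = shap.Explainer(predict_proba, shap.maskers.Text())
shap_values = shap_explainer(texts)
```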
Creating intuitive visualizations that highlight important words and their contribution to the prediction.
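SHAP and LIME ship built-in text renderers (e.g. `shap.plots.text`, LIME's `show_in_notebook`); as a library-agnostic sketch, a bar chart over (word, weight) pairs such as LIME's `as_list()` output:

```python
import matplotlib.pyplot as plt

def plot_word_contributions(pairs, title="Word contributions to 'sexist' prediction"):
    """Horizontal bar chart of (word, weight) pairs: positive weights push
    the prediction toward the sexist class, negative ones away from it."""
    words, weights = zip(*sorted(pairs, key=lambda p: p[1]))
    colors = ["crimson" if w > 0 else "steelblue" for w in weights]
    plt.barh(words, weights, color=colors)
    plt.axvline(0, color="black", linewidth=0.8)
    plt.title(title)
    plt.tight_layout()
    plt.show()

# Example with made-up weights, e.g. taken from LIME's exp.as_list()
plot_word_contributions([("women", 0.31), ("should", 0.12), ("the", -0.02)])
```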
@article{mohammadi2024transparent,
  title={A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction},
  author={Mohammadi, Hadi and Giachanou, Anastasia and Bagheri, Ayoub},
  journal={Applied Sciences},
  volume={14},
  number={19},
  pages={8620},
  year={2024},
  publisher={MDPI}
}