Let Me Explain! — PhD Thesis

About this thesis

This thesis presents six connected studies on explainable Natural Language Processing (NLP) for understanding large language models (LLMs). As LLMs become more capable and more widely deployed, methods for understanding, verifying, and evaluating their behavior across contexts and cultures have not kept pace with their raw performance.

The work addresses three dimensions of this gap. The first is methodological: carrying explanation methods beyond generic feature attribution to the linguistic and cultural complexity of text. The second is empirical: assessing the reliability of LLM-produced annotations under demographic bias and prompt variation. The third is normative: examining cross-cultural moral alignment, that is, whether large models reflect a narrow set of values when deployed globally, and how to evaluate this systematically.

Across its chapters and six core studies, the thesis combines a survey of the explainable NLP literature with empirical work on sexism detection, AI-text detection, annotator demographic bias, and moral alignment evaluation against the World Values Survey and PEW Global Attitudes Survey.

Chapter overview

Section I · Opening

1

Introduction

The opacity challenge of LLMs, the case for explainability across high-stakes domains, and an overview of the thesis contributions.

Section II · Foundations of Explainable NLP

2

Explainability in Practice: A Survey of Explainable NLP Across Various Domains

A domain-specific review of explainable NLP (XNLP) across healthcare, finance, content moderation, customer relationship management, and other domains, with a critical look at evaluation practices, real-world applicability, and the role of human interaction.

3

A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction

An explainable detection pipeline that combines a CustomBERT ensemble with LIME and SHAP explanations to identify sexism in English and Spanish social-media posts.

4

Explainability-Based Token Replacement on LLM-Generated Text

Explainability-guided token replacement reveals brittle features in AI-text detectors and shows how edits targeting explanation-flagged tokens flip detector outputs while preserving meaning.

Section III · Human-Centered AI and Moral Alignment

5

Assessing the Reliability of LLM Annotations in the Context of Demographic Bias and Model Explanation

A study of how explainability tools and demographic personas affect the reliability of LLM annotations, using mixed-effects models to quantify the sources of variability.

6

Exploring Cultural Variations in Moral Judgments with Large Language Models

Probing whether LLMs mirror cross-cultural moral attitudes from the World Values Survey by comparing log-probability-based scores against country-level ground truth.

7

EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models

A transparent chain-of-thought (CoT) framework that evaluates moral alignment in 20 LLMs against the World Values Survey (WVS) and PEW surveys, adds model-as-judge peer review, and reveals a persistent alignment gap between Western and non-Western regions.

Section IV · Closing

8

Discussion and Conclusion

Cross-chapter synthesis, limitations, and directions for future work in explainable NLP and culturally aware language models.
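Chapter 6 compares log-probability scoring against country-level survey ground truth. The sketch below is a hypothetical illustration of that kind of comparison, not the thesis's actual code. It assumes each country-specific prompt is scored by the difference between the log-probabilities a model assigns to a "justifiable" and a "not justifiable" completion, and that agreement with the survey is measured with a Pearson correlation. The country labels, log-probabilities, and survey means are made-up placeholders.

```python
import math
from statistics import mean

# Hypothetical inputs: per-country (log P("justifiable"), log P("not justifiable"))
# for a single moral question. All values are made-up placeholders.
LOG_PROBS = {
    "country_A": (-1.2, -2.0),
    "country_B": (-2.5, -0.9),
    "country_C": (-1.0, -1.1),
}

# Hypothetical country-level survey means on a centered justifiability scale.
SURVEY_MEANS = {"country_A": 0.6, "country_B": -0.7, "country_C": 0.1}


def moral_score(logp_justifiable: float, logp_unjustifiable: float) -> float:
    """Log-probability difference; positive means the model leans 'justifiable'."""
    return logp_justifiable - logp_unjustifiable


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Align model scores and survey means by country, then correlate them.
countries = sorted(LOG_PROBS)
model_scores = [moral_score(*LOG_PROBS[c]) for c in countries]
ground_truth = [SURVEY_MEANS[c] for c in countries]
r = pearson(model_scores, ground_truth)
print(f"Pearson r between model scores and survey means: {r:.3f}")
```

A high positive r means the model orders countries roughly as the survey does. A real evaluation would repeat this over many questions and countries, and the scoring scheme used in the thesis may differ.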

Underlying papers

The empirical chapters are based on six peer-reviewed and preprint papers, listed in chapter order. The full publication list, including pre-PhD work, lives on the publications page.

[1] Mohammadi, H., Bagheri, R. A., Giachanou, A., & Oberski, D. L. (2025). Explainability in Practice: A Survey of Explainable NLP Across Various Domains. arXiv preprint arXiv:2502.00837. · Chapter 2

[2] Mohammadi, H., Giachanou, A., & Bagheri, R. A. (2024). A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction. Applied Sciences, 14(19), 8620. doi:10.3390/app14198620. · Chapter 3

[3] Mohammadi, H., Giachanou, A., Oberski, D. L., & Bagheri, R. A. (2025). Explainability-Based Token Replacement on LLM-Generated Text. arXiv preprint arXiv:2506.04050. · Chapter 4

[4] Mohammadi, H.†, Shahedi, T.†, Mosteiro, P., Poesio, M., Bagheri, R. A., & Giachanou, A. (2025). Assessing the Reliability of LLM Annotations in the Context of Demographic Bias and Model Explanation. Workshop on Gender Bias in Natural Language Processing (GeBNLP), ACL 2025, pp. 92–104. ACL Anthology. · Chapter 5

[5] Mohammadi, H., Meijer, Y. F. S. S., Papadopoulou, E., & Bagheri, R. A. (2025). Do Large Language Models Understand Morality Across Cultures? Proceedings of the 2nd LUHME Workshop, pp. 30–39. ACL Anthology. · Chapter 6

[6] Mohammadi, H., Giachanou, A., & Bagheri, R. A. (2026). EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models. Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026), ACL 2026 (in press). arXiv:2510.05942. · Chapter 7

† Equal contribution.

How to cite

The full reference will be updated when the thesis is officially deposited with Utrecht University after the defense. For now, please cite as:

@phdthesis{mohammadi2026letmeexplain,
  author = {Mohammadi, Hadi},
  title  = {Let Me Explain! Explainable NLP for Understanding Large Language Models},
  school = {Utrecht University},
  year   = {2026},
  type   = {Doctoral thesis},
  note   = {Forthcoming}
}
