Explainable NLP for Understanding Large Language Models
Defense forthcoming, 2026 · Utrecht University
This thesis presents six connected studies on explainable Natural Language Processing (NLP) for understanding large language models (LLMs). As LLMs grow more capable and widely deployed, methods to understand, verify, and evaluate their behavior across different contexts and cultures have lagged behind their raw performance.
The work addresses three dimensions of this gap. The first is methodological: explanation methods that travel well from generic feature attribution to the linguistic and cultural complexity of text. The second is empirical: the reliability of LLM-produced annotations under demographic bias and prompt variation. The third is normative: cross-cultural moral alignment, asking whether large models reflect a narrow set of values when deployed globally, and how to evaluate that systematically.
Across nine chapters and six core studies, the thesis combines a survey of the explainable NLP literature with empirical work on sexism detection, AI-text detection, annotator demographic bias, and moral alignment evaluation against the World Values Survey and the Pew Global Attitudes Survey.
The empirical chapters are based on six peer-reviewed and preprint papers, listed in chapter order. The full publication list, including pre-PhD work, lives on the publications page.
The full reference will be updated when the thesis is officially deposited with Utrecht University after the defense. For now, please cite as:
@phdthesis{mohammadi2026letmeexplain,
  author = {Mohammadi, Hadi},
  title  = {Let Me Explain! Explainable NLP for Understanding Large Language Models},
  school = {Utrecht University},
  year   = {2026},
  type   = {Doctoral thesis},
  note   = {Forthcoming}
}