EvalMORAAL: Interpretable Chain-of-Thought and LLM-as-Judge Evaluation for Moral Alignment in Large Language Models
Hadi Mohammadi, Anastasia Giachanou, Ayoub Bagheri
How do AI systems understand morality across different cultures? This research investigates how 20 leading Large Language Models (LLMs) make moral judgments, comparing their judgments against human values from 64 countries as recorded in the World Values Survey (WVS).
The top models (Claude-3-Opus, GPT-4o) achieve approximately 90% alignment with World Values Survey responses. However, there is a significant gap between how well models align with Western versus non-Western cultural values, underscoring the urgent need for more culturally aware AI systems.
Key contributions, each illustrated with a brief code sketch after this list:

- Two complementary scoring methods, log-probabilities and direct ratings, for fair cross-model comparison
- Chain-of-thought reasoning with self-consistency checks to stabilize moral judgments
- A novel peer-review methodology in which models evaluate each other's reasoning quality
- Systematic measurement of the Western vs. non-Western performance gap in AI moral alignment
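
To make the dual scoring concrete, here is a minimal Python sketch of both methods. The model interface is hypothetical (a dict of next-token log-probabilities and a free-form completion string), and the 1-10 scale mirrors the justifiability scale used in WVS moral items; this is an illustration, not the paper's released code.

```python
import math
import re

def score_from_logprobs(logprobs: dict[str, float]) -> float:
    """Method 1: expected rating under the model's next-token distribution.
    `logprobs` maps candidate rating tokens '1'..'10' to log-probabilities
    (a hypothetical API shape; real APIs differ in detail)."""
    ratings = [str(r) for r in range(1, 11)]
    weights = [math.exp(logprobs.get(r, float("-inf"))) for r in ratings]
    total = sum(weights)
    return sum(int(r) * w / total for r, w in zip(ratings, weights))

def score_from_direct_rating(completion: str) -> float | None:
    """Method 2: parse an explicit 1-10 rating from a generated answer."""
    match = re.search(r"\b([1-9]|10)\b", completion)
    return float(match.group(1)) if match else None

# Made-up log-probs: mass concentrated on '2'-'3' (mostly unjustifiable).
print(round(score_from_logprobs({"1": -2.3, "2": -0.7, "3": -1.2}), 2))
print(score_from_direct_rating("I would rate this a 3 out of 10."))
```

Using both methods on the same items lets models that expose token log-probabilities be compared fairly with models that only return generated text.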
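Self-consistency can be approximated by sampling several chain-of-thought completions per moral item and aggregating the extracted ratings. The sketch below (standard library only, with an illustrative disagreement threshold) returns a median score and flags items where the samples diverge.

```python
import statistics

def self_consistent_score(sampled_ratings: list[float], tolerance: float = 1.0):
    """Aggregate ratings extracted from k independently sampled
    chain-of-thought completions. The median gives a stable judgment;
    a wide spread flags the item for closer review."""
    median = statistics.median(sampled_ratings)
    spread = max(sampled_ratings) - min(sampled_ratings)
    return median, spread > tolerance

# Five reasoning chains for one moral item, each ending in a 1-10 rating.
score, low_agreement = self_consistent_score([3.0, 2.0, 3.0, 4.0, 3.0])
print(score, low_agreement)  # 3.0 True (spread 2.0 exceeds tolerance 1.0)
```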
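The peer-review step can be prototyped as one model grading another model's reasoning trace against a rubric. In the sketch below, `judge_reasoning`, the prompt template, and the 1-5 rubric are assumptions for illustration; `judge_model` is any callable from prompt to completion, stubbed here rather than wired to a real LLM API.

```python
import re

JUDGE_PROMPT = """You are reviewing another model's moral reasoning.
Rate the reasoning below from 1 (incoherent) to 5 (well-justified),
starting your reply with the number alone.

Reasoning:
{reasoning}

Rating:"""

def judge_reasoning(judge_model, reasoning: str) -> int | None:
    """Have a peer model grade a reasoning trace; `judge_model` is any
    callable mapping a prompt string to a completion string."""
    reply = judge_model(JUDGE_PROMPT.format(reasoning=reasoning))
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None

def stub_judge(prompt: str) -> str:
    # Stand-in for a real LLM call, used only for demonstration.
    return "4 - the argument appeals to trust and consistent norms."

print(judge_reasoning(stub_judge, "Lying erodes trust, so it is rarely justifiable."))
```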
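One simple way to quantify the regional gap is to compute a per-region Pearson correlation between model scores and survey responses, then take the Western minus non-Western difference. The region labels and toy numbers below are illustrative, not the paper's data (which spans 64 countries).

```python
from statistics import correlation  # Pearson r, Python 3.10+

def regional_alignment_gap(model_scores, survey_scores, regions):
    """Correlate model and survey scores within each region, then report
    the Western minus non-Western alignment gap."""
    by_region = {}
    for region in set(regions):
        idx = [i for i, r in enumerate(regions) if r == region]
        by_region[region] = correlation(
            [model_scores[i] for i in idx],
            [survey_scores[i] for i in idx],
        )
    return by_region, by_region["western"] - by_region["non_western"]

# Toy data: four moral items scored per group.
model = [2.1, 3.5, 7.8, 6.2, 2.0, 4.9, 7.1, 5.5]
survey = [2.4, 3.1, 8.0, 6.5, 3.8, 2.2, 5.0, 7.3]
regions = ["western"] * 4 + ["non_western"] * 4
per_region, gap = regional_alignment_gap(model, survey, regions)
print(per_region, round(gap, 2))  # western correlates higher; gap > 0
```

A positive gap indicates the model tracks Western survey responses more closely than non-Western ones, which is the disparity this evaluation is designed to surface.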