Project

Cracking the Multilingual Code: Exploring Multilinguality in LLMs through Mechanistic Interpretability

Code
BOF/PDO/2025/001
Duration
01 October 2025 → 30 September 2028
Funding
Regional and community funding: Special Research Fund
Promoter
Research disciplines
  • Natural sciences
    • Natural language processing
  • Humanities and the arts
    • Computational linguistics
  • Social sciences
    • Artificial intelligence
    • Knowledge representation and machine learning
Keywords
fair and transparent AI, mechanistic interpretability, multilingual language modelling
Project description
This research project aims to advance the understanding of multilinguality in multilingual large language models (MLLMs), with the overarching goal of making these models more transparent, interpretable, and equitable for speakers of low-resourced languages. By investigating how these models represent and transfer meaning across languages, the project seeks to uncover the neuron-level mechanisms underlying their multilingual behavior. This includes examining the role of language-specific and polyglot neurons, the formation of typological clusters, and the impact of pre-training data and model architecture on multilingual generalization. The research also examines how MLLMs internalize translation processes and whether they employ language-pivoting strategies to facilitate cross-lingual tasks, especially for under-represented languages. Finally, the study addresses the consistency of reasoning structures, such as cause-effect relationships and logical reasoning, across languages, identifying whether systemic biases or universal patterns shape multilingual reasoning capabilities. By deepening our understanding of how MLLMs process diverse languages, this work aims to make these models more equitable and effective, ensuring their benefits extend to all linguistic communities, including those currently underserved by AI technologies.
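As a rough illustration of the neuron-level analysis mentioned above, the sketch below flags "language-specific" versus "polyglot" neurons from per-language activation statistics. It uses synthetic activation data, an arbitrary language set, and a hypothetical activation threshold; none of these specifics come from the project itself, and a real analysis would measure activations over corpora in each language.

```python
import numpy as np

rng = np.random.default_rng(0)
languages = ["en", "sw", "fi"]  # hypothetical language set
n_neurons = 8

# Synthetic per-neuron activation scores for each language.
# In practice these would be mean activation frequencies measured
# on held-out text in each language.
act = rng.random((len(languages), n_neurons))

threshold = 0.5  # hypothetical cutoff for calling a neuron "active"
active = act > threshold  # boolean mask: (languages, neurons)

# Count in how many languages each neuron fires.
n_active_langs = active.sum(axis=0)

# Active in exactly one language -> candidate language-specific neuron;
# active in all languages -> candidate polyglot neuron.
language_specific = np.where(n_active_langs == 1)[0]
polyglot = np.where(n_active_langs == len(languages))[0]

print("language-specific neurons:", language_specific.tolist())
print("polyglot neurons:", polyglot.tolist())
```

The same counting idea extends to more fine-grained questions raised in the project, such as whether neurons cluster by typologically related language groups rather than firing for one language or all of them.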