AI Language Models Show No Special Advantage for Related Languages

Researchers found that large language models don't transfer knowledge better between related languages like Arabic and Hebrew, regardless of model size. This challenges assumptions about linguistic similarity aiding AI performance. The study tested models with 4 billion to 671 billion parameters and various architectures.

In a preprint posted on arXiv, researchers tested seven large language models, ranging from 4 billion to 671 billion parameters and including both dense and Mixture-of-Experts architectures, on cross-lingual transfer tasks. They fine-tuned the models on Arabic, then evaluated zero-shot reading comprehension in other Semitic languages (such as Hebrew) and in non-Semitic control languages (such as English). Surprisingly, they found no evidence that models perform better on languages that are linguistically related to the training language.
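The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the study's evaluation code: the function name, the language grouping, and the accuracy values are all assumptions made up for the example, and the real study covers more languages than the three named here.

```python
from statistics import mean

# Assumed grouping for illustration; the article names only Hebrew as a
# Semitic evaluation language and English as a control language.
SEMITIC = {"hebrew"}


def mean_gain_by_group(baseline, finetuned, related=SEMITIC):
    """Compare the mean accuracy gain on related vs. control languages.

    baseline / finetuned map a language name to reading-comprehension
    accuracy before and after fine-tuning on the training language.
    """
    gains = {lang: finetuned[lang] - baseline[lang] for lang in baseline}
    related_gains = [g for lang, g in gains.items() if lang in related]
    control_gains = [g for lang, g in gains.items() if lang not in related]
    return {
        "related": mean(related_gains),
        "control": mean(control_gains),
        # A positive value would indicate a benefit from relatedness.
        "advantage": mean(related_gains) - mean(control_gains),
    }


# Placeholder accuracies invented to exercise the function, NOT study results.
baseline = {"hebrew": 0.5, "english": 0.625}
finetuned = {"hebrew": 0.75, "english": 0.875}
result = mean_gain_by_group(baseline, finetuned)
print(result)
```

An "advantage" close to zero, as in this made-up input, is the pattern the study reports: the gain on the related language is no larger than the gain on the control language.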

This finding is significant because it challenges the common assumption that AI models can leverage linguistic similarities to perform better. For example, you might expect a model trained on Spanish to do slightly better on Italian than on Chinese. This research suggests otherwise: the gains from fine-tuning did not depend on how closely a language is related to the training language. If the result holds, what these models learn from fine-tuning may be more language-general than previously assumed.

The study also ran a chain-of-thought ablation, which reinforced the finding: models with weak baselines improved dramatically across all languages, while strong-baseline models showed only marginal gains regardless of language family. This suggests that the main driver of improvement is the model's baseline capability rather than linguistic proximity.