Power Law Data Distributions Boost Compositional Reasoning in AI Models
A new study reveals that training AI models on power-law distributed data improves their performance on complex reasoning tasks. This challenges the assumption that uniform data distributions are superior for learning rare skills.

Researchers have discovered that AI models trained on data following a power-law distribution, in which a few common examples dominate and rare examples are scarce, perform better on compositional reasoning tasks than models trained on uniformly distributed data. These tasks, which include state tracking and multi-step arithmetic, require models to combine multiple pieces of information to solve a problem. The findings, published on arXiv, counter the traditional belief that uniform data distributions, where all examples are equally represented, are better for learning rare skills.
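To make "compositional" concrete, the sketch below builds multi-step arithmetic problems by chaining primitive operations, so the answer can only be reached by tracking intermediate state across several steps. It is an illustration, not the paper's task construction: the operation set, problem depth, and value ranges are assumptions chosen for readability.

```python
import random

# Hypothetical primitive skills; the study's actual task design may differ.
PRIMITIVES = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "sub1": lambda x: x - 1,
}

def make_problem(depth: int, seed: int = 0):
    """Compose `depth` primitive operations into one multi-step arithmetic problem."""
    rng = random.Random(seed)
    steps = [rng.choice(list(PRIMITIVES)) for _ in range(depth)]
    start = rng.randint(0, 9)
    value = start
    for name in steps:  # solving requires tracking intermediate state step by step
        value = PRIMITIVES[name](value)
    prompt = f"Start with {start}, then apply: {' -> '.join(steps)}. What is the result?"
    return prompt, value

print(make_problem(depth=4))
```

Increasing `depth` lengthens the chain of skills a model must combine, which is the sense in which such tasks are compositional.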
The study highlights that power-law distributions mimic the natural structure of language and knowledge: a small number of common concepts and skills appear frequently, while a long tail of rare concepts and skills appears infrequently. This asymmetry seems to help models generalize to complex tasks that require combining multiple pieces of information. The researchers suggest that the power-law distribution lets models focus on the most relevant and frequent patterns, which are crucial for solving intricate problems.
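As a rough illustration of that asymmetry, the following sketch contrasts a Zipf-style power-law sampler with a uniform sampler over a hypothetical inventory of training skills. The skill count, exponent, and sample size are assumptions for the example, not values reported in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
num_skills = 1_000     # hypothetical number of distinct skills in the training data
alpha = 1.5            # assumed power-law exponent

ranks = np.arange(1, num_skills + 1)
power_probs = ranks.astype(float) ** -alpha
power_probs /= power_probs.sum()                      # p(rank k) proportional to k**-alpha
uniform_probs = np.full(num_skills, 1.0 / num_skills)

# Draw a training curriculum of 100k examples under each distribution.
power_draw = rng.choice(num_skills, size=100_000, p=power_probs)
uniform_draw = rng.choice(num_skills, size=100_000, p=uniform_probs)

# Under the power law, the ten most common skills account for most examples,
# while the long tail appears only rarely; uniform sampling spreads mass evenly.
print("share of examples from top-10 skills (power law):", (power_draw < 10).mean())
print("share of examples from top-10 skills (uniform):  ", (uniform_draw < 10).mean())
```

With an exponent around 1.5, a handful of head skills dominates the sampled curriculum while tail skills still appear occasionally, which is the kind of imbalance the study links to stronger compositional generalization.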
The implications are significant for the development of AI models, particularly in areas requiring advanced reasoning capabilities. Future work may explore how to tune power-law distributions for different kinds of tasks and whether the approach extends to data beyond natural language. The study also raises questions about how data distribution shapes model performance more broadly and how it can be leveraged to improve AI systems across applications.