Research via arXiv cs.CL

LLMs' Hidden States Mirror Human Semantic Associations in Feature Space

Researchers found that geometric relations between semantic features in LLMs' hidden states closely match human psychological associations. The study projects 360 words onto 32 semantic axes, showing high correlation with human ratings.

A new study published on arXiv reveals that the geometric relations between semantic features in large language models' (LLMs) hidden states closely mirror human psychological associations. Researchers constructed feature vectors for 360 words and projected them onto 32 semantic axes, such as beautiful-ugly or soft-hard. The projections correlated highly with human ratings of those words on the respective semantic scales.
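The projection technique can be sketched as follows. This is a minimal illustration with tiny hand-made toy vectors standing in for LLM hidden states; in the study, the feature vectors come from a model's hidden states for 360 words, and each of the 32 axes is defined by an antonym pair such as beautiful-ugly. All vectors and word choices below are hypothetical.

```python
import numpy as np

# Toy stand-ins for LLM hidden-state feature vectors (hypothetical values).
emb = {
    "beautiful": np.array([1.0, 0.2, 0.0]),
    "ugly":      np.array([-1.0, 0.1, 0.0]),
    "rose":      np.array([0.7, 0.5, 0.3]),
    "sludge":    np.array([-0.6, 0.4, 0.2]),
}

def project(word, pos, neg, emb):
    """Signed projection of a word's vector onto the pos-neg semantic axis."""
    axis = emb[pos] - emb[neg]          # axis points from "neg" toward "pos"
    axis = axis / np.linalg.norm(axis)  # unit length so scores are comparable
    return float(emb[word] @ axis)

rose_score = project("rose", "beautiful", "ugly", emb)      # positive: leans "beautiful"
sludge_score = project("sludge", "beautiful", "ugly", emb)  # negative: leans "ugly"
```

Scores like these, computed for each word on each axis, are what the researchers correlated against human ratings of the same words on the same semantic scales.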

The findings suggest that LLMs inherently capture human-like semantic structure in their hidden states. Moreover, the cosine similarities between the semantic axes themselves were highly predictive of the correlations between the corresponding rating scales in human psychology. This indicates that LLMs encode not only individual words but also the relationships between semantic dimensions, much as humans do.
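The axis-to-axis comparison amounts to a cosine similarity between the difference vectors that define each scale. A minimal sketch, again with hypothetical toy vectors in place of real hidden states:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical endpoint vectors for two semantic scales.
beautiful = np.array([1.0, 0.2, 0.1])
ugly      = np.array([-1.0, 0.1, 0.0])
soft      = np.array([0.8, 0.3, -0.2])
hard      = np.array([-0.9, 0.2, 0.1])

# Each axis is the difference between its endpoint vectors.
axis_beauty = beautiful - ugly
axis_soft   = soft - hard

# High cosine similarity between axes would predict that human ratings
# on the two scales are themselves strongly correlated.
sim = cosine(axis_beauty, axis_soft)
```

In the study, this similarity between axes in the model's feature space tracked how strongly the corresponding human rating scales correlated with each other.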

The implications of this research are profound for both AI development and cognitive science. Understanding how LLMs encode semantic information could lead to more intuitive and human-like AI systems. Future studies may explore how these findings can be leveraged to improve AI's ability to understand and generate more nuanced and contextually appropriate responses. The study also raises questions about the extent to which LLMs can truly replicate human cognitive processes.

#llm #semantic-features #cognitive-science #hidden-states #ai-research