AI-Generated Reviews for Research Papers: What's the Real Score?

A new study tested AI-generated reviews of academic papers and found that they align only loosely with human reviewers' assessments. The research also shows that authors can "game" the system by using AI to revise their papers before submission, which raises serious questions about fairness and reliability in scientific publishing.

Researchers published a study on arXiv examining AI-generated reviews of scientific papers, focusing on the 2025 ACL Rolling Review (ARR). They found that AI reviews show only limited alignment with human reviews, even in the best-case scenarios. More concerning, the study demonstrated that authors can use AI tools to revise their papers in ways that significantly raise the scores AI reviewers assign, effectively "gaming" the review process.

This matters because major conferences are piloting AI-generated reviews, and both reviewers and authors are increasingly using AI assistance. If authors can easily manipulate AI reviewers in ways that might not fool human reviewers, the integrity of the peer review process is at risk. Think of it like a student who learns exactly how a robo-grader works and tailors answers to get a perfect score, even if the work isn't actually better.

If you're an academic, a good first step is to be transparent about any AI tools you use in your research process. For example, when you submit a paper, you could state whether you used an AI tool to revise it. Disclosure of this kind helps maintain trust and clarity in the review process.