Research via ArXiv cs.AI

IntentScore: A Plan-Aware Reward Model for Evaluating Computer-Use Agent Actions

Researchers introduce IntentScore, a new reward model designed to evaluate the quality of actions taken by Computer-Use Agents (CUAs) to prevent irreversible errors. Trained on 398,000 offline GUI interaction steps across three operating systems, it uses contrastive alignment and margin ranking to ensure actions align with user intent.

Researchers have unveiled IntentScore, a plan-aware reward model engineered to address a critical gap in Computer-Use Agents (CUAs): the absence of action evaluation. While current CUAs leverage large language models to execute graphical user interface operations, they often generate actions without assessing their quality, leading to irreversible errors that cascade through subsequent steps. To close this gap, the team trained the model on 398,000 offline GUI interaction steps spanning three operating systems, enabling it to score candidate actions before they are executed.
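The paper's exact inference procedure isn't detailed here, but the idea of scoring candidates before execution can be sketched as a simple gating loop. Everything below is illustrative: the `score_fn` interface, the `threshold`, and the action names are assumptions, not the authors' API.

```python
def select_action(candidates, score_fn, threshold=0.0):
    """Hypothetical gating loop around a reward model.

    Scores each candidate action, discards those below `threshold`,
    and returns the highest-scoring survivor. Returning None signals
    the agent to re-plan rather than risk an irreversible step.
    """
    scored = [(score_fn(action), action) for action in candidates]
    viable = [(score, action) for score, action in scored if score >= threshold]
    if not viable:
        return None  # no candidate is safe enough to execute
    return max(viable, key=lambda pair: pair[0])[1]
```

A usage example with a toy scoring table: `select_action(["click_submit", "delete_all"], {"click_submit": 0.9, "delete_all": -0.8}.get)` returns `"click_submit"`, while a threshold above every score yields `None`.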

The significance of IntentScore lies in its dual-objective training approach, which combines contrastive alignment for state-action relevance with margin ranking for action correctness. This architecture allows the model not only to understand the immediate context of a GUI state but also to evaluate whether a proposed action logically advances the user's broader plan. By integrating this evaluation layer, CUAs can filter out low-quality or hallucinated actions before execution, shifting the paradigm from blind action generation to intent-conditioned decision-making. This is a crucial step toward reliable automation in complex desktop environments.
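The two objectives named above are standard loss families, so they can be sketched even though the paper's exact formulation isn't given here. The snippet below is a minimal NumPy illustration, assuming an InfoNCE-style contrastive term over batched state/action embeddings and a hinge-style margin term over scalar scores; the temperature and margin values are placeholders.

```python
import numpy as np

def contrastive_alignment_loss(state_emb, action_emb, temperature=0.1):
    # InfoNCE-style alignment: the i-th action is the positive for the
    # i-th state; all other actions in the batch act as negatives.
    s = state_emb / np.linalg.norm(state_emb, axis=1, keepdims=True)
    a = action_emb / np.linalg.norm(action_emb, axis=1, keepdims=True)
    logits = s @ a.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def margin_ranking_loss(score_correct, score_incorrect, margin=0.5):
    # Hinge loss: correct actions should outscore incorrect ones by
    # at least `margin`; violations are penalized linearly.
    return np.mean(np.maximum(0.0, margin - (score_correct - score_incorrect)))
```

In a dual-objective setup the two terms would typically be summed with a weighting coefficient; how IntentScore balances them is not specified in this summary.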

Looking ahead, the introduction of IntentScore opens new avenues for robust agent deployment in real-world scenarios where error tolerance is near zero. The immediate reaction from the community focuses on how this method might integrate with existing agent frameworks and whether it can generalize to unseen operating systems or novel GUI elements. Future work will likely explore scaling this approach to multi-step planning tasks and investigating how human feedback can further refine the reward signals, potentially setting a new standard for evaluating autonomous agent behavior.

#agents #evaluation #gui #reward-model #automation #arxiv