New Method Reduces Computational Cost of Simultaneous Speech Translation
Researchers propose hierarchical policy optimization to improve the efficiency of simultaneous speech translation (SST). By reformulating SST as a multi-turn dialogue, the method enables full reuse of the LLM's KV cache, cutting computational overhead without requiring scarce dialogue-form annotations.

Researchers have introduced a novel approach to simultaneous speech translation (SST) that significantly reduces computational costs. The method, detailed in a new arXiv paper, reformulates SST as a multi-turn dialogue task, allowing for full reuse of the LLM's key-value (KV) cache. This eliminates redundant feature recomputation, a major bottleneck in current SST systems.
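To illustrate why reusing a KV cache avoids recomputation, here is a minimal sketch of incremental single-head attention in plain NumPy. This is not the paper's implementation; the weight matrices, dimensions, and the `attend` helper are hypothetical. The point is that when a new speech chunk arrives, only the new tokens are projected, while keys and values from earlier turns are read from the cache:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size (hypothetical toy value)
# Random projection matrices standing in for a trained attention layer.
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def attend(x_new, cache):
    """Attend with only the NEW tokens as queries, reusing cached K/V.

    x_new: (n_new, D) hidden states for the incoming chunk only.
    cache: dict holding keys/values from all previous turns.
    """
    # Project just the new tokens -- cached tokens are never recomputed.
    k_new, v_new = x_new @ Wk, x_new @ Wv
    K = k_new if cache["K"] is None else np.concatenate([cache["K"], k_new])
    V = v_new if cache["V"] is None else np.concatenate([cache["V"], v_new])
    cache["K"], cache["V"] = K, V  # grow the cache for the next turn

    q = x_new @ Wq
    scores = q @ K.T / np.sqrt(D)          # new queries over ALL keys so far
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)     # softmax over cached + new positions
    return w @ V

cache = {"K": None, "V": None}
turn1 = rng.standard_normal((5, D))  # first speech chunk: 5 tokens
turn2 = rng.standard_normal((3, D))  # next chunk: 3 tokens

out1 = attend(turn1, cache)  # projects 5 tokens
out2 = attend(turn2, cache)  # projects only 3 new tokens; 5 cached K/V reused
```

In a naive SST pipeline, each new chunk would trigger re-encoding of the entire prefix; with the cache, per-chunk cost scales with the chunk size rather than the full history, which is the efficiency the multi-turn dialogue reformulation is designed to unlock.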
The innovation addresses a central tension in SST: balancing translation quality against latency and compute. While large language models (LLMs) have improved translation quality, they carry high computational overhead. The proposed hierarchical policy optimization technique mitigates this by exploiting the LLM's existing capabilities rather than retraining them. Notably, it does not require extensive supervised fine-tuning (SFT) data in dialogue form, which is scarce.
This development could accelerate real-time translation applications, from international conferences to customer service. The method's full reuse of KV caches also suggests broader implications for other incremental LLM tasks. However, the scarcity of dialogue-form annotations remains a constraint on the field, and future work may explore synthetic data generation to overcome this hurdle.