arXiv (CS.AI)
2026-06-16 12:00
DOI:
arXiv:2606.15141
EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning
Authors:
Abstract
arXiv:2606.15141v1 Announce Type: cross
Abstract: While large audio-language models (LALMs) show promise on audio question answering, on complex audio reasoning they fail to focus on question-relevant audio segments and do not provide a clear, checkable reasoning process. Reinforcement learning and tool-augmented prompting can help models better relate questions to audio, but they lack a reliable way to understand, integrate, and self-verify audio segments. To address this gap, we present EChO-Agent, a modular agent framework that reformulates complex audio QA as a workflow of planning, tool execution, evidence integration, and answer verification. Experiments on the MMAR benchmark show that EChO-Agent improves both accuracy and rubric scores over the baseline, and ablation studies show that evidence integration is the key factor.
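
The four-stage workflow named in the abstract (planning, tool execution, evidence integration, answer verification) can be pictured with a minimal Python sketch. This is not the authors' implementation: the abstract gives no interfaces, so every name here (Evidence, plan, execute, integrate, verify, answer_question, the stub tools, and the fixed time segments) is a hypothetical placeholder, and the substring-based verification is a toy stand-in for whatever check the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[float, float]          # (start_s, end_s) within the audio clip
Tool = Callable[[Segment], str]        # hypothetical tool: segment -> text finding


@dataclass
class Evidence:
    tool: str
    segment: Segment
    finding: str


def plan(question: str) -> List[Tuple[str, Segment]]:
    # Assumption: a real planner would ask a model which tools and segments
    # are relevant to the question. This stub returns a fixed plan.
    return [("transcribe", (0.0, 5.0)), ("detect_events", (5.0, 10.0))]


def execute(steps: List[Tuple[str, Segment]], tools: Dict[str, Tool]) -> List[Evidence]:
    # Run each planned tool on its segment and record what it found.
    return [Evidence(name, seg, tools[name](seg)) for name, seg in steps]


def integrate(chain: List[Evidence]) -> str:
    # Merge the findings into one time-stamped, human-checkable summary.
    return " | ".join(
        f"[{e.segment[0]:.1f}-{e.segment[1]:.1f}s] {e.tool}: {e.finding}" for e in chain
    )


def verify(answer: str, chain: List[Evidence]) -> bool:
    # Toy check: accept an answer only if some finding mentions it.
    return any(answer.lower() in e.finding.lower() for e in chain)


def answer_question(
    question: str, candidates: List[str], tools: Dict[str, Tool]
) -> Tuple[Optional[str], str]:
    chain = execute(plan(question), tools)
    summary = integrate(chain)
    for candidate in candidates:
        if verify(candidate, chain):
            return candidate, summary
    return None, summary  # no candidate is supported by the evidence


# Stub tools standing in for real audio models (illustrative outputs only).
stub_tools: Dict[str, Tool] = {
    "transcribe": lambda seg: "speaker says the train is late",
    "detect_events": lambda seg: "dog barking, then a door slam",
}

answer, summary = answer_question("What animal is heard?", ["cat", "dog"], stub_tools)
print(answer)
print(summary)
```

In this sketch the integrated summary doubles as the checkable reasoning trace: each finding carries the tool and time span it came from, so a reader can audit why a candidate answer was accepted or rejected.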