arXiv (CS.AI) 2026-06-24 12:00 DOI: arXiv:2605.01482

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

Abstract

arXiv:2605.01482v3 (announce type: replace)

Multi-Hop Fact Verification requires complex reasoning across disparate evidence, posing significant challenges for Large Language Models, which may suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought, often lack explicit modeling of the structural dependencies between evidence and claims. In this work, we introduce an SCM-inspired framework that grounds reasoning in explicit directed dependency graphs, treating verification as a constructive structural reasoning process rather than full causal inference with interventions or counterfactual semantics. We empirically identify an "inverted U-shaped" correlation between reasoning-chain length and accuracy, revealing that excessive structural complexity can degrade performance. To address this, we propose a rule-based reinforcement learning strategy using Group Relative Policy Optimization. This approach dynamically optimizes the trade-off between structural depth and conciseness. Extensive experiments on HoVer and EX-FEVER demonstrate that our SCM-GRPO framework outperforms strong baselines while producing more traceable reasoning structures for complex fact verification.
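To make the idea of a rule-based reward combined with Group Relative Policy Optimization more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the reward, so `rule_based_reward`, `target_hops`, and `length_penalty` are hypothetical names and values chosen only for illustration. The only part taken from GRPO as commonly described is the group-relative advantage: each sampled response's reward is standardized against the mean and standard deviation of its own sampling group. Using the population standard deviation is also an assumption, since implementations differ on this point.

```python
from statistics import mean, pstdev


def rule_based_reward(predicted_label, gold_label, num_hops,
                      target_hops=3, length_penalty=0.1):
    """Hypothetical reward: 1.0 for a correct verdict, minus a penalty
    that grows as the reasoning chain deviates from a target length.
    This form is an assumption, not the reward used in the paper."""
    correctness = 1.0 if predicted_label == gold_label else 0.0
    return correctness - length_penalty * abs(num_hops - target_hops)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampling group (the group-relative
    advantage used by GRPO). eps guards against a zero standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled responses for a single claim whose gold label is
# "SUPPORTED". Each entry is (predicted label, number of reasoning hops);
# the values are made up for illustration.
group = [("SUPPORTED", 3), ("SUPPORTED", 6), ("REFUTED", 2), ("SUPPORTED", 1)]

rewards = [rule_based_reward(label, "SUPPORTED", hops) for label, hops in group]
advantages = group_relative_advantages(rewards)

for (label, hops), r, a in zip(group, rewards, advantages):
    print(f"{label:<10} hops={hops} reward={r:+.2f} advantage={a:+.2f}")
```

Under these assumed settings, the correct response with the target chain length receives the largest advantage, correct but overly long or overly short chains receive smaller ones, and the incorrect response receives the lowest. This mirrors, in a simplified way, the trade-off between structural depth and conciseness that the abstract describes.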
