arXiv (cs.CV) 2026-06-24 12:00 DOI: arXiv:2606.24068

ObsGraph: Hierarchical Observation Representation for Embodied Reasoning and Exploration

Abstract

Embodied reasoning and exploration are increasingly considered crucial abilities for robots operating in complex and unfamiliar environments. To accomplish tasks in such settings, an agent must identify the information a task requires and acquire it through exploration. We propose ObsGraph, an observation-centric hierarchical scene graph that unifies scene representation, retrieval, and exploration. It retains visual evidence and organizes it into room, view, and object layers: rooms provide coarse semantic anchors, views preserve contextual object covisibility, and objects store fine-grained details. On top of this representation, we perform coarse-to-fine hierarchical retrieval under a bounded budget and, crucially, use the retrieval outcomes to structure the exploration candidate space by activating room-level exploration, view refinement, or frontier exploration. This tightly couples representation, retrieval, and adaptive multi-scale exploration. Experiments across embodied reasoning and exploration benchmarks demonstrate improved success and efficiency, highlighting the benefits of structured scene representation and of more targeted information gathering driven by identified evidence gaps.
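The abstract gives no implementation details, so the following is only a minimal illustrative sketch of how a room-view-object hierarchy and a budgeted coarse-to-fine lookup might fit together. Every name here (`RoomNode`, `ViewNode`, `ObjectNode`, `retrieve`, the `budget` parameter) is hypothetical, exact string matching stands in for whatever matching the authors use, and the mapping from retrieval outcome to exploration mode is an assumption rather than the paper's actual rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ObjectNode:
    label: str
    detail: str = ""  # fine-grained detail; empty means "not yet observed closely"


@dataclass
class ViewNode:
    view_id: str
    objects: List[ObjectNode] = field(default_factory=list)  # objects co-visible in this view


@dataclass
class RoomNode:
    name: str
    views: List[ViewNode] = field(default_factory=list)


def retrieve(rooms: List[RoomNode], room_hint: str, object_label: str,
             budget: int = 4) -> Tuple[str, Optional[str]]:
    """Search room -> view -> object, inspecting at most `budget` views.

    Returns (action, payload). The outcome-to-action mapping below is an
    assumption made for illustration only.
    """
    # Coarse level: find the room that anchors the query.
    room = next((r for r in rooms if r.name == room_hint), None)
    if room is None:
        # No known room matches: fall back to exploring unvisited frontiers.
        return "frontier_exploration", None

    # Finer levels: scan a bounded number of views, then their objects.
    for view in room.views[:budget]:
        for obj in view.objects:
            if obj.label == object_label:
                if obj.detail:
                    return "answer", obj.detail
                # Object was seen but lacks detail: look again from this view.
                return "view_refinement", view.view_id

    # Room is known but the object was not found in the inspected views.
    return "room_exploration", room.name


rooms = [
    RoomNode("kitchen", [
        ViewNode("v1", [ObjectNode("mug", "red mug on the counter"),
                        ObjectNode("kettle")]),
    ])
]
print(retrieve(rooms, "kitchen", "mug"))
```

In this toy setup, the level at which the lookup fails is what selects the next exploration step, which mirrors the coupling of retrieval and exploration described above, though in a far simpler form than a real system would need.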
