Paper Plaza - AcademicHub

01.

arXiv (cs.CL) 2026-06-16 arXiv ID: 2606.15088

When the Same Musical Knowledge Forgets Differently: A Clean Probe of Pathway-Dependent Forgetting

Authors:

Yu Liu, Zhiwei Yang, Wenxiao Zhang, Cong Cao, Fangfang Yuan, Kun Peng, Haimei Qin, Lei Jiang, Jin B. Hong, Hao Peng, Yanbing Liu

A model can learn that the piano piece Für Elise is calm and reflective by listening to the audio or by reading a text description, but does it matter which route that knowledge took when it is later at risk of being forgotten? Forgetting research in multimodal models measures what knowledge is lost under adaptation, yet has not asked whether acquisition route affects how easily that knowledge is forgotten. We call this untested premise the Pathway-Invariant Assumption. Music understanding enables a clean test because a music clip and a canonical text description can be aligned to the same perceptual content, allowing the same knowledge unit to enter a model through listening or reading while the target remains fixed. Across multiple architecturally distinct audio-language models, we observe a consistent asymmetry: text-pathway knowledge is forgotten more than matched audio-pathway knowledge under identical adaptation pressure. To attribute this effect to route rather than confounds, we introduce the Paired Pathway Controlled Protocol (PPCP), a three-phase design that establishes matched pathway baselines, activates both pathways under symmetric supervision on the same knowledge pool, and applies identical forgetting pressure to both pathways. The gap is stable across models and gain-controlled analyses, persists when contradictory overwrite is replaced by correct-label cross-domain learning, remains under single-modality pressure, and is not removed by lightweight replay. Two independent routing-depth controls confirm that the effect is not explained by architectural depth, pointing to input representation as the dominant factor. Under PPCP, our results demonstrate that forgetting is highly route-dependent, establishing acquisition route as a new analytical dimension for forgetting research and multimodal system design.
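The abstract does not give PPCP's scoring formulas, so the following is only a minimal sketch of how a paired, per-knowledge-unit pathway gap could be computed. Everything in it is a hypothetical illustration, not the authors' code: the `ItemScores` record, the three accuracy fields (one per PPCP phase), the gain normalization, and the assumption that text and audio lists are aligned so that index `i` is the same knowledge unit in both pathways.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ItemScores:
    """Hypothetical accuracies for one knowledge unit on one pathway."""
    baseline: float          # phase 1: matched pathway baseline
    after_activation: float  # phase 2: after symmetric supervision
    after_pressure: float    # phase 3: after identical forgetting pressure


def forgetting(s: ItemScores) -> float:
    # Absolute drop caused by the forgetting-pressure phase.
    return s.after_activation - s.after_pressure


def gain_normalized_forgetting(s: ItemScores) -> float:
    # Fraction of the activation gain that was lost (assumed normalization).
    gain = s.after_activation - s.baseline
    if gain <= 0:
        raise ValueError("no activation gain to normalize by")
    return forgetting(s) / gain


def pathway_gap(text: List[ItemScores], audio: List[ItemScores],
                normalized: bool = False) -> float:
    """Mean paired difference: text-pathway minus audio-pathway forgetting.

    A positive value means text-acquired knowledge was forgotten more.
    """
    if len(text) != len(audio):
        raise ValueError("both pathways must cover the same knowledge units")
    f = gain_normalized_forgetting if normalized else forgetting
    return mean(f(t) - f(a) for t, a in zip(text, audio))


text = [ItemScores(0.5, 1.0, 0.5), ItemScores(0.5, 0.75, 0.5)]
audio = [ItemScores(0.5, 1.0, 0.75), ItemScores(0.5, 0.75, 0.75)]
print(pathway_gap(text, audio))  # 0.25
```

Pairing items across pathways, rather than comparing two independent averages, keeps the knowledge target fixed, which is the property the abstract attributes to the music setting.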


02.

arXiv (cs.CV) 2026-06-16 arXiv ID: 2606.15869

Metis: A Generalizable and Efficient World-Action Model for Autonomous Driving and Urban Navigation

Authors:

Jingyu Li, Zhe Liu, Dongnan Hu, Junjie Wu, Zipei Ma, Wenxiao Wu, Chao Han, Zhihui Hao, Zhikang Liu, Kun Zhan, Jiankang Deng, Xiatian Zhu, …

World action models (WAMs) have shown great promise for autonomous driving and urban navigation. Existing approaches, built upon Vision-Language-Action models or video generation models, suffer from two key limitations: (1) high inference latency due to predicting future observations at test time, and (2) tightly coupled video and action modeling, which leads to representational mismatch and degraded generalization. To address both issues, we propose Metis, an end-to-end WAM framework that decouples video generation and action prediction. Specifically, Metis employs a Mixture-of-Transformers architecture with dedicated experts for video generation and action prediction, preserving the intrinsic distributional properties of each task. To enhance efficiency, we introduce an asymmetric attention mask that enables joint training of both experts while allowing the action model to bypass explicit video generation during inference. This design ensures training-inference consistency and significantly reduces computational cost without compromising planning performance. Extensive experiments demonstrate state-of-the-art performance on the NAVSIM navhard and navtest benchmarks and on the CityWalker navigation benchmark, validating both the generalizability and the efficiency of the approach across diverse tasks. Real-robot deployments further confirm its practical feasibility.
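The abstract does not specify Metis's mask layout, so the following is a minimal sketch of one way an asymmetric attention mask can let an action branch skip video generation at inference. The token order `[context | video | action]` and the `ALLOWED` table are hypothetical assumptions, not the paper's design. In this sketch, video tokens may attend to context and action tokens, while context and action tokens never attend to video tokens. As a result, action outputs do not depend on whether video tokens are present at all.

```python
from typing import List

# Hypothetical attention rules: query group -> key groups it may attend to.
ALLOWED = {
    "context": {"context"},                   # observations never read video
    "video": {"context", "video", "action"},  # video expert may condition on actions
    "action": {"context", "action"},          # action expert never reads video
}


def token_groups(n_ctx: int, n_video: int, n_action: int) -> List[str]:
    # Assumed sequence order: [context | video | action].
    return ["context"] * n_ctx + ["video"] * n_video + ["action"] * n_action


def asymmetric_mask(n_ctx: int, n_video: int, n_action: int) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j."""
    groups = token_groups(n_ctx, n_video, n_action)
    return [[key in ALLOWED[query] for key in groups] for query in groups]


# Training: all three groups are present and trained jointly.
train_mask = asymmetric_mask(n_ctx=2, n_video=3, n_action=2)

# Inference: video tokens are dropped entirely.
infer_mask = asymmetric_mask(n_ctx=2, n_video=0, n_action=2)

# The action rows of the training mask, restricted to the surviving
# (context + action) columns, match the inference mask's action rows.
keep = [0, 1, 5, 6]
train_action_rows = [[train_mask[i][j] for j in keep] for i in (5, 6)]
print(train_action_rows == infer_mask[2:])  # True
```

Because neither context nor action tokens ever read video keys, their representations are the same at every layer with or without video tokens. This is one way to obtain the training-inference consistency the abstract describes.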

