论文广场 - AcademicHub

01.

Nature Biotechnology 2026-06-16 DOI: HASH:39dd3b98bf98ac0351f213df1edf9f2e

DeepMind spinout raises $2.1 billion

作者: 未知作者

该条目无摘要（多为勘误、社论或新闻类内容，出版方未提供摘要）

阅读与讨论 → 访问原文 →

02.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2512.05025

RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation

作者:

Nicolas Houdr\'e ↗Diego Marcos ↗Hugo Riffaud de Turckheim ↗Dino Ienco ↗Laurent Wendling ↗Camille Kurtz ↗Sylvain Lobry ↗

Earth observation (EO) data spans a wide range of spatial, spectral, and temporal resolutions, from high-resolution optical imagery to low resolution multispectral products or radar time series. While recent foundation models have improved multimodal integration for learning meaningful representations, they often expect fixed input resolutions or are based on sensor-specific encoders limiting generalization across heterogeneous EO modalities. To overcome these limitations we introduce RAMEN, a resolution-adjustable multimodal encoder that learns a shared visual representation across EO data in a fully sensor-agnostic manner. RAMEN treats the modality and spatial and temporal resolutions as key input data features, enabling coherent analysis across modalities within a unified latent space. Its main methodological contribution is to define spatial resolution as a controllable output parameter, giving users direct control over the desired level of detail at inference and allowing explicit trade-offs between spatial precision and computational cost. We train a single, unified transformer encoder reconstructing masked multimodal EO data drawn from diverse sources, ensuring generalization across sensors and resolutions. Once pretrained, RAMEN transfers effectively to both known and unseen sensor configurations and outperforms larger state-of-the-art models on the community-standard PANGAEA benchmark, containing various multi-sensor and multi-resolution downstream tasks. Our code and pretrained model are available at https://github.com/nicolashoudre/RAMEN.

阅读与讨论 → 访问原文 →

03.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16093

Long-Context Modeling via GSS-Transformer Hybrid Architecture with Learnable Mixing

作者:

Kuzey Torlak ↗H\"useyin Arda Arslan ↗An{\i}l Dervi\c{s}o\u{g}lu ↗Beyza Nur Deniz ↗Onur Boyar ↗

Modeling long-range dependencies remains a central challenge in natural language processing. Transformer architectures achieve strong performance via self-attention but scale quadratically ($O(N^2)$) with sequence length, while State Space Models (SSMs) scale linearly ($O(N)$) but suffer from a selective recall bottleneck, struggling to retrieve precise information from compressed states. This creates a fundamental tradeoff between efficiency and perplexity. To tackle these challenges, we propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFNs) as independent parallel branches fused by a learnable mixing mechanism. Instead of forcing SSMs to approximate attention or serializing the two paradigms, PHA allows each branch to specialize: GSS captures global context, while attention performs selective retrieval, with FFN providing complementary processing. On WikiText-103, PHA achieves 16.51 PPL at 125M parameters, outperforming Hedgehog (16.70) and H3-125M (23.70). Scaling to 180M parameters yields 16.42 PPL, which gives comparable results with the pure attention baseline while delivering 24\% higher throughput and up to 40\% lower memory usage at long contexts. On OpenWebText, our 125M model achieves 19.72 PPL, outperforming standard Transformers (20.60) and GSS hybrid baselines (19.80). These results demonstrate that separating sequence modeling paradigms into parallel specialists enables Transformer-level perplexity with substantially improved efficiency for long-context language modeling.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2602.06154

MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models

作者:

Nurbek Tastan ↗Stefanos Laskaridis ↗Karthik Nandakumar ↗Samuel Horvath ↗

Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between accuracy and computation in an MoE model typically exhibits large discontinuities. We propose Mixture of Slimmable Experts (MoSE), an MoE architecture in which each expert has a nested, slimmable structure that can be executed at variable widths. This enables conditional computation not only over which experts are activated but also over how much of each expert is utilized. Consequently, a single pretrained MoSE model can support a more continuous spectrum of accuracy-compute trade-offs at inference time. We present a simple and stable training recipe for slimmable experts under sparse routing, combining multi-width training with standard MoE objectives. During inference, we explore strategies for runtime width determination, including a lightweight test-time training mechanism that learns how to map router confidence/probabilities to expert widths under a fixed budget. Experiments on GPT-style models, various routing regimes, zero-shot downstream reasoning benchmarks, and continual pre-training adaptation of DeepSeek model show that MoSE matches or improves standard MoE at full width and consistently shifts the compute-quality frontier toward lower inference FLOPs. The code can be found at: https://github.com/tnurbek/mose.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2602.17315

Flickering Multi-Armed Bandits

作者:

Sourav Chakraborty ↗Amit Kiran Rege ↗Claire Monteleoni ↗Lijun Chen ↗

arXiv:2602.17315v3 Announce Type: replace-cross Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's current choice. We formalize these constraints through stochastically evolving graphs where actions are limited to local neighborhoods. This mobility-constrained structure imposes a dual challenge: the statistical requirement of information acquisition and the physical overhead of navigation. We analyze FMAB under i.i.d. Erdős–R'enyi and Edge-Markovian process, proposing a two-phase lazy random walk algorithm for robust exploration. We establish high-probability sublinear regret bounds and prove near-optimality via a matching information-theoretic lower bound. Our results characterize the intrinsic cost of learning under local-move constraints, complemented by a robotic disaster-response simulation.

阅读与讨论 → 访问原文 →

06.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.16918

Controlled Quantum Metrology with Anisotropic Heisenberg Spin Interactions under Intrinsic Decoherence

作者:

S. K. Singh ↗Jia-Xin Peng ↗Y-J Zhu ↗Mohammad Khalid ↗

arXiv:2606.16918v1 Announce Type: new Abstract: We theoretically investigate quantum parameter estimation in a two-qubit anisotropic Heisenberg spin system with Dzyaloshinskii-Moriya (DM) interaction in the presence of intrinsic decoherence described by the Milburn model. Using the Quantum Fisher Information (QFI), we study the estimation of both the uniform magnetic field and the DM interaction strength. Analytical expressions for the time-evolved density matrix are obtained and used to explore the effects of exchange anisotropy, intrinsic decoherence, and probe-state preparation on the achievable estimation precision. Our results show that suitable tuning of the anisotropic exchange coupling and the initial entangled state can considerably enhance the estimation performance, with different optimal parameter regimes emerging for magnetic-field and DM-interaction sensing. To better understand the role of quantum resources in metrology, we also examine the behaviour of concurrence, quantum coherence, and von Neumann entropy. Overall, our findings demonstrate that anisotropic Heisenberg spin systems with DM interaction provide a promising and flexible platform for high-precision quantum metrology even in the presence of intrinsic decoherence.

阅读与讨论 → 访问原文 →

07.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.19170

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

作者:

Shiho Matta ↗Yin Jou Huang ↗Fei Cheng ↗Takashi Kodama ↗Hirokazu Kiyomaru ↗Yugo Murawaki ↗

We introduce Dango, a 1.8B-parameter large language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition (SLA). While previous studies have explored SLA in language models, they have predominantly relied on smaller or non-decoder models, limiting their ability to generate open-ended text and reducing their suitability as practical L2 simulators. We identify a key challenge when scaling models to this size: L2 contamination within the "monolingual" pretraining corpus used for L1 acquisition. To address this, we propose a filtering method to reduce premature exposure to English while preserving realistic, minimal exposure. We then fine-tune the model on LLM-generated L2-learning lessons to simulate the L2 acquisition process. Our evaluations confirm that Dango develops human-like L2 production patterns, outperforming both unfiltered and standard multilingual baselines. We release the model, data, and code to facilitate reproducible computational SLA research and learner-facing applications.

阅读与讨论 → 访问原文 →

08.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2606.13587

Towards Effective Waste Segmentation for Automated Waste Recycling in Cluttered Background

作者:

Mamoona Javaid ↗Mubashir Noman ↗Abdul Hannan ↗Shah Nawaz ↗Mustansar Fiaz ↗Sajid Ghuffar ↗

Rapid expansion of urban areas and population growth is causing an immense increase in waste production, which demands the need for efficient and automated waste management. In this scenario, automated waste recycling (AWR) using deep learning methods can assist humans in optimal waste management. Recent deep learning approaches for AWR provide promising waste segmentation performance, however, these methods rely on large backbone networks that are inefficient for AWR systems and suffer from performance deterioration in cluttered scenes. To this end, an optimal waste segmentation network is introduced which effectively utilizes the spatial domain to capture localized structural dependencies and the spectral domain to efficiently extract global contextual relationships. This cascaded design allows the network to progressively leverage both local and global representations across complementary domains to highlight the semantic information necessary for effective segmentation of various waste objects. Furthermore, auxiliary feature enhancement module (AFEM) is introduced to enhance the target objects' boundaries and blob amplification for better segmentation in cluttered scenarios. Extensive experimentation on ZeroWaste-aug, ZeroWaste-f and SpectralWaste datasets reveals the merits of the proposed method.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2512.00336

MVAD: A Benchmark Dataset for Multimodal AI-Generated Video-Audio Detection

作者:

Mengxue Hu ↗Yunfeng Diao ↗Changtao Miao ↗Tairui Ge ↗Taize Ge ↗Zhiqing Guo ↗Jianshu Li ↗Zhe Li ↗Zhongjie Ba ↗Joey Tianyi Zhou ↗

The rapid advancement of AI-generated multimodal video-audio content has raised significant concerns regarding information security and content authenticity. Existing synthetic video datasets predominantly focus on the visual modality alone, while the few incorporating audio are largely confined to facial deepfakes–a limitation that fails to address the expanding landscape of general multimodal AI-generated content and substantially impedes the development of trustworthy detection systems. To bridge this critical gap, we introduce the Multimodal Video-Audio Dataset (MVAD), the first comprehensive dataset specifically designed for detecting AI-generated multimodal video-audio content. Our dataset exhibits three key characteristics: (1) genuine multimodality with samples generated according to three realistic video-audio forgery patterns; (2) high perceptual quality achieved through diverse state-of-the-art generative models; and (3) comprehensive diversity spanning realistic and anime visual styles, four content categories (humans, animals, objects, and scenes), and four video-audio multimodal data types. Our dataset will be available at https://github.com/HuMengXue0104/MVAD.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17872

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

作者:

Ning Ni ↗Yingjie Lao ↗

arXiv:2606.17872v1 Announce Type: cross Abstract: Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since scaling pre-trained language models improves downstream capability [zhao2023survey], the key-value (KV) cache becomes a dominant inference bottleneck. Recent KV cache compression methods [jo2025fastkv,li2024snapkv,zhou2024dynamickv] reduce this cost by retaining only a subset of attention-relevant tokens. However, while these approaches preserve accuracy on benign workloads, their compression policies either fail to defend against jailbreak attacks [jiang2024robustkv] or degrade safety alignment under aggressive eviction. We propose AnchorKV, a drop-in modification to KV cache compression that biases token retention scores away from directions in key space associated with harmful prompts. AnchorKV constructs an offline safety anchor by adapting a difference-of-means representation engineering approach [arditi2024refusal,zou2023representation] to the layer-specific key projection space used in KV caching. Based on this anchor, a soft penalty token selection rule trades a small amount of utility for substantially improved safety alignment, while reducing to the original compressor when the penalty is zero.

阅读与讨论 → 访问原文 →

11.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.17036

Grid-state deformation in a no-jump non-Hermitian bosonic dimer

作者:

B. M. Rodriguez-Lara ↗H. Ghaemi-Dizicheh ↗S. Dehdashti ↗A. Hanke ↗A. Touhami ↗J. N\"otzel ↗

arXiv:2606.17036v1 Announce Type: new Abstract: We study the no-jump evolution of ideal grid states in a lossy bosonic dimer with differential decay. The effective non-Hermitian quadratic dynamics induces a complex symplectic flow in phase space that deforms both the primitive lattice vectors and the origin seed. The average decay rate controls common attenuation, while coherent hopping and differential decay control the reduced dimer deformation. The reduced sector contains elliptic, parabolic, and hyperbolic regimes with imaginary spectra, an exceptional point, and real spectra, producing oscillatory, linear, and exponential lattice deformations. Although projected lattice areas can change, the deformation comes from a determinant-one complex symplectic flow on the full four-dimensional phase space. For a Gaussian regularization of the origin seed, we derive the associated complex width matrix and identify the positivity conditions that preserve Gaussian form. For an initial two-mode qunaught product state, the lossless limit recovers the standard beam-splitter generation of a square GKP$+$ Bell pair, while the no-jump dynamics produces its non-Hermitian deformation with a postselection cost set by the no-jump probability.

阅读与讨论 → 访问原文 →

12.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2508.04086

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

作者:

Zhongyi Zhou ↗Kohei Uehara ↗Haoyu Zhang ↗Jingtao Zhou ↗Lin Gu ↗Ruofei Du ↗Zheng Xu ↗Tatsuya Harada ↗

Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data generation. We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual "gradients", and then synthesizes corresponding user queries. This "answer-first" approach led to ToolGrad-500, a dataset generated with more complex tool use, lower cost, and almost 100% pass rate. Experiments show that ToolGrad models outperform those trained on expensive baseline datasets and proprietary LLMs. The ToolGrad source code, dataset, and models are available at https://github.com/zhongyi-zhou/toolgrad.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.12263

VOID: Defeating Unauthorized Mimicry in Latent Diffusion Models

作者:

Chunlin Qiu ↗Ang Li ↗Tianxiao Huang ↗Ruilin Gan ↗Yunjie Ge ↗Shenyi Zhang ↗Huayi Duan ↗Lingchen Zhao ↗Chao Shen ↗Qian Wang ↗

While Latent Diffusion Models (LDMs) have revolutionized visual synthesis, they are increasingly exploited for unauthorized mimicry of individuals. Existing defenses inject deceptive perturbations to steer the generated images toward irrelevant targets. However, this approach hinges on an ungrounded assumption: subtle perturbations can maintain their deceptive efficacy throughout an LDM's extensive generation process. In reality, the model's innate restoration mechanism will remove such perturbations and cause individual identities to re-emerge in the images generated. We propose VOID, a defense framework that overcomes this conundrum by manipulating an LDM's intrinsic stochasticity. VOID perturbs the diffusion pipeline in two novel ways: 1) amplifying the latent encoding errors to shatter an image's semantic structure, and 2) counteracting the target guidance signals to suppress the model's restoration capabilities. This results in a semantic corruption that thwarts any unauthorized mimicry. Notably, the security gain does not come at the price of visual utility, as VOID simultaneously manages to confine perturbations to human-imperceptible regions of protected images. Our comprehensive evaluation of 24 state-of-the-art defenses against 10 mimicry attacks on 5 datasets demonstrates VOID's unprecedented protection power: it increases the average Frechet Inception Distance (FID) from 113 to 365, a 223% improvement over the strongest defense to date.

阅读与讨论 → 访问原文 →

14.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2603.15495

An energy-based uncertainty principle and low-energy state preparation

作者:

Anurag Anshu ↗

arXiv:2603.15495v2 Announce Type: replace Abstract: Preparing low-energy states of many-body Hamiltonians is a central challenge in quantum computing, quantum complexity, and condensed matter physics. Existing approaches often get trapped in suboptimal states such as high-energy eigenstates or, more generally, low-variance states that resist further energy reduction. In this work, we explore a different perspective: instead of optimizing with respect to a single Hamiltonian, we leverage the fact that many systems admit families of Hamiltonians that share similar low-energy subspaces but differ at higher energies. We show that this redundancy can be turned into an algorithmic resource by establishing an energy-based uncertainty principle, which implies that these Hamiltonians cannot simultaneously admit low-variance states at higher energies. This suggests a simple strategy of alternating energy-lowering steps across such Hamiltonians, which we investigate numerically on several models. We also introduce a sparse variant where the uncertainty principle yields quadratically larger variance at higher energies, leading to more pronounced energy change. Overall, this work suggests a range of open questions at the interface of random matrix theory, local Hamiltonians and low-energy state preparation, aimed at understanding when such approaches are practical and how they can be analyzed rigorously.

阅读与讨论 → 访问原文 →

15.

arXiv (CS.LG) 2026-06-17 DOI: arXiv:2606.17803

Continual Self-Improvement with Lightweight Experiential Latent Memories

作者:

Vaggelis Dorovatas ↗Nancy Kalaj ↗Rahaf Aljundi ↗

arXiv:2606.17803v1 Announce Type: new Abstract: Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate whether models can instead learn online from this experience, converting transient computation (reasoning traces) into persistent reusable knowledge, and without external supervision or access to future data. We show that In-Context Learning (ICL) over raw reasoning traces fails to generalize, reflecting a fundamental limitation of token-level reuse: individual traces lack the abstraction needed for transfer, even after refinement (e.g. self-reflection). In contrast, drawing inspiration from recent works on unsupervised reinforcement learning, we find that lightweight per-instance training with self-generated test-time signals (majority voting) as rewards yields substantial gains, often surpassing full-dataset offline training, motivating a shift from raw traces to learned latent representations. Building on this insight, we propose an online method that distills inference-time compute spent on encountered problems into compact modular latent memories capturing the underlying reasoning structure. These memories are stored and retrieved for future inputs, enabling continual improvement while avoiding catastrophic forgetting through modular design. Importantly, our method is highly efficient, parametrized as extremely lightweight soft prompt memories (~0.001% of model parameters) and trained with only a few gradient steps, yet achieving performance competitive with full parametric updates and offline training. Across challenging mathematical reasoning benchmarks, our approach significantly outperforms zero-shot and raw data ICL baselines, while transferring effectively across datasets.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15617

NeRD: Neuro-Symbolic Rule Distillation for Efficient Ontology-Grounded Chain-of-Thought in Medical Image Diagnosis

作者:

Hongxi Yang ↗Yiwen Jiang ↗Siyuan Yan ↗Jamie Chow ↗Eunis Li ↗Charlotte Poon ↗Stephanie Fong ↗Xiangyu Zhao ↗Deval Mehta ↗Yasmeen George ↗Zongyuan Ge ↗

Interpretability is essential for trustworthy medical image diagnosis. However, existing concept-driven interpretable methods have key limitations: Concept Bottleneck Models (CBMs) require scoring all predefined concepts at inference time and for manual intervention, imposing a substantial burden on clinicians, while rationale-based generative approaches often select concepts by class discriminability, which can drift from diagnostic ontologies. To address these issues, we propose Neuro-Symbolic Rule Distillation (NeRD), a framework that produces efficient, ontology-grounded reasoning chains that are sufficient yet non-redundant, without manually crafting diagnostic rules. Experiments on two skin datasets demonstrate strong diagnostic performance and interpretability, and blinded expert evaluation confirms the clinical plausibility of NeRD rationales. Our method further enables a first expert-in-the-loop study for Multimodal Chain-of-Thought-based diagnosis, achieving efficient and effective concept-level intervention.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CL) 2026-06-15 DOI: arXiv:2601.04646

Succeeding at Scale: Enterprise Retrieval Benchmark Construction and Index-Preserving Query Adaptation for Multi-Tenant Search

作者:

Prateek Jain ↗Shabari S Nair ↗Ritesh Goru ↗Prakhar Agarwal ↗Ajay Yadav ↗Yoga Sri Varshan Varadharajan ↗Constantine Caramanis ↗

Large-scale multi-tenant retrieval systems generate extensive query logs but lack curated relevance labels for effective domain adaptation, resulting in substantial underutilized "dark data." This challenge is compounded by the high cost of model updates, as jointly fine-tuning query and document encoders requires full corpus re-indexing, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built via a fully automated pipeline. Candidate generation uses fusion across diverse sparse and dense retrievers, followed by an LLM-as-a-Judge for consistency filtering and relevance labeling. We further study and systematically evaluate index-preserving query-only adaptation strategies that fine-tune only the query-encoder while keeping the document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that parameter-efficient fine-tuning of the query encoder delivers a remarkable quality-efficiency trade-off, enabling scalable and practical enterprise multi-tenant retrieval.

阅读与讨论 → 访问原文 →

18.

arXiv (math.PR) 2026-06-12 DOI: arXiv:2606.13264

Interference Queueing Networks: A Replica Mean-Field Approach in the Symmetric Setting

作者:

Philippe Sarotte ↗Nahuel Soprano-Loto ↗Fran\c{c}ois Baccelli ↗

arXiv:2606.13264v1 Announce Type: new Abstract: We propose a model for evaluating the performance of wireless communication networks beyond the ubiquitous full-buffer assumption, under which every transmitter is always active. The network is represented by N interacting queues arranged on a torus, with homogeneous arrival rate and service rates depending on the activity of neighboring interferers. More precisely, each queue is associated with a transmitter-receiver pair, and its service rate is given by the Shannon capacity, which depends on the corresponding Signal-to-Interference-plus-Noise Ratio (SINR). Since interfering transmitters only emit when their queue is non-empty, the SINR and hence the service rate improves when neighboring queues are empty. We derive the stability region of the system, together with approximations of its stationary distribution and its exponential rate of convergence to stationarity. These approximations are obtained via a replica mean-field limit, for which we establish propagation of chaos and long-time behavior results.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2511.19716

Design Criteria for SGD Preconditioners: Local Conditioning, Noise Floors, and Basin Stability

作者:

Mitchell Scott ↗Tianshi Xu ↗Ziyuan Tang ↗Alexandra Pichette-Emmons ↗Qiang Ye ↗Yousef Saad ↗Yuanzhe Xi ↗

arXiv:2511.19716v2 Announce Type: replace-cross Abstract: Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.

阅读与讨论 → 访问原文 →

20.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16914

Greed Is Learned: Visible Incentives as Reward-Hacking Triggers

作者:

Tong Che ↗Rui Wu ↗

arXiv:2606.16914v1 Announce Type: new Abstract: Deployed agents increasingly act with their reward proxy in view, such as a balance, score, or KPI dashboard. We show that reinforcement learning can make a policy addicted to such a visible self-benefit channel. It chases the displayed payoff across held-out domains, sacrifices the true task to do so, and follows the channel wherever we rewrite it, while policies that never saw the channel stay honest. We call this reward-channel addiction and study it in MoneyWorld, a synthetic sandbox. The addiction can flip a model's safety alignment: trained only on innocuous money tasks with no safety content, the model abandons the safe action it otherwise always takes whenever a dashboard pays for an unsafe one, and reverts to safe once the channel is hidden. This learned bribe replicates across model scales and families. Blindly optimizing super-capable, next-generation AI on KPIs or P\&L can be dangerous for alignment. Greed is learned when following such a channel pays.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2603.12901

A theory of learning data statistics in diffusion models, from easy to hard

作者:

Lorenzo Bardone ↗Claudia Merger ↗Sebastian Goldt ↗

arXiv:2603.12901v2 Announce Type: replace-cross Abstract: While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2510.15966

PISA: A Pragmatic Psych-Inspired Unified Memory System for Enhanced AI Agency

作者:

Shian Jia ↗Ziyang Huang ↗Xinbo Wang ↗Haofei Zhang ↗Mingli Song ↗

arXiv:2510.15966v2 Announce Type: replace Abstract: Memory systems are fundamental to AI agents, yet existing work often lacks adaptability to diverse tasks and overlooks the constructive and task-oriented role of AI agent memory. Drawing from Piaget's theory of cognitive development, we propose PISA, a pragmatic, psych-inspired unified memory system that addresses these limitations by treating memory as a constructive and adaptive process. To enable continuous learning and adaptability, PISA introduces a trimodal adaptation mechanism (i.e., schema updation, schema evolution, and schema creation) that preserves coherent organization while supporting flexible memory updates. Building on these schema-grounded structures, we further design a hybrid memory access architecture that seamlessly integrates symbolic reasoning with neural retrieval, significantly improving retrieval accuracy and efficiency. Our empirical evaluation, conducted on the existing LOCOMO benchmark and our newly proposed AggQA benchmark for data analysis tasks, confirms that PISA sets a new state-of-the-art by significantly enhancing adaptability and long-term knowledge retention.

阅读与讨论 → 访问原文 →

23.

medRxiv (Medicine) 2026-06-17 DOI: HASH:2664068516f4d678a6627786d4adb44f

Nickel and Dimed: How a Common Earth Element is Short-Changing Our Health

作者:

Heitzig ↗Rehkopf ↗

Nickel has been studied for a long time as an environmental contaminant but less so in its connection to population health. It does not announce itself as loudly as its transition metal brethren like mercury and cadmium, but its chemical properties permit it to be deleterious as a low-dose, chronic exposure, particularly among those with immune systems sensitized to it. There is a growing evidence base and vocabulary to discuss nickel's affect on health. However, in the U.S., there are not recent, reliable estimates of the share of the population with a nickel allergy, let alone how much nickel Americans are exposed to through their diet. This paper seeks to close this evidence gap by creating a new dataset of dietary nickel and other heavy metal exposure and assessing how high levels of dietary nickel exposure shape local demand for health care services. We use soil data from the U.S. Geological Survey and data on agricultural product transport from FoodFlows.org to create a county-level dietary nickel exposure index. We then use a large electronic health record database and double machine learning to estimate how demand for primary care services varies across levels of dietary nickel exposure. We find that counties with high nickel exposure experience an increase in the share of primary care office visits for symptoms highly suggestive of nickel poisoning. This result survives multiple hypothesis test corrections and placebo tests. Our research suggests that nickel has harmful effects on individual health whose exposure can be measured at a population level, and is shaping primary care across the U.S.

阅读与讨论 → 访问原文 →

24.

arXiv (math.PR) 2026-06-15 DOI: arXiv:2606.14322

Stability of Synthetic Ricci Curvature Lower Bounds for Inverse Limit Extended Metric Measure Spaces

作者:

Kohei Suzuki ↗Takumi Yokota ↗

arXiv:2606.14322v1 Announce Type: cross Abstract: We show that every Polish extended metric measure space arises as an inverse limit of metric measure spaces up to isomorphism. We then prove that synthetic Ricci curvature lower bounds and several functional inequalities, including the log-Sobolev, Talagrand, Poincaré, and dimension-free Harnack inequalities are stable under inverse limit. We discuss applications to infinite-dimensional spaces, including abstract Wiener spaces and their quotient spaces.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15162

GeoStream: Toward Precise Camera Controlled Streaming Video Generation

作者:

Yizhou Zhao ↗Yifan Wang ↗Xiaoyuan Wang ↗Yushu Wu ↗Hao Zhang ↗Moayed Haji-Ali ↗Rameen Abdal ↗Ashkan Mirzaei ↗Yanyu Li ↗Willi Menapace ↗Laszlo Jeni ↗Sergey Tulyakov ↗…

Accurate interactive camera control is essential for video-based world models, but most existing approaches learn camera motion implicitly, leading to inaccurate control under out-of-distribution trajectories. Explicit geometric conditioning improves controllability, but existing methods are non-autoregressive and rely on a static 3D cache built from an initial frame, which becomes ineffective once the viewpoint moves beyond the original frustum. We propose GeoStream, a framework that enables precise metric-scale camera control in autoregressive streaming video generation. Our method maintains a self-refreshing 3D cache that is periodically updated online from the model's own outputs: we estimate depth from the most recently generated frame, unproject to 3D, and reproject into the target view to produce point reprojections as geometric conditioning for subsequent synthesis. By the same principle, the conditioning seen during training is also rendered from the student's own generated frames, yielding a fully on-policy distillation that naturally aligns the train and inference conditioning distributions. Unlike prior work that uses off-policy condition noising, our approach trains the model against the exact error distribution it encounters at inference, mitigating both standard autoregressive drift and the second-order geometric feedback loop that arises when the cache itself is derived from generated outputs. Quantitative and qualitative results show that our approach substantially improves camera controllability.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络