Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-18

MIDS: Detecting Stealthy Masquerade and Tampering Attacks on CAN Bus via Bidirectional Mamba

arXiv:2606.18599v1 Announce Type: cross Abstract: The Controller Area Network (CAN) protocol is the primary communication standard for Electronic Control Units (ECUs) in modern vehicles, but its lack of encryption and authentication exposes it to a range of security threats. Existing intrusion detection systems are largely tuned to fabrication-style attacks (DoS, fuzzing, ID spoofing realised by frame injection), in which detection signals such as per-ID inter-arrival statistics are readily available. We instead address the harder masquerade setting[b37], in which an internal adversary substitutes a legitimate frame in-situ at its original transmission slot, preserving traffic periodicity and rendering traffic-statistic defences ineffective. We propose the Mamba Intrusion Detection System (MIDS), an innovative dual-stream framework that processes CAN identifiers and payloads in parallel and reconstructs their joint temporal semantics through bidirectional selective state-space modelling. To evaluate MIDS, we collected over 100 million CAN frames from a physical Tesla Model 3 across three driving regimes and synthesised 54 masquerade attack variants spanning ID-only, data-only, and combined modifications. MIDS attains an F1 of 96.94\% on this dataset, exceeding the strongest reproducible baseline by more than 8 percentage points, while sustaining a 1.147~ms single-window inference latency – ample headroom for real-time onboard deployment. To verify generalisation, we further evaluate MIDS on four public benchmarks (ROAD, CrySyS, OTIDS, CT\&T) covering both masquerade and injection scenarios; MIDS attains F1 from 93.70\% to 99.61\%, outperforming the strongest of eight reproduced baselines by up to 13.94 percentage points under a unified 5-fold protocol.

02.
arXiv (CS.LG) 2026-06-16

Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games with Average Reward

arXiv:2606.16759v1 Announce Type: new Abstract: We study inverse reinforcement learning for discrete-time, infinite-horizon mean-field games (MFGs) under an average-reward criterion. Expert demonstrations are assumed to arise from a stationary mean-field equilibrium under an unknown reward, and the goal is to recover a policy explaining the observed behaviour via the maximum causal entropy principle. We formulate the inverse problem by enforcing consistency with the expert mean-field term and long-run feature expectations, treating two reward classes within a unified occupation-measure framework. For finite-dimensional linear rewards, we give a convex dual reformulation with an explicit log-partition objective, and prove smoothness and curvature properties justifying constant-step-size gradient descent. For infinite-dimensional RKHS rewards, we develop a Lagrangian relaxation whose inner-maximising policy is characterised by a soft Bellman equation. The main obstacle is the absence of a discount-factor contraction. We resolve this by introducing a minorisation-based sub-stochastic kernel that yields a strict contraction of the soft Bellman operator. We establish Fréchet differentiability and Lipschitz smoothness of the log-likelihood score, leading to a gradient ascent algorithm with convergence guarantees. Two numerical examples, a malware-spread MFG and an RKHS-based consumer-choice model, show that the recovered policies closely match expert behaviour.

03.
arXiv (CS.LG) 2026-06-16

Assessing Predictive Models for Fairness Based on Movement Patterns

arXiv:2605.23234v3 Announce Type: replace Abstract: Assessing the spatial fairness of predictive models involves establishing whether they are statistically penalizing (favoring) individuals associated with certain geographical locations. Literature on this topic makes the fundamental assumption that each individual is assigned to a single geographical location (e.g., place of residence). However, fairness with respect to the set of locations where one has been, i.e., their movement patterns over different regions, also matters when fairness is considered. Consequently, we argue that it is necessary to generalize the notion of spatial fairness to also include movement patterns, leading to the novel problem of assessing predictive models for fairness relative to the movements of individuals. To deal with this problem, we propose an approach that first associates the movements of individuals to certain geographic regions, considering multiple spatial partitions with different resolutions and alignments, and then employs a suitable spatial scan statistic to assess whether a predictive model is fair based on movement patterns. In the experimental evaluation, we study the performance of our approach over thousands of synthetic unfair datasets, showing that it is effective at detecting this new type of unfairness and at retrieving the set of objects treated unfairly, while localization performance exhibits a consistent multi-resolution trade-off.

04.
arXiv (CS.AI) 2026-06-19

Class-Incremental Motion Forecasting

arXiv:2603.09420v3 Announce Type: replace-cross Abstract: Motion forecasting enables autonomous vehicles to anticipate scene evolution by predicting the future trajectories of dynamic agents. However, existing approaches typically assume a closed-world setting with a fixed object taxonomy and access to high-quality perception, limiting their applicability in the real world where perception is imperfect, and new object classes may emerge over time. In this work, we introduce class-incremental motion forecasting, a novel setting in which new object classes are sequentially introduced over time and future object trajectories are predicted directly from camera images. We propose the first end-to-end framework for this setting, which adapts to newly introduced classes while mitigating catastrophic forgetting of previously learned ones. Our method generates motion forecasting pseudo-labels for known classes and matches them with 2D instance masks from an open-vocabulary segmentation model. This 3D-to-2D keypoint voting mechanism filters inconsistent and overconfident predictions, while a query feature variance-based replay strategy samples informative past sequences to preserve prior knowledge. Extensive evaluations on nuScenes and Argoverse 2 show that our approach successfully preserves performance on known classes while effectively adapting to novel ones. We further demonstrate zero-shot transfer to real-world driving and show that the framework extends naturally to open- and closed-loop end-to-end class-incremental planning on nuScenes and NeuroNCAP. Code and models will be made publicly available at https://omen.cs.uni-freiburg.de.

05.
arXiv (CS.AI) 2026-06-15

Discovery under Hypothesis Redundancy: A Geometric Theory of Discovery Bottlenecks

arXiv:2606.14386v1 Announce Type: cross Abstract: Scientific discovery saturates when new hypotheses cease to provide independent information, even if the nominal hypothesis space remains large. We study hybrid discovery systems that combine structured local search with LLM-generated non-local proposals and pose the Search Compression Hypothesis: non-local exploration helps only when three geometric conditions co-occur: spectral compression, orthogonal escape from the explored span, and residual signal alignment with the target. We formalize these conditions, derive necessary conditions for hybrid advantage, and test the mechanism in controlled synthetic environments, large-scale A-share factor discovery, and symbolic-regression benchmarks; a public tabular operational sanity check tests the associated budget-allocation implication. Signal-planting and directed-versus-random experiments show that novelty alone is insufficient: random orthogonal jumps expand coverage but do not improve yield without predictive alignment. Across compression sweeps, real factor archives, and LLM-SRBench tasks, hybrid gains concentrate in weakly represented but target-bearing directions and vanish as the hypothesis space approaches full rank. The framework turns LLM-guided discovery from generic novelty search into a diagnostic procedure for deciding when directed non-local exploration is warranted.

06.
arXiv (CS.CL) 2026-06-16

The Art of Mixology: Mixup-based Obfuscation for Privacy-Preserving Split Learning in Large Language Models

Split learning provides a practical paradigm for resource-constrained users to train Large Language Models (LLMs) by offloading computation-intensive layers to a server while keeping raw data local. However, existing privacy-preserving split learning methods still face a difficult trade-off among utility, privacy, efficiency, and stability. Specifically, these methods often suffer from substantial utility degradation, remain vulnerable to advanced data reconstruction attacks, incur prohibitive computational and communication overhead, or exhibit unstable performance across different tasks. In this paper, we propose MIXGUARD, a novel mixup-based privacy-preserving split learning framework for LLMs. MIXGUARD introduces token-level obfuscation, representation-level obfuscation, and adaptive gradient perturbation mechanisms, which operate jointly to preserve useful learning signals while preventing privacy leakage to the server. Technically, MIXGUARD first constructs a lightweight calibration model on a public dataset to refine the approximated target representation, and then applies this model during privacy-preserving fine-tuning on private data. We conduct extensive experiments on four classification tasks and four text generation tasks across multiple LLM families, model sizes, architectures, and fine-tuning strategies. The results show that MIXGUARD preserves model utility comparable to non-split training baselines, consistently achieves stronger privacy protection than existing split learning defense methods against state-of-the-art data reconstruction attacks, and remains robust under adaptive attack settings.

07.
arXiv (CS.CL) 2026-06-19

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.

08.
arXiv (CS.LG) 2026-06-16

Coercivity and Local Convergence of Physical Learning in Linear Circuits

arXiv:2606.15443v1 Announce Type: cross Abstract: Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods – Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) – for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.

09.
arXiv (CS.AI) 2026-06-17

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

arXiv:2510.01359v2 Announce Type: replace-cross Abstract: Code-capable large language model (LLM) agents are embedded in software engineering workflows where they can read, write, and execute code, raising "jailbreak" stakes beyond text-only settings. Prior evaluations emphasize refusal or harmful-text detection, leaving open whether agents compile and run malicious programs. We present JAWS-Bench (Jailbreaks Across WorkSpaces), a benchmark spanning three escalating workspace regimes mirroring attacker capability: empty (JAWS-0), single-file (JAWS-1), and multi-file (JAWS-M). We pair this with a hierarchical, executable-aware Judge Framework that tests (i) compliance, (ii) attack success, (iii) syntactic correctness, and (iv) runtime executability, to measure deployable harm. Across seven LLM backends from five families, prompt-only attacks in JAWS-0 achieve 61% compliance; 58% are harmful, 52% parse, and 27% run end-to-end. In JAWS-1, compliance reaches ~100% for stronger models with a mean ASR (Attack Success Rate) ~71%; JAWS-M raises mean ASR to ~75%, with 32% runnable attack code. Wrapping an LLM in an agent increases ASR by 1.6$\times$, by overturning initial refusals during planning and tool use. Similar trends hold for OpenHands, SWE-Agent, and OpenAI Codex, suggesting our JAWS-Bench is agent-agnostic. Category analyses identify which attack classes are most vulnerable and deployable, motivating execution-aware defenses and refusal-preserving agent designs.

10.
arXiv (CS.LG) 2026-06-16

ANCHOR: Error-Controlled Adaptive Numerical Correction for Neural Operator Time Marching

arXiv:2512.19643v2 Announce Type: replace Abstract: Numerical simulation of time-dependent partial differential equations (PDEs) is central to scientific and engineering applications, but high-fidelity solvers are often prohibitively expensive for long-horizon or time-critical settings. Neural operator (NO) surrogates offer fast inference across parametric and functional inputs; however, most autoregressive NO frameworks remain vulnerable to compounding errors, and ensemble-averaged metrics provide limited guarantees for individual inference trajectories. In practice, error accumulation can become unacceptable beyond the training horizon, and existing methods lack mechanisms for online monitoring or correction. To address this gap, we propose ANCHOR (Adaptive Numerical Correction for High-fidelity Operator Rollouts), an online, instance-aware hybrid inference framework for stable long-horizon prediction of nonlinear, time-dependent PDEs. ANCHOR treats a pretrained NO as the primary inference engine and adaptively couples it with a classical numerical solver using a physics-informed, residual-based error estimator. Inspired by adaptive time-stepping in numerical analysis, ANCHOR monitors an exponential moving average (EMA) of the normalized PDE residual to detect accumulating error and trigger corrective solver interventions without requiring access to ground-truth solutions. We show that the EMA-based estimator correlates strongly with the true relative L2 error, enabling data-free, instance-aware error control during inference. Evaluations on six canonical PDEs: 1D and 2D Burgers', 2D Allen-Cahn, 2D Cahn-Hilliard, 2D Navier-Stokes, and 3D heat conduction, demonstrate that ANCHOR reliably bounds long-horizon error growth, stabilizes extrapolative rollouts, and significantly improves robustness over standalone neural operators, while remaining substantially more efficient than high-fidelity numerical solvers.

11.
arXiv (CS.AI) 2026-06-19

Context-Aware Hierarchical Bayesian Modeling of IVF Laboratory Environmental Conditions

arXiv:2606.20459v1 Announce Type: new Abstract: IVF pregnancy rates are routinely modeled using patient-level variables, while high-resolution laboratory environmental data remain underutilized. We show that this is a missed opportunity. Rather than relying on raw sensor averages, we engineer 55 context-aware temporal features, including rolling thermal stability, simultaneous temperature-humidity adherence, peak stress duration, and post-stress recovery speed, that capture the dynamics of incubator microenvironments. On 61 weeks of data from an Asian IVF clinic, these features reduce cross-validated prediction error to 1.27%, compared to 3-5% for raw averages. We then train a hierarchical Bayesian Beta regression model that shares environmental effects across an Asian and a Northern European clinic via partial pooling, while preserving site-specific baselines. On held-out data from the Northern European clinic, the model achieves R2 = 0.86 and a 64% error reduction for the 35-39 age group over a naive baseline, demonstrating that structured environmental monitoring contains clinically meaningful, transferable signal.

12.
arXiv (CS.LG) 2026-06-16

IBAD: Interpretable Behavioral Anomaly Detection on Human Mobility Data

arXiv:2606.16023v1 Announce Type: new Abstract: Human mobility appears highly diverse, yet much of a person's daily mobility can be explained by a small set of recurring behavioral templates, such as commuting, school-centered activities, caregiving, nightlife, or errand patterns. We present \texttt{IBAD} (\underline{I}nterpretable \underline{B}ehavioral \underline{A}nomaly \underline{D}etection), a framework that learns interpretable daily mobility templates and represents each individual as a distribution over mixtures of these templates. Rather than focusing on specific locations, IBAD characterizes activities that individuals perform across locations. This approach first discovers global behavioral templates using Latent Dirichlet Allocation (LDA), then employs a hierarchical self-supervised model to learn normal behavior of individuals from their soft behavioral templates. We also introduce a splicing benchmark that creates controlled behavioral mismatches between an individual's historical profile and injected mobility patterns. Experiments on real-world and synthetic datasets show that daily behavior can be effectively decomposed into a small number of interpretable templates. Crucially, we show that the learned behavioral archetypes transfer across distinct geographic and demographic contexts. Furthermore, IBAD maintains a robust competitive performance across all settings. For reproducibility purposes, the code is accessible at ~\href{https://github.com/USC-InfoLab/IBAD}{https://github.com/USC-InfoLab/IBAD}.

13.
Nature Medicine 2026-06-17

General-purpose chatbots outperform clinical AI tools on physicians’ real-world questions

作者: 未知作者

Specialized clinical AI tools are entering medical practice with little independent testing. In a head-to-head evaluation across two public benchmarks and real questions from physicians, three general-purpose frontier large language models outperformed two leading clinical AI tools, which performed no better than Google search AI overview.

14.
arXiv (CS.AI) 2026-06-16

NeuronFabric: A Software Reference Architecture for On-Chip Transformer Training with Local Adam

arXiv:2606.16440v1 Announce Type: cross Abstract: Publicly documented accelerator architectures generally separate training computation from optimizer-state updates or rely on external memory and host orchestration. This paper presents NeuronFabric, a software reference architecture intended for future FPGA and ASIC implementations of transformer training with local Adam updates. A complete C# prototype implements forward pass, backpropagation, and Adam optimization without external machine-learning frameworks. The goal is to validate numerical correctness and memory requirements before hardware implementation. The evaluated model is a 334K-parameter autoregressive transformer (d=88, H=4, f=264, L=4, vocab=256) trained on the Shakespeare corpus. The BF16W configuration achieves evaluation loss 1.5426 after 80K samples, compared with 1.5224 for an FP32 GPU reference, while producing coherent character-level text. The paper introduces BF16W, which stores weights in BF16 while retaining Adam optimizer moments in FP32. This reduces memory requirements for on-chip training. A 334K-parameter FP32 model with Adam moments requires approximately 4.0 MB, matching the BRAM capacity of a Xilinx ZCU102 device. The BF16W variant requires approximately 3.34 MB, leaving memory available for activation storage. We describe the vocabulary-budget constraint observed during earlier experiments, quantify BF16W memory savings, and outline FPGA training as the next stage of development. No FPGA measurements are included in this paper. This publication serves as a public architectural disclosure and software reference implementation for future FPGA and ASIC exploration of the NeuronFabric architecture.

15.
arXiv (CS.LG) 2026-06-16

From Physics to Representation: Audio Learning with Synthetic Pre-training via Procedural Generation

arXiv:2606.14791v1 Announce Type: cross Abstract: Self-supervised learning advances audio representation for multimedia analysis. However, prevailing data-centric approaches rely on massive real-world corpora, increasing training costs, curation burdens, and privacy barriers. To address this, we present AudioPG, a procedural synthesis framework eliminating real audio recordings during pre-training. AudioPG trains a Transformer-based masked autoencoder on waveforms generated on-the-fly from basic acoustic primitives and composition rules. The encoder transfers effectively to real audio benchmarks, achieving 90.60% accuracy on ESC-50, 0.546 mAP on FSD50K, 88.17% on UrbanSound8K, and 97.03% on Speech Commands V2. Notably, pre-training completes in under 20 minutes on a single GPU. Latent space analysis reveals physical factors, including fundamental frequency and relative intensity, emerge in orthogonal subspaces, making representations linearly decodable. These results establish procedural synthesis as an efficient, interpretable pre-training signal when large-scale corpora are unavailable. Our code is available at: https://github.com/Freyliu0516/audioPG.

16.
arXiv (quant-ph) 2026-06-16

Exactly Solvable Quantum Model with Spin-Dependent Coulomb Interaction

arXiv:2501.05103v5 Announce Type: replace Abstract: In this work, we report an exactly solvable quantum model featuring a spin-dependent Coulomb interaction, described by the spin vector potential \(\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2\) together with a Coulomb-type scalar potential \(\varphi = \kappa / r\) . The model is governed by the Schrödinger-type Hamiltonian \(\mathcal{H}_S = \vec{\Pi}^2 / (2M) + q \varphi\) in nonrelativistic quantum mechanics and by the Dirac-type Hamiltonian \(\mathcal{H}_D = c \vec{\alpha} \cdot \vec{\Pi} + \beta M c^2 + q \varphi\) in relativistic quantum mechanics, where \(\vec{\Pi} = \vec{p} - (q/c)\vec{\mathcal{A}}\) is the canonical momentum. We demonstrate two main results: (i) Just as the Coulomb-type scalar potential \(\mathcal{S}_Maxwell = \{\vec{\mathcal{A}} = 0,\ \varphi = \kappa / r\}\) is a local exact solution of Maxwell's equations on $r\neq0$, the gauge potential \(\mathcal{S}_YM = \{\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2,\ \varphi = \kappa / r\}\) constitutes a local exact solution of the Yang–Mills equations on the punctured region $r\neq0$. (ii) Both Hamiltonians \(\mathcal{H}_S\) and \(\mathcal{H}_D\) can be solved exactly in the presence of this spin-dependent Coulomb interaction. The resulting energy spectra are derived, and they naturally reduce to those of the ordinary hydrogen atom when the spin-dependent terms are neglected. Finally, we clarify the quantization conditions and the fixed-background interpretation of the model.

17.
arXiv (CS.CV) 2026-06-15

Rotation-Invariant Spherical Watermarking via Third-Order SO(3) Representation Coupling

Reliable watermarking of panoramic imagery is fundamentally challenged by arbitrary 3D rotations. As panoramas are defined on the sphere, they naturally transform under the action of $SO(3)$, rendering conventional planar representations and augmentation-based robustness strategies inadequate and devoid of theoretical guarantees. To address this, we formulate panoramas as spherical signals and leverage $SO(3)$ representation theory to derive provably rotation-invariant descriptors. While spherical harmonic coefficients transform equivariantly under rotations, the natural invariant constructions are typically limited to zeroth-order statistics which eliminate directional information and severely constrain embedding capacity. In this work, we introduce a principled third-order invariant construction by coupling higher-order $SO(3)$ irreducible representations via tensor products and projecting onto the trivial representation. This yields a spherical invariant bispectrum that preserves phase information while remaining strictly rotation-invariant. Leveraging this property, we embed watermarks into higher-order spherical harmonic coefficients and recover them from invariant bispectral scalars, enabling reliable extraction under arbitrary 3D rotations. We provide a theoretical proof of $SO(3)$ invariance for it and demonstrate experimentally its near-perfect robustness to continuous rotations while maintaining high visual fidelity.

18.
Nature (Science) 2026-06-10

In situ nanocrystal confinement for efficient blue perovskite LEDs

Metal halide perovskites have emerged as promising semiconductors for light-emitting diodes (LEDs) owing to their excellent luminescence properties1. However, their performance remains limited, primarily owing to the inherent contradiction between ‘high crystallinity’ and ‘small size’ in the in situ synthesis of perovskite nanocrystals on substrates. Here we report efficient blue perovskite LEDs (PeLEDs) achieved via in situ polymerization-driven nanocrystal confinement to synthesize perovskite films composed of high-quality nanocrystals. The in situ-formed polymer network imposes nanoscale spatial constraints during perovskite nanocrystal growth, enabling nanocrystals with small sizes and a high photoluminescence quantum yield of 83%. Furthermore, polymerizable monomers with sufficient coordination sites allow a prolonged lattice rearrangement of perovskite clusters, promoting the crystallinity of the nanocrystals. The synthesized perovskite nanocrystals are utilized in the fabrication of PeLEDs, resulting in an external quantum efficiency of 21.8% at 491 nm, which is among the highest performances in blue PeLEDs. This work simultaneously controls the thermal dynamics of perovskite crystallization and organic ligand reactions, which helps to advance understanding of the effect of ligand engineering on nanocrystal synthesis, benefiting the development of efficient PeLEDs and other optoelectronic technologies. Efficient blue perovskite light-emitting diodes with an external quantum efficiency of 21.8% are achieved through in situ polymerization-driven nanocrystal confinement.

19.
arXiv (CS.LG) 2026-06-12

A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

arXiv:2606.13565v1 Announce Type: new Abstract: Discrete diffusion models offer a simple and stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. Principled reward-guided fine-tuning for any-length discrete diffusion, however, remains largely unexplored. We introduce Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding (A2D2), a unified framework for reward-guided fine-tuning of any-length discrete diffusion models via joint optimization of the insertion and unmasking policies together with a quality-based inference schedule. We derive the Radon-Nikodym derivative for the joint insertion-unmasking path measures, enabling theoretically guaranteed convergence to the intractable reward-tilted sequence distribution without requiring target samples. Building on this, we establish unmasking and insertion quality as tractable approaches for minimizing decoding error and introduce the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates the reward-tilted distribution. Empirically, A2D2 improves reward optimization while enhancing generation flexibility and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.

20.
medRxiv (Medicine) 2026-06-17

MedAgent: A Retrieval-Augmented Clinical Decision Support Agent with Verifiable Evidence Grounding for Evidence-Based Medicine

Evidence-based medicine demands clinical answers that are not only fluent and medically plausible, but also anchored in traceable evidence, tailored to patient-specific clinical questions, sensitive to the hierarchy of evidence, and respectful of clinical safety boundaries. While general-purpose large language models (LLMs) exhibit strong medical language generation ability, they tend to lean on parametric memory, underuse retrieved evidence, hallucinate citations, conflate evidence levels, and draw conclusions that are not fully supported by the underlying literature. Such limitations pose particular risks in clinical decision support, where answer reliability, evidence traceability, and reasoning consistency are paramount. To address these issues, we present MedAgent, an evidence-based medical agent trained through an end-to-end pipeline that integrates supervised fine-tuning (SFT) cold start, reward modeling, and Group Relative Policy Optimization (GRPO). The agent is designed to execute a structured workflow encompassing clinical question understanding, PICO extraction, evidence retrieval, evidence stratification, citation-grounded answer generation, and quality evaluation. Specifically, a Qwen2.5-14B-Instruct backbone is first cold-started on 200 human-verified agent trajectories, equipping it with tool invocation, PICO parsing, structured response generation, and citation faithfulness. Next, a Qwen2.5-7B reward model is trained on 2{,}099 pairwise preference samples to provide semantic-level quality signals for evidence-based responses. Finally, GRPO reinforcement learning is conducted in a retrieval-augmented agent environment, where every rollout involves real evidence retrieval and is scored jointly by rule-based rewards and reward-model signals. To avoid over-reliance on training rewards, we further construct an independent evidence-based medical evaluation benchmark, MedTrustBench, which contains 200 clinical questions spanning 10 specialties and four difficulty levels. Each question is annotated with standardized PICO elements and rubric-based scoring criteria. The benchmark includes 1{,}187 rubrics across seven dimensions: question relevance, evidence hierarchy, evidence quality and timeliness, evidence-answer consistency, completeness and depth, logical rigor, and medical terminology. Under an identical RAG pipeline, retrieval tool, retrieval configuration, and evaluation protocol, MedAgentv17 attains 78.6 points, outperforming GPT-4.1 (75.3) and approaching GPT-5.4 (80.3). These results show that a 14B domain-aligned model can surpass strong general-purpose baselines on specialized evidence-based medical reasoning, while delivering practical advantages in cost, privacy, controllability, and hospital-oriented private deployment. The model and associated datasets are publicly released at https://www.modelscope.cn/profile/InfoxmedModel

21.
arXiv (CS.CV) 2026-06-16

VANDERER: Map-Free Exploration using Future-Aware and Visual-Curiosity-Guided Diffusion Policy

Mobile agents require efficient exploration strategies to map unseen environments and autonomously plan tasks. Traditional methods rely on generating occupancy maps and optimizing the sequence in which unexplored regions are visited. However, in sensor-constrained settings, such as those limited to monocular cameras, generating accurate occupancy maps is challenging. To address this, we propose VANDERER, an exploration framework that leverages a Visual Curiosity Module (VCM) to guide pre-trained diffusion policies using only monocular image data. This curiosity module predicts the outcomes of proposed actions via a navigation world model and evaluates them through a curiosity cost. The cost then guides the diffusion process toward generating actions that maximize exploration. Evaluated across diverse simulated environments, VANDERER consistently outperforms established baselines, exploring an average of 13.4% more area than NoMaD. Our results reveal a direct correlation between visual and geometric curiosity in outdoor environments, demonstrating that VANDERER can effectively leverage this relationship for efficient exploration using sensor-constrained agents.

22.
arXiv (math.PR) 2026-06-17

Limit theorems for descents and inversions of shelf-shuffles

arXiv:2510.00343v2 Announce Type: replace Abstract: We prove central limit theorems for the number of descents and inversions of permutations produced by shelf-shuffles. These are a model for casino card shuffling machines. We show the asymptotic normality of the number of descents in two limiting regimes depending on the ratio of cards to shelves. On the other hand, we study the inversions by employing a modification of the techniques from Islak's analysis of the statistics of riffle shuffles. In particular, we obtain a bound for the rate of convergence for inversions that is independent of the number of shelves.

23.
arXiv (CS.CL) 2026-06-11

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the computational graphs of precompiled, preoptimized LLMs. As a result, neither is fully supported in high-throughput engines like vLLM. We propose fine-tuning with ART (Art-based Reinforcement Training). The method injects information into a frozen Multimodal Large Language Model (MLLM) by optimizing only its raw visual input, thus enabling the soft-token approach on pre-compiled computational graphs. It relies on backpropagation of gradients back into a plain pixel array and thus supports any fine-tuning objective. Moreover, the optimized visual input can be stylized as task-relevant computational artworks. The approach's effectiveness is confirmed for different sizes of a popular open Qwen architecture and for several textual benchmarks. Specifically, ART reaches accuracy competitive with LoRA across mathematics and structured-tool-use benchmarks.

24.
arXiv (CS.LG) 2026-06-16

Physics-conforming Latent Twins

arXiv:2606.15053v1 Announce Type: new Abstract: Surrogate models are central to scientific machine learning, where they enable fast prediction, simulation, inference, and control for complex physical systems. For time-dependent problems, however, accurate interpolation of training trajectories is not sufficient: reliable surrogates should also respect the conservation laws, invariants, admissibility conditions, and dissipative structures that give those trajectories physical meaning. We introduce Physics-conforming Latent Twins, a framework for learning latent surrogate solution operators whose dynamics satisfy selected physical principles by design. The method builds on the Latent Twin formulation by jointly learning an encoder, a decoder, and a latent flow map between arbitrary time-indexed states, while constraining the latent dynamics to preserve or dissipate prescribed structural quantities. We develop a constraint-transfer viewpoint that connects physical structure in the original state space with compatible constraints in latent space, and prove structure-preservation bounds showing how latent enforcement improves control of physical defects after decoding. We also derive algebraic conditions for latent flow maps that preserve linear and quadratic invariants or enforce dissipative inequalities. Numerical experiments on representative ODE and PDE benchmarks demonstrate improved constraint satisfaction, structural fidelity, and qualitative long-time behavior while maintaining accurate surrogate prediction.

25.
arXiv (CS.CL) 2026-06-16

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understanding. We introduce SpatialWorld, a unified benchmark designed specifically for evaluating the interactive spatial understanding of multimodal agents in complex real-world tasks. Integrating eight heterogeneous simulation backends under a shared, simulator-agnostic protocol, SpatialWorld features 760 human-annotated tasks across diverse domains (e.g., household routines, travel, social collaboration). Agents must solve tasks under vision-only partial observability, actively gathering egocentric visual evidence and expressing decisions via a unified, text-based action interface native to MLLMs. For reliable evaluation, each task includes a human-validated initial state, a reference trajectory, and a terminal-state verifier. Evaluating 15 advanced agents reveals that robust spatial task solving remains challenging: the strongest model, GPT-5, achieves an average task success rate (TSR) of only 17.4%, while the leading open-source model, Qwen-3.5, reaches 14.1%. Further analysis exposes a clear mismatch between task success and execution efficiency, alongside substantial domain-specific performance variations. These bottlenecks in active exploration and long-horizon planning position SpatialWorld as a rigorous testbed for future spatial agents.