Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Intermodal entanglement in a quantum optical model of HHG due to the back-action on the driving field

arXiv:2603.01315v2 Announce Type: replace Abstract: Preparation of nonclassical light with special quantum properties is essential for quantum technologies. High-harmonic generation (HHG) is a process which not only enables the creation of attosecond pulses but also has the potential to generate light with intricate quantum properties. In a recent experiment [1], nonclassical inter-harmonic correlations have been measured from a HHG source. In this work, we theoretically investigate entanglement between different harmonics within an effective quantum optical model. This model implements a signifcant degree of simplifcation regarding the processes within the target material, treating the material through susceptibilities, as it is usual in quantum optics. Such an approach yields a general description of HHG, permitting the implications that can be derived within it to hold broadly. We find that entanglement is produced as a result of the often neglected back-action. We can qualitatively reproduce experimentally measured nonclassicalities, which suggests that intermodal entanglement can, to an extent, be considered a universal phenomenon associated with HHG, rather than a result of using specific material targets.

02.
arXiv (CS.LG) 2026-06-12

Extracting Governing Equations from Latent Dynamics via Multi-View Contrastive Learning

arXiv:2606.13260v1 Announce Type: new Abstract: Identifying latent dynamical systems from noisy, high-dimensional measurements is a central problem at the intersection of representation learning, system identification, and scientific discovery. We present DYSCO, a multi-view temporal contrastive learning algorithm that jointly recovers latent trajectories and the governing dynamics from such observations, by leveraging multiple independent noisy views of the same underlying process to disentangle signal from noise. By parameterizing the dynamics in a structured functional basis, our framework further enables symbolic recovery of the governing equations within an affine gauge. We offer theoretical guarantees for strong identification up to an affine indeterminacy, extending prior identifiability results to the realistic setting of noisy nonlinear observations. Empirically, we demonstrate accurate recovery of both latent trajectories and flow fields across a diverse set of dynamical regimes (e.g., chaotic, oscillatory, and metastable) under both Gaussian and Poisson observation noise, the latter being particularly relevant for neural recordings.

03.
arXiv (CS.LG) 2026-06-15

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

arXiv:2606.14130v1 Announce Type: new Abstract: Safe coordination problems surface in multi-agent reinforcement learning when global safety cannot be enforced by any agent unilaterally: the admissibility of one agent's action may depend on the dynamics of other agents. Decentralised shields can enforce safety at runtime, but purely factorised permissions often exclude optimal team behaviour that is safe only through coordination. We study deterministic safety guarantees for agents trained and deployed under decentralised execution, recovering team-optimal safe behaviour without centralised runtime control. Agents have a shared global specification $\phi$ in the safety fragment of Linear Temporal Logic ($\mathsf{LTL}_{\mathsf{safe}}$ ), and select among tuples of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations whose conjunction implies the global specification $\phi$. Each agent may rely on the other agents' local obligations as assumptions because the whole contract tuple is certified simultaneously and allows projection into local action masks. At learning time, a non-stationary multi-armed bandit chooses among a library of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations to select the tuple that optimises team reward, all without forgoing end-to-end safety. We evaluate the approach across 6 environments and 15 algorithmic variants.

04.
arXiv (CS.LG) 2026-06-15

SemPiper: Interactive Code Synthesis for Semantic Operators in Machine Learning Pipelines

arXiv:2606.14361v1 Announce Type: new Abstract: Machine learning (ML) pipelines require extensive data preparation, feature engineering, and integration across heterogeneous sources, making them tedious and error-prone to develop. While large language models (LLMs) have recently shown promise for assisting programming tasks, chat-based interfaces provide limited control over pipeline behavior and often produce code that is difficult to optimize or integrate into production systems. We demonstrate SemPipes, a novel programming model that extends ML pipelines with declarative, LLM-powered semantic data operators. SemPipes allows developers to specify high-level natural language instructions for data-centric operations, while seamlessly combining these operators with arbitrary Python code from standard data science libraries. For the semantic operators, it synthesizes specialized implementations at pipeline training time, conditioned on dataset characteristics and pipeline context, enabling the flexible yet controlled integration of LLM capabilities. We demonstrate SemPipes through SemPiper, an interactive interface that visualizes computational graphs of the pipelines, synthesized operator implementations, and optimization trajectories produced by an evolutionary search procedure. Attendees can explore three end-to-end scenarios, modify pipelines, inspect generated code, and observe how semantic operators are synthesized and iteratively optimized. The demonstration highlights how declarative semantic operators enable controllable, optimizable, and practical integration of LLMs into ML pipeline development.

05.
arXiv (CS.CL) 2026-06-11

Reassessing High-Performing LLMs on Polish Medical Exams: True Competence or Bias-Driven Performance?

Large language models (LLMs) in medicine are mainly evaluated using multiple-choice question answering (MCQA), which can overestimate real clinical ability due to guessing strategies and answer biases. To address these limitations, we introduce an expanded and more challenging benchmark based on Polish medical exams, adding over 15,000 questions, two new domains, and four structural modifications that reduce MCQA-specific artifacts and better test reasoning. We evaluate 21 LLMs and show that evaluation design strongly affects results. Under our harder setup, the best model (Qwen3.5-122B) drops by 28.4 and 31 pp on English and Polish exams, respectively. Despite low evidence of data contamination, standard MCQA scores do not reliably reflect true medical competence. To facilitate further research, we make our benchmark publicly available.

06.
arXiv (CS.LG) 2026-06-12

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

arXiv:2606.13191v1 Announce Type: new Abstract: Continuous-state generative samplers, including diffusion and flow-matching models, evolve through continuous reverse-time dynamics, yet their samples often undergo abrupt qualitative changes: trajectories commit to modes, semantic alternatives collapse, and small perturbations in narrow time windows can produce large downstream effects. This paper develops a geometric account of such phase-transition-like behaviour. We view denoising as gradient descent on a free energy landscape and show that sharp transitions arise near projection caustics, where the nearest-point projection onto the data support ceases to be unique. Motivated by this perspective, we introduce the Critical Boundary Detector (CBD), as practical diagnostics for score-direction instability. Across toy models, standard diffusion models, and latent text-to-image diffusion models, CBD localises mode commitment, predicts intervention-sensitive windows, and supports targeted control in geometrically sensitive regions. Our results connect geometry of data and dynamics of diffusion generation.

07.
arXiv (CS.AI) 2026-06-18

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

arXiv:2603.00656v2 Announce Type: replace Abstract: Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.

08.
medRxiv (Medicine) 2026-06-17

Adverse Childhood Experiences Reorganise the Brain-Personality Network Across the Psychosis Spectrum

Exposure to adverse childhood experiences is a pervasive risk factor for psychosis, exhibiting a linear relationship across the psychosis spectrum from subclinical schizotypal traits to schizophrenia spectrum disorders. While this association is often conceptualised within the vulnerability-stress framework, the systemic mechanisms through which childhood trauma reconfigures the brain-personality interactome remain poorly understood. We examined clinical, neuropsychological, and neuroimaging data from a sample of low- and high-schizotypy individuals, and patients with a diagnosis of schizophrenia spectrum disorder (N=120). Our aim was to map how trauma reconfigures interactions between neurobiology and schizotypal phenomenology. We adopted a mixed graphical model approach to jointly estimate conditional dependencies between childhood trauma, regional brain morphometry, and schizotypal traits across the psychosis spectrum. Our results show that childhood trauma reconfigures the brain-personality network, shifting it from a state driven by cognitive processes to one anchored in emotional (limbic) reactivity. This transition is marked by the increased influence of impulsive traits and a significant strengthening of connections within the salience network. These changes converge with a reduced thickness of the frontal executive regions, the brain's control centres, identified in our models. Collectively, our results suggest a structural phenomenological decoupling, where trauma conditioned affective circuits may bypass weakened top-down regulatory controls. These findings highlight the necessity of using integrative frameworks to capture how trauma fundamentally reshapes the relationship between the brain and schizotypal personality.

09.
arXiv (CS.AI) 2026-06-18

Engagement Intensity as a Learner-Modeling Signal for Adaptive AI Ethics Instruction

arXiv:2606.18548v1 Announce Type: cross Abstract: Adaptive AI ethics instruction in graduate research training benefits from intake measures that reflect differences in prior LLM experience. Prior coursework or workshop attendance is an obvious candidate, but it is not clear whether it is associated with pre-instruction ratings on key AI perception items. We compare three candidate intake features, self-reported usage frequency, self-rated LLM familiarity, and prior AI education, across five baseline perception outcomes in 93 bioscience graduate and postdoctoral trainees enrolled in a required research ethics course. Usage frequency shows Holm-corrected associations with all five outcomes, self-rated familiarity with three, and prior AI education with none. A threshold-like pattern at the lower end of the scale is most visible for training interest and accuracy trust rather than appearing as a uniform gradient across all five outcomes. In a short intake survey, reported LLM use is more consistently associated with these perceptions than prior coursework or workshops, with self-rated familiarity serving as a secondary indicator. These results suggest that simple pre-instruction behavioral signals can inform lightweight intake profiling for adaptive AI ethics education.

10.
medRxiv (Medicine) 2026-06-22

Disentangling adiposity-related and non-adiposity-related genetic pathways for type 2 diabetes

OBJECTIVE To identify circulating proteins associated with type 2 diabetes (T2D) risk through pathways not fully explained by body mass index (BMI), and to assess therapeutic actionability. RESEARCH DESIGN AND METHODS We applied GWAS-by-subtraction within a genomic structural equation model to European ancestry summary statistics for T2D (74,124 cases, 824,006 controls) and BMI (n = 681,275), partitioning T2D liability into BMI-related and BMI-subtracted components. We then performed proteome-wide Mendelian randomization (MR) using cis-protein quantitative trait loci from four plasma proteomics cohorts: ARIC, deCODE, Fenland, and the UK Biobank Pharma Proteomics Project. Prioritized proteins passed sensitivity analyses with alternative MR methods and were supported by colocalization evidence. Tissue-resolution regulatory support was assessed using cis-eQTL colocalization across GTEx and pancreatic islet, subcutaneous adipose, and whole-blood resources. Actionability was evaluated using the druggable genome and Open Targets. RESULTS GWAS-by-subtraction attenuated the genetic correlation between BMI and BMI-subtracted T2D from 0.54 (SE 0.02) to 0.35 (SE 0.02). Proteome-wide MR prioritized 29 proteins for BMI-subtracted T2D. Thirteen showed eQTL colocalization in at least one tissue, implicating liver and intermediary metabolism (GCDH, NOTCH2), pancreatic islet biology (CTRB2, MANBA), adipose and Wnt signaling (RSPO3, GALNT3), and whole blood regulatory signals (PAM, SNUPN). Sixteen proteins were classified within druggable-genome Tiers 1-3, and five had existing Open Targets compounds. CONCLUSIONS Integrating GWAS-by-subtraction, proteome-wide MR, and colocalization nominated 29 proteins associated with T2D liability not fully explained by BMI. These findings highlight genetically supported targets for follow-up studies of T2D therapies that complement weight-centered approaches.

11.
arXiv (CS.CL) 2026-06-17

Perceptual compensation for tonal context in self-supervised speech models

This study examines the extent to which the wav2vec2.0 architecture exhibits evidence of compensation for phonological context. We conducted a pseudo-replication of a perceptional compensation experiment on Mandarin Chinese tones, and compared the embedding similarities and probing classifier outputs between a purely self-supervised pre-trained model and a model fine-tuned for Mandarin ASR. No evidence of compensation was found in the embedding similarities of the purely pre-trained model. Probing classifiers showed some evidence of compensation in addition to the expected layer-wise improvements in categorization, but failed to replicate human performance on isolated test syllables. Our findings contrast with previous reports of sensitivity to phonological structure emerging through pre-training alone, and suggest that supervised objectives may be necessary to encourage the abstraction of at least some types of phonological regularities.

12.
medRxiv (Medicine) 2026-06-17

Brain age gap correlates with DTI-derived microstructural abnormalities in multiple sclerosis.

Background: Brain age gap (BAG) is increased in multiple sclerosis (MS), but whether it reflects microstructural pathology beyond conventional atrophy remains unclear. Objective: To test whether BAG is elevated in MS and correlates with conventional and diffusion tensor imaging (DTI) abnormalities relative to healthy controls. Methods: A case-control study of 43 people with MS and 18 healthy controls was performed. BAG was estimated from T1-weighted MRI using brainageR. Controls were used as MRI reference distributions. MRI values were expressed as deviation z-scores and correlated with BAG within MS. Conventional MRI and DTI domains were analysed using age/sex-adjusted partial correlations with domain-wise Benjamini-Hochberg FDR correction, where appropriate. Results: BAG was higher in MS than controls (4.79 vs -2.58 years; p

13.
arXiv (quant-ph) 2026-06-17

Experimental Characterization and Modeling of Measurement-Induced State-Transitions in a Fluxonium Superconducting Qubit

arXiv:2606.17866v1 Announce Type: new Abstract: Superconducting qubits are most often measured using dispersive readout, which, ideally, implements a projective quantum non-demolition (QND) measurement. While a larger readout drive can increase the signal and, thus, reduce discrimination errors in the readout, strong microwave drives may also cause non-QND errors by driving the qubit to a state outside the computational subspace. In this work, we experimentally characterize measurement-induced state transitions (MIST) in a fluxonium qubit over its full external flux range. We further numerically calculate the MIST errors, and find that the theory accurately predicts eleven experimentally identified regions with increased MIST. In addition to transitions to higher fluxonium levels, we also find that, at certain flux points, MIST errors are dominated by transitions that include the transmission-line-like array modes of the fluxonium's superinductor. The excellent match between theory and experiment validates that the models accurately predict the occurrence of MIST in these systems, and further highlights the influence of array modes in fluxonium readout.

14.
arXiv (CS.CV) 2026-06-15

Toward 360-Degree Indoor Panorama Editing via Tuning-Free Diffusion Model with Refocusing Cross-Attention

Zero-shot text-guided diffusion has significantly advanced image editing; however, its practical usability remains constrained by three persistent challenges: prompt brittleness that requires meticulous prompt engineering, spillover edits that unintentionally affect non-target regions, and failures on small or cluttered objects caused by limited fine-grained supervision in training data. We propose FocusDiff (Target-Aware Refocusing for Tuning-Free Diffusion Editing), a tuning-free framework for precise and region-specific image manipulation based on refocusing cross-attention. Given a target region obtained through automated segmentation or manual selection, FocusDiff applies selective blurring to non-edit areas to guide attention toward the masked region while accurately transferring the object's identity, structure, and appearance to the edited output. Integrated context-preserving modules further ensure background fidelity and global coherence, enabling accurate edits from simple text prompts in a single pass. We also extend FocusDiff to 360-degree indoor panorama editing and demonstrate its effectiveness within virtual reality environments. Extensive experiments on our localized editing benchmark LIMB, comprising 30 multi-object images and 100 annotated examples including challenging small-object cases, show that FocusDiff outperforms existing zero-shot editors in text-image alignment and background preservation, achieving superior precision, photorealism, and usability. The project page is available at https://vdkhoi20.github.io/FocusDiff.

15.
arXiv (quant-ph) 2026-06-16

Counterdiabatic Raman Atom Optics for Compact High-Sensitivity Gravimetry

arXiv:2606.16945v1 Announce Type: new Abstract: Large-momentum-transfer (LMT) atom interferometry provides a route toward enhanced inertial sensitivity in compact quantum sensors, but its scalability is limited by the accumulation of pulse-transfer errors across long Raman pulse sequences. We investigate theoretically the use of stimulated Raman shortcut-to-adiabatic passage (STIRSAP) for high-fidelity LMT atom optics in a Mach–Zehnder interferometer geometry. The counterdiabatic correction is encoded directly into the Raman pulse envelopes, eliminating the need for auxiliary microwave or radio-frequency control fields. Numerical simulations based on an effective Raman model show that $1~\mu\mathrm{s}$ STIRSAP pulses achieve single-pulse transfer fidelities of $F_\pi = 0.99902$ while maintaining negligible pulse-time overhead even at high momentum order. We analyze the resulting tradeoff between interferometric phase enhancement and compound contrast decay and identify an unconstrained shot-noise optimum near $n\approx270$. The analysis further shows that practical operation at extreme LMT order is constrained by wave-packet separation, vibration noise, Doppler detuning, and accumulated systematic effects rather than by pulse duration itself. These results establish superadiabatic Raman control as a promising approach for scalable high-fidelity atom optics and clarify the physical limitations governing compact high-order atom interferometers.

16.
medRxiv (Medicine) 2026-06-22

Cumulative Metabolic Exposure to Hyperglycemia and Risk of Cardiovascular and Limb Events in Peripheral Artery Disease

Background: Although diabetes is a potent risk factor for the development of peripheral artery disease (PAD), the effect of cumulative metabolic exposure to hyperglycemia on risk of cardiovascular or limb events in patients with PAD remains unclear. Methods: The Peripheral Artery Disease: Long-term Survival (PEARLS) is a longitudinal registry of Veterans with newly diagnosed PAD identified using a natural language processing approach. Included patients had ankle brachial index [≤]0.9 or toe brachial index [≤]0.7, and no history of lower extremity revascularization or major amputation. Among patients with diabetes in this cohort, we assessed cumulative exposure to hyperglycema based on a 24-month rolling average of hemoglobin (Hgb) A1c values, categorized as [≤]7%, >7% to [≤]8%, and >8%. Multivariable Cox regression models evaluated the association between categories of HgbA1c, modeled as a time-varying exposure, and risk of cardiovascular (CV: myocardial infarction or stroke) and limb (chronic limb threatening ischemia [CLTI] or major amputation) events. Results: Among 45,109 patients with new diagnosis of PAD and pre-existing diabetes, the mean HgbA1c at baseline was 7.5%, with nearly one-third (30.4%) having HgbA1c >8%. The mean age was 70.4 years, 19.8% were Black and 4% were Hispanic. Patients with baseline HgbA1c >8% were younger and compared to those with HgbA1c [≤]7%, more likely to have coronary disease, kidney disease, and obesity. Over a median follow up of 4.2 years, 8,306 (18.4%) patients experienced a CV event, and 8,199 (18.2%) experienced a limb event. The adjusted association between HgbA1c and hazard of CV events was 12% higher in patients exposed to HgbA1c >7% to [≤]8% (HR 1.12; 95%CI: 1.05-1.18) and 38% higher in those exposed to HgbA1c >8% (HR 1.38; 95%CI: 1.30-1.46), compared to HgbA1c 7% to [≤]8% (HR 1.20; 95%CI: 1.13-1.28) and HgbA1c >8% (HR 1.60; 95%CI: 1.51-1.70), respectively when compared to HgbA1c [≤]7%. These findings were consistent in subgroups based on age and severity of PAD. Conclusions: Among diabetic patients with PAD, cumulatiave metabolic exposure to hyperglycemia is associated with a markedly increased risk of clinical events, especially limb events.

17.
arXiv (CS.LG) 2026-06-16

How to Score Experts for One-Shot MoE Expert Pruning: A Unified Formulation and Selection Principle

arXiv:2606.15716v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) language models reduce per-token computation through sparse expert activation, yet deployment still requires storing the full expert pool, making one-shot expert pruning a practical approach for reducing memory usage. Although effective, existing criteria are largely heuristic, and no single criterion is universally optimal. Thus, establishing a principle for selecting pruning criteria suited to different deployment objectives remains an important yet largely underexplored problem in one-shot expert pruning. To this end, we introduce a unified formulation for one-shot MoE expert pruning organized around three factors: routing frequency, gate weighting, and activation strength. The formulation yields a criteria selection principle: task-agnostic pruning should favor routed-token-averaged, gate-free activation-based criteria, whereas task-specific pruning can benefit from retaining routing-frequency and gate-weight information. Beyond this principle, the formulation also provides a systematic view of existing heuristic criteria and gives rise to two new task-agnostic criteria, Mean Activation Norm (MAN) and Mean Squared Activation Norm (MSAN). Across four representative MoE models and 16 diverse benchmarks, MAN and MSAN are consistently strong in the task-agnostic setting, obtain the top-two average ranks, and improve average performance by up to 8.8 points over the strongest baseline.

18.
arXiv (CS.LG) 2026-06-12

A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

arXiv:2606.13565v1 Announce Type: new Abstract: Discrete diffusion models offer a simple and stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. Principled reward-guided fine-tuning for any-length discrete diffusion, however, remains largely unexplored. We introduce Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding (A2D2), a unified framework for reward-guided fine-tuning of any-length discrete diffusion models via joint optimization of the insertion and unmasking policies together with a quality-based inference schedule. We derive the Radon-Nikodym derivative for the joint insertion-unmasking path measures, enabling theoretically guaranteed convergence to the intractable reward-tilted sequence distribution without requiring target samples. Building on this, we establish unmasking and insertion quality as tractable approaches for minimizing decoding error and introduce the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates the reward-tilted distribution. Empirically, A2D2 improves reward optimization while enhancing generation flexibility and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.

19.
arXiv (CS.CV) 2026-06-11

Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity

Hallucination detection in large language and vision-language models is increasingly framed as selective prediction, where a detector assigns a confidence score and abstains when confidence is low. Unsupervised sampling detectors (Semantic Entropy) avoid labels but plateau in quality, while supervised probes attain stronger in-distribution scores yet degrade sharply when calibration labels are scarce. We recover the response manifold of an LLM as the density ridge of a kernel density estimate built on a six-dimensional kinematic feature map of hidden state generation trajectories. A test generation is scored by the negated Euclidean distance from its projected feature point to the nearest ridge vertex, yielding a low-dimensional geometric skeleton of the stochastic output distribution. We evaluate against Semantic Entropy, topological methods, and log-probability on six QA benchmarks (HaluEval-QA, TriviaQA, GSM8K, POPE, ScienceQA, A-OKVQA) using eight text and vision LLMs in a deliberately label-scarce protocol ($n_{cal}{=}200$ queries, $N{=}5$ generations). Our ridge-based score beats on AUROC with 5-20 points gain, while demonstrating tempered degradation under calibration-label scarcity.

21.
arXiv (CS.CV) 2026-06-16

DLWM: Diverse Latent World Models for Efficient Multimodal Reasoning

Reasoning capabilities of multimodal large language models (MLLMs) have improved considerably in recent years. Existing approaches typically rely on explicit chain-of-thought or continuous latent-space trajectories to enhance multi-step reasoning. However, these methods generally assume that an input admits a single latent interpretation and unfold reasoning along a fixed path or under a uniform computation budget. In real-world multimodal settings, visual observations are often subject to occlusion, blur, viewpoint variation, or semantic ambiguity, giving rise to multiple plausible interpretations. A uniform reasoning strategy not only limits the model's ability to explore multiple hypotheses but also incurs high memory usage and rollout cost. We present DLWM (Diverse Latent World Models), a multimodal reasoning framework that combines latent-space reasoning with reinforcement learning. First, we construct a set of diverse latent world hypotheses in continuous latent space, each capturing a different plausible interpretation of the visual input, and unfold latent reasoning independently on each hypothesis. An orthogonality-based diversity regularizer explicitly prevents hypothesis collapse. Second, we formulate the latent reasoning process as a resource-constrained sequential decision problem and introduce a resource-aware reinforcement learning policy that adaptively allocates computation across hypotheses, dynamically deciding whether to expand, terminate, or merge reasoning paths, thereby substantially reducing memory footprint and improving rollout efficiency. Experiments on multiple multimodal reasoning benchmarks demonstrate that DLWM outperforms existing methods by 2-5 points in accuracy while reducing memory usage by 24%.

22.
Nature (Science) 2026-06-17

A prototype differential atom interferometer for fundamental physics

Gravitational waves and ultralight dark matter are among the most compelling frontiers in fundamental physics, motivating proposals for very-long-baseline atom interferometerssuch as AION1, MAGIS2, AICE3 and AEDGE4 that aim to detect at frequencies at which ground-based5 and space-borne6 laser interferometers lose sensitivity. Very-long-baseline atom interferometers look for signals by comparing the quantum phase evolution of widely separated atomic ensembles interrogated by a common laser. However, their performance depends critically on suppressing noise sources, particularly laser phase noise. The experimental validation of such noise rejection remains an important challenge. Here we demonstrate a prototype differential atom interferometer based on the single-photon clock transition of fermionic 87Sr. Thus, we obtain a gradiometer configuration with a species intrinsically suited to kilometre-scale and space-baseline operation. The instrument operates at the standard quantum limit7 with no excess noise beyond atom shot noise. The differential configuration maintains quantum-limited sensitivity in the presence of several radians of artificially injected laser phase noise per shot, which emulates the conditions expected in a very-long-baseline atom interferometer. We also demonstrate the recovery of coherent oscillatory signals across a broad frequency range under fully phase-randomized conditions, a capability that is inaccessible to a single interferometer operating in the same regime. These results provide an experimental validation of the noise-immune measurement principle underlying very-long-baseline atom interferometers and mark an important step towards next-generation quantum sensors for gravitational-wave detection and searches for ultralight dark matter8,9. A prototype differential atom interferometer operates at the standard quantum limit with no excess noise beyond atom shot noise, achieving performance in line with the specifications for future long-baseline atom interferometers.

23.
arXiv (CS.AI) 2026-06-18

TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

arXiv:2606.18996v1 Announce Type: cross Abstract: Agents are increasingly deployed in document-intensive workflows where sensitive private information is not an edge case but a routine input, e.g., an agent booking a flight needs passport numbers. In such settings, the agent must use private information to complete tasks accurately while never exposing it in its responses, because it cannot verify who is actually at the keyboard. These two obligations are in fundamental tension. A model capable enough to use private information for task completion can, by the same capability, be induced to reveal it. To evaluate the trade-off of task accuracy and privacy leakage, we introduce Task-completion and Resistance to Active Privacy-extraction (TRAP). Each scenario includes a document containing private information, a task query that requires the agent to invoke the correct tool using private fields, and an attack query that attempts to elicit the same information in natural language. Evaluating 22 models spanning frontier proprietary and open-source models at multiple scales, we find that all model families exhibit non-trivial leakage, and that instruction-following ability correlates with leakage rate. Existing prompt-based defenses reduce leakage but at significant cost to task accuracy. Prompt optimization fails to escape this trade-off. We demonstrate that this failure is not incidental. For any softmax-based model, no soft-constraint defense, e.g., prompt-based defenses, can jointly achieve high task success with zero leakage probability. Motivated by this impossibility result, we propose structural private field isolation, which replaces private fields with hash keys before they reach the model. This approach largely prevents leakage while keeping task accuracy.

24.
arXiv (CS.CV) 2026-06-11

A Turbo-Inference Strategy for Object Detection and Instance Segmentation

Object detection and instance segmentation tasks are closely related. Existing top-down instance segmentation methods usually follow a detect-then-segment paradigm, where an initial detector is used to recognize and localize objects with bounding boxes, followed by the segmentation of an instance mask within each bounding box. In such methods, the detection accuracy directly influences the subsequent segmentation performance. However, previous research has seldom explored the impact of the instance segmentation task on object detection. In this paper, we present a turbo-inference strategy for the top-down methods that leverages the complementary information between detection and segmentation tasks iteratively. Specifically we design two modules: turbo-detection head and turbo-segmentation head, which facilitate communication between the tasks. The two modules form a closed loop that interlaces the detection and segmentation results without retraining the model. Comprehensive experiments on the COCO, iFLYTEK, and Cityscapes datasets demonstrate that our method substantially enhances both detection and segmentation accuracies with a certain increase in computational cost. The proposed method represents a tradeoff between prediction accuracy and inference speed. Codes are available at https://github.com/zhaozhen2333/Turbo-Learning.git.

25.
arXiv (CS.CL) 2026-06-12

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we study token efficiency in TS language modeling from an asymmetric-token perspective. We show that TS tokens have highly uneven spectral contributions, where many tokens share redundant frequency patterns while a small subset preserves critical temporal evidence. We also observe that prompt-token influence attenuates with model depth, suggesting that full prompt retention across all layers is unnecessary. Based on these findings, we develop an adaptive token budgeting framework that compresses TS tokens via frequency-domain structure and progressively reduces prompt tokens across layers. Experiments across forecasting, classification, imputation, and anomaly detection demonstrate up to 7.68$\times$ inference acceleration and performance gains in 78\% of evaluated settings, showing the effectiveness of asymmetric token compression for scalable TS foundation models.