Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

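AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate analysis briefs on cross-disciplinary literature.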

01.
arXiv (CS.CL) 2026-06-15

Decoupled Mixture-of-Experts for Parametric Knowledge Injection

Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented generation keeps knowledge outside the model but only provides prompt-level augmentation, whereas post-training based methods encode new knowledge into shared parameters but may introduce catastrophic forgetting, knowledge conflict, and costly updates. In this paper, we propose Decoupled Mixture-of-Experts (DMoE), a modular architecture for parametric knowledge injection that decouples both experts and the router from the base model. DMoE converts external knowledge corpora into independently updatable expert modules and uses a lightweight uncertainty-aware router to activate relevant experts only when the base model lacks sufficient knowledge during generation. To support efficient auto-regressive inference, DMoE attaches experts only to the final-layer feed-forward network, preserving KV-cache reuse while enabling parameter-level knowledge augmentation. Experiments on knowledge-intensive benchmarks show that DMoE consistently improves answer quality over retrieval and adapter-based baselines.

02.
arXiv (CS.CL) 2026-06-19

Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models

Sign language translation (SLT) remains constrained by the limited availability of paired sign-video/text corpora and by the heavy-tailed vocabularies typical of real-world datasets. We study a target-side augmentation strategy in which a large language model (LLM) generates controlled paraphrase variants of the reference spoken-language sentence while the sign input remains unchanged. Concretely, we use GPT-4o to produce semantically faithful variants of the training targets and train a Signformer-style pose-based Transformer under a two-stage schedule: pre-training on the augmented corpus followed by fine-tuning on the original references. We evaluate this strategy on three datasets that span complementary challenges: PHOENIX14T (German Sign Language), a real-world corpus with moderate lexical diversity; the Greek Sign Language Dataset with highly controlled, repetitive recordings; and LSA-T (Argentinian Sign Language), a naturalistic corpus with a large vocabulary and severe long-tail sparsity. This range allows us to characterize precisely when and why target-side augmentation is beneficial. On PHOENIX14T, augmentation improves BLEU-4 from 9.56 to 10.33, demonstrating that paraphrastic exposure helps the decoder generalize beyond memorized reference phrasing. The near-saturated GSL baseline and the extremely sparse LSA-T setting reveal the limits of the approach: in both cases, single-reference lexical overlap metrics are insufficient to capture the full picture, motivating a complementary semantic evaluation. To our knowledge, this is the first study to examine LLM-generated target-side paraphrases as an augmentation mechanism for SLT, and the first to apply an LLM-as-a-Judge evaluation protocol to SLT. This complementary evaluation reveals gains in semantic fidelity that lexical overlap metrics understate.

03.
arXiv (CS.CL) 2026-06-16

PaperJury: Due-Process Review for Bounded LaTeX Revision

Pre-submission hardening of human-authored LaTeX computer science papers differs from drafting assistance because it requires adversarial whole-paper review, explicit no-fix outcomes, and bounded artifact-safe revision. Existing writing assistants, critique generators, and judge-centered loops lack durable issue identity across rounds, deterministic routing from critique to adjudication, and manuscript control that can reject invalid concerns or defer author-dependent ones. We present PaperJury, a closed-loop review-verdict-revise-verify system built on a deterministic-versus-semantic split: deterministic orchestration manages decomposition, a frozen claim spine, a durable ledger, routing, stopping, and exact-once patch application, while semantic agents are limited to bounded review, judgment, and repair. PaperJury combines bounded holistic review, contestability-based routing, a due-process trial, and risk-proportional guard chains for anchor-bounded edits, yielding terminal outcomes of invalid-drop, valid-fixable, and author-required. In a two-arm expert-review evaluation on held-out Vision, natural language processing, and machine learning papers against four baselines, we assess issue quality, verdict and routing quality, edit safety, convergence behavior, and cost, supporting the thesis that load-bearing safety and completion logic should reside in deterministic orchestration rather than model discretion. PaperJury is available at https://github.com/u7079256/paperjury.

04.
arXiv (CS.LG) 2026-06-19

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

arXiv:2606.20035v1 (announce type: cross). Many dense prediction networks rely on additive feature transformations and model higher-order feature interactions only implicitly. Product units provide an explicit mechanism for multiplicative feature modeling, but their logarithmic–exponential formulation can cause numerical instability, which has limited their use in deep dense prediction networks. In this work, we propose Product-Unit U-Net (PU-UNet), a residual U-Net that integrates stable product-unit residual blocks into rich low-resolution stages for medical image segmentation. The proposed formulation combines smooth positivity mapping with log-domain clipping, enabling stable multiplicative feature learning with negligible computational overhead. On ISIC 2018, Kvasir-SEG, and BUSI, PU-UNet achieves Dice scores of 0.942, 0.959, and up to 0.925, respectively. Compared with a matched Residual U-Net baseline, PU-UNet consistently improves Dice and IoU while keeping parameters, FLOPs, and inference latency nearly unchanged, and reduces the image-level false-positive rate on normal BUSI cases from 0.077 to zero. Ablation studies suggest that the gains are associated with product-unit interactions, are strongest under low-resolution placement, and benefit from the proposed stabilization design. These results suggest that stable product-unit residual learning can be an effective way to enhance U-Net-style segmentation networks with explicit multiplicative interactions.

05.
arXiv (CS.CV) 2026-06-16

CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLMs) on their ability to perform textual reasoning over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clutter, and lighting conditions, challenging state-of-the-art models on their spatio-temporal cognition. We conduct extensive experiments with current state-of-the-art VLMs, both open-source and proprietary, and reveal their limitations in generalizing to cyclical dynamics such as linear and orbital motion, as well as time-dependent changes in visual attributes like color and scale. Our results demonstrate that present-day VLMs struggle to reliably detect and exploit cyclic patterns, lack a notion of temporal understanding, and are unable to extract quantitative insights from scenes, such as the number of objects in motion, highlighting a significant technical gap that needs to be addressed. More specifically, we find no single model consistently leads in performance: neither size nor architecture correlates strongly with outcomes, and no model succeeds equally well across all tasks. By providing a targeted challenge and a comprehensive evaluation framework, CycliST paves the way for visual reasoning models that surpass the state-of-the-art in understanding periodic patterns.

06.
arXiv (CS.CV) 2026-06-16

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may become outdated or invalid. To address this, we propose PermaVid, a novel framework built upon a multi-modal context memory that disentangles spatial context into semantic appearance and geometric structure, together with an edit-aware memory update and retrieval strategy that keeps memory evolution aligned with subsequent observations. Specifically, we develop two complementary memory banks: an RGB context memory that captures appearance-aware observations while implicitly encoding geometry, and a depth context memory that preserves geometry-only structure disentangled from semantics. Building on this design, we introduce a memory-guided video generation model that performs multi-modal feature fusion under reference conditions drawn from mixed-modality memory contexts. Experiments demonstrate that our method maintains strong long-term semantic and structural consistency after edits, significantly outperforming state-of-the-art methods.

07.
arXiv (CS.AI) 2026-06-17

IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction

arXiv:2606.18181v1 (announce type: cross). Illegal, unreported, and unregulated fishing (IUU) traditionally refers to fishing activities that violate applicable laws or occur in areas that lack applicable laws. We propose the term IUU+ to capture a broader suite of fisheries sector environmental and associated supply chain trade-related crimes and behaviors. Although IUU+ activity is widely recognized as a serious threat to marine ecosystems, markets, and livelihoods, a quantitative understanding of these incidents, e.g., their frequency, geography, species, actors, and patterns in the type of illicit activity, remains difficult to obtain. We propose IUU+DB, a large language model driven system for building a global incident database of IUU+ activity. The system ingests heterogeneous documents, classifies whether they describe relevant incidents, extracts key data elements such as actors, locations, species, vessels, violations, and enforcement outcomes, and supports deduplication and trend analysis. Case studies and validation results show that IUU+DB can help organize fragmented evidence, surface geographic and behavioral hotspots, support fisheries-domain specific research in academia and non-government organizations, assist source and species risk assessments for industry, and provide support for policy implementation and targeted enforcement efforts to government agencies.

08.
arXiv (CS.AI) 2026-06-16

Critically Engaged Pragmatism: Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools

arXiv:2601.09753v2 (announce type: replace-cross). AI science evaluation tools aim to assess research credibility. As with traditional metrics such as impact factors, their edicts can be decontextualised and repurposed in problematic ways. To address this, I propose Critically Engaged Pragmatism as a scientific norm enjoining scientific communities to scrutinise the purposes and purpose-specific reliability of AI science evaluation tools. To foster Critically Engaged Pragmatism, creators of AI science evaluation tools should transparently and fully report design, training, and benchmarking details to facilitate assessments of purpose-specific reliability, liability to different types of error, and bias. What count as best practices for the transparent reporting of AI science evaluation tools should be updated as new forms of error, bias, and gamesmanship are discovered. Under this framework, AI science evaluation tools are not objective arbiters of scientific credibility. Rather, they are the object of critical discursive practices that ultimately ground the credibility of scientific communities.

09.
arXiv (CS.AI) 2026-06-19

GLARE: A Natural Language Interface for Querying Global Explanations

arXiv:2606.19735v1 (announce type: new). While global explanations are crucial for understanding vision models across datasets, classes, and decision contexts, their complex and monolithic nature often hinders practical exploration. Because users typically seek targeted answers to specific questions rather than static artifacts, we present an LLM-based interactive interface that provides natural language access to global explanations for black-box image classifiers. The system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. We evaluate the system on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors. Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.

10.
PLOS Medicine 2026-05-13

On the evolution of the company we keep: Implications for infectious disease modeling

By Joël Mossong. Whom we meet shapes how infections spread. Whereas the earlier focus of mathematical epidemiology was on incorporating age, more recent work has begun to reveal the importance of socioeconomic aspects for understanding and managing future epidemics. In this Perspective, Joël Mossong discusses the importance of understanding social contacts and how they have evolved for infectious disease modeling, and the need to factor in additional considerations such as ethnic and socioeconomic backgrounds.

11.
arXiv (CS.CV) 2026-06-16

Null-Space Diffusion Distillation Unlocks Speed, Fidelity and Realism in Lensless Imaging

Lensless imaging reconstructs scenes from highly multiplexed measurements, resulting in a severely ill-posed inverse problem. In this work, we identify a fundamental trade-off between measurement consistency, perceptual quality, and inference speed across lensless reconstruction paradigms. Traditional methods favor consistency but produce perceptually degraded results, supervised approaches achieve high-quality reconstructions with fast inference but may violate physical constraints, and diffusion-prior methods achieve high perceptual quality and consistency, particularly when structured constraints such as range–null decomposition are used, but remain slow due to iterative sampling. Motivated by this observation, we propose Null-Space Diffusion Distillation (NSDD), a single-pass reconstruction model that distills structured diffusion-prior inference into an efficient feed-forward network. NSDD learns to produce high-quality reconstructions that preserve measurement consistency while avoiding costly iterative sampling. Experimental results demonstrate that NSDD achieves perceptual quality and consistency competitive with diffusion-prior methods, while providing significantly faster inference and offering a favorable balance across all three objectives. Furthermore, ablation experiments show that distilling the range–null decomposition improves reconstruction quality and robustness over unstructured full-reconstruction distillation, including on unseen real scenes. These results highlight the potential of structure-aware distillation for efficient lensless imaging. Code is available at github.com/JRCSAVSN/NullSpaceDiffusionDistillation.
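As a rough illustration of the range–null decomposition mentioned in this abstract, the sketch below combines a measurement-consistent range-space term with the null-space part of a prior estimate, x = A⁺y + (I − A⁺A)x̂. It is a toy under stated assumptions, not the NSDD model: the forward operator is a single hypothetical measurement row `a`, and the names `range_null_combine` and `x_prior` are illustrative only.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def range_null_combine(a, y, x_prior):
    """Return A^+ y + (I - A^+ A) x_prior for a one-row operator A = a.

    For a single row a, the pseudo-inverse is A^+ = a^T / (a . a).
    """
    aa = dot(a, a)
    # Range-space part: fully determined by the measurement y.
    range_part = [ai * y / aa for ai in a]
    # Null-space part: the prior estimate with its component along a removed.
    proj = dot(a, x_prior) / aa
    null_part = [xi - ai * proj for xi, ai in zip(x_prior, a)]
    return [r + n for r, n in zip(range_part, null_part)]


# Hypothetical toy values (assumptions, not data from the paper).
a = [1.0, 2.0, 2.0]          # measurement row
y = 9.0                      # observed measurement
x_prior = [3.0, 0.0, 1.0]    # estimate from some prior, e.g. a generative model

x = range_null_combine(a, y, x_prior)
residual = dot(a, x) - y     # close to zero: x reproduces the measurement
```

Whatever the prior proposes, the combined estimate satisfies the measurement up to floating-point error, which is the sense in which this kind of structured constraint preserves measurement consistency.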

12.
arXiv (CS.LG) 2026-06-16

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

arXiv:2605.01961v2 (announce type: replace). Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human evaluators, which, under large variations of preferences, can be unfair to minority groups. In this work, we consider fairness in dueling bandits, a standard framework for online learning from preference data. We assume that each user has a (potentially distinct) Condorcet winner, which is an arm preferred to every other arm. Using these user-specific Condorcet winners as reference points, we evaluate and score arms according to their performance relative to the corresponding winner. To promote fairness across heterogeneous users, we adopt the well-established Nash Social Welfare objective, which maximizes the product of user utilities, thereby inherently penalizing inequality and preventing the marginalization of any single user. Within this framework, we construct a hard instance to establish a regret lower bound of $\Omega(T^{2/3}\min(K,D)^{1/3})$ for a time horizon $T$, $K$ arms, and $D$ users, which, to the best of our knowledge, is the first result quantifying the cost of fairness in dueling bandits with heterogeneous preferences. We then present the Fair-Explore-Then-Commit and Fair-$\epsilon$-Greedy algorithms with a Condorcet winner identification phase. We further derive their regret upper bounds that match the lower-bound dependence on $T$ up to logarithmic factors.
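To make the Nash Social Welfare objective concrete, here is a minimal sketch contrasting the product of user utilities with their plain average. The utility values and the arm names are hypothetical, and the paper's actual scoring relative to each user's Condorcet winner is not reproduced here.

```python
import math


def nash_social_welfare(utilities):
    """Product of per-user utilities (assumed non-negative)."""
    return math.prod(utilities)


def average_utility(utilities):
    """Arithmetic mean of per-user utilities."""
    return sum(utilities) / len(utilities)


# Hypothetical utilities of two arms for three users (illustrative only).
arm_a = [0.9, 0.9, 0.1]   # strong for two users, poor for the third
arm_b = [0.6, 0.6, 0.6]   # moderate for everyone

prefers_a_on_average = average_utility(arm_a) > average_utility(arm_b)
prefers_b_on_nsw = nash_social_welfare(arm_b) > nash_social_welfare(arm_a)
```

In this toy case the average favors `arm_a`, while the product favors `arm_b`, because one very low utility drags the whole product down. That is the inequality-penalizing behavior the abstract attributes to the objective.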

13.
medRxiv (Medicine) 2026-06-10

Developing a Unified Criminal Justice Pathway into Drug and Alcohol Treatment from Police Custody: A Public Health Service Evaluation and Pathway-Design Project in Blackpool, United Kingdom

Introduction: Blackpool, England's most deprived local authority, has the highest drug-related death rate in the country. People in police custody with problem substance use are a key Core20PLUS5 inclusion-health group, yet referral from the police into structured drug and alcohol treatment is fragmented and relies heavily on self-report. We evaluated the current police-to-treatment route in Blackpool and designed an evidence-informed unified pathway. Materials and Methods: A mixed-methods service evaluation and pathway-design project was conducted during a six-month General Practice / Public Health rotation. Routinely collected referral data from Horizon (the local specialist drug and alcohol service) covering the 47-month period from December 2019 to October 2023 were analysed. Findings were triangulated with national policy, the Project ADDER and Liaison and Diversion evaluations, and the international evidence on police-led pre-arrest diversion. Results: Of 5,900 total referrals into Horizon over 47 months, only 269 (4.56%) originated from the police. Police referrals accounted for fewer than 5% of monthly referrals in 30 of 47 months, for 5% to 9.9% in 16 months, and for ≥10% in only one month (10.8%, December 2022). Blackpool recorded 76 drug-misuse deaths in 2019-21 (19.4 per 100,000, approximately four times the England rate). A six-step unified pathway is proposed: Initiate Referral (opt-out, from ADDER Police and Liaison and Diversion); Initial Assessment; Tailored Treatment Plan; Continuous Support; Collaboration and Monitoring; and Evaluation and Adjustment. Conclusions: Police contact is markedly under-used as a gateway to treatment despite Blackpool having the highest drug-related mortality in England. An opt-out, multi-agency pathway anchored in Core20PLUS5 has the potential to narrow the treatment gap, reduce re-offending, and address the structural health inequalities that drive premature mortality.

14.
arXiv (CS.AI) 2026-06-19

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

arXiv:2606.20381v1 (announce type: new). FP4 training promises substantial reductions in memory and computation cost for LLM pretraining, yet current FP4 hardware paths and recipes, including NVIDIA Blackwell/Rubin-class systems and AMD MI350-series GPUs, remain centered on E2M1 data elements. In this study, we identify a fundamental limitation of that choice: non-uniform formats such as E2M1 inherently suffer from Shrinkage Bias, a systematic negative rounding error caused by the geometric asymmetry of their representable bins. We show that this bias accumulates multiplicatively across layers and is amplified by the Random Hadamard Transform (RHT), providing a unified explanation for the training instability observed in existing E2M1-based FP4 recipes. In contrast, uniform grids (E1M2/INT4) bypass this grid-geometry error and better convert the improved bucket utilization from RHT into higher quantization quality. Based on this finding, we propose UFP4, a uniform 4-bit training recipe that applies RHT to all three training GEMMs while restricting stochastic rounding to dY alone. On Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining, UFP4 consistently achieves lower BF16-relative loss degradation than strong E2M1-based baselines, supported by scaling-law analysis and ablation studies. Our results suggest that future accelerators should support E1M2/INT4-style uniform 4-bit grids as first-class training primitives alongside E2M1.

15.
arXiv (CS.LG) 2026-06-11

Range-Aware Bayesian Optimization for Discovering Diverse Designs within Target Property Windows

arXiv:2606.11574v1 (announce type: new). In many materials and product design problems, desirable candidates exhibit properties that fall within an acceptable range rather than achieve a single optimum. Recovering multiple, distinct solutions that satisfy such specifications is also practically valuable, as some candidates may be preferred for reasons of cost, processability, or robustness that are difficult to encode directly in an objective function. Here, we develop a range-aware Bayesian optimization (BO) framework in which the acquisition function directly scores the posterior probability that a candidate satisfies a target range. The framework naturally extends to parallel pursuit of multiple distinct specifications over a shared candidate space. Across benchmark tasks, range-aware acquisition consistently recovers larger and more diverse sets of valid designs than standard BO baselines and recent goal-seeking methods. Its utility is further demonstrated in two practically motivated design case studies involving optimizing reaction conditions for polymer synthesis and sequence-defined oligomer discovery for prescribed optical absorption bands, supported by quantum chemical calculations. These results suggest that range-aware BO can provide a practical and sample-efficient foundation for specification-driven design, particularly when design flexibility and solution diversity are important considerations.
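One plausible reading of "scores the posterior probability that a candidate satisfies a target range" can be sketched as follows, assuming the surrogate gives a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate. The function name `prob_in_range`, the candidate values, and the target window are hypothetical; the paper's exact acquisition function may differ.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_in_range(mu, sigma, low, high):
    """P(low <= y <= high) for y ~ N(mu, sigma^2)."""
    if sigma <= 0:
        # Degenerate posterior: the prediction is a point mass at mu.
        return 1.0 if low <= mu <= high else 0.0
    return normal_cdf((high - mu) / sigma) - normal_cdf((low - mu) / sigma)


# Hypothetical posterior (mean, std) per candidate and target window.
candidates = {"A": (5.0, 0.5), "B": (9.0, 0.5), "C": (5.5, 3.0)}
low, high = 4.0, 6.0

scores = {name: prob_in_range(mu, sigma, low, high)
          for name, (mu, sigma) in candidates.items()}
best = max(scores, key=scores.get)
```

Under these assumptions, candidate A (confidently inside the window) scores highest, C (centered in the window but uncertain) scores lower, and B (confidently outside) scores near zero. Running the same scoring against several windows is one way to picture the parallel pursuit of multiple specifications described in the abstract.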

16.
PLOS Computational Biology 2026-06-09

Multi-stable oscillations in cortical networks with two classes of inhibition

By Arnab Dey Sarkar and Bard Ermentrout. In the classical view of cortical rhythms, interactions between excitatory pyramidal neurons (E) and inhibitory parvalbumin-expressing interneurons (I) are sufficient to generate gamma- and beta-band oscillations. However, it is now well established that multiple inhibitory interneuron subtypes exist and that they play important roles in the generation and modulation of these rhythms. In this paper, we develop a spiking network model consisting of populations of E, I, and an additional interneuron type, somatostatin-expressing neurons (S), which receive excitation from the E cells and inhibit both the E and I populations. The S cells are further modulated by a third inhibitory subtype, vasoactive intestinal peptide (VIP) neurons, which receive inputs from other cortical areas. We reduce the spiking network to a system of nine differential equations that describe the mean membrane potential, firing rate, and synaptic conductance for each population. Using this reduced model, we identify a wide range of parameters that exhibit multiple coexisting rhythms. Employing tools from nonlinear dynamics, we then explore the roles of the two classes of inhibition, as well as VIP modulation, in shaping the properties of these rhythms.

17.
arXiv (CS.LG) 2026-06-16

GPT-Based Fast Simulation of CLAS12 Detector Hits via Conditional Autoregressive Generation

arXiv:2606.16035v1 (announce type: cross). Modern particle physics experiments have demonstrated an increasing need for fast, high-fidelity detector simulation as detector components have improved and subsequent computational requirements approach the limits of available resources. Recently, deep generative models have emerged as a promising alternative to traditional Monte-Carlo methods, with recent works drawing inspiration from large language models (LLMs) and self-supervised next-token prediction methods. In this work, we present an application of a GPT-style autoregressive transformer as a fast surrogate model for the calorimeter inside the CLAS12 experiment at the Thomas Jefferson National Accelerator Facility. The model is conditioned on incident momentum and generates realistic detector hits autoregressively across all nine calorimeter layers as sequences of strip, ADC, and TDC tokens. We demonstrate that the model faithfully reproduces hit multiplicity, spatial distributions, energy deposits, and the energy-momentum response of the electromagnetic calorimeter. The generator achieves inference rates exceeding 700 events per second on a single GPU, providing a substantial speedup over traditional Geant4-based simulations while maintaining physics fidelity essential for high-luminosity experimental programs.

18.
arXiv (CS.AI) 2026-06-18

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

arXiv:2606.18271v1 (announce type: new). As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.

19.
arXiv (quant-ph) 2026-06-16

Inflationary branch decoherence and the cosmological arrow of time

arXiv:2602.21263v3 (announce type: cross). We analyze branch decoherence in inflationary quantum cosmology by computing reduced density matrices and branch-overlap factors for long-wavelength perturbations. The Hartle-Hawking no-boundary state is real in the semiclassical regime and contains both expanding and contracting WKB components, whereas the tunneling state is selected as an outgoing complex WKB branch; expanding-contracting decoherence is therefore central for the former and mainly diagnostic for the latter. Using the influence-functional formalism, we derive the noise kernel for a light spectator environment and evaluate decoherence under horizon-based and EFT-motivated coarse grainings. We then compute the single-mode branch overlap directly from the Bunch-Davies mode functions, obtaining $|\mathcal{D}_k(z)|=[z^2/(z^2+1)]^{1/4}$ in the massless limit and $|\mathcal{D}_k(z)|\sim z^\nu$ on superhorizon scales for massive fields, where $z=-k\eta$ is the dimensionless wavenumber with $\eta$ the conformal time. In the massless case, the accumulated geometric branch functional is evaluated in closed form, with a leading cutoff-sensitive phase-space term and a universal subleading contribution. The calculation provides an explicit quantitative bridge between quantum-cosmological boundary conditions, inflationary squeezing, and the emergence of effectively classical cosmological histories.

20.
arXiv (CS.AI) 2026-06-16

Latent Thought Flow: Efficient Latent Reasoning in Large Language Models

arXiv:2606.16222v1 (announce type: new). Large Language Models (LLMs) increasingly rely on intermediate reasoning, yet explicit Chain-of-Thought (CoT) suffers from a linguistic space bottleneck: each thought must be decoded into tokens, causing high inference overhead. Latent reasoning moves deliberation into continuous space, but existing methods mostly learn deterministic or reward-maximizing paths, lacking a principled way to allocate probability across trajectories with different correctness and costs. We propose Latent Thought Flow (LTF), which models reasoning as variable-length continuous trajectories and trains a sampler to match a reward-induced posterior over answer quality and computation cost. We instantiate this with a continuous GFlowNet using stochastic latent transitions. To handle sparse answer supervision, we introduce an Entropy-Weighted Subtrajectory Balance objective for intermediate rewards and a reference-prior regularizer to anchor exploration. Experiments under finetuning and transfer learning settings show that LTF outperforms explicit CoT and latent reasoning baselines, improving accuracy by 9.5% while reducing reasoning length by 27.2% on average compared with strong latent reasoning baselines.

21.
Nature (Science) 2026-06-17

Cortical development dynamics across autism spectrum disorder mouse models

Despite the functional diversity of over 100 causal genes, phenotypic convergence across models may reveal common neurobiological processes in autism spectrum disorder (ASD). Here we profiled 251 samples from 11 monogenic mouse models of ASD using single-nucleus multi-omic sequencing across three developmental stages, both sexes and two brain regions. Despite genetic heterogeneity, ASD-linked mutations converged on perturbations of the radial glial cell lineage. These alterations reflect a transient developmental delay rather than lasting lineage misspecification and resolve by postnatal stages. Molecularly, the largest transcriptional differences emerged in neurons at early postnatal stages. These changes included downregulation of synaptic and ion channel-related genes, consistent with homeostatic adaptation or delayed maturation. Network analysis showed molecular convergence across models within each developmental stage, suggesting that diverse mutations linked to ASD impinge on common, stage-specific processes. Convergence becomes less pronounced by postnatal day 14, highlighting the dynamic nature of ASD-associated changes. Cross-genotype heterogeneity is superimposed on stage-specific effects. Electrophysiology corroborated this pattern: mutants generally showed altered neuronal excitability and synaptic properties with model-specific nuances. Our study also highlighted sex-specific gene expression alterations, with female mice often displaying larger effect sizes than male mice. Together, our findings provide a comprehensive view of developmental cellular and molecular dynamics across models of ASD. Editorial summary: Single-nucleus multi-omic sequencing shows that diverse autism spectrum disorder-linked gene mutations converge on transient, stage-specific disruptions in early brain development, and highlights sex-specific gene expression alterations.

22.
arXiv (CS.CV) 2026-06-16

PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning

6D object pose estimation, which predicts the transformation of an object relative to the camera, remains challenging for unseen objects. Existing approaches typically rely on explicitly constructing feature correspondences between the query image and either the object model or template images. In this work, we propose PoseGAM, a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images, eliminating the need for explicit matching. Built upon recent multi-view-based foundation model architectures, the method integrates object geometry information through two complementary mechanisms: explicit point-based geometry and learned features from geometry representation networks. In addition, we construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions to enhance robustness and generalization. Extensive evaluations across multiple benchmarks demonstrate our state-of-the-art performance, yielding an average AR improvement of 5.1% over prior methods and achieving up to 17.6% gains on individual datasets, indicating strong generalization to unseen objects. Project page: https://windvchen.github.io/PoseGAM/
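
For readers unfamiliar with the term, a 6D pose is commonly written as a rotation R (3 degrees of freedom) plus a translation t (3 degrees of freedom) that maps object-frame points into the camera frame. The sketch below only illustrates that definition; it is not PoseGAM's method, and the helper names `rot_z` and `apply_pose` are hypothetical.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map an object-frame point p into the camera frame: p_cam = R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

# Illustrative pose: rotate 90 degrees about z, then shift 2 units along the camera z axis.
R = rot_z(math.pi / 2)
t = (0.0, 0.0, 2.0)
p_cam = apply_pose(R, t, (1.0, 0.0, 0.0))
print(p_cam)
```

A pose estimator predicts R and t from images; the quality of the prediction is then judged by how closely the transformed model points match the ground-truth ones.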

23.
arXiv (CS.AI) 2026-06-19

Human-AI Agent Interaction in a Business Context

arXiv:2606.18716v1. As AI agents are increasingly integrated into core business processes, understanding and designing effective interaction patterns between humans and AI agents becomes crucial for value creation. This study identifies and evaluates principles and criteria for a positive User Experience (UX) with AI agents, along with methods for its measurement. We identify user expectations and needs to facilitate adoption, build trust, and support user-centered decision-making by development teams. Using a mixed-methods approach that combines qualitative and quantitative techniques, we explore interaction patterns between humans and AI agents. The findings from this exploratory research serve as the basis to develop a survey experiment which evaluates the effectiveness of specific design elements on a larger scale. This foundational research contributes to the development of more intuitive and effective human-AI agent interactions in business settings.

24.
arXiv (quant-ph) 2026-06-15

Experimental violation of a Bell-like inequality for causal order

arXiv:2506.20516v2. Quantum mechanics is compatible with scenarios where physical processes happen in an indefinite order. In theory, this feature could be detected through violations of inequalities on the observed correlations, analogous to Bell inequalities. However, experimental demonstrations of such violations have been missing until recently due to the complexity of the required setup. Here we report an experimental violation of a Bell-like inequality involving the correlations of four parties, one of which is spacelike separated from the others. Our demonstration employs 3 km fiber spools to simulate spacelike separation, and achieves high-speed operations in photonic time-bin encoding, nanosecond synchronization, and accurate temperature stabilization. These experimental advances enable a violation by 5.7 standard deviations and open a path towards a certification of indefinite order in conditions that guarantee spacelike separation with existing state-of-the-art devices. However, the certification is not device-independent, as it relies on knowledge about the setup to exclude bidirectional signaling, a loophole inherent to implementations in classical acyclic spacetimes, which may be resolved in future quantum-spacetime tests.
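
As a rough aid for reading the 5.7 standard deviation figure, the sketch below converts a significance expressed in standard deviations into a one-sided tail probability. It assumes Gaussian statistics, which the abstract does not state, and the function name `one_sided_p_value` is illustrative only.

```python
import math

def one_sided_p_value(n_sigma: float) -> float:
    """Upper-tail probability of a standard normal at n_sigma (Gaussian assumption)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# Significance quoted in the abstract; the Gaussian model is our assumption.
p = one_sided_p_value(5.7)
print(f"{p:.2e}")
```

Under that assumption, a 5.7 sigma violation corresponds to a chance probability on the order of a few parts per billion.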

25.
arXiv (CS.CL) 2026-06-15

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
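
The threshold rule described above (abstain once the value function falls below the abstention reward) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-token value estimates are made-up numbers, and `first_abstention_step` is a hypothetical helper. How the value function is actually approximated is not specified in the abstract.

```python
from typing import Optional, Sequence

def first_abstention_step(values: Sequence[float],
                          abstain_reward: float) -> Optional[int]:
    """Return the first token position whose estimated value is below the
    abstention reward, or None if the trace is never cut short."""
    for t, v in enumerate(values):
        if v < abstain_reward:
            return t
    return None

# Hypothetical value estimates for one reasoning trace (one per token position).
trace_values = [0.62, 0.55, 0.41, 0.18, 0.09]
stop_at = first_abstention_step(trace_values, abstain_reward=0.25)
print(stop_at)  # position at which generation would be terminated early
```

Raising `abstain_reward` makes the rule stop earlier and save more compute; lowering it lets more traces run to completion.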