Academic Intelligence · Curated Daily

Explore cutting-edge research trends worldwide

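AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.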

01.
medRxiv (Medicine) 2026-06-17

Women's intentions and motivations towards health behaviour change before pregnancy: a cross-sectional survey of pregnant women in Australia

Introduction: The preconception period (i.e. the weeks and months before pregnancy) is a critical window during which parental health behaviours can influence pregnancy outcomes and the child's long-term health. Modifiable factors such as nutrition, physical activity, substance use, and environmental exposures play a key role, yet women's ability to adopt and sustain healthy behaviours is shaped by complex psychological, social and environmental influences. This study applies the Theory of Planned Behaviour to identify the beliefs underpinning women's preconception behaviours, with the aim of informing support for effective and sustained health behaviour change.
Methods: We conducted an Australian national retrospective cross-sectional survey of pregnant women (18-49 years) recruited through social media platforms. The 92-item survey captured respondents' socio-demographics, pregnancy status and health conditions, health behaviours, and beliefs regarding preconception health behaviours. Respondents' level of pregnancy planning was categorised using the London Measure of Unplanned Pregnancy (LMUP). Items regarding preconception beliefs were structured in accordance with the Theory of Planned Behaviour, with a focus on regular exercise, healthy diet, and alcohol avoidance. These belief variables were analysed using structural equation modelling to identify paths between latent variables and the items used to estimate each concept.
Results: The study was completed by 430 pregnant women, of whom 72.7% had a planned pregnancy. Most had a partner, were university educated and were in good health. Structural equation modelling showed that intention strongly predicted exercise (β = 0.65), healthy diet (β = 0.54) and alcohol avoidance (β = 0.64). Perceived control and partner norms influenced intentions, whereas health professional norms had limited effect. Positive beliefs were associated with folate supplement use and smoking cessation.
Conclusion: These findings highlight intention as a key driver of preconception health behaviours, with perceived control and partner influences playing a more significant role than individual beliefs or health professional input. Effective interventions should therefore address structural barriers and actively involve partners, while respecting women's autonomy. Overall, couples-focused, multi-level strategies are likely essential to support meaningful and sustained preconception health behaviour change.

02.
bioRxiv (Bioinfo) 2026-06-11

A quantitative coordinate system for developmental dynamics

Quantitative comparison of morphogenesis across individuals remains a fundamental challenge, as developing embryos vary in shape, orientation and developmental tempo. Moreover, real-time three-dimensional imaging generates large, heterogeneous four-dimensional datasets that are difficult to directly align. As a result, developmental variability is typically described qualitatively rather than measured. Here we introduce STERN, a quantitative framework that learns continuous spatiotemporal representations of morphogenesis directly from in vivo 4D imaging data. By embedding embryos into a shared spatiotemporal space, STERN defines a quantitative developmental coordinate system that enables direct comparison of developmental trajectories across individuals without requiring explicit registration or staging. Applied to mouse embryogenesis, STERN reveals that embryos follow conserved developmental trajectories while progressing at distinct temporal rates, providing a quantitative measure of developmental heterochrony. Extending this framework to zebrafish neural crest light-sheet timelapse imaging, we further show that developmental order is preserved across distinct imaging views even with altered anatomical coverage, supporting the generality of the learned representation across vertebrate imaging contexts. Finally, in developing mouse hearts, where morphogenesis proceeds through subtle and continuously evolving structural changes, STERN resolves fine-scale developmental dynamics at minute-scale temporal resolution that are difficult to localize reproducibly using human experts or general-purpose multimodal AI. Together, these results establish a shared quantitative coordinate system for morphogenesis, in which developmental trajectories become directly comparable across individuals and developmental variability becomes a measurable property.

03.
arXiv (CS.LG) 2026-06-16

Structured Nonparametric Variational Inference for Dependent Latent Modeling

arXiv:2606.15458v1
Variational inference (VI) is a core engine of modern AI, enabling scalable approximate Bayesian learning and uncertainty-aware training of large probabilistic and generative models. In this paper, we propose Structured Nonparametric Variational Inference (SN-VI), a novel framework for modeling complex dependencies among latent variables in posterior approximation, leveraging multivariate spline techniques. Unlike traditional methods that rely on the mean-field assumption, SN-VI preserves intricate latent variable dependencies, providing a flexible and accurate approximation of posteriors with arbitrary shapes. We establish rigorous theoretical guarantees, including the derivation of the lower bound for the variational objective and proof of asymptotic consistency in posterior estimation. To facilitate practical implementation, we develop an algorithm that automatically identifies dependent latent variables and their underlying dependence structure, without requiring manual specification. Simulation studies validate the effectiveness of SN-VI in approximating posterior distributions with bounded support and complex dependencies. The proposed method has been successfully applied to high-dimensional structured data, including computer vision datasets and spatial transcriptomics. In these applications, SN-VI demonstrates improved generative model performance and effectively uncovers coupled biological signals through the learned dependency structure.

04.
arXiv (CS.CL) 2026-06-11

Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation

While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation tasks, little is known about their ability to adhere to a provided reference. We identify a critical failure mode of such reference-based LLM QA evaluation: when the provided reference conflicts with the judge model's parametric knowledge, the resulting scores become unreliable, substantially degrading evaluation fidelity. To study this phenomenon systematically, we introduce a controlled swapped-reference QA framework that induces reference-belief conflicts. Specifically, we replace the reference answer with an incorrect entity and construct diverse pairings of original and swapped references with correspondingly aligned candidate answers. Surprisingly, grading reliability drops sharply under swapped references across a broad set of judge models. We empirically show that this vulnerability is driven by judges' over-reliance on parametric knowledge, leading judges to disregard the given reference under conflict. Finally, we find that this failure persists under common prompt-based mitigation strategies, highlighting a fundamental limitation of LLM-as-a-judge evaluation and motivating reference-based protocols that enforce stronger adherence to the provided reference.

05.
arXiv (CS.CL) 2026-06-15

Efficiency-Performance Trade-offs in Neural Speaker Diarization via Structured Pruning and Low-Bit Quantization

Streaming speaker diarization is crucial for time-critical medical dispatch, but deploying it on resource-constrained hardware requires smaller, faster models. Using SIMSAMU, a dataset of simulated medical-dispatch conversations, we evaluate streaming behavior before compressing the segmentation model with pruning and low-bit quantization. We characterize performance across a range of streaming latency budgets and find that additional buffering is not consistently beneficial, while very low-latency operating points can substantially degrade performance. Our study shows that model compression trades performance for memory footprint, and we highlight an operating point where FP16 reduces model size by half with essentially unchanged real-time factor, at the cost of a 40% relative DER increase over the baseline. This work characterizes the trade-offs for real-time deployment and contributes to speech technology that can enable reliable human communication in time-critical contexts.
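The FP16 operating point halves model size because each weight is stored in 2 bytes instead of 4. A minimal arithmetic sketch makes that concrete and spells out what a 40% relative DER increase means; the parameter count and DER values below are hypothetical, not figures from the paper.

```python
import struct

def model_size_bytes(num_params: int, fmt: str) -> int:
    """Bytes needed for num_params weights; struct code 'f' is FP32, 'e' is FP16."""
    return num_params * struct.calcsize(fmt)

def relative_increase(baseline: float, compressed: float) -> float:
    """Relative change of an error metric (e.g. DER) as a fraction of the baseline."""
    return (compressed - baseline) / baseline

num_params = 1_500_000  # hypothetical segmentation-model size
fp32_bytes = model_size_bytes(num_params, "f")
fp16_bytes = model_size_bytes(num_params, "e")
print(fp32_bytes, fp16_bytes, fp16_bytes / fp32_bytes)  # 6000000 3000000 0.5

# A 40% relative increase: a hypothetical 20% baseline DER becomes 28%.
print(round(relative_increase(0.20, 0.28), 6))  # 0.4
```

Relative and absolute changes differ: in this example the 40% relative increase is 8 percentage points of DER.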

06.
arXiv (CS.AI) 2026-06-16

FOUNDv2: Learning Unified User Quantized Tokenizers for User Representation

arXiv:2508.00956v3
User representation learning serves as a fundamental pillar for personalized services on large-scale web platforms. Despite its importance, conventional continuous embedding methods face significant challenges, including the lack of a unified paradigm for multi-source data integration, prohibitive storage overhead due to low information density, and the lack of multi-scale modeling granularity. To overcome these limitations, we introduce FOUNDv2, a comprehensive user representation scheme centered on the Unified User Quantized Tokenizer (U2QT) framework. FOUNDv2 transforms heterogeneous user data into a standardized discrete token space through a robust two-stage architecture. Specifically, the framework first extracts compact feature representations and subsequently employs a multi-view RQ-VAE to discretize them into storage-efficient tokens using shared and source-specific codebooks. To empower these representations with predictive intelligence, we further design multi-scale alignment objectives to capture both fine-grained behavioral dependencies and macro-temporal periodicity. Extensive experiments on various benchmarks demonstrate that FOUNDv2 consistently outperforms task-specific baselines while achieving substantial reductions in storage and computational costs. Finally, the large-scale deployment of FOUNDv2 on Alipay validates its practical scalability and efficiency across diverse industrial scenarios. The main code is available at: https://github.com/chuanhe1999/FOUNDv2.
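The discretization stage can be illustrated with plain residual quantization, the mechanism underlying RQ-VAE: each level picks the nearest code for whatever the previous levels left unexplained, and the chosen indices become the tokens. This is a minimal sketch with small, fixed, hand-written codebooks. It does not model FOUNDv2's multi-view setup, its shared versus source-specific codebooks, or codebook learning.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def residual_quantize(
    x: Sequence[float], codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Encode x as one token per level; each level quantizes what earlier levels left over."""
    residual = list(x)
    reconstruction = [0.0] * len(x)
    tokens: List[int] = []
    for book in codebooks:
        # The nearest code to the current residual becomes this level's token.
        idx = min(range(len(book)), key=lambda i: sq_dist(residual, book[i]))
        tokens.append(idx)
        reconstruction = [r + c for r, c in zip(reconstruction, book[idx])]
        residual = [r - c for r, c in zip(residual, book[idx])]
    return tokens, reconstruction

# Two hand-written codebook levels: coarse directions, then small corrections.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]
tokens, recon = residual_quantize([1.2, 0.8], codebooks)
print(tokens, recon)  # [1, 1] [1.25, 0.75]
```

Only the token indices need to be stored, which is where the storage savings of a discrete token space come from.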

07.
arXiv (CS.CV) 2026-06-16

InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization

Cross-view geo-localization (CVGL) is fundamental for precise localization and navigation in GPS-denied environments, aiming to match ground or UAV imagery with satellite views. Existing approaches often rely on global feature alignment, but they suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV-based scenarios, where the broader perspective inevitably introduces dense, fine-grained objects, creating significant visual clutter. To address this, we draw inspiration from Object-Centric Learning (OCL) and propose InfoGeo, an information-theoretic framework designed to enhance robustness and generalization. InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: (i) maximizing view-invariant information by aligning the object-centric structural relations across views, and (ii) minimizing view-specific noisy signals through cross-view knowledge constraints. Extensive evaluations across diverse benchmarks and challenging scenarios demonstrate that InfoGeo significantly outperforms state-of-the-art methods.

08.
arXiv (CS.CL) 2026-06-19

TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law

Translating Hong Kong Court Judgments from English to Traditional Chinese is mandated by Articles 8-9 of the Basic Law, yet remains constrained by a shortage of parallel resources and rigorous demands on legal terminology, citation format, and judicial style. We introduce HKCFA Judgment 97-22, the first large-scale sentence-aligned parallel corpus for HK case law, comprising 344 professionally translated judgments (11,099 sentence pairs; 2.1M tokens) spanning 1997-2022. Building on this resource, we propose TransLaw, a multi-agent framework that decomposes translation into word-level expression, sentence-level translation, and multidimensional review, integrating a specialized Hong Kong legal glossary database, Retrieval-Augmented Generation, and iterative feedback, with four-dimensional expert review covering semantic alignment, terminology, citation, and style. Benchmarking 13 open-source and commercial LLMs, we demonstrate that TransLaw significantly outperforms single-agent baselines across all evaluated models, with convergence within 3 iterations. Human evaluation by 10 certified legal translators using our proposed Legal ACS metric confirms gains in legal-semantic accuracy, while showing that TransLaw still trails human experts in stylistic naturalness. The dataset and benchmark code are available at https://github.com/xuanxixi/TransLaw.

09.
arXiv (quant-ph) 2026-06-16

Analyzing Initialization Strategies for the Local Unitary Cluster Jastrow Ansatz within the Quantum-Centric Supercomputing Framework

arXiv:2606.14933v1
In this study, we analyze the choice of local unitary cluster Jastrow (LUCJ) ansatz initialization and the sensitivity of the sample-based quantum diagonalization (SQD) algorithm to that choice within the quantum-centric supercomputing (QCSC) framework. We examine six initialization strategies, including those based on coupled-cluster singles and doubles (CCSD), Møller-Plesset second-order perturbation theory (MP2), data-driven coupled-cluster (DDCC), and trivial (zeroes and random) initializations, across twelve molecular systems and three basis sets (STO-3G, cc-pVDZ, and aug-cc-pVDZ). We find that while the mean absolute percentage errors (MAPEs) between the alternative and CCSD-initialized t2 amplitudes span many orders of magnitude, the resulting SQD energies are largely insensitive to this variation. In particular, most initializations recover energies within chemical accuracy (±1.6 mEh) of the CCSD reference, with convergence improving as the basis set size increases. Notably, random initialization achieves performance competitive with CCSD across all basis sets, while zeroes initialization, despite having smaller deviations from CCSD, yields the worst energy agreement. Our results highlight that proximity to the CCSD initialization is not a reliable predictor of the quality of electronic energies. These findings establish that configuration recovery within SQD, rather than circuit initialization, is the dominant factor governing energy accuracy, and suggest that computationally cheaper initialization strategies are viable alternatives to CCSD for QCSC workflows.
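The two comparison metrics in this abstract are simple to state: MAPE between alternative and CCSD t2 amplitudes, and whether an energy lies within chemical accuracy (1.6 mEh) of the reference. A minimal sketch follows. The amplitude values are hypothetical, and skipping zero-valued reference amplitudes in the MAPE is an assumption of this sketch, since MAPE is undefined there.

```python
from typing import Sequence

CHEMICAL_ACCURACY_HARTREE = 1.6e-3  # 1.6 mEh, the tolerance quoted in the abstract

def mape(reference: Sequence[float], alternative: Sequence[float]) -> float:
    """Mean absolute percentage error of alternative vs reference (zero references skipped)."""
    terms = [abs(a - r) / abs(r) for r, a in zip(reference, alternative) if r != 0.0]
    if not terms:
        raise ValueError("MAPE is undefined when every reference value is zero")
    return 100.0 * sum(terms) / len(terms)

def within_chemical_accuracy(energy: float, reference_energy: float) -> bool:
    """True if two energies (in hartree) agree to within 1.6 mEh."""
    return abs(energy - reference_energy) <= CHEMICAL_ACCURACY_HARTREE

# Hypothetical t2 amplitudes, not data from the paper.
ccsd_t2 = [0.02, -0.005, 0.0001]
zeros_t2 = [0.0, 0.0, 0.0]
print(mape(ccsd_t2, zeros_t2))  # 100.0
print(within_chemical_accuracy(-1.1000, -1.1012))  # True
```

Zero initialization always yields a MAPE of exactly 100%. A random guess can exceed that by orders of magnitude when the reference amplitudes are small. The abstract's point is that this kind of proximity does not predict SQD energy quality.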

10.
arXiv (CS.CV) 2026-06-11

Causal Clothes-Invariant Feature Learning for Cloth-Changing Person Re-ID

In cloth-changing person re-identification (CC-ReID), it is critical to learn clothes-invariant features, which provide discriminative identity features that remain robust to clothing changes. However, a spurious correlation currently prevents existing ReID methods from effectively extracting these clothes-invariant features. This spurious correlation arises from clothing ownership: clothing is rarely shared across different identities, so models tend to memorize clothing cues for identity recognition, and this strategy generalizes poorly to unseen clothing. In this paper, we propose Causal Clothes-Invariant Learning (CCIL), which explicitly shifts CC-ReID from likelihood learning P(Y|X) to causal intervention learning P(Y|do(X)) to block the clothing shortcut. CCIL realizes this intervention through three modules: a Confounder Dictionary, an Intervention Module, and Disentangle Regularization. This causality-based modeling makes the entire model naturally clothes-invariant, preventing it from capturing spurious correlations during feature learning. Extensive experiments validate the effectiveness of CCIL. On the PRCC and DeepChange datasets, CCIL achieves Rank-1 accuracies of 66.4% and 59.2%, outperforming state-of-the-art methods by 1.4 and 4.1 percentage points, respectively.
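For reference, the textbook backdoor adjustment shows how P(Y|do(X)) differs from the likelihood P(Y|X) when a confounder $z$ (here, clothing) is stratified over a set $\mathcal{Z}$. This sketch assumes that CCIL's Confounder Dictionary plays the role of $\mathcal{Z}$; the abstract does not spell out CCIL's exact estimator.

$$
P(Y \mid X) = \sum_{z \in \mathcal{Z}} P(Y \mid X, z)\, P(z \mid X)
\qquad\text{vs.}\qquad
P(Y \mid do(X)) = \sum_{z \in \mathcal{Z}} P(Y \mid X, z)\, P(z)
$$

Replacing $P(z \mid X)$ with the prior $P(z)$ removes the path by which a person's own clothing becomes predictive of their identity.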

11.
arXiv (CS.CL) 2026-06-11

Doc-to-Atom: Learning to Compile and Compose Memory Atoms

Long input sequences are central to document understanding and multi-step reasoning in Large Language Models, yet the quadratic cost of attention makes inference both memory-intensive and slow. Context distillation mitigates this by compressing contextual information into model parameters, and recent work such as Doc-to-LoRA amortizes context distillation into a single forward pass that generates one LoRA adapter per document. However, producing a single monolithic adapter for all queries leads to irrelevant-query interference, limited compositional recall, and poor scalability to long-document reasoning. To address these challenges, we propose Doc-to-Atom (Doc2Atom), a compositional parametric memory framework that decomposes each document into semantically typed knowledge atoms. Each atom is compiled into an independent micro-LoRA adapter and a provenance retrieval key. At inference time, a lightweight query router selects and assembles only the relevant atoms into a query-specific adapter, which is then injected into a frozen base model. The entire system is trained end-to-end through a multi-objective distillation framework. Experiments on six diverse QA benchmarks demonstrate that Doc2Atom outperforms Doc-to-LoRA baselines while reducing the memory cost of document internalization.
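Query-time assembly can be illustrated with a toy sketch. It assumes two things the abstract does not confirm: routing by cosine similarity against the atoms' retrieval keys, and composition by summing each selected atom's low-rank update B_i A_i. Both choices, and every name below, are illustrative rather than Doc2Atom's published design. The composed delta would be added to a frozen base weight.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def route(query: Sequence[float], keys: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k atoms whose retrieval keys are most similar to the query."""
    order = sorted(range(len(keys)), key=lambda i: cosine(query, keys[i]), reverse=True)
    return order[:k]

def compose_delta(atoms: Sequence[Tuple[Matrix, Matrix]],
                  selected: Sequence[int]) -> Optional[Matrix]:
    """Sum the low-rank updates B_i @ A_i of the selected atoms into one weight delta."""
    delta: Optional[Matrix] = None
    for i in selected:
        b_mat, a_mat = atoms[i]
        update = matmul(b_mat, a_mat)
        delta = update if delta is None else [
            [x + y for x, y in zip(row_d, row_u)] for row_d, row_u in zip(delta, update)
        ]
    return delta

# Three hypothetical rank-1 atoms for a 2x2 weight, each stored as (B, A).
atoms = [
    ([[1.0], [0.0]], [[1.0, 0.0]]),
    ([[0.0], [1.0]], [[0.0, 1.0]]),
    ([[1.0], [1.0]], [[0.5, 0.5]]),
]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
selected = route([1.0, 0.1], keys, k=2)
print(selected)                        # [0, 2]
print(compose_delta(atoms, selected))  # [[1.5, 0.5], [0.5, 0.5]]
```

Atoms that the router does not select contribute nothing to the delta. This is the property that lets a query-specific adapter avoid interference from irrelevant parts of the document.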

12.
arXiv (CS.LG) 2026-06-19

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

arXiv:2510.18784v3
Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly efficient implementation that leverages Adam statistics. CAGE significantly improves upon prior state-of-the-art methods in accuracy at similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights and activations (W3A3) matches the accuracy achieved at 4 bits (W4A4) with the prior best method. The official implementation is available at https://github.com/IST-DASLab/CAGE.
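The straight-through estimator that CAGE builds on can be shown on a one-parameter toy problem. The forward pass uses the quantized weight, and the backward pass treats the quantizer's derivative as 1. This sketch covers only plain STE. The target, grid step, and learning rate are arbitrary choices, and CAGE's curvature-aware correction is not reproduced.

```python
from typing import Tuple

def quantize(w: float, step: float) -> float:
    """Round-to-nearest uniform quantizer with grid spacing step."""
    return round(w / step) * step

def ste_train(target: float, step: float, lr: float = 0.1,
              iters: int = 50, w0: float = 0.0) -> Tuple[float, float]:
    """Minimize L(q) = (q - target)^2 over a quantized weight q = quantize(w)
    using the straight-through estimator: dq/dw is treated as 1."""
    w = w0  # full-precision latent weight
    for _ in range(iters):
        q = quantize(w, step)          # forward pass uses the quantized weight
        grad_q = 2.0 * (q - target)    # dL/dq
        w -= lr * grad_q               # STE: pass dL/dq straight through to w
    return w, quantize(w, step)

latent, deployed = ste_train(target=0.37, step=0.25)
print(latent, deployed)
```

In this toy run, the latent weight settles into oscillation around the rounding boundary at 0.375. As a result, the deployed weight flips between the neighboring grid points 0.25 and 0.5. Plain STE carries no information about that quantization error, which is the gap a correction term added to the STE gradient is meant to address.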

13.
arXiv (quant-ph) 2026-06-19

Recursive perturbation approach to time-convolutionless master equations: Explicit construction of generalized Lindblad generators for arbitrary open systems

arXiv:2506.04095v2
We develop a recursive perturbative expansion for the time-convolutionless (TCL) generator of an open quantum system in a generalized Lindblad form. This formulation provides a systematic approach to deriving the generator at arbitrary order while preserving a Lindblad-like structure, without imposing assumptions on the system or environment beyond an initially uncorrelated state. At all orders, the generator is written in a canonical form that also corresponds to the minimal-dissipation condition and uniquely specifies the decomposition of the generator into Hamiltonian and dissipative contributions. To validate the method and demonstrate its effectiveness for non-Markovian dynamics and strong-coupling effects, we compute the generator explicitly up to fourth order.
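For orientation, a time-local TCL generator written in generalized Lindblad form has the familiar structure below. The equations use the interaction picture, $\hbar = 1$, and a coupling strength $\lambda$, and they show the Hamiltonian/dissipative split the abstract refers to. The notation is ours, not the paper's. The rates $\gamma_k(t)$ are real, but unlike in the Markovian Lindblad case they may become temporarily negative, which is one way non-Markovian effects appear.

$$
\frac{d}{dt}\rho_S(t) = \mathcal{L}_t[\rho_S(t)], \qquad
\mathcal{L}_t = \sum_{n \ge 1} \lambda^n \mathcal{L}_t^{(n)},
$$

$$
\mathcal{L}_t[\rho] = -i\,[H(t), \rho] + \sum_k \gamma_k(t) \Big( L_k(t)\, \rho\, L_k^\dagger(t) - \tfrac{1}{2} \big\{ L_k^\dagger(t) L_k(t), \rho \big\} \Big).
$$

In the commonly used canonical form, the jump operators are traceless and orthonormal: $\operatorname{Tr} L_k = 0$ and $\operatorname{Tr}(L_j^\dagger L_k) = \delta_{jk}$. Tracelessness is what pins down $H(t)$ uniquely.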

14.
arXiv (CS.AI) 2026-06-12

A Three-Layer Framework for AI in Scientific Discovery

arXiv:2606.13566v1
Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper proposes a three-layer view of AI in discovery. Layer 1 is search and retrieval by large language models. Layer 2, as the main innovation of this paper, is model formation through qualitative reasoning: the capacity to recognize when a current framework is structurally inadequate and to understand the problem within a broader representational space, not through trial and error, but through structural insight into what is missing and where it can be found. Layer 3 is execution, optimization, and refinement. The main claim is that Layer 2 is both the most important and the least developed. Search without model formation remains confined to inherited frameworks, while execution without conceptual revision only amplifies an existing formulation. We illustrate Layer 2 reasoning through three case studies: S. S. Chern's intrinsic proof of the Gauss-Bonnet theorem, the resolution of the Nesterov Accelerated Gradient convergence problem via Lyapunov functions, and the autonomous disproof of the Erdős unit distance conjecture by OpenAI in 2026. Each case exhibits the same structural signature: a framework that had become inadequate, a missing conceptual object, and a resolution found in an unexpected neighboring field.

15.
arXiv (CS.CL) 2026-06-11

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with advances in Large Language Models (LLMs), financial analysts have begun to use them for financial decision-making tasks. However, existing financial question-answering benchmarks primarily focus on company balance sheet data and rarely evaluate reasoning about how company stocks trade in the market or how trading behavior interacts with fundamentals. To leverage the strengths of both approaches, we introduce FinTradeBench, a benchmark for evaluating financial reasoning that integrates company fundamentals and trading signals. FinTradeBench contains 1,400 questions grounded in NASDAQ-100 companies over a ten-year historical window. The benchmark is organized into three reasoning categories: fundamentals-focused, trading-signal-focused, and hybrid questions requiring cross-signal reasoning. To ensure reliability at scale, we adopt a calibration-then-scaling framework that combines expert seed questions, multi-model response generation, intra-model self-filtering, numerical auditing, and human-LLM judge alignment. We evaluate 14 LLMs under zero-shot prompting and retrieval-augmented settings and observe a clear performance gap. Retrieval substantially improves reasoning over textual fundamentals, but provides limited benefit for trading-signal reasoning. These findings highlight fundamental challenges in numerical and time-series reasoning for current LLMs and motivate future research in financial intelligence.

16.
arXiv (CS.CV) 2026-06-16

Label Shift Aware Adaptation for Online Zero-shot Learning with Contrastive Language-Image Pre-Training (CLIP)

Vision-language models like Contrastive Language-Image Pre-Training (CLIP) have been extensively studied in data-scarce scenarios. A particularly challenging and realistic task in this area is online zero-shot learning with CLIP, where unknown test samples are predicted sequentially in random order by CLIP while the feature extraction and model parameters are kept fixed during the sequential inference phase. Most existing approaches in this setting address the problem by adapting representations online using incoming test samples, while neglecting the distribution of the data on which CLIP was initially trained. This mismatch can lead to degraded performance when the label distribution in the test data differs from that of the training domain. To address this gap, we propose Label Shift Aware (LSA), which formulates the online zero-shot classification task as a domain adaptation problem. Specifically, LSA adapts the predictions computed by CLIP, which was trained on an unknown source distribution, to a target distribution using only unlabeled test data, and applies label shift correction to mitigate the mismatch between the source and target domains. Extensive experiments across multiple datasets demonstrate that the proposed LSA consistently outperforms state-of-the-art online zero-shot learning methods based on CLIP.
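Label shift correction in general can be sketched with the classical EM prior-adjustment procedure of Saerens et al. (2002). The procedure re-estimates the target class priors from unlabeled predictions, then reweights each prediction by the ratio of target to source priors. This is a swapped-in textbook technique, not LSA's published algorithm. It also assumes the source priors are known, which for CLIP's unknown training distribution they generally are not. The posteriors below are hypothetical.

```python
from typing import List, Sequence, Tuple

Dist = List[float]

def reweight(posteriors: Sequence[Sequence[float]], target: Sequence[float],
             source: Sequence[float]) -> List[Dist]:
    """Bayes-correct each posterior from source to target class priors and renormalize."""
    adjusted = []
    for p in posteriors:
        w = [p[c] * target[c] / source[c] for c in range(len(source))]
        z = sum(w)
        adjusted.append([x / z for x in w])
    return adjusted

def em_label_shift(posteriors: Sequence[Sequence[float]], source: Sequence[float],
                   iters: int = 100, tol: float = 1e-8) -> Tuple[Dist, List[Dist]]:
    """Estimate target priors from unlabeled posteriors by EM; return priors and corrected posteriors."""
    target = list(source)
    for _ in range(iters):
        adjusted = reweight(posteriors, target, source)  # E-step
        new_target = [sum(a[c] for a in adjusted) / len(adjusted)
                      for c in range(len(source))]       # M-step
        converged = max(abs(n - t) for n, t in zip(new_target, target)) < tol
        target = new_target
        if converged:
            break
    return target, reweight(posteriors, target, source)

# Hypothetical zero-shot posteriors over 2 classes, assuming uniform source priors.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
priors, corrected = em_label_shift(posteriors, source=[0.5, 0.5])
print([round(p, 3) for p in priors])
```

In an online setting, the prior estimate would have to be updated incrementally as test samples arrive rather than computed in one batch as here.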

17.
arXiv (quant-ph) 2026-06-17

Tensor network compression using fluid dynamics as a testbed: Analytical foundations in one dimension

arXiv:2606.17064v1
High-performance computers produce extreme-scale data sets that require sampling or compression if they are to be used to their full potential. Existing data compression techniques typically exploit features such as sparsity in the data, homogeneity in the data, or a priori knowledge of which subsets of data are of most interest. Fluid dynamics data in general do not exhibit these features and so are attractive test beds for generic compression techniques that are objective, robust, and tuneable with respect to the information lost due to compression. Presented here is a method based on tensor networks, specifically matrix product states or tensor trains, that meets these requirements. The method is demonstrated for compression in one dimension and is extensible to higher dimensionality. Lossless compression is demonstrated for random Fourier series at sufficiently high bond dimension of the tensor network, with the memory required to store the tensor network scaling in direct proportion to the bond dimension. The lossy compression exhibited at lower bond dimension can be well within the relative error of many fluid simulations. The compression algorithm is tested on the time evolution of Burgers' equation with excellent results. We additionally demonstrate the capability to perform computations in compressed form through a tensor network periodic convolution that can be orders of magnitude faster than using fast Fourier transforms and the convolution theorem. In addition to being an attractive method for working with data sets generated by existing computers, the tensor network methods utilised are directly translatable to the emerging paradigm of quantum computing.

18.
arXiv (quant-ph) 2026-06-19

Resolving problems with the continuum limit in coherent-state path integrals

arXiv:2602.02466v2
This paper solves the problem of the continuum limit in bosonic thermal coherent-state path integrals. To this end, exact discrete versions of the path integral are constructed for three different orderings of the Hamiltonian: normal, anti-normal and symmetric (Weyl order). Their continuum versions are then checked on the harmonic oscillator, which singles out the symmetric ordering as a candidate correct choice for all polynomial Hamiltonians. Mathematical subtleties identified in this simple case serve as a clue to the general solution. Finally, a general justification for the symmetric order is provided by deriving the continuum path integral from the exact discrete case using a renormalization procedure in the imaginary-time frequency domain. While the role of Weyl ordering had already been recognized, the paper provides the missing proof of its suitability for every polynomial Hamiltonian and simplifies the previously established construction by referring only to creation and annihilation operators (without position and momentum operators).

19.
arXiv (CS.LG) 2026-06-11

PCA-Enhanced Adaptive NVAR Framework for High-Resolution Sea Surface Temperature Forecasting in the East Sea

arXiv:2606.12141v1
Accurate forecasting of sea surface temperature (SST) in regional seas such as the East Sea is crucial for monitoring marine ecosystems, assessing climate risks, managing fisheries, and conducting naval operations. Traditional numerical ocean models provide reliable predictions but are computationally expensive and often unsuitable for real-time forecasting. Many deep learning methods also struggle with high-dimensional spatiotemporal ocean data and experience error accumulation over longer forecasting periods. This study builds on our previously proposed Adaptive Next-Generation Reservoir Computing (Adaptive NVAR) framework, initially introduced and tested on synthetic dynamical systems, and extends it to ocean forecasting. We present a reduced-order forecasting framework that combines Singular Value Decomposition (SVD) with Adaptive NVAR to predict SST dynamics in the East Sea. SST fields are compressed into a low-dimensional representation using SVD, which extracts dominant modes of ocean variability. Adaptive NVAR models the temporal evolution of these latent states, and the predicted states are reconstructed into SST forecasts. We evaluate the framework using regional ocean datasets and compare it with the standard NG-RC/NVAR. Results show that Adaptive NVAR consistently achieves lower forecasting errors across multiple prediction horizons. In addition, SVD reduces computational complexity, resulting in a fast and scalable framework suitable for real-time ocean forecasting.
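The latent-dynamics step can be illustrated with the standard NVAR (next-generation reservoir computing) feature vector, built from a short history of reduced-order states. This sketch shows the baseline NG-RC/NVAR construction that the paper compares against, under an assumed delay count k and stride s. It does not implement the Adaptive NVAR extension, and the latent states are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence

def nvar_features(history: Sequence[Sequence[float]], k: int = 2, s: int = 1) -> List[float]:
    """Standard NG-RC/NVAR feature vector for the most recent time step:
    [1, k delayed copies of the state (stride s), all unique quadratic monomials]."""
    if len(history) < (k - 1) * s + 1:
        raise ValueError("history too short for k delays with stride s")
    linear: List[float] = []
    for d in range(k):
        linear.extend(history[-1 - d * s])  # newest state first, then older delays
    quadratic = [linear[i] * linear[j]
                 for i, j in combinations_with_replacement(range(len(linear)), 2)]
    return [1.0] + linear + quadratic

# Hypothetical 2-mode latent states (e.g. leading SVD coefficients of SST maps).
history = [[1.0, 2.0], [3.0, 4.0]]
features = nvar_features(history, k=2, s=1)
print(len(features))   # 15 = 1 constant + 4 linear + 10 quadratic
print(features[:6])    # [1.0, 3.0, 4.0, 1.0, 2.0, 9.0]
```

In standard NVAR, a linear readout fitted by ridge regression maps these features to the next latent state (or its increment). In this pipeline, the predicted coefficients would then be mapped back through the retained SVD modes to give an SST field.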

20.
arXiv (CS.CV) 2026-06-16

Focus When Necessary: Adaptive Routing and Collaborative Grounding for Training-Free Visual Grounding

While Multimodal Large Language Models (MLLMs) excel in cross-modal reasoning, they often struggle to perceive fine-grained details in complex high-resolution images. Recent training-free methods address this through image scaling and localized cropping. However, applying these manipulations indiscriminately introduces computational redundancy for simple queries and can degrade accuracy by truncating essential global context or introducing irrelevant background noise. To this end, we propose LazyMCoT, a dynamic and training-free framework that adaptively allocates visual grounding efforts based on sample difficulty. The framework features an Adaptive Routing mechanism that evaluates predictive uncertainty using first-token statistics from a single forward pass. This efficiently bypasses confident cases while ensuring the recall of difficult samples via conformal calibration. For these challenging cases, a Collaborative Grounding module integrates the inherent cross-modal attention of the model with an external visual expert through a two-stage refinement process. This refinement process generates a precise localized display to recover small or occluded targets. Extensive experiments across diverse benchmarks demonstrate that LazyMCoT rivals training-based approaches by simultaneously improving reasoning accuracy and reducing average inference latency. Our code is available at https://github.com/TencentBAC/LazyMCoT.
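The routing step can be sketched as split-conformal calibration of an uncertainty threshold. Using uncertainty scores from calibration samples known to be hard, the threshold is chosen so that at least a 1 - alpha fraction of new hard samples still reach the grounding path. Several parts of this sketch are assumptions: first-token entropy as the uncertainty score, the calibration scores, and all function names. The abstract says only that routing uses first-token statistics and conformal calibration.

```python
import math
from typing import Sequence

def first_token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of the first generated token's distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def conformal_threshold(hard_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: a new hard sample scores >= the threshold
    with probability >= 1 - alpha, assuming exchangeability with the calibration set."""
    n = len(hard_scores)
    j = math.floor(alpha * (n + 1))
    if j < 1:
        return -math.inf  # too few calibration points: route everything
    return sorted(hard_scores)[j - 1]

def needs_grounding(score: float, threshold: float) -> bool:
    """Route to the expensive grounding path only when uncertainty reaches the threshold."""
    return score >= threshold

# Hypothetical uncertainty scores of 19 calibration samples known to be hard.
calib = [0.5 + 0.05 * i for i in range(19)]
tau = conformal_threshold(calib, alpha=0.1)
print(needs_grounding(first_token_entropy([0.98, 0.01, 0.01]), tau))  # False
print(needs_grounding(first_token_entropy([0.4, 0.3, 0.3]), tau))     # True
```

Only the recall of hard samples is controlled here. How many easy samples are also routed depends on how well the score separates the two groups.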

21.
arXiv (math.PR) 2026-06-11

Asymptotic analysis of the finite predictor for fractional Gaussian noise

arXiv:2504.01562v2 Abstract: This paper proposes a new approach to the asymptotic analysis of the finite predictor for stationary sequences. Our method yields the exact asymptotics of both the relative prediction error and the partial correlation coefficients. The underlying assumptions are analytic in nature, making the approach applicable to processes with long-range dependence. The ARMA-type process driven by fractional Gaussian noise (fGn), which had previously remained elusive, is used as a case study.
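For orientation, the finite-predictor quantities in this abstract can be computed numerically for plain fGn with the classical Durbin–Levinson recursion. This is a standard textbook method, not the paper's asymptotic approach, and it does not compute the *relative* prediction error (which also needs the infinite-past error variance). It uses the fGn autocovariance $\gamma(k) = \tfrac{\sigma^2}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$; the function names are hypothetical.

```python
import numpy as np

def fgn_autocov(n, H, sigma2=1.0):
    """Autocovariances gamma(0..n) of fractional Gaussian noise with Hurst index H."""
    k = np.arange(n + 1, dtype=float)
    return 0.5 * sigma2 * (np.abs(k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

def durbin_levinson(gamma, n):
    """Partial autocorrelations alpha(1..n) and one-step prediction error
    variances v_0..v_n of the finite-past linear predictor."""
    pacf = np.zeros(n)
    v = np.zeros(n + 1)
    v[0] = gamma[0]
    phi = np.zeros(0)                          # predictor coefficients phi_{m-1, 1..m-1}
    for m in range(1, n + 1):
        a = (gamma[m] - phi @ gamma[1:m][::-1]) / v[m - 1]   # phi_{m,m}
        phi = np.concatenate((phi - a * phi[::-1], [a]))      # phi_{m, 1..m}
        pacf[m - 1] = a
        v[m] = v[m - 1] * (1.0 - a ** 2)
    return pacf, v
```

For example, `durbin_levinson(fgn_autocov(50, 0.7), 50)` shows how slowly the partial correlations and error variances settle when $H > 1/2$ (long-range dependence), which is the regime where exact asymptotics are of interest.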

22.
arXiv (CS.AI) 2026-06-18

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

arXiv:2603.00656v2 Abstract: Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.
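A rough sketch of the turn-level signal described above: measure how much the real user feedback shifts the agent's next-action distribution relative to a masked-feedback counterfactual, here using KL divergence. Both the choice of KL and the gating rule are assumptions, since the abstract does not give formulas; in particular, `fused_turn_rewards` is a hypothetical gate (not InfoPO's) that gives the information-gain term full weight when outcome rewards in a rollout group barely vary, i.e. when GRPO advantages would be near zero. The `tau` parameter is likewise hypothetical.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def info_gain_rewards(p_with, p_masked):
    """Per-turn reward: shift in the agent's next-action distribution caused by the
    real feedback, compared with the same context where the feedback is masked."""
    return np.array([kl(a, b) for a, b in zip(p_with, p_masked)])

def fused_turn_rewards(info_gain, outcome, group_outcomes, tau=0.1):
    """Hypothetical variance gate: weight info gain more when the group's outcome
    rewards are nearly identical (weak advantage signal), less when they differ."""
    gate = 1.0 / (1.0 + np.var(group_outcomes) / tau)
    return outcome + gate * np.asarray(info_gain, dtype=float)
```

A turn whose feedback leaves the next-action distribution unchanged earns zero information gain, which matches the abstract's intent of crediting only turns where the feedback "measurably changes" behaviour.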

23.
arXiv (CS.CL) 2026-06-11

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

Visual question answering increasingly requires multi-step reasoning. Recent post-training with reinforcement learning under verifiable rewards (RLVR) and Group Relative Policy Optimization (GRPO) can improve multimodal reasoning, but most approaches rely on sparse outcome-only rewards. As a result, they struggle to tell whether an incorrect answer comes from a small mistake late in the reasoning or from an unhelpful trajectory from the start. A common solution is to train a process reward model (PRM) for step-level supervision, but this typically requires large-scale high-quality chain-of-thought annotations and additional training cost. We propose ProcessThinker, a practical post-training pipeline that provides step-level process rewards without training an explicit PRM. ProcessThinker first rewrites reasoning traces into a step-tagged format for cold-start supervised fine-tuning, then applies GRPO with a standard format reward and our rollout-based process reward. Concretely, for each intermediate step, we sample multiple continuations from that step and use the empirical success rate (final-answer verification) as the step reward. This gives dense credit assignment and encourages reasoning steps that more reliably support a correct conclusion, helping reduce inconsistent or self-contradictory progress across steps, which is a key issue in logical reasoning. Across four challenging video benchmarks (Video-MMMU, MMVU, VideoMathQA, and LongVideoBench), ProcessThinker consistently improves over the baseline model Qwen3-VL-8B-Instruct
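The rollout-based step reward described above amounts to a Monte Carlo success-rate estimate per step prefix. The sketch below assumes a model-agnostic interface: `sample_continuation` (completes the reasoning from a prefix and returns a final answer) and `is_correct` (the final-answer verifier) are hypothetical callables standing in for the policy model and verifier, and `k=8` is an arbitrary illustrative rollout count.

```python
from typing import Callable, List

def step_rewards(steps: List[str],
                 sample_continuation: Callable[[List[str]], str],
                 is_correct: Callable[[str], bool],
                 k: int = 8) -> List[float]:
    """Reward for step i = fraction of k continuations, sampled from the prefix
    steps[:i+1], whose final answer passes verification."""
    rewards = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        hits = sum(is_correct(sample_continuation(prefix)) for _ in range(k))
        rewards.append(hits / k)
    return rewards
```

The cost is one batch of k rollouts per step, which replaces the annotation and training cost of an explicit PRM with extra sampling at RL time; a sharp drop in reward between consecutive steps points to where the trajectory went wrong.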

24.
medRxiv (Medicine) 2026-06-22

Effect of Lowering the Drink-Driving Blood Alcohol Limit in Scotland on Road Traffic Crashes: a Synthetic Difference-in-Differences Study

Objective: To evaluate the road safety impact arising from Scotland's 2014 reduction in the legal blood alcohol concentration (BAC) limit for drivers, and to assess whether the effect of the reform varied across different spatial contexts. Design: A quasi-experimental statistical longitudinal study using a Synthetic Difference-in-Differences (SDID) approach. Setting: Small-area panel data for Great Britain, with areas (Middle-layer Super Output Areas, MSOAs, in England and Wales and Intermediate Zones, IZs, in Scotland) classed into control and treatment groups according to whether they were exposed to Scotland's BAC reform. The control and treatment groups comprise 7088 spatial units in England and Wales and 852 spatial units in Scotland, respectively, observed over the period 2008-2019. Participants: The study primarily analyses police-reported road traffic collision data from the UK Department for Transport's STATS19 system. Data were analysed at the MSOA/IZ level. This is a secondary dataset, and we therefore did not involve patients or the public in formulating the research question, determining outcome measures, or designing and conducting the study. Main Outcome Measures: The main outcome measures were log-transformed rates of total road traffic crashes, and (weekend) night-time crashes (22:00-04:00) per 100,000 population. The latter is used as a proxy measure for drunk driving. Results: Our results indicate that the reduction in the legal BAC limit led to statistically significant declines in road traffic crash rates. Aggregate estimates suggest reductions of 12.0% (95% confidence interval (CI): [-13.7%, -10.3%]) in total crashes, 15.6% (95% CI: [-20.7%, -10.2%]) in night-time crashes, and 12.4% (95% CI: [-16.7%, -7.9%]) in weekend night-time crashes. We also find substantial heterogeneity in treatment effects across spatial contexts. 
Effects were strongest in rural and less densely populated areas, where reductions exceeded 16% (95% CI: [-18.7%, -13.9%]) for total crashes and reached up to 29.6% (95% CI: [-35.8%, -22.8%]) for night-time and 21.4% (95% CI: [-28.3%, -13.9%]) for weekend night-time crashes. Moderate but statistically significant effects were also observed in dense urban areas, whereas effects in suburban and transitional areas were smaller and not statistically significant. Conclusions: Our analysis suggests that lowering the legal BAC limit in Scotland led to meaningful reductions in road traffic crashes, particularly during higher-risk periods and in rural areas. The findings further suggest that the effectiveness of BAC regulation may vary across local contexts, highlighting the importance of accounting for spatial heterogeneity when evaluating road safety policies.
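The final step of a synthetic difference-in-differences estimate, for a single block of units treated at the same time, is a weighted double difference. The sketch below assumes the control-unit weights `omega` and pre-period weights `lam` have already been estimated (in the SDID method of Arkhangelsky et al. they come from regularised least-squares fits, which are omitted here); with uniform weights it reduces to ordinary DiD. Because the outcomes are log-transformed rates, the sketch also converts an effect on the log scale to a percentage change via $100(e^{\tau} - 1)$, a common convention that is assumed rather than confirmed by the abstract. Function names are hypothetical.

```python
import numpy as np

def sdid_estimate(Y, n_treated, n_pre, omega, lam):
    """Weighted double difference for a block design.
    Y: (N, T) panel of log rates; the last n_treated rows are treated units and the
    first n_pre columns are pre-reform periods.
    omega: weights over the N - n_treated control units (sum to 1).
    lam: weights over the n_pre pre-reform periods (sum to 1)."""
    Yc, Yt = Y[:-n_treated], Y[-n_treated:]
    treated_diff = Yt[:, n_pre:].mean() - Yt[:, :n_pre].mean(axis=0) @ lam
    control_diff = omega @ (Yc[:, n_pre:].mean(axis=1) - Yc[:, :n_pre] @ lam)
    return treated_diff - control_diff

def log_effect_to_percent(tau):
    """Percentage change in the (untransformed) rate implied by a log-scale effect."""
    return 100.0 * (np.exp(tau) - 1.0)
```

For example, a log-scale effect of $\ln(0.88) \approx -0.128$ corresponds to a 12.0% reduction, the same magnitude as the aggregate total-crash estimate reported above.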

25.
arXiv (CS.CV) 2026-06-15

ADAPT: An Autonomous Forklift for Construction Site Operation

Efficient material logistics play a critical role in controlling costs and schedules in the construction industry. However, manual material handling remains prone to inefficiencies, delays, and safety risks. Autonomous forklifts offer a promising solution to streamline on-site logistics, reducing reliance on human operators and mitigating labor shortages. This paper presents the development and evaluation of ADAPT (Autonomous Dynamic All-terrain Pallet Transporter), a fully autonomous off-road forklift designed for construction environments. Unlike structured warehouse settings, construction sites pose significant challenges, including dynamic obstacles, unstructured terrain, and varying weather conditions. To address these challenges, our system integrates AI-driven perception techniques with traditional approaches for decision making, planning, and control, enabling reliable operation in complex environments. We validate the system through extensive real-world testing, comparing its continuous performance against an experienced human operator across various weather conditions. Our findings demonstrate that autonomous outdoor forklifts can operate near human-level performance, offering a viable path toward safer and more efficient construction logistics.