Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Magnetic control of an exciton-polariton condensate in a van der Waals magnet

arXiv:2506.06010v3 Announce Type: replace-cross Abstract: Quasiparticle condensates are among the most spectacular solid-state manifestations of quantum physics. Coupling macroscopic real-space wavefunctions to additional degrees of freedom, such as the electron spin, would add valuable control knobs for quantum applications. While creating spin-carrying superconducting condensates has attracted enormous attention, man-made condensates of light-matter hybrids known as exciton-polaritons have lacked an analogous spin-based perspective. Here we open a new door by demonstrating magnetically tunable exciton-polariton condensation in the van der Waals magnet CrSBr. Under photoexcitation, CrSBr microwires embedded in an optical cavity show the hallmarks of polariton condensation: a dramatic increase of the emission intensity from an excited laterally confined polariton state by multiple orders of magnitude, spectral narrowing of the emission line, and a continuous shift of the peak energy. Interferometry evidences an increase in spatial and temporal coherence. Owing to the strong coupling between the spin order and excitonic correlation, the energy of the condensate can be tuned by up to 10.5 meV by an external magnetic field of only 2 Tesla. Our results establish CrSBr microcavities as a powerful platform for exploring magnetic control of polariton condensates and mark a significant step toward spin-controlled coherent quantum light sources.

02.
arXiv (CS.LG) 2026-06-19

Efficient Neural Network Model Selection for Few-Class Application Datasets

arXiv:2606.19712v1 Announce Type: new Abstract: While much effort has focused on developing and benchmarking high-performance neural networks, less attention has been given to how dataset properties, known to practitioners, can guide efficient model selection. Neural models are typically evaluated on datasets with thousands of classes, yet many real-world applications involve fewer than ten. To address this understudied but common setting, we develop a measure of classification difficulty based on data-side properties and show how it enables more efficient model selection for few-class datasets, where traditional approaches are less effective. We term this phenomenon "few-class distinctiveness". Our metric allows comparison of models and datasets 6 to 29$\times$ faster than repeated training and testing. Leveraging this insight, we extend scaled model families below the smallest published models, achieving greater efficiency at similar accuracy, for example models up to 42% smaller than YOLOv5-nano for a mobile robot task. Targeting resource-constrained applications, we demonstrate few-class model selection across mobile robot, drone, and IoT scenarios, highlighting practical gains in efficiency without sacrificing performance.

03.
arXiv (CS.CV) 2026-06-19

TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. We introduce TimeProVe, a cost-efficient hybrid framework for temporally grounded reasoning in long videos. TimeProVe first employs lightweight modules to generate action-grounded answer–evidence hypotheses and subsequently invokes an expensive VLM only for targeted verification. The core of our framework lies in the Action-based Candidate Evidence (ACE) module, which converts temporally localized actions into query-conditioned candidate answers and supporting evidence windows through lightweight LLM reasoning. We further introduce OpenTSUBench (OTB), an open-ended benchmark designed to evaluate temporally grounded reasoning in real-world Activities of Daily Living (ADL) scenarios. Experiments show that TimeProVe outperforms the strongest baseline on OTB by 7.3%, while reducing VLM calls by 75% and inference cost by 93%. Furthermore, without explicit temporal grounding training, TimeProVe achieves competitive performance on Charades-STA, and reaches state-of-the-art results when enhanced with grounding VLMs.

04.
arXiv (CS.LG) 2026-06-16

TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning

arXiv:2602.08306v2 Announce Type: replace Abstract: Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they do not work well for deep chains. The root cause of this limitation stems from the Semantic Entanglement problem in these extended workflows. In standard textual backpropagation, feedback signals mix local critiques with upstream contexts, leading to Attribution Ambiguity. To address this challenge, we propose TextResNet, a framework that reformulates the optimization process to achieve precise signal routing via four key innovations. Firstly, in the forward pass, it enforces Additive Semantic Deltas to preserve an Identity Highway for gradient flow. Secondly, in the backward pass, it introduces Semantic Gradient Decomposition via a Semantic Projector to disentangle feedback into causally independent subspaces. Thirdly, it implements Causal Routing, which routes projected signals to their specific components. Finally, it performs Density-Aware Optimization Scheduling to leverage the disentangled signals to dynamically allocate resources to key system bottlenecks. Our results show that TextResNet not only achieves superior performance compared to TextGrad, but also exhibits remarkable stability for agentic tasks in compound AI systems where baselines collapse. Code is available at https://github.com/JeanDiable/TextResNet.

05.
arXiv (math.PR) 2026-06-11

Matrix Discrepancy for Representations of Finite Groups

arXiv:2606.12181v1 Announce Type: new Abstract: Given a finite group $G$, we prove that there exist signs $\varepsilon\in\{\pm1\}^G$ such that $$\left\| \sum_{g\in G} \varepsilon_g\rho(g) \right\|\leq C\, \sqrt{|G|},$$ where $\rho$ is the left regular representation of $G$, and $C$ is a universal constant. This special case of the Matrix Spencer conjecture was posed in [BKMZ24], where it was established for simple groups.

06.
arXiv (CS.CV) 2026-06-16

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.

07.
medRxiv (Medicine) 2026-06-19

Cardiometabolic multimorbidity and care experiences in primary healthcare among Brazilian adults aged 50 and over (ELSI-Brazil)

Background: Population aging and the rising burden of non-communicable diseases have increased the prevalence of cardiometabolic multimorbidity (CM-MM) among older adults. Patient-reported experience measures (PREMs) are recognized as essential components of healthcare quality assessment, yet evidence on primary care experiences among individuals with CM-MM remains scarce. Objective: To analyze primary care experiences according to the presence of cardiometabolic multimorbidity among Brazilians aged 50 years and older. Methods: Cross-sectional study using data from the second wave of the Brazilian Longitudinal Study of Aging (ELSI-Brazil, 2019-2021; n = 9,949). CM-MM was defined as the self-reported coexistence of two or more of the following conditions: hypertension, diabetes mellitus, dyslipidemia, acute myocardial infarction, and stroke. Primary care experiences were assessed using a validated 12-item instrument organized into four domains: first-contact access, longitudinality, communication, and care coordination. Associations were estimated using Poisson regression adjusted for sociodemographic, health conditions, and healthcare utilization variables, with stratified analysis by Family Health Strategy (FHS) coverage. Results: CM-MM prevalence was 25.5%, with a progressive increase by age and an inverse gradient by education. Individuals with CM-MM reported significantly more positive experiences in longitudinality (mean index 2.53 vs. 2.34; adjusted PR = 1.22; 95%CI 1.12-1.33; p < 0.001) and, to a lesser extent, in communication (mean index 2.68 vs. 2.58; adjusted PR = 1.10; 95%CI 1.00-1.20; p = 0.041). No statistically significant differences were found in first-contact access or care coordination. After stratified by FHS coverage, the observed differences in longitudinality and communication were no longer statistically significant. Conclusions: CM-MM was associated with more positive primary care experiences in longitudinality and communication. The absence of differentiated experiences in first-contact access and coordination highlights structural gaps in primary care responsiveness to individuals with greater clinical complexity. Keywords: Multimorbidity; Cardiometabolic diseases; Primary Care; Patient-reported experience measures; Older adults; ELSI-Brazil.

08.
arXiv (CS.CV) 2026-06-11

Time-Conditioned and Multi-Time Survival Prediction from 2D PET/CT Projections in Lung Cancer

Accurate prediction of overall survival (OS) from positron emission tomography/computed tomography (PET/CT) can support personalized treatment and follow-up strategies in oncology. However, the impact of temporal modeling on imaging-based survival prediction remains insufficiently explored. We investigate how different temporal formulations influence survival prediction by developing two complementary approaches: Attention-guided Time-Conditioned Survival (ATCS) and Multi-Time Survival (MTS). We retrospectively analyzed pre-treatment PET/CT images from 848 patients with non-small cell lung cancer (NSCLC), including 556 for model development and 292 for held-out testing. A previously proposed Time-Conditioned Survival (TCS) model was used as a baseline. Models were trained using 5-fold cross-validation and evaluated on the test set using time-dependent area under the curve (AUC) at 6-month intervals from 0.5 to 5 years. Both ATCS and MTS outperformed the baseline TCS model, achieving mean AUCs of 0.794 and 0.793, respectively, compared to 0.767. ATCS performed better at earlier time points (0.5-3 years), whereas MTS performed better at later intervals (3.5-5 years). Combining tumor-specific and tissue-wise PET/CT features improved performance over either input alone. Finer temporal discretization improved short-term prediction, while coarser intervals provided more stable long-term estimates. These findings demonstrate that temporal modeling and input design influence PET/CT-based survival prediction. The proposed approaches enable time-specific survival estimation from pre-treatment imaging and may support improved risk stratification and clinical decision-making.

09.
arXiv (CS.CL) 2026-06-15

"I Didn't Make the Micro Decisions": Measuring, Inducing, and Exposing Goal-Level AI Contributions in Collaboration

As large language models (LLMs) increasingly shape how users form, refine, and extend their goals, attributing contributions in human-AI collaboration becomes critical for users calibrating their own reliance and for evaluators assessing AI-assisted work. Yet existing methods focus on final artifacts, missing the process through which goals themselves are jointly shaped. We introduce a goal-level attribution framework, CoTrace, that decomposes explicit goals into verifiable requirements and traces both direct contributions and indirect influences across dialogue turns. Applying CoTrace to 638 real-world collaboration logs, we find that while models account for only 11-26% of goal-shaping contribution, they contribute substantially more on introducing lower-level concrete requirements, and make various kinds of indirect contributions. Through controlled simulations, we show that interaction design choices significantly affect model goal-shaping behavior. In a user study, exposing participants to goal-level analyses shifts their perceived contributions by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand their own AI-assisted work.

10.
arXiv (CS.AI) 2026-06-16

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

arXiv:2605.27284v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models are increasingly expected to not only complete robot tasks, but also follow human instructions about how those tasks should be executed. However, existing robot datasets usually pair trajectories with coarse goal-level language, leaving execution-critical details such as active arm, approach direction, and contact region unspecified. This limits steerable policy learning and robotic video understanding. We introduce FineVLA, an open framework for action-aligned fine-grained VLA supervision. The framework includes: (1) a data construction tool that unifies 972,247 trajectories across 85K tasks from 10 open-source robot datasets and builds FineVLA-Data, a human-verified dataset of 47,159 fine-grained trajectories; (2) a held-out benchmark with 500 videos, 11,631 atomic facts, and 1,030 VQA questions; (3) a robotics-specialized VLM annotator for scalable fine-grained annotation; and (4) a steerable VLA policy trained with controlled mixtures of fine-grained and raw goal-level instructions. Our experiments yield three findings. First, fine-grained supervision does not sacrifice goal-level success: FG-only improves over Raw-only by +1.4 to +8.1 success-rate points across settings. Second, fine-grained and raw instructions are complementary, following a consistent inverted-U trend peaking at FG:Raw = 1:2 to 1:1. The best mixed setting reaches 86.8%/82.5% in RoboTwin simulation and 62.7/100 in real-world dual-arm manipulation (vs. 49.9 Raw-only). Third, fine-grained supervision improves steerable control: the largest real-world gains appear on pose (+23), color (+18), and approach direction (+18)–factors where goal-level instructions provide no guidance. Overall, fine-grained language should augment goal-level instructions: specifying how to execute alongside what to achieve. Project page: https://finevla.xlang.ai/

11.
arXiv (CS.CV) 2026-06-19

Current World Models Lack a Persistent State Core

World models are increasingly regarded as a decisive step toward artificial general intelligence, yet modeling the physical world demands more than rendering convincing frames on demand: it requires an internal world state that keeps evolving over time, decoupled from observation, so that objects endure and events run to their conclusions whether or not a camera is watching, much as the moon holds to its orbit when no one is looking. This requirement is a blind spot of existing benchmarks, which reward surface properties such as fidelity, motion, and camera controllability while never asking whether a generated world keeps evolving once it is unobserved. We introduce WRBench, the first systematic diagnostic benchmark that treats camera motion as an intervention on observability and resolves evaluation into a human-calibrated chain that asks whether the camera executes the requested interaction, whether the scene stays continuous and identifiable while in view, and whether a returning target remains consistent with the event that was set in motion. Across 9{,}600 videos from 23 models spanning four control paradigms, one finding proves stubborn: current systems maintain the observed world as a tracking shot, resuming a returning target in the state at which it was abandoned rather than advancing the event while it went unseen. Because this failure recurs across control paradigms, model families, and increments of scale, robust world-state evolution does not follow from cleaner imagery, tighter control, richer geometric priors, or sheer parameter count We therefore argue that the stability of the physical state kernel and the consistency of worldlines under viewpoint intervention should become first-class objectives of world-model design, so that a world model captures how the world will unfold rather than how the next frame appears.

12.
arXiv (CS.AI) 2026-06-11

Designing AI-Supported Focus Groups: A Role x Modality Playbook

arXiv:2606.11835v1 Announce Type: cross Abstract: Collecting participants' lived experiences is central to design research. Focus groups are uniquely valuable because participants not only share individual accounts but also respond to one another, surfacing comparison, disagreement, and collective sensemaking. However, focus groups are resource-intensive and highly sensitive to facilitation: moderators must probe for specificity, balance participation, manage topic flow, and sustain psychological safety, and subtle facilitation choices can shape what becomes salient. Recent HCI work and commercial meeting tools show that generative AI can scaffold live conversation through prompting, turn regulation, thematic mapping, and real-time summarization. Yet UXR teams lack a clear map of what these capabilities mean in focus groups and what methodological risks they introduce. We synthesize AI supports for live conversation and translate them into a focus-group-specific playbook organized by AI role (tool, co-host, host) and modality (text, voice, embodied).We synthesize prior work on AI-supported live conversation and propose a focus-group-specific playbook of AI supports organized by role (tool, co-host, host) and modality (text, voice, embodied). We characterize interactional trade-offs and identify open questions for evaluating AI-supported focus groups as methodological configurations.

13.
arXiv (CS.CL) 2026-06-12

From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent

Large language models (LLMs) have shown promise in automating scientific peer review. However, existing approaches often struggle to generate in-depth reviews supported by concrete evidence. We argue that a key limitation is the lack of flexibility to proactively investigate suspicious parts of a paper based on accumulated evidence, as human reviewers do. In this paper, we explore how to enable an LLM-based review agent to perform such proactive investigation. We find that this can be naturally formulated as a Markov Decision Process (MDP), and propose ProReviewer, a scientific peer review agent that proactively reviews a paper guided by a maintained, structured review log. The structured review log serves as a workspace for the agent to track evidence and intermediate findings collected during review. Experiments show that ProReviewer with an 8B backbone, trained by supervised fine-tuning and optimized by reinforcement learning, achieves the highest average score across five quality dimensions, outperforming prompt-based methods with much larger frontier LLMs by up to 39% and the strongest fine-tuned baseline by 16% relatively. It also attains the highest win rates against baselines in human evaluation.

14.
arXiv (CS.LG) 2026-06-18

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

arXiv:2606.18778v1 Announce Type: new Abstract: Online learning in non-stationary streams is often formulated as tracking a point estimate, but many applications require predicting the full data-generating distribution. We study online distributional prediction under drift and adversarial corruption. Our approach represents each candidate law through a latent cluster geometry: a variable-size configuration of centers that organizes probability mass and induces a predictive distribution. A Gibbs quasi-posterior over these configurations yields an online predictor by posterior averaging, and the resulting variable-dimensional posterior can be sampled with reversible-jump MCMC. The method therefore avoids specifying a parametric streaming law while retaining a structured latent space for uncertainty, regularization, and comparison. We evaluate performance by cumulative Wasserstein-1 regret against the time-varying true law. The analysis separates two effects: corruption perturbs the loss-based posterior update, whereas drift makes long-horizon posterior memory stale. We address the latter with a restarted variant that temporally localizes the same quasi-Bayesian update. The resulting high-probability bounds decompose into a PAC-Bayesian complexity term, a corruption-sensitive posterior perturbation term, and a dynamic optimal-transport term driven by \(A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)\). Under bounded support, stable latent geometry, predictive-map regularity, oracle realizability, localized restart windows, sublinear transport action, and sublinear corruption budget, the restarted predictor achieves sublinear cumulative Wasserstein regret. These guarantees require no parametric model for the stream, drift mechanism, or corruption process.

15.
bioRxiv (Bioinfo) 2026-06-19

Children's DNA Methylation and Family Dynamics in a Congo Basin Subsistence Community: Links with Parental Conflict and Fathers' Caregiving

Family environments may contribute to children's long-term health through biological processes, including epigenetic regulation such as DNA methylation (DNAm). However, most studies in this area focus on Euro-American populations while also rarely including fathering data. The current study investigated children's blood DNAm associations with positive (father caregiving) and negative (parental conflict) family dynamics in a smaller-scale subsistence society living in the Congo Basin rainforest. We measured DNAm from dried blood spots of 54 children (mean age=8.48 years) and conducted three epigenome-wide association studies aimed at discovering differential co-methylated regions (CMRs) associated with family dynamics. Via path models, we investigated the health implications and shared contribution of family factors of the identified CMRs. Differential DNAm associated with family dynamics was localized to genes related to stress, immunology, development, and aging, thus possibly linking to children's physical health and were simultaneously connected to other family factors such as number of siblings. Our findings suggested similarities in biological embedding of family factors across socio-ecologically diverse contexts.

16.
arXiv (CS.AI) 2026-06-15

Scalable Production Scheduling: Linear Complexity via Unified Homogeneous Graphs

arXiv:2604.23841v2 Announce Type: replace-cross Abstract: Efficiently solving the Job Shop Scheduling Problem in real-world industrial applications requires policies that are both computationally lean and topologically robust. While Reinforcement Learning has shown potential in automating dispatching rules, existing models often struggle with a scalability bottleneck caused by quadratic graph complexity or the architectural overhead of heterogeneous layers. We introduce a unified graph framework that employs feature-based homogenization to project distinct node roles into a shared latent space. This allows a standard homogeneous Graph Isomorphism Network to capture complex resource contention with linear complexity, ensuring low-latency inference for large-scale industrial applications. Our empirical results demonstrate that our framework achieves state-of-the-art performance while exhibiting consistent zero-shot generalization. We identify the job-to-machine ratio as the primary driver of policy effectiveness, rather than absolute problem size. Based on this, we propose a hypothesis of structural saturation, demonstrating that policies trained on critically congested instances ($\mathcal{J} \approx \mathcal{M}$) learn scale-invariant resolution strategies. Agents trained at this saturation point internalize invariant conflict-resolution logic, allowing them to treat massive rectangular instances as a sequential concatenation of saturated sub-problems. This approach eliminates the need for expensive scale-specific retraining and prevents overfitting to statistical shortcuts, providing a robust and efficient pathway for deploying RL solutions in dynamic production environments.

17.
arXiv (CS.CV) 2026-06-17

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing objects, especially over long time horizons. In this paper, we propose FR3D, a world model that predicts a persistent 3D latent representation for future dynamic 3D reconstruction. Unlike prior works that treat the world as a sequence of image-based features, FR3D explicitly decouples the 3D evolution of the scene from the agent's trajectory, treating the inferred ego-motion as a latent proxy for action. This disentanglement resolves the ambiguities between self-motion and world-motion, ensuring geometric consistency into the future. Furthermore, we introduce a teacher-student distillation strategy that leverages the spatial "common sense" of off-the-shelf foundation models, leading to robust zero-shot generalization. Extensive experiments demonstrate FR3D's strong performance for future dynamic 3D reconstruction from monocular observations across multiple datasets, even 2 seconds into the future. Project page: https://fr3d-wm.github.io.

18.
arXiv (CS.AI) 2026-06-11

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

arXiv:2606.12406v1 Announce Type: cross Abstract: Contact-rich manipulation requires force sensitivity, but many robot arms lack dedicated force sensors due to their high cost. We present Neural External Torque Estimation (NEXT), a data-driven method that estimates external joint torques without needing any dedicated force sensors. NEXT trains in 1 minute from only 10 minutes of free-motion data, yet achieves estimates comparable to dedicated joint-torque sensors. NEXT enables force-feedback teleoperation on low-cost arms and improves policy learning through Force-Informed Re-Sampling Training (FIRST), which up-samples pre-contact and contact segments during behavior cloning. Across five long-horizon tasks, FIRST outperforms prior force-aware policies by over 17% in task progress. Together, NEXT and FIRST bring force-aware teleoperation and policy learning to off-the-shelf robots without additional sensing hardware. Video results and code are available at https://jasonjzliu.com/factr2

19.
arXiv (CS.CV) 2026-06-11

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with an emerging strategy of scaling test-time compute to improve capability. However, we observe that doing so increases latency, token usage, and FLOPs while yielding uneven, often diminishing gains in downstream success, limiting where embodied agents can be deployed. We argue that choosing when and where to spend test-time compute is central to bringing frontier performance to the real world. We introduce DIRECT, a routing framework that uses multimodal scene context to allocate compute per prompt, improving the success–cost Pareto frontier over fixed model selection. Across three dominant scaling axes, namely chain-of-thought depth, model size, and memory history, our experiments on VLABench and RoboMME show that test-time compute is not a uniform lever: different axes yield qualitatively distinct capability gains. We validate these insights on a physical Franka arm in a DROID setup spanning zero-shot manipulation and long-horizon chaining, where our router matches or exceeds a stronger model's success rate at up to 65% lower average latency. Ultimately, our results show that naively scaling test-time compute is wasteful, and that DIRECT can provide frontier-level embodied planning in robotic systems at a fraction of the cost. Project page can be found at jadee-dao.github.io/direct/.

20.
arXiv (math.PR) 2026-06-18

A Unified Approach to Beta Moments, Combinatorial Identities, and Random Walks

arXiv:2605.05420v2 Announce Type: replace Abstract: The study of random walks has increasingly been popular across diverse disciplines such as statistics, mathematics, quantum physics, where they are used to model paths consisting of successive random steps in a mathematical space. A fundamental quantity of interest is the probability that a simple symmetric random walk returns to the origin after 2n steps. In this paper, we develop a unified probabilistic approach that connects the return probabilities in arbitrary dimensions with moment representations. Using this framework, we provide probabilistic proofs of several combinatorial identities involving beta and gamma functions, and derive new combinatorial identities in general dimensions.

21.
arXiv (math.PR) 2026-06-18

The FBSDE approach to sine-Gordon up to $6\pi$

arXiv:2401.13648v3 Announce Type: replace-cross Abstract: We develop a stochastic analysis of the sine-Gordon Euclidean quantum field $(\cos (\beta \varphi))_2$ on the full space up to the second threshold, i.e. for $\beta^2 < 6 \pi$. The basis of our method is a forward-backward stochastic differential equation (FBSDE) for a decomposition $(X_t)_{t \geqslant 0}$ of the interacting Euclidean field $X_{\infty}$ along a scale parameter $t \geqslant 0$. This FBSDE describes the optimiser of the stochastic control representation of the Euclidean QFT introduced by Barashkov and one of the authors. We show that the FBSDE provides a description of the interacting field without cut-offs and that it can be used effectively to study the sine-Gordon measure to obtain results about large deviations, integrability, decay of correlations for local observables, singularity with respect to the free field, Osterwalder-Schrader axioms and other properties.

22.
arXiv (CS.CL) 2026-06-11

Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills

Scientific discovery workflows usually contain and rely heavily on lab notes, where researchers record observations, interpret uncertain results, and plan follow-up experiments. Such informative lab notes preserve evolving scientific reasoning and author uncertainty, rather than polished final results exhibited in publications, providing a valuable opportunity for AI to engage in scientific exploration at a more comprehensive and deeper level. However, most prior work on scientific text focuses on papers, protocols, or structured databases, leaving informal laboratory notes underexplored as inputs to AI agents for science. This gap matters because lab notes often intermingle validated observations, tentative judgments, and possible experimental next steps within the same passage. If these signals are conflated, an AI agent may mistake uncertain scientific judgments for confirmed conclusions or executable actions. To this end, we present Notes2Skills, a two-stage framework for turning lab notebooks into verifiable skills for scientific AI agents while preserving the author's certainty. Across seven conditions and three wet-lab sessions, Notes2Skills is the only configuration that neither mistakes uncertain notes for firm instructions nor discards firm ones. We show that certainty preservation is the missing piece between lab notebooks and reliable agent skills, opening a path toward safer AI co-scientist systems.

23.
arXiv (CS.LG) 2026-06-16

A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization

arXiv:2606.16154v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) improves language-model reasoning, but GRPO-style optimization remains prone to collapse. We analyse this instability through token-level gradient dynamics, deriving a taxonomy that predicts how updates affect next-token probabilities and entropy. The taxonomy shows that stability depends jointly on the advantage sign and token distribution under the current policy. Motivated by this finding, we propose Winner Advantage Policy Optimization (WAPO), a simple online clipped policy-gradient objective that updates only on positive-advantage completions. Across mathematical reasoning and multi-hop QA benchmarks, WAPO improves training stability and matches or outperforms baselines across multiple model families. Full code can be found at https://github.com/layer6ai-labs/wapo.

24.
arXiv (CS.LG) 2026-06-11

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

arXiv:2606.11431v1 Announce Type: new Abstract: Mirror Descent (MD) extends Gradient Descent (GD) beyond Euclidean geometry and has recently reappeared as a lens for KL-regularized policy optimization in reinforcement learning and LLM post-training. This raises a basic robustness question, crucial to reproducibility and reliability: how sensitive are MD dynamics to their inputs? We focus on initialization, often itself a pretrained or previously aligned model. Quadratic-regularized MD, including GD and Mahalanobis geometries, is well-known to be stable for convex smooth objectives. We show a sharp contrast: once the regularizer is non-quadratic, MD can be exponentially more sensitive to initialization than GD, even with a well-conditioned regularizer in Euclidean norm. We give a three-dimensional construction with a convex, smooth objective and a strongly convex, smooth, well-conditioned regularizer where an initial $\varepsilon$ perturbation is quickly amplified to $\min\{polylog^{-1}(1/\varepsilon), \varepsilon e^{\Omega(\eta T)}\}$ after $T$ iterations of MD with step size $\eta$. For canonical KL-regularized MD on the simplex, we show that even linear objectives can amplify an initial $\varepsilon$ perturbation exponentially fast in high-dimensional or near-boundary regimes. Finally, we show that adding a Bregman regularization term toward an anchor point can stabilize the dynamics while largely preserving the optimization guarantees, and that the choice of anchor is crucial: anchoring at the initialization only partially mitigates the instability, whereas anchoring at a fixed point yields a more stable mechanism.

25.
arXiv (CS.LG) 2026-06-18

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

arXiv:2606.18993v1 Announce Type: cross Abstract: Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X'' paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.