Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

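AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.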

01.
arXiv (CS.LG) 2026-06-18

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

arXiv:2606.18436v1 Announce Type: cross Abstract: Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
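
For readers unfamiliar with the CRPS objective named in the abstract above, the following is a minimal sketch of the standard ensemble CRPS estimate for a single scalar target. The function name and the toy values are illustrative assumptions; the paper's actual loss over radar grids is not described in this listing.

```python
def ensemble_crps(members, observation):
    """Ensemble CRPS estimate for one scalar target: E|X - y| - 0.5 * E|X - X'|.

    members     : list of ensemble forecasts (e.g. rain rates at one grid cell)
    observation : the verifying value y
    Lower is better; a single-member ensemble reduces to absolute error.
    """
    m = len(members)
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference between all member pairs (ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


# Toy usage with made-up numbers: a two-member ensemble bracketing the observation.
score = ensemble_crps([0.0, 2.0], 1.0)
```

The spread term rewards ensembles whose members disagree in proportion to their error, which is why a CRPS-trained model can express uncertainty instead of collapsing to one blurred deterministic field.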

02.
arXiv (CS.AI) 2026-06-16

OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

arXiv:2604.18827v2 Announce Type: replace-cross Abstract: Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leveraged a dataset of 3.1 million neurons from the visual cortex of 73 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images and parametric stimuli, and behavior. We train multi-modal, multi-task models that support three regimes flexibly at test time: neural prediction, behavioral decoding, neural forecasting, or any combination of the three. OmniMouse achieves state-of-the-art performance, outperforming specialized baselines across nearly all evaluation regimes. We find that performance scales reliably with more data, but gains from increasing model size saturate. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling – even in the mouse visual cortex, a relatively simple system – models remain data-limited despite vast recordings. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models. Code available at https://github.com/enigma-brain/omnimouse.

03.
arXiv (CS.CL) 2026-06-12

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Latent chain-of-thought compresses reasoning by replacing visible reasoning traces with continuous hidden-state recurrence, but existing formulations are difficult to optimize with standard on-policy reinforcement learning (RL) and hard to interpret causally. Our key insight is that a single pair of explicit boundary tokens can address both issues at once: discrete entry and exit anchors make the latent block compatible with standard on-policy RL, and the same anchors offer a natural foothold for mechanistic analysis. Motivated by this, we propose SWITCH, a switchable latent reasoning framework. The model emits one boundary token to enter latent mode and another to exit. Because the boundaries are ordinary discrete tokens, the GRPO policy ratio is well-defined at every decision point. The same anchors also expose the latent steps to direct probing and causal intervention. We train the model with a visible-to-latent curriculum and a Switch-GRPO objective that propagates gradients through recurrent latent computation. SWITCH consistently outperforms prior hidden-state-recurrence latent reasoning approaches at similar scale. Mechanistic analysis through the boundary tokens further reveals three findings: (i) the entry token is a sharply localised, learned switching policy rather than a stylistic artefact; (ii) the latent step it opens performs problem-specific, causally important computation rather than acting as an inert placeholder; and (iii) that computation is concentrated at a single hidden-state transition on entry. Together, these results show that hidden-state-recurrence latent reasoning is both RL-trainable and open to direct mechanistic analysis, including of how on-policy RL itself improves the model from the inside.

04.
arXiv (CS.AI) 2026-06-11

A Survey on Evaluating Quality and Trustworthiness in LLM-Generated Data

arXiv:2601.17717v3 Announce Type: replace Abstract: Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities. By transforming data from a scarce resource into a controllable asset, LLMs mitigate the bottlenecks imposed by the acquisition costs of real-world data for model training, evaluation, and system iteration. However, ensuring the high quality of LLM-generated synthetic data remains a critical challenge. Existing research primarily focuses on generation methodologies, with limited direct attention to the quality of the resulting data. Furthermore, most studies are restricted to single modalities, lacking a unified perspective across different data types. To bridge this gap, we propose the LLM Data Auditor framework. In this framework, we first describe how LLMs are utilized to generate data across six distinct modalities. More importantly, we systematically categorize intrinsic metrics for evaluating synthetic data from two dimensions: quality and trustworthiness. This approach shifts the focus from extrinsic evaluation, which relies on downstream task performance, to the inherent properties of the data itself. Using this evaluation system, we analyze the experimental evaluations of representative generation methods for each modality and identify substantial deficiencies in current evaluation practices. Based on these findings, we offer concrete recommendations for the community to improve the evaluation of data generation. Finally, the framework outlines methodologies for the practical application of synthetic data across different modalities.

05.
arXiv (CS.LG) 2026-06-17

INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities

arXiv:2606.18032v1 Announce Type: cross Abstract: We propose a new weak-form Physics-Informed Neural Network approach (named INI-VPINN). INI-VPINN naturally incorporates Neumann boundary and interface conditions into the variational formulation. It removes the need for additional loss terms or multiple subdomain networks. This framework employs compact support weighting functions and integration by parts to implicitly impose flux and continuity constraints. In this way, it implicitly ensures physical consistency across material boundaries. The proposed method is tested on Poisson and Laplace problems with sharp interfaces and complex geometries. Results show that, compared with several other Physics Informed Neural Networks-based formulations, the INI-VPINN consistently achieves higher accuracy, smoother and faster convergence. The proposed framework provides a general approach for solving multimaterial problems with complex geometries and mixed Neumann-Dirichlet boundary conditions using neural networks. The implementation is publicly available in a GitHub repository.

06.
arXiv (CS.CL) 2026-06-18

SenFlow: Inter-Sentence Flow Modeling for AI-Generated Text Detection in Hybrid Documents

Sentence-level AI-generated text detection (S-AGTD) for hybrid documents, where humans and LLMs co-author one text, faces two gaps: existing methods classify each sentence in isolation, discarding inter-sentence dependencies, and existing benchmarks omit the newest generation of generators. We construct MOSAIC, a benchmark of 16,000 hybrid documents over PubMed and XSum, generated by DeepSeek-V3.2 and Kimi K2 under stringent quality controls including a perplexity-consistency filter absent from prior benchmarks. We recast S-AGTD as structured prediction over the document sentence sequence and instantiate it as SenFlow, integrating graph-based inter-sentence propagation with linear-chain CRF decoding in a single document-level pass over a sentence graph. SenFlow reaches state-of-the-art performance on MOSAIC, with a +4.15 pp average Macro-F1 margin on cross-domain transfer, the hardest of three protocols of increasing difficulty. We further find that even after the perplexity filter equalizes overt cues, AI insertions retain a generator-dependent sentence-length gap that sentence-level detectors still exploit. Code and data: https://github.com/luojingkun22/SenFlow
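
To make the "linear-chain CRF decoding" step in the abstract above concrete, here is a minimal Viterbi decoding sketch over a sentence sequence. It is an illustration only: the label convention (0 = human, 1 = AI), the scores, and the function name are assumptions, and SenFlow's graph propagation and learned potentials are not reproduced.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions[t][y]   : score of label y for sentence t
    transitions[p][y] : score of moving from label p to label y
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(n_labels):
            # Best previous label for arriving at label y at position t.
            best_prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][y])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    label = max(range(n_labels), key=lambda y: score[y])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Hypothetical scores: the middle sentence weakly favours label 1 in isolation.
emissions = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
sticky = [[0.5, -0.5], [-0.5, 0.5]]     # rewards staying in the same label
decoded = viterbi_decode(emissions, sticky)
```

With the "sticky" transition scores, the weak middle-sentence evidence is overruled by its neighbours, which is the kind of inter-sentence dependency that per-sentence classification discards.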

07.
bioRxiv (Bioinfo) 2026-06-11

GermRL: Alleviating The Germline Bias In Autoregressive Antibody Language Models Through Reinforcement Learning

Antibodies are powerful therapeutics whose antigen specificity arises from sequence diversity shaped during development. Recently, language models trained on large antibody repertoire datasets have enabled the generation and screening of novel candidates, but these models retain a strong germline bias. As AI adoption increases in therapeutic workflows, it is crucial to develop models that harness the diversity of antibodies necessary for the discovery of mutations that encode desirable properties. Previous work explored the germline bias in masked antibody language models, yet the bias in generative autoregressive language models has not yet been addressed. Here, we present GermRL, a lightweight and modular reinforcement learning (RL) framework capable of alleviating the germline bias in pre-trained antibody autoregressive language models through group relative policy optimization (GRPO). GermRL achieves consistent one-shot generation of antibodies that satisfy specified mutation thresholds from germline while maintaining structural plausibility. Under the lowest and highest mutation thresholds tested (5 and 35 mutations from germline), GermRL scores 0.992 and 0.950 pass@1, respectively, compared to 0.398 and 0.034 for the pre-trained language model. Within GermRL, we introduce a key pair of modifications to GRPO that increase training efficiency by discouraging reward hacking under our antibody application. Furthermore, comparison of RL generated and natural antibody sequences reveals how RL based optimization can explore alternative evolutionary mutational patterns and residue compositional strategies while preserving key global properties of natural antibodies, including identifiable germline assignments, embedding-level similarity and comparable developability profiles. 
Thus, RL-trained generative models optimized to promote antibody mutations through diversity from germline provide a promising framework for navigating the antibody sequence landscape, enabling exploration of novel yet biologically plausible candidates for therapeutic design.
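
As background for the GRPO objective mentioned in the abstract above, the sketch below shows the group-relative advantage normalization commonly described for GRPO: rewards for a group of samples from the same prompt are centred and scaled by the group's own statistics. The binary reward (1 if a generated sequence meets a mutation threshold, else 0) is a hypothetical stand-in, and GermRL's own modifications to GRPO are not shown.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)    # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four generations; 1.0 means the threshold was met.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every sample in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is one reason reward design matters in this setting.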

08.
arXiv (CS.LG) 2026-06-17

Instrumental and Proximal Causal Inference with Gaussian Processes

arXiv:2603.02159v2 Announce Type: replace-cross Abstract: Instrumental variable (IV) and proximal causal learning (Proxy) methods are central frameworks for causal inference in the presence of unobserved confounding. Despite substantial methodological advances, existing approaches rarely provide reliable epistemic uncertainty (EU) quantification. We address this gap through a Deconditional Gaussian Process (DGP) framework for uncertainty-aware causal learning. Our formulation recovers popular kernel estimators as the posterior mean, ensuring predictive precision, while the posterior variance yields principled and well-calibrated EU. Moreover, the probabilistic structure enables systematic model selection via marginal log-likelihood optimization. Empirical results demonstrate strong predictive performance alongside informative EU quantification, evaluated via empirical coverage frequencies and decision-aware accuracy rejection curves. Together, our approach provides a unified, practical solution for causal inference under unobserved confounding with reliable uncertainty.

09.
Nature Medicine 2026-06-11

Clinical Profile and Genomic Characterization of the 2026 Bundibugyo Virus Index Case in Uganda

Bundibugyo virus disease (BVD) remains a high-consequence threat in Eastern and Central Africa, where cross-border mobility, nonspecific early symptoms, and delayed recognition can obscure transmission. In this case report, we describe Uganda’s 2026 BVD index case: a male patient who traveled from the Democratic Republic of the Congo to Uganda and was admitted to a private hospital in Kampala on 11 May 2026 after more than two weeks of vomiting and diarrhea, with epigastric pain, weakness, and hiccups. He deteriorated rapidly, developing acute kidney injury, pulmonary edema, hepatic dysfunction, hypoxemia, delirium, atrial flutter, possible disseminated intravascular coagulation, and multiorgan failure, and died on 14 May. A posthumous EDTA whole-blood specimen tested at the Central Emergency Response and Surveillance Laboratory was positive for orthoebolavirus RNA and confirmed as Bundibugyo virus (BDBV) by RT-qPCR. Sequencing achieved 99% genome coverage at ≥100× depth. The 2026 BDBV genome formed a distinct lineage approximately equidistant from the 2007–2008 Butalya and 2012 Isiro variants, differing by 216–227 nucleotides (~1.2% sequence divergence). Here, we demonstrate the value of fatality surveillance, private-sector surveillance, diagnostic optimization through national specimen referral, and rapid molecular-genomic diagnostics for early detection, transmission chain interruption, and public health response coordination.

10.
arXiv (CS.LG) 2026-06-15

Efficient On-Device Diffusion LLM Inference with Mobile NPU

arXiv:2606.13740v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation on smartphones. Mobile neural processing units (NPUs) offer high-throughput dense matrix computation, but efficiently exploiting them remains challenging: token commitment shrinks per-block effective workloads, token revision complicates KV cache reuse, and limited NPU-visible address space incurs costly remapping and data transfer overheads. In this paper, we propose llada.cpp, the first NPU-aware inference framework for accelerating dLLMs on smartphones. llada.cpp aligns block-wise dLLM inference with the execution characteristics of mobile NPUs through three techniques. (1) Multi-Block Speculative Decoding fills the shrinking workload in late-stage current-block decoding with speculative future-block tokens. (2) Dual-Path Progressive Revision keeps committed tokens revisable until stable and refreshes unstable tokens through a CPU-side path without stalling dense NPU execution. (3) Swap-Optimized Memory Runtime compacts NPU-visible address layouts and overlaps data staging with NPU computation to reduce remapping and transfer overheads. We implement llada.cpp as an end-to-end framework and evaluate it across diverse hardware platforms and dLLM workloads. llada.cpp reduces LLaDA-8B generation latency by 17x-42x over the CPU baseline with prefix KV cache reuse, while preserving generation quality.

11.
arXiv (CS.LG) 2026-06-12

Differentiable Thermodynamic Phase-Equilibria for Machine Learning

arXiv:2603.11249v3 Announce Type: replace Abstract: Accurate prediction of phase equilibria remains a central challenge in chemical engineering. Physics-consistent machine learning methods that incorporate thermodynamic structure into neural networks have recently shown strong performance for activity-coefficient modeling. However, extending such approaches to equilibrium data arising from an extremum principle, such as liquid-liquid equilibria, remains difficult. Here we present DISCOMAX, a differentiable algorithm for phase-equilibrium calculation that guarantees thermodynamic consistency at both training and inference, subject only to a user-specified discretization. The method combines discrete enumeration of feasible phase states and masked softmax aggregation in the backward pass with propagation of the true equilibrium state in the forward pass, using a straight-through gradient estimator to enable physics-consistent end-to-end learning of neural $g^{E}$-models. We show that this approach bears analogy to statistical thermodynamics, and we evaluate it on binary liquid-liquid equilibrium data where it outperforms existing surrogate-based methods, while offering a general framework for learning from different kinds of equilibrium data.

12.
arXiv (CS.AI) 2026-06-18

Augmenting Dysarthric Speech Severity Assessment with MOS Supervision

arXiv:2606.18645v1 Announce Type: cross Abstract: Dysarthria is a speech disorder marked by reduced intelligibility and communicative effectiveness. Automatic utterance-level assessment of dysarthric speech can support scalable speech monitoring and therapy-related analysis. Yet training such systems is bottlenecked by the scarcity of clinically annotated dysarthric speech. This work proposes to augment dysarthric speech assessment using data from speech synthesis evaluations, specifically human-annotated utterances with Mean Opinion Score (MOS) labels from the QualiSpeech corpus. Experiments show that fine-tuning on speech synthesis assessment data consistently improves performance on both intelligibility and naturalness prediction, while joint training yields gains primarily on naturalness. These results suggest that synthesis artifacts and dysarthric speech share perceptual commonalities, and speech synthesis evaluation corpora offer a practical augmentation source that reduces reliance on scarce clinical annotations.

13.
PLOS Computational Biology 2026-06-22

Heterogeneous suppressive effect of Wolbachia incompatible insect technique coupled with sterile insect technique across time and historical Ae. aegypti abundance - using distributional synthetic controls

Authors: Yichen Zhai, Chia-Chen Chang, Zhiyong Xi, Cheong Huat Tan, Lee Ching Ng, Jue Tao Lim

Background: Biological control tools, such as the Wolbachia incompatible-insect technique, are a promising class of interventions to modify and suppress Aedes aegypti mosquitoes to reduce the risk of Aedes-borne diseases. Due to the spatial nature of the intervention, intervention effects can be spatio-temporally heterogeneous. Yet most evaluations of field-based technologies rely on average treatment effects, which preclude characterization and understanding of treatment effect heterogeneities and the factors influencing them.

Methods: Here, we developed a causal inference framework using distributional synthetic controls to explicitly account for spatio-temporal trap-level mosquito abundance data to ascertain the entomological efficacy of Wolbachia in suppressing Ae. aegypti abundance. This method is able to construct counterfactual distributions of intervened areas, provide detailed comparisons to actual distributions, and quantify treatment effects of the intervention on mosquito abundance over different quantiles. By applying our framework to trap-level mosquito abundance data from 57,990 unique mosquito traps routinely maintained and measured twice a week, and a large-scale field trial of Wolbachia incompatible-insect technique coupled with sterile insect technique (IIT-SIT) in Singapore, we (1) quantified heterogeneous treatment effects for IIT-SIT across the time since intervention, over the traps' historical mosquito abundance, and over calendar time, (2) quantified whether elimination of wild-type Aedes aegypti was possible in intervention locations, and (3) addressed whether suppressive effects in spillover locations adjacent to directly intervened locations were heterogeneous.

Results: IIT-SIT interventions led to a strong suppressive effect on adult Aedes aegypti abundance. From the onset of intervention in directly treated locations, sector-specific intervention effectiveness (IE) ranged from 24.04% in the earliest treatment period to 86.08% in the latest treatment period. Raw reductions in Ae. aegypti abundance were also found to increase over time as sectors were intervened over longer time periods. In spillover sectors, IE was lower in magnitude and more variable, but average IE reached a maximum of 78.08% at 2 years post-treatment. Wolbachia interventions also led to an increase in the percentage of traps recording no mosquitoes, from 6.8% at the start of intervention to 33.01% at 124 weeks post-intervention. We found that IE was higher in sectors with lower historical mosquito abundance. However, IE converged across sectors with different historical mosquito abundance as intervention time increased.

Conclusion: This study revealed spatial heterogeneities in suppressing wild-type female Ae. aegypti by IIT-SIT and provided strong evidence that IIT-SIT can drastically suppress wild-type Ae. aegypti populations despite heterogeneous treatment effects over time.

14.
arXiv (CS.CV) 2026-06-16

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency constraints that remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides a strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.

15.
arXiv (CS.CL) 2026-06-16

ttda704 at SemEval-2026 Task 6: Structured Chain-of-Thought Prompting for Political Evasion Detection

This paper describes our system for SemEval-2026 Task 6, which addresses the classification of political evasion strategies in English question-answer pairs extracted from U.S. presidential interviews. We systematically compare two distinct paradigms: (1) Parameter-Efficient Fine-Tuning of Qwen3 models (4B-32B) using QLoRA, enhanced with tiered upsampling and weighted cross-entropy loss to address severe class imbalance, and (2) structured Chain-of-Thought (CoT) prompting of reasoning-capable API models, namely DeepSeek-V3.2 and Grok-4-Fast. Our evaluation demonstrates that structured CoT prompting of reasoning-enabled models substantially outperforms our baseline parameter-efficient fine-tuning implementation in absolute Macro F1. Our best system, Grok-4-Fast with extended reasoning and few-shot hierarchical CoT prompting, achieves a Macro F1 of 0.5147 on Subtask 2 (9-class evasion) and 0.7979 on Subtask 1 (3-class clarity), ranking 8th out of 33 teams on Subtask 2 and 13th out of 41 teams on Subtask 1 on the official leaderboard. Furthermore, our ablation studies reveal key insights into effective prompt design for evasion detection: presenting labels within a hierarchical taxonomy helps structure model reasoning, while few-shot exemplars provide task calibration. However, the strongest prompt variants are not statistically distinguishable in Macro F1, and explicitly enabling extended reasoning modes yields substantial performance gains by facilitating the multi-step pragmatic analysis required to detect evasive intent.

16.
arXiv (math.PR) 2026-06-11

Capital Asset Pricing Model with Size Factor and Normalizing by Volatility Index

arXiv:2411.19444v5 Announce Type: replace-cross Abstract: The Capital Asset Pricing Model (CAPM) relates a well-diversified stock portfolio to a benchmark portfolio. We insert size effect in CAPM, capturing the observation that small stocks have higher risk and return than large stocks, on average. For some size-based stock portfolios, dividing their returns by the Volatility Index makes them closer to independent and normal. In this article, we combine these ideas to create a new discrete-time model, which includes volatility, relative size, and CAPM. We fit this model using real-world data, prove the long-term stability, and connect this research to Stochastic Portfolio Theory. We fill important gaps in our previous article on CAPM with the size factor.

17.
medRxiv (Medicine) 2026-06-16

Infections and suicide and self-harm: a population-based matched cohort study

Background: Infections have been associated with adverse mental health outcomes, including suicide, but evidence beyond severe or central nervous system infections is limited. We investigated associations between a range of acute infections and subsequent suicide/self-harm outcomes.

Methods: We conducted six infection-specific matched cohort studies using English primary care records from the Clinical Practice Research Datalink Aurum (2007-2024), linked to hospital admissions and mortality data. Adults (≥18 years) with a primary care record of infection (gastroenteritis, lower respiratory tract [LRTI], skin/soft-tissue [SSTI], urinary tract [UTI], sepsis, meningitis/encephalitis [positive control]) were matched (age, sex, practice, calendar period) to up to five comparators without infection. We estimated hazard ratios (HRs) for suicide/self-harm outcomes using Cox regression, stratified by matched set and implicitly adjusting for matching factors, with additional adjustment for deprivation, lifestyle factors, and comorbidities. We examined whether associations varied over time, by infection severity, antimicrobial treatment, sex, and prior mental health conditions.

Findings: Cohorts ranged from 18,192 individuals with meningitis/encephalitis (matched to 90,915 without) to 398,099 with SSTI (matched to 1,743,747). After adjustment, individuals with infection had a higher hazard of suicide/self-harm outcomes than comparators across all cohorts: sepsis (HR 1.79, 95% CI 1.65-1.93), gastroenteritis (1.62, 1.55-1.70), meningitis/encephalitis (1.56, 1.32-1.84), UTI (1.41, 1.33-1.50), SSTI (1.37, 1.31-1.43), and LRTI (1.37, 1.31-1.44). Risk was highest in the year post-infection, attenuating over time, and was higher among severe infections and those without prior mental health conditions.

Interpretation: Common acute infections recorded in primary care are associated with increased risk of suicide and self-harm, particularly following severe infections and in the year post-infection. Findings support suicide risk monitoring following acute infection, particularly among individuals without prior mental health conditions, and highlight infection prevention as a potentially modifiable strategy in vulnerable populations.

Funding: Wellcome and La Caixa.

Copyright: This work is licensed under a Creative Commons Attribution (CC BY) licence.

18.
arXiv (CS.AI) 2026-06-16

Understanding Diversity Collapse in RLVR via the Lens of Overtraining

arXiv:2606.15455v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a key approach for enhancing the reasoning abilities of large language models. However, RLVR often suffers from diversity collapse: Pass@$1$ improves while high-$k$ Pass@$k$ degrades, which is viewed as a narrowing of the model's reasoning boundary. We formalize this diversity collapse through the lens of overtraining: once a problem's contribution to the reference metric has effectively saturated, further updates no longer expand what the model can solve but still concentrate probability mass on the trajectories favored by on-policy sampling. Under a standard setup with few rollouts per problem, even a single observed success places a problem in a nearly saturated regime for high-$k$ Pass@$k$, so most updates in standard RLVR are overtraining from the boundary perspective. This perspective also suggests a reading of whether RLVR can expand the model's reasoning abilities beyond the base model: since RLVR is structurally biased against high-$k$ Pass@$k$, its aggregate decline does not by itself mean that no new reasoning gains occurred. Interventionally, restricting updates to problems with zero observed success lifts Pass@$256$ above the base model on difficult benchmarks; observationally, a non-trivial fraction of initially unsolvable problems become solvable during standard RLVR training. Building on these findings, we propose Bayesian Boundary Gating (BBG), which redirects optimization away from overtraining by estimating each problem's marginal contribution to the reasoning boundary. Across multiple reasoning benchmarks, BBG improves average Pass@$k$ across a wide range of $k$.
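
The saturation argument in the abstract above can be checked with a short calculation. The sketch below assumes independent samples with a fixed per-sample success probability p, so that Pass@k = 1 - (1 - p)^k, and uses a plug-in estimate of p from one success in eight rollouts. Both the rollout count and the independence assumption are illustrative choices, not details taken from the paper.

```python
def pass_at_k(p, k):
    """Probability that at least one of k independent samples succeeds."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical setup: one observed success out of 8 rollouts, so p is about 1/8.
p_observed = 1 / 8
low_k = pass_at_k(p_observed, 1)        # Pass@1 still has plenty of headroom
high_k = pass_at_k(p_observed, 256)     # Pass@256 is already almost 1

# Doubling the success probability barely moves Pass@256 ...
gain_high_k = pass_at_k(2 * p_observed, 256) - high_k
# ... while it doubles Pass@1.
gain_low_k = pass_at_k(2 * p_observed, 1) - low_k
```

Under these assumptions, further updates on an already-solved problem mostly raise Pass@1, while its contribution to Pass@256 has essentially nothing left to gain.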

19.
arXiv (CS.CV) 2026-06-15

Efficient Online 3D Multi-Camera Multi-Object Tracking and Pose Estimation

This paper proposes a fast and online method for jointly performing 3D multi-object tracking and pose estimation using multiple monocular cameras. Our algorithm requires only 2D bounding box and pose detections, eliminating the need for costly 3D training data or computationally expensive deep learning models. Our solution is an efficient implementation of a Bayes-optimal multi-object tracking filter, enhancing computational efficiency while maintaining accuracy. We demonstrate that our algorithm is significantly faster than state-of-the-art methods without compromising accuracy, using only publicly available pre-trained 2D detection models. We also illustrate the robust performance of our algorithm in scenarios where multiple cameras are intermittently disconnected or reconnected during operation.

20.
arXiv (math.PR) 2026-06-16

A uniform-in-time weakly convergent explicit numerical method for the underdamped Langevin equation with polynomial potentials

arXiv:2606.15175v1 Announce Type: cross Abstract: The underdamped Langevin equation is a fundamental model in statistical mechanics for sampling Gibbs measures and simulating molecular dynamics, for which numerical methods with uniform-in-time weak convergence are essential for accurately reproducing long-time statistical observables and invariant measures of the underlying dynamics. Currently, such uniform-in-time weak convergence is established for implicit schemes, but remains unknown for explicit ones under polynomially growing potentials. To improve efficiency in long-time simulations, we propose the first explicit numerical method for the underdamped Langevin equation with polynomially growing potentials that is proven to achieve uniform-in-time weak convergence. The explicit numerical method is constructed by introducing a dissipativity on the scalar auxiliary variable (SAV), which we call the DSAV method. The proposed DSAV method enables the approximation of the invariant measure for the underdamped Langevin equation with a precision of $\varepsilon$ at a significantly reduced computational cost of $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1}))$. In addition, we establish the existence and positivity of the density function of the numerical solution without using the Malliavin calculus. Numerical experiments are performed to verify the theoretical findings and demonstrate the long-time stability of the proposed numerical method.

21.
arXiv (math.PR) 2026-06-17

Killed resolvents and measure-valued stopping gains for reflected optimal stopping with max-type rewards

arXiv:2606.17517v1 Announce Type: new Abstract: We study an infinite-horizon optimal stopping problem for a normally reflected two-dimensional diffusion in the positive quadrant with nonsmooth max-type reward \(G(x_1,x_2)=x_1\vee \alpha x_2\). The paper develops a conditional measure-theoretic framework for the associated reflected obstacle problem. The main innovation is to show that the stopping gain \(\Gamma=c+rG-\mathcal LG\) is a signed measure, not a function: the kink of \(G\) generates an explicit negative surface measure on \(\Delta=\{x_1=\alpha x_2\}\). We then prove that the correct potential representation uses the resolvent of the reflected diffusion killed on first entry into the stopping set, rather than the unrestricted reflected resolvent. Under explicit monotonicity, regularity, and measure-superharmonicity assumptions, we derive an epigraph representation, a continuation-side boundary-trace condition, and a candidate verification theorem. The framework clarifies hidden regularity and uniqueness assumptions in multidimensional nonsmooth optimal stopping.

22.
arXiv (math.PR) 2026-06-12

Pathwise integration beyond Young via Faber–Schauder energy spaces

arXiv:2606.13331v1 Announce Type: cross Abstract: We develop a pathwise integration theory based on Faber–Schauder energy spaces. The approach replaces the classical Hölder–Young and finite-variation Young conditions by dyadic summability conditions expressed in terms of Faber–Schauder coefficients. On the normalized interval $[0,1]$, these conditions define Banach spaces $\mathcal{E}^p$, which we call Faber–Schauder energy spaces. For $p,q>1$ satisfying $1/p+1/q\ge1$, we prove that every pair $f\in\mathcal{E}^p$ and $g\in\mathcal{E}^q$ admits a continuous pathwise integral $I_{f,g}$, constructed from dyadic left Riemann sums. We call $I_{f,g}$ the Faber–Schauder integral, and show that it depends boundedly and bilinearly on $(f,g)$ in the corresponding energy norms. The integral satisfies additivity, integration by parts, and a dyadic Young–Loève estimate. It is also the uniform limit of classical Riemann–Stieltjes integrals of finite Faber–Schauder approximations. The Faber–Schauder integral agrees with the classical Young integral whenever the latter is available, but also applies to deterministic and Gaussian examples for which neither the Hölder–Young condition nor the finite-variation Young condition can be verified. In this sense, it provides a Faber–Schauder coefficient-based extension of Young's framework.
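The two ingredients named in the abstract, Faber–Schauder coefficients and dyadic left Riemann sums, can be sketched numerically. The coefficient normalisation $\theta_{n,k} = 2^{n/2}\bigl(2f(\tfrac{2k+1}{2^{n+1}}) - f(\tfrac{k}{2^n}) - f(\tfrac{k+1}{2^n})\bigr)$ is one common convention and is an assumption here; the abstract does not state which one the paper uses, nor how the $\mathcal{E}^p$ norm is built from the coefficients. Function names are hypothetical.

```python
def faber_schauder_coeff(f, n, k):
    """Level-n, position-k Faber-Schauder coefficient of f on [0, 1].

    Uses the 2**(n/2) normalisation (an assumed convention): a scaled
    second difference of f over the dyadic interval [k/2**n, (k+1)/2**n].
    """
    left = k / 2 ** n
    right = (k + 1) / 2 ** n
    mid = (left + right) / 2
    return 2 ** (n / 2) * (2 * f(mid) - f(left) - f(right))


def dyadic_left_riemann_sum(f, g, n):
    """Left Riemann sum of f against the increments of g on the level-n dyadic grid."""
    N = 2 ** n
    return sum(f(k / N) * (g((k + 1) / N) - g(k / N)) for k in range(N))


if __name__ == "__main__":
    # Smooth test case: the integral of t d(t^2) over [0, 1] equals 2/3.
    for level in (4, 8, 12):
        print(level, dyadic_left_riemann_sum(lambda t: t, lambda t: t * t, level))
```

For smooth integrands the sums simply converge to the Riemann–Stieltjes integral; the point of the paper is that convergence is proved under coefficient-based conditions that also cover rougher paths.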

23.
arXiv (CS.CV) 2026-06-16

Near–Real-Time Conflict-Related Fire Detection in Sudan Using Unsupervised Deep Learning

Ongoing armed conflict in Sudan highlights the need for rapid monitoring of conflict-related fire-affected areas. Recent advances in deep learning and high-frequency satellite imagery enable near–real-time assessment of active fires and burn scars in war zones. This study presents a near–real-time monitoring approach using a lightweight Variational Auto-Encoder (VAE)–based model integrated with 4-band Planet Labs imagery at 3 m spatial resolution. We demonstrate that these impacted regions can be detected within approximately 24 to 30 hours under favorable observational conditions using accessible, commercially available satellite data. To achieve this, we adapt a VAE–based model, originally designed for 10-band imagery, to operate effectively on high-resolution 4-band inputs. The model is trained in an unsupervised manner to learn compact latent representations of nominal land-surface conditions and identify burn signatures by quantifying changes between temporally paired latent embeddings. Performance is evaluated across five case studies in Sudan and compared against cosine distance, CVA, and IR-MAD using precision, recall, F1-score, and the area under the precision-recall curve (AUPRC) computed between temporally paired image tiles. Results show that the proposed approach consistently outperforms the other methods, achieving higher recall and F1-scores while maintaining viable precision in highly imbalanced fire-detection scenarios. Experiments with 8-band imagery and temporal image sequences yield only marginal performance gains over single 4-band inputs, underscoring the effectiveness of the proposed lightweight approach for scalable, near–real-time conflict monitoring.
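As a rough illustration of the evaluation pipeline described above, the sketch below scores change between two temporally paired vectors with cosine distance (one of the comparison methods the abstract names) and computes precision, recall, and F1 from binary detections. The abstract does not specify the exact change measure applied to the VAE latent embeddings, so the function names, the vectors, and the threshold used in the usage lines are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def precision_recall_f1(pred, truth):
    """Precision, recall and F1 for binary labels (1 = changed/burned, 0 = unchanged)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    before = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # made-up "before" vectors
    after = [[1.0, 0.1], [0.5, -0.5], [0.0, 1.0]]   # made-up "after" vectors
    threshold = 0.5                                  # hypothetical cut-off
    pred = [int(cosine_distance(b, a) > threshold) for b, a in zip(before, after)]
    print(pred, precision_recall_f1(pred, [0, 1, 0]))
```

In highly imbalanced settings such as fire detection, precision and recall are more informative than accuracy, which is presumably why the study reports F1 and AUPRC.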

24.
arXiv (CS.LG) 2026-06-16

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

arXiv:2606.08583v2 Announce Type: replace Abstract: Deep learning on physiological time series is interpreted through domain-specific features – oscillatory rhythms in EEG, morphological complexes in ECG – yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07–0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32–0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.
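One way to picture a phase-preserving "flattening" intervention is sketched below: fit a straight line to the log-amplitude spectrum against log-frequency, divide each Fourier coefficient by the fitted trend (a positive real factor, so phases are untouched), and transform back. This is an assumed, simplified stand-in; the abstract does not describe the paper's actual decomposition or intervention, and all names here are hypothetical. A naive $O(n^2)$ DFT keeps the sketch dependency-free; an FFT library would normally be used.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def loglog_fit(X):
    """Least-squares fit log|X[k]| = a + b*log(k) over bins k = 1 .. n//2."""
    ks = range(1, len(X) // 2 + 1)
    u = [math.log(k) for k in ks]
    v = [math.log(abs(X[k])) for k in ks]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    b = (sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
         / sum((ui - mu) ** 2 for ui in u))
    return mv - b * mu, b


def flatten_aperiodic(x):
    """Divide out the fitted power-law amplitude trend, keeping every phase."""
    X = dft(x)
    a, b = loglog_fit(X)
    n = len(X)
    Y = list(X)  # the DC bin (k = 0) is left unchanged
    for k in range(1, n):
        m = min(k, n - k)  # mirrored bins share one factor, so the output stays real
        Y[k] = X[k] / math.exp(a + b * math.log(m))
    return [z.real for z in idft(Y)]


if __name__ == "__main__":
    rng = random.Random(0)
    walk, s = [], 0.0
    for _ in range(128):  # a random walk has a steep 1/f-like spectrum
        s += rng.gauss(0.0, 1.0)
        walk.append(s)
    print("slope before:", loglog_fit(dft(walk))[1])
    print("slope after: ", loglog_fit(dft(flatten_aperiodic(walk)))[1])
```

Comparing a model's accuracy on original versus flattened inputs is the kind of "flattening drop" the abstract reports, though the study's own procedure also includes sham controls not sketched here.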

25.
medRxiv (Medicine) 2026-06-18

Cost-effectiveness of a virtual fracture clinic versus traditional in-person fracture clinic care for adults with acute simple fractures: a protocol for a health economic evaluation within the RECITAL trial

ABSTRACT
Introduction: Traditional in-person fracture clinics are often overcrowded and inconvenient for patients. Virtual fracture clinics aim to address some of these concerns by improving the efficiency of the orthopaedic service and reducing unnecessary interventions while maintaining safety and quality of care. The RECITAL trial is a non-inferiority randomised controlled trial comparing follow-up care provided at a virtual fracture clinic for people with acute simple fractures to follow-up care provided at an in-person fracture clinic. This study describes the protocol for an economic evaluation of RECITAL where the primary aim is to investigate the cost-effectiveness of a virtual fracture clinic compared with traditional in-person fracture clinic care from a health system perspective.
Methods and analysis: The RECITAL trial recruited 312 participants with acute simple fractures and randomised them to receive follow-up care provided at a virtual fracture clinic or follow-up care provided at an in-person fracture clinic. We will conduct a within-trial analysis from a health system perspective (primary analysis), as well as from health service, patient and societal perspectives. The economic evaluation will estimate, on an intention-to-treat basis, the difference in the cost of resource inputs used by participants in the two arms of the trial, allowing comparisons to be made between the in-person and virtual fracture clinics. Data for intervention costs and healthcare utilisation will be collected from trial records, hospital electronic medical records and district performance units. The results of the economic evaluation will be expressed in terms of incremental cost per utility weight gained at 12 weeks and will be plotted on a cost-effectiveness plane. Bootstrapping by resampling will be used to estimate 95% confidence intervals around costs and outcomes, and to calculate the confidence intervals around the incremental cost-effectiveness ratio. A cost-effectiveness acceptability curve (CEAC) will be plotted, showing the probability that an intervention is cost-effective given a decision maker's willingness to pay for each additional unit of outcome.
Ethics and dissemination: The trial was approved by the SLHD Ethics Review Committee (RPAH Zone) (X23-0200 and 2023/ETH01038). The findings will be disseminated through a peer-reviewed journal and conference presentations.
Trial registration number: The trial was prospectively registered on the Australian New Zealand Clinical Trials Registry (ANZCTR; 12623000934640).
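The analysis steps named in the protocol (incremental cost-effectiveness ratio, bootstrap resampling, percentile intervals, and a CEAC) can be sketched generically as follows. This is a textbook-style illustration, not the trial's analysis plan: the function names, the simple percentile rule, the willingness-to-pay value, and all numbers in the usage lines are made up. The ICER is taken as $\Delta C / \Delta E$, and a CEAC point as the share of bootstrap replicates with positive net monetary benefit $\lambda \Delta E - \Delta C$.

```python
import random
import statistics


def increments(new_arm, old_arm):
    """Mean cost and mean effect differences (new minus old).

    Each arm is a list of (cost, effect) pairs, one per participant.
    """
    d_cost = (statistics.fmean([c for c, _ in new_arm])
              - statistics.fmean([c for c, _ in old_arm]))
    d_eff = (statistics.fmean([e for _, e in new_arm])
             - statistics.fmean([e for _, e in old_arm]))
    return d_cost, d_eff


def icer(new_arm, old_arm):
    """Incremental cost-effectiveness ratio: incremental cost per unit of effect."""
    d_cost, d_eff = increments(new_arm, old_arm)
    return d_cost / d_eff


def bootstrap_increments(new_arm, old_arm, n_boot, rng):
    """Resample participants with replacement within each arm, n_boot times."""
    return [increments(rng.choices(new_arm, k=len(new_arm)),
                       rng.choices(old_arm, k=len(old_arm)))
            for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from sorted bootstrap values."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


def ceac_point(replicates, wtp):
    """Share of replicates with positive net monetary benefit at willingness to pay wtp."""
    return sum(1 for dc, de in replicates if wtp * de - dc > 0) / len(replicates)


if __name__ == "__main__":
    rng = random.Random(0)
    # Entirely synthetic participants: (cost, utility weight).
    virtual = [(rng.gauss(300, 80), rng.gauss(0.80, 0.10)) for _ in range(150)]
    in_person = [(rng.gauss(400, 80), rng.gauss(0.79, 0.10)) for _ in range(150)]
    reps = bootstrap_increments(virtual, in_person, 1000, rng)
    print("cost difference 95% interval:", percentile_interval([dc for dc, _ in reps]))
    print("CEAC at wtp=5000:", ceac_point(reps, 5000))
```

Plotting the replicate pairs gives the cost-effectiveness plane, and evaluating `ceac_point` over a range of willingness-to-pay values gives the acceptability curve. Intervals for the ratio itself need care when the effect difference can be close to zero.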