Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-22

Evidence-guided AI regularization for suicidal ideation prediction in pediatric bipolar disorder

Background: Suicide prediction models in psychiatry often rely on purely data-driven feature selection, which can produce unstable and clinically opaque predictor sets in modest-sized samples. We developed Evidence-Based AI LASSO (EBAL), an evidence-guided regularization framework that incorporates curated clinical evidence into feature-specific penalty factors for interpretable prediction. Methods: Baseline data from 136 youth with confirmed bipolar spectrum disorder in the Greater Houston Area Bipolar Registry were analyzed using 20 candidate clinical predictors. Forty higher-level evidence documents on suicidality and related predictor domains were curated through a structured evidence synthesis workflow and indexed as an auditable evidence corpus. An open-weight large language model assigned feature-specific penalty factors using a prespecified scoring rubric, and these penalties were used to fit a weighted LASSO model. EBAL was compared with a standard evidence-agnostic LASSO using nested leave-one-out cross-validation. Results: For suicidal ideation, EBAL achieved an AUROC of 0.768, balanced accuracy of 0.757, sensitivity of 0.758, and specificity of 0.757. The standard LASSO achieved an AUROC of 0.760 and balanced accuracy of 0.715. EBAL improved balanced accuracy (+0.042, p=0.010) and Matthews correlation coefficient (+0.079, p=0.010), while retaining fewer stable predictors than standard LASSO (11/20 vs 18/20). The strongest positive predictors were current depressed mood, duration of mood disorder illness, and comorbid generalized anxiety disorder. For suicidal behavior, both models performed near chance and retained all candidate predictors. Limitations: The study was cross-sectional, single-site, and modest in sample size, with no external validation cohort. Conclusions: EBAL produced a sparser and more clinically coherent model for suicidal ideation in pediatric bipolar disorder, but did not improve prediction of suicidal behavior. These findings support evidence-guided regularization as a transparent strategy for aligning psychiatric prediction models with prior clinical knowledge while preserving interpretability.

02.
arXiv (quant-ph) 2026-06-16

The Distribution Postulate in Algorithmic Bohmian Mechanics

arXiv:2606.16165v1 Announce Type: new Abstract: In order to make the right empirical predictions Bohmian mechanics requires a special statistical boundary condition – the distribution postulate – but it is unclear how best to understand this condition. We show how one might use the theory of algorithmic randomness to formulate the distribution postulate as an objective constraining law. The framework requires us to say something about admissible quantum-mechanical states and measurements. In return, algorithmic Bohmian mechanics (aBM) guarantees the standard Born statistics for a collection of canonical quantum experiments in the limit, not just with high probability. The algorithmic distribution postulate provides a sharp typicality condition, clarifies the status of quantum probabilities in the deterministic theory, and provides a concrete example of how notions provided by the theory of algorithmic randomness can aid in specifying the content of a physical law.

03.
medRxiv (Medicine) 2026-06-24

TMPRSS2-Coagulation Nexus: A Novel Molecular Link Revealed by Pairwise Correlation Analysis Following AstraZeneca (ChAdOx1 nCoV-19) Vaccination in a Nigerian Cohort

Background: While haematological and coagulation changes following AstraZeneca vaccination have been described, the molecular mechanisms linking TMPRSS2 expression to coagulation remain underexplored, particularly in African populations. Methods: In this case-control study, 102 adults (51 vaccinated with AstraZeneca >=6 months prior, 51 unvaccinated controls) aged 18-65 years in Port Harcourt, Nigeria, were evaluated. Full blood count (Sysmex XN-1000), PT/aPTT (Erba Mannheim), RNA concentration, and qRT-PCR for ACE2/TMPRSS2 (normalized to GAPDH) were performed. Pearson correlations and t-tests were conducted (SPSS v26, p

04.
medRxiv (Medicine) 2026-06-19

Fine-Tuning SAM2 for Coronary Artery Segmentation in X-Ray Fluoroscopy

Authors:

SAM2 (Meta, 2024) provides a strong starting point for segmentation, but given the unique challenges in medical imaging (noise from patient movement, the projection-based nature of X-ray fluoroscopy, and low contrast between vessels and background), direct application is difficult. We fine-tune MedSAM2 on annotated coronary angiograms and apply it to video data for point-of-care use. On the ARCADE validation set (200 images), the fine-tuned model achieves Dice 0.767 compared to 0.033 zero-shot. On 10 fluoroscopic video studies from CoronaryDominance, it tracks vessels coherently and avoids falsely segmenting ribs, stents, and bypass grafts in 9 of 10 studies. Code is available at https://github.com/elakiyasivakumar/SAM2-Coronary-Angiography-VA and the fine-tuned checkpoint at https://huggingface.co/Elakiya17/CA-SAM2.

05.
arXiv (CS.AI) 2026-06-19

Interpreting Neural Combinatorial Optimization via Evolving Programmatic Bottlenecks

arXiv:2606.19741v1 Announce Type: new Abstract: Neural Combinatorial Optimization (NCO) achieves strong performance, yet its black-box nature remains a key roadblock to deployment and scientific diagnosis. Standard interpretability tools, such as Concept Bottleneck Models (CBMs), are ill-equipped for NCO, whose decisions are dynamic, state-dependent, and lack proper concept vocabulary definition. To close this gap, we introduce Evolving Programmatic Bottlenecks (EPB), to our knowledge, the first framework for interpreting NCO policies by distilling black-box NCO models into human-readable program portfolios. EPB employs an LLM to autonomously evolve a bank of programs, where each program's per-step action distribution serves as the bottleneck. EPB works through an iterative framework: Block I fixes program bank capacity and introduces a hybrid textual-numerical gradient descent scheme that couples numerical gradients for student router updates and textual gradients for LLM-based program revision; Block II dynamically adapts bank capacity via fault-targeted expansion and redundancy pruning. Extensive experiments demonstrate EPB's effectiveness and broad applicability, where the distilled program portfolios largely match original performance. EPB also reveals that NCO behavior shifts across optimization stages and can be approximated as a composition of classic heuristic variants. Our work advances interpretable NCO and establishes EPB as a promising tool for interpreting sequential decision-making models.

06.
arXiv (math.PR) 2026-06-18

Functions of Bounded Variation and Point Processes

arXiv:2606.08304v2 Announce Type: replace-cross Abstract: We investigate the relationship between the analytical properties of functions of bounded variation and the statistical behavior of hyperuniform point processes. We establish several characterization formulas for the jump part of the gradient of a bounded variation function, extending and unifying previous results by Beretti–Gennaioli and Dávila. In particular, we provide new expressions for the $L^2$-jump of the gradient using both difference quotients and Fourier transform methods. Furthermore, we connect these analytic structures to the theory of hyperuniform point processes. By analyzing the variance of linear statistics associated with bounded variation functions, we provide asymptotic estimates that depend on the specific classification of the hyperuniformity of the point process. The results show how the regularity and jump discontinuities of a function dictate the growth rate of fluctuations in point processes. Finally, we introduce an averaged quadratic BMO-type oscillation functional over translated and rotated cube partitions, similar to the one recently studied by Ambrosio et al., and prove, using results from point process, that it converges to an explicit dimensional constant times the $L^2-$jump, giving in particular a further new characterization of the perimeter of a set.

07.
arXiv (quant-ph) 2026-06-11

Multipartite reference-frame-independent quantum cryptographic communication

arXiv:2606.12284v1 Announce Type: new Abstract: Reference frame mismatch among communication parties introduces errors in quantum cryptographic protocols. As the number of participants increases, aligning reference frames becomes increasingly difficult, complicating multipartite quantum cryptographic implementations. Here, we theoretically and experimentally investigate multipartite reference-frame-independent (RFI) quantum cryptographic communication using Greenberger-Horne-Zeilinger (GHZ) states. We generalize the bipartite RFI security parameter $C$ to an $N$-party parameter $C_N$ and derive the asymptotic secret key rate expressed solely in terms of experimentally accessible quantities. We analyze the key rate under global and local depolarizing noise models and find that increasing the number of parties $N$ enhances robustness against global depolarizing noise while increasing vulnerability to local channel noise. We also present a proof-of-principle experimental demonstration of four-party RFI quantum cryptographic communication using four-photon GHZ states, confirming the reference-frame invariance of both the $C_4$ parameter and the secret key rate under various reference frame rotations.

08.
arXiv (CS.AI) 2026-06-12

Variational Learning for Insertion-based Generation

arXiv:2606.02133v3 Announce Type: replace-cross Abstract: Non-monotonic sequence generation methods, such as masked diffusion models, provide a flexible alternative to left-to-right autoregressive modeling by allowing tokens to be generated in non-fixed and prescribed orders. Despite their practical advantages, most existing non-monotonic models are order-agnostic and rely on a fixed-length grid, limiting their ability to support variable-length generation and adaptive insertion order. In this work, we introduce a probabilistic framework for learning insertion order in variable-length insertion models. We formalize a bijective correspondence between insertion trajectories and permutations, which enables an exact reparameterization of the data likelihood as a sum over permutations. Building on this result, we propose the Insertion Process (IP), a stochastic generative model that jointly learns where to insert, what to insert, and when to terminate, trained via permutation-based variational inference. Unlike prior fixed-canvas approaches, IP natively supports variable-length generation and learns data-driven preferences over insertion orders. Experiments on goal-conditioned planning and molecular string generation demonstrate that learning insertion order improves both modeling quality and generalization in domains without a canonical left-to-right structure.

09.
arXiv (CS.AI) 2026-06-19

Autonomous Event-Driven Multi-Agent Orchestration for Enterprise AI at Scale

arXiv:2606.20058v1 Announce Type: new Abstract: Enterprise AI aims to move toward continuous event monitoring, detection, and action across specialist agents, yet existing multi-agent systems largely assume discrete request-response workflows and remain underexplored at enterprise scale. We evaluate DAG Plan and Execute and ReAct across 208 production-derived enterprise scenarios spanning Persona (

10.
arXiv (CS.AI) 2026-06-11

"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

arXiv:2606.12073v1 Announce Type: cross Abstract: Generative AI has made fluent prose cheap to produce, breaking the old promise to readers that good writing meant real thinking. How have readers responded, and what can this tell us about changing anti-AI attitudes? We analyzed 25 million comments from Hacker News and Reddit (2023-2026), combining LLM judgment on 7,500 sampled accusations of AI use, sentiment trajectories, speech-act coding of 300 confirmed accusations of AI use, and a matched-control test of accused versus non-accused parent comments. We found that the pejorative-label share of accusations rose more than tenfold on both platforms while a placebo vocabulary of pre-2022 inauthenticity terms (shill, astroturf) did not. This shift reflected a fast-growing trend of branding any suspicious or seemingly inauthentic prose as "AI slop". The slop frame now constitutes 94 percent of pejorative mentions, with the dominant comments shifting in tone from mockery toward gatekeeping and structural protest. The key surprise comes from a matched-control test which found that prose features that statistically distinguish AI from human text do not predict which human text gets accused as AI. The new accusations work as social gatekeeping of perceived authenticity without actually screening for AI. This research extends signaling theory by showing that substitute signals used socially can grow even when inaccurate if the underlying detection problem cannot be solved at the non-expert level. It shows that AI's effects on writing from the reader side are distinct from those on the production (writer) side. Detection technology cannot resolve this dynamic because the social function of accusations is increasingly to perform social gatekeeping and in-group signaling as opposed to identifying AI-generated writing.

11.
arXiv (CS.LG) 2026-06-19

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

arXiv:2604.15838v2 Announce Type: replace Abstract: Distribution shift severely degrades the performance of deep forecasting models. While this issue is well-studied for individual time series, it remains a significant challenge in the spatio-temporal domain. Effective solutions like instance normalization and its variants can mitigate temporal shifts by standardizing statistics. However, distribution shift on a graph is far more complex, involving not only the drift of individual node series but also heterogeneity across the spatial network where different nodes exhibit distinct statistical properties. To tackle this problem, we propose Reversible Residual Normalization (RRN), a novel framework that performs spatially-aware invertible transformations to address distribution shift in both spatial and temporal dimensions. Our approach integrates graph convolutional operations within invertible residual blocks, enabling adaptive normalization that respects the underlying graph structure while maintaining reversibility. By combining Center Normalization with spectral-constrained graph neural networks, our method captures and normalizes complex Spatio-Temporal relationships in a data-driven manner. The bidirectional nature of our framework allows models to learn in a normalized latent space and recover original distributional properties through inverse transformation, offering a robust and model-agnostic solution for forecasting on dynamic spatio-temporal systems.

12.
medRxiv (Medicine) 2026-06-16

Adherence to Red Reflex and Vision Screening Recommendations: A Deep Dive into Primary Care Implementation Gaps

Introduction: Early childhood vision screening is critical for detecting amblyopia and other vision-threatening conditions. Despite screening recommendations during well-child visits, rates remain low. Red reflex assessment is recommended to identify serious ocular pathology, yet its use in primary care is not well described. We examined rates and drivers of vision screening in pediatric primary care. Methods: We conducted a retrospective review of electronic health records for children 3 to 5 years attending well-child visits in 2022 in one of three representative primary care clinics within a university health system. Outcomes were documented red reflex and functional vision tests. We evaluated associations with patient demographics and clinic site using multivariable logistic regression Results: Among 1,003 visits, 21.1% (n=212) had a documented red reflex assessment, and 60.8% (n=610) a functional vision test. Younger children (ages 3 and 4 vs. 5 years) had higher odds of red reflex assessment [adjusted odds ratio (aOR) 9.00 and 8.64], and lower odds of a functional vision (aOR 0.47 and 0.59) test. Females had higher odds of red reflex assessment (aOR 1.53). Other/Multiracial children had lower odds of red reflex assessment than Non-Hispanic White children (aOR 0.48). Screening rates varied significantly by clinic site Conclusions: Visual function and red reflex assessment are inconsistently performed in pediatric primary care, with particularly low rates of red reflex documentation. Screening rates varied between clinics and were affected by age. These findings highlight missed opportunities for early detection of vision-threatening conditions and identify targets for improving adherence to pediatric vision screening recommendations

13.
arXiv (CS.AI) 2026-06-16

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

arXiv:2602.07883v4 Announce Type: replace Abstract: LLM-powered agentic systems excel at complex long-horizon tasks, but remain constrained by static configurations fixed before execution. Such rigidity forces a trade-off between domain-specific performance and cross-task generalization: strong priors and compact tool spaces aid specialization but weaken transfer, while task-agnostic workflows and broad action spaces expand coverage but dilute guidance. Existing pre-execution optimization, planner-worker orchestration, and configuration patching fall short of resolving this tension, as they decouple adaptation from execution, causing information loss, fragmented optimization, and ambiguous credit assignment. We propose ToolSelf, a tool-driven runtime self-reconfiguration paradigm that abstracts configuration updates as a standardized tool interface and unifies execution and adaptation within one policy's action space. The execution agent can dynamically update sub-goals, strategies, toolboxes, context, and context-management modes based on task progress and feedback. We further introduce Configuration-Aware Two-stage Training (CAT), which combines rejection sampling fine-tuning with trajectory-level KTO reinforcement learning to internalize self-reconfiguration. Across diverse benchmarks, zero-shot ToolSelf rivals task-specialized agents; after CAT training, ToolSelf gains 28.8 points over the static-configuration baseline on average, illuminating a path toward emergent adaptivity that obviates manually injected guidance. The code is available at https://github.com/lian-tian-mo-zun/ToolSelf.

14.
arXiv (CS.LG) 2026-06-18

Measurement noise limits the advantage of nonlinear models over linear models in biomedical prediction

arXiv:2606.18420v1 Announce Type: new Abstract: On biomedical tabular data, flexible models such as deep networks, gradient-boosted trees, and kernel methods are repeatedly matched or beaten by linear and logistic regression given the same features. The usual reaction is to treat this as a model-side shortfall, to be fixed with more data, a better architecture, or tuning, on the assumption that the nonlinear structure is there and the model has failed to capture it. We argue that these fixes cannot help when the binding limit is the measurement rather than the model, as it frequently is in biomedicine. Additive noise blurs the population-optimal predictor, and because blurring removes a function's fine, rapidly varying detail before its broad shape, it erases nonlinear structure faster than linear structure. A degree-$k$ interaction is attenuated by the $k$-th power of feature reliability, while the linear part is attenuated only once. At the reliabilities typical of biomedical measurement, the nonlinear advantage can vanish even when the underlying biology is strongly nonlinear, and what the noise removes cannot be recovered by a larger cohort or a more flexible model, only by better measurement. The nonlinearity is hidden, not absent, and a tie between linear and flexible models is not by itself a verdict on the biology. These pieces are classical, drawn from measurement-error statistics, psychometrics, and Gaussian analysis, and we assemble them into an exact excess-risk identity. Measurement reliability is one of three conditions, alongside sample size and feature representation, that must align for a flexible model to help, and together they leave only a narrow window that most biomedical tasks fall outside. Across 140 UK Biobank tasks, the gap between flexible and linear models, where it exists, carries the predicted noise signature, and the three conditions can be separated by intervention but not by a benchmark alone.

15.
arXiv (math.PR) 2026-06-18

On a class of reflected McKean-Vlasov Stochastic Differential Equations with jumps

arXiv:2606.18433v1 Announce Type: new Abstract: This paper investigates a class of reflected McKean-Vlasov Stochastic Differential Equations driven by both Brownian motion and a compensated Poisson random measure. We establish the existence and uniqueness of solutions and provide moments estimates for the state processes.

16.
arXiv (CS.AI) 2026-06-17

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

arXiv:2606.17409v1 Announce Type: cross Abstract: Planar path synthesis requires mechanisms whose coupler curves match a prescribed trajectory; the mapping from curve to linkage is inherently one-to-many across four-, six-, and eight-bar topologies. We address this design problem with simulation-grounded evaluation on a curated corpus of over one million mechanisms, reporting Chamfer distance and dynamic time warping after forward kinematics and geometric alignment. We formulate synthesis as conditional autoregressive sequence modeling: joint coordinates are uniformly quantized to tokens and generated by a decoder-only transformer with a variational-autoencoder (VAE) latent of the target curve and an explicit mechanism-type token. Training combines token cross-entropy with a Gaussian-smoothed bin auxiliary loss that respects ordinal structure among bins. At inference, a bounded latent-noise schedule decodes all mechanism types at each noise level; we retain the top five candidates by geometric error, yielding diverse accurate families without dataset lookup. On held-out tests, aggregate mean Chamfer distance is $0.0132$ and mean dynamic time warping is $0.153$; a latent $k$-nearest-neighbor baseline that conditions on training-set neighbor latents in VAE space achieves matched-topology mean Chamfer distance $0.0071$ and mean dynamic time warping $0.117$ using the same decoder.

17.
arXiv (CS.AI) 2026-06-19

RIVET: Robust Idempotent Voice Attribute Editing

arXiv:2606.19629v1 Announce Type: cross Abstract: Voice attribute editing models modify characteristics such as age and gender while preserving speaker identity. In large-scale speech datasets, however, attribute annotations are often noisy or inconsistent, which can cause conditional generative models to produce unstable edits. In this work, we show that idempotency provides an effective mechanism for improving robustness to noisy labels. An idempotent operator is one for which repeated application does not change the result, i.e., f(f(x)) = f(x). Enforcing this property acts as an implicit regularizer that reduces sensitivity to mislabeled examples. We introduce RIVET, a training framework that incorporates an idempotency objective to improve robustness to label noise. We evaluate RIVET under controlled label noise and on the GLOBE dataset with naturally noisy annotations. RIVET improves editing success and better preserves speaker identity than standard training, showing that idempotency improves robustness in voice editing models.

18.
arXiv (CS.AI) 2026-06-11

Agentic Software: How AI Agents Are Restructuring the Software Paradigm

Authors:

arXiv:2606.05608v2 Announce Type: replace-cross Abstract: For over half a century, software engineering has operated on a foundational premise: human engineers decompose problems, encode decision logic into static code, and manually adapt that code as requirements evolve. This paper argues that the emergence of AI agents – systems where large language models serve as the primary reasoning engine, dynamically generating and discarding code as an instrumental resource – constitutes a fundamental restructuring of what software is, not an incremental tool improvement. We formalize the distinction between traditional deterministic software and agentic software: in the former, code is the carrier of pre-written decision logic; in the latter, the agent itself is the software, and its decision logic is generated at runtime. We trace the historical arc from licensed software to SaaS to Agent-as-a-Service (AaaS), showing that each shift transferred additional complexity away from end-users – with the agentic shift transferring not just operational complexity but decision-making complexity itself. We introduce Agentic Engineering as an expansion of the software engineering discipline into a new paradigm, distinct in its core object of study (agent systems rather than static source code), its control model (LLM-driven rather than human-predefined), and its human role (intent architect rather than code author). Through analysis of recent benchmark evidence including SWE-bench Verified, EvoClaw, and LangChain's multi-agent coordination studies, we demonstrate both the transformative potential of the agentic paradigm and its current limitations. We conclude with a four-stage roadmap toward self-evolving agent ecosystems and concrete recommendations for practitioners navigating this transition.

19.
arXiv (CS.LG) 2026-06-18

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.

20.
arXiv (CS.LG) 2026-06-15

Nonlinear Two-Time-Scale Stochastic Approximation: A Sharp Phase Transition and How to Beat It

arXiv:2606.14488v1 Announce Type: cross Abstract: Recent finite-time analyses of nonlinear two-time-scale stochastic approximation show that under contractive assumptions the slow iterate $Y_k$ with stepsizes $\beta_k=\Theta(k^{-1})$ and $\alpha_k=\Theta(k^{-a})$, $a\in(1/2,1)$, generally satisfies a mean-square rate of order $k^{-a}$; decoupled $k^{-1}$ rates require strong local linearity. We identify a sharp regularity-dependent boundary. In a rate-determining normal form where the slow drift contains a locally linear leakage and a nonlinear remainder of order $1+\rho$ ($\rho\in[0,1]$), the uncorrected recursion satisfies \[ \mathbb{E}\|Y_k\|^2 \le C\bigl(k^{-1}+k^{-a(1+\rho)}\bigr), \] and a matching scalar Gaussian lower bound shows that the slower term is unavoidable without modifying the update. Thus the decoupled $k^{-1}$ rate is guaranteed for the uncorrected recursion exactly when $a(1+\rho)\ge 1$. This lower bound concerns only the naive update; it is not an information-theoretic obstruction. We demonstrate this by equipping the normal-form recursion with an auxiliary online bias estimator \[ M_{k+1}=M_k+\gamma_k(R(X_k)-M_k),\qquad \beta_k\ll\gamma_k\ll\alpha_k, \] and subtracting $M_k$ from the slow update. Under the same stability, moment, and remainder assumptions, the corrected recursion achieves $\mathbb{E}\|\widetilde Y_k\|^2=O(k^{-1})$ for every $\rho\in[0,1]$, including regimes where the uncorrected update provably suffers the slower rate. Finally, we prove localized transfer theorems that extend the phase-transition mechanism to general nonlinear TTSA in fast-manifold coordinates. The proofs are non-asymptotic and rely on two Abel-transform cancellations: one for the locally linear fast-error leakage, and one for the tracked nonlinear bias.

21.
arXiv (CS.AI) 2026-06-24

More Skills, Worse Agents? Skill Shadowing Degrades Performance When Expanding Skill Libraries

arXiv:2605.24050v2 Announce Type: replace-cross Abstract: Skill libraries allow LLM agents to load task-specific instructions on demand, letting non-expert users solve domain-specific tasks through natural language without knowing which skills exist or how they work. However, performance degrades as libraries grow – by up to 21\% when scaling from a small set of helpful skills to a 202-skill library. In this work, we formulate this performance degradation as the pass rate drop between loading a library of known-helpful skills and the full library. Moreover, we propose to decompose the pass rate drop by conditioning on the skill(s) invocation – which skills the agent selects during a trajectory – into two effects: skill shadowing, where the agent selects wrong skills more often as the library expands, and context overhead, where the enlarged context degrades execution even when selection is correct. We derive upper bounds on both effects to characterize their magnitudes of impacts to the pass rate drop. Our empirical estimates of the effects and their upper bounds both show that the skill shadowing effect grows with library size and significantly contributes to the performance degradation, whereas the context overhead effect remains small and indistinguishable from zero. This observed asymmetry establishes that the skill selection failure, not the enlarged context, is the primary bottleneck when expanding the skill libraries.

22.
arXiv (CS.AI) 2026-06-15

Sensitivity Shaping for Latent Modeling

arXiv:2606.14585v1 Announce Type: cross Abstract: Generative dynamics models enable planning in challenging robotic systems, but safe deployment requires reliably detecting policy-induced out-of-distribution (OOD) transitions. Existing methods typically treat the learned dynamics as fixed and attach post hoc support surrogates. We show that these surrogates can fail when the dynamics are locally insensitive to critical action choices: unsupported control actions may produce latent predictions that resemble demonstrated transitions, suppressing OOD signals despite large true predictive errors. To address this, we introduce support-conditioned control-sensitivity regularization, which promotes sensitive local response to control input changes in learned dynamics in high-support training regions. This preserves control-induced variation while limiting unstable extrapolation due to weak empirical support. Experiments in vision-based obstacle avoidance, manipulation, and real-robot navigation show improved OOD detection and safer closed-loop planning.

23.
arXiv (math.PR) 2026-06-17

Large deviation principle for friendship-biases in Galton–Watson trees

arXiv:2606.17381v1 Announce Type: new Abstract: In this paper we consider the friendship-bias of the vertices in an infinite rooted Galton–Watson tree. The friendship-bias of a vertex is the difference between the average degree of the neighbours of the vertex and the degree of the vertex itself. A vertex is said to be of type $\chi \in S$, with $S = \{-,0,+\}$, when its friendship-bias is, respectively, strictly negative, zero or strictly positive. We consider the fractions $f_l^\chi$ of vertices of type $\chi \in S$ along a random downward path up to branching depth $l \in \mathbb{N}$ and derive a large deviation principle (LDP) for the triple $(f_l^\chi)_{\chi \in S}$ as $l\to\infty$. The branching depth of a vertex counts the number of branchings that occur along the path that connects the vertex to the root of the tree. The rate in the LDP is $l$, while the rate function in the LDP is identified in terms of a variational formula minimising a relative entropy under a linear constraint. We focus on the case of binary branching, for which the rate function is already quite involved. We identify the qualitative properties of the rate function and show how it can be computed numerically. We briefly indicate how to proceed for more general branching and for vertex types along a tree consisting of a finite number of random downward paths. Our paper is the first to consider large deviations of vertex types.

24.
arXiv (CS.AI) 2026-06-24

E-MRL: Cross-view Aligned Evidence-driven Multimodal Reinforcement Learning for Reliable 3D Tumor Analysis

arXiv:2606.23888v1 Announce Type: cross Abstract: While Vision-Language Models (VLMs) show great promise in volumetric medical report generation, they frequently suffer from visual hallucinations and a lack of grounding in 3D CT data. Current Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) strategies typically optimize text fidelity alone, essentially rewarding correct diagnoses derived from language priors rather than genuine visual perception. To address this, we propose cross-view aligned Evidence-driven Multimodal Reinforcement Learning (Evidence-MRL, noted as E-MRL), a reliable RL reasoning framework that formulates the generation process as a Markov Decision Process of "diagnosis-localization-verification". Unlike standard approaches, our model is explicitly trained to identify a "key evidence slice" alongside the global diagnostic report, grounding its findings in verifiable visual evidence. Crucially, we introduce a novel cross-view consistency reward, which validates the semantic alignment between the golden-standard report and a local visual re-query of the selected key slice, providing additional rewards for correctly-localized reasoning. Experiments on large-scale 3D CT tumor datasets demonstrate that E-MRL significantly reduces hallucinations and improves diagnostic accuracy compared to SFT and RL baselines, offering a clinically interpretable solution for visually-grounded and tumor analysis.

25.
arXiv (CS.CV) 2026-06-16

Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

Small vision-language models (2-8B) are well-suited for clinical deployment due to privacy constraints, limited connectivity, and low-latency requirements favouring on-device or on-premise inference. However, their limited capacity exacerbates the generation of plausible but incorrect outputs. We extend game-theoretic decoding, previously restricted to text-only, closed-ended NLP tasks, to vision-language models for open-ended Medical VQA. We introduce a semantically aware Wasserstein stopping criterion that replaces lexical order matching, enabling convergence based on semantic consensus among near-synonymous candidate answers and avoiding unnecessary iterations caused by clinically equivalent ranking swaps. On VQA-RAD and PathVQA, we obtain consistent, statistically significant improvements over greedy and discriminative baselines. On VQA-RAD, we improve Qwen3-VL-2B by +3.5 percentage points (p < 0.01), surpassing the greedy 4B model, with similar trends at larger scales. On PathVQA, Gemma-3-4B with BDG matches MedGemma-4B under greedy decoding despite no domain-specific fine-tuning. At accuracy parity with classic BDG, the Wasserstein criterion reduces average convergence iterations by approximately 20%, improving inference efficiency while preserving the game-theoretic equilibrium behaviour. Code is available at https://github.com/luca-hagen/ Wasserstein-BDG-medical-VQA.