Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

arXiv:2606.17022v1 Announce Type: cross Abstract: A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variability in object geometry. Such datasets arise across a wide range of disciplines, including biology, medicine, anthropology, and computer vision, where subtle geometric differences often carry important scientific information. Traditional machine learning methods, however, are frequently ill-equipped to account for the nonlinear geometric structure underlying these data. This survey synthesizes a rapidly growing body of work on shape space analysis, which provides a mathematical and computational framework for the study of geometric data. Drawing on ideas from differential geometry, statistics, and machine learning, we organize the literature around a common analytical pipeline: shape representation and parameterization, the rigorous construction of robust geodesic metrics, statistical analysis on shape spaces, and geometry-aware learning methods. We discuss how these tools enable the characterization of shape variability, the comparison of geometric objects, and the analysis of structural trajectories across populations and time. To illustrate the breadth of the field, we highlight applications spanning multiple scales of biological organization, including studies of subcellular morphology and primate tooth evolution. Across these and many other domains, researchers face common challenges arising from complex, nonlinear, and often unaligned geometric variation. The review concludes by identifying key theoretical and computational challenges, as well as emerging opportunities driven by increasingly large and diverse geometric datasets.

02.
arXiv (CS.CV) 2026-06-11

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Visual manipulation localization (VML) aims to identify tampered regions in images and videos, a task that has become increasingly challenging with the rise of advanced editing tools. Existing methods face two central issues. The first is resolution diversity. Resizing or padding can distort subtle forensic cues and introduce unnecessary computational cost. The second is the difficulty of extending spatial models for images to spatio-temporal inputs in videos, which often results in maintaining separate architectures for the two data types. To address these challenges, we propose RelayFormer, a unified framework that adapts to varying resolutions and naturally handles both static and temporal visual data. RelayFormer partitions inputs into fixed-size sub-images and introduces Global Local Relay (GLR) tokens that propagate structured context through a relay-based attention mechanism. This design enables efficient exchange of global cues, such as semantic or temporal consistency, while preserving fine-grained manipulation artifacts. Unlike prior approaches that depend on uniform resizing or sparse attention, RelayFormer scales to variable resolutions and video sequences with minimal overhead. Experiments across diverse benchmarks demonstrate superior performance and strong efficiency, combining resolution adaptivity without interpolation or excessive padding, unified processing for images and videos, and a favorable balance between accuracy and computational cost. Code is available at~\href{https://github.com/WenOOI/RelayFormer}{https://github.com/WenOOI/RelayFormer}.

03.
arXiv (CS.LG) 2026-06-19

Efficient Neural Network Model Selection for Few-Class Application Datasets

arXiv:2606.19712v1 Announce Type: new Abstract: While much effort has focused on developing and benchmarking high-performance neural networks, less attention has been given to how dataset properties, known to practitioners, can guide efficient model selection. Neural models are typically evaluated on datasets with thousands of classes, yet many real-world applications involve fewer than ten. To address this understudied but common setting, we develop a measure of classification difficulty based on data-side properties and show how it enables more efficient model selection for few-class datasets, where traditional approaches are less effective. We term this phenomenon "few-class distinctiveness". Our metric allows comparison of models and datasets 6 to 29$\times$ faster than repeated training and testing. Leveraging this insight, we extend scaled model families below the smallest published models, achieving greater efficiency at similar accuracy, for example models up to 42% smaller than YOLOv5-nano for a mobile robot task. Targeting resource-constrained applications, we demonstrate few-class model selection across mobile robot, drone, and IoT scenarios, highlighting practical gains in efficiency without sacrificing performance.

04.
arXiv (CS.LG) 2026-06-11

The ASE-LSE Disagreement Landscape: An End-to-End Characterisation of Extremes and Structural Drivers

arXiv:2605.22346v3 Announce Type: replace-cross Abstract: Two of the most widely used methods for analysing graph data, Adjacency Spectral Embedding and Laplacian Spectral Embedding, often produce different results when applied to the same graph. Yet the structural reasons behind this disagreement remain incompletely understood. This paper provides an end-to-end account of ASE-LSE latent subspace disagreement. We first prove that the two methods produce identical latent subspaces for every embedding dimension whenever the Laplacian is a scalar multiple of the adjacency matrix, and show that this scalar relationship holds if and only if the graph is either regular or bipartite biregular. This anchor result identifies a sufficient condition for perfect agreement that pins down the floor of the disagreement spectrum and supplies the baseline for the perturbation analysis. We then prove that no maximal-disagreement graph or family of graphs exists: the disagreement is always strictly below its theoretical ceiling, and we exhibit a witness family demonstrating that no finite maximum is attainable, so the disagreement landscape has no maximiser. With both endpoints established, we derive a Regularity Departure Bound whose two terms isolate degree heterogeneity and eigengap as the primary structural factors influencing disagreement in the middle regime. Empirical validation across thousands of simulated graphs confirms the mechanisms predicted by the bound: heterogeneity pushes disagreement up, eigengap suppresses it, and their joint ratio emerges as a unified predictor of ASE-LSE disagreement, suggesting when the two embeddings can be treated as interchangeable and when they cannot.

05.
arXiv (CS.AI) 2026-06-15

Korzhinskii-Net: Physics-Informed Neural Network for Sub-Surface Mineral Prospectivity Modelling

作者:

arXiv:2606.13695v1 Announce Type: cross Abstract: Mineral prospectivity modelling (MPM) underpins exploration economics, yet most operational pipelines reduce to data-driven classifiers trained on shallow surface proxies. Such models are blind to the subsurface physics that actually localises ore: heat advection, fluid flow, and lithology-dependent precipitation. We present Korzhinskii-Net, a 2-D radial physics-informed neural network (PINN) that couples Darcy flow, advective-diffusive heat transport, and a softplus-saturated reaction rate into a single differentiable forward model, weakly supervised by surface and remote-sensing proxies. The network is named after Dmitri S. Korzhinskii (1899-1985), whose theory of infiltration metasomatism provides the physical scaffold. We evaluate Korzhinskii-Net on five ore provinces spanning four commodity classes – Norilsk (Ni-Cu-PGE), Pechenga (Ni-Cu sulphide), Udokan (sandstone-hosted Cu), Sukhoi Log (orogenic Au), and Mirny (kimberlitic diamond) – under a fair, leakage-controlled 5-fold cross-validation protocol with hard ring-shaped negatives. Korzhinskii-Net attains a mean PR-AUC of 0.885 versus 0.281 for the strongest classical baseline (gradient boosting), and a mean fractional rank of 0.019 versus 0.413. The improvement is consistent across all five provinces and four commodity systems, suggesting that physics-informed differentiable simulators, even when constrained only by global open-data proxies, can recover localisation patterns that pure feature-based learners systematically miss. We release the full pipeline and evaluation harness as open source.

06.
arXiv (CS.LG) 2026-06-19

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

arXiv:2606.20037v1 Announce Type: new Abstract: Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a leading cause of death worldwide. Early diagnosis plays an important part especially at the Mild Cognitive Impairment stage, where timely intervention can help slow its progression before it advances to AD. Neuroimaging data, like Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) scans, can help detect brain changes early by providing structural and functional brain changes related to the disease. Yet, many multimodal models still fuse MRI and PET with static concatenation and apply identical computation to all subjects, which limits robustness to patient/site heterogeneity and can waste computation. To address these limitations, we present the first study of combining 3D convolutional feature extractors with three fusion strategies - concatenation, Gated Multimodal Unit (GMU), and gated self-attention - and a sparsely gated Mixture-of-Experts (MoE) classifier that performs input-adaptive routing, activating only the most informative experts per case. Finally, we utilize Grad-CAM to visualize disease-related regions, ensuring model interpretability. Experiments are performed across three binary classification tasks (NC vs. MCI, MCI vs. AD, and NC vs. AD). Results show that GMU achieves accuracies of 80.46 % (NC vs. MCI) and 95.47 % (NC vs. AD), while gated self-attention attains 82.08 % on MCI vs. AD. Ablations show that removing the MoE consistently degrades accuracy across all tasks. These findings underscore the value of input-adaptive, multimodal modeling for AD diagnosis by leveraging the complementary nature of MRI and PET.

07.
arXiv (CS.AI) 2026-06-18

A Variational Framework for LLM Generator-Regulator Games

作者:

arXiv:2606.18424v1 Announce Type: cross Abstract: This paper develops a variational framework for regulated language generation. Starting from autoregressive token sampling, we derive the induced distribution over complete messages and relate it to an entropy-regularized Gibbs law. Regulation is modeled as an optimal discriminator whose convex-dual value is an f-divergence, and the generator-regulator interaction is formulated as a saddle-point problem. The framework applies to moderation, censorship, AI deception detection, compliance auditing, phishing defense, and manipulation control, where regulation concerns a distribution over possible messages rather than a single output. The equilibrium clarifies the tradeoff among utility, entropy, regulatory alignment, and finite-length detectability. Two finite-vocabulary case studies, censorship filtering and phishing defense, illustrate how the theory can be evaluated through utility, entropy, divergence, receiver-side scores, and detection probability.

08.
medRxiv (Medicine) 2026-06-15

Longitudinal monitoring exposes correlated temporal protein variations in the female plasma proteome

The plasma proteome is a valuable resource for assessment of the physiological state of the donor. Containing hundreds of different proteins of variable concentrations, it displays substantial inter-donor differences in individual protein levels, making each plasma proteome highly donor-specific. Less is known about intra-donor variability in the plasma proteome over time, although such variations may even be more indicative of a changing physiological state. Here we assessed data obtained from the TIMES cohort, comprising 51 apparently healthy participants monitored monthly over 12 months, focusing especially on temporal variations in blood protein levels. Most strikingly, we observed that several women in this cohort revealed strongly correlated temporal variations in their plasma proteome, including most notably PZP, SHBG, FETUB, AGT, SERPINA6, SERPINA7, CP, APOL1 and KNG1, with levels sometimes fluctuating by more than 20-fold. In contrast, such variations were absent in men. Some of the fluctuating proteins have been known to be hormone-regulated (e.g., PZP, SHBG), but for others this was not yet fully clear. Through the tight co-variation observed for these proteins in the plasma proteome of women, we can conclude that all these proteins are similarly hormone regulated. The findings reported here not only corroborate previous studies showing estrogen-dependent regulation of several plasma proteins, but also extend this category to include also CP, APOL1, and KNG1. As these latter have been often proposed as candidate biomarkers, they should be validated in sex-balanced cohorts and interpreted with caution, especially in large-scale plasma proteomics studies wherein often only one or a few sampling time points are measured per donor.

09.
arXiv (CS.CL) 2026-06-15

Same-Origin Policy for Agentic Browsers

Agentic browsers integrate autonomous AI agents into web browsers, enabling users to accomplish web tasks through natural-language instructions. The same-origin policy (SOP) is a fundamental browser security mechanism that prevents unauthorized automated cross-origin data flows induced by scripts. However, whether SOP remains effective in agentic browsers is an open question that has not been systematically studied. In this work, we bridge this gap. We first observe that an agentic browser can itself serve as an automated channel for cross-origin data flows, potentially leading to SOP violations. To investigate this phenomenon, we construct SOPBench, a benchmark for evaluating SOP violations in agentic browsers. Our evaluation shows that existing agentic browsers frequently violate SOP, both in benign settings and under attacks. To address this problem, we propose SOPGuard, an SOP enforcement mechanism tailored to agentic browsers. We implement SOPGuard in BrowserOS, an open-source agentic browser. Extensive evaluations demonstrate that SOPGuard effectively enforces SOP while preserving utility and incurring only a small runtime overhead. Our code and data are available at https://github.com/wxl-lxw/BrowserOS-SOPGuard.

10.
arXiv (CS.CV) 2026-06-16

Attention-Based Prototype Calibration for Multi-Rater Few-Shot Medical Image Segmentation

Few-shot medical image segmentation methods typically assume a single ground-truth annotation, overlooking systematic variability across expert raters commonly observed in clinical datasets. We propose an attention-based prototype calibration framework for few-shot multi-rater segmentation that models rater-specific deviations from a consensus representation in prototype space. A lightweight yet principled attention operator directly refines rater prototypes without modifying the backbone feature extractor, making the approach fully compatible with existing prototype-based few-shot segmentation methods. This design preserves semantic consistency while enabling personalized segmentation outputs with minimal computational overhead. Experiments on multi-rater medical imaging datasets demonstrate consistent improvements over baseline prototype approaches, highlighting the effectiveness of structured prototype calibration for modeling annotation variability. Our code is available at https://github.com/truong2710-cyber/JAPC.

11.
arXiv (CS.CL) 2026-06-12

Constrained Semantic Decompression in LLMs through Persian Proverb-Conditioned Story Generation

Transforming a dense, abstract proverb into an engaging and morally faithful narrative requires deep cultural understanding and robust semantic grounding. We frame this problem as a constrained semantic decompression task and study proverb-conditioned story generation as a testbed for abstraction-to-realization in large language models (LLMs). Focusing on Persian, we introduce the Proverb Aligned Narrative Dataset (PAND), pairing proverbs with human-written stories and explicit meanings. By a hybrid evaluation framework that combines human-calibrated LLM-as-a-Judge with structural metrics, we analyze model behavior across multiple prompting regimes. Our findings reveal a persistent decompression gap: current LLMs often achieve strong surface-level fluency while failing to faithfully instantiate the underlying moral and causal structure encoded in proverbs. We further show that explicit reasoning and iterative refinement can partially mitigate these failures, suggesting that many decompression errors arise from difficulties in translating abstract meaning into narrative form rather than a complete lack of relevant knowledge. Our proposed task naturally extends to other forms of compressed cultural knowledge.

12.
medRxiv (Medicine) 2026-06-15

Instrumental Activities of Daily Living in Older Adults with Epilepsy: A Cross-Sectional and Longitudinal Multicenter Study

Objective: Instrumental activities of daily living (IADLs) represent a critical but understudied measure of day-to-day function in persons with epilepsy(PWE). In the multicenter Brain Aging and Cognition in Epilepsy (BrACE) study of PWE aged greater than or equal to 55 years, we examined the proportion, clinical correlates, epilepsy-related predictors, and longitudinal trajectory of IADL impairment. Methods: IADLs were assessed using the Functional Activities Questionnaire (FAQ; range=0 to 30; higher=more impaired); a FAQ greater than or equal to 2 defines MCI-level impairment, and a FAQ greater than or equal to 5 defines dementia-level functional impairment. Multivariable logistic regression identified predictors of baseline function. Global cognition (Montreal Cognitive Assessment [MoCA]), individual cognitive measures, and quality of life (QOL) were compared between the impaired and unimpaired groups. Linear regression evaluated predictors of longitudinal functional decline. Results: Of 57 participants (mean age=66.6 years; female=52.6%), 38.6% (n=22) had MCI-level functional impairment and 17.5% (n=10) had dementia-level functional impairment. In univariate analyses, worse FAQ scores were associated with lower education, higher area deprivation index, early-onset epilepsy (EOE less than 60 years), antiseizure medication polytherapy, and epilepsy localization. In multivariable analysis, temporal lobe epilepsy (OR=4.46, 95% CI=1.09, 21.83,p=0.047), EOE(OR=7.14, 95% CI=1.16, 59.97, p=0.046), and lower education(OR=0.70,95% CI=0.49, 0.93, p=0.025) remained independently associated with baseline MCI-level functional-impairment. Lower education (OR=0.55,95% CI=0.29, 0.84, p=0.021) was the only factor associated with dementia-level IADL-impairment. IADL-impaired participants demonstrated lower verbal memory scores (adjusted p=0.041) and MoCA scores (adjusted p

13.
medRxiv (Medicine) 2026-06-17

Silent Manipulation of Mental Health Treatment Recommendations from a Large Language Model

Importance. Large language models (LLMs) increasingly inform mental health decisions by patients and clinicians. Inference-time activation steering can shift model behavior on a target dimension without altering weights or prompts and without disclosure to users, allowing treatment recommendations to be silently changed for commercial or ideological reasons. Objective. To determine whether directional activation steering can shift an open-weights LLM's depression treatment recommendations. Design, Setting, and Participants. This non-human subjects study applied directional activation steering to an open-weights LLM (DeepSeek V4 Flash) responding to 12 depression-advice scenarios (4 favoring medication, 4 favoring avoidance, 4 neutral), generated at 30 amplitudes from -1.5 to +1.5 in 0.1 increments plus an unsteered baseline. Exposures. A single steering direction contrasting antidepressant medication with self-directed approaches (diet, exercise, meditation, dietary supplements), constructed from 16 paired training prompts and applied at the attention output of every transformer block; weights and system prompt were held constant. Main Outcomes and Measures. The extent to which medication and four self-care categories were addressed, scored 0 to 3 by a human-validated LLM rater (Claude Opus 4.7), the medication-versus-self-care balance, and clinician referral, estimated per unit of amplitude using mixed-effects models with a scenario random intercept. Results. Across 372 generations, steering produced a graded, dose-dependent shift in the medication-versus-self-care balance, which declined by 0.32 per unit of amplitude (beta=-0.32; 95% CI, -0.39 to -0.25; P < .001); medication extent fell and self-care extent rose. The shift was largest for scenarios with no stated treatment preference (beta = -0.44; 95% CI, -0.54 to -0.34; P < .001). A clinician referral appeared in 322 of 372 responses (87%) and did not vary with steering amplitude (P = .63). Conclusions and Relevance. In this open-weights LLM providing depression treatment information, inference-time activation steering shifted treatment recommendations without altering weights, prompt structure, or safety outputs, with the largest effect among users expressing no treatment preference. These findings suggest a need for LLM disclosure standards and independent auditing as such models inform clinical decisions.

14.
bioRxiv (Bioinfo) 2026-06-14

Generative design of antigen-specific T-cell receptor sequences with a conditional diffusion model

T cell receptor (TCR)-based immunotherapy holds immense potential for treating cancers and infectious diseases, where highly antigen-specific TCR recognition is crucial for adaptive immunity against tumors and pathogens. Engineering or de novo generation of the complementarity-determining region 3 (CDR3) loops of TCRs using artificial intelligence offers a powerful alternative to designing reactive TCRs rather than laborious experimental screening. However, current in silico approaches are constrained by weak conditional guidance, limited flexibility, and a lack of rigorous functional validation. To address these limitations, we introduce TCRDiff, a generative diffusion framework for designing antigen-specific TCRs conditioned on peptide-MHC (pMHC) targets and germline-encoded variable genes. By leveraging pre-trained knowledge from massive T-cell repertoires and TCR-pMHC recognition data, TCRDiff generates CDR3{beta} sequences with state-of-the-art fidelity to native binding TCRs through a denoising diffusion process. Furthermore, incorporating the interface geometry features generated TCR-pMHC complexes with superior structural plausibility. As a proof of concept, we deployed TCRDiff in a systematic pipeline to design candidate TCRs for immunotherapy. In vitro activation assays validated that TCRDiff-generated TCRs specifically recognize the MAGE-A3 epitope with minimized off-target cross-reactivity. Together, TCRDiff establishes a powerful, validated computational paradigm to accelerate the development of TCR-based immunotherapies.

15.
arXiv (math.PR) 2026-06-19

Model-independent upper bounds for the prices of Bermudan options with convex payoffs

arXiv:2503.13328v3 Announce Type: replace-cross Abstract: Suppose $\mu$ and $\nu$ are probability measures on $\mathbb{R}$ satisfying $\mu \leq_{cx} \nu$. Let $a$ and $b$ be convex functions on $\mathbb{R}$ with $a \geq b \geq 0$. We are interested in finding $$\sup_{\mathbf{M}} \sup_{\tau} \mathbb{E}^{\mathbf{M}} \left[ a(X) I_{ \{ \tau = 1 \} } + b(Y) I_{ \{ \tau = 2 \} } \right] $$ where the first supremum is taken over consistent models $\mathbf{M}$ (i.e., filtered probability spaces $(\Omega, \mathbf{F}, \mathbb{F}, \mathbb{P})$ such that $Z=(z,Z_1,Z_2)=(\int_{\mathbb{R}} x \mu(dx) = \int_{\mathbb{R}} y \nu(dy), X, Y)$ is a $(\mathbb{F},\mathbb{P})$ martingale, where $X$ has law $\mu$ and $Y$ has law $\nu$ under $\mathbb{P}$) and $\tau$ in the second supremum is a $(\mathbb{F},\mathbb{P})$-stopping time taking values in $\{1,2\}$. Our contributions are first to characterise and simplify the dual problem, and second to completely solve the problem under some structural assumptions on the measures $\mu$ and $\nu$ (namely that $\mu$ and $\nu$ are absolutely continuous probability measures that satisfy the Dispersion Assumption). A key finding is that the canonical set-up in which the filtration is that generated by $Z$ is not rich enough to define an optimal model and additional randomisation is required. This holds even though the marginal laws $\mu$ and $\nu$ are atom-free. The problem has an interpretation of finding the robust, or model-free, no-arbitrage bound on the price of a Bermudan option with two possible exercise dates, given the prices of co-maturing European options.

16.
arXiv (CS.CV) 2026-06-15

MMRINet: Efficient Mamba-Based Segmentation with Dual-Path Refinement for Low-Resource MRI Analysis

Automated brain tumor segmentation in multi-parametric MRI remains a critical yet underserved challenge in resource-constrained clinical settings, where deep 3D networks requiring high-end GPUs are not viable. This is particularly acute across sub-Saharan Africa (SSA), where low-field scanners, heterogeneous patient demographics, and severe data scarcity compound the difficulty of applying standard deep learning pipelines. We present MMRINet, a lightweight segmentation architecture purpose-built for these constraints. At its core, MMRINet replaces quadratic-complexity self-attention with linear-complexity Mamba state-space models, enabling efficient long-range volumetric context modeling without the computational overhead of Transformer-based approaches. We combine two lightweight refinement components:Dual-Path Feature Refinement (DPFR), which extracts complementary detail and contextual representations to improve feature diversity under limited data, and Progressive Feature Aggregation (PFA), which hierarchically fuses multi-scale decoder outputs for sharper segmentation boundaries. Evaluated on the BraTS-Lighthouse SSA 2025 challenge dataset, comprising 3D MRI scans from Nigerian clinical sites, MMRINet achieves an average Dice score of 0.752 and an average HD95 of 12.23 mm with only ~2.5M parameters, outperforming all evaluated baselines, including UNETR, Swin-UNETR, SegMamba, and SegResNet3D. These results indicate that strong validation-set segmentation performance can be achieved with substantially reduced computation, offering a practical step toward AI-assisted neuro-oncology in low-resource clinical environments. Our GitHub repository can be accessed here: BioMedIA-MBZUAI/MMRINet.

17.
arXiv (CS.LG) 2026-06-18

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

arXiv:2606.18509v1 Announce Type: new Abstract: Reliable generalization in conditional latent variable models requires understanding both identifiability and extrapolation: how observed variation across attributes determines latent structure, and how that structure determines distributions at unseen attributes. However, existing identifiability and extrapolation guarantees are largely model-specific, with separate analyses in nonlinear ICA, causal representation learning, perturbation modeling, and related conditional latent variable models. We introduce concept modulation models (CMMs), an attribute-indexed class of conditional generative models with structure $A\to \Lambda \to C\to X$, where attributes select modulators, modulators induce latent concept laws, and concepts generate observed features. CMMs lift transition-based identifiability to conditional settings by showing that feature agreement on observed attributes induces a latent concept transition constrained by the CMM class. We express these constraints through attribute potentials, log-density ratios between attribute-conditioned concept laws, separating the generic lifting step from model-specific rigidity arguments. The same potentials control extrapolation: agreement at unseen attributes holds exactly when the transported attribute-potential identities extend to those attributes. This yields algebraic extrapolation criteria, identifies the common potential-based proof objects behind several existing identifiability and extrapolation results, and, when combined with the model-specific rigidity arguments in those works, recovers their stated conclusions.

18.
arXiv (CS.LG) 2026-06-15

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

arXiv:2606.14149v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examine whether LLMs recommend recently banned or withdrawn pharmaceuticals when answering clinical questions and tests an agent-based method for reducing such errors. We developed a five-agent "Trust but Verify" system using a single LLM backbone. To measure regulatory knowledge obsolescence, we created an adversarial dataset of 103 clinical MCQs where historically correct answers now refer to banned substances. This scale ensures statistical significance across various therapeutic classes. We evaluated three open-access model families (GPT-OSS, Llama-3, Falcon-3) under vanilla and agentic conditions. Performance was measured via pointwise score, label accuracy, Hallucination Error Rate (HER), and Component Fidelity (CF) score. We also observed clinical safety regression in proprietary models. In default configurations, all models showed high hallucination rates, consistently selecting banned drugs that matched training data patterns. Our proposed agentic architecture reduced HER by approximately 53% across models. Pointwise scores shifted from -0.25 (unsafe recommendation) toward 0.0 (appropriate refusal). The safety audit intercepted dangerous outputs even when models' parametric knowledge favored the banned substance. The proposed multi-agent framework offers a model-agnostic method for enforcing regulatory compliance that prioritizes patient safety over fluent text generation. Our work demonstrates a practical approach for deploying autonomous AI systems in safety-critical healthcare settings. It shows how real-time regulatory data can be integrated into LLM pipelines to support clinical decision-making.

19.
arXiv (math.PR) 2026-06-11

Asymptotic analysis of the finite predictor for fractional Gaussian noise

arXiv:2504.01562v2 Announce Type: replace-cross Abstract: This paper proposes a new approach to the asymptotic analysis of the finite predictor for stationary sequences. Our method yields the exact asymptotics of both the relative prediction error and the partial correlation coefficients. The underlying assumptions are analytic in nature, making the approach applicable to processes with long-range dependence. The ARMA-type process driven by fractional Gaussian noise (fGn), which had previously remained elusive, is used as a case study.

20.
arXiv (CS.AI) 2026-06-12

Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

arXiv:2606.13097v1 Announce Type: cross Abstract: Code-writing large language models (CodeLLMs) generate executable code policies for embodied agents by translating natural language goals and environmental constraints into structured control programs. However, policy generation in open-domain embodied environments suffers from two fundamental limitations: (i) delayed decoding caused by repetitive prefill computation over long prompts, and (ii) limited robustness due to fully generative decoding, which often produces API mismatches, missing safety guards, and unstable control logic. To address these limitations, we present FCGraft, a Functional Cache Grafting framework. FCGraft maintains a library of function-level validated code skeletons and their associated prompt-level Transformer key-value (KV) caches, and synthesizes new policies by retrieving relevant functions and grafting their KV caches when a new task is provided. Given retrieved function caches, FCGraft performs cache grafting via stitching, which composes cached function segments into a composite policy, and patching, which locally adapts only the necessary code regions to satisfy task-specific parameters and constraints with minimal additional decoding. By eliminating redundant prefill computation, this approach reduces generation latency, while reusing validated control structures improves robustness over prompt-level caching methods RAGCache, achieving 18.31% higher task success rate and 2.3x faster policy synthesis.

21.
arXiv (CS.LG) 2026-06-12

LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning

arXiv:2505.22695v2 Announce Type: replace Abstract: Ride-hailing platforms face significant challenges in optimizing order dispatching and driver repositioning operations in dynamic urban environments. Traditional approaches based on combinatorial optimization, rule-based heuristics, and reinforcement learning often overlook driver income fairness, interpretability, and adaptability to real-world dynamics. To address these gaps, we propose LLM-ODDR, a novel framework leveraging Large Language Models (LLMs) for joint Order Dispatching and Driver Repositioning (ODDR) in ride-hailing services. LLM-ODDR framework comprises three key components: (1) Multi-objective-guided Order Value Refinement, which evaluates orders by considering multiple objectives to determine their overall value; (2) Fairness-aware Order Dispatching, which balances platform revenue with driver income fairness; and (3) Spatiotemporal Demand-Aware Driver Repositioning, which optimizes idle vehicle placement based on historical patterns and projected supply. We also develop JointDR-GPT, a fine-tuned model optimized for ODDR tasks with domain knowledge. Extensive experiments on real-world datasets from Manhattan taxi operations demonstrate that our framework significantly outperforms traditional methods in terms of effectiveness, adaptability to anomalous conditions, and decision interpretability. To our knowledge, this is the first exploration of LLMs as decision-making agents in ride-hailing ODDR tasks, establishing foundational insights for integrating advanced language models within intelligent transportation systems. While the current framework incurs higher computational costs than traditional methods, we show that parallel decomposition and model distillation can reduce latency to production-viable levels for deployment.

22.
arXiv (CS.LG) 2026-06-18

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

arXiv:2606.18898v1 Announce Type: new Abstract: Multivariate time series anomaly detection (MTSAD) is critical for a wide range of application areas, such as industrial monitoring, cybersecurity, or healthcare. Real-world data is often sparse, irregularly sampled or partially observed, yet existing methods assume uniformly sampled time series. We propose a generative approach based on Latent SDEs that projects the observed time series on a continuous-time stochastic dynamical system, directly being able to handle missing observations and irregular sampling, while also naturally capturing possible cyclic behavior that many real-world use cases inherently possess. Experiments on six anomaly benchmark datasets show that our proposed method ranks first among state-of-the-art baselines. We further demonstrate that our method remains robust under severe data sparsity, while performance significantly degrades for the tested baseline methods. These results highlight latent SDEs as a natural inductive bias for anomaly detection in multivariate time series, especially in presence of real-world irregularities.

23.
arXiv (quant-ph) 2026-06-17

Tensor network compression using fluid dynamics as a testbed: Analytical foundations in one dimension

arXiv:2606.17064v1 Announce Type: cross Abstract: High performance computers produce extreme-scale data sets that require sampling or compression if they are to be used to their full potential. Existing data compression techniques typically exploit features such as sparsity in the data, homogeneity in the data, or {\it a priori} knowledge of what subsets of data are of most interest. Fluid dynamics data in general do not exhibit these features and so are attractive test beds for generic compression techniques that are objective, robust, and tuneable with respect to information lost due to compression. Presented here is a method based on tensor networks, specifically matrix product states or tensor trains, that meets these requirements. The method is demonstrated for compression in one-dimension and is extensible to higher dimensionality. Lossless compression is demonstrated for random Fourier series for sufficiently high bond dimension of the tensor network, with the memory required to store the tensor network scaling directly proportional to the bond dimension. The lossy compression exhibited at lower bond dimension can be well within the relative error of many fluid simulations. The compression algorithm is tested for the time evolution of Burger's equation with excellent results. We additionally demonstrate the capability to perform computations in the compressed form through a tensor network periodic convolution that can be orders of magnitude faster than using fast Fourier transforms and the convolution theorem. In addition to being an attractive method for working with data sets generated by existing computers, the tensor network methods utilised are directly translatable to the emerging paradigm of quantum computing.

24.
arXiv (CS.AI) 2026-06-15

Distributional Biases in Post-Training: A Markovian Analysis of Reasoning Trajectories

arXiv:2511.07368v3 Announce Type: replace-cross Abstract: Foundation models exhibit broad knowledge but limited task-specific reasoning, motivating post-training strategies such as RL with verifiable rewards (RLVR) and test-time scaling (TTS). While recent work highlights the role of exploration in improving pass@K, empirical evidence points to a paradox: RLVR and ORM/PRM typically reinforce existing paths rather than expanding the reasoning scope, raising the question of why exploration helps if no new patterns emerge. To reconcile this paradox, we adopt the perspective of Kim et al. (2025), viewing easy (e.g., simplifying a fraction) versus hard (e.g., discovering the some symmetry) reasoning steps as low versus high probability Markov transitions. In this tractable model, pretraining corresponds to tree-graph discovering, while post-training corresponds to CoT reweighting. We provably show that, both RLVR and ORM/PRM would favor heavily to several high-probability paths, and thereby forget rare-but-crucial CoTs. Building on this, we further prove that exploration strategies such as rejecting easy instances and KL regularization help preserve rare CoTs. Empirical simulations corroborate our theoretical results.

25.
arXiv (CS.CV) 2026-06-16

VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

作者:

Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput, a tension that existing methods address either through stronger feature extractors or more efficient architectures, but rarely both. We present VigilFormer, a unified framework that combines deformable spatio-temporal attention with causal temporal modeling to detect anomalies in untrimmed surveillance video. The proposed Deformable Spatio-Temporal Encoder (DSTE) attends to a sparse set of informative locations across frames, avoiding the quadratic cost of dense attention while retaining the ability to capture irregular motion patterns. A Causal Anomaly Classifier (CAC) applies dilated causal convolutions over snippet-level features and optimizes a contrastive multiple-instance learning objective that separates anomalous and normal representations without frame-level labels. To meet deployment constraints, an Adaptive Confidence Scheduler (ACS) dynamically skips low-information frames at inference time, reducing redundant computation in static scenes. Evaluated on UCF-Crime, ShanghaiTech, and CUHK Avenue, VigilFormer achieves AUC scores of 87.83%, 97.21%, and 89.74% respectively, at 41.5 FPS on a single GPU, outperforming recent weakly-supervised methods in both accuracy and speed.