Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-12

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express complex medical queries in native Indic languages and rely on multimodal inputs such as medical images. Existing English-centric MLLMs struggle to support such use cases, limiting equitable access to AI-driven healthcare assistance. To address this challenge, we introduce ArogyaBodha, a large-scale multilingual multimodal medical question-answer dataset constructed from eight heterogeneous sources, covering 31 body systems, six imaging modalities, and 21 clinical domains across English and seven major Indian languages. We further propose ArogyaSutra, an actor-critic-based multi-agent framework that integrates tool grounding with dual-memory mechanisms for step-wise, reasoning-aware decision making, and uses stored actor-critic simulation trajectories for distillation. Experiments show that our dataset and framework improve multilingual medical reasoning accuracy across all Indic languages, with ablations validating the contribution of each component. The source code and dataset are available at: https://iitp-cse.github.io/ ArogyaSutra/

02.
arXiv (quant-ph) 2026-06-19

Efficient classical representation and quantum state preparation of complete active space wavefunctions

作者:

arXiv:2606.19457v1 Announce Type: new Abstract: Quantum computers promise to solve the electronic structure problem for a large class of molecules. However, the performance of relevant quantum algorithms hinges on preparing initial states with substantial overlap with the target eigenvector. For classically challenging molecules with strong electron correlation, starting from multi-reference states, such as complete active space (CAS) wavefunctions is necessary. Unfortunately, the most advanced state preparation protocols applied to such states result in a gate complexity that scales exponentially with the active space size $d$. In fact, even encoding a CAS state classically is traditionally believed to be intractable for chemically relevant systems. Here, we draw insights from the recently introduced Quantum Paldus Transform (QPT) to show that there exists an efficient classical representation of CAS states and to design a new state preparation routine outperforming previous ones. The QPT represents a transformation from the Fock basis to a friendlier symmetry-adapted basis. Our main contribution consists in showing that CAS states expanded in this basis can efficiently be represented as a matrix product state (MPS) with a bond dimension scaling as $O(d^2)$. One can then efficiently load the MPS on a quantum computer and use the inverse QPT to transform the state to the Fock basis. Moreover, our method can easily be extended to the efficient preparation of CAS states in first quantisation with similar complexity. Crucially, we demonstrate that the complexity of both state preparation protocols only grows polynomially as $O(d^3)$ , which constitutes to the best of our knowledge an exponential improvement over the state of the art.

03.
arXiv (CS.CL) 2026-06-16

T-Mem: Memory That Anticipates, Not Archives

Long-term memory is essential for conversational agents to remain coherent across extended dialogues, follow through on commitments made many sessions earlier, and adapt their behaviour to each user. Current LLM-backed long-term conversational memory, however, is reachability-bounded by the similarity between a query and stored content, both lexical and dense-vector. The approach is effective when query and memory share surface features such as wording or named entities (we call this descriptive). But it misses another, equally valuable class of cases, where query and memory do not share surface features and are tied only by a latent semantic arc (associative). On this regime prevailing long-term memory systems collectively fail. Covering this other half is what allows an assistant, for the first time, to actively draw on past dialogue as a semantic asset. On the memory side, this is the engineering counterpart of what cognitive science calls episodic future thinking: rehearsing past experience for the future contexts under which it will need to be found. We call these write-time rehearsals triggers. We propose T-Mem, the first long-term conversational memory architecture that covers both descriptive and associative recall. At each of two evidence granularities, single facts and full exchanges, T-Mem instantiates one descriptive trigger family and one associative trigger family, so that every memory remains reachable from both surface-similar and relevance-bound queries. As empirical validation, T-Mem reaches state-of-the-art on both LoCoMo and LoCoMo-Plus.

04.
arXiv (math.PR) 2026-06-16

A tree-free approach to 3D Yang-Mills Langevin dynamic. Analytic estimates and the existence of a model for a regularity structure

arXiv:2605.14616v2 Announce Type: replace Abstract: Using the multi-index approach to regularity structures due to F. Otto et al., we construct a regularity structure and a model for it associated to the stochastic Langevin equation for the 3D Euclidean Yang-Mills functional. For the model we also obtain global stochastic and global pointwise weighted Besov type estimates which hold almost surely. The model is defined as a limit of a sequence of smooth models introduced with the help of a mollified noise. When the mollification is removed the sequence converges in a certain topology defined with the help of the stochastic estimates. To obtain these results we develop the multi-index approach for systems of equations with vector-valued white noises. This project is motivated by the problem for constructing 3D Euclidean Yang-Mills measure and by the earlier results of the author on the related problem of canonical quantization of the Yang-Mills field on the Minkowski space.

05.
arXiv (quant-ph) 2026-06-11

Raw-Curve Quantum Fingerprints: A Mahalanobis Authentication Framework with Drift Early Warning and Adversarial Detection

arXiv:2606.11644v1 Announce Type: new Abstract: Quantum cloud platforms are poised to deliver powerful computing capabilities, but users have no direct means to verify which physical device executes their workload. This lack of transparency enables hardware substitution attacks, where a malicious adversary could redirect a job to a substituted or inferior processor. We present a general authentication framework that addresses this problem by constructing multi-dimensional quantum fingerprints from raw measurement data. Without any curve fitting, we directly concatenate the raw statistics of complementary experiments into a high-dimensional feature vector that preserves subtle device-specific information. A Mahalanobis nearest-neighbor classifier achieves 100\% benign authentication accuracy on three superconducting processors over a three-week chronological split. The classifier naturally yields an authentication confidence $C_{\mathrm{claimed}}$ which reveals device-specific safety margins and motivates per-device alert thresholds. We assess the framework's robustness under two distinct scenarios. Under additive isotropic Gaussian noise, $C_{\mathrm{claimed}}$ decays predictably at a rate explained by inverse covariance traces, enabling an early warning mechanism. Against white-box adversarial perturbations, the same confidence threshold detects $L_2$ targeted attacks with near-perfect success and reveals device-dependent empirical thresholds for $L_\infty$ attacks, while untargeted and sparse attacks are ineffective. The proposed framework thus unifies fingerprint extraction, drift-resilient authentication, proactive health monitoring, and adversarial defense, offering a practical step toward trustworthy quantum cloud computing.

06.
arXiv (CS.LG) 2026-06-15

Adaptive Identification and Modeling of Clinical Pathways with Process Mining

arXiv:2512.03787v2 Announce Type: replace Abstract: Clinical pathways are specialized healthcare plans that model patient treatment procedures. They are developed to provide criteria-based progression and standardize patient treatment, thereby improving care, reducing resource use, and accelerating patient recovery. However, manual modeling of these pathways based on clinical guidelines and domain expertise is difficult and may not reflect the actual best practices for different variations or combinations of diseases. We propose a two-phase modeling method using process mining, which extends the knowledge base of clinical pathways by leveraging conformance checking diagnostics. In the first phase, historical data of a given disease is collected to capture treatment in the form of a process model. In the second phase, new data is compared against the reference model to verify conformance. Based on the conformance checking results, the knowledge base can be expanded with more specific models tailored to new variants or disease combinations. We demonstrate our approach using Synthea, a benchmark dataset simulating patient treatments for SARS-CoV-2 infections with varying COVID-19 complications. The results show that our method enables expanding the knowledge base of clinical pathways with sufficient precision, peaking to 95.62% AUC while maintaining an arc-degree simplicity of 67.11%.

07.
arXiv (CS.LG) 2026-06-16

Machine Learning and the Random Walk Puzzle: Forecasting the CAD/USD Exchange Rate with Expanding Window Evaluation and SHAP Interpretability

arXiv:2606.15058v1 Announce Type: new Abstract: This study examines whether machine learning (ML) models can outperform the naive random walk benchmark in forecasting the monthly USD/CAD exchange rate. Using daily data from the Bank of Canada spanning January 2017 to May 2026, resampled into 113 monthly observations, five ML models are evaluated: linear regression, random forest, gradient boosting, XGBoost, and AdaBoost. These models are benchmarked against the naive random walk model and exponential smoothing with Holt-Winters seasonality (ETS). All models are evaluated using an expanding-window framework to maintain strict out-of-sample integrity, and forecast-accuracy differences are assessed using the Diebold-Mariano (DM) test. Structural break detection identifies four significant breakpoints in the series, corresponding to the escalation of the US-China trade war in 2018, the COVID-19 economic recovery in 2020, the peak of the Bank of Canada rate-hiking cycle in 2022, and the start of the Bank of Canada rate-cutting cycle in 2024. SHAP, or Shapley Additive Explanations, analysis is applied to interpret the drivers of the best-performing ML model. The results show that the naive random walk model remains a formidable benchmark. Linear regression is the only model that statistically outperforms the naive random walk model, with a DM statistic of 3.0585 and a p value of 0.0071, whereas the ML ensemble models show only marginal differences. Random Forest with an expanding-window framework achieves the lowest MAPE of 1.17 percent among all models except the random walk. SHAP analysis confirms that short-term lags, particularly lag1 and lag2, and recent rolling means dominate predictions, consistent with the near-random-walk behavior of exchange rates.

08.
arXiv (CS.AI) 2026-06-16

Knowledge-Based Zero-Replay Debugging of Multi-Agent LLM Traces

arXiv:2606.14805v1 Announce Type: cross Abstract: Reliable operation of multi-agent large language model (LLM) systems depends on debugging long execution traces, where the few causally decisive events are buried in unstructured logs of messages, routes, memory writes, and tool calls. The standard tool is counterfactual replay (rewind, edit, and re-run the trajectory to measure each event's effect), but its cost grows linearly with the number of candidate events, making exhaustive replay infeasible at scale. We frame trace debugging as a knowledge-based decision-support problem. Each trace is compiled into a structured event knowledge graph over routing, memory, tool-use, uncertainty, and latent evidence, and a calibrated predictor decides where a scarce replay budget should be spent. We do not propose a new replay oracle; we propose a method to predict its results without paying the replay cost. We formulate zero-replay counterfactual-effect prediction: given a trace under a fixed budget, predict which events the oracle would mark high-effect before any replay is performed. BranchPoint-Latent is a lightweight predictor over observable, structural, uncertainty, and latent features of the knowledge graph. Calibrated against a deterministic replay oracle across 37 trace families, a single learning-to-rank gradient-boosted predictor raises per-trace localization (Branch Recall@5) from 0.73 to 0.93 on held-out families at zero oracle-replay cost. Rather than claiming universal dominance, we characterize when cheap graph centrality suffices and when learned evidence is necessary. The result is an auditable, cost-efficient decision-support system for AI-reliability debugging, positioned explicitly on the cost-accuracy frontier with reproducible artifacts.

09.
arXiv (CS.CL) 2026-06-17

Guidelines for the Annotation and Visualization of Legal Argumentation Structures in Chinese Judicial Decisions

This Guideline presents a systematic and operationalizable annotation framework for representing legal argumentation structures in judicial decisions. Grounded in theories of legal reasoning and argumentation, the framework aims to reveal the logical organization of judicial reasoning and provide a reliable foundation for computational analysis. At the element level, the Guideline distinguishes between the non-propositional layer and the propositional layer. The non-propositional layer consists of two elements: Issue and Non-argumentative Component. At the propositional level, the Guideline defines four proposition types: General Normative Judgment, Particular Normative Judgment, General Factual Judgment, and Particular Factual Judgment. At the relational level, five relation types are defined to represent argumentative structures: Support, Attack, Joint, Match, and Identity. These relations capture positive and negative argumentative connections, conjunctive reasoning structures, correspondences between legal norms and case facts, and identity or semantic equivalence between propositions. The Guideline further specifies formal representation rules and visualization conventions for both basic and nested structures, enabling consistent visualization of complex argumentation patterns. In addition, it establishes a standardized annotation workflow and consistency control mechanisms to ensure the reproducibility and reliability of annotated data. By providing a clear conceptual model, formal representation rules, and practical annotation procedures, this Guideline supports large-scale analysis of judicial reasoning and future research in legal argument mining, computational modeling of legal reasoning, and AI-assisted legal analysis.

10.
medRxiv (Medicine) 2026-06-15

An epidemiological scenario for Mass Events During the World Cup

This brief work discusses potential superspreading events that may occur during the World Cup in Mexico. The study is particularly focused on the city of Guadalajara due to a large recent outbreak in January and February and insufficient vaccine coverage prior to 2026. Keywords: Superspreading; measles outbreak; branching process; individual reproduction number; World Cup

11.
arXiv (CS.CL) 2026-06-11

Beyond representational alignment with brain-guided language models for robust reasoning

The correspondence between large language models (LLMs) and the neural mechanisms underlying human higher-order cognition remains insufficiently characterized. Given that language and reasoning in the human brain appear dissociable, an open question is whether LLMs align with neural signals from reasoning-related regions and whether such signals can improve them. Here, focusing on deductive reasoning, we show that LLM internal representations are not only partially aligned with task-fMRI activity but can also be directly enhanced by these signals. Using a neural-predictivity metric, we find that LLMs explain a substantial fraction of the explainable variance in reasoning-related regions at the aggregate level, whereas predictivity within specific reasoning types is lower, indicating both alignment and divergence. Building on this, we propose a brain-guided framework: we steer model representations along directions induced by the joint structure of model and brain representations, applying intervention at inference and fine-tuning during training. We demonstrate that task-evoked brain signals can directly enhance LLM reasoning, yielding gains orthogonal to language-only supervision across 10 LLMs (1.5B-72B), with transfer across reasoning types and up to 13\% absolute accuracy gain. Our results advance LLM-brain correspondences from correlation to guidance, establishing a brain-signal-driven pathway toward more robust and cognitively aligned AI.

12.
arXiv (CS.CV) 2026-06-17

Principled RL for Flow Matching Emerges from the Chunk-level Policy Optimization

Recent Progress in post-training flow matching for text-to-image (T2I) generation with Group Relative Policy Optimization (GRPO) has demonstrated strong potential. However, it is hindered by a critical limitation: inaccurate advantage attribution. In this work, we argue that aggregating consecutive steps into a coherent 'chunk' and shifting the policy optimization paradigm from GRPO's step level to the chunk level can effectively mitigate the negative impact of this issue. Building on this insight, we propose Group Chunking Policy Optimization (GCPO), the first chunk-level reinforcement learning approach for post-training flow matching. Extensive experiments demonstrate that GCPO achieves superior performance on both standard T2I benchmarks and preference alignment, with up to 43% relative gains over GRPO, highlighting the promise of chunk-level policy optimization. The code is available on https://github.com/xingzhejun/GCPO.

13.
arXiv (quant-ph) 2026-06-16

Sub-Poissonian Statistics and Quantum Non-Gaussianity from High-Harmonic Generation

arXiv:2602.10882v4 Announce Type: replace Abstract: Quantum technologies are powered by platforms to generate complex non-classical states of matter or light to realize applications. We investigate the non-classical properties of high-harmonic generation in semiconductors, an emerging photonic platform. Measuring the click statistics of three double-digit orders, we evaluate witness operators to certify the non-classicality of the generated states. We show that higher-order harmonics driven by a coherent laser are squeezed and entangled. The properties of the emission are well retrieved with an entangled Gaussian state model, obtained by numerical state optimization to multiple observables. Additionally, we perform inter-order heralded measurements to engineer the quantum state of the emission. The heralded states have distinct properties, showing sub-Poissonian photon statistics. Further, we witness the generation of a quantum non-Gaussian state, a resource highly relevant for quantum information. With this, we establish high-harmonic generation as a platform for generating quantum optical resources.

14.
Nature (Science) 2026-06-22

Why heritage sites are at risk in a warming world — and how to save them

As rising seas and intensifying disasters threaten historic sites worldwide, new ways to understand, preserve and adapt these places are needed urgently. As rising seas and intensifying disasters threaten historic sites worldwide, new ways to understand, preserve and adapt these places are needed urgently.

15.
medRxiv (Medicine) 2026-06-16

Risk beliefs, intensive digital information and demand for a new preventative health product in public clinics: Evidence from an experiment in Zimbabwe.

Demand for preventative health care is weak in low-income settings. In a field experiment in a low-income, high-risk setting, we evaluated whether demand for a new bio-medical preventative health product, offered free at public health clinics, responds to digital feedback-based intensive information on health risks and benefits of prevention along with a clinic referral enabling access to the product. In our sample of women aged 18-24 years, we find a large correction in risk beliefs sustained six months after the intervention. Against a background of very low baseline usage, within six months we find a 5.8 percentage point increase in take up of the prevention method, a level of uptake which is very large relative to the control group. Reassuringly, there is no meaningful difference in up-take amongst baseline high- risk and low-risk individuals.

16.
arXiv (CS.CV) 2026-06-16

Efficient Flow Matching using Latent Variables

Flow matching models have shown great potential in image generation tasks among probabilistic generative models. However, most flow matching models in the literature do not explicitly utilize the underlying clustering structure in the target data when learning the flow from a simple source distribution like the standard Gaussian. This leads to inefficient learning, especially for many high-dimensional real-world datasets, which often reside in a low-dimensional manifold. To this end, we present $\texttt{Latent-CFM}$, which provides efficient training strategies by conditioning on the features extracted from data using pretrained deep latent variable models. Through experiments on synthetic data from multi-modal distributions and widely used image benchmark datasets, we show that $\texttt{Latent-CFM}$ exhibits improved generation quality with significantly less training and computation than state-of-the-art flow matching models by adopting pretrained lightweight latent variable models. Beyond natural images, we consider generative modeling of spatial fields stemming from physical processes. Using a 2d Darcy flow dataset, we demonstrate that our approach generates more physically accurate samples than competing approaches. In addition, through latent space analysis, we demonstrate that our approach can be used for conditional image generation conditioned on latent features, which adds interpretability to the generation process.

17.
arXiv (CS.LG) 2026-06-17

Diagnosing and Repairing Shape-Prior Shortcuts in Long-Range Single-Shot Fringe Projection Profilometry

arXiv:2606.17093v1 Announce Type: new Abstract: Learning-based single-shot fringe projection profilometry (FPP) has been studied mostly at close range. The long-range regime (standoff beyond 1 m) remains largely unaddressed: inverse-square intensity falloff lowers fringe signal-to-noise ratio and degrades physical ground truth, the single-shot problem is ill-posed because fringe-order information is absent from one image, and these architectures have not been studied mechanistically. We present a diagnose-repair-verify study using mechanistic interpretability (MI) and conformal uncertainty quantification (UQ) as convergent diagnostics: they agree on one physical failure locus, driving and verifying an architectural repair. On a photorealistic synthetic benchmark (15,600 fringe images, 50 objects at 1.5-2.1 m), a best UNet baseline reaches 14.54 mm object mean absolute error (MAE). Three probes (linear probing, Grad-CAM, flat-plane out-of-distribution test) converge: the baseline solves the task via object-boundary shape priors rather than fringe-phase decoding. We repair this with PhiCalNet, which outputs wrapped phase rather than depth and applies a fixed differentiable calibration layer mapping phase to depth, removing the shape-prior solution from the hypothesis space architecturally rather than by a loss penalty. A physics-informed loss that enforces the same physics as a soft penalty on a depth-regressing network yields no measurable gain, isolating the architecture as the operative factor. PhiCalNet reduces object MAE 3.3x to 4.46 mm; the residual is carried by 0.103% of pixels at the +/-pi wrap discontinuity. Pixel-wise conformal UQ confirms the diagnosis: rejecting the top 5% of object pixels by snapshot disagreement cuts PhiCalNet RMSE by 64% (20.6->7.4 mm) versus 3.5% for the baseline. MI and UQ converge on the same failure locus.

18.
arXiv (CS.CL) 2026-06-12

LAUKIN: A Multi-jurisdictional Common Law Contract Dataset

Multinational companies increasingly require cross-jurisdictional contract review, yet existing legal NLP datasets are largely restricted to a single jurisdiction. We introduce LAUKIN (Legal equivalence dataset of Australia, UK, and INdia), a dataset of clause pairs (AU-UK, UK-IN, IN-AU) labelled for boolean legal equivalence. We develop a novel multi-stage retrieval and reranking pipeline to construct the initial clause pair mapping, with a subset of clause pairs subsequently annotated by legal experts as Equivalent or Not Equivalent. The dataset comprises 14,727 clause pairs from 204 contracts across 8 agreement types, of which 3,000 are manually labelled: 900 train, 600 dev, and 1,500 test. We evaluate 12 models across 4 techniques, achieving a best macro-F1 of 65.11%, establishing LAUKIN as a challenging benchmark. Results reveal that, despite shared legal heritage, drafting conventions diverge significantly across jurisdictions, making cross-jurisdictional equivalence classification non-trivial. LAUKIN also includes 11,727 unlabelled training pairs to support future semi-supervised learning research in legal NLP.

19.
arXiv (CS.LG) 2026-06-17

Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?

arXiv:2606.18209v1 Announce Type: new Abstract: Dataset distillation (DD) has emerged as a prominent approach in data centric machine learning, aiming to synthesize compact training sets for efficient training by compressing the information in large datasets into a small number of synthetic samples. However, DD methods are often evaluated under inconsistent evaluation protocols, ranging from standard ERM to single/multi-teacher supervision, making it difficult to isolate the effectiveness of distilled data from evaluation. Moreover, many prior methods claim that DD outperforms data pruning approaches such as coreset selection (CS), based on the assumption that restricting condensed datasets to subsets of real samples fundamentally limits their expressiveness. In this work, we critically evaluate DD methods through large-scale experiments using standardized datasets and evaluation protocols to assess their intrinsic effectiveness. We benchmark seven state-of-the-art (SOTA) DD methods on ImageNet-1K, ImageNet100, and ImageNette, using three widely adopted training protocols against three CS strategies. Our results show that while some DD methods fail to outperform even simple random subsets, the SOTA DD approaches are comparable to or worse than coresets on large-scale datasets and incur a substantially higher cost for construction. Beyond accuracy, we also evaluate the representativeness, diversity, and quality of condensed sets, and find that coresets consistently achieve better coverage of the original data distribution. These findings highlight the limited practical advantages of current DD methods and show that coresets remain competitive and are often a more computationally efficient alternative for data-centric learning.

20.
arXiv (CS.CL) 2026-06-19

Large Language Models Hack Rewards, and Society

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.=

21.
arXiv (CS.CV) 2026-06-19

SSD: Spatially Speculative Decoding Accelerates Autoregressive Image Generation

Autoregressive models excel in visual generation by treating images as 1D sequences of discrete tokens, mirroring language modeling. However, this flattening discards the intrinsic 2D spatial locality of visual signals, creating severe computational bottlenecks during inference. We introduce Spatially Speculative Decoding (SSD), a framework that aligns the predictive objective with the natural geometry of images. Rather than predicting only the immediate next token in a 1D sequence, our model simultaneously predicts the adjacent horizontal token and the token directly below it. By capitalizing on this 2D spatial correlation, spatially speculative decoding overcomes the memory wall in visual inference. Our approach accelerates autoregressive image generation by up to 13.3x while maintaining high fidelity on DPG-Bench and GenEval. Our results suggest that respecting the underlying geometry of vision unlocks massive computational efficiencies, paving the way for real-time, high-resolution autoregressive generative models.

22.
arXiv (CS.AI) 2026-06-15

When Should Agent Trust Be Conditional? Characterizing and Attacking Skill-Conditional Reputation in Agent Swarms

arXiv:2606.14200v1 Announce Type: new Abstract: Open platforms increasingly route tasks among heterogeneous LLM agents–differing in base model, scaffold, and tool stack–whose competence varies sharply by skill: an agent excellent at one skill may be useless at another. The standard reputation approach summarizes each agent by a single global trust score, but that scalar is the wrong object here, because routing every task to the globally most-trusted agent leaves the value of specialization unclaimed. We study skill-conditional trust R(i | k)–the trust to place in agent i for a task requiring skill k, rather than one score per agent–and pose three falsifiable questions: when is conditioning worth it, how much cross-skill evidence should be borrowed, and whether that borrowing is safe. A controlled phase-diagram analysis answers the first two: conditional trust wins only in a specific regime–high agent heterogeneity, sparse per-skill evidence, and correlated skills–and the coupling strength beta that buys this data efficiency is dual-use, because the same cross-skill borrowing is also a laundering channel. On a public benchmark of 14 genuinely heterogeneous AppWorld agents, real pools land inside the beneficial regime–a small but genuine gain, with the per-skill best agent genuinely changing across skills. We then show that an attacker with cheap evidence in one skill and none in a target skill hijacks the conditional router, driving routing regret from 0 to 0.94 on a pool our zero-cost Conditional Information Value Test (CIVT) rates GREEN–while the ungated trust verdict it contaminates reads -0.06 instead of the honest +0.19. A zero-evidence gate bounds the attack but does not eliminate it; we characterize the residual cost under an explicit budget. We do not claim Sybil-resistance–we quantify the trade-off.

23.
arXiv (CS.AI) 2026-06-12

Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture

arXiv:2606.09500v3 Announce Type: replace Abstract: As autonomous research agents and AI co-scientist systems push large language models (LLMs) from drafting toward end-to-end manuscript production, the bottleneck shifts from generation to verification. Fluent LLM output can hide fabricated citations, numbers that drift from source tables, and unmet reporting-guideline items; existing tools generate without verifying, and self-critique inherits the blind spots that produce confident fabrication. We describe an architecture pairing generation with verification, resting on three principles: decompose the workflow into self-contained skills, gate every stage transition with halt-on-failure, and resolve each integrity question with the cheapest sufficient mechanism, a deterministic, re-executable check where one suffices and a prose-level probe only where interpretation is unavoidable. This determinism-where-possible split, organized as an integrity-gate taxonomy, is the core contribution. It is realized as MedSci Skills, an open-source toolkit of 43 skills with a 21-detector deterministic tier, evaluated on three public-dataset pipelines (STARD, PRISMA, STROBE) and a seeded-defect ablation. Across the three pipelines every content-hash manifest verified clean and the gates surfaced real defects; on 27 identical injected defects the deterministic gates detected all 27 with no false positives on the matched clean fixtures, whereas a single-prompt LLM reviewer detected 11, its misses in code, bibliography, and style defects the prose hides. Determinism-where-possible verification yields an auditable, re-executable trail that exposes the evidence a human needs to check an LLM-assisted manuscript: feasibility and reproducibility evidence, not a claim of human-competitive quality, which a separate blinded study addresses. MedSci Skills is MIT-licensed and archived (v3.8.0).

24.
arXiv (CS.LG) 2026-06-11

Spatially Masked Regression Reveals Local and Distributed Predictability in Electrophysiological Recordings

arXiv:2606.11415v1 Announce Type: cross Abstract: Neural recordings are often interpreted as local measurements, yet the signal at any one sensor can also reflect structured activity distributed across the broader network. This raises a basic question: to what extent does an electrode's signal reflect local versus distributed information in the underlying system? More specifically, how much of an electrode's activity is carried by its immediate neighborhood, and how much is embedded more broadly across the array? We address this with a Spatially Masked Regression (SMR) framework that reconstructs each electrode's timeseries from the remaining electrodes while excluding a configurable neighborhood around the target. By progressively increasing this mask, spatial locality becomes an experimental control for quantifying how much predictive information survives after nearby channels are withheld. We apply SMR to intracranial EEG with heterogeneous electrode coverage and to scalp EEG with standardized montages over sensorimotor cortex. Using distance correlation between original and reconstructed signals, we find strong within-subject reconstruction in both modalities, substantial residual predictability even when local neighbors are excluded, and markedly stronger cross-subject transfer in EEG than in iEEG. Masking shows that nearby electrodes contribute strongly to reconstruction but do not account for all of it, indicating that individual channels reflect both local redundancy and broader distributed structure. Surrogates that preserve selected marginal or spectral properties while disrupting phase structure or temporal ordering substantially reduce performance, supporting the conclusion that SMR depends on structured temporal and cross-channel organization rather than on marginal statistics alone. These results position SMR as an interpretable framework for quantifying the balance between local and distributed information in recordings.

25.
arXiv (CS.CV) 2026-06-19

Mix-QVLA: Task-Evidence-Aware Mixed-Precision Quantization of Vision-Language-Action Models

We propose Mix-QVLA, a task-evidence-aware mixed-precision PTQ framework for VLA models. Mix-QVLA anchors each quantized variant to the full-precision action-token reference decision and evaluates whether quantization preserves task-relevant evidence across key VLA functional boundaries. It computes normalized gradient-weighted task-evidence maps from boundary activations and compares full-precision and quantized maps using evidence-mass and attribution-distribution distortion, capturing changes in both the strength and allocation of decision-supporting evidence. A soft-bottleneck objective aggregates boundary-level degradation into layer-wise sensitivity scores. Mix-QVLA further models sensitivity throughout task execution, capturing phase-dependent shifts in layer importance rather than assuming a fixed sensitivity profile. The resulting evidence- and time-aware scores guide mixed-precision bit allocation under model-size and BitOps budgets. Extensive evaluations on OpenVLA-style policies show that Mix-QVLA improves the accuracy-efficiency trade-off of low-bit VLA deployment. On LIBERO, Mix-QVLA reduces OpenVLA-OFT memory from 15.4 GB to 4.1 GB, retains 96.3 average success compared with 97.1 for the BF16 model, and achieves a 1.52x inference speedup.