Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-12

Masked Neural Detection for Constrained Channel Coding in Molecular Communication

arXiv:2606.12489v1 Announce Type: cross Abstract: Molecular communication (MC) suffers from severe diffusion memory because molecules released for one symbol may arrive during later symbols. Neural sequence detectors, especially sliding bidirectional recurrent neural networks (SBRNNs), can substantially outperform threshold detectors in such channels. This raises a central question for MC channel coding: does a code whose advantage was established under threshold detection retain it when both coded and uncoded transmission are evaluated with neural detection? This letter answers this question for run-length-limited ISI-mitigation (RLIM) codes, a class of constrained codes previously shown to provide large BER gains in MC. Across the tested operating points, the best RLIM-SBRNN receiver beats the best uncoded receiver, chosen between threshold and SBRNN detection, in $46$ of $59$ cases, with a mean gain of $10.36\times$ over those wins. We also propose an RLIM-tailored training mask for compact SBRNN detectors, improving the unmasked RLIM-SBRNN in $227$ of $236$ comparisons with $3.267\times$ mean gain when masking is beneficial. Finally, the compact masked RLIM-SBRNN is competitive with channel-state-aware MLSE despite using no channel knowledge.

02.
arXiv (math.PR) 2026-06-18

On a class of unbalanced step-reinforced random walks

arXiv:2504.14767v4 Announce Type: replace Abstract: A step-reinforced random walk is a discrete-time stochastic process with long-range dependence. At each step, with a fixed probability $\alpha$, the so-called positively step-reinforced random walk repeats one of its previous steps, chosen randomly and uniformly from its entire history. Alternatively, with probability $1-\alpha$, it makes an independent move. For the so-called negatively step-reinforced random walk, the process is similar, but any repeated step is taken with its direction reversed. These random walks have been introduced respectively by Simon (1955) and Bertoin (2024) and are sometimes refered to the self-confident step-reinforced random walk and the counterbalanced step-reinforced random walk respectively. In this work, we introduce a new class of unbalanced step-reinforced random walks for which we prove the strong law of large numbers and the central limit theorem. In particular, our work provides a unified treatment of the elephant random walk introduced by Schutz and Trimper (2004) and the positively and negatively step-reinforced random walks.

03.
arXiv (CS.LG) 2026-06-16

On the Energy Distribution of the Galactic Center Excess' Sources

arXiv:2507.17804v2 Announce Type: replace-cross Abstract: The Galactic Center Excess (GCE) may yet herald the discovery of annihilating dark matter. Weighing against that conclusion are analyses showing evidence for dim point sources within the spatial structure of the emission. Due to technical limitations these analyses are purely spatial with all spectral information that could disentangle the excess from astrophysical backgrounds discarded. Here, we demonstrate that a neural network simulation-based inference approach can jointly analyze the spatial and spectra data. The addition is profound: energy information drives the putative point sources to be significantly dimmer, indicating either the GCE is truly diffuse in nature or made of an exceptionally large number of sources. Quantitatively, for our best fit background model, the excess is essentially consistent with Poisson emission as predicted by dark matter. If due to point sources, our median prediction is $\mathcal{O}(10^5)$ sources, or more than 35,000 at 90\% confidence, both orders of magnitude larger than the hundreds preferred by earlier point-source analyses of the GCE, although variations allowed by background systematics could reduce the required number of sources by roughly an order of magnitude.

04.
arXiv (CS.CV) 2026-06-16

CogCanvas: A Benchmark for Evaluating Multi-Subject Reference-Based Image Generation

Multi-subject reference-based image generation requires jointly preserving multiple human identities, binding per-person objects and fashion items, and respecting a specified background scene, a regime where current diffusion models remain brittle. Existing benchmarks evaluate only one axis at a time and none jointly captures multi-identity composition with human-object interaction, background grounding, and spatial plausibility. We introduce CogCanvas, a benchmark of 1,952 curated reference images spanning 100 celebrity identities, 115 distinctive objects and fashion items, and 29 real-world background scenes including landmarks, from which we construct 1,361 compositional prompts covering 2-5 person group sizes. The curation pipeline combines DINOv2-based deduplication, two-stage aesthetic filtering, and automated derivation of structured interaction and position graphs that serve as ground-truth supervision. CogCanvas supports three tasks, reference-based multi-human-object generation (primary), text-to-image compositional generation, and reference retrieval, under a unified six-axis evaluation protocol. We introduce two metrics tailored to the multi-reference setting: BG-Sim, which scores background fidelity on SAM 3-masked regions via DINOv3 feature similarity, and Attr-VQA, which uses a multimodal LLM to verify per-subject attribute binding and inter-person interactions against the structured graphs. Benchmarking five SOTA methods reveals that every model degrades substantially as group size grows from 2 to 5, with near-complete failure on object/fashion binding beyond three subjects.

05.
arXiv (CS.LG) 2026-06-15

Closed-loop discovery of out-of-distribution processing protocols by evolutionary search and uncertainty-aware learning

arXiv:2606.13859v1 Announce Type: cross Abstract: Many materials and chemical systems exhibit history-dependent responses, where functional outcomes are governed not only by final-state variables but by the time-dependent sequence of fields, temperatures, or chemical potentials applied during operation. Discovering new processing protocols is therefore a high-dimensional search problem in which the control variable is an entire waveform or sample history, and conventional strategies either remain confined to conservative interpolative families or become prohibitively measurement intensive. Here, a closed-loop workflow is introduced that couples evolutionary search over a compact waveform representation with uncertainty-aware deep kernel learning to generate, rank, and experimentally validate candidate protocols. Applied to ferroelectric thin films, with the scanning-probe tip-bias waveform as the protocol and the nonlinear electromechanical response as the reward, the workflow discovers waveform families that enhance nonlinearity by de-aging the film. Spatially resolved before/after measurements show that the best-performing waveforms selectively activate pre-existing, weakly pinned domain-wall segments, whereas the worst drive long-range irreversible switching. This framework reframes protocol tuning as out-of-distribution discovery, generalizable to synthesis and annealing trajectories, battery formation protocols, and other high-dimensional control problems.

06.
arXiv (CS.AI) 2026-06-12

Rethinking RAG in Long Videos: What to Retrieve and How to Use It?

arXiv:2606.13141v1 Announce Type: new Abstract: Retrieval-augmented generation is moving beyond text into long, egocentric video, where systems must select query-relevant chunks across multiple modalities and temporal granularities. Yet progress in VideoRAG is limited by two gaps: existing benchmarks allow queries to be answered without the video, obscuring retrieval errors, and prior methods apply a single modality-granularity configuration per query, ignoring chunk-level variability. We address both by introducing V-RAGBench, a benchmark of $\langle$query, evidence chunk, answer$\rangle$ triplets that enables faithful, decoupled evaluation of retrieval and generation, and CARVE, a simple method that runs parallel retrievers across configurations and employs chunk-adaptive reranking to identify the winning configuration for each chunk. Each chunk then enters the generator under its winning configuration selected during retrieval, yielding an interleaved evidence form where the chunk-level decision propagates across both stages. CARVE outperforms eight recent VideoRAG baselines, with the chunks supplied to the generator interleaving multiple configurations rather than sharing a single one, a behavior unattainable by query-level methods.

07.
arXiv (CS.AI) 2026-06-18

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

arXiv:2606.18790v1 Announce Type: cross Abstract: Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.

08.
arXiv (CS.LG) 2026-06-11

Weighted Random Dot Product Graphs

arXiv:2505.03649v4 Announce Type: replace-cross Abstract: Modeling of intricate relational patterns has become a cornerstone of contemporary statistical research and related data science fields. Networks, represented as graphs, offer a natural framework for this analysis. This paper extends the Random Dot Product Graph (RDPG) model to accommodate weighted graphs, markedly broadening the model's scope to scenarios where edges exhibit heterogeneous weight distributions. We propose a nonparametric weighted (W)RDPG model that assigns a sequence of latent positions to each node. Inner products of these nodal vectors specify the moments of their incident edge weights' distribution via moment-generating functions. In this way, and unlike prior art, the WRDPG can discriminate between weight distributions that share the same mean but differ in other higher-order moments. We derive statistical guarantees for an estimator of the nodal's latent positions adapted from the workhorse adjacency spectral embedding, establishing its consistency and asymptotic normality. We also contribute a generative framework that enables sampling of graphs that adhere to a (prescribed or data-fitted) WRDPG, facilitating, e.g., the analysis and testing of observed graph metrics using judicious reference distributions. The paper is organized to formalize the model's definition, the estimation (or nodal embedding) process and its guarantees, as well as the methodologies for generating weighted graphs, all complemented by illustrative and reproducible examples showcasing the WRDPG's effectiveness in various network analytic applications.

09.
arXiv (CS.AI) 2026-06-17

ZIVARI-TLBO: A Zero-Cost Inter-Group Evaluated-Elite Relay Mechanism for Teaching-Learning-Based Optimization

arXiv:2606.17087v1 Announce Type: cross Abstract: ZIVARI-TLBO is a grouped Teaching-Learning-Based Optimization (TLBO) method that augments an existing population-state controller with a fixed inter-group evaluated-elite relay. At each scheduled event, every group offers its already evaluated elite to the next group in a fixed ring; the elite replaces the receiver's worst eligible learner only when its stored objective value is better. Because the exact relay copies an already evaluated solution and its stored fitness, it requires no additional objective-function calls. The frozen gts-v4-cm-fixed implementation is evaluated under equal 10,000-evaluation budgets on eight classical functions at dimensions 10, 30, 50, and 100, with 30 matched seeds, and on five constrained engineering problems. A direct ablation against the same grouped landscape-aware controller without relay records 728/11/221 wins/ties/losses and a rank-biserial effect size of 0.624 across dimensions. In an eight-method multidimensional comparison, WOA obtains the best average rank (2.914) and ZIVARI-TLBO ranks second (3.382); ZIVARI-TLBO significantly outperforms TLBO, MCTLBO, DE, PSO, and GWO, loses significantly to WOA, and is not significantly different from HHO after Holm adjustment. Feasibility-aware engineering results are mixed and sensitive to the current static-penalty formulation. The evidence supports a scoped relay contribution and budget-consistent information-sharing mechanism, but not universal state-of-the-art, global-convergence, engineering-dominance, or CEC superiority claims.

10.
arXiv (CS.AI) 2026-06-17

SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents

arXiv:2602.14211v3 Announce Type: replace-cross Abstract: Agent skills extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources, improving reusability but creating a new supply-chain attack surface. A malicious or compromised skill can be repeatedly loaded as trusted guidance and steer downstream tool use. Existing skill-based prompt-injection attacks are often manual and brittle, because explicit malicious instructions are rejected or ignored when they are not aligned with the original workflow. We propose SkillJect, the first automated framework for generating poisoned skills against skill-enabled agent systems. SkillJect uses two coordinated channels. In the artifact channel, it hides the payload inside an auxiliary helper script. In the instruction channel, it rewrites SKILL.md with a front-loaded inducement strategy, placing injected content at the beginning and framing the helper script as a mandatory prerequisite or initialization step. The rewritten instruction explicitly references the helper-script path and provides an executable example command, making the helper appear to be a legitimate setup step before normal skill operations. SkillJect further adopts a closed-loop multi-agent process to improve attack effectiveness. An Attack Agent generates poisoned skills, a Victim Agent executes downstream tasks with the poisoned skill, and an Evaluate Agent inspects execution traces to determine whether the hidden payload was executed. The Attack Agent then uses this feedback to diagnose failure causes and rewrite SKILL.md, while keeping the payload fixed. Experiments across skill-enabled platforms, backend LLMs, and attack categories show that SkillJect substantially outperforms naive direct injection and prior manual skill-injection attacks, highlighting poisoned skills as a persistent threat in reusable skill ecosystems.

11.
medRxiv (Medicine) 2026-06-16

Wildfire pollution exposure during childhood adversely affects cognitive and neural development

Authors:

Air pollution has well-documented negative cardiovascular and respiratory consequences. However, the impact of particulate matter pollution (PM2.5) on brain development is unclear. Animal studies suggest that exposure to early-life PM2.5 can cause adverse neurodevelopmental outcomes, but in vivo human work has been hampered by cross-sectional designs and heavily confounded PM2.5 exposure measures. Here we use an innovative natural experimental design to isolate the effects of wildfire pollution on neurocognitive development in a large cohort of children (N>9000, 4 waves, age 9-16). Doing so, we find that greater wildfire PM2.5 exposure is robustly associated with slower brain development and shallower cognitive improvement across early adolescence. Our study underscores the urgent public health concern that wildfire PM2.5 poses for childhood development.

12.
bioRxiv (Bioinfo) 2026-06-17

Correcting spatial transcriptomics data affected by a prevalent transcript leakage problem across platforms, species, and tissues

Spatial transcriptomics has been widely applied to study the spatial distribution of cell types, cell states, and specific gene expression in tissue samples. However, we show that there is a prevalent transcript leakage problem in spatial transcriptomics data, where transcripts expressed by a cell diffuse to its neighborhood and are recurrently detected in the nearby cells. By analyzing published data sets, we show that this problem is general across data produced from different tissues and different species using different imaging-based and sequencing-based spatial transcriptomics platforms. It affects both upstream tasks such as expression quantification as well as downstream tasks such as cell-type annotation and detection of spatially-dependent gene expression. To tackle the transcript leakage problem, we propose a reference-free Bayesian model-based method, DeLeakage, which cleans up the data much more effectively than existing denoising methods. DeLeakage also improves cell-type annotation and avoids false detection of spatially dependent expression.

13.
arXiv (quant-ph) 2026-06-15

Dynamically frozen long-distance entanglement via non-Hermitian PT-symmetric systems

arXiv:2606.14177v1 Announce Type: new Abstract: In distributed quantum networks, interacting spin systems can mediate the generation of highly entangled links between distant nodes. We investigate the role of effective parity-time (PT)-symmetric non-Hermitian spin-1/2 bulks weakly coupled to two quantum links, obtained due to the environmental interactions affecting both the bulk and the links. Focusing on effective non-Hermitian nearest-neighbor (NN) Su-Schrieffer-Heeger (SSH) models, we analyze how non-Hermiticity influences the dynamical formation of long-distance entanglement (LDE). For a paradigmatic model consisting of a quantum XX bulk subjected to imaginary staggered magnetic fields, we analytically determine the exceptional points arising from the resulting bulk-mediated interactions between the links. Combining analytical and numerical methods, we demonstrate that an initially fully separable state can dynamically evolve into highly entangled link states near these exceptional points in the broken regime. Further, after optimizing over time and system parameters, near-unit time-averaged entanglement between the links emerges under weak imaginary magnetic fields and bulk-link couplings, which cannot be attained in the corresponding Hermitian systems. Moreover, the non-Hermitian dynamics exhibit a freezing of high entanglement in the vicinity of exceptional points, a feature absent in Hermitian counterparts. We also identify regimes of long-range interaction strengths that yield a higher time-averaged entanglement than the corresponding NN models. Furthermore, we establish that LDE persists in the stationary regime, highlighting the promise of engineered non-Hermitian dynamics for realizing robust and frozen entangled links in quantum networks.

14.
arXiv (CS.CL) 2026-06-19

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

Authors:

Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials partition function, free energy, spectral entropy, heat capacity together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i)~Lipschitz stability of Fes under attention perturbation; (ii)~an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii)~a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by $+6.5$ AUROC points and over GoR-4 by $+2.4$ points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC $0.71$, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.

15.
arXiv (CS.LG) 2026-06-12

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

arXiv:2505.20076v4 Announce Type: replace Abstract: Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation, and are often tied to a particular level of granularity along the local-to-global spectrum. This leads to explanations that lack a unified view and may miss key interactions. We present ExPLAIND, a theoretically grounded, unified framework that integrates model components, data, and training trajectory while supporting explanations across granularities. We generalize recent work on gradient path kernels, reformulating models trained by AdamW as kernel machines. From the resulting kernel feature maps, we derive novel parameter-wise and step-wise influence scores. We empirically validate the resulting decomposition of model behavior in several settings and apply ExPLAIND to two case studies. Our findings on a Transformer exhibiting Grokking support previously proposed learning phases, while refining the final phase as one in which outer layers align around a representation pipeline learned after memorization. For EuroLLM pretraining, ExPLAIND reveals a two-phase dynamic, with the first characterized by outer-layer MLP learning and the second by increased relative influence of intermediate attention layers. These results establish ExPLAIND as a unified framework for interpreting model behavior and training dynamics.

16.
arXiv (CS.AI) 2026-06-19

Bid Farewell to Seesaw: Towards Accurate Long-tail Session-based Recommendation via Dual Constraints of Hybrid Intents

arXiv:2511.08378v4 Announce Type: replace-cross Abstract: Session-based recommendation (SBR) aims to predict anonymous users' next interaction based on their interaction sessions. In the practical recommendation scenario, low-exposure items constitute the majority of interactions, creating a long-tail distribution that severely compromises recommendation diversity. Existing approaches attempt to address this issue by promoting tail items but incur accuracy degradation, exhibiting a "see-saw" effect between long-tail and accuracy performance. We attribute such conflict to session-irrelevant noise within the tail items, which existing long-tail approaches fail to identify and constrain effectively. To resolve this fundamental conflict, we propose HID (Hybrid Intent-based Dual Constraint Framework), a plug-and-play framework that transforms the conventional "see-saw" into "win-win" through introducing the hybrid intent-based dual constraints for both long-tail and accuracy. Two key innovations are incorporated in this framework: (i) Hybrid Intent Learning, where we reformulate the intent extraction strategies by employing attribute-aware spectral clustering to reconstruct the item-to-intent mapping. Furthermore, discrimination of session-irrelevant noise is achieved through the assignment of the target and noise intents to each session. (ii) Intent Constraint Loss, which incorporates two novel constraint paradigms regarding the diversity and accuracy to regulate the representation learning process of both items and sessions. These two objectives are unified into a single training loss through rigorous theoretical derivation. Extensive experiments across multiple SBR models and datasets demonstrate that HID can enhance both long-tail performance and recommendation accuracy, establishing new state-of-the-art performance in long-tail recommender systems.

17.
arXiv (CS.LG) 2026-06-15

A Unified Framework for Structured Flow Modeling: From Representation to Verification and Model Discovery

Authors:

arXiv:2605.18250v3 Announce Type: replace-cross Abstract: Many dynamical systems can be described in terms of structured flows combining source/sink behavior, cyclic dynamics, and topology-constrained transport. These features arise across a wide range of physical, engineered, and data-driven systems. The objective of this work is to establish a unified perspective on such systems, to identify modeling approaches that balance expressivity, interpretability, computational complexity, and data requirements, and to investigate how highly expressive models can be used to uncover the dominant mechanisms underlying observed dynamics. Starting from the Helmholtz-Hodge decomposition of continuous vector fields, we review the recently proposed Graph Vector Field (GVF) framework and its discrete representation on simplicial complexes. We then introduce a hierarchy of alternative approaches, including parametric conditional models, linear graph dynamical systems, and reduced Hodge representations. Finally, we propose a verification and validation methodology based on benchmark datasets from well-understood physical systems and on systematic model-reduction and ablation studies. The resulting family of structured-flow models within a common framework, ranging from low-dimensional parametric representations to full GVF formulations, supports a diagnostic methodology in which gradient, curl, harmonic, and topological contributions are systematically assessed through ablation studies. This process enables the identification of dominant mechanisms underlying the observed dynamics and guides the construction of simplified models tailored to the available data and operational constraints. By separating structural verification, behavioral verification, and domain-specific validation, the proposed approach provides a foundation for scalable and interpretable analysis of complex dynamical systems across multiple application domains.

18.
arXiv (CS.AI) 2026-06-12

Multiagent Protocols with Aggregated Confidence Signals

arXiv:2606.13591v1 Announce Type: new Abstract: Confidence is used for reliability, oversight, and a range of downstream decision tasks in Natural Language Processing (NLP), yet no existing method produces or evaluates a confidence for the output of a multiagent system. Prior work uses confidence within multiagent debate (MAD) to weight messages, trigger debate, or calibrate individual agents, but it never aggregates these into a single confidence for the system itself. We introduce three protocols that produce a final answer along with a single aggregated confidence by first transforming raw confidence signals to make them comparable across models, then combining them via soft voting or a probability fusion we call Bayesian fusion. This aggregated confidence is substantially more discriminative (AUARC) than that of the best single agent or the standard debate baselines, while correctness (F1-score) stays stable and recovers the losses MAD incurs on more ambiguous tasks. Analyzing two estimators, sequence probability and self-report, alongside parametric and non-parametric calibrators, we find that calibration improves F1 for both estimators while AUARC is less reliant on it. We evaluate six homogeneous and heterogeneous debating pairs per benchmark, across five benchmarks and four task types, spanning a range of model capabilities and sizes.

19.
arXiv (CS.CV) 2026-06-16

Trusting Right Predictions for Wrong Reasons: A LIME Based Analysis of Deep Learning Interpretability in Lung Cancer Diagnosis

Lung cancer is the leading cause of cancer-related mortality, with approximately 2.5 million new cases and 1.8 million deaths annually, making reliable diagnosis a clinical priority. Although deep learning models have achieved strong performance in lung cancer classification, evaluation has largely focused on predictive accuracy, leaving their decision-making processes insufficiently examined. This study compares three architecturally distinct models: a Convolutional Neural Network (CNN), a pretrained ResNet50, and a Vision Transformer (ViT), trained on the IQ-OTH/NCCD lung cancer CT dataset. Local Interpretable Model-Agnostic Explanations (LIME) were applied to investigate model reasoning. In addition to standard performance metrics, a dual-correlation framework was introduced to measure both prediction agreement and explanation agreement across model pairs. All three models achieved strong classification performance, with ResNet50 attaining 98.61% accuracy, CNN 97.91%, and ViT 93.75%, while all achieved ROC-AUC scores of 0.99. Prediction correlations exceeded 0.99 across all model pairs, indicating highly consistent outputs. However, LIME explanation correlations remained below 0.26, revealing substantial differences in the image regions used to reach those predictions. Analysis of misclassified samples further identified a consistent spatial pattern: incorrect predictions were associated with attention outside the lung parenchyma, whereas correct predictions focused primarily within lung regions. These findings demonstrate that prediction agreement is a poor proxy for reasoning consistency, and that interpretability evaluation must be treated as an independent validation criterion alongside predictive performance in clinical AI systems.

20.
arXiv (CS.AI) 2026-06-16

SpecAlign: Efficient Specification-Grounded Alignment of Large Language Models via Synthetic Data

arXiv:2606.16276v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed in real-world applications, alignment is no longer governed by a single universal notion of safety or helpfulness, but instead by provider- or application-specific model specifications. These specifications are typically long, structured, and frequently updated, yet existing alignment pipelines lack a systematic mechanism to operationalize them as training signals. In this paper, we propose specification-grounded alignment, a new alignment paradigm that treats provider-authored model specifications as the primary alignment target rather than abstract principles or static benchmarks. To instantiate this paradigm, we introduce SpecAlign, a framework that synthesizes alignment data directly from specification documents. SpecAlign combines structured rule annotation, controllable specification instantiation, and multi-agent adversarial data synthesis to generate fine-grained, boundary-aware preference pairs that capture both compliant behaviors and meaningful specification violations. Experiments across multiple model specifications and backbone models demonstrate that training with SpecAlign consistently improves rule compliance while preserving general capabilities and avoiding over-conservative behavior. These results suggest that grounding alignment in explicit model specifications enables rapid, precise, and scalable adaptation of LLM behavior to evolving policy requirements.

21.
arXiv (quant-ph) 2026-06-19

Generating function and Bloch representation for quantum Fisher tensor

arXiv:2511.05260v2 Announce Type: replace Abstract: The Uhlmann relative amplitude between two density matrices is shown to be a generating function, through which the quantum Fisher tensor that contains both the quantum Fisher information matrix and the mean Uhlmann curvature can be obtained via differentiation over system parameters. In the pure state limit, our generating function recovers that of the quantum geometric tensor proposed by Het\'{e}nyi and L\'{e}vay, and also clarifies the fidelity and phase between two quantum states as the generating functions of the quantum metric and Berry curvature, respectively. A generic expression for the quantum Fisher tensor in terms of the Bloch representation of density matrices is derived, which facilitates the calculation of the tensor, mean Uhlmann curvature, and geometric properties derived from the quantum Fisher information matrix. Canonical ensembles of spins are adopted to demonstrate our formalism, which reveals a constant Ricci scalar, a vacuum Einstein equation, and a cosmological constant on the 3D Euclidean manifold of the magnetic field

22.
arXiv (CS.CV) 2026-06-16

Detect Before You Leap: Mirage Detection in Vision-Language Models

Vision-language models (VLMs) can produce confident visual answers even when the required visual evidence is missing, blank, or unrelated to the question. This failure mode, recently described as mirage (mirage2026), is especially concerning in medical and document VQA, where a plausible but visually ungrounded answer may be mistaken for image-based evidence. We study the complementary problem of pre-release mirage detection: given an image-question pair, determine whether the VLM should answer or abstain before generation. To that end, we propose a novel model-agnostic Text-Conditioned Layer-wise Internal Alignment (TC-LIA) method that probes patch-token representations across the layers of a CLIP ViT-H/14 vision encoder. The key idea is to project layer-wise image patch tokens into the final CLIP embedding space and measure their similarity with the question embedding, thereby tracking whether question-relevant visual evidence emerges across vision layers. TC-LIA summarizes this alignment trajectory using final image-text cosine similarity, late-layer top-k patch-text alignment, early-to-late gain, and layer-wise slope. These features are combined with pixel-statistic based blank/noise detection, zero-shot domain routing, and structured VLM self-assessment in an ensemble. Across five VQA domains with related, unrelated-real, and blank/noise inputs, and across twelve VLM backbones, Qwen2.5-VL-32B achieves the highest three-class detection accuracy of 94.7% with a 3.0% mirage rate, while Qwen2.5-VL-72B achieves 94.6% accuracy with a lower 2.8% mirage rate. Baseline mirage rates span 21.7-66.6%.

23.
arXiv (CS.CL) 2026-06-12

WildIFEval: Instruction Following in the Wild

Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval - a large-scale dataset of 7K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints, extracted from natural user instructions. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. WildIFEval clearly differentiates between small and large models, and demonstrates that all models have a large room for improvement on such tasks. We analyze the effects of the number and type of constraints on performance, revealing interesting patterns of model constraint-following behavior. We release our dataset to promote further research on instruction-following under complex, realistic conditions.

24.
arXiv (quant-ph) 2026-06-11

Necessary and Sufficient Conditions for Universal Gates with Pauli Strings and Beyond

arXiv:2606.12096v1 Announce Type: new Abstract: Any quantum computation consists of a sequence of unitary evolutions described by a finite set of Hamiltonians. For the case where this set consists of only products of Pauli operators, known as Pauli strings, we provide a necessary and sufficient condition for it to generate $\mathfrak{su}(2^n)$, i.e., to be universal for quantum computation on $n$ qubits. When combining Pauli strings with a general Hamiltonian, we show a sufficient (and in certain circumstances even necessary) condition for universality based on the Pauli-basis expansion of the Hamiltonian. As an application of these results, we prove two corollaries: (i) a necessary and sufficient condition for the universality of a general Hamiltonian given arbitrary single-qubit control on all qubits, and (ii) the universality of an XYZ Heisenberg Hamiltonian with local control of just two adjacent qubits.

25.
arXiv (CS.LG) 2026-06-19

Stabilizing Bandits using Regularization: Precise Regret and A Quantitative Central Limit Theorem

arXiv:2603.10184v2 Announce Type: replace-cross Abstract: Statistical inference with bandit data presents fundamental challenges owing to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability~\citep{laiwei82} as a sufficient condition for valid inference under adaptivity. This paper first provides a refined stability condition, stated in terms of the iterates of an online algorithm, and shows that a large class of regularized stochastic-mirror-descent-style algorithms satisfy it. This refined condition allows us to strengthen the asymptotic results of~\citet{laiwei82} in several ways. First, we derive a non-asymptotic Berry–Esseen bound for the empirical reward estimates under adaptive sampling. Second, we derive matching non-asymptotic upper and lower bounds on the regret of the proposed algorithm, yielding a precise characterization of its regret. Third, we show that these regularized algorithms preserve asymptotic normality and valid inference under a prescribed level of adversarial corruption. Finally, we show that regularization is necessary rather than incidental: Lai–Wei stability is incompatible with the optimal $O(\sqrt{T})$ regret rate – the rate attained by unregularized algorithms such as EXP3 – so that a controlled, polylogarithmic inflation in regret is the price of valid inference.