Academic Intelligence · Curated Daily

Explore the currents of frontier research worldwide

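AcademicHub aggregates real-time literature from leading journals and preprint platforms. Customize your own research radar and let large language models automatically generate cross-disciplinary literature briefs.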

01.
arXiv (CS.LG) 2026-06-19

Model soups need only one ingredient

arXiv:2602.09689v2 Announce Type: replace Abstract: Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically reweight these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
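The spectral split can be illustrated with a short NumPy sketch. It assumes that the common entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values) decides how many high-energy directions are kept, and it uses a single hypothetical `low_scale` factor for the low-energy tail. MonoSoup itself is hyperparameter-free and derives layer-wise coefficients from the spectral and geometric structure, which this sketch does not reproduce.

```python
import numpy as np

def effective_rank(s: np.ndarray) -> float:
    """Entropy-based effective rank: exp of the Shannon entropy of s / sum(s)."""
    total = s.sum()
    if total <= 0:
        return 0.0  # zero update: no directions at all
    p = s / total
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(np.exp(-np.sum(p * np.log(p))))

def reweight_layer_update(w_pre: np.ndarray, w_ft: np.ndarray, low_scale: float) -> np.ndarray:
    """Split a layer's fine-tuning update into SVD directions and rescale the low-energy tail.

    `low_scale` is a hypothetical knob for illustration, not MonoSoup's coefficient rule.
    """
    delta = w_ft - w_pre
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    k = max(1, int(round(effective_rank(s))))  # high-energy directions kept at full weight
    s_new = s.copy()
    s_new[k:] *= low_scale  # damp (or amplify) the low-energy directions
    return w_pre + (u * s_new) @ vt  # u * s_new equals u @ diag(s_new)
```

With `low_scale=1.0` the function returns the fine-tuned weights unchanged, and with `low_scale=0.0` it keeps only the top effective-rank directions of the update.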

02.
arXiv (quant-ph) 2026-06-16

Quantum enhancement and Doppler suppression of Kasevich-Chu atom interferometer with motional squeezing states

arXiv:2606.16632v1 Announce Type: new Abstract: Hybridizing the internal and external atomic degrees of freedom in a Kasevich-Chu interferometer makes it possible to enhance sensitivity significantly, even beyond the standard quantum limit. Taking a motionally squeezed state as the input, we systematically derive a framework for computing the quantum and classical Fisher information of two measurement protocols for arbitrary strengths of the Doppler effect. By maximizing the corresponding classical Fisher information, we obtain the optimal control parameters and the corresponding quantum Fisher information. For population measurement, the sensitivity can reach up to four times the semi-classical limit by enlarging the atomic coherence length. For joint measurement of population and position, the competition between quantum enhancement and Doppler suppression gives rise to two distinct behaviors: in one regime, quantum enhancement dominates even in the presence of strong Doppler broadening, and the sensitivity is significantly enhanced; in the other regime, there is an optimal squeezing parameter at which the classical Fisher information reaches its maximum. Our results clearly demonstrate the robustness of external quantum enhancement against Doppler suppression. Our proposal can be readily applied to gravimeters on mobile platforms, where noise-induced decoherence degrades the many-body entanglement of internal spin squeezing.
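As background for the Fisher-information analysis, the classical Fisher information of a two-outcome population measurement is F(θ) = (∂P/∂θ)² / [P(1 − P)], and N independent repetitions bound the phase error by Δθ ≥ 1/√(N F). The sketch below evaluates this textbook formula numerically for an idealized fringe P = (1 + C cos θ)/2 whose contrast C is an assumed stand-in; the paper's treatment of motional squeezing and Doppler effects is not modeled.

```python
import math

def binary_cfi(prob, theta, h=1e-6):
    """Classical Fisher information of a two-outcome measurement with P(outcome 1 | theta) = prob(theta)."""
    p = prob(theta)
    dp = (prob(theta + h) - prob(theta - h)) / (2 * h)  # central finite difference
    return dp * dp / (p * (1 - p))

def fringe(contrast):
    """Idealized interference fringe P = (1 + C cos theta) / 2 (an assumed model)."""
    return lambda theta: 0.5 * (1 + contrast * math.cos(theta))
```

For perfect contrast the information is 1 at every interior phase, and for C = 0.5 at θ = π/2 it drops to C² = 0.25.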

03.
arXiv (CS.CV) 2026-06-16

Momentum-Guided Semantic Forecasting (MoFore) for Self-Supervised Video Representation Learning

Self-supervised video representation learning has recently advanced through contrastive learning, masked reconstruction, and predictive representation learning. Reconstruction-based approaches such as MAE and VideoMAE learn representations by recovering masked visual content (He et al., 2022; Tong et al., 2022), while contrastive methods such as CLIP learn semantically meaningful embedding spaces through representation alignment (Radford et al., 2021). In this work, we introduce a Momentum-Guided Semantic Forecasting framework (MoFore) for self-supervised video representation learning. Instead of optimizing for pixel-level reconstruction or task-specific semantic alignment, the proposed method learns temporally predictive video representations by forecasting future latent embeddings from temporally distant context clips. To improve robustness across temporal scales, we further introduce randomized temporal-gap forecasting during training. The framework combines predictive latent forecasting with contrastive regularization to encourage temporal consistency while preventing representation collapse. Experiments on the UCF101 dataset demonstrate that the proposed framework learns temporally consistent and semantically meaningful video representations without using action labels during training. Quantitative analysis shows strong temporal stability and emergent category-level structure in the learned embedding space, while qualitative retrieval experiments reveal motion-aware organization across related activities. Overall, the results suggest that long-range latent forecasting provides an effective and computationally efficient approach for self-supervised video representation learning without relying on reconstruction-based objectives.
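The two training ingredients named above, randomized temporal-gap forecasting and a contrastive regularizer, can be sketched in NumPy. The clip layout and the use of an InfoNCE loss are assumptions; the encoder, the forecasting predictor, and whatever momentum mechanism gives MoFore its name are omitted.

```python
import numpy as np

def sample_gap_pair(num_frames, clip_len, max_gap, rng):
    """Pick a context clip and a later target clip separated by a random temporal gap.

    Returns ((ctx_start, ctx_end), (tgt_start, tgt_end), gap) as half-open frame ranges.
    """
    feasible = num_frames - 2 * clip_len  # largest gap that still fits both clips
    if feasible < 1:
        raise ValueError("video too short for two clips with a positive gap")
    gap = int(rng.integers(1, min(max_gap, feasible) + 1))  # randomized temporal gap
    start = int(rng.integers(0, num_frames - 2 * clip_len - gap + 1))
    tgt_start = start + clip_len + gap
    return (start, start + clip_len), (tgt_start, tgt_start + clip_len), gap

def info_nce(pred, target, temperature=0.1):
    """Contrastive loss: each forecast should match its own future embedding within the batch."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = pred @ target.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

In a training step, `pred` would come from a predictor applied to the context-clip embedding and `target` from the embedding of the future clip; matched pairs give a small loss and mismatched pairs a large one.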

04.
arXiv (CS.LG) 2026-06-12

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

arXiv:2606.13501v1 Announce Type: cross Abstract: Diffusion Transformers (DiTs) have become the dominant architecture for image and video generation, creating growing demand for efficient DiT serving. Existing systems assign each request a fixed parallel configuration throughout its lifetime. However, DiT workloads exhibit substantial heterogeneity across requests, execution stages, and system conditions, making static parallelism inefficient and often leading to poor GPU utilization and degraded service quality. This paper argues that DiT serving should treat GPU parallelism as a first-class schedulable resource. We present GF-DiT, a policy-programmable runtime for elastic DiT serving that dynamically adapts the parallelism of running requests according to workload demands and service objectives. GF-DiT introduces an asynchronous execution abstraction that decomposes requests into independently schedulable trajectory tasks and enables online GPU reallocation. To make elastic parallelism practical, GF-DiT further proposes group-free collectives, a lightweight communication abstraction that supports low-overhead online formation and reconfiguration of arbitrary execution groups. We implement GF-DiT in vLLM-Omni and evaluate it on representative image and video diffusion workloads. Compared with fixed-pipeline execution with static parallelism, GF-DiT improves throughput by up to 6.01$\times$, reduces mean latency by up to 95%, lowers SLO violation rates by up to 90%, and reduces communication-group setup overhead from 778 ms to approximately 60 $\mu$s.

05.
arXiv (quant-ph) 2026-06-15

Spin-orbit coupling by design in quantum state engineering of atomically defined quantum dots

arXiv:2606.14487v1 Announce Type: cross Abstract: Tuning spin-orbit coupling is essential in controlling both spin and charge in confined semiconductor nanostructures, yet it is rarely a truly controllable parameter. Here, we show control over the spin-orbit Hamiltonian in quantum dots and the resulting quantum states by tailoring the confinement potential with atomic-scale precision. Using scanning tunnelling microscopy and spectroscopy, we pattern individual Cs ions into designer quantum dot structures on the surface of indium antimonide, in which electrons from a two-dimensional electron gas are confined with chosen in-plane electric-field gradients. We then quantify the atomic level structure, both spatially resolving the orbital character of the electronic states and their magnetic-field evolution. We demonstrate that the level structure, including the induced zero-field splitting, can be tailored by the designed geometry of the local electric fields. These effects can be described using a Hamiltonian that allows consistent treatment of the confinement-induced spin-orbit coupling beyond the conventional Bychkov-Rashba description. This Hamiltonian is derived from a multiband k·p model and takes the energy dependence of the relevant physical parameters into account. Such precise control of spin-orbit coupling in semiconductor quantum dots is relevant to quantum and spintronic technologies.
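For reference, the conventional Bychkov-Rashba term that the abstract moves beyond is usually written, for a two-dimensional electron gas with structural inversion asymmetry along z, as below, together with the resulting spin-split parabolic bands. Sign conventions differ between texts, and the coupling α_R scales with the out-of-plane electric field; the paper's multiband k·p Hamiltonian is not reproduced here.

```latex
H_{\mathrm{R}} = \alpha_{\mathrm{R}}\,\left(\sigma_x k_y - \sigma_y k_x\right),
\qquad
E_{\pm}(k) = \frac{\hbar^{2} k^{2}}{2 m^{*}} \pm \alpha_{\mathrm{R}}\, k
```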

06.
arXiv (CS.AI) 2026-06-16

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

arXiv:2606.16541v1 Announce Type: new Abstract: Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by faithfulness: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce Bidirectional Provability Fingerprinting (BPF), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) Counterfactual Probe Generation (CPG), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the Equivalence Spectrum, a continuous faithfulness score that replaces brittle binary verdicts; (iii) Adaptive Probe Budget Allocation (APBA), an information-theoretic budget router; and (iv) Faithfulness-Guided Decoding (FGD), which uses BPF signals as a reward during autoformalization. We prove a drift detection theorem and a PAC-faithfulness result establishing that the equivalence class of a natural-language statement is learnable from $\mathcal{O}(\log(1/\delta)/\varepsilon)$ probes under mild assumptions. We release DriftBench, a benchmark of $2{,}183$ NL/Lean 4 pairs with controlled drift labels across six subfields of mathlib4. BPF + CPG detects $89.6\%$ of drifted formalizations at a $3.0\%$ false-positive rate, against $41.2\%$ for typecheck and $63.3\%$ for LLM-judge baselines, and FGD reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by $47\%$. https://pmlrbd.github.io/BPF/
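The PAC-faithfulness result says the number of probes grows as log(1/δ)/ε. Because the abstract gives only the order of growth, the sketch below assumes a hidden constant of 1 purely to show how the probe budget scales with the tolerance ε and the failure probability δ.

```python
import math

def probes_needed(eps, delta, c=1.0):
    """Probe count from an O(log(1/delta)/eps) bound; the constant c is unknown (assumed 1)."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(c * math.log(1 / delta) / eps)
```

With δ = 0.01, tightening ε from 0.1 to 0.05 roughly doubles the count (47 to 93), whereas tightening δ from 0.01 to 0.0001 at ε = 0.05 only doubles the logarithmic factor, giving 185.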

07.
arXiv (CS.AI) 2026-06-17

Trust-Aware Multi-Agent Traceability: Confidence-Calibrated Knowledge Graphs for Consistent Software Artifact Management

arXiv:2606.17203v1 Announce Type: cross Abstract: Multi-agent AI systems are increasingly used to automate software engineering tasks including requirements analysis, architecture design, test generation, and traceability linking. When these agents operate as a sequential pipeline over shared software artifacts, errors and low-confidence decisions made by upstream agents propagate to downstream stages, producing orphaned requirements, contradictory links, and compliance gaps that pose significant risks in safety-critical domains. We propose a trust-aware coordination framework where a shared knowledge graph serves as both centralized semantic memory and a coordination surface through which agents assess and build upon each other's contributions using calibrated confidence scores. Our approach introduces a two-stage traceability link prediction pipeline combining embedding-based retrieval with LLM-based multi-criteria analysis, a traceability seeding mechanism that enables comparison between derivation-time and validation-time confidence, and a consistency protocol governing pipeline interactions through confidence threshold gating, confidence divergence detection, and conflict resolution. We evaluate on an automotive software engineering case study measuring link prediction calibration, protocol effectiveness, threshold sensitivity, and the impact of traceability seeding. Ablation studies confirm that confidence calibration is essential for effective pipeline coordination.
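The consistency protocol's three mechanisms (confidence threshold gating, confidence divergence detection, and routing to conflict resolution) can be sketched as a small routing function. The `TraceLink` record, the 0.7 threshold, and the 0.2 divergence tolerance are hypothetical illustration values, not the paper's settings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceLink:
    source: str
    target: str
    derivation_conf: float                   # confidence when the upstream agent proposed the link
    validation_conf: Optional[float] = None  # confidence when a downstream agent re-checked it

def gate(link, threshold=0.7):
    """Threshold gating: only links at or above the threshold flow downstream."""
    return link.derivation_conf >= threshold

def diverges(link, tolerance=0.2):
    """Flag links whose validation-time confidence drifts far from derivation-time confidence."""
    return link.validation_conf is not None and abs(link.derivation_conf - link.validation_conf) > tolerance

def route(links, threshold=0.7, tolerance=0.2):
    """Split links into accepted, held for review, and sent to conflict resolution."""
    accepted, held, conflicts = [], [], []
    for link in links:
        if diverges(link, tolerance):
            conflicts.append(link)
        elif gate(link, threshold):
            accepted.append(link)
        else:
            held.append(link)
    return accepted, held, conflicts
```

Checking divergence before the gate means a confidently derived link that a downstream agent strongly disputes goes to conflict resolution instead of being accepted.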

08.
arXiv (CS.AI) 2026-06-16

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

arXiv:2605.00074v2 Announce Type: replace-cross Abstract: DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: $k$-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies $\mathbb{E}[\mathrm{FNR}] \le \alpha + \mathrm{TV}$, where the additive term is the calibration-to-test distribution shift under family holdout (a certified ceiling of 24-49% across folds). Across ten leave-one-taxonomic-family-out folds at $\alpha=0.05$ on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% empirical test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack $1/(n_{\mathrm{cal}}+1)$ caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade $\alpha=10^{-3}$ requires an $18\times$ larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms. Code: https://github.com/najmulhasan-code/crc-screen
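Two of the ingredients can be sketched in plain Python: a k-mer Jaccard similarity and a Conformal Risk Control threshold for a miss-rate target. The sketch assumes one already-fused hazard score per calibration hazard (the paper fuses three signals with a monotone logistic aggregator, which is not reproduced) and that an order is flagged when its score is at least the threshold. With a 0/1 miss loss, CRC accepts a threshold when (misses + 1)/(n + 1) ≤ α; the calibration-to-test shift term TV is not modeled.

```python
import math

def kmer_jaccard(a, b, k=3):
    """Jaccard similarity between the k-mer sets of two strings."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def crc_threshold(hazard_scores, alpha):
    """Largest threshold t (flag if score >= t) with (misses + 1) / (n + 1) <= alpha.

    Conformal Risk Control with a 0/1 miss loss (bound B = 1) on n calibration hazards.
    Returns None when alpha is below the finite-sample floor 1 / (n + 1).
    """
    n = len(hazard_scores)
    # Largest allowed number of calibration misses k: (k + 1) / (n + 1) <= alpha.
    k_max = math.floor(alpha * (n + 1) - 1 + 1e-9)  # small epsilon guards float round-off
    if k_max < 0:
        return None
    if k_max >= n:
        return math.inf  # only reachable when alpha >= 1
    s = sorted(hazard_scores)
    return s[k_max]  # at most k_max scores lie strictly below s[k_max]
```

Even with zero calibration misses the adjusted risk is 1/(n + 1), which is the finite-sample slack the abstract describes; certifying α = 10⁻³ therefore needs n + 1 ≥ 1000. The ratio 1.77% / 0.1% ≈ 17.7 is consistent with the abstract's 18× figure.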

09.
medRxiv (Medicine) 2026-06-18

AlphaGenome identifies a deep intronic variant in a family with PLA2G6-associated neurodegeneration: Closing the diagnostic gap in rare genetic diseases

A molecular diagnosis remains out of reach for a substantial subset of patients with clinically recognizable Mendelian disorders, even after comprehensive next-generation sequencing. Causal variants in non-coding regions are difficult to detect and interpret using standard pipelines. Deep intronic variants that disrupt splicing are a known but underexplored source of pathogenic alleles, and systematic tools to evaluate them at scale have only recently emerged. We aimed to resolve an incomplete genetic diagnosis in two siblings with early-onset parkinsonism, prominent neuropsychiatric features, and autonomic dysfunction consistent with PLA2G6-associated neurodegeneration (PLAN), an autosomal recessive condition. Prior clinical exome sequencing, genome sequencing, Multiplex Ligation-dependent Probe Amplification (MLPA), and long-read sequencing had identified only a single heterozygous PLA2G6 missense variant, c.2132C>G (p.Pro711Arg). We used AlphaGenome to score 91 non-coding variants shared among the affected siblings and their father within 1 megabase of the PLA2G6 locus. The deep-learning model identified an intronic variant (c.2034+355G>A) that was predicted to create a cryptic splice acceptor site that could result in inclusion of a 160-bp cryptic exon. Tissue-specific predictions indicated the aberrant splicing would be detectable in blood, confirmed by junction-spanning RNA-seq reads from an unrelated carrier. This analysis completed a compound heterozygous PLAN diagnosis nearly two decades after symptom onset and demonstrates the utility of sequence-to-function models. Systematic integration of tools like AlphaGenome into rare disease workflows offers a practical, low-barrier route to closing the diagnostic gap for patients with compelling Mendelian phenotypes and incomplete genetic diagnoses.

11.
arXiv (CS.AI) 2026-06-16

LLM Jaggedness Unlocks Scientific Creativity

arXiv:2605.10574v3 Announce Type: replace Abstract: As artificial intelligence advances, models are not improving uniformly. Instead, progress unfolds in a jagged fashion, with capabilities growing unevenly across tasks, domains, and model scales. In this work, we examine this dynamic jaggedness through the lens of scientific idea generation. We introduce SciAidanBench, a benchmark of open-ended scientific questions designed to measure the scientific creativity of large language models (LLMs). Given a scientific question, models are asked to generate as many unique and coherent ideas as possible, with the total number of valid responses serving as a proxy for creative potential. Evaluating 19 base models across 8 providers (30 total variants including reasoning versions), we find that jaggedness manifests both across models and within models. First, in a cross-task comparison between general and scientific creativity, improvements in general creativity do not translate uniformly to scientific creativity, revealing divergent capability profiles across models. Second, at the prompt level, stronger models do not improve uniformly; instead, they exhibit high variability, with bursts of creativity on some questions and limited performance on others. Third, at the domain level, individual models display uneven strengths across scientific subfields, reflecting fragmented internal capability profiles. Finally, we show that this jaggedness can be harnessed. We explore mechanisms of inference-time compute, knowledge pooling, and brainstorming to combine models effectively and construct meta-model ensembles that outperform any single model. Our results position jaggedness not as a limitation, but as a resource, a structural feature of AI progress that, when understood and leveraged, can amplify LLM-driven scientific creativity.
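The benchmark's proxy (counting unique, valid ideas) and the knowledge-pooling mechanism can be illustrated with a toy counter. The token-overlap deduplication rule and its 0.6 cutoff are assumptions made for this sketch; the abstract does not say how uniqueness or coherence is judged, and coherence checking is omitted.

```python
import re

def tokens(idea):
    """Lower-cased alphanumeric tokens of an idea string."""
    return set(re.findall(r"[a-z0-9]+", idea.lower()))

def unique_ideas(ideas, max_overlap=0.6):
    """Greedy dedup: keep an idea unless its token Jaccard overlap with a kept idea exceeds max_overlap."""
    kept = []
    for idea in ideas:
        t = tokens(idea)
        if all(len(t & tokens(k)) / max(1, len(t | tokens(k))) <= max_overlap for k in kept):
            kept.append(idea)
    return kept

def pooled_count(per_model_ideas, max_overlap=0.6):
    """Knowledge-pooling proxy: unique ideas across all models' outputs combined."""
    merged = [idea for ideas in per_model_ideas for idea in ideas]
    return len(unique_ideas(merged, max_overlap))
```

When models contribute partly different ideas, the pooled count exceeds any single model's count, which is the intuition behind ensembling "jagged" models.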

12.
arXiv (CS.LG) 2026-06-18

SCOPE-FL: A Strategy-proof Chain-based Optimal pareto efficient Federated Learning System

arXiv:2606.18384v1 Announce Type: new Abstract: Hierarchical Federated Learning (HFL) enables scalable collaborative model training across distributed devices while preserving data privacy. However, existing HFL client selection mechanisms suffer from a fundamental strategic inefficiency. By prioritizing stability over Pareto efficiency (PE), they produce suboptimal resource allocations, and without strategy-proofness (SP), participants are incentivized to misrepresent their true preferences; in practice, both failures degrade overall system welfare in the Pareto sense. To address this, we propose SCOPE-FL (Strategy-proof Chain-based Optimal pareto efficient Federated Learning), a synchronous HFL framework that formulates client selection as a two-sided school choice problem solved through the Top Trading Cycle (TTC) algorithm, which simultaneously guarantees PE and SP. For reward distribution, SCOPE-FL employs a scalable Shapley value approximation based on One-Round Reconstruction (OR), ensuring compensation proportional to each client's contribution. The entire mechanism executes via blockchain smart contracts, providing the tamper-proof environment required for the SP guarantees to hold in practice. A comprehensive evaluation on MNIST, Fashion-MNIST, and CIFAR-10 demonstrates that SCOPE-FL outperforms state-of-the-art approaches, including DA, IAS, and other methods, in model accuracy, convergence rate, and reward efficiency, while achieving communication latency comparable to DA and significantly lower blockchain overhead than DA at scale.
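Top Trading Cycles for school choice works as follows: each remaining student points to their favourite school that still has a seat, each open school points to its highest-priority remaining student, and every student on a cycle of pointers receives the school they point to. The sketch below is a textbook version; which side of SCOPE-FL's matching plays "students" and which plays "schools", and how preferences and priorities are formed, are assumptions not taken from the abstract.

```python
def top_trading_cycles(student_prefs, school_priorities, capacities):
    """School-choice TTC. Returns {student: school}; unmatched students are omitted.

    Assumes every school's priority list ranks all students.
    """
    for school, prio in school_priorities.items():
        assert set(prio) >= set(student_prefs), f"{school} must rank every student"
    cap = dict(capacities)
    remaining = set(student_prefs)
    match = {}
    while remaining:
        open_schools = {c for c, k in cap.items() if k > 0}
        # Each remaining student points to their favourite school with a free seat.
        points = {}
        for s in sorted(remaining):
            choice = next((c for c in student_prefs[s] if c in open_schools), None)
            if choice is None:
                remaining.discard(s)  # no acceptable open school: stays unmatched
            else:
                points[s] = choice
        if not points:
            break
        # Each open school points to its highest-priority student still in the market.
        school_points = {c: next(s for s in school_priorities[c] if s in points) for c in open_schools}
        # Follow pointers from any student until a node repeats; the loop found is a cycle.
        node, order, path = ("s", next(iter(points))), {}, []
        while node not in order:
            order[node] = len(path)
            path.append(node)
            kind, name = node
            node = ("c", points[name]) if kind == "s" else ("s", school_points[name])
        # Every student on the cycle receives the school they point to.
        for kind, name in path[order[node]:]:
            if kind == "s":
                match[name] = points[name]
                cap[points[name]] -= 1
                remaining.discard(name)
    return match
```

Executing one cycle per round yields the same assignment as executing all cycles at once, because cycles are disjoint and removing one does not change the pointers inside another.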

13.
arXiv (CS.AI) 2026-06-16

LLM4RTL: Tool-Assisted LLM for RTL Generation

arXiv:2606.15500v1 Announce Type: cross Abstract: Large language models (LLMs) have driven impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a substantial body of research has explored a growing variety of methods and systems for applying LLMs to hardware and chip design (e.g., systems that generate RTL code from functional descriptions). However, open Verilog/RTL code generation requires high-quality training samples to build specialized and more effective LLM systems through fine-tuning or low-rank adaptation. Here, we propose a "judge-renew-check-renew-check" (JRCRC) pipeline that updates an existing public dataset using a hierarchy of state-of-the-art commercial LLMs that differ in cost and RTL code-generation capability. This approach provides a cost-effective mechanism for filtering and refining code-generation samples into a higher-quality training dataset. Our experiments also identify common weaknesses of LLMs in rule-based reasoning and logic and, consequently, in RTL code generation. Having identified these weaknesses, we develop an architecture that incorporates pre-processing tools to dynamically assist the LLMs in inferring logical relationships from tabular data formats. With our tool-assisted architecture for RTL code generation, we achieve significant overall performance gains on the VerilogEval benchmark and outperform many state-of-the-art methods. Our LLM4RTL system achieves performance comparable to that of GPT-4o using a significantly smaller LLM.
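The abstract does not describe its pre-processing tools in detail. As a hypothetical example of a tool that turns tabular data into explicit logic, the sketch below converts a truth table into an unminimized sum-of-products Verilog `assign` statement that an LLM could consume or check its own output against.

```python
from itertools import product

def truth_table_to_verilog(inputs, output, table):
    """Render a truth table as an unminimized sum-of-products Verilog continuous assignment.

    `table` maps input-bit tuples (ordered like `inputs`) to 0/1; missing rows count as 0.
    """
    terms = []
    for bits in product((0, 1), repeat=len(inputs)):
        if table.get(bits, 0):
            literals = [name if bit else f"~{name}" for name, bit in zip(inputs, bits)]
            terms.append("(" + " & ".join(literals) + ")")
    expr = " | ".join(terms) if terms else "1'b0"
    return f"assign {output} = {expr};"
```

For an XOR table over inputs `a` and `b` this yields `assign y = (~a & b) | (a & ~b);`. A real tool would likely add logic minimization, which this sketch skips.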

14.
arXiv (CS.AI) 2026-06-12

Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach

arXiv:2606.13192v1 Announce Type: new Abstract: User experience (UX), centered on usability, perceived consistency, and functional clarity, is fundamental to real-world user interfaces (UI). Applications of multimodal large language models (MLLMs) to user interfaces are evolving rapidly, spanning visual element grounding, graphical user interface (GUI) agents, and design-to-code generation. However, research on evaluating UX from UI screenshots remains immature. To address this, we propose UXBench, a novel multimodal benchmark of 2,000 VQA samples designed to assess MLLMs' ability to perform UI-based reasoning. UXBench includes 8 tasks based on real-world UI screenshots that require fine-grained diagnosis of UX issues across layout relationships, visual hierarchy, and content consistency. Our extensive evaluation of mainstream MLLMs shows that they remain fundamentally limited in UI-based reasoning, underscoring the need for further advances in this area. To bridge this gap, we propose UI-UX, an MLLM built on the Qwen3-VL-4B-Thinking foundation model and enhanced via reinforcement learning with two key innovations: a reward routing mechanism that dynamically balances perceptual understanding and logical reasoning during inference, and an asymmetric transition reward that suppresses redundant or insufficient reasoning steps. Experiments demonstrate that UI-UX achieves state-of-the-art (SOTA) performance on UXBench, attaining an accuracy of 0.7963 (surpassing Claude-4.5-Sonnet's 0.6550), while exhibiting strong generalization across diverse UI tasks and maintaining low inference latency.

15.
arXiv (math.PR) 2026-06-18

A simple approach to the Løkka-Zervos dichotomy for absolutely continuous dividend strategies

arXiv:2604.13302v3 Announce Type: replace-cross Abstract: We revisit the optimization problem solved in Løkka & Zervos (2008), i.e., the maximization of dividends, in a Brownian risk model, with the possibility (not the obligation) of making capital injections. Following the approach introduced in Alvarez & Shepp (1998), Renaud & Simard (2021), and Renaud et al. (2023), we consider instead absolutely continuous (AC) dividend strategies with an affine bound on the payment rates, while singular capital injections are still allowed. In addition, we incorporate a parameter for the cost of ruin or, said differently, a penalty at ruin in the performance function. We show that the solution is a so-called Løkka-Zervos dichotomy: the surplus is never ruined by making bail-out payments, or no capital is injected and bankruptcy can occur; in either case, dividends are paid at full rate when the surplus is above a threshold. Our framework allows us to provide explicit conditions to express the dichotomy, either using the cost of capital injections or the cost of ruin as a criterion, which also exposes the underlying structure of the solution. In particular, for some values of the parameters, we show that it is optimal to liquidate. Moreover, we perform a numerical analysis highlighting the range of values generated under this AC affine-bound structure.
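A Monte Carlo sketch can estimate the value of one threshold strategy in the no-injection branch of the dichotomy: dividends are paid at the full (capped) rate whenever the surplus is above a barrier, and ruin incurs a discounted penalty. The affine cap a + bX, the Euler time step, the finite horizon, and all parameter values are assumptions; the sketch evaluates a given strategy rather than solving for the optimal one.

```python
import numpy as np

def threshold_strategy_value(x0, mu, sigma, q, cap_a, cap_b, barrier, ruin_penalty,
                             horizon=20.0, dt=0.01, n_paths=2000, seed=0):
    """Monte Carlo estimate of discounted dividends minus a discounted ruin penalty.

    Surplus: dX = (mu - l_t) dt + sigma dW, with l_t = cap_a + cap_b * X_t when X_t > barrier
    and 0 otherwise. No capital injections, so ruin is the first time X < 0.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    alive = np.ones(n_paths, dtype=bool)
    total = np.zeros(n_paths)
    for step in range(int(round(horizon / dt))):
        disc = np.exp(-q * step * dt)
        rate = np.where(alive & (x > barrier), cap_a + cap_b * x, 0.0)  # full rate above barrier
        total += disc * rate * dt
        x = x + (mu - rate) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        ruined = alive & (x < 0)
        total[ruined] -= np.exp(-q * (step + 1) * dt) * ruin_penalty  # penalty at ruin
        alive &= ~ruined
    return float(total.mean())
```

Because the penalty does not affect the simulated paths, runs with the same seed differ only by the penalty term, which makes it easy to see how the cost of ruin shifts the value of a fixed strategy.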

16.
arXiv (quant-ph) 2026-06-15

Interpreting Bohm-like quantum potentials in "Computing quantum waves exactly from classical action"

arXiv:2605.20443v3 Announce Type: replace Abstract: The recent posting arXiv:2605.02621 [14], commenting on the article rspa.2025.0413 [7], argues that the proof of Lemma 3.1 in [7] is missing the spatial derivative of the density, which would lead to a Bohm-like quantum potential. This technical note shows why the propagated density is independent of space in the Feynman propagator construction of Lemma 3.1. This is done by extending the proof of Lemma 3.1 explicitly with Bohm-like quantum potential terms along the stationary action paths, and then showing that these terms are exactly zero. In [7], this property can also be verified directly on most examples (double slit, Aharonov-Bohm, potential well, harmonic oscillator, tunneling, EPR, QED), as well as in the derivations of the Pauli, Dirac, and Maxwell equations. For more general nonlinear actions, a time rescaling may be required to guarantee this space independence along stationary paths. In the hydrogen atom example, this time rescaling can be computed in closed form. In contrast to the general wave of the Madelung solution [9], Lemma 3.1 of [7] is defined first for a propagator, and a general wave is then constructed in a second step. Recall that a propagator is a specific quantum wave, which is initialized at $t=0$ with a Dirac impulse at a given initial position or momentum. A general wave is then obtained by superposing a distribution of initial conditions using the propagator. This key difference is why the Bohm-like quantum potential terms disappear in the construction [7] (specifically, in the first step) while the Bohm potential in the Madelung analysis does not. This fundamental difference is also consistent with the fact that the wave construction in [7] extends naturally to relativistic contexts, while Bohmian non-locality notoriously prevents such extensions. Keywords - Response to arXiv:2605.02621, in relation to rspa.2025.0413
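For readers comparing the two constructions, the Madelung substitution and the resulting Bohm quantum potential are standard. The last relation shows why a density that does not vary in space makes that term vanish, which is the property the note establishes for the propagator step.

```latex
\psi = \sqrt{\rho}\, e^{iS/\hbar},
\qquad
Q = -\frac{\hbar^{2}}{2m}\,\frac{\nabla^{2}\sqrt{\rho}}{\sqrt{\rho}},
\qquad
\nabla\rho \equiv 0 \;\Rightarrow\; \nabla^{2}\sqrt{\rho} = 0 \;\Rightarrow\; Q = 0 .
```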

17.
arXiv (quant-ph) 2026-06-15

Spin counting via projection noise measurement of mesoscopic solid-state spin ensemble

arXiv:2606.14437v1 Announce Type: new Abstract: Quantum projection noise is the fundamental noise source for the population measurement of spin ensembles. While projection-noise-limited measurements have been extensively studied in atomic systems, corresponding experiments on solid-state spin ensembles remain challenging due to dominant classical readout noise. Here, we report direct measurement of the quantum projection noise of mesoscopic ensembles of nitrogen-vacancy (NV) spin defects at room temperature. Our experiment is enabled by a high optically-detected magnetic resonance (ODMR) contrast of over 20% for a single crystallographic orientation of the defect spins, obtained by combining polarization-selective optical excitation with spin-to-charge conversion. We use our protocol to demonstrate projection noise measurements and spin counting from nanoscale NV ensembles of up to 43 spins. We further demonstrate that the protocol allows for significant gains in sensitivity for magnetometry applications without need for cryogenic operation or high bias magnetic fields.
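Spin counting from projection noise can be illustrated with idealized binomial statistics: if each of N spins is independently found in the bright state with probability p, the count has mean m = Np and variance v = Np(1 − p), so N = m²/(m − v). The sketch assumes perfect, noise-free readout; a real NV measurement must also account for finite contrast and classical readout noise, and this is not the paper's protocol.

```python
import numpy as np

def estimate_spin_number(counts):
    """Estimate (N, p) from repeated binomial population readouts via N = m^2 / (m - v)."""
    counts = np.asarray(counts, dtype=float)
    m = counts.mean()
    v = counts.var(ddof=1)  # unbiased sample variance
    if m <= v:
        raise ValueError("variance >= mean: not consistent with binomial projection noise")
    p = 1 - v / m
    return m / p, p
```

Simulating 200,000 readouts of 43 spins at p = 0.5 with `np.random.default_rng(1).binomial(43, 0.5, size=200_000)` recovers N close to 43.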

18.
arXiv (CS.AI) 2026-06-18

Bounded Context Management for Tabular Foundation Models on Stream Learning

arXiv:2606.18677v1 Announce Type: cross Abstract: Tabular stream learning requires predictions on sequentially arriving examples under distribution shift. While standard methods adapt by updating model states, tabular foundation models (TFMs) make predictions conditioned on a labeled context in an in-context manner, making them a natural alternative for stream learning. This shifts the challenge from how to update the model to how to manage the context. We propose a future information view that yields three practical requirements for context management: preserve recent examples, retain uncertain examples, and remove redundant examples. We instantiate these requirements as CURE (Context management via Uncertainty-aware admission and Redundancy-aware Eviction), a context-managing policy with entropy-gated admission and redundancy-aware eviction. Across seven streams, CURE shows up to 27.0% relative improvement over classical stream learners, remains robust across multiple TFM backbones, and ranks first among the policy variants compared. Code and datasets are available at https://github.com/morcellinus/CURE-ICML-FMSD.
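The three requirements (preserve recent, retain uncertain, remove redundant) can be sketched as a bounded context buffer. The entropy gate value, the rule that the newest items are never evicted, and the choice of nearest same-label distance as the redundancy measure are assumptions for illustration; CURE's actual admission and eviction scores may differ.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class BoundedContext:
    """Fixed-size labeled context with entropy-gated admission and redundancy-aware eviction."""

    def __init__(self, capacity, entropy_gate, protect_recent):
        if not 0 <= protect_recent < capacity:
            raise ValueError("need 0 <= protect_recent < capacity")
        self.capacity = capacity
        self.entropy_gate = entropy_gate
        self.protect_recent = protect_recent
        self.items = []  # (features, label) pairs, oldest first

    def offer(self, x, y, probs):
        """Admit (x, y) only if the model's prediction for x was uncertain enough."""
        if entropy(probs) < self.entropy_gate:
            return False
        self.items.append((tuple(x), y))
        if len(self.items) > self.capacity:
            self._evict()
        return True

    def _redundancy(self, i):
        # Distance to the nearest same-label item: smaller means more redundant.
        xi, yi = self.items[i]
        dists = [math.dist(xi, xj) for j, (xj, yj) in enumerate(self.items) if j != i and yj == yi]
        return min(dists) if dists else math.inf

    def _evict(self):
        # The newest `protect_recent` items are never evicted (preserve recent examples).
        candidates = range(len(self.items) - self.protect_recent)
        del self.items[min(candidates, key=self._redundancy)]
```

A confidently predicted example is never admitted, and when the buffer overflows the near-duplicate of an existing same-label example is dropped first.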

19.
arXiv (CS.CV) 2026-06-16

Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering

Knowledge-based visual question answering (KB-VQA) lets vision-language systems answer questions that exceed their parametric knowledge by conditioning a reader on passages retrieved from a Wikipedia-scale knowledge base. In pure-text long-context LLMs, retrieved-context use follows the U-shaped "lost-in-the-middle" effect of Liu et al. (2024): information at the start and end of the context is used, while the middle is lost. Whether this transfers to deployed multimodal KB-VQA remains an open question. To close this gap, we design the first controlled probe of reader-side position dependence in multimodal KB-VQA: a gold-position protocol in which only the gold passage's prompt slot varies within each question. We run it on three open-source 7B/8B VLM readers and two KB-VQA benchmarks at k up to 20. The shape flips from U to primacy: gold-at-first beats gold-at-last by 16 to 26 points on every reader-by-benchmark cell, an effect we call "Lost at the End". Three targeted ablations narrow the cause: a text-only control shows that the multimodal setting amplifies an already-present text-mode primacy by a factor of 2.2 to 4.5, and image-position and distractor-shuffle ablations together pin the locus to prompt slot 0 of the instruction-tuned reader. On a frozen reader, three retrieval-side fixes (MMR, oracle reranking, rank-based reordering) all leave the gap intact (no separable improvement). Our findings indicate that recall@k is the wrong metric for deployed KB-VQA and that closing the gap requires reader-side intervention; we release our protocol as a controlled instrument for evaluating such interventions.
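The gold-position protocol, in which only the gold passage's slot varies within a question, can be sketched as two small helpers. Passage contents, the reader call, and answer scoring are left out, and the helper names are hypothetical.

```python
def gold_position_prompts(gold, distractors):
    """All k passage orderings in which only the gold passage's slot changes (distractor order fixed)."""
    return [distractors[:i] + [gold] + distractors[i:] for i in range(len(distractors) + 1)]

def accuracy_by_slot(results):
    """results: iterable of (gold_slot, correct) pairs -> {gold_slot: accuracy}."""
    totals, hits = {}, {}
    for slot, correct in results:
        totals[slot] = totals.get(slot, 0) + 1
        hits[slot] = hits.get(slot, 0) + int(correct)
    return {s: hits[s] / totals[s] for s in sorted(totals)}
```

The "Lost at the End" gap for a reader is then the accuracy at slot 0 minus the accuracy at the last slot.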

20.
arXiv (CS.CL) 2026-06-16

Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI

Large language models (LLMs) are increasingly used for knowledge-intensive question answering, including religious and legal questions. Islamic knowledge is a particularly demanding setting: answers are expected to be grounded in authoritative sources, citations must be exact, Arabic varieties differ substantially from the language of classical sources, and legitimate jurisprudential disagreement must be represented rather than collapsed into a single answer. This survey reviews the emerging field of Islamic LLMs and trustworthy Islamic AI. We organize the literature around Arabic NLP and Arabic-centric LLMs, Islamic NLP resources, Qur'anic question answering, Islamic knowledge benchmarks, retrieval-augmented generation, Islamic legal reasoning, inheritance reasoning, hallucination evaluation, and trustworthiness. We argue that fluency in Arabic is not sufficient for Islamic AI. Reliable systems require curated sources, retrieval and verification modules, citation-aware generation, madhhab-aware reasoning, human expert evaluation, and benchmarks that measure not only answer accuracy but also faithfulness, source validity, and reasoning quality. The survey concludes with a research agenda for hallucination-resistant Islamic AI systems.

21.
arXiv (quant-ph) 2026-06-11

Quantum thermodynamics of the Caldeira-Leggett model with non-equilibrium Gaussian reservoirs

arXiv:2405.00215v5 Announce Type: replace Abstract: We introduce a non-equilibrium version of the Caldeira-Leggett model in which a quantum particle is strongly coupled to a set of engineered reservoirs. The reservoirs are composed of collections of squeezed and displaced thermal modes, in contrast to the standard case in which the modes are assumed to be at equilibrium. The model proves to be very versatile. Strongly displaced/squeezed reservoirs can be used to generate an effective time dependence in the system Hamiltonian and can be identified as sources of pure work. In the case of squeezing, the time dependence is stochastic and breaks the fluctuation-dissipation relation; this can be reconciled with the second law of thermodynamics by correctly accounting for the energy used to generate the initial non-equilibrium conditions. To go beyond the average description and compute the full heat statistics, we treat squeezing and displacement as generalized Hamiltonians on a modified Keldysh contour. As an application of this technique, we show the quantum-classical correspondence between the heat statistics in the non-equilibrium Caldeira-Leggett model and the statistics of a classical Langevin particle under the action of squeezed and displaced colored noises. Finally, we discuss thermodynamic symmetries of the heat generating function, proving a fluctuation theorem for the energy balance and showing that the conservation of energy at the trajectory level emerges in the classical limit.
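For orientation, the first line below gives the standard Caldeira-Leggett Hamiltonian with the counterterm absorbed into the square. It couples a particle of mass $M$ and coordinate $x$ in a potential $V(x)$ linearly to harmonic bath modes with masses $m_k$, frequencies $\omega_k$, coordinates $x_k$ and coupling constants $c_k$.

Going by the abstract, the non-equilibrium variant changes the initial reservoir state rather than this coupling structure. The second line shows one common way to write a product of displaced, squeezed thermal modes. Here $D(\alpha_k)$ is the displacement operator, $S(\xi_k)$ the squeezing operator and $\rho_{\mathrm{th},k}$ a thermal state. Setting $\alpha_k = \xi_k = 0$ recovers the equilibrium bath. The paper's exact conventions may differ.

```latex
H = \frac{p^2}{2M} + V(x) + \sum_k \left[ \frac{p_k^2}{2 m_k} + \frac{m_k \omega_k^2}{2} \left( x_k - \frac{c_k}{m_k \omega_k^2}\, x \right)^2 \right]

\rho_B(0) = \bigotimes_k D(\alpha_k)\, S(\xi_k)\, \rho_{\mathrm{th},k}\, S^\dagger(\xi_k)\, D^\dagger(\alpha_k)
```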

22.
bioRxiv (Bioinfo) 2026-06-18

Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction

Next-token prediction has produced predictable scaling in language, but the recipe presumes a sequence of tokens with a meaningful order. Single-cell RNA-seq counts have no natural gene ordering, so applying the recipe directly to raw expression fails under an ill-suited left-to-right bias. We instead ask whether a learned latent can supply the structure the recipe needs. We introduce ExpressionVAE (eVAE), a discrete-latent perturbation model that compresses each cell into a short sequence of discrete codes through a finite-scalar-quantization (FSQ) bottleneck and trains a perturbation-conditioned discrete prior over those codes. On Replogle and Parse 1M, eVAE sets a new state of the art on every distributional metric and leads on most cell-eval perturbation metrics, with Fréchet distance and MMD² roughly 3 to 20× lower than the strongest continuous-latent baseline. Swapping the prior between autoregressive and masked discrete diffusion leaves performance near-identical, isolating the gain to the discrete latent itself rather than the prior family. A decoder-head ablation then exposes a single design axis, the richness of the predictive distribution at inference, that splits the standard metrics into two groups, variance-sensitive and mean-sensitive, which move in opposite directions along the axis. Finally, on a held-out CRISPRi reversion benchmark of 1,732 perturbations under inflammatory cytokine stress, the frozen eVAE encoder outperforms UMAP and differential expression and matches scGPT on perturbation ranking at a fraction of the data.
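The abstract names a finite-scalar-quantization (FSQ) bottleneck without giving details. The sketch below shows the generic FSQ forward pass: each latent dimension is bounded with tanh, scaled to a small odd number of levels and rounded. The per-dimension codes are then packed into one token id by mixed-radix indexing. This is a minimal sketch under assumptions: the level counts are illustrative and not eVAE's configuration, only odd level counts are handled, and the straight-through gradient used during training is omitted.

```python
import math
from typing import List, Tuple


def fsq_quantize(z: List[float], levels: List[int]) -> Tuple[List[int], int]:
    """Finite scalar quantization of one latent vector (forward pass only).

    Dimension i is squashed into (-half, half) with half = levels[i] // 2,
    then rounded, so it takes one of levels[i] integer values in
    {-half, ..., half}. Returns the per-dimension codes and a flat token id.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch assumes an odd number of levels")
        half = n_levels // 2
        codes.append(round(half * math.tanh(value)))
    # Mixed-radix index: shift each code into {0, ..., L-1} and combine.
    token, base = 0, 1
    for code, n_levels in zip(codes, levels):
        token += (code + n_levels // 2) * base
        base *= n_levels
    return codes, token
```

With `levels = [5, 5, 5]` the codebook has 125 implicit entries and needs no learned embedding table, which is the usual appeal of FSQ over vector quantization.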

23.
arXiv (CS.CV) 2026-06-16

Mind the Gap: Diagnosing Constraint Discovery Failures in Text-in-Image Editing

A key challenge in multimodal reasoning is determining which visual dependencies become relevant under a specific task, rather than merely recognizing visible content. We study this through edit-induced constraint discovery in text-in-image editing, a controlled diagnostic setting where a local text change can activate secondary consistency constraints: given a valid editing instruction and an image, can a model identify the secondary regions that must also change? Across 461 diagnostic cases, four MLLMs, and 19 constraint subtypes, models achieve only 46% case-level macro recall under unguided prompting versus 94% when constraints are explicitly provided, suggesting that a substantial portion of the failure arises when models must decide which unstated dependencies to surface. Oracle-field decomposition shows that case-specific causal explanations are the most effective partial guidance (0.782 recall), ahead of type labels (0.646) and region names (0.610), suggesting that edit-specific causal cues account for much of the oracle gain. A downstream experiment further shows that higher self-discovery recall does not necessarily improve task performance: unverified self-discovery introduces false positives that offset recall gains, motivating precision-aware constraint elicitation.
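The sketch below shows one plausible way to compute case-level macro recall, with macro precision alongside it. Each case is scored on its own predicted and gold sets of secondary regions, and the scores are averaged without weighting, so false positives lower precision but leave recall unchanged. Assumptions: regions are compared as string labels, and a case with an empty gold set or an empty prediction scores 1.0 by convention. The paper may handle these edge cases differently.

```python
from typing import List, Set, Tuple


def case_scores(pred: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Recall and precision of predicted secondary regions for one case.

    Empty gold or empty prediction sets score 1.0 by convention (assumption).
    """
    hits = len(pred & gold)
    recall = hits / len(gold) if gold else 1.0
    precision = hits / len(pred) if pred else 1.0
    return recall, precision


def macro_scores(preds: List[Set[str]], golds: List[Set[str]]) -> Tuple[float, float]:
    """Case-level macro average: score each case, then take the unweighted mean."""
    if len(preds) != len(golds) or not preds:
        raise ValueError("need equal-length, non-empty lists")
    scores = [case_scores(p, g) for p, g in zip(preds, golds)]
    n = len(scores)
    return sum(r for r, _ in scores) / n, sum(p for _, p in scores) / n
```

For example, consider predictions `[{"region_a", "region_b", "region_x"}, {"region_c"}]` against gold sets `[{"region_a", "region_b"}, {"region_c", "region_d"}]` (hypothetical labels). Macro recall is 0.75. The spurious `region_x` pulls macro precision down to about 0.83.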

24.
arXiv (CS.CL) 2026-06-11

To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending

The wide deployment of LLMs has made model alignment necessary so that newly trained models respond to user instructions safely and effectively. Among different methods, inference-time alignment is often cheaper because it intervenes (i.e., offers guidance) only during output generation. Existing proposals apply guidance extracted from aligned models without properly assessing its reliability. However, our systematic evaluation reveals that guidance effectiveness varies drastically across models. Ineffective guidance causes further confusion and therefore further interventions, so excessive interventions typically signal poor performance. To make interventions more effective and thus more efficient, we introduce BlendIn, an inference-time alignment framework that replaces binary intervene-or-not decisions with hybrid distributions integrating both models' knowledge. BlendIn stabilizes inference-time alignment by performing quality-aware alignment and weighting each model's contribution in proportion to its reliability. Compared with existing work, it preserves beneficial guidance while downweighting unreliable suggestions. BlendIn provides both diagnostic signals and mitigation strategies for misaligned guidance, achieving consistent performance improvements of up to 50% on challenging model pairs. Our code is available at: https://github.com/DecayingSeart/BlendIn.
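The abstract does not give BlendIn's formulas. As a hypothetical illustration of moving from binary decisions to hybrid distributions, the sketch below takes a convex mixture of a base model's and a guiding model's next-token probabilities, weighted by a reliability score. Two parts are assumptions and not BlendIn's method: the score itself, which the sketch takes as an input rather than estimating (the "quality-aware" part), and mixing in probability space rather than logit space.

```python
from typing import Dict


def blend_next_token(
    p_base: Dict[str, float],
    p_guide: Dict[str, float],
    reliability: float,
) -> Dict[str, float]:
    """Mix two next-token distributions instead of choosing one.

    reliability is a hypothetical score in [0, 1] for the guiding (aligned)
    model: 0 keeps the base distribution, 1 fully adopts the guide's, and
    values in between blend them. A binary intervene-or-not policy is the
    special case reliability in {0, 1}.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    vocab = set(p_base) | set(p_guide)
    mixed = {
        tok: (1.0 - reliability) * p_base.get(tok, 0.0)
        + reliability * p_guide.get(tok, 0.0)
        for tok in vocab
    }
    # Renormalize to absorb rounding or slightly unnormalized inputs.
    total = sum(mixed.values())
    if total <= 0.0:
        raise ValueError("inputs carry no probability mass")
    return {tok: mass / total for tok, mass in mixed.items()}
```

A low score lets an unreliable guide nudge the distribution without overriding it. This is the intuition behind downweighting rather than discarding guidance.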

25.
arXiv (CS.CL) 2026-06-17

PACE-RAG: Patient-Aware Contextual and Evidence-Constrained RAG for Clinical Drug Recommendation

Drug recommendation requires a deep understanding of individual patient context, especially for complex conditions like Parkinson's disease. While LLMs possess broad medical knowledge, they fail to capture the subtle nuances of actual prescribing patterns. Existing RAG methods also struggle with these complexities because guideline-based retrieval remains too generic and similar-patient retrieval often replicates majority patterns without accounting for the unique clinical nuances of individual patients. To bridge this gap, we propose PACE-RAG (Patient-Aware Contextual and Evidence-Constrained RAG). Rather than directly copying frequent medications from retrieved patients, PACE-RAG personalizes recommendations by first extracting patient-specific clinical features, retrieving cases around these features, and then refining the final prescription using the patient's current symptoms, active medication history, and focus-specific prescribing tendencies. By analyzing treatment patterns tailored to specific clinical features, PACE-RAG generates patient-specific medication recommendations along with an explainable clinical summary. Evaluated on a Parkinson's cohort and the MIMIC-IV benchmark using Llama-3.1-8B and Qwen3-8B, PACE-RAG achieved state-of-the-art performance, reaching F1 scores of 80.84% and 47.22%, respectively. These results suggest that PACE-RAG is a robust and clinically grounded framework for personalized decision support. Our code is available at: https://github.com/ChaeYoungHuh/PACE-RAG.
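The abstract reports F1 but does not say how it is averaged. A common choice in drug-recommendation evaluation is to compute set F1 between predicted and prescribed medications for each patient (or visit) and then average. The sketch below assumes that sample-averaged form, and the medication identifiers in the example are hypothetical placeholders.

```python
from typing import List, Set


def set_f1(pred: Set[str], gold: Set[str]) -> float:
    """F1 between a predicted and a prescribed medication set."""
    hits = len(pred & gold)
    if hits == 0:
        return 0.0  # also covers empty predictions or empty gold sets
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(preds: List[Set[str]], golds: List[Set[str]]) -> float:
    """Average per-patient F1 (sample-averaged)."""
    if len(preds) != len(golds) or not preds:
        raise ValueError("need equal-length, non-empty lists")
    return sum(set_f1(p, g) for p, g in zip(preds, golds)) / len(preds)
```

Consider one patient with prediction `{"drug_a", "drug_b"}` against a prescription of `{"drug_a", "drug_c"}`, scoring 0.5, and a second patient predicted exactly, scoring 1.0. The mean F1 is 0.75.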