Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-11

Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

arXiv:2605.25820v2 Announce Type: replace Abstract: Diffusion-based multimodal large language models (dMLLMs) decode by iteratively predicting tokens at multiple masked positions in parallel. This turns each decoding step into a position-selection problem: the model must choose not only which predictions are reliable in isolation, but also which positions should be committed together as context for later decoding steps. Existing confidence-based decoding ranks masked positions independently and commits the top-K positions, largely ignoring whether the committed tokens provide complementary visual grounding. We identify a step-level limitation of this strategy in multimodal settings: high-confidence tokens selected in the same step can rely on overlapping visual grounding, introducing visual redundancy among the committed tokens and leaving less complementary visual grounding available for later decoding. To quantify this effect, we introduce the Visual Redundancy Index (VRI), which measures visual grounding overlap among tokens committed in parallel. To control this redundancy during decoding, we propose Visual-Redundancy-Controlled Decoding (VRCD), a training-free inference-time decoding method that uses token-to-image attention to prioritize visually complementary positions. Across diverse multimodal benchmarks, VRCD reduces visual redundancy and remaining-position entropy with modest runtime overhead. In longer decoding experiments, it also achieves relative accuracy gains of up to 18.8% on M^3CoT and 6.9% on MMBench over confidence-based decoding. Code is available at https://github.com/infiniteYuanyl/VRCD.

02.
arXiv (CS.LG) 2026-06-15

Free Heavy-Tailed Lunch for Muon: A Theoretical Justification of Empirical Success

arXiv:2606.14560v1 Announce Type: cross Abstract: Non-Euclidean optimisation methods with matrix-valued updates, such as Muon and Scion, have recently shown strong empirical performance for training Transformer models, yet their theoretical advantages over Euclidean methods remain poorly understood. We address this gap in the heavy-tailed non-convex regime, where stochastic gradients have bounded $p$-th central moments, $p \in (1,2]$. We show that certain non-Euclidean methods achieve optimal sample complexity under stronger stationarity measures, while Euclidean methods incur additional dimension-dependent costs. As a consequence, for $m \times n$ matrices, Muon finds an $\varepsilon$-stationary point in nuclear norm within $\mathcal{O}\left(\min\{m, n\} \frac{\Delta_1 L}{\varepsilon^2} \left(\frac \sigma \varepsilon \right)^{\frac p {p-1}}\right)$ samples, absorbing heavy-tailed noise without extra dimension dependence, unlike Euclidean methods. We further prove this sample complexity, including its dimension dependence, is optimal for all first-order methods under nuclear-norm stationarity. Experiments on large language models support our theory. Surprisingly, our results suggest that other Schatten geometries beyond the spectral geometry of Muon can perform competitively in certain settings.

03.
arXiv (CS.CL) 2026-06-11

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparent difficulty by enriching graph structures, but structural complexity alone does not guarantee realized search difficulty: the intended search process can collapse through a cheaper identifying route. We formalize this gap with a shortcut-aware difficulty framework and identify four actionable shortcut risks: evidence co-coverage, single-clue selectivity, exposed constants, and prior-knowledge binding. To diagnose their realized effects, we use trajectory signatures including solving cost, answer hit time, and prior-shortcut rate. Guided by this framework, we introduce FORT, a Framework of Shortcut-Resistant Training-Data Synthesis. FORT constructs shortcut-resistant training data by controlling shortcut risks across entity selection, evidence graph construction, question formulation, and adversarial refinement. Experiments show that FORT induces longer pre-answer search and fewer shortcut patterns than existing open-source deep search datasets. Using the resulting trajectories, we train FORT-Searcher with supervised fine-tuning (SFT) only, and it achieves the best overall performance among comparable-size open-source search agents on challenging deep search benchmarks. Relevant resources will be made available at https://github.com/RUCAIBox/FORT-Searcher.

04.
arXiv (quant-ph) 2026-06-12

The Pound-Drever-Hall Method for Superconducting-Qubit Readout

arXiv:2512.03138v3 Announce Type: replace Abstract: Scaling quantum computers to large sizes requires the implementation of many parallel qubit readouts. Here we present an ultrastable superconducting-qubit readout method using the multi-tone self-phase-referenced Pound-Drever-Hall (PDH) technique, originally developed for use with optical cavities. In this work, we benchmark PDH readout of a single transmon qubit, using room-temperature heterodyne detection of all tones to reconstruct the PDH signal. We demonstrate that PDH qubit readout is insensitive to microwave phase drift, displaying $0.73^\circ$ phase stability over 2 hours, and capable of single-shot readout in the presence of phase errors exceeding the phase shift induced by the qubit state. We show that the PDH sideband tones do not cause unwanted measurement-induced state transitions for a transmon qubit, leading to a potential signal enhancement of at least $14$~dB.

05.
arXiv (CS.AI) 2026-06-19

Finetuning Vision-Language-Action Models Requires Fewer Layers Than You Think

arXiv:2606.20246v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models pre-trained on massive video-robot datasets have revolutionized robotic manipulation, yet their multi-billion parameter architectures impose prohibitive computational burdens during downstream fine-tuning and real-time inference. In this work, we reveal a highly non-trivial architectural characteristic of these continuous control foundation policies (e.g., pi_0, GR00T-N1.5): despite being trained on diverse physical trajectories, they exhibit severe layer-wise representational redundancy. To exploit this, we introduce a structural compression pipeline that is entirely training-free, bypassing the need of existing methods to load full-scale models to learn optimized token reductions or dynamic layer selectors. Instead, using only a single forward pass via Centered Kernel Alignment to identify redundant layer features, we remove twin layers to permanently compress the model depth by up to 50% across both the VLM backbone and the continuous control policy head. Downstream fine-tuning of this streamlined architecture yields a dual acceleration benefit: a 40-50% reduction in training time and up to 30% faster real-time inference, while matching or exceeding full-scale base model performance. We comprehensively validate our method across three simulation benchmarks (LIBERO, RoboCasa, SimplerEnv) and 10 diverse real-world manipulation tasks across 4 unique robotic embodiments. These results prove that advanced VLAs require significantly fewer layers than previously assumed, offering a highly compute-efficient paradigm for scalable robot learning.

06.
arXiv (CS.CV) 2026-06-16

Systematic Evaluation of Novel View Synthesis for Video Place Recognition

The generation of synthetic novel views has the potential to positively impact robot navigation in several ways. In image-based navigation, a novel overhead view generated from a scene taken by a ground robot could be used to guide an aerial robot to that location. In Video Place Recognition (VPR), novel views of ground locations from the air can be added that enable a UAV to identify places seen by the ground robot, and similarly, overhead views can be used to generate novel ground views. This paper presents a systematic evaluation of synthetic novel views in VPR using five public VPR image databases and seven typical image similarity methods. We show that for small synthetic additions, novel views improve VPR recognition statistics. We find that for larger additions, the magnitude of viewpoint change is less important than the number of views added and the type of imagery in the dataset.

07.
arXiv (CS.CV) 2026-06-12

Magnifying What Matters: Attention-Guided Adaptive Rendering for Visual Text Comprehension

Visual Text Comprehension (VTC) renders text into images for a vision-language model (VLM) to read, sidestepping LLM context-window limits and powering applications from long-page OCR to multi-page memory QA. Yet existing VTC pipelines treat rendering and layout as a fixed, content-agnostic preprocessing step and offer little mechanistic understanding of how VLMs internally process visualized text. Through a focused empirical study on VTC QA tasks, we reveal that VLMs exhibit a localization-without-utilization regime: evidence-localizing attention emerges sharply in the middle-to-late layers and is largely decoupled from answer correctness, yet simply enlarging the localized spans on the rendered page recovers a large fraction of the failures. Building on these observations, we propose AGAR (Attention-Guided Adaptive Rendering), a training-free, model-agnostic method that leverages a VLM's own middle-to-late layer attention to identify the top-K important visual patches, maps them back to word spans, and re-renders the page with those spans enlarged before re-inferring the answer. Extensive experiments across nine VTC benchmarks (short-form, long-context, and multi-page memory QA) and four VLM backbones show that AGAR (i)consistently improves off-the-shelf VLMs as a plug-and-play enhancement, (ii)composes with VLM post-training to yield further gains, and (iii)remains robust under both visual- and text-side input degradation.

08.
arXiv (CS.AI) 2026-06-18

Why SWAVE May Not Be All You Need:A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

arXiv:2606.18324v1 Announce Type: cross Abstract: SWave is a complex-valued recurrent language model (169.26M parameters, D=384, L=16, T=2048) trained on FineWeb-Edu using 2xH100 NVL. It was designed around three founding premises: that representing language as complex waves rather than real-valued numbers enables richer information encoding; that a Cayley-parameterised unitary transition provides a mathematical guarantee against state decay or explosion; and that a hidden state which rotates rather than shrinks preserves signal integrity over arbitrarily long contexts. The core of SWave evolved substantially across three development phases. The Resonance Head was found to structurally admit imaginary-channel collapse as a global loss minimum (a failure mode we term cos-domination collapse) and was superseded by an untied head with independent real and imaginary embedding tables from the Phase-Associative Memory (PAM) architecture. This resolved the degenerate minimum and enabled stable 200,000-step training (best-step PPL 22.0 at step 89,861). ComplexNorm and the Wave Propagation Scan proved load-bearing throughout all three phases and were retained to the final architecture. ProtectGatedScan was reframed as a structural prior rather than a learned behaviour. The four multi-scale retention concepts showed no measurable improvement under controlled evaluation and were found non-load-bearing. The ComplexGatedUnit was superseded by a real-valued squared-ReLU channel mixer with fewer parameters. The auxiliary training objectives showed no benefit once structural constraints were resolved. The investigation yields a formal characterisation of cos-domination collapse, a parallel scan with a log-space backward pass for numerical stability, six transferable engineering principles for complex-valued recurrent training, and a plan-to-code traceability methodology for catching structural divergences that conventional test suites miss.

10.
Nature (Science) 2026-06-17

Molecular basis of polyadenylated RNA fate determination in the nucleus

作者:

Eukaryotic genomes generate a plethora of polyadenylated (pA+) RNAs1,2, which are packaged into ribonucleoprotein particles (RNPs). To ensure faithful gene expression, functional pA+ RNPs, including protein-coding RNPs, are exported to the cytoplasm, whereas transcripts within non-functional pA+ RNPs are degraded in the nucleus1–4. How cells distinguish these opposing fates remains unknown. The DExD-box ATPase UAP56 (also known as DDX39B) is a central component of functional pA+ RNPs, and promotes their docking to the nuclear pore complex-anchored TREX-25,6, which triggers transcript release from UAP56 to facilitate export7. Here we reveal that the poly(A) tail exosome targeting (PAXT) connection8 binds a TREX-2-like module, which releases pA+ RNAs from UAP56 for decay by the nuclear exosome. The core of this module consists of a LENG8–PCID2–SEM1 trimer, which we show is structurally and biochemically equivalent to the central GANP–PCID2–SEM1 trimer of TREX-2. Mutagenesis and transcriptomic data demonstrate that the nuclear fate of pA+ RNPs is governed by the contending actions of nucleoplasmic PAXT and nuclear pore complex-associated TREX-2, which interpret RNA-bound UAP56 as a signal for RNA decay or export, respectively. As RNA targets of PAXT are generally short and intron-poor, we propose an overall model for pA+ RNP fate determination whereby the distinct sub-nuclear localizations of PAXT and TREX-2 govern the degradation of short non-functional pA+ RNAs while allowing export of their longer and functional counterparts. Biochemical, structural and cell biological analyses reveal that UAP56 (DDX39B) assembles with a TREX-2–like module that redirects non-functional polyadenylated RNAs from export to degradation.

11.
medRxiv (Medicine) 2026-06-12

Genetic basis of dynamic brain states reveals cellular and disease associations

Dynamic resting-state fMRI captures the time-varying patterns of brain activity that are obscured by static approaches. Hidden Markov Models (HMMs) characterise these dynamics as recurring whole-brain states and quantify their fractional occupancy (FO), the proportion of time spent in each state, yet the biological basis of inter-individual variation in FO remains unclear. Using data from 52,335 White UK Biobank participants, with replication in East and South Asian subsamples, this study examined the heritability, cellular and neurotransmitter basis of brain states, and their links with complex phenotypes. FO was significantly heritable and enriched for neuronal populations, particularly glutamatergic and GABAergic signalling. Analyses identified shared and state-specific loci and revealed genetic correlations, colocalisation, and potential causal relationships between FO and several phenotypes, including educational attainment, sleep duration, and disease risk. These findings establish dynamic brain states as biologically grounded intermediate phenotypes, linking genetic variation to neural dynamics, diseases and traits.

12.
arXiv (CS.AI) 2026-06-17

CogGen: Cognitive-Load-Inspired Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction

arXiv:2603.04438v3 Announce Type: replace-cross Abstract: Fully unsupervised deep generative modeling (FU-DGM) offers significant potential for compressively sampled magnetic resonance imaging (CS-MRI) reconstruction. Representative FU-DGM formulations, such as deep image prior (DIP) and implicit neural representation (INR), employ architectural bias to induce a low-dimensional manifold in the image space that aligns with the forward observation. However, as the underlying inverse system is highly ill-posed, prolonged iterative fitting in FU-DGM typically leads to poor efficiency and noise amplification. In this paper, guided by the cognitive principle of easy-to-hard learning, we propose CogGen, an FU-DGM framework that reformulates CS-MRI reconstruction as a staged inversion problem. Specifically, CogGen implements an self-paced curriculum learning (SPCL)-driven progressive scheduling strategy through an MRI-aware dual-threshold weighting criterion, which adaptively regulates k-space measurement participation. The data-consistency residual thresholding evaluates the fitting reliability of the current generator, while the k-space radius thresholding controls stage-wise measurement exposure, thereby avoiding uniform fitting throughout optimization. Theoretically, our analysis shows that, when early stages favor easy-to-fit measurements, CogGen yields a reduced local sufficient-iteration bound and a smaller cumulative noise-amplification bound, explaining the improved convergence behavior and reconstruction fidelity of CogGen within a finite iteration budget. Numerical experiments demonstrate that both CogGen instantiations, CogGen-DIP and CogGen-INR, achieve superior performance over prevailing CS-MRI reconstruction techniques, including unsupervised and supervised pipelines.

13.
arXiv (quant-ph) 2026-06-11

Experimental straintronics in nanotube quantum dots

arXiv:2606.12180v1 Announce Type: cross Abstract: Single-wall carbon nanotubes (SWCNTs) are narrow ribbons of graphene with atomically precise edges and a single quantum transport channel, at experimentally-relevant dopings. This makes them ideal systems to harness quantum transport straintronics (QTS), i.e. using mechanical strain to control accurately quantum transport. We present QTS data from three single-wall carbon nanotube quantum dot (SWCNT-QD) transistors over a broad range of in-situ tunable and reversible uniaxial strain ($\Delta\varepsilon_mech\approx$ 0 to 3 %). We first present the nanofabrication of the suspended SWCNT transistors whose channel lengths are $\approx$ 30 nm. The channels are strained by moving gold clamps holding firmly the nanotubes. We present detailed charge transport data, $dI/dV_{B} - V_{B} - V_{G}$ and $dI/dV_{B} - V_{B} - \Delta\varepsilon_mech$, showing a large mechanical-gating effect of the SWCNT-QDs. The precise reversibility of the data, and their agreement with QTS theory, confirms that the tubes are strained elastically. We demonstrate that the mechanical control of the QD doping is not due to capacitive-gating effects, but to quantitatively predictable bandstructure changes including a strain-tunable bandgap. This precise mechanical control of the doping and bandgap of SWCNT-QDs could find applications in qubits, condensed matter physics, and homojunction molecular transistors.

14.
arXiv (quant-ph) 2026-06-12

Symmetry and Topology of Monitored Quantum Dynamics

arXiv:2412.06133v4 Announce Type: replace-cross Abstract: The interplay between unitary dynamics and quantum measurements induces diverse phenomena in open quantum systems with no counterparts in closed quantum systems at equilibrium. Here, we generally classify Kraus operators and their effective non-Hermitian dynamical generators, thereby establishing the tenfold classification for symmetry and topology of monitored free fermions. Our classification elucidates the role of topology in measurement-induced phase transitions and identifies potential topological terms in the corresponding nonlinear sigma models. Furthermore, we establish the bulk-boundary correspondence in monitored quantum dynamics: nontrivial topology in spacetime manifests itself as topologically nontrivial steady states and gapless boundary states in Lyapunov spectra, such as Lyapunov zero modes and chiral edge modes, leading to the topologically protected slowdown of dynamical purification.

15.
arXiv (CS.AI) 2026-06-15

Learning optimal policies from event logs through reinforcement learning: a comparison of deep and MDP-based approaches

arXiv:2303.09209v2 Announce Type: replace Abstract: Prescriptive Process Monitoring is an emerging area within Process Mining that focuses on recommending actions to optimize business outcomes. Most existing works prescribe pre-defined interventions, i.e., sets of actions applied to ongoing process executions to achieve a specific objective or Key Performance Indicator (KPI). In contrast, only a few approaches have explored learning and evaluating optimal behavioral policies, i.e., general strategies that determine the best sequence of actions to maximize a desired KPI. In this paper, we address the problem of learning optimal behavioral policies by proposing an AI-based approach that learns an optimal policy directly from historical process executions using Reinforcement Learning (RL) to recommend the best actions for optimizing a KPI. To this end, we employ two RL techniques. The first is a classical model-based approach that extends previous work by the authors through the construction of a Markov Decision Process (MDP) capturing process behavior. The second is a model-free technique based on offline Deep RL. Unlike state-of-the-art work, we aim to minimize the use of domain knowledge and learn optimal policies directly from historical event data. This allows us to learn when to apply interventions and discover effective ones directly from data. Moreover, we target complex scenarios involving external actors, where the process owner controls only part of the activities. We adopt a data-driven Business Process Simulation (BPS) environment to evaluate the learned policies. Results show that both methods improve the targeted KPI with similar effectiveness, while the model-based approach outperforms offline Deep RL in computational efficiency.

16.
arXiv (CS.AI) 2026-06-16

TNODEV: Toolbox for Neural ODE Verification

arXiv:2606.16567v1 Announce Type: new Abstract: Neural ordinary differential equations (neural ODE) have started to appear in safety critical settings such as continuous-time controllers for cyber-physical systems and classifiers integrated into automated decision pipelines, raising the question of whether their behavior can be formally verified. Existing tools dedicated to neural ODE provide only a single reachability call without iterative input set refinement, limiting the precision of their verdicts to whatever one reachability call can deliver. We present TNODEV, the first sound formal verifier for neural ODE that integrates a falsification checker, a fast interval-based reachability backend based on continuous-time mixed monotonicity, a verification and refinement loop with three input-set splitting heuristics, and a parallel scheduler in a single end-to-end pipeline. TNODEV supports safe-set inclusion verification on pure neural ODE, neural ODE in closed loop with a neural network controller and general neural ODE (GNODE), with the safe set specified either as an interval or as the half-space intersection induced by a target classification label. We evaluate TNODEV on a range of benchmarks across safe-set inclusion and classification-robustness properties, including a direct reachability comparison against NNV~2.0 and CORA and a verification comparison against NNV2.0 on MNIST general neural ODE classifiers.

18.
arXiv (CS.CV) 2026-06-16

Text-Vision Co-Instructed Image Editing

Existing image editing methods can be generally categorized into textual instruction-based and visual prompt-based ones. Textual instructions are semantically expressive, but are limited by the coarse granularity of spatial control of the editing results. In contrast, visual prompts such as drag and point can provide precise spatial guidance, but are limited by the inherent ambiguity in semantic intent. To unify the strength of textual and visual prompts, we present Text-Vision Co-Instructed Image Editing, which jointly models textual instructions as semantic intent and sparse visual instructions as spatial guidance, aiming to achieve precise and intent-faithful image manipulation. To this end, we first construct a textual-visual instruction paired dataset with more than 23K samples derived from dynamic videos, enabling aligned supervision for cross-modal instruction. We then propose TV-Edit, a Textual-Visual instruction unified Editing framework to contextualize drag or point-based visual instructions with image-text semantics and lift them into semantic-aware control representations for pretrained editing backbones. By integrating semantic intent and spatial constraints, TV-Edit leads to more precise spatial control, less instruction ambiguity, and stronger structural consistency than text-only or drag-based alternatives. Finally, we establish TV-Edit-Bench, a deliberately designed benchmark to evaluate semantic faithfulness, spatial alignment, and visual consistency with ground-truth references and controlled textual-visual variations for reliable assessment. Our experiments across multiple editing backbones demonstrate that TV-Edit consistently yields more precise and intent-faithful edits, significantly outperforming state-of-the-art instruction-based and drag-based baselines.

19.
arXiv (CS.AI) 2026-06-17

A Machine-Learned Comorbidity Index

arXiv:2606.17450v1 Announce Type: new Abstract: Traditional comorbidity scores (e.g., Charlson and Elixhauser) are widely used for risk adjustment and patient stratification, but they have two key limitations: (i) they are largely mortality-centric and do not align well with other clinical outcomes, and (ii) their linear, rule-based structure cannot capture nonlinear, outcome-specific risk relationships. We propose a Machine-Learned Comorbidity Index (MLCI) that maps diagnosis codes to a single scalar by maximizing the normalized Hilbert-Schmidt Independence Criterion (nHSIC) between the learned score and multiple clinical outcomes. MLCI captures nonlinear risk-outcome dependence and is supported by a theory that characterizes when a unified, informative admission-level ordering can be achieved across outcomes. Empirical results on multiple benchmark electronic health record (EHR) datasets show that MLCI outperforms strong baselines across multiple evaluation metrics.

20.
medRxiv (Medicine) 2026-06-16

Deployment-readiness audit of calibration, clinical utility, and fairness in perioperative infection prediction

Objective: Clinical risk scores intended to guide patient-level decisions can show strong average performance. However, predicted probabilities can be systematically too high or too low in specific subgroups even when overall performance is strong. We audited deployment readiness of a strong end-of-surgery postoperative infection model across clinically relevant subgroups and tested mitigation strategies in miscalibrated subgroups. Materials and Methods: We analyzed out-of-fold predictions for 10,719 surgical procedures at a Swiss tertiary hospital, with 504 postoperative bacterial infection events. Prespecified axes were recorded sex, age stratum, and an EHR-derived physiological-reserve proxy. Within subgroups and pairwise intersections, we evaluated discrimination, calibration, threshold-specific errors, and decision-curve net benefit at the prespecified operating threshold. We compared group-specific isotonic recalibration with Wasserstein-barycenter postprocessing and demonstrated portability in SUPPORT2. Results: Overall AUROC was 0.876. While sex-marginal discrimination was similar in women and men (0.878 vs 0.875), age and reserve stratification revealed deployment-readiness failures. Calibration-in-the-large ranged from -0.86 in frail patients to -2.47 in non-frail patients. At the 0.10 operating threshold, decision-curve net benefit was positive in frail patients but negative in pre-frail and non-frail patients. Isotonic recalibration corrected average physiological-reserve-stratified calibration without worsening Brier scores, whereas Wasserstein postprocessing worsened calibration in most procedure clusters. Discussion: Discrimination-only or sex-marginal evaluation would have missed subgroup failures with clinical-utility implications. Conclusion: Subgroup fairness audits for clinical deployment should jointly evaluate discrimination, calibration, and utility. We implemented the audit as the open-source isitfair framework for identifying deployment-relevant subgroup failures, comparing mitigation strategies, and generating structured reports.

21.
arXiv (CS.CV) 2026-06-15

SMART: Scalable Mesh-free Aerodynamic Simulations from Raw Geometries using a Transformer-based Surrogate Model

Machine learning-based surrogate models have emerged as more efficient alternatives to numerical solvers for physical simulations over complex geometries, such as car bodies. Many existing models incorporate the simulation mesh as an additional input, thereby reducing prediction errors. However, generating a simulation mesh for new geometries is computationally costly. In contrast, mesh-free methods, which do not rely on the simulation mesh, typically incur higher errors. Motivated by these considerations, we introduce SMART, a neural surrogate model that predicts physical quantities at arbitrary query locations using only a point-cloud representation of the geometry, without requiring access to the simulation mesh. The geometry and simulation parameters are encoded into a shared latent space that captures both structural and parametric characteristics of the physical field. A physics decoder then attends to the encoder's intermediate latent representations to map spatial queries to physical quantities. Through this cross-layer interaction, the model jointly updates latent geometric features and the evolving physical field. Extensive experiments show that SMART is competitive with and often outperforms existing methods that rely on the simulation mesh as input, demonstrating its capabilities for industry-level simulations.

22.
arXiv (CS.AI) 2026-06-19

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

arXiv:2606.20470v1 Announce Type: cross Abstract: Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with other agents. These capabilities make prompt-injection and jailbreak attacks more consequential, especially as attackers adopt model-guided automation to scale probing, prompt refinement, and response evaluation. This work analyzes the resulting attack-defense setting through a probabilistic model of a target system, its defense mechanism, and the attacker's automated judge. Our analysis shows that conventional detect-and-block defenses can allow attacker success rate (ASR) to approach one as the query budget grows, since predictable refusals provide useful feedback to automated search. We then examine detect-and-misdirect, where detected malicious interactions receive controlled, non-operational responses designed to induce false-positive errors in the attacker's judge. This strategy reduces the positive predictive value of attacker-selected candidates and yields a bounded asymptotic ASR. We evaluate a proof-of-concept realization of this strategy through Contextual Misdirection via Progressive Engagement (CMPE), a lightweight conversational misdirection method designed to replace predictable refusal text with safe but strategically misleading responses in automated jailbreak settings. On jailbreak benchmarks, CMPE reduces estimated ASR upper bounds by up to two orders of magnitude and nearly eliminates verified attack success in end-to-end PAIR and GPTFuzz attack runs.

23.
arXiv (CS.CL) 2026-06-12

M\"OVE: A Holistic LLM Benchmark for the German Public Sector

We present M\"OVE (Modelle für die \"Offentliche Verwaltung Evaluieren), a holistic benchmark for evaluating large language models (LLMs) in the context of the German public sector. While LLMs are increasingly adopted in public administration, model selection remains largely ad hoc, and existing benchmarks offer limited guidance: they are predominantly English-centric, US-centric in content, and focus exclusively on task performance. M\"OVE addresses these gaps by evaluating 39 models across two complementary dimensions. Performance criteria cover summarization, question answering, and topic extraction. Governance criteria assess hallucination tendencies, energy consumption, provider transparency, and alignment with German constitutional values and knowledge about positions by German political parties. In total, we utilize ten German-language datasets, including gold- and silverstandard datasets that we constructed to reflect public-administration domains. We employ a multi-metric evaluation strategy combining classical NLP metrics, embedding-based methods, and LLM-as-a-judge approaches. Our results show that no single model dominates across all criteria: top performers differ between tasks, and model size alone is a poor predictor of quality. We further evaluate the benchmark itself, analyzing its statistical precision, LLM judge reliability, the impact of our private datasets on model rankings, the sensitivity of our results to prompt formulation, and the validity of our energy consumption estimates. M\"OVE is designed as a living benchmark under active development; results are publicly available at https://moeve.bundesdruckerei.de/.

24.
arXiv (CS.CL) 2026-06-15

ADORE: Iterative Query Expansion with Retrieval-Grounded Relevance Feedback

LLM-based query expansion improves retrieval by enriching the original query with additional context. Yet most methods remain generation-driven, producing plausible pseudo-documents or expansions without checking how the target corpus responds. This can introduce retrieval drift, amplify misleading vocabulary, or miss terms that distinguish relevant from non-relevant documents. We argue that effective expansion requires retrieval-grounded feedback, not just single-pass generation or unverified iteration. We introduce ADORE (ADapt, Observe, Relevance Evaluate), an iterative framework that turns retrieval outcomes into feedback for the next expansion. At each round, an LLM generates pseudo-passages, a retriever exposes the corpus response, and a relevance assessor evaluates retrieved documents against the original query. These judgments identify what to reinforce, what remains undercovered, and what to suppress. Across TREC Deep Learning, BEIR, and BRIGHT, ADORE consistently outperforms strong query expansion baselines with notable improvements across nearly all evaluation settings, improving average nDCG@10 by 24.5% over BM25 and 3.6% over the strongest prior query expansion method on BEIR, and by 122.9% over BM25 and 9.2% over the best query expansion baseline on BRIGHT. Our code and data are publicly available.

25.
arXiv (math.PR) 2026-06-18

Evolution of Conditional Entropy for Diffusion Dynamics on Graphs

arXiv:2510.19441v2 Announce Type: replace-cross Abstract: The modeling of diffusion processes on graphs is the basis for many network science and machine learning approaches. Entropic measures of network-based diffusion have recently been employed to investigate the reversibility of these processes and the diversity of the modeled systems. While results about their steady state are well-known, very few exact results about their finite-time evolution exist. Here, we introduce the conditional entropy of heat diffusion in graphs, and outline a mathematical framework that contextualizes diffusion and conditional entropy within the theories of continuous-time Markov chains and information theory. In particular, we highlight that this entropic measure satisfies an information-theoretical version of the second law of thermodynamics, thereby providing a parallelism between diffusion dynamics on networks and their physical counterparts. Furthermore, we obtain explicit results for its evolution on complete, path, and circulant graphs, as well as a mean-field approximation for Erdös-Rényi graphs. We also obtain asymptotic results for general networks and provide bounds for the evolution of conditional entropy. Finally, we experimentally demonstrate several properties of conditional entropy for diffusion over random graphs, such as the Watts-Strogatz model.