Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
PLOS Medicine 2026-05-21

U = U for all: Advancing equity in HIV prevention

by Thiago S. Torres, Paula M. Luz Suppression of HIV with antiretrovirals eliminates HIV transmission risk, summarized as Undetectable = Untransmittable (U = U). However, U = U literacy remains unevenly understood and shared, and stigmas persist. Equitable and accurate awareness of U = U requires culturally tailored interventions, improved provider education, and supportive policy environments beyond biomedical evidence alone. Suppression of HIV with antiretrovirals eliminates HIV transmission risk, summarized as Undetectable = Untransmittable (U=U). However, U=U literacy remains unevenly understood and shared, and stigmas persist. In this Perspective, Thiago Torres and Paula Luz outline what is needed to improve equity and accuracy in global awareness and education of U=U.

02.
arXiv (math.PR) 2026-06-15

Upper tails for irregular graphs beyond the mean-field regime

arXiv:2606.14564v1 Announce Type: new Abstract: Let $G_{n,p}$ be the binomial random graph of density $p$ and let $X_H$ be the number of copies of a fixed graph $H$ in $G_{n,p}$. We prove asymptotically tight bounds on the logarithmic upper-tail probability of $X_H$ whenever $H$ is a connected, irregular graph with maximum degree $\Delta \ge 2$ and $p \ge n^{-1/\Delta - \varepsilon_H} (\log n)^{\omega(1)}$ for an explicit $\varepsilon_H >0$. These bounds are expressed in terms of a new variational problem that generalises the combinatorial optimisation problem arising from the naïve mean-field approximation. This new variational problem includes an entropy term that corresponds to the large number of embeddings of certain highly structured graphs in $K_n$. For a certain class of irregular graphs $H$ that we call stable, we show that this description of the upper-tail probability is valid in a range of densities that is optimal up to a poly($\log\log n$) factor. For a further subclass of stable graphs, which includes all irregular complete bipartite graphs, we show that this range of densities is optimal up to a multiplicative constant.

03.
arXiv (quant-ph) 2026-06-12

A ribbon ZX calculus for gauge theory

arXiv:2606.13551v1 Announce Type: cross Abstract: ZX calculus provides a graphical formalism for reasoning about quantum processes, built from two interacting Frobenius algebras associated with the Z and X bases of a qubit. While it has found widespread application in quantum information and computing, its relationship to quantum field theory has only recently begun to be explored. In this work, we further develop this connection by providing a generalization of ZX calculus to two-dimensional Yang Mills theory with a compact gauge group. The key observation is that both frameworks can be organized around the Hopf Frobenius algebraic structure associated with a group algebra, which can in turn be described by the diagrammatics of two dimensional topological quantum field theory. Given the well known relationship between gauge theory and gravity in two and three dimensions, our work paves the way for applications of ZX to low dimensional gravity.

04.
arXiv (CS.AI) 2026-06-17

Vulcan: Instance-specialized, Verifiable Systems Heuristics Through LLM-driven Search

arXiv:2512.25065v2 Announce Type: replace-cross Abstract: Systems resource management tasks rely primarily on hand-designed heuristics. However, growing hardware heterogeneity and workload diversity require heuristics specialized to particular deployment instances, making manual design expensive and difficult to scale. In this paper, we explore how to synthesize systems heuristics using LLMs. The main challenge is ensuring that generated heuristics execute safely, integrate correctly with the surrounding system, and still achieve strong performance. We propose Vulcan, a framework that identifies LLM-friendly interfaces that isolate core decision logic from the rest of the implementation. With Vulcan, LLM-generated code is restricted to simple stateless decision functions, while trusted runtime abstractions provide rich derived statistics for meaningful policy exploration without system-integration bugs. To ensure execution safety, LLMs synthesize heuristics in a restricted language, Anvil, that guarantees important properties by construction. We evaluate Vulcan across three well-studied domains and demonstrate up to 4.9x higher savings for spot-VM scheduling, up to 2x lower miss ratios for cache eviction, and up to 10% higher application performance for tiered-memory systems, while ensuring execution safety throughout.

05.
arXiv (CS.LG) 2026-06-15

MOSIC: Model-Agnostic Optimal Subgroup Identification with Multi-Constraint for Improved Reliability

arXiv:2504.20908v3 Announce Type: replace Abstract: Current subgroup identification methods typically follow a two-step approach: first estimate conditional average treatment effects and then apply thresholding or rule-based procedures to define subgroups. While intuitive, this decoupled approach fails to incorporate key constraints essential for real-world clinical decision-making, such as subgroup size and propensity overlap. These constraints operate on fundamentally different axes than CATE estimation and are not naturally accommodated within existing frameworks, thereby limiting the practical applicability of these methods. We propose a unified optimization framework that directly solves the primal constrained optimization problem to identify optimal subgroups. Our key innovation is a reformulation of the constrained primal problem as an unconstrained differentiable min-max objective, solved via a gradient descent-ascent algorithm. We theoretically establish that our solution converges to a feasible and locally optimal solution. Unlike threshold-based CATE methods that apply constraints as post-hoc filters, our approach enforces them directly during optimization. The framework is model-agnostic, compatible with a wide range of CATE estimators, and extensible to additional constraints like cost limits or fairness criteria. Extensive experiments on synthetic and real-world datasets demonstrate its effectiveness in identifying high-benefit subgroups while maintaining better satisfaction of constraints.

06.
arXiv (CS.AI) 2026-06-11

Mind the Perspective: Let's Reason Recursively for Theory of Mind

arXiv:2606.11724v1 Announce Type: new Abstract: Theory of Mind (ToM) reasoning requires inferring agents' beliefs from partial and asymmetric observations, which remains an open challenge for LLMs. Existing prompting-based approaches improve ToM reasoning through observable-event filtering or temporal belief chains, without explicitly modeling nested beliefs. We introduce RecToM, an inference-time framework for ToM reasoning that models nested beliefs via recursive perspective construction. RecToM constructs each character perspective from the preceding character perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective. We further provide a KD45 analysis showing that RecToM's perspective construction induces a well-formed belief modality beyond simple event filtering. Experiments on ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, across multiple LLM backbones show that RecToM consistently outperforms recent advanced approaches, achieving state-of-the-art performance. Notably, RecToM reaches 100\% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5, a benchmark requiring higher-order ToM reasoning.

07.
arXiv (CS.CL) 2026-06-11

On the Optimal Reasoning Length for RL-Trained Language Models

Reinforcement learning substantially improves reasoning in large language models, but it also tends to lengthen chain-of-thought outputs and increase computational cost. Although length-control methods have been proposed, the length-accuracy relationship they induce remains unclear. We train policies with several length-control methods on multiple base models in a controlled setup and find that, across both mathematical reasoning and code generation, accuracy is non-monotonic in output length, peaking at an intermediate value. Mode accuracy, however, continues to improve with length even in settings where sample accuracy plateaus or declines, indicating that the non-monotonic length-accuracy relationship is driven by dispersion around an increasingly correct center.

08.
arXiv (CS.CV) 2026-06-17

A Quantitative Analysis of Multimodal Biomarkers in Alzheimer's Disease

Despite increasing adoption of multimodal approaches in Alzheimer's Disease (AD) research – aimed at integrating molecular, structural, clinical, and genetic biomarkers to enhance disease characterization – the relationships among these modalities remain poorly understood. A systematic analysis of their dynamic interaction is essential for improving disease modeling, identifying redundant assessments, and reducing patient burden and acquisition costs. In this paper, we present a quantitative analysis of multimodal AD biomarkers by integrating tau-PET, structural MRI, cognitive scores (MMSE and CDR), and APOE4 data from 789 subjects drawn from the ADNI dataset. In our analyses, we (A) quantify cross-modal mutual information and explained variance to assess redundancy and predictive dependencies; (B) examine associations between tau topologies and structural atrophy across brain regions to select informative ROIs; (C) perform a statistical decomposition of the tau-cognition association into atrophy-related and atrophy-independent components; (D) and identify a dominant neurodegenerative trajectory that aligns with cognitive decline. This study provides a systematic characterization of cross-modal relationships, improving the interpretability and selection of biomarkers in AD. Code is publicly available at: https://github.com/antonioscardace/Multimodal-AD.

09.
arXiv (quant-ph) 2026-06-16

Excited-State Quantum Chemistry on Qumode-Based Processors via Variational Quantum Deflation

arXiv:2604.13457v3 Announce Type: replace Abstract: Variational quantum algorithms on bosonic quantum processors are an emerging paradigm for quantum chemistry calculations, exploiting the natural alignment between molecular structure and harmonic oscillator-based hardware. We introduce the qumode-based variational quantum deflation framework (QumVQD) for finding both electronic and vibrational excited state energies on qumode-based architectures. We validate the approach through electronic structure calculations on H$_{2}$ and linear H$_{4}$, where we introduce Hamming-weight filtering of the Fock basis to enforce particle number conservation and eliminate spurious eigenstates by reducing the required Hilbert space, which reduces the required number of qumodes in turn. We achieve agreement with full configuration interaction (FCI) using the STO-3G basis set within the chemical accuracy threshold at most points along the potential energy surfaces. Extending to the vibrational structure, we combine QumVQD with an existing Hamiltonian fragmentation approach based on Cartan subalgebra, allowing us to compute the vibrational eigenenergies of CO$_{2}$ and H$_{2}$S to spectroscopic accuracy with per-fragment circuits that scale as $O(N)$ in single-qumode gates and $O(N^2)$ in beam-splitter gates for $N$ qumodes. For the case of CO$_{2}$, we get total gate counts more than an order of magnitude smaller than those reported for qubit-based vibrational algorithms at this system size. These results demonstrate that bosonic quantum devices are a viable platform for excited-state quantum chemistry, particularly for vibrational problems where qubit-based methods incur substantial boson-to-qubit mapping overhead.

10.
arXiv (CS.AI) 2026-06-17

An Evaluation of Data Leakage Risks in Tool-Using LLM Agents in Realistic Scenarios

arXiv:2606.17114v1 Announce Type: cross Abstract: AI agents are increasingly being adopted in enterprise and personal settings with access to emails, databases, documents, and other tools where they can read, update, and disseminate sensitive information. Much of prior research on data leakage risks in agents has focused on adversarial data exfiltration through prompt injections and jailbreaks. However, sensitive information may also be exposed during non-adversarial use, creating leakage risks even when users issue benign requests. We report a joint evaluation by the Singapore AI Safety Institute and the Korea AI Safety Institute examining agent data leakage in 12 realistic, non-adversarial tasks spanning customer support, DevOps, web automation, and enterprise and personal productivity. The evaluation covers five risk types: lack of data awareness, audience awareness, policy compliance, data minimization, and access-boundary awareness. Both institutes tested a common set of scenarios mirroring real-world deployments using independent testing environments and task-specific LLM-judge rubrics. Across the three tested agents, none achieved fully correct and fully safe execution across all scenarios. Successful task completion often coincided with data-handling failures such as accessing unnecessary information or disclosing information to inappropriate recipients, indicating that capability and data-handling safety should be evaluated separately. Qualitative review also revealed claim-action mismatches, simulation-aware behavior, user-simulator role reversal, and interpretation gaps in automated judging. Overall, the results indicate that operational data leakage is a first-order agent-safety concern distinct from adversarial exfiltration and provide a methodology for future evaluations of agent data-handling safety.

11.
arXiv (CS.AI) 2026-06-11

Characterizing Software Aging in GPU-Based LLM Serving Systems

arXiv:2606.11916v1 Announce Type: cross Abstract: This paper proposes an empirical methodology to study software aging in GPU-based LLM serving systems. Traditional aging studies focus on CPU-centric software with relatively regular workloads; LLM serving is different, spanning a Python host and a CUDA device, handling requests whose cost varies by orders of magnitude, and relying on rapidly evolving software stacks. We run a 216-hour campaign across six co-located deployments under identical stress conditions, monitor host, device, and client metrics in parallel, and apply a statistical pipeline that accounts for autocorrelation and multiple testing. Our results reveal statistically significant memory aging in all deployments, with leak rates strongly dependent on the serving runtime and deployment configuration. Beyond these findings, we provide a reproducible framework that opens a research direction at the intersection of the software aging and rejuvenation and LLM serving communities.

12.
arXiv (CS.CV) 2026-06-11

Exploring Adaptive Masked Reconstruction for Self-Supervised Skeleton-Based Action Recognition

Recently, masked skeleton reconstruction models have emerged as strong action representation learners, driving significant progress in self-supervised skeleton-based action recognition. However, existing state-of-the-art methods must predict an exceedingly large number of spatiotemporal patches, significantly prolonging training time. Besides, by treating all spatiotemporal regions equally during reconstruction, these models are distracted from learning the critical motion patterns that underlie action semantics. To address these challenges, we propose Adaptive Masked Reconstruction (AMR), a faster and stronger pre-training framework. We first decouple the decoder from the encoder, enabling flexible prediction of larger spatiotemporal patches and dramatically reducing reconstruction complexity. Given that larger patches contain more complex information, which is challenging to predict and consequently degrades performance, we accordingly introduce an adaptive guidance module. This module identifies regions of high motion informativeness, guiding the model to focus on the most discriminative parts of each patch and alleviating reconstruction difficulty. Experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets demonstrate that AMR not only accelerates pre-training substantially but also improves downstream recognition accuracy, surpassing current state-of-the-art approaches.

13.
arXiv (math.PR) 2026-06-16

A Low-Regularity Semigroup Sewing Lemma via Quotient Structures

arXiv:2606.16164v1 Announce Type: new Abstract: We develop a low-regularity Sewing theory for the semigroup coboundary $\hat\delta=\delta-a$ associated with a strongly continuous semigroup $S$. Unlike the ordinary low-regularity Sewing problem, the semigroup setting has an intrinsic algebraic non-uniqueness below the threshold $1$, in the sense that solutions are canonical only modulo semigroup cocycles. Accordingly, the natural target is a quotient space rather than an increment space. We identify this quotient structure and construct the corresponding semigroup Sewing map. The construction uses a frozen terminal-time transform, which rewrites semigroup defects, for each terminal time, as ordinary low-regularity Sewing problems on a frozen simplex. This reduction, however, does not by itself produce a genuine semigroup increment; the main additional step is to prove that the frozen solution classes are compatible as the terminal time varies and hence assemble into a canonical quotient class for $\hat\delta$. This yields canonical classes for $0

14.
arXiv (CS.CV) 2026-06-15

SMART: Scalable Mesh-free Aerodynamic Simulations from Raw Geometries using a Transformer-based Surrogate Model

Machine learning-based surrogate models have emerged as more efficient alternatives to numerical solvers for physical simulations over complex geometries, such as car bodies. Many existing models incorporate the simulation mesh as an additional input, thereby reducing prediction errors. However, generating a simulation mesh for new geometries is computationally costly. In contrast, mesh-free methods, which do not rely on the simulation mesh, typically incur higher errors. Motivated by these considerations, we introduce SMART, a neural surrogate model that predicts physical quantities at arbitrary query locations using only a point-cloud representation of the geometry, without requiring access to the simulation mesh. The geometry and simulation parameters are encoded into a shared latent space that captures both structural and parametric characteristics of the physical field. A physics decoder then attends to the encoder's intermediate latent representations to map spatial queries to physical quantities. Through this cross-layer interaction, the model jointly updates latent geometric features and the evolving physical field. Extensive experiments show that SMART is competitive with and often outperforms existing methods that rely on the simulation mesh as input, demonstrating its capabilities for industry-level simulations.

15.
arXiv (CS.CL) 2026-06-15

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

AI coding agents have rapidly transformed software engineering, powering widely used interactive coding assistants. Despite their interactive real-world use, existing benchmarks evaluate them as fully-autonomous systems. In this work, we introduce Dialogue SWE-Bench, an automatic benchmark dataset for evaluating the ability of coding agents to resolve real-world software engineering problems through dialogue with a user. We design a novel, persona-grounded user simulator to support our task evaluation, and augment our task evaluation with automatic evaluations of dialogue quality. We also propose a new schema-guided agent, aimed at improving the dialogue capabilities of off-the-shelf coding agents, which improves over strong baselines by 3-14%. Our results indicate that better coding models do not always correspond to better dialogue models, suggesting that dialogue capability is a distinct and currently understudied dimension of coding agent performance.

16.
arXiv (quant-ph) 2026-06-16

Interaction and non-Hermiticity controlled transmission in extended Su-Schrieffer-Heeger models

arXiv:2606.15245v1 Announce Type: cross Abstract: We study the transport characteristics of an extended version of the Su-Schrieffer-Heeger (SSH) model with next-nearest-neighbor (NNN) interactions and non-Hermitian onsite energies. We observed that transport in such a system is significantly modified by the NNN interaction and the non-Hermitian terms. The transmission coefficient exhibits oscillatory behavior as the strength of the NNN interaction varies in a fixed-length chain. Moreover, the transmission coefficient also shows oscillation with system size for a fixed strength of the NNN interaction. We find that novel oscillatory behavior of the transmission coefficient, arising form the NNN interaction, is a unique feature of such a model and has not been reported previously. The presence of the non-Hermitian terms also enhances/reduces the transmission coefficient depending on the values of the other system parameters like intra-, inter- and NNN hopping. It appears from our study that both the NNN interaction and the non-Hermiticity introduce significant changes in the transport properties of the extended SSH chain, which are not observed in the standard Hermitian nearest-neighbour variant of the SSH model.

17.
arXiv (CS.CV) 2026-06-12

Proto-LeakNet: Towards Signal-Leak Aware Attribution in Synthetic Human Face Imagery

The growing sophistication of synthetic image and deepfake generation models has turned source attribution and authenticity verification into a critical challenge for modern computer vision systems. Recent studies suggest that diffusion pipelines unintentionally imprint persistent statistical traces, known as signal-leaks, within their outputs, particularly in latent representations. Building on this observation, we propose Proto-LeakNet, a signal-leak-aware and interpretable attribution framework that integrates Closed-set classification with a density-based Open-set evaluation on the learned embeddings, enabling analysis of unseen generators without retraining. Acting in the latent domain of diffusion models, our method re-simulates partial forward diffusion to expose residual generator-specific cues. A temporal attention encoder aggregates multi-step latent features, while a feature-weighted prototype head structures the embedding space and enables transparent attribution. Trained solely on closed data and achieving a Macro AUC of 98.13\%, Proto-LeakNet learns a latent geometry that remains robust under post-processing, surpassing state-of-the-art methods, and achieves strong separability both between real images and known generators, and between known and unseen ones. The codebase is available at the following link: https://github.com/claudiunderthehood/Proto-LeakNet .

18.
arXiv (CS.CL) 2026-06-18

Mitigating Scoring Errors and Compensating for Nonverbal Subtests in Speech-Based Dementia Assessment

Early detection of cognitive impairment relies on neuropsychological tests to minimize subjectivity by assessing multiple cognitive domains. Speech-based evaluation can support diagnostics and improve accessibility, but transcription errors and the omission of nonverbal subtests (e.g., motor skills) limit accuracy. Beyond conventional test scores, speech-derived features can provide additional insights into cognitive status. This study investigates the speech-based evaluation of the German "Syndrom-Kurz-Test," a standardized dementia screening test comprising verbal and motor subtests. We train models that integrate transcript-derived scores and Whisper embeddings per verbal subtest to reduce scoring errors. To compensate for missing motor subtests, we then leverage these fused representations to approximate expert overall ratings. Despite omitting subtests, our models strongly correlate with expert ratings and efficiently and accurately discriminate between cognitive status groups.

19.
arXiv (CS.CL) 2026-06-16

A Systematic Evaluation of Large Language Models for PTSD Severity Estimation: The Role of Contextual Knowledge and Modeling Strategies

Large language models (LLMs) are increasingly being used in a zero-shot (generative) fashion to assess mental health conditions, yet we have limited knowledge on what factors affect their accuracy. In this study, we use a clinical dataset of natural language narratives and self-reported PTSD severity scores from 1,437 individuals to comprehensively evaluate the performance of 11 state-of-the-art LLMs. To understand the factors affecting model's assessment accuracy, we systematically varied (i) contextual knowledge prompted to the models like subscale definitions, distribution summary, and interview questions, and (ii) modeling strategies including zero-shot vs few shot, amount of reasoning effort, model sizes, structured subscales vs direct scalar prediction, output rescaling and nine ensemble methods. Our findings indicate that (a) LLMs are most accurate when provided with detailed construct definitions and context of the narrative, even exceeding human raters agreement with self-reported scores; (b) increased reasoning effort leads to better estimation accuracy; (c) performance of open-weight models (Llama, DeepSeek) plateaus beyond 70B parameters while closed-weight (gpt-o3-mini, gpt-5) alternatives improve with newer generations; and (d) best performance is achieved when ensembling a supervised model with the zero-shot LLMs. Beyond agreement with self-reports, LLMs' estimates discriminated PTSD severity from depression, anxiety, and alcohol use, and prospectively predicted future mental healthcare expenditure. Together, these results suggest that contextual knowledge and modeling strategies meaningfully affect accuracy and clinical utility of LLM-based assessments of PTSD severity.

20.
arXiv (CS.AI) 2026-06-16

Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades

arXiv:2606.15308v1 Announce Type: new Abstract: While multimodal large language models (MLLMs) have shown strong visual reasoning abilities, serving a large model for every query is computationally expensive. MLLM cascades mitigate this cost by first querying a weak but cheaper model and deferring to a strong model when the weak model's output is unconfident. However, since the weak model's confidence directly controls compute allocation, these systems expose a new attack surface: an adversary can manipulate confidence so that their queries are consistently deferred to the strong model. Motivated by this vulnerability, we introduce the Forced Deferral Attack (FDA), an adversarial image attack that lowers the weak model's confidence and causes cascades to route queries to the strong model. FDA learns a universal border trigger by optimizing a temperature-flattened objective. This objective pushes the weak model's token distribution on triggered inputs toward less concentrated targets constructed from its clean responses. Across datasets, model families, and deferral metrics, FDA consistently increases strong-model routing while outperforming image-perturbation and prompt-injection baselines. These results show that MLLM cascades are vulnerable to attacks that manipulate compute allocation, forcing unintended strong-model usage without directly targeting answer correctness.

21.
arXiv (CS.AI) 2026-06-12

From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

arXiv:2606.12603v1 Announce Type: cross Abstract: Autonomous long-horizon sidewalk navigation is essential for micro-mobility applications such as robotic food delivery and assistive electronic wheelchairs. Unlike autonomous driving on the road, long-horizon sidewalk navigation requires precise maneuvering through unpredictable sidewalk terrains and pedestrians, with a lightweight perception stack as minimal as a single monocular RGB camera. While imitation learning (IL) from demonstrations offers a practical solution, the resulting autopilot policy often suffers from compounding errors, a lack of social compliance on sidewalks, and deficiencies in counterfactual reasoning to handle complex situations. To address these challenges, we introduce FlowPilot, a mapless navigation policy that achieves robust and efficient long-horizon navigation performance using only a monocular RGB camera. We first propose to use anchored flow matching as an action representation for policy pre-training on large-scale robot fleet data and to capture the diverse, complex, multimodal distribution of sidewalk navigation behaviors. To bridge the gap between imitation and alignment, we further design a human-in-the-loop preference learning scheme to tune the policy on a small amount of human intervention data. It strengthens the model's counterfactual reasoning and social compliance on sidewalks. We evaluate FlowPilot through extensive simulation and real-world experiments in diverse sidewalk environments. FlowPilot achieves 42% success rate and 66% route completion in simulation, while FlowPilot-HP further improves real-world robustness and social compliance, reducing IR by 40.0% and NIR by 52.1% relative to the base model.

22.
arXiv (CS.CV) 2026-06-16

DifferAD-R1: A Difference-Guided IndustrialAnomaly Localization with Multimodal LargeLanguage Models

Industrial anomaly localization aims to accurately identify and localize abnormal regions in industrial products, addressing the critical challenge of detecting unseen defect categories in real-world scenarios. Traditional closed-set methods often suffer from poor cross-scenario generalization, while existingMultimodal Large Language Model (MLLM)-based approachesface two core limitations: they either adopt QA-style paradigmsmisaligned with the practical demands of localization, or relyon standard optimization techniques such as Group RelativePolicy Optimization (GRPO), which fails to deliver effectivelearning signals for subtle defects. To tackle these issues, thispaper proposes DifferAD-R1, an MLLM-augmented reinforcement learning framework tailored for industrial anomaly localization. We design a Difference-Guided dual-image paradigm,which reformulates the localization task as a one-shot difference grounding problem to effectively explore cross-scenarioanomalies. A Dual-Consistency Localization Reward is developedfor hard-to-detect anomalies, enhancing optimization stabilityand robustness. Additionally, we integrate a difficulty-awarestrategy with adaptive reweighting and group-wise resamplingto prioritize learning on challenging instances. To facilitateevaluations in real-world industrial settings, we construct theAD-DualDiff dataset, comprising 13K paired images across 20categories. Experimental results demonstrate that DifferADR1 significantly outperforms existing baselines and achievescompetitive performance compared to large-scale models likeQwen3-VL (235B parameters). Our code is publicly availableat: https://github.com/Rong2026/work-1.

23.
arXiv (CS.LG) 2026-06-17

Reward hacking in physical reinforcement learning revealed by turbulent drag reduction

arXiv:2606.06227v2 Announce Type: replace-cross Abstract: A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs; a memoryless policy cannot resolve the slow near-wall cycle it acts on; and a pressure-gradient reward pays for nominal drag reduction by pumping power through the wall. Two degenerate controllers achieve large drag reductions while total dissipation rises, so the reported figure can mask a more wasteful flow. We trace each fault to its cause and fix it: a differentiable projection that restores credit, a recurrent policy with a widened sensing stencil, and a reward scored on the true wall power. The corrected controller acts on the flow within a closed energy budget, earning a conservative $17\%$ under honest accounting.

24.
arXiv (CS.LG) 2026-06-18

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

arXiv:2606.18418v1 Announce Type: new Abstract: The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations supports individuals to understand and potentially alter unfavorable decisions in areas such as loan applications, job selections, and more, by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.

25.
arXiv (quant-ph) 2026-06-12

Measurement Geometry for Quantum Random Access Codes: Beyond Nayak Bound and Toward Optimality

arXiv:2606.12700v1 Announce Type: new Abstract: Quantum random access codes (QRACs) ask how well N classical bits can be encoded into M qubits while allowing any single bit to be recovered. Although the Nayak bound remains the standard general upper bound on the decoding probability, numerical evidence suggests a stronger upper bound in the small-qubit regime. In this work, we formulate the optimal decoding probability in terms of decoding measurements, reformulating QRAC design as a spectral problem for noncommuting measurements. Using this formulation, we give an elementary proof of the Nayak bound by simplifying the Chernoff-bound argument. Moreover, we refine the argument to obtain upper bounds that improve over Nayak's bound in the entire finite-size regime. The equality conditions of our bounds justify defining mutually unbiased projector-valued measurements (MUPVMs), a generalization of mutually unbiased bases. We show that decoding measurement of any two-qubit QRAC attaining the conjectured bound must form MUPVMs. We also show that any MUPVM, assisted by one ancillary qubit, yields a QRAC with optimal N-scaling decoding probability. Finally, we propose a new MUPVM-based construction for the (M+2,M)-QRAC family attaining the conjectured bound.