Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-15

FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA

arXiv:2602.23638v3 Announce Type: replace-cross Abstract: Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local updates can cause significant aggregation error and unstable training. We argue that a major source of this problem is rotational misalignment, arising from the rotational invariance of low-rank factorizations – semantically equivalent updates can be represented in different latent subspaces across clients since $(B_i R_i)(R_i^\top A_i) = B_i A_i$. When such misaligned factors are averaged directly, they interfere destructively and degrade the global update. To address this issue, we propose FedRot-LoRA, a federated LoRA framework that aligns client updates via orthogonal transformations prior to aggregation. This alignment preserves the semantic update while reducing cross-client subspace mismatch, without increasing communication cost or restricting model expressivity. We provide a convergence analysis that examines the aggregation error induced by factor-wise averaging and shows how rotational alignment yields a tighter upper bound on this error. Extensive experiments on natural language understanding and generative tasks demonstrate that FedRot-LoRA consistently outperforms existing federated LoRA baselines across a range of heterogeneity levels and LoRA ranks.
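
The rank-1 case makes the misalignment easy to see. In the sketch below, two clients hold the same update $B A$, but the second stores it with the 1x1 orthogonal factor $R = -1$. The vectors and the sign-based `align` helper are illustrative assumptions, not the paper's alignment procedure.

```python
# Toy rank-1 illustration of rotational misalignment in factor-wise averaging.
# All values and helper names are hypothetical, not taken from the paper.

def outer(b, a):
    """Return the matrix b a^T as a list of rows."""
    return [[bi * aj for aj in a] for bi in b]

def mean_vec(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def align(clients):
    """Flip each client's factors so its B agrees in sign with the first client's B.
    In rank 1 the only orthogonal transformations are +1 and -1."""
    ref_b = clients[0][0]
    aligned = []
    for b, a in clients:
        s = 1.0 if dot(b, ref_b) >= 0 else -1.0
        aligned.append(([s * x for x in b], [s * x for x in a]))
    return aligned

def factor_average(clients):
    """Average B and A separately, then multiply (the result stays rank 1)."""
    return outer(mean_vec([b for b, _ in clients]),
                 mean_vec([a for _, a in clients]))

b = [1.0, 2.0]
a = [3.0, -1.0, 0.5]
clients = [(b, a), ([-x for x in b], [-x for x in a])]  # same product b a^T

exact = outer(b, a)                     # mean of the (identical) products
naive = factor_average(clients)         # misaligned factors cancel to zero
fixed = factor_average(align(clients))  # aligned first, then averaged

print(naive)
print(fixed == exact)
```

For higher ranks the sign flip would be replaced by a full orthogonal matrix per client; how FedRot-LoRA chooses it is not specified in the abstract.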

02.
arXiv (CS.CL) 2026-06-19

Quality Over Clicks: Iterative Reinforcement Learning for Early-Stage E-Commerce Query Suggestion

Existing dialogue systems rely on query suggestion to enhance user engagement. Recent approaches mainly optimize generative models using click-through rate (CTR) models to align with user preferences. However, these methods are less effective in early-stage deployment scenarios, where click feedback is sparse and insufficient for training a reliable CTR model. To bridge this gap, we propose QualEQS, a quality-first iterative reinforcement learning framework for e-commerce query suggestion. We formalize actionable suggestion quality along three dimensions that directly affect downstream usability: answerability, factuality, and information gain. To continuously improve from online traffic without click supervision, we further propose group-level disagreement among candidate suggestions to identify ambiguous query contexts and mine hard training cases for iterative refinement. We also introduce EQS-Benchmark, a dataset of 16,949 real-world e-commerce queries for offline training and evaluation. Experiments show that our quality-based offline metrics correlate strongly with online performance, providing a practical evaluation recipe for sparse-feedback deployment. In both offline and online settings, QualEQS consistently outperforms strong baselines, yielding a 6.81% improvement in online ChatPV in a real-world enterprise-level conversational shopping assistant system.

03.
arXiv (CS.CV) 2026-06-17

Million-scale multimodal pollen microscopy with expert-guided foundation models

Automated pollen identification from microscopy remains a bottleneck in aerobiology, palaeoecology and biodiversity monitoring, because scalable systems must generalise across specimen preparation, scanner settings and geographic origins while retaining palynological interpretability. To address this gap, we present a million-scale multimodal pollen microscopy resource, Pollen AI Atlas, assembled from pure-species whole-slide bright-field images spanning four geographic origins, four scanner settings and 46 taxon labels across 31 botanical families. Seeded by one manually selected exemplar per source slide, token-level mining and filtering produced 1,511,390 released grain detections with 99.6% proposal precision in expert-curated test regions. Each detection was paired with machine-generated grain-level morphological captions from five open-weight vision-language models, guided by expert-verified palynological anchors, yielding structured descriptions of aperture systems, wall ornamentation, shape and size. Among the evaluated models, Gemma4 provided the most controlled primary caption set, combining tight length control, no leakage and the strongest text-retrieval performance. Baseline benchmarks with frozen visual features reached 88.16% top-1 accuracy, while cross-regional retrieval showed that caption-derived text embeddings remained robust when image similarity degraded (mAP@20 0.811 versus 0.262). Released data, annotations, captions, splits, code, and weights provide a benchmark for pollen recognition, cross-regional domain adaptation and domain-specific multimodal microscopy learning.

04.
arXiv (CS.CV) 2026-06-11

AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

05.
arXiv (CS.CV) 2026-06-16

Tool-IQA: Augmenting Image Quality Assessment with Simple Tools

Vision-Language Models (VLMs) have been increasingly adopted for Image Quality Assessment (IQA). However, current methods typically employ a static one-shot scoring paradigm, despite the fact that humans assess image quality through dynamic visual inspection, e.g., selectively adjusting views to verify details and subtle artifacts. Specifically, relying solely on a single-pass observation introduces two primary limitations: first, perceiving the image only at a global scale restricts the assessment of finer local details; second, the original intensity distribution of the image may overwhelm the visibility, leading to insufficient inspection of image quality. To address these issues, we propose Tool-IQA, shifting the assessment mechanism from passive scoring to a tool-augmented workflow. In particular, we equip VLMs with simple yet effective view tools: a Magnifier to inspect local details, and a Gamma Corrector to uncover visibility and hidden artifacts. The assessment follows a structured pipeline that consists of an initial observation with rubric notes, a tool-augmented in-depth inspection, and a final quantification for calibrated quality score. Furthermore, to ensure efficient and purposeful tool callings, we introduce a batch-aware training strategy to reward tool interactions that can yield positive contributions rather than simply encouraging usage. Experiments on a variety of IQA benchmarks demonstrate that, with effective tool calling and calibrated assessment, our proposed Tool-IQA significantly outperforms existing state-of-the-art models, e.g., it achieves a PLCC of 0.854 on the challenging CLIVE dataset.
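
The two view tools can be pictured with a few lines of code. This is a minimal sketch on a grayscale image stored as nested lists; the function names, the nearest-neighbour zoom, and the gamma value are assumptions for illustration, since the abstract does not give the tools' actual parameters.

```python
# Hypothetical view tools on a grayscale image stored as a list of rows (0-255).

def magnify(image, top, left, height, width, scale=2):
    """Crop a region and enlarge it by nearest-neighbour pixel repetition."""
    crop = [row[left:left + width] for row in image[top:top + height]]
    out = []
    for row in crop:
        wide = [p for p in row for _ in range(scale)]
        out.extend([list(wide) for _ in range(scale)])
    return out

def gamma_correct(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma; gamma < 1 brightens dark regions."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]

image = [
    [0, 16, 32, 64],
    [16, 32, 64, 128],
    [32, 64, 128, 255],
]

zoomed = magnify(image, top=1, left=2, height=2, width=2)
brightened = gamma_correct(image, gamma=0.5)

print(zoomed)
print(brightened)
```

Gamma correction leaves pure black and pure white unchanged and only redistributes the intermediate intensities, which is why it can reveal artifacts hidden in shadows without clipping highlights.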

06.
arXiv (CS.LG) 2026-06-16

Surrogate-Assisted Framework for SI-Compliant Interconnect Design Optimization Using the Earth Mover's Distance

arXiv:2606.15234v1 Announce Type: cross Abstract: This work presents a deterministic, machine-assisted framework for SI-compliant PCB design based on the Earth Mover's Distance (EMD). In contrast to conventional surrogate-based optimization methods that rely on iterative black-box search procedures, the proposed approach follows an interpretable, sequential evaluation strategy. Neural surrogate models are first used to efficiently predict waveform describing features from topology-dependent design parameters. A decision tree then acts as a physically motivated quality gate that identifies SI-compliant waveforms according to predefined SI criteria. Within the resulting valid solution space, the Earth Mover's Distance is employed as a similarity metric to rank candidate designs according to their proximity to an ideal reference signal. This enables not only the deterministic identification of admissible parameter regions but also a transparent prioritization of physically superior solutions without inverse modeling or stochastic search procedures. The methodology is demonstrated using a large-scale set of simulated DDR3 fly-by waveforms. By combining surrogate prediction, interpretable classification, and EMD-based waveform evaluation, the framework provides an explainable and computationally efficient alternative to conventional optimization strategies for supporting PCB development with AI-based methods.
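
For one-dimensional distributions on a shared grid, the Earth Mover's Distance reduces to the summed absolute difference of the cumulative distributions. The sketch below ranks candidates by that distance to a reference. Treating waveforms as histograms over unit-spaced bins, and the candidate names and values, are assumptions; the abstract does not say how the waveforms are represented.

```python
def emd_1d(p, q):
    """EMD between two histograms on the same unit-spaced bins,
    after normalising each to unit mass."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(p), sum(q)
    cum_p = cum_q = 0.0
    dist = 0.0
    for pi, qi in zip(p, q):
        cum_p += pi / total_p
        cum_q += qi / total_q
        dist += abs(cum_p - cum_q)  # mass that must cross this bin boundary
    return dist

reference = [0, 0, 1, 0]          # hypothetical ideal signal
candidates = {                    # hypothetical SI-compliant designs
    "design_a": [0, 1, 0, 0],
    "design_b": [1, 0, 0, 0],
    "design_c": [0, 0, 1, 0],
}

ranking = sorted(candidates, key=lambda name: emd_1d(candidates[name], reference))
print(ranking)
```

Because the ranking is a plain sort over a fixed candidate set, it is deterministic, which matches the abstract's emphasis on avoiding stochastic search.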

07.
arXiv (quant-ph) 2026-06-19

Locally Gentle State Certification for High Dimensional Quantum Systems

arXiv:2602.04550v3 Announce Type: replace Abstract: Standard approaches to quantum statistical inference rely on measurements that induce a collapse of the wave function, effectively consuming the quantum state to extract information. In this work, we investigate the fundamental limits of locally-gentle quantum state certification, where the learning algorithm is constrained to perturb the state by at most $\alpha$ in trace norm, thereby allowing for the reuse of samples. We analyze the hypothesis testing problem of distinguishing whether an unknown state $\rho$ is equal to a reference $\rho_0$ or $\epsilon$-far from it. We derive the minimax sample complexity for this problem, quantifying the information-theoretic price of non-destructive measurements. Specifically, by constructing explicit measurement operators, we show that the constraint of $\alpha$-gentleness imposes a sample size penalty of $\frac{d}{\alpha^2}$, yielding a total sample complexity of $n = \Theta(\frac{d^3}{\epsilon^2 \alpha^2})$. Our results clarify the trade-off between information extraction and state disturbance, and highlight deep connections between physical measurement constraints and privacy mechanisms in quantum learning. Crucially, we find that the sample size penalty incurred by enforcing $\alpha$-gentleness scales linearly with the Hilbert-space dimension $d$ rather than the number of parameters $d^2-1$ typical for high-dimensional private estimation.
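
Reading the stated penalty as a multiplicative factor (an assumption about how the abstract's two expressions relate), the total rate can be split as follows:

```latex
n \;=\; \Theta\!\left(\frac{d^{3}}{\epsilon^{2}\alpha^{2}}\right)
  \;=\; \Theta\!\left(\frac{d^{2}}{\epsilon^{2}} \cdot \frac{d}{\alpha^{2}}\right)
```

Whatever the interpretation of the first factor, the scaling in the stated bound is unambiguous: halving $\alpha$ quadruples $n$, and doubling $d$ multiplies $n$ by eight.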

08.
arXiv (CS.CL) 2026-06-17

A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models

We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four families of automated jailbreak attack across 7 826 harmful intents spanning a ten-category harm taxonomy. Using the HackAgent red-teaming framework, hundreds of thousands of adversarial attempts were generated and every apparent success was independently re-adjudicated by a panel of three judge models (majority vote). Both models resist the majority of attacks, but the residual surface is larger than aggregate framing suggests: it is dominated by adaptive iterative attacks, while static obfuscation is near-fully neutralised. The strongest adaptive search (tree-of-attacks) breaks Opus 4.8 on 11.5% of intents overall, whereas Fable 5 stays in the single digits (6.1% worst-case). Aggregate rates therefore should not be read as reassurance. Even in these hardened configurations, the two models produced 1 620 (Opus 4.8) and 702 (Fable 5) panel-confirmed harmful completions spanning every harm category, located automatically, cheaply, and within the first one or two refinement steps by an attacker model with no human expert in the loop. The reasonable conclusion is that even the best, most-tested frontier models remain reliably breakable under sustained automated pressure.

09.
arXiv (CS.CV) 2026-06-16

Geometric Action Model for Robot Policy Learning

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but they still operate primarily on 2D image frames or 2D-derived latent spaces, leaving implicit the 3D geometry required for contact-rich manipulation. We propose the Geometric Action Model (GAM), a language-conditioned manipulation policy that directly repurposes a pretrained geometric foundation model (GFM) as a shared substrate for perception, temporal prediction, and action decoding. GAM splits the GFM at an intermediate layer: the shallow layers serve as an observation encoder, and a causal future predictor inserted at the split layer forecasts future latent tokens conditioned on language, proprioception, and action history. The predicted future tokens are then routed through the remaining GFM blocks for feature propagation and decoding, allowing a single backbone to produce both future geometry and actions. This design equips the GFM with language-conditioned temporal world modeling through minimal architectural modification while preserving its rich geometric priors. Across a broad suite of simulation and real-robot manipulation benchmarks, GAM is more accurate, more robust, faster, and lighter than current foundation-model-scale baselines.

10.
arXiv (CS.AI) 2026-06-19

QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation

arXiv:2606.20227v1 Announce Type: new Abstract: Large Language Models (LLMs) have made significant progress in reasoning, particularly in deductive reasoning, which is crucial for high-stakes decision-making. As models improve, evaluation benchmarks should evolve to keep pace. However, existing benchmarks lack fine-grained control over logical complexity and struggle to balance semantic diversity with logical consistency. To address these issues, we propose QMFOL, an automated framework for generating monadic first-order logic reasoning tasks with quantifiable and controllable complexity. It constructs formal logical structures using conjunction and disjunction patterns, enabling precise control over reasoning depth, width, label types, and distractors. These structures are then translated into natural language via LLMs, with logical consistency ensured through round-trip verification using an external prover. Based on our framework, we build QMFOLBench, a benchmark comprising 2880 instances with 960 configurations across diverse logical and semantic dimensions. Evaluations on six large reasoning models (LRMs) and two LLMs show that performance degrades and computational overhead increases with rising logical complexity. Models perform better on True-labeled tasks than on False or Unknown ones, and exhibit sensitivity to semantic variation. Overall, QMFOL offers a scalable and reliable approach for constructing deductive reasoning benchmarks with controllable complexity, enabling more precise evaluation of reasoning capabilities in modern language models.

11.
arXiv (CS.AI) 2026-06-16

Knowledge-Based Zero-Replay Debugging of Multi-Agent LLM Traces

arXiv:2606.14805v1 Announce Type: cross Abstract: Reliable operation of multi-agent large language model (LLM) systems depends on debugging long execution traces, where the few causally decisive events are buried in unstructured logs of messages, routes, memory writes, and tool calls. The standard tool is counterfactual replay (rewind, edit, and re-run the trajectory to measure each event's effect), but its cost grows linearly with the number of candidate events, making exhaustive replay infeasible at scale. We frame trace debugging as a knowledge-based decision-support problem. Each trace is compiled into a structured event knowledge graph over routing, memory, tool-use, uncertainty, and latent evidence, and a calibrated predictor decides where a scarce replay budget should be spent. We do not propose a new replay oracle; we propose a method to predict its results without paying the replay cost. We formulate zero-replay counterfactual-effect prediction: given a trace under a fixed budget, predict which events the oracle would mark high-effect before any replay is performed. BranchPoint-Latent is a lightweight predictor over observable, structural, uncertainty, and latent features of the knowledge graph. Calibrated against a deterministic replay oracle across 37 trace families, a single learning-to-rank gradient-boosted predictor raises per-trace localization (Branch Recall@5) from 0.73 to 0.93 on held-out families at zero oracle-replay cost. Rather than claiming universal dominance, we characterize when cheap graph centrality suffices and when learned evidence is necessary. The result is an auditable, cost-efficient decision-support system for AI-reliability debugging, positioned explicitly on the cost-accuracy frontier with reproducible artifacts.
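
The localization metric can be read as an ordinary recall-at-k over ranked events. The sketch below assumes that Branch Recall@5 is the fraction of oracle-marked high-effect events that appear among the predictor's top five; the abstract does not define the metric, and the event IDs are made up.

```python
def recall_at_k(ranked_events, high_effect, k=5):
    """Fraction of oracle-marked high-effect events found in the top-k ranking."""
    targets = set(high_effect)
    if not targets:
        raise ValueError("need at least one high-effect event")
    top_k = set(ranked_events[:k])
    return len(top_k & targets) / len(targets)

# Hypothetical trace: predictor ranking (best first) and oracle labels.
ranked = ["e7", "e2", "e9", "e1", "e4", "e3", "e8"]
oracle_high_effect = {"e2", "e4", "e8"}

score = recall_at_k(ranked, oracle_high_effect, k=5)
print(round(score, 3))
```

Only the ranking step runs before any replay; the oracle labels are needed solely to evaluate the predictor afterwards.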

12.
Nature (Science) 2026-06-10

Deep learning four decades of human migration

Human migration is a fundamental driver of global demographic change, shaping population structure, labour markets and social policy across countries [1–3]. Although long-term migration patterns are often linked to economic development [4], they can shift rapidly in response to shocks such as conflict, environmental crises and political change [5]. Despite its importance, migration remains difficult to measure consistently: existing data are sparse, concentrated in high-income settings and are fragmented across incompatible definitions, temporal resolutions and data types [6–8]. Past efforts have relied on partial datasets, including flow records, stock estimates and model-based reconstructions with limited coverage [9–14]. A central challenge is therefore to construct a globally consistent, high-resolution account of migration flows over time. Here we present a new dataset of annual origin-destination migration across 230 countries and regions from 1990 to the present, integrating diverse data sources into a unified modelling framework. By combining official statistics, census-based stocks, net migration estimates and past flow reconstructions, our approach produces temporally detailed and spatially comprehensive estimates that substantially extend existing resources. Using an ensemble of deep recurrent neural networks informed by geographic, economic, cultural and political covariates, we capture both persistent trends and short-term responses to changing conditions, all while propagating uncertainty to generate confidence bounds. Our results outperform existing five-year flow estimates on held-out data and provide finer temporal resolution, revealing previously obscured dynamics in global migration patterns. This framework highlights regions in which uncertainty remains high and data collection is most urgently needed. By releasing all data, code and trained models, we provide a transparent and reproducible foundation for future work. These advances enable a more timely and detailed understanding of human mobility, with implications for research and policy in an increasingly dynamic global system.

A global annual migration-flow dataset (1990–2024) is produced using deep-learning models and diverse sources to estimate movements across 230 countries with improved temporal resolution, coverage and uncertainty estimates.

13.
arXiv (CS.AI) 2026-06-16

CmdNeedle: Measuring the Incompleteness of Command Denylists for AI Agents

arXiv:2606.15549v1 Announce Type: cross Abstract: The adoption of AI agents is increasing rapidly. Terminal AI agents, i.e., AI agents that run in terminal environments, are a widely used type of AI agent. Terminal AI agents rely heavily on shell command execution to interact with the host system. They adopt a three-list command-gating mechanism to mitigate security risks introduced by command execution, with denylists serving as the load-bearing component. However, modern operating systems often ship a large, ever-expanding set of shell commands with complex functionalities. Our observation is that even the built-in denylist of Claude Code, well-maintained by its developers, can overlook bypass commands that invalidate its effectiveness. Such oversights lead to fragile command denylists that cannot even block operations that practitioners expect them to block. This paper presents the first systematic characterization of command denylist fragility in terminal AI agents. The paper formalizes the command denylist fragility problem and proposes an LLM-driven pipeline, CmdNeedle, to detect such fragility. It prompts the LLM to propose possible bypasses and iteratively repairs them using feedback from a validator that executes them in a sandbox. In the evaluation, we applied CmdNeedle to 1,709 real-world command denylists (containing 13,332 denylist rules) collected from GitHub. The evaluation yields several key findings: 69.0–98.6% of the denylists are fragile, this fragility occurs consistently across projects and agents, and several hypothesized root causes of this fragility are confirmed. Our pipeline and findings will hopefully facilitate future research and practice regarding the command denylists used by AI agents.
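
The fragility described above can be pictured with a toy gate. The sketch below assumes a denylist that matches only the first token of a command; both the matching rule and the rule set are hypothetical and do not reproduce any real agent's denylist. No command is executed, only classified.

```python
import shlex

# Hypothetical denylist: block commands whose first token is listed here.
DENYLIST = {"rm", "shred"}

def is_blocked(command):
    """Return True if the command's executable name is on the denylist."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in DENYLIST

commands = [
    "rm -rf build",        # the operation the rule author had in mind
    "find build -delete",  # same effect through a different executable
]

verdicts = {cmd: is_blocked(cmd) for cmd in commands}
for cmd, blocked in verdicts.items():
    print(f"{cmd!r}: {'blocked' if blocked else 'allowed'}")
```

The second command deletes the same files but passes the gate, which is the kind of gap a propose-and-validate pipeline is meant to surface.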

14.
arXiv (quant-ph) 2026-06-11

Quantum iterative approach to the Traveling Salesman Problem

arXiv:2606.11843v1 Announce Type: new Abstract: The Traveling Salesman Problem (TSP) is a classical NP-hard problem in combinatorial optimization, where determining the shortest route among a set of cities becomes computationally prohibitive as the problem size increases. This work explores quantum computing as an alternative approach to address this complexity. Unlike existing methods that primarily rely on quantum annealing, we propose a quantum iterative framework integrating Quantum Phase Estimation (QPE) and Grover's search algorithm. Route costs are encoded as quantum phases, enabling QPE to efficiently evaluate them, while Amplitude Amplification, implemented via the Grover-Long algorithm, iteratively refines the solution space toward the optimal route. A proof-of-concept case study on a small-scale TSP instance demonstrates the feasibility of this approach and its potential for scaling to larger optimization problems. Furthermore, under an expectation-based analysis, the algorithm exhibits an expected computational complexity of $O(\frac{m^2\log_2(m)\log_2(1/\epsilon)}{\sqrt{\epsilon}})$ which depends on the error tolerance parameter $\epsilon$. This estimation omits the initialization term, which we expect future refinements to render subdominant to Phase Estimation.
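
The phase-encoding step can be illustrated classically. The sketch below enumerates the tours of a four-city instance and maps each cost to a phase in $[0, 1)$, so that every tour corresponds to a unit-modulus value $e^{2\pi i \phi}$ of the kind phase estimation reads out. The distance matrix and the normalisation bound are assumptions; no quantum circuit is simulated here.

```python
import cmath
import itertools
import math

# Hypothetical symmetric distance matrix for four cities.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def route_cost(route):
    """Cost of the closed tour that visits the cities in the given order."""
    return sum(DIST[a][b] for a, b in zip(route, route[1:] + route[:1]))

n = len(DIST)
# A strict upper bound on any tour cost keeps every phase below 1.
bound = n * max(max(row) for row in DIST) + 1

# Fix city 0 as the start to avoid counting rotations of the same tour.
routes = [(0,) + perm for perm in itertools.permutations(range(1, n))]

table = []
for route in routes:
    cost = route_cost(route)
    phase = cost / bound
    eigenvalue = cmath.exp(2j * math.pi * phase)
    table.append((route, cost, phase, eigenvalue))

best = min(table, key=lambda entry: entry[1])
print(best[0], best[1])
```

Here the minimum is found by brute force over all tours; in the proposed framework that search is the part handled by amplitude amplification.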

15.
arXiv (quant-ph) 2026-06-17

Hybrid Acousto-Optical Double Dressing of a Two-Level System

arXiv:2509.25847v2 Announce Type: replace Abstract: We experimentally investigate resonance fluorescence from a two-level system in a novel configuration where a strong laser drives an optical Rabi oscillation while an acoustic field parametrically modulates the frequency of the two-level system. We observe emission spectra that deviate markedly from the standard Mollow triplet, including dynamical cancellation of the central peak. A doubly dressed state model incorporating hybridization among the emitter, optical field, and acoustic field captures these features. Guided by this model, we experimentally validate the condition for optimal cooling of acoustic phonons in an emitter-optomechanical system. These results reveal new regimes of strongly driven quantum nonlinear interactions.

16.
arXiv (CS.AI) 2026-06-12

Free-Placement Optimization of Ground Station Locations for Low-Earth Orbit Satellites

arXiv:2606.12667v1 Announce Type: cross Abstract: Rapidly expanding low Earth orbit satellite constellations are placing increasing demands on terrestrial ground networks, motivating the development of more efficient ground station network designs. Current approaches select sites from predefined locations, limiting optimization to existing infrastructure and constraining performance. In contrast, free-placement optimization operates over a continuous spatial domain on Earth, broadening the search space and allowing higher-throughput configurations at the cost of potentially requiring new infrastructure deployment. In this work, we introduce SCORE (Sequential Cyclic Optimization via Refinement & Evaluation), a two-stage free-placement method for ground station design. SCORE combines sequential coordinate selection with cyclic refinement to manage high-dimensionality, non-convexity, and local minima that challenge global optimizers. We benchmark SCORE against one-shot methods such as differential evolution (DE) and integer programming approaches using locations from Kongsberg Satellite Services and the World Teleport Association. Tests across two commercial Earth observation constellations (Capella Space and ICEYE) and one synthetic Walker-Star constellation show that SCORE requires up to 5x fewer function evaluations to converge relative to DE while improving downlink throughput by up to 13%. Compared to fixed-site methods, unconstrained SCORE achieves up to 15% greater total downlink, establishing a strong empirical performance benchmark for flexible placement; infrastructure-constrained SCORE retains over 92% of this gain while restricting placement to within proximity of existing fiber and power infrastructure. We also explore trade-offs between expanding existing stations and deploying new sites, informing future ground network design for operational constellations.

17.
arXiv (CS.AI) 2026-06-12

On Approximating the Dynamic Response of Synchronous Generators via Operator Learning: A Step Towards Building Deep Operator-based Power Grid Simulators

arXiv:2301.12538v2 Announce Type: replace-cross Abstract: This paper develops an Operator Learning framework for approximating the dynamic response of synchronous generators. The framework can be used to (i) build a neural network-based generator model that interacts with a power grid simulator or (ii) shadow the true generator's transient response. First, we develop a data-driven Deep Operator Network (DeepONet) to approximate the infinite-dimensional solution operator of the generators. Then, we design a numerical scheme based on DeepONet that simulates the generator's response over a given time horizon. The proposed scheme recursively employs the trained DeepONet to simulate the response for a given multi-dimensional input that describes the interaction between the generator and the power grid. In addition, we design a residual DeepONet numerical scheme that can incorporate information from existing mathematical models. We accompany this residual DeepONet scheme with an estimate for the prediction's cumulative error. Finally, we build a data aggregation (DAgger) strategy that allows fine-tuning of DeepONets using aggregated training data that the DeepONets will likely encounter during interactive simulations with other grid components. As a proof of concept, we demonstrate that the proposed frameworks can effectively approximate the transient model of a synchronous generator.
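
The recursive scheme amounts to applying a learned one-step solution operator repeatedly, feeding each predicted state back in as the next initial condition. In the sketch below the trained DeepONet is replaced by a stand-in: the exact one-step solution of the scalar ODE $x' = -x + u$ with $u$ held constant over the step. The ODE, the function names, and the step size are assumptions chosen so the rollout can be checked analytically.

```python
import math

def step_operator(x, u, h):
    """Stand-in for a trained DeepONet: maps the current state x and the
    grid-interaction input u to the state after a step of length h.
    Here it is the exact solution of x' = -x + u with u constant."""
    return u + (x - u) * math.exp(-h)

def rollout(x0, inputs, h):
    """Recursively apply the one-step operator over the time horizon."""
    states = [x0]
    for u in inputs:
        states.append(step_operator(states[-1], u, h))
    return states

h = 0.1
trajectory = rollout(x0=0.0, inputs=[1.0] * 10, h=h)
print(trajectory[-1])
```

With a learned operator each step carries some prediction error that compounds along the rollout, which is why the abstract pairs the residual scheme with a cumulative error estimate.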

18.
arXiv (CS.LG) 2026-06-16

Mean-Field Parallel Decoding for Discrete Diffusion Language Models

arXiv:2606.15805v1 Announce Type: new Abstract: Discrete diffusion language models enable parallel token generation, offering a pathway to low-latency decoding. However, selecting tokens independently by marginal confidence limits effective parallelism: tokens that appear reliable in isolation can form incompatible configurations when several positions are updated at once. We introduce a training-free decoding framework that coordinates these parallel updates. At each forward pass, the method assigns a commit score to each masked position and refines these scores using pairwise interactions derived from the model's predictive distributions. A variational relaxation yields a simple fixed-point update that suppresses conflicting simultaneous commitments within a single forward pass. This mechanism allows the decoder to commit more tokens in parallel while maintaining competitive generation quality. The method is lightweight, requires no auxiliary model or retraining, and drops into existing diffusion decoding pipelines without modification. Experiments on reasoning and code-generation benchmarks show consistent improvements in the quality-latency trade-off.
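
The idea of refining commit scores with pairwise interactions can be sketched as a generic mean-field fixed-point iteration. The update rule below (a sigmoid of each position's score minus a conflict-weighted sum of the other positions' commit probabilities, with damping) is an assumption for illustration; the abstract does not give the paper's actual update, scores, or interaction terms.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_field_commit(scores, conflict, steps=100, damping=0.5):
    """Iterate q_i <- sigmoid(s_i - sum_j W_ij * q_j) with damping.
    q_i is the probability of committing masked position i this pass."""
    q = [sigmoid(s) for s in scores]  # start from independent confidences
    for _ in range(steps):
        target = [
            sigmoid(s - sum(w * qj for w, qj in zip(row, q)))
            for s, row in zip(scores, conflict)
        ]
        q = [damping * old + (1.0 - damping) * new for old, new in zip(q, target)]
    return q

# Hypothetical example: positions 0 and 1 conflict, position 2 is independent.
scores = [2.0, 1.5, 1.0]
conflict = [
    [0.0, 3.0, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

q = mean_field_commit(scores, conflict)
print([round(v, 3) for v in q])
```

In this toy case the independent position keeps its original confidence, while the weaker of the two conflicting positions is pushed down, so a threshold on `q` would commit fewer incompatible tokens in the same pass.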

19.
arXiv (CS.AI) 2026-06-12

AAbAAC: An Annotated Corpus for Autoimmunity Information Extraction

arXiv:2606.13051v1 Announce Type: new Abstract: Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domain-specific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. First, AAbAAC was used to evaluate several methods on the task of named entity recognition (NER), and secondly, to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing expected improvement in NER performance after fine-tuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at https://github.com/f-maury/AAbAAC.

20.
arXiv (CS.LG) 2026-06-15

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

arXiv:2606.14492v1 Announce Type: new Abstract: We present a recipe-controlled decoder audit (RCDA) for structural transductive knowledge-graph completion (KGC). The audit asks a simple reporting question: before attributing gains to an encoder or training recipe, what changes when the decoder is swapped under the same recipe? Using ComplEx and DistMult as the primary controlled pair, with targeted RotatE/TransE spot-checks, we evaluate seven benchmarks. On five standard KGs, ComplEx-vs-DistMult differences are modest but consistent under our recipe (+0.005 to +0.012 MRR), whereas CompGCN-style encoder effects vary more by dataset. On small KGs, decoder effects become the main diagnostic: Kinship shows a stable ComplEx advantage of +0.143 MRR (6 seeds), while UMLS favours ComplEx by +0.022 MRR in a clean 6-seed server rerun but reverses in an earlier provenance variant. We therefore treat small-KG decoder choice as recipe- and provenance-sensitive rather than as a fixed dataset winner. We further show that decoder choice interacts with encoder depth on WN18RR, and that under our recipe L=0 ComplEx on YAGO3-10 reaches 0.6971 +/- 0.0048 MRR at d=128. The result is a compact audit protocol: report matched decoder rows, log small-KG provenance, and sweep decoder x depth before making encoder-level claims.
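
The two decoders in the controlled pair differ only in their scoring function. The sketch below writes out the standard DistMult and ComplEx scores for a triple (head, relation, tail); the embedding values are arbitrary toy numbers, not taken from the paper.

```python
def distmult_score(h, r, t):
    """DistMult: sum_k h_k * r_k * t_k over real-valued embeddings."""
    return sum(hk * rk * tk for hk, rk, tk in zip(h, r, t))

def complex_score(h, r, t):
    """ComplEx: Re(sum_k h_k * r_k * conj(t_k)) over complex embeddings."""
    return sum(hk * rk * tk.conjugate() for hk, rk, tk in zip(h, r, t)).real

# Toy real embeddings for DistMult.
h_real, r_real, t_real = [1.0, 2.0], [0.5, -1.0], [3.0, 0.5]

# Toy complex embeddings for ComplEx.
h_cplx = [1 + 2j, 0.5 - 1j]
r_cplx = [0 + 1j, 1 + 0j]
t_cplx = [2 - 1j, -1 + 0.5j]

print(distmult_score(h_real, r_real, t_real), distmult_score(t_real, r_real, h_real))
print(complex_score(h_cplx, r_cplx, t_cplx), complex_score(t_cplx, r_cplx, h_cplx))
```

DistMult gives the same score when head and tail are swapped, so it cannot distinguish the direction of a relation, whereas ComplEx can. That structural difference is one reason a decoder swap under a fixed recipe is informative.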

21.
arXiv (CS.CV) 2026-06-18

Taming I2V models for Image HOI Editing: A Cognitive Benchmark and Agentic Self-Correcting Framework

Current image editing methods excel at static attributes but fail at complex Human-Object Interactions (HOI). Existing benchmarks leave this challenge unaddressed: they conflate HOI with static attributes and rely on global metrics that cannot simultaneously assess dynamic interaction validity and the preservation of entangled human-object pairs. We therefore first introduce HOI-Edit, a comprehensive benchmark with three progressive cognitive levels. It features an automated metric, HOI-Eval, that reliably evaluates instance-level interaction by having a VLM answer questions after thinking with images that contain grounded human-object pairs. Because the task is essentially one of remodeling dynamic relationships, we benchmark Image-to-Video (I2V) models and find them inherently suited to dynamic editing thanks to their temporal generation capabilities. Crucially, beyond superior performance, this capability provides a "replay of the failure process," offering unique diagnosability into why errors occur. We thus propose SCPE (Self-Correcting Process Editing), a novel agentic self-correcting framework that constrains the generation of I2V models through iteratively refined prompts, enabling the generated videos to present the target HOI more accurately. Frames extracted from these videos serve as the final editing results. On HOI-Edit, SCPE achieves interaction performance competitive with state-of-the-art (SOTA) editing models such as Nano Banana. Code is available at https://github.com/oceanflowlab/HOI-Edit.

23.
arXiv (CS.CL) 2026-06-16

Whose hotel does the AI recommend? An algorithm audit of reputation signals in LLM-assisted hotel selection

Travelers increasingly ask large language model (LLM) assistants which hotel to book, making these systems gatekeepers of property visibility – yet what moves their recommendations is undocumented. We conduct a pre-specified algorithm audit using a randomized choice-based conjoint: across personas, prompt templates, and twelve open-weight and proprietary models, assistants choose among five hotels whose guest rating, review volume and recency, management response, chain affiliation, price, eco-certification, and list position are independently randomized. We estimate the average marginal component effect of each signal on the probability of recommendation. Guest rating and price dominate (a top rating raises selection by 31.6 percentage points; a high price lowers it by 30.0), reproducing human valence-and-price primacy but over-weighting eco-certification and ignoring management response. List position – a content-free artifact – shifts recommendations causally, worth about \$12 per night. Stated reasons track revealed weights imperfectly. The findings ground generative engine optimization and the accountability of AI infomediaries in causal evidence.
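
To make the estimand concrete, here is a minimal sketch of an average marginal component effect (AMCE) computed as a difference in means, which is a valid estimator when attribute levels are randomized independently. The abstract does not state which estimator the authors use, so this is an assumption; the `amce` helper, the field names, and the six toy profiles are all hypothetical.

```python
from typing import Any, Dict, List


def amce(profiles: List[Dict[str, Any]], attribute: str, level: Any, baseline: Any) -> float:
    """Difference in recommendation rate between `level` and `baseline`.

    Each profile is one hotel shown to the assistant, with a binary
    `chosen` outcome (1 if recommended, 0 otherwise).
    """
    def mean_chosen(value: Any) -> float:
        outcomes = [p["chosen"] for p in profiles if p[attribute] == value]
        if not outcomes:
            raise ValueError(f"no profiles with {attribute}={value!r}")
        return sum(outcomes) / len(outcomes)

    return mean_chosen(level) - mean_chosen(baseline)


# Toy data (hypothetical, not from the study).
toy_profiles = [
    {"rating": "top", "position": 1, "chosen": 1},
    {"rating": "top", "position": 2, "chosen": 1},
    {"rating": "top", "position": 1, "chosen": 0},
    {"rating": "low", "position": 2, "chosen": 0},
    {"rating": "low", "position": 1, "chosen": 1},
    {"rating": "low", "position": 2, "chosen": 0},
]

rating_effect = amce(toy_profiles, "rating", level="top", baseline="low")
position_effect = amce(toy_profiles, "position", level=1, baseline=2)
```

A real analysis would typically also cluster standard errors by respondent (here, by model and persona), which this sketch omits.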

24.
arXiv (CS.LG) 2026-06-16

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

arXiv:2606.14763v1 Announce Type: cross Abstract: Real-time autonomous navigation in dynamic, unknown environments remains a fundamental challenge for mobile robotics. We propose a map-free framework that tightly integrates reactive rolling-horizon planning with nonlinear Model Predictive Control (MPC). At each control cycle, a LiDAR-based Gaussian occupancy representation is constructed and used to generate collision-free trajectories via A* search, which are then tracked by a CasADi/IPOPT MPC formulation incorporating a smooth sigmoid obstacle barrier. To improve robustness to parameter sensitivity, we adopt an offline Bayesian optimization scheme based on Tree-structured Parzen Estimators (TPE), which identifies near-optimal controller parameters with respect to a composite navigation objective. In addition, a Gaussian Process surrogate is used to analyze parameter sensitivity and provide insight into the optimization landscape. The proposed framework is robot-agnostic and is evaluated on the Unitree Go2 quadruped in simulation using Gazebo, followed by deployment on the physical robot. Experimental results show that parameters tuned in simulation transfer effectively to hardware, maintaining comparable performance without additional tuning. The full system achieves up to a 90.0\% navigation success rate when deployed, along with a 38.9\% average improvement in the evaluation metrics across simulated environments.
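
The abstract mentions a smooth sigmoid obstacle barrier without giving its form. One plausible shape, offered only as a sketch under assumptions, is a logistic penalty on the distance to the nearest obstacle. The parameter names `safe_distance`, `sharpness`, and `weight`, and their default values, are hypothetical; they stand in for the kind of controller parameters an offline tuner might search over.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def obstacle_barrier(distance: float,
                     safe_distance: float = 0.5,
                     sharpness: float = 10.0,
                     weight: float = 1.0) -> float:
    """Smooth penalty that rises toward `weight` as `distance` drops below `safe_distance`.

    Unlike a hard constraint, this term is differentiable everywhere,
    which suits gradient-based solvers.
    """
    return weight * sigmoid(sharpness * (safe_distance - distance))


# Penalty at a few distances (metres, hypothetical).
near = obstacle_barrier(0.0)
at_margin = obstacle_barrier(0.5)
far = obstacle_barrier(2.0)
```

Larger `sharpness` makes the penalty approach a step function at `safe_distance`, at the cost of steeper gradients for the solver.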

25.
arXiv (quant-ph) 2026-06-16

The Quantum Transition State

arXiv:2606.10266v2 Announce Type: replace Abstract: The transition state – the critical configuration separating reactants from products – is the central organizing concept of chemical reaction rate theory, yet for nearly a century it has been thought to have no exact quantum counterpart: the recrossing-free, one-way flux through a transition state appears to demand simultaneous knowledge of position and momentum, in conflict with the uncertainty principle. We show this obstruction is illusory and construct the quantum transition state directly from the exact quantum flow. Its stable and unstable invariant manifolds intersect in a unique bounded trajectory – the quantum transition-state trajectory – anchoring a moving dividing surface that each reactive characteristic crosses exactly once, yielding a one-way flux of the standard quantum probability current. The geometric framework underlying classical transition-state theory thus survives intact in exact quantum mechanics, in a fundamentally quantum form.