Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (math.PR) 2026-06-24

Genealogical processes of sequential Monte Carlo methods and other non-neutral population models under rapid mutation

arXiv:2406.16465v3 Announce Type: replace Abstract: We show that genealogical trees arising from a broad class of non-neutral models of population evolution converge to the Kingman coalescent under a suitable rescaling of time. As well as non-neutral biological evolution, our results apply to genetic algorithms encompassing the prominent class of sequential Monte Carlo (SMC) methods. The time rescaling we need differs slightly from that used in classical results for convergence to the Kingman coalescent, which has implications for the performance of different resampling schemes in SMC algorithms. In addition, our work substantially simplifies earlier proofs of convergence to the Kingman coalescent, and corrects an error common to several earlier results.

02.
medRxiv (Medicine) 2026-06-19

Specific epigenetic age acceleration measures are associated with oral health outcomes in U.S. adults

Objectives: Oral health conditions impact a significant proportion of the global population. Chronological age is a known risk factor; however, characterization of epigenetic age remains limited and is expected to provide additional insight into biological mechanisms. Materials and Methods: The National Health and Nutrition Examination Survey (NHANES) was used to analyze the effect of epigenetic age measures of DunedinPoAm, and epigenetic age acceleration (EAA) of Horvath, Hannum, Weidner, Lin, VidalBralo, PhenoAge, GrimAge, and GrimAge2, on various oral health outcomes from survey and examination results. Univariable and multivariable logistic regression were performed, adjusting for sex, race-ethnicity, education, poverty income ratio categories, and dental insurance coverage status. Results: DunedinPoAm was associated with the last dental appointment being for an existing issue (p=0.0093), poor general oral condition (p=0.0226), limiting food due to teeth problems (p=0.0031), and recommendation to see a dentist within the next two weeks (p=0.0171). EAAs for PhenoAge, GrimAge, and GrimAge2, were associated with a smaller number of oral health outcomes, whereas EAAs for Horvath, Hannum, Weidner, Lin, and Vidal-Bralo showed no associations. Conclusions: In a representative U.S. population, DunedinPoAm was most consistently positively associated with different adverse oral health outcomes compared with other epigenetic aging measures. Tracking specific epigenetic ages such as DunedinPoAm, EAA GrimAge, EAA GrimAge2, and PhenoAge, may aid in additional monitoring of oral health outcomes. Understanding specific aging-related CpGs associated with oral health may aid in elucidating underlying molecular mechanisms.

03.
arXiv (CS.CL) 2026-06-18

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Retelling

Counterfactual story retelling exposes LLM shortcomings in constrained narrative solution spaces where they can no longer rely on recalling memorised training data. Ground-truth-based post-training, such as SFT, fails to teach LLMs how to generate logical and rational narrative events. In this paper, we introduce Retell, Reward, Repeat (RRR), an RL-based pipeline synthesising Structuralist Narratology with scalar narrativity to teach storytelling structure. We extend the TimeTravel dataset with human-annotated stages of narrative equilibrium to evaluate reward models. By using d-RLAIF, RRR derives training signals from the narrativity of textual features without the need for reference outputs. Evaluations demonstrate that RRR-trained LLMs outperform few-shot and SFT baselines in logic, rationality, and completeness, with output quality additionally validated by blind human preference. Relying on a small, query-only dataset, RRR provides a linguistically grounded, cost-effective post-training mechanism for storytelling–a domain currently lacking effective post-training methods. RRR highlights the continued relevance of integrating established linguistic theories into contemporary NLP.

04.
arXiv (quant-ph) 2026-06-16

The Inverse Born Rule Equivalence. On the Informational Limits of Real-Valued Amplitude Encodings and the Measurement of Quantum Advantage in Data Embeddings

arXiv:2602.21350v2 Announce Type: replace Abstract: When does quantum data encoding provide genuine quantum advantage, and when does it merely rephrase a classically solvable problem? We prove an Equivalence Theorem demonstrating that any encoding mapping classical data to real-valued amplitudes, $\vert\psi_c\rangle = \sum_i c_i \vert i\rangle$ with $c_i \in \mathbb{R}$ and $\sum_i c_i^2 = 1$, composed with a data-independent parameterised unitary and computational-basis measurement, yields exactly the class of classical quadratic forms. We identify the geometric mechanism driving this collapse: the restriction to $\mathbb{R}$ forces a vanishing Berry connection, removing the complex phases required for data-dependent quantum interference. To operationalize this boundary, we introduce encoding diagnostics – phase complexity $C[\Phi]$ and mode-wise von Neumann mutual information $I[\Phi]$ – and link them to the information-geometric excess $\Delta g$. We show that for all real-valued encodings, $\Delta g = 0$ identically. We term the misidentification of such models as evidence of quantum computational power the Inverse Born Rule Fallacy. Supported by numerical experiments, our results establish that complex-phase structure is a strictly necessary condition for data-driven (Type~B) quantum advantage.

05.
arXiv (CS.CV) 2026-06-15

Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner

Authors:

Modern image classifiers widely adopt global average pooling (GAP) followed by a linear classification head. This linearity ensures that the image-level logits equal the average of logits obtained by applying the classification head pointwise to the feature grid prior to GAP. Consequently, standard classifiers may inherently retain spatial class evidence that remains recoverable even when the image-level prediction is incorrect. This structure naturally suggests a multiple-instance learning (MIL) interpretation, where an image is viewed as a bag of spatial instances. Within this formulation, we demonstrate that standard classifiers trained with a single label per image can still learn the intended classification task in multi-object scenes. We further exploit this property to decompose image-level logits into a prediction grid, providing a post-hoc diagnostic to extract spatial class evidence that GAP otherwise obscures. Our systematic evaluation reveals that off-the-shelf models consistently recover the ground-truth class within foreground regions. The MIL interpretation further suggests that common classifier failures reflect known limitations of mean aggregation.

06.
arXiv (CS.AI) 2026-06-18

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

arXiv:2605.29649v2 Announce Type: replace Abstract: Heuristic search is the dominant paradigm in symbolic AI planning, and the strongest heuristics are the result of decades of work by planning researchers. Recent work has shown that large language models (LLMs) can design heuristics for individual planning domains, but no LLM-generated heuristic has so far worked on arbitrary planning tasks. In this paper, we use evolutionary search to produce the first LLM-generated domain-independent heuristics that exceed the hand-engineered state of the art. We let an LLM mutate parent heuristics written in C++, store candidates in a MAP-Elites archive keyed on informedness and speed and calculate fitness scores by blending coverage with solving time. To place the evolved programs in context, we additionally benchmark a broad set of hand-engineered heuristics on their informedness-speed tradeoff, which to our knowledge has not been done before. On unseen testing domains, our best evolved heuristic solves more tasks than even the strongest baseline, with our full heuristic suite spanning the Pareto frontier of said tradeoff. We also find that seeding evolution from the trivial blind heuristic outperforms seeding from the strong FF heuristic, even when the resulting program is itself an FF variant, and that LLM reasoning effort affects how often candidates compile much more than the quality of those that do. Because the evolved programs are plain C++, they slot into existing planners as drop-in replacements and inherit the soundness and completeness guarantees of the underlying search.

07.
arXiv (CS.AI) 2026-06-16

HOLO-MPPI: Multi-Scenario Motion Planning via Hierarchical Policy Optimization

arXiv:2606.16480v1 Announce Type: cross Abstract: Robots deployed in the real world must plan motions across diverse scenarios without per-scenario retuning. End-to-end reinforcement learning (RL) can generalize across scenarios but often becomes brittle under distribution shift, reward misspecification, and stochastic interactions. Model predictive path integral (MPPI) control enables strong real-time refinement without gradients, but its performance depends on a well-shaped sampling prior, while manually designing the priors does not scale to multi-scenario deployment. We present HOLO-MPPI (High-level Offline, Low-level Online MPPI), a multi-scenario motion planning framework that combines high-level policy learning with low-level stochastic optimal control. Offline, we learn a high-level policy that proposes scenario-robust plans in an abstract action space, with a learned world model for online rollout. Online, the policy serves as a data-driven prior generator that parameterizes MPPI's sampling distribution conditioned on the current observation and goal. MPPI then optimizes low-level control sequences around this prior in real time to adapt to local disturbances. We instantiate HOLO-MPPI in autonomous driving by designing an effective high-level action space and tailored model architectures. Our evaluation across diverse driving scenarios shows that HOLO-MPPI improves upon MPPI and end-to-end RL baselines while maintaining real-time control.

08.
arXiv (CS.AI) 2026-06-12

Otters++: A Time-to-first-spike Based Energy Efficient Optical Spiking Transformer

arXiv:2606.13016v1 Announce Type: new Abstract: Spiking neural networks (SNNs) are promising for energy-efficient inference, and time-to-first-spike (TTFS) coding is especially attractive because each neuron fires at most once. In practice, however, this benefit is often reduced by the cost of computing a temporal decay term and multiplying it by the synaptic weight. We address this issue by turning a physical hardware "bug," the natural signal decay in optoelectronic devices, into the main computation of TTFS, named Otters++. Specifically, we use the measured decay of a custom In$_2$O$_3$ optoelectronic synapse to directly realize the TTFS temporal term, removing the need for explicit digital decay computation. To scale this idea to Transformer models, we establish a layer-wise functional equivalence between the Otters++ and a quantized neural network (QNN), and develop a hybrid training method that uses device-faithful SNN computation in the forward pass and QNN straight-through gradients through the equivalent QNN path in the backward pass, together with model distillation. This avoids differentiation through discrete first-spike events and reduces the over-sparsity problem in direct TTFS-SNN training. We further make training aware of measured device noise by sampling run-to-run variation, and refine the system-level energy model by accounting for device sharing and multi-hop communication. On GLUE dataset, Otters++ improves the average score to 84.17\% while maintaining a clear energy advantage over prior spiking Transformer baselines. These results show that physically grounded TTFS computing can be efficient, trainable, and robust under realistic hardware effects.

09.
bioRxiv (Bioinfo) 2026-06-15

VrySure: A Multi-Task AI Scientific Fraud Detection Platform for Identifying Manipulated and AI-Generated Biomedical Research Images

Integrity of scientific data is critical in biomedical research, where images often serve as primary evidence for experimental observations and conclusions. Advances in image-editing technologies and generative artificial intelligence (AI) have increased the accessibility and realism of visual manipulation, making detection through manual review increasingly challenging. To empower our laboratory researchers to continuously monitor and uphold scientific rigor and data integrity, and serve the global scientific community, we developed VrySure, an easy-to-deploy, AI-driven multi-task platform for automated image-integrity screening in biomedical research. VrySure integrates four detection modules: cross-image transformation detection, within-image copy-move detection, splicing detection in blot and gel images, and AI-generated image detection. The system identifies potentially manipulated images and, when possible, localizes suspicious regions using bounding-box outputs to support downstream verification. To support development and evaluation, we constructed task-specific datasets by combining public biomedical image resources, curated manipulated examples, and synthetic images generated by multiple generative AI systems. We evaluated VrySure using region-level F1 score, recall, precision, false negative rate (FNR), and false discovery rate (FDR) across multiple manipulation categories and compared its performance with two commonly used commercial image-integrity screening platforms under a predefined benchmark protocol. Under the tested conditions, VrySure achieved a higher F1 score and recall, lower FNR, and maintained a low FDR for within-image copy-move detection, splicing detection, and AI-generated image detection, while showing comparable performance in transformation detection. Beyond automated screening, VrySure is designed to support source-data comparison and evidence-based assessment in scientific integrity investigations. By integrating multiple detection capabilities into a unified and scalable workflow, VrySure provides a practical framework to improve the efficiency and consistency of image-integrity screening in biomedical research.

10.
arXiv (CS.AI) 2026-06-24

Learning to Trigger: Reinforcement Learning at the Large Hadron Collider

arXiv:2606.23993v1 Announce Type: cross Abstract: High-throughput scientific facilities such as the Large Hadron Collider depend on real-time event filtering (triggering) under tight constraints on bandwidth, latency, and storage. In practice, trigger menus are largely static and hand-tuned and can become suboptimal as detector conditions, pileup, and background composition drift over time. We cast online threshold tuning as a sequential decision-making problem: a reinforcement learning agent ingests streaming summaries of recent rates and signal-sensitive features and updates trigger thresholds to maximize signal efficiency while tracking a target background rate within a tolerance band. We adapt Group-Filtered Policy Optimization (GFPO) to streaming control and introduce two variants (GFPO-F, GFPO-FR) that enforce background rate feasibility during training. On a benchmark that emulates realistic collider operation, we study two representative triggers: a total transverse energy ($H_{T}$) trigger sensitive to pileup variation, and an anomaly-detection (AD) trigger based on reconstruction loss for rare or non-standard signatures. On Monte Carlo streams, our agent increases the fraction of in-tolerance time intervals by 48\% ($H_T$) and 28\% (AD), with a cumulative gain of up to 2\% in signal efficiency on those in-tolerance intervals. Transferring from simulation to real collision data (CMS Run 283408), the same agent, without fine-tuning, achieves a 56\% ($H_T$) and 28\% (AD) in-tolerance improvement over baselines, with further signal-efficiency gain on both triggers. To our knowledge, this is the first demonstration of RL-based trigger control on real Large Hadron Collider collision data. Code is available at https://github.com/Zixind/GFPO\_LHC.

11.
arXiv (CS.LG) 2026-06-16

A Decision-Theoretic View of Test-Time Training: When, How Far, and Which Directions to Adapt

arXiv:2606.15569v1 Announce Type: new Abstract: Test-time training (TTT) adapts a pretrained model to each prompt via parameter updates, improving accuracy under pretraining-to-test distribution shifts. Yet, its performance often suffers from instability and sensitivity to hyperparameters such as update steps and subspace. We explain this behavior through a decision-theoretic lens, treating TTT as implicit Bayesian inference in the kernel regime. Under a Gaussian process benchmark, we show that TTT reduces prediction error when updates are spectrally matched to the prompt's signal-to-noise ratio and aligned with query-relevant eigen-directions. This perspective underpins the following results: (1) we show when fixed update steps and subspaces fail under distribution shifts, motivating adaptive strategies; (2) we prove that selecting update steps via prompt evidence admits a PAC-Bayes guarantee against overfitting; and (3) we characterize the Bayes-optimal update subspace under a linear-Gaussian correction model, yielding a scoring rule for selecting Transformer blocks and heads. Our theory helps explain the empirical instability of TTT, taking a step toward principled guidance for when, how far, and which directions to adapt.

12.
arXiv (CS.CV) 2026-06-15

TSA: Temporal Slot Activation for Persistent Object-Centric Video Representation

Unsupervised video object-centric learning aims to decompose dynamic scenes into temporally persistent entity representations. Existing recurrent video slot-attention methods propagate a fixed set of slots across frames, but typically assume unconditional slot propagation: every slot is updated and decoded at every frame, regardless of whether its corresponding object is visible. We show that this design violates a basic lifecycle requirement for persistent slots: when an object is absent or fully occluded, its slot should preserve its previous state and avoid explaining unrelated visible content. Instead, unconditional propagation creates two failure pathways: update-induced state drift, where current-frame evidence overwrites the absent object's representation, and decoder-induced reconstruction interference, where the inactive slot remains coupled to reconstruction through decoder attention. We propose Temporal Slot Activation (TSA), a mechanism that learns a per-slot, per-frame activation score $\alpha_{k,t} \in (0, 1)$ without visibility supervision. TSA uses this activation as a shared latent control variable for slot lifecycle modeling. When a slot is inactive, TSA anchors its state to the previous slot via activation-gated updating and suppresses its decoder participation through an activation-dependent additive bias on attention logits before softmax normalization. This jointly reduces state drift and reconstruction-driven interference. To improve decisions under partial occlusion and gradual reappearance, TSA further conditions activation prediction on a per-slot temporal memory produced by a Temporal Context Encoder. We evaluate TSA on MOVi-C/E, YT-VIS, and OVIS benchmarks using both standard and tracking-based metrics (FG-ARI, mBO, IDF1, HOTA). TSA consistently improves object decomposition and temporal identity preservation, with large gains on long, heavily occluded videos.

13.
arXiv (quant-ph) 2026-06-16

Morphology-resolved scrambling in a chaotic quantum billiard

arXiv:2606.16865v1 Announce Type: new Abstract: Chaotic quantum systems can retain spatial memory through scarred eigenstates, but whether these static structures control scrambling remains unclear. This work establishes a morphology-resolved connection between scarred eigenstates and eigenstate-resolved OTOCs in a peanut-shaped quantum billiard. Scalar localisation diagnostics, including differential entropy and continuum participation ratios, detect anomalous concentration but discard spatial architecture. A scale-normalised density overlap, in contrast, directly compares probability density profiles, revealing families of orthogonal eigenstates with nearly identical spatial morphology. Comparing the complete OTOC time traces of these orthogonal eigenstates reveals that morphological recurrence has dynamical content: moderate density overlap yields no universal prediction, whereas strongly recurring morphologies exhibit nearly identical OTOC growth and saturation. Thus, scarred structures act as spatial templates for operator growth, not merely static violations of ergodicity. This morphology-resolved framework turns eigenstate shape into a quantitative predictor of scrambling and provides a scale-controlled diagnostic of weak ergodicity breaking in quantum chaos.

14.
arXiv (CS.CL) 2026-06-17

Perceptual compensation for tonal context in self-supervised speech models

This study examines the extent to which the wav2vec2.0 architecture exhibits evidence of compensation for phonological context. We conducted a pseudo-replication of a perceptional compensation experiment on Mandarin Chinese tones, and compared the embedding similarities and probing classifier outputs between a purely self-supervised pre-trained model and a model fine-tuned for Mandarin ASR. No evidence of compensation was found in the embedding similarities of the purely pre-trained model. Probing classifiers showed some evidence of compensation in addition to the expected layer-wise improvements in categorization, but failed to replicate human performance on isolated test syllables. Our findings contrast with previous reports of sensitivity to phonological structure emerging through pre-training alone, and suggest that supervised objectives may be necessary to encourage the abstraction of at least some types of phonological regularities.

15.
arXiv (quant-ph) 2026-06-16

Neural quantum states for entanglement depth certification from randomized Pauli measurements

arXiv:2512.13121v2 Announce Type: replace Abstract: Entanglement depth quantifies how many qubits share genuine multipartite entanglement, but certification typically relies on tailored witnesses or full tomography, both of which scale poorly with system size. We recast entanglement-depth and non-$k$-separability certification as likelihood-based model selection among neural quantum states whose architecture enforces a chosen entanglement constraint. A hierarchy of separable neural quantum states is trained on finite-shot local Pauli outcomes and compared against an unconstrained reference model trained on the same data. When all constrained models are statistically disfavored, the data certify entanglement beyond the imposed limit directly from measurement statistics, without reconstructing the density matrix. We validate the method on simulated six- and ten-qubit datasets targeting GHZ, Dicke, and Bell-pair states, and demonstrate robustness for mixed states under local noise. Finally, we discuss lightweight interpretability diagnostics derived from trained parameters that expose coarse entanglement patterns and qubit groupings directly from bitstring statistics.

16.
arXiv (math.PR) 2026-06-19

An alternative approach to well-posedness of McKean-Vlasov equations arising in Consensus-Based Optimization

arXiv:2512.19446v4 Announce Type: replace-cross Abstract: In this work we study the mean-field description of Consensus-Based Optimization (CBO), a derivative-free particle optimization method. Such a description is provided by a non-local SDE of McKean-Vlasov type, whose fields lack of global Lipschitz continuity. We propose a novel approach to prove the well-posedness of the mean-field CBO equation based on a truncation argument. The latter is performed through the introduction of a cut-off function, defined on the space of probability measures, acting on the fields. This procedure allows us to study the well-posedness problem in the classical framework of Sznitman. Through this argument, we recover the established result on the existence of strong solutions, and we extend the class of solutions for which pathwise uniqueness holds.

17.
arXiv (CS.CV) 2026-06-15

SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

Robust cross-view geo-localization (CVGL) remains challenging despite the surge in recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, where models are optimized under a fixed FoV but collapse when tested on unseen FoVs and unknown orientations. This limitation necessitates deploying multiple models to cover diverse variations. Although studies have explored dynamic FoV training by simply randomizing FoVs, they failed to achieve robustness across diverse conditions – implicitly assuming all FoVs are equally difficult. To address this gap, we present SinGeo, a simple yet powerful framework that enables a single model to realize robust cross-view geo-localization without additional modules or explicit transformations. SinGeo employs a dual discriminative learning architecture that enhances intra-view discriminability within both ground and satellite branches, and is the first to introduce a curriculum learning strategy to achieve robust CVGL. Extensive evaluations on four benchmark datasets reveal that SinGeo sets state-of-the-art (SOTA) results under diverse conditions, and notably outperforms methods specifically trained for extreme FoVs. Beyond superior performance, SinGeo also exhibits cross-architecture transferability. Furthermore, we propose a consistency evaluation method to quantitatively assess model stability under varying views, providing an explainable perspective for understanding and advancing robustness in future CVGL research. Codes will be available upon acceptance.

18.
arXiv (CS.LG) 2026-06-19

SEAGAN: domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes

arXiv:2606.19623v1 Announce Type: new Abstract: Graph neural networks (GNNs) provide a flexible framework for learning from scientific data linked through physical, biological, or functional relationships. One promising domain is plant physiology, where measured responses often arise from multiple interacting processes whose exact separation remains difficult even with manual intervention. In plant physiology, a key example is the A-Ci curve, which relates net CO2 assimilation rate (Anet) to leaf intercellular CO2 concentration (Ci) and is used to estimate photosynthetic parameters in leaf and crop-canopy models. However, reliable estimation requires identifying the active biochemical limitation state at each curve point, which remains a major source of uncertainty. Here, we formulate limitation-state identification along A-Ci curves as a graph-based node classification problem, with curve points as nodes. Domain-specific graph representations are created using distance-based k-nearest-neighbor (kNN) and auxiliary-signal-guided (ASG) connectivity, with edge attributes encoding pairwise relations. The framework was evaluated against conventional learning baselines, graph-based architectures, and an automated fitting-based benchmark. Results on a large synthetic dataset with known ground-truth limitation states show that graph-based models improve classification, particularly near biochemical transition regions. The best-performing configuration, SEAGAN (domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes), integrates process-aware node features, edge attributes, kNN connectivity, and graph attention with weighted cross-entropy loss, achieving an F1-score of 0.857 and an accuracy of 0.882. The results show that representing A-Ci curves as graphs improves biochemical limitation-state analysis, with edge-aware attention over local kNN neighborhoods providing the most effective strategy.

19.
arXiv (CS.CL) 2026-06-12

SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection

Prompt-based LLMs are increasingly used for stance detection, but harder examples are not always repaired by clearer instructions, reasoning prompts, retrieval, or debate. We introduce SICI (Stance Inference Complexity Index), a seven-dimensional diagnostic measure of the semantic-pragmatic burden imposed by a target–text pair. Across SemEval-2016 and VAST, SICI predicts LLM accuracy better than surface proxies and shows substantial cross-scorer reliability ($\alpha=0.771$). More importantly, LLM errors change regime as SICI increases: low-complexity examples invite over-attribution, especially Against predictions; intermediate examples form an unstable boundary; and high-complexity examples rapidly concentrate on None. This phase-transition-like structure persists across GPT-3.5, GPT-4o-mini, DeepSeek-V3, and GPT-4o, although stronger models move the boundaries. A 15-method intervention study further shows that prompting, retrieval, and debate often shift models along the attribution–abstention axis rather than removing the high-complexity bottleneck.

20.
arXiv (CS.AI) 2026-06-11

Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

arXiv:2603.14867v4 Announce Type: replace-cross Abstract: Many strategic decision-making problems, such as environment design for warehouse robots, can be naturally formulated as bi-level reinforcement learning (RL), where a leader agent optimizes its objective while a follower solves a Markov decision process (MDP) conditioned on the leader's decisions. In many situations, a fundamental challenge arises when the leader cannot intervene in the follower's optimization process; it can only observe the optimization outcome. We address this decentralized setting by deriving the hypergradient of the leader's objective, i.e., the gradient of the leader's strategy that accounts for changes in the follower's optimal policy. Unlike prior hypergradient-based methods that require extensive data for repeated state visits or rely on gradient estimators whose complexity can increase substantially with the high-dimensional leader's decision space, we leverage the Boltzmann covariance trick to derive an alternative hypergradient formulation. This enables efficient hypergradient estimation solely from interaction samples, even when the leader's decision space is high-dimensional. Additionally, to our knowledge, this is the first method that enables hypergradient-based optimization for 2-player Markov games in decentralized settings. Experiments highlight the impact of hypergradient updates and demonstrate our method's effectiveness in both discrete and continuous state tasks.

21.
arXiv (quant-ph) 2026-06-16

Information geometry and entanglement under phase-space deformation through nonsymplectic congruence transformation

arXiv:2505.02269v3 Announce Type: replace Abstract: The Fisher-Rao (FR) information matrix is a central object in multiparameter quantum estimation theory. The geometry of a quantum state can be envisaged through the Riemannian manifold generated by the FR-metric corresponding to the quantum state. Interestingly, any congruence transformation $GL(2n,\mathbb{R})$ in phase space leaves the FR-distance for Gaussian states invariant. In the present paper, we investigate whether this isometry affects the entanglement in the bipartite system. It turns out that the entanglement-generating congruent transformation depends upon the system and background space. To make our study relevant to physical systems, we choose Bopp's shift in phase space as an example of $GL(2n,\mathbb{R})$, so that the results can be interpreted in terms of noncommutative (NC) phase-space deformation. We provide an estimation of the measure of entangled states over separable states for bipartite Gaussian states under a Bopp's shift. Since the dynamics of free oscillators in background NC-space is mathematically equivalent to the dynamics of a charged particle under a homogeneous magnetic field, we provide an outline for a gedankenexperiment through photocurrent measurement in order to determine the effects of congruent transformation on the distinguishibility of Gaussian states.

22.
arXiv (CS.LG) 2026-06-15

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

arXiv:2606.14492v1 Announce Type: new Abstract: We present a recipe-controlled decoder audit (RCDA) for structural transductive knowledge-graph completion (KGC). The audit asks a simple reporting question: before attributing gains to an encoder or training recipe, what changes when the decoder is swapped under the same recipe? Using ComplEx and DistMult as the primary controlled pair, with targeted RotatE/TransE spot-checks, we evaluate seven benchmarks. On five standard KGs, ComplEx-vs-DistMult differences are modest but consistent under our recipe (+0.005 to +0.012 MRR), whereas CompGCN-style encoder effects vary more by dataset. On small KGs, decoder effects become the main diagnostic: Kinship shows a stable ComplEx advantage of +0.143 MRR (6 seeds), while UMLS favours ComplEx by +0.022 MRR in a clean 6-seed server rerun but reverses in an earlier provenance variant. We therefore treat small-KG decoder choice as recipe- and provenance-sensitive rather than as a fixed dataset winner. We further show that decoder choice interacts with encoder depth on WN18RR, and that under our recipe L=0 ComplEx on YAGO3-10 reaches 0.6971 +/- 0.0048 MRR at d=128. The result is a compact audit protocol: report matched decoder rows, log small-KG provenance, and sweep decoder x depth before making encoder-level claims.

23.
arXiv (CS.CV) 2026-06-18

E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs

E-commerce short videos represent a high-revenue segment of the online video industry characterized by a goal-driven format and dense multi-modal signals. Current models often struggle with these videos because existing benchmarks focus primarily on general-purpose tasks and neglect the reasoning of commercial intent. In this work, we first propose a multi-modal information density assessment framework to quantify the complexity of this domain. Our evaluation reveals that e-commerce content exhibits substantially higher density across visual, audio, and textual modalities compared to mainstream datasets, establishing a more challenging frontier for video understanding. To address this gap, we introduce E-commerce Video Ads Benchmark, which is the first benchmark specifically designed for e-commerce short video understanding. We curated 3,961 high-quality videos from Taobao covering a wide range of product categories and used a multi-agent system to generate 19,785 open-ended Q&A pairs, which consist of five distinct tasks. Finally, we develop E-VAds-R1, an RL-based reasoning model featuring a multi-grained reward design called MG-GRPO. This strategy provides smooth guidance for early exploration while creating a non-linear incentive for expert-level precision. Experimental results demonstrate that E-VAds-R1 achieves a 109.2% performance gain in commercial intent reasoning with only a few hundred training samples. Data is available at https://github.com/TaobaoTmall-AlgorithmProducts/E-VAds_Benchmark.

24.
arXiv (CS.AI) 2026-06-11

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

arXiv:2606.11662v1 Announce Type: new Abstract: Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-looking direction, it may keep extending a weak continuation. If it explores without discipline, it may waste budget on disconnected trials. We propose TreeSeeker, an inference-time framework for controlled trial-and-error in deep search. TreeSeeker organizes search as branch-and-return search over tree-structured states, where each branch is a tentative direction for a sub-goal. At each round, TreeSearch reads all sub-goal trees, identifies active goals, and uses textual UCB signals of value, uncertainty, and risk to select among exploiting a promising branch, exploring an uncertain alternative, or pruning an unproductive continuation and returning to an earlier branch point. TreeMem supports this control loop by keeping evidence, uncertainty, conflicts, progress, and failure cues attached to the branches that produced them, so trial outcomes can guide later decisions. Experiments on XBench-DeepSearch, BrowseComp, and BrowseComp-ZH show that TreeSeeker consistently outperforms strong open-source baselines, suggesting that explicit branch-and-return control complements stronger reasoning and tool execution.

25.
arXiv (CS.CL) 2026-06-24

ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement

Despite the remarkable performance of large language models (LLMs) in text-to-SQL (SQL generation), correctly producing SQL queries remains challenging during initial generation. The SQL refinement task is subsequently introduced to correct syntactic and semantic errors in generated SQL queries. However, existing paradigms face two major limitations: (i) self-debugging becomes increasingly ineffective as modern LLMs rarely produce explicit execution errors that can trigger debugging signals; (ii) self-correction exhibits low detection precision due to the lack of explicit error modeling grounded in the question and schema, and suffers from severe hallucination that frequently corrupts correct SQLs. In this paper, we propose ErrorLLM, a framework that explicitly models text-to-SQL Errors within a dedicated LLM for text-to-SQL refinement. Specifically, we represent the user question and database schema as structural features, employ static detection to identify execution failures and surface mismatches, and extend ErrorLLM's semantic space with dedicated error tokens that capture categorized implicit semantic error types. Through a well-designed training strategy, we explicitly model these errors with structural representations, enabling the LLM to detect complex implicit errors by predicting dedicated error tokens. Guided by the detected errors, we perform error-guided refinement on the SQL structure by prompting LLMs. Extensive experiments demonstrate that ErrorLLM achieves the most significant improvements over backbone initial generation. Further analysis reveals that detection quality directly determines refinement effectiveness, and ErrorLLM addresses both sides by high detection F1 score while maintain refinement effectiveness.