Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-11

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

arXiv:2606.11324v1 Announce Type: cross Abstract: We introduce Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates comprehensive embodied reasoning capabilities, spanning embodied cognition, task planning, correction, and pointing, within a single architecture toward general physical intelligence. Leveraging three automated data construction pipelines to significantly expand the data coverage of critical capabilities, we build a large-scale data system of over 15B tokens, and design a multi-task balanced RL recipe to alleviate heterogeneous task conflicts. We further introduce a Planner-Grounder-Corrector (PGC) closed-loop framework that enables a single model to autonomously execute and self-correct over long-horizon tasks. With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4. Benefiting from the internalized embodied capabilities, Embodied-R1.5 can be fine-tuned into a VLA with only a small amount of data, outperforming leading VLA models like $\pi_{0.5}$ across 4 popular manipulation benchmark suites. We further conduct extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks, demonstrating strong generalization to the physical world. We open-source model weights, datasets, training code, and EmbodiedEvalKit, an evaluation framework tailored for embodied tasks, to facilitate future research in EFMs.

02.
arXiv (CS.CV) 2026-06-16

Analyzing Visual Aircraft Representations with Sparse Autoencoders

Vision models can achieve strong performance on classification tasks, but the internal representations supporting their predictions are often difficult to interpret. This work investigates whether sparse autoencoders can decompose intermediate representations of a vision model into interpretable features. We train a ConvNeXt classifier on the FGVC-Aircraft dataset, extract spatial activations from its final feature stage, and train a sparse autoencoder on these activations. The learned sparse features are analyzed using top-activating image patches, activation strength, and class selectivity. Qualitative visual inspection reveals that several features correspond to recognizable aircraft structures and visual patterns. We evaluate a subset of selected features using input-space and feature-space ablations, measuring how blurring image patches and suppressing sparse features affect class logits, classification margins, and prediction confidence. The results suggest that sparse autoencoders can reveal partially interpretable, class-relevant visual features associated with aircraft recognition, while also exposing limitations such as polysemanticity and coarse spatial localization.

03.
arXiv (CS.CL) 2026-06-12

Zero-source LLM Hallucination Detection with Human-like Criteria Probing

Large language models (LLMs) often hallucinate by generating factually incorrect or unfaithful content, posing significant risks to their safe use. Detecting such hallucinations is particularly challenging under the zero-source constraint, where no model internals or external references are available, and detection must rely solely on the textual query-answer pair. In this paper, we propose Human-like Criteria Probing for Hallucination Detection (HCPD), a paradigm that emulates the multi-faceted reasoning of human evaluators. Its core is a Human-like Criteria Probing (HCP) mechanism, in which a LLM agent adaptively decomposes its judgment into a weighted set of interpretable criteria and aggregates criterion-specific scores into a final truthfulness measure. To achieve this adaptive capability, we introduce a reward-based alignment scheme using only weak supervision from semantic consistency. At inference, we employ a multi-sampling aggregation strategy to ensure robust decisions while preserving full interpretability. We further provide theoretical analysis supporting the reliability of our approach. Extensive experiments show that HCPD consistently outperforms state-of-the-art baselines, offering an effective and explainable solution for zero-source hallucination detection. Code is available at https://github.com/TRISKEL10N/HCPD.

04.
arXiv (CS.LG) 2026-06-11

Higher-Order Token Interactions via Quantum Attention

arXiv:2606.11673v1 Announce Type: cross Abstract: Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce Quantum Higher-Order Attention (QHA), a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-$k$ token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm empirically – while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas the larger classical attention head collapses past order~2; consistent with theory, the size of the advantage tracks the target's Fourier degree - largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains - genetic epistasis, learning-parity-with-noise, and graph triangle detection - reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.

05.
arXiv (CS.CL) 2026-06-18

RCEM: Robust Conversational Search EMbedder in Distributional Shift

We propose RCEM, a Robust Conversational search EMbedder that is additionally equipped with LLM's query reformulation capability without losing base model's generalization. Unlike prior conversational dense retrieval approaches that learn direct conversation-to-passage matching, RCEM aligns conversations, prepended by special token, to LLM-rewritten queries, while preserving the original embedding space. The unchanged embedding space automatically maps the rewritten-query to the relevant passages. As a result, RCEM (1) reduces overfitting by simplifying the alignment task from long passages to shorter rewritten queries, (2) eliminates the need for conversation-to-passage relevance labels for training, and (3) maintains its original embedding space that allows conversational queries against indexes built by original embedder without rebuilding them. Extensive experiments show that RCEM consistently outperforms prior approaches, achieving up to 30% improvement under distributional shift.

06.
arXiv (CS.CL) 2026-06-17

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward few-shot In-Context Learning (ICL), it is often presumed that insights from fine-tuning carry over unchanged. Yet this assumption has not been rigorously evaluated, leaving open the question of how to choose source languages for cross-lingual ICL. We conduct a broad empirical study of cross-lingual transfer in ICL spanning seven tasks, six models, and a typologically diverse set of languages. We further analyze language confusion, a key obstacle for generative tasks in cross-lingual ICL. Our results show that conventional fine-tuning-based expectations do not consistently apply in the ICL regime and point to alternative heuristics for selecting source languages effectively.

07.
arXiv (CS.LG) 2026-06-18

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

arXiv:2606.19149v1 Announce Type: cross Abstract: Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.

08.
arXiv (CS.LG) 2026-06-18

INDEQS: Informed Neural controlled Differential EQuationS

arXiv:2606.19138v1 Announce Type: new Abstract: Neural Controlled Differential Equations (NCDE) provide a powerful continuous-time framework for forecasting time series, but standard graph-based extensions typically learn spatial structure purely from data, even in settings where a directed graph structure is known a priori. We introduce Informed Neural controlled Differential EQuationS (INDEQS), a graph-based NCDE forecasting method that incorporates prior knowledge of a directed graph at distinct architectural positions. INDEQS separates inner mixing of hidden states across graph nodes from outer mixing between vector field and control, and offers both a lightweight graph-constrained variant and a more expressive variant, learning additional graph connections from data via adaptive graph convolutions. To systematically study when graph informedness is beneficial in forecasting, we devise a continuous advection simulation on directed graphs, yielding synthetic spatio-temporal datasets with known ground-truth flow structure. We then evaluate INDEQS on two real-world tasks: river discharge forecasting on a hydrological network and traffic flow prediction on PeMS08. Across these synthetic and real-world benchmarks, outer informedness consistently improves mean absolute error over an uninformed NCDE with comparable parameter count, particularly on larger graphs, while inner informedness offers a more parameter-efficient alternative when strict adherence to a known adjacency is desired. A comparison of discrete convolutional and continuous-time decoders further shows that continuous decoders yield better accuracy and greater temporal flexibility on real-world tasks. An implementation of INDEQS and the advection simulation is available at https://github.com/Mitchi1/indeqs.

09.
arXiv (quant-ph) 2026-06-16

Counterdiabatic Raman Atom Optics for Compact High-Sensitivity Gravimetry

arXiv:2606.16945v1 Announce Type: new Abstract: Large-momentum-transfer (LMT) atom interferometry provides a route toward enhanced inertial sensitivity in compact quantum sensors, but its scalability is limited by the accumulation of pulse-transfer errors across long Raman pulse sequences. We investigate theoretically the use of stimulated Raman shortcut-to-adiabatic passage (STIRSAP) for high-fidelity LMT atom optics in a Mach–Zehnder interferometer geometry. The counterdiabatic correction is encoded directly into the Raman pulse envelopes, eliminating the need for auxiliary microwave or radio-frequency control fields. Numerical simulations based on an effective Raman model show that $1~\mu\mathrm{s}$ STIRSAP pulses achieve single-pulse transfer fidelities of $F_\pi = 0.99902$ while maintaining negligible pulse-time overhead even at high momentum order. We analyze the resulting tradeoff between interferometric phase enhancement and compound contrast decay and identify an unconstrained shot-noise optimum near $n\approx270$. The analysis further shows that practical operation at extreme LMT order is constrained by wave-packet separation, vibration noise, Doppler detuning, and accumulated systematic effects rather than by pulse duration itself. These results establish superadiabatic Raman control as a promising approach for scalable high-fidelity atom optics and clarify the physical limitations governing compact high-order atom interferometers.

10.
arXiv (quant-ph) 2026-06-16

MAPS: A Novel Multi-Axial Projective Sphere for Geometrically Visualizing Higher d-Valued Quantum State-Space of Qudits

Authors:

arXiv:2606.15801v1 Announce Type: new Abstract: Visualizing the d-valued quantum state-space of quantum systems serves as a foundational pillar for the scientific research and practical applications in quantum computing and information science, where d >= 2. The 2-valued quantum states of a qubit are elegantly visualized on the three-dimensional Bloch sphere. In contrast, expanding this geometrical paradigm to visualize higher d-valued quantum states of a qudit (d >= 3), e.g., a qutrit (d=3), ququadit (d=4), and quintit (d=5), leads to severe structural and topological complexities. This paper introduces a new generalized three-dimensional framework to effectively visualize higher d-valued quantum states of a qudit, in the aspects of ease of illustration, structural simplicity, and natural representation for researchers and engineers. We called this new framework the "multi-axial projective sphere (MAPS)", which consists of n projectional intersecting spatial axes, where d-1

11.
arXiv (CS.LG) 2026-06-12

Robustness Verification of Recurrent Neural Networks with Abstraction Refinement

arXiv:2606.12490v1 Announce Type: new Abstract: Certified local robustness verification for recurrent neural networks (RNNs) is challenging because approximation errors introduced by nonlinear relaxations can propagate through recurrent connections and accumulate over time. As a result, scalable linear bound propagation methods often become overly conservative and fail to certify inputs that are in fact robust, especially when many pre-activation intervals cross zero. We propose an abstraction-refinement framework for RNN verification that partitions such intervals to remove the dominant relaxation error: on each refined branch, ReLU becomes exact, and smooth activations such as tanh and sigmoid admit substantially tighter linear envelopes. To control the combinatorial cost of splitting in long sequences, we introduce a SHAP-guided timestep selection strategy that ranks hidden states by their contribution to the verification objective and refines only the most critical timesteps in temporal order. Experiments on CIFAR10 and MNIST stroke benchmarks demonstrate consistent improvements in verification success and robustness-margin tightness over abstraction-only baselines, while exposing clear runtime trade-offs between ReLU and tanh models.

12.
medRxiv (Medicine) 2026-06-16

Supplementation with Arabinoxylan Dietary Fiber at Low Doses Produces Behavioral, Metabolic, and Gut Microbial Changes in Healthy, Overweight Adults: A Randomized Placebo-Controlled Trial

Background: Dietary fiber comprises a heterogeneous group of compounds with distinct physicochemical properties and biological effects. As such, functional outcomes observed for one fiber cannot be generalized to others. Some fermentable fibers, such as arabinoxylan, may exert biologically selective effects across multiple physiological domains, highlighting the need to evaluate individual ingredients for their domain-specific activity in controlled human studies. Methods: In this randomized, double-blind, parallel, 3-arm, placebo-controlled trial, healthy, overweight adults were assigned to consume one of two low doses of an arabinoxylan dietary fiber (3.5g or 5g) or placebo over the intervention period. Self-reported appetite sensations were assessed as the primary outcome using validated visual analogue scales. Secondary and exploratory endpoints included lipid parameters, gastrointestinal outcomes, mood-related measures, and gut microbiota composition and fermentation-derived metabolites. Analyses were conducted in the full analysis set and a high-compliance population to assess responses under sustained intake conditions, as per the intended dosing regimen. Results: The primary endpoint of appetite sensations did not differ between either arabinoxylan group and placebo. In contrast, evidence of microbial fermentation and selective microbiota engagement was observed. These responses occurred alongside consistent and favorable changes in lipid parameters under conditions of sustained intake, including reductions in low-density lipoprotein cholesterol and triglycerides. Additional outcomes, including gastrointestinal symptoms and mood, demonstrated domain-specific responses. Conclusion: This study demonstrates that supplementation with low doses of arabinoxylan dietary fiber elicit biologically selective, domain-specific effects across metabolic, microbial, gastrointestinal, and behavioral outcomes, particularly under conditions of sustained intake. These responses occurred independently of changes in appetite sensation, indicating that functional effects were not mediated through appetite-related pathways. Collectively, the findings highlight the ingredient's biological versatility and contextual responsiveness across physiological systems, and suggest its prebiotic potential through alignment with ISAPP's definition of a prebiotic, supporting further investigation of specific mechanistic pathways. Clinical trial registration: https://clinicaltrials.gov/study/NCT06884449, identifier: NCT06884449

13.
medRxiv (Medicine) 2026-06-12

Genetic basis of dynamic brain states reveals cellular and disease associations

Dynamic resting-state fMRI captures the time-varying patterns of brain activity that are obscured by static approaches. Hidden Markov Models (HMMs) characterise these dynamics as recurring whole-brain states and quantify their fractional occupancy (FO), the proportion of time spent in each state, yet the biological basis of inter-individual variation in FO remains unclear. Using data from 52,335 White UK Biobank participants, with replication in East and South Asian subsamples, this study examined the heritability, cellular and neurotransmitter basis of brain states, and their links with complex phenotypes. FO was significantly heritable and enriched for neuronal populations, particularly glutamatergic and GABAergic signalling. Analyses identified shared and state-specific loci and revealed genetic correlations, colocalisation, and potential causal relationships between FO and several phenotypes, including educational attainment, sleep duration, and disease risk. These findings establish dynamic brain states as biologically grounded intermediate phenotypes, linking genetic variation to neural dynamics, diseases and traits.

14.
arXiv (CS.LG) 2026-06-19

Adversarial Bandit Optimization with Globally Bounded Perturbations to Convex Losses

arXiv:2606.19891v1 Announce Type: new Abstract: We study adversarial bandit optimization in which the loss functions may be non-convex and non-smooth. In each round, the learner selects an action and observes only the loss incurred at that action. The loss consists of an underlying convex and $\beta$-smooth component and an adversarial perturbation that may be chosen after observing the learner's action. The perturbations are subject to a global budget controlling their cumulative magnitude over time. This framework extends the globally budgeted, post-action perturbation model from underlying linear losses to general convex and $\beta$-smooth losses. For this broader class, we establish expected regret guarantees that explicitly characterize the effect of the perturbation budget. To establish these guarantees, we modify a standard bandit optimization algorithm and develop an analysis that controls the additional regret caused by the perturbations. In the absence of perturbations, our results reduce to regret guarantees for the standard bandit convex optimization setting with $\beta$-smooth losses.

15.
arXiv (CS.CV) 2026-06-17

See First, Answer Later: Visual Evidence Pre-Alignment via Sufficiency-Driven RL

Multimodal large language models (MLLMs) integrate strong text reasoning with visual inputs, yet their responses can be inconsistent with the underlying images, indicating ineffective utilization of visual evidence during inference. The prevailing training paradigm relies on large-scale caption-based pretraining for general alignment, followed by supervised fine-tuning and reinforcement learning to enable instruction following and complex reasoning. However, such pretraining provides only weak visual grounding: short, coarse captions bias models toward salient objects while neglecting fine-grained visual evidence. In this paper, we introduce Visual Evidence Pre-Alignment (VEPA), an intermediate stage between pretraining and post-training that explores a novel sufficiency-driven objective with Group Relative Policy Optimization (GRPO) to optimize question-conditioned visual evidence descriptions. Extensive experiments across diverse benchmarks show that our VEPA consistently enhances performance on visually demanding evaluations and complements standard supervised post-training. Further analyses show that the income stems from strengthened, transferable visual grounding, rather than from additional task-specific training.

16.
arXiv (CS.CV) 2026-06-18

HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling

Visual Autoregressive (VAR) models have recently demonstrated impressive image generation quality while maintaining low latency. However, they suffer from severe KV-cache memory constraints, often requiring gigabytes of memory per generated image. We introduce HeatKV, a novel compression method that adapts cache allocation in each head based on its attention to previously generated scales. Using a small offline calibration set, the attention heads are ranked according to their attention scores over prior scales. Based on this ranking, we construct a static pruning schedule tailored to a given memory budget. Applied to the Infinity-2B model, HeatKV achieves $2 \times$ higher compression ratio in memory allocation for KV cache compared to existing methods, while maintaining similar or better image fidelity, prompt alignment and human perception score. Our method achieves a new state-of-the-art (SOTA) for VAR model KV-cache compression, showcasing the effectiveness of fine-grained, head-specific cache allocation. Code and calibration script available at https://github.com/arm-research/heatkv.

17.
arXiv (quant-ph) 2026-06-12

Exploring Exotic Spin-Dependent Interactions Beyond the Standard Model: Theoretical Foundations and Experimental Investigations

arXiv:2606.13318v1 Announce Type: cross Abstract: New interactions mediated by novel particles propose solutions to several important questions in modern physics. Axions serve as examples of such particles; they are lightweight and interact weakly with ordinary matter. This category of particles, including those similar to axions-termed Axion-Like Particles (ALPs)-arises from diverse theoretical frameworks, such as the Peccei-Quinn mechanism addressing the strong CP problem, string theory, and spontaneous supersymmetry breaking. Given their light mass and weak coupling, ALPs are also possible candidates for cold dark matter. Introducing these new interactions mediated by novel particles not only tackles several challenges in modern physics but also raises a crucial question: Are there undiscovered interactions beyond the Standard Model? Many of the interactions predicted by these theories are spin-dependent, which is the primary focus of this review. In this review, we first outline the theoretical foundations for investigating exotic spin-dependent interactions, highlighting their importance in various models beyond the Standard Model. We examine the potential roles of new lightweight particles in mediating these interactions, which may enhance our understanding of dark matter. Relevant formulas derived from theoretical models are included to support experimental investigations. Following this theoretical framework, we conduct a detailed review of recent experimental efforts to detect these exotic interactions. A systematic review of current constraints on these interactions is presented, along with an assessment of various detection approaches.

18.
arXiv (CS.CL) 2026-06-12

sebis at CRF Filling 2026: A Two-Stage Local LLM Pipeline for Medical CRF Filling

The extraction of structured clinical information from unstructured EHR notes is a persistent bottleneck in healthcare informatics. While large language models (LLMs) offer high performance, their deployment in clinical settings is hindered by privacy risks, inference costs, and the tendency to hallucinate beyond textual evidence. We address these challenges for the CL4Health 2026 Case Report Form (CRF) filling task by proposing a fully local, domain-adapted pipeline using the MedGemma-27B model. Our two-stage architecture, which separates binary presence classification from value extraction, enforces strict adherence to textual evidence and ensures deterministic outputs for negated, uncertain, or unknown states. By leveraging item-specific, few-shot in-context learning without external API calls or fine-tuning, our approach achieves a macro-F1 score of 0.55 on the official English test track. This result secures second place among all locally-hosted, open-source submissions. Our work demonstrates that privacy-preserving, on-premise LLM pipelines can achieve near-competitive performance with proprietary frontier models, providing a practical, data-sovereign framework for clinical NLP.

19.
arXiv (CS.LG) 2026-06-11

Flow Matching with In-Context Priors for Out-of-Distribution Brain Dynamics

arXiv:2606.11833v1 Announce Type: new Abstract: Flow matching and diffusion models enable conditional generation across domains ranging from images to proteins, with recent extensions to out-of-distribution contexts. Yet generative models of neural time series have largely remained restricted to categorical conditioning, precluding compositional and zero-shot generalization. In this work, we propose a per-timestep conditioned diffusion transformer for generating realistic fMRI brain dynamics during unseen cognitive tasks by injecting both compositional language and optional spatial priors in-context. Such zero-shot generation could enable counterfactual neuroscience by supporting in-silico design and evaluation of novel cognitive experiments before empirical validation. Leveraging this model, we evaluate across hundreds of held-out task conditions and characterize predictive performance in relation to the training manifold. From language alone, the model recovers region-specific recruitment across tasks and held-out spatial activation patterns. Spatial priors, when available, complement the text pathway by anchoring generation in regions of task space where language alone degrades, while retaining the compositional structure needed for counterfactual task specification. To our knowledge this is the first generative model of whole-cortex fMRI dynamics for unseen cognitive tasks, advancing counterfactual neuroscience and data-driven experimental design.

20.
arXiv (CS.CL) 2026-06-19

ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents

Existing benchmarks in e-commerce primarily focus on basic user intents, such as finding or purchasing products. However, real-world users often pursue more complex goals, such as applying vouchers, managing budgets, and finding multi-products seller. To bridge this gap, we propose ShoppingBench, a novel end-to-end shopping benchmark designed to encompass increasingly challenging levels of grounded intent. Specifically, we propose a scalable framework to simulate user instructions based on various intents derived from sampled real-world products. To facilitate consistent and reliable evaluations, we provide a large-scale shopping sandbox that serves as an interactive simulated environment, incorporating over 2.5 million real-world products. Experimental results demonstrate that even state-of-the-art language agents (such as GPT-4.1) achieve absolute success rates under 50% on our benchmark tasks, highlighting the significant challenges posed by our ShoppingBench. In addition, we propose a trajectory distillation strategy and leverage supervised fine-tuning, along with reinforcement learning on synthetic trajectories, to distill the capabilities of a large language agent into a smaller one. As a result, our trained agent achieves competitive performance compared to GPT-4.1.

21.
arXiv (CS.CL) 2026-06-18

Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey

Applications of narrative theories using large language models (LLMs) deliver promising methods in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research uses LLM methods to engage with diverse concepts from narrative studies. We use established distinctions from narratology to categorise ongoing efforts and discover the following: \redtext{(a) narrative texts come from diverse sources beyond just literature, (b) theoretical synthesis and validation are potential outcomes, (c) generation tasks lag behind understanding in several ways: theoretical application, post-training methods, exploring non-fiction narratives and addressing narrative levels beyond fabula and discourse.} For future directions, instead of the pursuit of a single, generalised benchmark for `narrative quality', we believe that progress can benefit from efforts that focus on the following: defining and improving theory-based metrics for individual narrative attributes; continue conducting large-scale, theory-driven literary/social/cultural analysis; generating narratives in situated contexts; and continuing experiments where outputs can be used to validate or refine narrative theories. This work provides a contextual foundation for more systematic and theoretically informed narrative research in NLP by providing an overview to ongoing research efforts and the broader narrative studies landscape.

22.
arXiv (math.PR) 2026-06-18

Functions of Bounded Variation and Point Processes

arXiv:2606.08304v2 Announce Type: replace-cross Abstract: We investigate the relationship between the analytical properties of functions of bounded variation and the statistical behavior of hyperuniform point processes. We establish several characterization formulas for the jump part of the gradient of a bounded variation function, extending and unifying previous results by Beretti–Gennaioli and Dávila. In particular, we provide new expressions for the $L^2$-jump of the gradient using both difference quotients and Fourier transform methods. Furthermore, we connect these analytic structures to the theory of hyperuniform point processes. By analyzing the variance of linear statistics associated with bounded variation functions, we provide asymptotic estimates that depend on the specific classification of the hyperuniformity of the point process. The results show how the regularity and jump discontinuities of a function dictate the growth rate of fluctuations in point processes. Finally, we introduce an averaged quadratic BMO-type oscillation functional over translated and rotated cube partitions, similar to the one recently studied by Ambrosio et al., and prove, using results from point process, that it converges to an explicit dimensional constant times the $L^2-$jump, giving in particular a further new characterization of the perimeter of a set.

23.
arXiv (CS.LG) 2026-06-18

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

arXiv:2506.13196v5 Announce Type: replace Abstract: Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking their valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.

24.
arXiv (CS.CL) 2026-06-18

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution dataset of 35,368 validated symbolic math problems spanning integration, limits, differential equations, series, and hypergeometrics. Unlike prior benchmarks, ASyMOB systematically perturbs each seed problem using symbolic, numeric, and equivalence-preserving transformations, enabling a fine-grained assessment of generalization. Our evaluation reveals three key findings: (1) most models' performance collapses under minor perturbations, while top systems exhibit an apparent regime shift in robustness; (2) integrated code tools stabilize performance, particularly for weaker models; and (3) we identify examples where Computer Algebra Systems (CAS) fail while LLMs succeed, as well as problems solved only via a hybrid LLM-CAS approach, highlighting a promising integration frontier. ASyMOB serves as a principled diagnostic tool for measuring and accelerating progress toward building verifiable, trustworthy AI for scientific discovery.

25.
arXiv (math.PR) 2026-06-12

Sub-Riemannian spectral distance

arXiv:2606.12804v1 Announce Type: cross Abstract: We study eigenvalues and eigenfunctions of the ``div-grad type" sub-Laplacian with respect to Popp's volume on a compact equiregular sub-Riemannian manifold $M$. Since Popp's volume is canonically determined by the sub-Riemannian structure of $M$, the spetra of the sub-Laplacian carry geometric meanings. In this paper, we first embed $M$ into the Hilbert space of square-summable sequences using eigenfunctions and then define a spectral distance between two compact equiregular sub-Riemannian manifolds. Our result is a sub-Riemannian analogue of Berard-Besson-Gallot's classical work in the Riemannian case.