Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-11

PAWS: Preference Learning with Advantage-Weighted Segments

arXiv:2606.11982v1 Announce Type: new Abstract: Preference-based reinforcement learning (PbRL) learns policies from human trajectory-level comparisons, avoiding explicit reward design and expert demonstrations. Existing methods typically train utility functions on trajectory or segment-level preferences while relying on per-step utility estimates during policy optimization. This training and inference mismatch induces a distribution shift that severely degrades temporal credit assignment and limits policy learning. We analyze this issue and propose PAWS, a segment-based preference learning method that performs policy updates directly using segment-level advantage functions. By aligning utility training with policy optimization, PAWS preserves trajectory-level preference information and avoids unreliable per-step learning signals. Experiments on simulated robotic manipulation and locomotion tasks demonstrate that PAWS consistently outperforms existing PbRL approaches, highlighting the importance of distribution-consistent preference learning.

02.
arXiv (CS.AI) 2026-06-18

An In-depth Study of LLM Contributions to the Bin Packing Problem

arXiv:2510.27353v2 Announce Type: replace Abstract: Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based genetic algorithms produced heuristics offering new insights into the online bin packing problem under uniform and Weibull distributions. In this work, we reassess this claim through a detailed analysis of the heuristics produced by LLMs, examining both their behavior and interpretability. Despite being human-readable, these heuristics remain largely opaque even to domain experts. Building on this analysis, we propose a new class of algorithms tailored to these specific bin packing instances. The derived algorithms are significantly simpler, more efficient, more interpretable, and more generalizable, suggesting that the considered instances are themselves relatively simple. We then discuss the limitations of the claim regarding LLMs' contribution to this problem, which appears to rest on the mistaken assumption that the instances had previously been studied. Our findings instead emphasize the need for rigorous validation and contextualization when assessing the scientific value of LLM-generated outputs.

03.
arXiv (CS.CL) 2026-06-18

Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey

Applications of narrative theories using large language models (LLMs) deliver promising methods in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research uses LLM methods to engage with diverse concepts from narrative studies. We use established distinctions from narratology to categorise ongoing efforts and discover the following: \redtext{(a) narrative texts come from diverse sources beyond just literature, (b) theoretical synthesis and validation are potential outcomes, (c) generation tasks lag behind understanding in several ways: theoretical application, post-training methods, exploring non-fiction narratives and addressing narrative levels beyond fabula and discourse.} For future directions, instead of the pursuit of a single, generalised benchmark for `narrative quality', we believe that progress can benefit from efforts that focus on the following: defining and improving theory-based metrics for individual narrative attributes; continue conducting large-scale, theory-driven literary/social/cultural analysis; generating narratives in situated contexts; and continuing experiments where outputs can be used to validate or refine narrative theories. This work provides a contextual foundation for more systematic and theoretically informed narrative research in NLP by providing an overview to ongoing research efforts and the broader narrative studies landscape.

04.
bioRxiv (Bioinfo) 2026-06-14

Cellfm-datasets: A Unified Data Infrastructure for Single-Cell and Spatial Transcriptomics Foundation Model Pretraining

Large-scale cell foundation models are increasingly limited not only by model architecture, but also by the data infrastructure required to repeatedly sample sparse transcriptomic profiles from out-of-core cohorts. AnnData/H5AD has become a standard exchange format for single-cell and spatial omics analysis, yet its HDF5-backed layout is not designed for high-frequency random mini-batch loading under multi-worker and distributed pretraining. We present Cellfm-datasets, a data infrastructure artifact that converts H5AD cohorts into a self-describing compressed sparse row (CSR) memmap layout and exposes the resulting corpus through Hugging Face Dataset and IterableDataset interfaces. The artifact stores a shared gene vocabulary, per-sample metadata, optional spatial coordinates, observation metadata, manifests, and checksums, and reconstructs sparse cell or group records at runtime without dense expansion. A unified sampling abstraction supports random-cell groups, manifest-defined biological regions, and coordinate-based spatial blocks, with deterministic sharding across distributed ranks and data-loader workers. Spatial demonstrations on P14 mouse brain transcriptomics sections illustrate region- and block-level sampling over real anatomical structures. In controlled benchmarks on a public heterogeneous ModelScope scRNA-seq subset, Cellfm-datasets reached 60,571 +/- 1,734 samples/s in single-core random loading, scaled to approximately 160,000 samples/s with eight workers, and maintained near-constant process-private memory while reading up to one million cells. By moving sparse single-cell and spatial corpora from model-specific loader code into reusable, validated, and framework-native dataset artifacts, this design may reduce the engineering burden of reproducible cell foundation model pretraining and make repeated training runs, model comparisons, and mixed-modality data reuse easier to standardize.

05.
PLOS Computational Biology 2026-06-04

Cell differentiation can underpin the reproducibility of morphogenesis

by Dominic K. Devlin, Austen R. D. Ganley, Nobuto Takeuchi Morphogenesis of complex body shapes is reproducible despite the noise inherent in the underlying morphogenetic processes. However, how these morphogenetic processes work together to achieve this reproducibility remains unclear. Here, we ask how this reproducibility is achieved by evolving complex morphologies in a multi-scale, computational model. Each morphology consists of a population of cells on a two-dimensional grid using the Cellular Potts Model framework. Each cell contains a genome that encodes a gene regulatory network, morphogens for cell-cell signalling, and proteins that determine cell behaviours. By repeatedly simulating our model with different initial conditions under selection for shape complexity, we obtained a “zoo” of evolved morphologies. We find that these evolved, complex morphologies are reproducible in a sizeable fraction of simulations, despite no direct selection for reproducibility. We show that high reproducibility is caused by spatially segregating moving cells that “shape” morphologies from stationary cells that “maintain” morphologies during morphogenesis. Strikingly, most highly reproducible morphologies also evolved cell differentiation, where proliferative, moving progenitor cells irreversibly differentiate into non-dividing, stationary differentiated cells at tissue boundaries. These results suggest that cell differentiation observed in natural development plays a fundamental role in morphogenesis in addition to the production of specialised cell types. This previously unrecognised role of cell differentiation has major implications for our understanding of how morphologies are generated and regenerated.

06.
arXiv (quant-ph) 2026-06-16

Dressed Floquet scars from protected zero modes in a Rydberg chain

arXiv:2606.15605v1 Announce Type: cross Abstract: In this Letter, we present an approximate analytic construction of two zero quasienergy quantum many-body scars in a periodically driven model of Rydberg atoms on a ring, which persist over a range of driving amplitudes and frequencies for finite sizes. An index theorem protects an exponentially large number (in system size) of exact zero energy modes of the Floquet Hamiltonian in this setting. Unlike most of these zero modes which continuously change with drive parameters, these two quantum many-body scars retain the memory of particular states. They can be expressed as {\it dressed versions} of two contrasting states, the Rydberg vacuum and a unitarily rotated variant of a volume-law scar [Ivanov and Motrunich, Phys. Rev. Lett. {\bf 134}, 050403 (2025)], respectively. We provide an analytic understanding of their existence using a Floquet perturbation theory and show their resilience beyond the perturbative regime using exact diagonalization in finite systems. Our study provides insight into the structure of protected zero modes in interacting Floquet settings.

08.
arXiv (CS.AI) 2026-06-15

Learning optimal policies from event logs through reinforcement learning: a comparison of deep and MDP-based approaches

arXiv:2303.09209v2 Announce Type: replace Abstract: Prescriptive Process Monitoring is an emerging area within Process Mining that focuses on recommending actions to optimize business outcomes. Most existing works prescribe pre-defined interventions, i.e., sets of actions applied to ongoing process executions to achieve a specific objective or Key Performance Indicator (KPI). In contrast, only a few approaches have explored learning and evaluating optimal behavioral policies, i.e., general strategies that determine the best sequence of actions to maximize a desired KPI. In this paper, we address the problem of learning optimal behavioral policies by proposing an AI-based approach that learns an optimal policy directly from historical process executions using Reinforcement Learning (RL) to recommend the best actions for optimizing a KPI. To this end, we employ two RL techniques. The first is a classical model-based approach that extends previous work by the authors through the construction of a Markov Decision Process (MDP) capturing process behavior. The second is a model-free technique based on offline Deep RL. Unlike state-of-the-art work, we aim to minimize the use of domain knowledge and learn optimal policies directly from historical event data. This allows us to learn when to apply interventions and discover effective ones directly from data. Moreover, we target complex scenarios involving external actors, where the process owner controls only part of the activities. We adopt a data-driven Business Process Simulation (BPS) environment to evaluate the learned policies. Results show that both methods improve the targeted KPI with similar effectiveness, while the model-based approach outperforms offline Deep RL in computational efficiency.

09.
arXiv (CS.CV) 2026-06-16

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

Authors:

Approximate nearest-neighbour search underpins large-scale retrieval and retrieval-augmented generation, yet its methods are studied in communities that seldom read one another. We argue that they form one field with three design choices. We develop the projection-quantisation-organisation lens: every method places its projections, places its quantisation thresholds, and organises the resulting codes for search. We test the lens with a reproducible measurement, released as the open BitBudget benchmark, and report three findings. First, the quantisation axis delivers the largest memory savings: a one-bit code with full-precision re-ranking matches uncompressed quality for six of seven embedders, the scanned code one thirty-second of the float's size. Second, the orderings the lens anticipates, including a learned-embedding regime where binary codes overtake an inverted-file product quantiser at a matched byte budget, recur as the embedding is enlarged. Third, given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float it replaces. We also recast the semantic identifiers of generative retrieval as quantisation codes. The main contribution is a single, tested account of compact-code search, from random projections to the retrieval-augmented era.

10.
arXiv (CS.LG) 2026-06-16

The Machine Learning Approach to Moment Closure Relations for Plasma: A Review

arXiv:2511.22486v3 Announce Type: replace-cross Abstract: The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a closure relation for the high order plasma moments. This review compiles and analyses the recent surge of machine learning approaches developing improved plasma closure models capable of capturing kinetic phenomena within plasma fluid models. We survey two methodological families: neural-network surrogates (from multilayer perceptrons to Fourier neural operators, the latter recently reproducing both linear and non-linear Landau damping online within a fluid solver) and equation-discovery methods such as sparse regression; and organise the studies by whether they are tested offline against reference data or online within a time-evolving solver. We outline the challenges associated with machine-learning closures, including off-diagonal pressure-tensor accuracy, generalisation beyond the training distribution, and stable integration into large-scale simulations, and the directions future research might take to address them.

11.
arXiv (CS.CV) 2026-06-17

Phenotyping TPF via Self-Supervised Learning: A Label-Agnostic Framework with Expert Validation

The full potential of artificial intelligence in tibial plateau fracture characterisation remains unrealised, constrained by a fundamental dependency on labelled datasets whose consistency cannot be guaranteed: conventional classification schemes such as Schatzker and AO/OTA suffer from inter-observer variability, causing supervised models to learn human disagreement rather than stable fracture morphology. We design, implement, and validate a label-agnostic framework that eliminates this constraint by learning fracture representations directly from imaging data without observer-assigned labels. A RadImageNet-pretrained ResNet-50 encoder is fine-tuned on 154 cleaned knee radiographs using the SimCLR contrastive objective, preceded by a data cleaning protocol and followed by UMAP dimensionality reduction and k-means clustering to discover four imaging-derived phenotypes. Phenotype validity is assessed through a blinded expert review protocol administered to two independent clinicians. The four phenotypes demonstrate robust stability (bootstrap ARI = 0.319 +/- 0.041), strong internal cohesion (silhouette = 0.511), and coherence ratings of 3-5/5 from both reviewers under blinded conditions; one phenotype was unanimously identified as exhibiting comminution – a high-complexity feature isolated without any supervisory signal. Inter-partition comparison against Schatzker labels yields ARI = 0.013, confirming orthogonality to conventional classification boundaries. Notably, expert reviewers anchored to established classification vocabularies perceived imaging-derived groups as heterogeneous precisely where Schatzker alignment was lowest, suggesting that Schatzker-trained perception and label-agnostic embedding geometry measure orthogonal dimensions. These findings establish label-agnostic SSL phenotyping as a reproducible and clinically interpretable complement to conventional classification.

12.
arXiv (math.PR) 2026-06-17

Analysis of the asymmetric shelf shuffle

arXiv:2606.18047v1 Announce Type: new Abstract: In an asymmetric shelf shuffle, a deck of $n$ cards is dealt sequentially from the bottom and assigned one of the $m$ shelves uniformly at random. The card is placed at the top of the assigned shelf with probability $p$, and at the bottom of the assigned shelf with probability $(1-p)$. Analysis of the shelf shuffle has gained much attention recently, and the case $p=1/2$ was first treated by Diaconis–Fulman–Holmes [Ann. Appl. Prob. 23 (2013), no. 4, 1692–1720]. In this paper, we extend the analysis of the shelf shuffle to general $p\in (0, 1)$. In particular, we study the distribution of cycles, cycle lengths, number of descents, number of valleys, number of inversions, and the RSK shape of a permutation obtained from an asymmetric shelf shuffle. Our results extend the analysis of Diaconis–Fulman–Holmes to arbitrary $p$. Furthermore, our analysis of the distribution of descents and inversions is new even for $p=1/2$.

13.
arXiv (CS.AI) 2026-06-16

DualGauge: Automated Joint Security-Functionality Benchmarking of Specification-Only Code Generation by LLMs and Coding Agents

arXiv:2511.20709v2 Announce Type: replace-cross Abstract: Large language models (LLMs) and LLM-based coding agents are now used to generate code from natural-language specifications, yet ensuring such code is both functionally correct and secure remains a challenge. We present DualGauge, the first fully automated framework for jointly evaluating correctness and security of specification-only code generation, supported by DualGauge-Bench, a language-agnostic benchmark of 307 coding tasks each paired with functional and security tests derived from the same specification. Evaluating 10 representative LLMs across Python, C++, and JavaScript, we find that functional correctness substantially overestimates reliable code generation: even the strongest model remains below 15% joint security-functionality success in every language. Common model-side factors–scale, extended thinking, quantization, instruction tuning, and code specialization–do not reliably improve joint performance, suggesting secure-and-correct code generation does not simply emerge from stronger coding capability. Evaluation of 3 leading agentic coding systems (Codex, OpenHands, and Claude Code) shows that iterative scaffolding provides no advantage over direct (LLM-based) generation on specification-only tasks. A qualitative audit reveals failures concentrate at the output contract boundary and in guards that exist but are insufficient–patterns that only joint benchmarking reliably exposes.

14.
arXiv (quant-ph) 2026-06-16

Preparation of Fractional Quantum Hall States on Quantum Computers

arXiv:2606.16548v1 Announce Type: new Abstract: The realization of fractional quantum Hall (FQH) states, characterized by fractional charge and intrinsic topological order, on quantum computers represents a central challenge at the interface of condensed matter physics and quantum information science. Current methods are grouped into two types: methods based on (quasi-)adiabatic evolution of complex parent Hamiltonians to yield target states, and circuit-based approaches for direct state preparation, which are confined to effectively one-dimensional systems near the thin cylinder or torus limit. We introduce a complementary scheme relying on direct quantum circuit construction, which works for arbitrary geometries. Specifically, we present a method to precisely prepare the $\nu=1/3$ Laughlin state on the sphere geometry and demonstrate that it significantly reduces the required number of two-qubit gates and circuit depth, compared to variational quantum circuit approaches. In addition, we employ optimal control techniques to design control pulses for both superconducting and Rydberg atom platforms, identifying experimentally feasible protocols for state preparation. Our results provide an efficient and hardware-relevant pathway for realizing generic FQH states on both noisy intermediate-scale and fault-tolerant quantum devices.

15.
arXiv (CS.CV) 2026-06-12

Towards Effective Waste Segmentation for Automated Waste Recycling in Cluttered Background

Rapid expansion of urban areas and population growth is causing an immense increase in waste production, which demands the need for efficient and automated waste management. In this scenario, automated waste recycling (AWR) using deep learning methods can assist humans in optimal waste management. Recent deep learning approaches for AWR provide promising waste segmentation performance, however, these methods rely on large backbone networks that are inefficient for AWR systems and suffer from performance deterioration in cluttered scenes. To this end, an optimal waste segmentation network is introduced which effectively utilizes the spatial domain to capture localized structural dependencies and the spectral domain to efficiently extract global contextual relationships. This cascaded design allows the network to progressively leverage both local and global representations across complementary domains to highlight the semantic information necessary for effective segmentation of various waste objects. Furthermore, auxiliary feature enhancement module (AFEM) is introduced to enhance the target objects' boundaries and blob amplification for better segmentation in cluttered scenarios. Extensive experimentation on ZeroWaste-aug, ZeroWaste-f and SpectralWaste datasets reveals the merits of the proposed method.

16.
arXiv (CS.AI) 2026-06-12

Benchmarking AI Agents for Addressing Scientific Challenges Across Scales

arXiv:2606.12736v1 Announce Type: new Abstract: AI agents are increasingly being developed to accelerate scientific discovery, yet their practical capabilities in real research settings remain poorly understood. Existing benchmarks for AI agents rarely capture the complexity, heterogeneity, and extended reasoning required by scientific work, whereas benchmarks for scientific tasks often reduce research to static, direct problems and provide limited support for interactive evaluation. Here, we introduce SciAgentArena, a systematic benchmark for evaluating AI agents in real-world scientific research scenarios drawn from emerging needs across multiple domains. SciAgentArena comprises approximately 200 tasks with stepwise verification and an interactive, agent-agnostic environment for assessing diverse AI agents. Using this benchmark, we find that current agents can contribute effectively to well-specified data-analysis workflows, particularly when the task structure and evaluation criteria are clear. However, their performance remains uneven across scientific contexts: agents struggle to generate genuinely novel insights, sustain self-directed exploration, and formulate robust solutions for open-ended research questions. We further characterize common failure modes across agents and identify opportunities for improving their reliability, autonomy, and scientific reasoning. Together, SciAgentArena provides a practical framework for measuring progress in AI agents for science and for guiding the design of future agents capable of addressing complex scientific challenges. Full codes, tasks, and datasets can be accessed via this link: https://sciagentarena.github.io/.

17.
arXiv (CS.AI) 2026-06-18

AI-Driven Assessment of Human Tutors: Linking Training Performance to Real-Life Practice

arXiv:2606.18617v1 Announce Type: cross Abstract: There exist numerous tutor training platforms. However, few provide AI-driven training and evaluation for human tutors based on real-life performance. We present an AI-driven system that assesses both open responses during training and authentic real-life tutoring. Unlike platforms that only assess learning through online training or simulations, our system utilizes Generative AI (Gemini-2.5-pro) to analyze transcriptions of authentic tutoring, measuring the transfer of tutor skills to real-life application. Human tutors instructing students remotely in math (N=86) completed six scenario-based lessons, averaging a significant 7.4% learning gain. Using mixed-effects models across 405 session-to-lesson pairs, we found that training performance significantly predicted real-life transcript scores with an effect size of 0.25 SD. Model comparison (AIC/BIC) indicated averaging open response and multiple choice performance during training predicted real-life tutor performance best, although open responses were comparatively more predictive. Exploratory analysis showed that after training, tutors were significantly more likely to encounter pedagogical opportunities to apply their skills (61.1% to 68.9%) and demonstrated higher execution quality within those opportunities (65.5% to 68.1%). Interrupted time series analysis suggested that these tutor improvements were part of a gradual trend over time rather than an immediate intervention effect of training. We illustrate an AI-driven method to link tutor training with real-life assessment. In doing so, we contribute open datasets, AI prompts, and scoring rubrics to support transparency and reproducibility.

18.
arXiv (CS.AI) 2026-06-16

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

arXiv:2505.19699v2 Announce Type: replace-cross Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance. To transcend these challenges, we propose Mosaic, a novel data-free knowledge distillation framework tailored for heterogeneous distributed environments. Mosaic first trains local generative models to approximate each client's personalized distribution, enabling synthetic data generation that safeguards privacy through strict separation from real data. Subsequently, Mosaic forms a Mixture-of-Experts (MoE) from client models based on their specialized knowledge, and distills it into a global model using the generated data. To further enhance the MoE architecture, Mosaic integrates expert predictions via a lightweight meta model trained on a few representative prototypes. Extensive experiments on standard image and multimodal benchmarks demonstrate that Mosaic consistently outperforms state-of-the-art approaches under both model and data heterogeneity. The source code has been published at https://github.com/Wings-Of-Disaster/Mosaic.

19.
arXiv (quant-ph) 2026-06-12

Entropic order parameters and topological holography

arXiv:2512.24225v2 Announce Type: replace-cross Abstract: We show that the symmetry topological field theory (SymTFT) construction, also known as the topological holography, provides a natural and intuitive framework for the entropic order parameter characterising phases with (partially) broken symmetries. Various examples of group and non-invertible symmetries are studied. In particular, the origin of the distinguishability of the vacua resulting from spontaneously broken non-invertible symmetries is made manifest with an information-theoretic perspective, where certain operators in the SymTFT are excluded from observation.

20.
medRxiv (Medicine) 2026-06-15

Non-Parametric Ancestry Adjustment for Polygenic Scores

Modern polygenic risk scores (PRS) exhibit shifts correlated with ancestry, leading to erroneous predictions for non-European individuals when models are trained on predominantly European cohorts. Such shifts arise from, among other factors, (1) algorithmic limitations in the ability of PRS model training to detect causal variants, rather than nearby variants with ancestry-dependent correlations to the causal one, (2) under-representation of alleles with higher prevalence in non-European populations in the association study training, and (3) gene-by-environment interactions where the environment is correlated with genetic ancestry. Current ancestry-adjustment methodologies often discretize individuals into population categories and apply a simple affine mapping to reduce these genetic ancestry biases. However, such approaches provide suboptimal adjustments, particularly for admixed individuals. In this work, we introduce a detailed theoretical characterization of ancestry-dependent biases and propose novel methods based on non-parametric neighborhood techniques that provide more accurate empirical results and admit statistical consistency guarantees. Extensive experiments using the UK Biobank demonstrate the effectiveness of the proposed methods.

21.
arXiv (quant-ph) 2026-06-19

Random Projections for Multi-Copy Quantum Algorithms

arXiv:2606.20238v1 Announce Type: new Abstract: Estimating nonlinear properties of quantum states is a central task in quantum information science. Multivariate traces, $\mathrm{tr}(\rho_1 \cdots \rho_K)$, and nonlinear observables such as $\mathrm{tr}(\rho^K)$, for integer $K$, can be accessed through collective measurements on multiple state copies, but standard protocols based on swap tests require coherent operations on the full Hilbert space and become experimentally unfeasible for large systems. In this work, we introduce a framework for multi-copy measurements based on random projections onto lower-dimensional subspaces prior to the collective measurement, which is then performed only on the reduced Hilbert space. This procedure yields a tunable tradeoff between coherent quantum resources and statistical sampling overhead, allowing the amount of coherent processing to be matched to the capabilities of the underlying hardware. We derive explicit formulas relating the Haar-averaged projected moments to multivariate traces of the original states and analyze the sampling overhead induced by the projection procedure. Specifically, after compressing an $n$-qubit state to a reduced $q$-qubit subspace, estimating $\mathrm{tr}(\rho^K)$ requires approximately $O(2^{(n-q)(K-1)})$ copies of $\rho$, with each qubit projected out increasing the sampling cost by a factor of $2^{K-1}$. Our results establish how coherent multi-copy operations can be traded for additional state copies, enabling multi-copy quantum protocols to be optimized for the available hardware resources.

22.
arXiv (CS.AI) 2026-06-11

An XAI View on Explainable ASP: Methods, Systems, and Perspectives

arXiv:2601.14764v2 Announce Type: replace Abstract: Answer Set Programming (ASP) is a popular declarative reasoning and problem solving approach in symbolic AI. Its rule-based formalism makes it inherently attractive for explainable and interpretive reasoning, which is gaining importance with the surge of Explainable AI (XAI). A number of explanation approaches and tools for ASP have been developed, which often tackle specific explanatory settings and may not cover all scenarios that ASP users encounter. In this survey, we provide, guided by an XAI perspective, an overview of types of ASP explanations in connection with user questions for explanation, and describe their coverage by current theory and tools. Furthermore, we pinpoint gaps in existing ASP explanations approaches and identify research directions for future work.

24.
arXiv (CS.AI) 2026-06-16

GIST-CMTF: Goal-State Inference for Causal Minimal Tool Filtering in LLM Agents

arXiv:2606.16813v1 Announce Type: new Abstract: Tool-augmented LLM agents rely on runtime filtering to decide which tools should be visible at each step. Causal Minimal Tool Filtering (CMTF) reduces tool-choice confusion by exposing only the next causally necessary tool frontier, but it assumes that the user request has already been mapped to a symbolic goal state. In practice, requests such as "handle my appointment" or "take care of this email" may correspond to multiple possible goals. This creates wrong-goal execution, where an agent follows a valid causal tool path for an unintended objective. We introduce GIST-CMTF, a goal-state inference layer that predicts candidate symbolic goals over the same state-transition vocabulary used by CMTF, estimates ambiguity, and either applies CMTF or exposes clarification as a causal action that produces missing goal or state variables. We evaluate GIST-CMTF across seven model backends, six filtering methods, and 120 controlled tool-use tasks. GIST-CMTF achieves 97.0% task success, compared with 80.1% for top-goal CMTF and 82.9% for semantic-goal CMTF. It reduces wrong-goal execution from 19.4% under top-goal CMTF to 2.5%, while preserving the one-tool exposure of causal filtering and using substantially fewer tokens than all-tools exposure. These results suggest that reliable tool-augmented agents should validate goal state, not only tool relevance, before exposing external actions.

25.
arXiv (CS.CV) 2026-06-18

Revisiting Active Speaker Detection: An In-the-Wild Benchmark for Generalization and Robustness

We present UniTalk, a novel dataset emphasizing challenging scenarios to enhance model generalization for the task of active speaker detection (ASD). Previously established benchmarks such as AVA predominantly comprise old movies and thus exhibit significant domain gaps with real-world video. In contrast, UniTalk covers diverse video types reflecting challenging real-world conditions, including underrepresented languages, noisy backgrounds, and crowded scenes, while being on par with AVA in scale. Extensive evaluations reveal that ASD remains unsolved under realistic conditions: state-of-the-art models near-perfect on AVA fail to reach saturation on UniTalk. Conversely, models trained on UniTalk generalize better to modern in-the-wild datasets including Talkies and ASW. UniTalk thus establishes a new benchmark for ASD, providing researchers with a valuable resource for developing and evaluating versatile and resilient models.