Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Do Large Language Models Have Emotions?

arXiv:2606.14742v1 Announce Type: cross Abstract: Do LLMs have emotions? A recent paper from Anthropic reports finding internal representations of emotion concepts in Claude Sonnet 4.5, concluding that the LLM has 'functional emotions.' We evaluate this claim against what is known about how emotions actually function in biological systems. We argue that emotions serve two core functions: the context-sensitive interpretation of situations, and the reorganization of processing across multiple systems in response to those interpretations. The Anthropic findings offer partial support for the first function, though the consistent, discrete emotional representations identified in Claude sit uneasily with affective neuroscience findings that human emotion is characterized by variable rather than uniform neural signatures. On the second function, the evidence is mixed: Claude's representations modulate output without producing the dynamic reorganization of attention, decision speed, and motivational state that defines emotion in biological systems. We close by proposing what it would take for an LLM to have emotions.

02.
arXiv (CS.CL) 2026-06-12

Emergence of Hierarchical Emotion Organization in Large Language Models

As large language models (LLMs) increasingly power conversational agents, understanding how they model users' emotional states is critical for ethical deployment. Inspired by emotion wheels, i.e., a psychological framework that argues emotions organize hierarchically, we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.

03.
PLOS Computational Biology 2026-06-05

A multiscale, Bayesian inference approach to augment mechanistic models of cell signaling with machine-learning predictions of binding affinity

by Holly A. Huber, Stacey D. Finley Computational models in systems biology are often underdetermined—that is, there is little data relative to the complexity and size of the model. This lack of data is primarily due to limits in our ability to observe specific biological systems and restricts the utility of computational models. To reduce this uncertainty, recent methods have explored augmenting parameter inference of systems biology models with predictions from machine learning models. Such approaches expand the pool of data that is applicable for the inference problem. Here, we explore augmenting the parameter inference of intracellular signaling models. We choose to investigate signaling because experimental measurements of the variables of interest, protein dynamics, are still quite limited. To investigate, we propose a novel, multiscale, Bayesian inference approach that augments traditional signaling data with predictions of binding affinity. These predictions are generated using a machine learning pipeline with measurements of amino acid sequence, from the Universal Protein Resource, or protein structure, from the Protein Data Bank, as inputs. We find that we can successfully integrate these measurements into the inference problem using our novel framework. Excitingly, this integration significantly improves the parameter estimates of signaling models. We demonstrate that how much this improvement impacts predictions of signaling depends on the sensitivity of the prediction to perturbations in the parameter values. Overall, the framework we establish here improves the parameter inference of intracellular signaling models by successfully bridging data on protein sequence and structure with systems-level signaling.

04.
arXiv (CS.AI) 2026-06-12

Valid Inference with Synthetic Data via Task Exchangeability

arXiv:2606.13629v1 Announce Type: cross Abstract: There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on "LLM-as-a-judge" outputs; and proteomics research is accelerated by generative models that produce synthetic protein structures. These developments raise an intriguing possibility: synthetic data may help researchers ask more questions, run more studies, and accelerate discovery. But they also raise a fundamental concern: synthetic data can be biased, noisy, and misspecified. In this work, we propose statistical principles for using synthetic data in scientific research with provable validity guarantees. The key insight is a new technical condition that we call task exchangeability. Informally, this is a requirement that the researcher can identify historical tasks, for which real data is available, such that their current task of interest is exchangeable with the historical tasks in an appropriate mathematical sense. We develop methods for valid inference under task exchangeability, together with extensions that provide guarantees even beyond exchangeability. We demonstrate the framework on public opinion surveys with silicon samples and AI evaluation with autoraters.

05.
arXiv (CS.AI) 2026-06-16

LiteOdyssey: A Lightweight Reasoning AI Agent for Interpretable Rare-Disease Diagnosis

arXiv:2606.16149v1 Announce Type: new Abstract: Most medical AI systems improve by scaling additional machinery: more fine-tuning data, more agents, and/or larger retrieval databases. In rare-disease diagnosis, however, such scaling can produce systems that are difficult to deploy, audit, and maintain. We asked whether state-of-the-art diagnostic performance could instead be achieved by extending the reasoning chain of a single AI agent: guiding it with a diagnostic policy, developed through human-AI collaboration and augmenting with freely available biomedical tools. We introduce LiteOdyssey, a lightweight rare-disease diagnostic framework that guides reasoning language model through a clinical genetics workflow. This framework was developed through Policy Iteration with Human Feedback (PIHF) and uses dynamic access to public biomedical tools. On two challenging benchmarks that provide only patient clinical features, LiteOdyssey achieved state-of-the-art performance, with an overall disease Recall@1 of 59.3% over the combined 1,243 cases of LIRICAL (n = 370) and the PhenoPacket Store (n = 873). Both benchmarks have a high proportion of ultra-rare disease (a prevalence below 1 in 1,000,000, with ultra-rare shares of approximately 45% and 52.8%, respectively). On the more difficult PhenoPacket subset, where causal diseases were not mapped to Orphanet in our rarity-mapping pipeline, LiteOdyssey achieved 60.7% Recall@1, compared with 10.7% for the same baseline model (GPT-5.4) without tools. This performance was achieved without fine-tuning, multi-agent ensembles, or a large case-retrieval database. Gains were also observed in the following: on cases never seen during development, on a private cohort of real-world rare disease patients, and on a smaller open-weights model. LiteOdyssey suggests a path toward rare-disease AI systems that are accurate, easier to deploy, and more transparent for physician review.

06.
arXiv (CS.LG) 2026-06-12

Out-of-Distribution (OOD) Detectors for Open-Set RF Fingerprinting

arXiv:2606.12718v1 Announce Type: new Abstract: Radio-frequency (RF) fingerprinting systems must operate in open-world environments where signals from unknown transmitters and temporal drift introduce distribution shift at test time. Out-of-distribution (OOD) detection provides a natural framework for this problem, yet its application to RF fingerprinting (RFF) remains limited. A key barrier to their adoption is that most OOD detectors require auxiliary OOD data for parameter tuning, an assumption that is difficult to satisfy in RF environments where representative OOD data is impractical to collect. In this work, we introduce a promising set of OOD detection methods from the machine learning literature to open-set RFF domain. We present these methods within a unified mathematical framework based on information theory, which is a natural framework for communication systems. Our framework allows for the systematic analysis of methods and development of new methods. We further demonstrate the applicability of recent work on tuning OOD detectors without given OOD tuning data for open-set RFF. We evaluate on the POWDER RF fingerprinting dataset, showing that detectors tuned without any given OOD data achieve performance comparable to baselines with access to true OOD tuning data and greatly out-perform baseline approaches without access to true OOD tuning data, showcasing the practical viability for the RFF problem.

07.
arXiv (CS.CV) 2026-06-16

To forget is to preserve: Machine Unlearning for 3D medical image segmentation

With new data privacy laws such as the General Data Protection Regulation (GDPR) [1] that allow individuals to ask that any of their personal information be erased from trained machine learning models, there has been a push to investigate the unlearning of data from models as a way to comply with these laws. In this regard, based on four mechanics, we consider several approximate unlearning strategies applied to the MRBrainS18 dataset [2]. We use a 3D ResNet-50 [3] as a backbone architecture for segmentation that has been pre-trained with the Med3D framework [4]. Considering the pre-trained model as a baseline, we evaluate respective retention accuracy on 2 types of subjects, i.e., retain and forget. We assess these approaches through their Dice similarity coefficient and mean absolute error (MAE) values using two separate training horizons 20 and 50 epochs. The results show that the Noisy Label strategy had the best overall trade-off with a decrease of 93% in the forget set while maintaining 84% accuracy for the retained set after 50 epochs. All other strategies showed extreme levels of forgetting at higher epoch numbers while also demonstrating catastrophic degradation of their retain set performance. The results of this study provide a strict baseline of performance metrics for unlearning on a subject-specific level and provide practitioners with clear criteria for selecting the proper strategies.

08.
arXiv (CS.LG) 2026-06-16

HRIR-Former: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

arXiv:2603.27998v2 Announce Type: replace-cross Abstract: Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose HRIR-Former, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase preprocessing is unnecessary.

09.
arXiv (CS.CV) 2026-06-17

Edit3DGS: Unified Framework for Dynamic Head Editing via 2D Instruction-Guided Diffusion and 3D Gaussian Splatting

We present Edit3DGS, a unified framework for dynamic 3D head editing that integrates 2D instruction-guided diffusion with 3D Gaussian splatting. Unlike prior approaches that separately address frame-based edits or static 3D reconstruction, our method couples semantic controllability in the image domain with photorealistic, temporally consistent 3D representations. Given an input video, editable facial regions are masked and modified using a text-conditioned diffusion model to support fine-grained operations such as expression transformation, attribute modification, and appearance refinement. The edited frames are then aggregated through 3D Gaussian splatting to produce a coherent, high-fidelity avatar that preserves both identity and motion dynamics. To enforce consistency, Edit3DGS incorporates multi-view batch editing and lightweight inpainting strategies that recover lost expressions across timesteps. Experimental results demonstrate that our framework enables controllable, artifact-free head editing with smooth temporal transitions, offering practical applications in virtual avatars, immersive communication, film production, and interactive media.

10.
arXiv (CS.CV) 2026-06-16

DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment

Recent multimodal large language models (MLLMs) have shown promising performance on video quality assessment (VQA) tasks. However, adapting them to new scenarios remains expensive due to large-scale retraining and costly mean opinion score (MOS) annotations. In this paper, we argue that a pretrained MLLM already provides a useful perceptual prior for VQA, and that the main challenge is to efficiently calibrate this prior to the target MOS space. Based on this insight, we propose DPC-VQA, a decoupling perception and calibration framework for video quality assessment. Specifically, DPC-VQA uses a frozen MLLM to provide a base quality estimate and perceptual prior, and employs a lightweight calibration branch to predict a residual correction for target-scenario adaptation. This design avoids costly end-to-end retraining while maintaining reliable performance with lower training and data costs. Extensive experiments on both user-generated content (UGC) and AI-generated content (AIGC) benchmarks show that DPC-VQA achieves competitive performance against representative baselines, while using less than 2% of the trainable parameters of conventional MLLM-based VQA methods and remaining effective with only 20% of MOS labels. The code will be released upon publication.

11.
arXiv (CS.LG) 2026-06-16

Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

arXiv:2601.22642v2 Announce Type: replace Abstract: Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction creates logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process, providing real-time feedback to detect and rectify errors as they occur. Distinguished from previous neuro-symbolic methods limited by passive post-hoc validation, our approach actively penalizes intermediate fallacies during the reasoning chain. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization. Extensive evaluation on six benchmarks spanning mathematical, logical, and general reasoning demonstrates that our 7B and 14B models outperform state-of-the-art baselines by average margins of 10.4% and 14.2%, respectively. These results validate that formal verification can serve as a scalable mechanism to significantly push the performance boundaries of advanced LLM reasoning.

12.
arXiv (CS.LG) 2026-06-11

Bypassing Prompt Guards in Production with Controlled-Release Prompting

arXiv:2510.01529v4 Announce Type: replace Abstract: Ball et al. recently established that prompt filtering for AI alignment faces a fundamental barrier: under standard cryptographic assumptions, no filter running significantly faster than the protected model can universally distinguish adversarial prompts from benign ones. We investigate whether this impossibility result translates to real-world vulnerabilities in deployed large language model (LLM) systems. We answer affirmatively by introducing controlled-release prompting, a practical instantiation of the theoretical framework that exploits the resource asymmetry between lightweight input filters and the main models they protect. Unlike the theoretical construction, our attack does not require model modification: it generates malicious prompts that are indecipherable by any bounded filter yet remain tractable to the target LLM. We find our attack to be successful on four major chat platforms (Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat) where baseline methods fail. Additionally, we apply our attack to extract copyrighted data from Gemini. Finally, we provide a systematic evaluation of 14 open-weight prompt guard models, revealing that even reasoning-capable filters cannot reliably detect our attack without incurring prohibitive resource overhead.

13.
arXiv (CS.LG) 2026-06-12

Understanding Truncated Positional Encodings for Graph Neural Networks

arXiv:2606.13671v1 Announce Type: new Abstract: Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based (polynomials of the adjacency matrix) - are theoretically equivalent in expressive power, with expressivity between the 1-WL and 3-WL tests. However, this equivalence assumes the GNN uses the "complete" version of these PEs, which requires $O(n^3)$ time and space complexity. Instead, practitioners commonly use truncated variants of these encodings, such as the first $k$ eigenspaces or powers of the adjacency matrix. However, the theoretical properties of these truncated PEs are unknown. In this work, we initiate the study of these truncated PEs. Theoretically, we show that, under truncation, several families of PEs are fundamentally different in expressive power. As a corollary, we show that truncated spectral PEs are no longer stronger than the 1-WL test. We also study a family of spectral PEs, the $k$-harmonic distances, to highlight the differences in expressive power of even closely related truncated PEs. Finally, we experimentally show that a mix of truncated PEs is preferable to any single family on real-world datasets.

14.
arXiv (CS.AI) 2026-06-16

Medical world models: representing medical states, modelling clinical dynamics and guiding intervention policies

arXiv:2606.16721v1 Announce Type: new Abstract: Medical diagnosis and treatment are dynamic processes in which patient states evolve over time and clinical interventions alter future outcomes. Although current medical AI can detect disease, estimate risk and generate reports, many systems still return static labels or scores, offering limited insight into how illness may progress or how alternative interventions may reshape its trajectory. Medical world models adapt the world-model idea from artificial intelligence to healthcare by learning internal simulators of patient-state dynamics. Their long-term goal is to help clinicians anticipate deterioration, compare treatment-conditioned futures and tailor care to individual patients. Yet relevant work remains scattered across foundation models, longitudinal modelling, disease simulation, treatment-effect estimation, reinforcement learning and digital twins. To bridge this gap, this review outlines a roadmap for advancing medical AI from isolated diagnosis and prediction toward medical world models that simulate disease evolution and support intervention decisions. This roadmap is organized around three coupled capabilities: patient-state construction, clinical dynamics modelling and intervention decision support. Across representative systems, the comparison highlights what each capability contributes and how partial components can be integrated into more mature perception–dynamics–planning systems. Finally, we identify the challenges involved in turning plausible rollouts into clinically useful simulators. Related literature is available at https://github.com/1999kevin/awesome_medical_world_models.

15.
medRxiv (Medicine) 2026-06-12

Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-{beta} signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

16.
arXiv (CS.LG) 2026-06-19

Multimodal Concept Bottleneck Models

arXiv:2606.19882v1 Announce Type: cross Abstract: Concept Bottleneck Models (CBMs) enhance the interpretability of deep learning networks by aligning the features extracted from images with natural concepts. However, existing CBMs are constrained in their ability to generalize beyond a fixed set of predefined classes and the risk of non-concept information leakage, where predictive signals outside the intended concepts are inadvertently exploited. In this paper, we propose Multimodal Concept Bottleneck Model (MM-CBM) to address these issues and extend CBMs into CLIP. MM-CBM utilizes dual Concept Bottleneck Layers (CBLs) to align both the image and text embeddings into interpretable features. This allows us to perform new vision tasks like zero-shot classification or image retrieval in an interpretable way. Compared to existing methods, MM-CBM achieves up to 51.26% accuracy improvement on average across four standard benchmarks. Our method maintains high accuracy, staying within ~5% of black-box performance while offering greater interpretability.

17.
arXiv (CS.CV) 2026-06-16

RQUL-UIE: Revitalizing Quality-Unstable Labels for Underwater Image Enhancement via In-Dataset Self-Supervision

Underwater Image Enhancement (UIE) is essential for mitigating degradations caused by water medium. Although learning-based methods have advanced significantly, most rely on paired datasets with unstable label quality, which bottlenecks model performance. This paper proposes a diffusion-based, in-dataset self-supervised learning strategy designed to exploit the quality distribution of training labels. Specifically, we evaluate label quality via semantic perception embeddings from a pre-trained diffusion model in a training-free manner. These quality scores are subsequently quantized into noise-level indices, guiding a multi-step denoising process for level-wise supervision. This mechanism prevents low-quality labels from degrading the model while maximizing their utility during training. Furthermore, a Fourier-based refinement network is incorporated to explicitly reconstruct high-frequency components. Extensive evaluations demonstrate that our method consistently outperforms SOTA approaches in restoration quality. The code and pre-trained model will be available once accepted in link.

18.
arXiv (quant-ph) 2026-06-24

On the Limits of Stretching Quantum Pseudorandomness

arXiv:2606.24736v1 Announce Type: new Abstract: Pseudorandom states, introduced by Ji, Liu, and Song (CRYPTO '18), are quantum analogues of classical pseudorandom generators. A fundamental property of classical pseudorandom generators is that their output can be stretched to arbitrary polynomial length. Whether an analogous stretching property holds for quantum pseudorandom states remains unclear. In this work, we prove the first black-box separation between single-copy secure pseudorandom states ($\mathsf{1PRS}$) with different output lengths. Specifically, we construct a quantum oracle relative to which $\mathsf{1PRS}$ with output length $m(n)=1.1n$ exist, but $\mathsf{1PRS}$ with output length $m(n)=\Omega(n^{2+\epsilon})$ do not, for any $\epsilon>0$. Our proof leverages the Common Haar Random State (CHRS) model introduced by Chen, Coladangelo, and Sattath (EUROCRYPT '25), and introduces a technique to bound the effective number of resource CHRS states utilized by any $\mathsf{1PRS}$ generator in this model.

19.
arXiv (CS.AI) 2026-06-24

Learning to Trigger: Reinforcement Learning at the Large Hadron Collider

arXiv:2606.23993v1 Announce Type: cross Abstract: High-throughput scientific facilities such as the Large Hadron Collider depend on real-time event filtering (triggering) under tight constraints on bandwidth, latency, and storage. In practice, trigger menus are largely static and hand-tuned and can become suboptimal as detector conditions, pileup, and background composition drift over time. We cast online threshold tuning as a sequential decision-making problem: a reinforcement learning agent ingests streaming summaries of recent rates and signal-sensitive features and updates trigger thresholds to maximize signal efficiency while tracking a target background rate within a tolerance band. We adapt Group-Filtered Policy Optimization (GFPO) to streaming control and introduce two variants (GFPO-F, GFPO-FR) that enforce background rate feasibility during training. On a benchmark that emulates realistic collider operation, we study two representative triggers: a total transverse energy ($H_{T}$) trigger sensitive to pileup variation, and an anomaly-detection (AD) trigger based on reconstruction loss for rare or non-standard signatures. On Monte Carlo streams, our agent increases the fraction of in-tolerance time intervals by 48\% ($H_T$) and 28\% (AD), with a cumulative gain of up to 2\% in signal efficiency on those in-tolerance intervals. Transferring from simulation to real collision data (CMS Run 283408), the same agent, without fine-tuning, achieves a 56\% ($H_T$) and 28\% (AD) in-tolerance improvement over baselines, with further signal-efficiency gain on both triggers. To our knowledge, this is the first demonstration of RL-based trigger control on real Large Hadron Collider collision data. Code is available at https://github.com/Zixind/GFPO\_LHC.

20.
arXiv (CS.CV) 2026-06-11

Multimodal Brain Tumour Classification Using Feature Fusion

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate the clinicians multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. Both streams are fused via concatenation, gated, or bidirectional cross-modal attention strategies. Across nine experimental runs on a balanced 7,200 image dataset, all multimodal configurations outperform unimodal baselines with gated fusion achieving the best accuracy of 96.13%.

21.
arXiv (CS.AI) 2026-06-16

PrologMCP: A Standardized Prolog Tool Interface for LLM Agents

arXiv:2606.14935v1 Announce Type: new Abstract: Frontier reasoning-tuned language models still fail on deductive tasks at depth, and the cost of improved performance through extended internal reasoning scales poorly. Symbolic delegation offers a complementary route: a language model translates the problem, while a solver performs the inference. However, current autoformalization pipelines for logic programming are typically bespoke integrations tied to particular tasks or agents. We introduce PrologMCP, a task-agnostic, open-source server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP). Its compact tool interface, structured error reporting, and per-session isolation make the translate-run-inspect-repair loop a reusable primitive for MCP-capable agents. We evaluate a formalizer agent enhanced with PrologMCP against standard and reasoning LLMs (Claude Sonnet 4.6, GPT-4.1, and o4-mini) on two subsets of PARARULE-Plus: a general-purpose sample and a more challenging one targeting a specific failure mode of natural-language reasoning. On the general sample, the formalizer matches or exceeds reasoning LLMs (accuracy 1.00 vs.\ 1.00 / 0.998), with the largest gains over standard models (0.762 for GPT-4.1). On the challenging subset, the formalizer remains near-perfect (1.00 / 0.99) while reasoning LLMs drop to 0.95 / 0.94. These results suggest that delegating inference to Prolog via MCP is a robust and inspectable alternative to extended natural-language reasoning.

22.
arXiv (CS.AI) 2026-06-24

Can Aggregate Invariants Accelerate Continuous Subgraph Matching? Limits, Laws, and a Dynamic Spectral Index

arXiv:2606.24421v1 Announce Type: new Abstract: Spectral filtering recently delivered substantial pruning for static subgraph matching: Laplacian interlacing rejects candidates whose neighborhoods cannot host the query. We study whether such aggregate structural tests can accelerate continuous subgraph matching (CSM) over dynamic graphs, and answer in three parts. First, lazily maintained spectral bounds are infeasible exactly where spectral pruning has value: we characterize the tightest safe rule over a formalized perturbation relaxation and show that even it loses essentially all pruning power within four touching updates. Second, exact maintenance is affordable when selective: pruning utility and recomputation cost are anti-correlated across vertices – hubs provably never prune – so recomputing small-neighborhood spectra on touch sustains exact local spectra at microseconds per update, complete by construction. Third, integrated into a decoupled CSM benchmark against an identical-minus-spectra control, the tests remove up to $51\%$ of candidates or safely skip up to $47\%$ of update enumerations, yet enumeration intermediates remain unchanged – beyond the gates' skipped first-level bindings, typically zero – across two engines, four real graphs, two stream types, and $77$ solved queries; a constructed radius-stratified workload confirms the instrument detects the exception when one exists ($-99.9\%$ intermediates, $748\times$ faster). Aggregate tests accelerate what scales with candidate sets – construction, list scans – never adjacency-guided exploration. We distill an intermediate-invariance methodology for evaluating CSM filters and release a reusable dynamic local-spectra index.

23.
arXiv (CS.AI) 2026-06-17

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv:2606.17399v1 Announce Type: cross Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.

24.
arXiv (CS.CV) 2026-06-16

Last But Not Least: Boundary Attention CalibratiON for Multimodal KV Cache Compression

Multimodal Large Language Models (MLLMs) achieve strong vision-language reasoning, but long visual contexts enlarge the KV cache and increase decoding latency. Existing compression methods rely on observation window attention for stable token-importance estimation, yet this aggregation can dilute sparse visual evidence and discard answer-critical tokens under aggressive compression. Therefore, we identify last-query attention as a complementary source for recovering such evidence, but its answer-irrelevant signals can mislead retention. We propose BACON, a plug-and-play method that calibrates observation window attention with last-query evidence and suppresses isolated noise via intra-layer coherence and inter-layer persistence. Across diverse benchmarks, models, budgets, and compression methods, BACON improves multimodal KV compression by 7.5% on average under the most aggressive budget, with gains up to 30.9%.

25.
arXiv (CS.LG) 2026-06-18

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

arXiv:2606.18703v1 Announce Type: new Abstract: Pretrained biological language models expose per-token probability distributions through masked-token prediction, providing the likelihood interface central to sequence design, variant scoring, and mechanistic interpretation. Yet these distributions are learned from broad unlabeled corpora and are not naturally conditioned on task-specific biological contexts such as interaction partners, cellular environments, or therapeutic interventions. Existing contextual matching methods often distort this interface through pooled embeddings, contrastive latent spaces, or task-specific prediction heads. We introduce LOGICA (Logit-space Contrastive Alignment), a framework for context-conditioned prediction that performs contrastive learning directly in output-logit space. Using gated cross-modal adapters compatible with each model's native token head, LOGICA preserves the pretrained likelihood interface and converts contextualized token log-likelihoods into matching scores. Alignment is defined through context-sensitive token probabilities rather than proximity in a shared embedding space, enabling learning from sparse paired data across models with distinct vocabularies, without a shared tokenizer or decoder. LOGICA is particularly effective for mutation-local variant ranking, where comparisons reduce to context-conditioned likelihoods of mutant tokens at perturbed sites. Across protein–ligand binding, TCR–peptide activity, and drug-conditioned resistance prediction, LOGICA improves over prior state-of-the-art methods, including matched latent-contrastive and conditional MLM baselines, while retaining a token-level interface for interpretation and generation. On held-out-gene single-mutation drug-resistance prediction, LOGICA improves AUC from near-random latent-space baselines of $\sim$0.55 to $\sim$0.65.