Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-17

A Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

arXiv:2507.11178v3 Announce Type: replace-cross Abstract: With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt the component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing the sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model's ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradient between model's input and output to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model's effectiveness in reconstructing gene regulatory networks.

02.
arXiv (CS.AI) 2026-06-15

Patcher: Post-Hoc Patching of Backdoored Large Language Models

arXiv:2606.02995v2 Announce Type: replace-cross Abstract: Large language models remain vulnerable to jailbreak backdoor attacks, where adversaries poison safety alignment data to embed hidden triggers that bypass safety mechanisms. Existing defenses often require comprehensive attack information or multiple triggered examples, making them impractical when defenders only observe a single reported failure case without knowing whether it stems from a backdoor attack or a natural alignment bug. This paper presents Patcher, a post-hoc defense framework that repairs backdoored language models using only a single reported failure case and the model parameters. Patcher operates in two stages. First, it localizes backdoor triggers by computing response-conditioned gradient-based saliency scores and applying adaptive clustering to separate triggers from benign context. Second, it patches the model through a constrained fine-tuning objective that breaks the trigger-response association while preserving benign-task utility and robustness to non-triggered jailbreak attacks through KL-divergence constraints. We conduct extensive evaluations across multiple backdoor attack strategies and demonstrate that Patcher successfully localizes triggers and neutralizes backdoors while maintaining model utility. We further show robustness against adaptive attacks designed to evade our defense. This work represents a significant step toward practical defenses against training-time attacks in deployed language models.

03.
arXiv (CS.LG) 2026-06-12

From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

arXiv:2606.12498v1 Announce Type: cross Abstract: Model merging (MM) has gained significant attention as a cost-effective approach to integrate multiple task-specific models into a unified model. However, recent work reveals that MM is highly susceptible to backdoor attacks. Existing defenses based on task arithmetic often fail to eliminate backdoors without substantially degrading clean-task performance, owing to their reliance on direct parameter-space editing. To address this gap, we propose Linear Feature Path Minimization (LFPM), a backdoor mitigation framework for model merging, which introduces an anti-backdoor task vector into the backdoored merged model. Unlike prior approaches, LFPM formulates the backdoor robustness of the merged model from a unified feature-space perspective under the Cross-Task Linearity (CTL) framework, which leverages the approximate linearity of features across tasks. This perspective guides the optimization of the anti-backdoor task to suppress backdoors while preserving clean-task performance. Furthermore, we introduce an effective optimization mechanism based on gradient accumulation and loss path-integral, ensuring robust backdoor suppression along the interpolation path. Extensive experiments demonstrate that LFPM consistently exhibits strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.

04.
arXiv (CS.CV) 2026-06-12

JSCGC: Joint Source-Channel-Generation Coding for Wireless Generative Communications

Conventional communication systems, including both separation-based coding and learning-based joint source-channel coding (JSCC), are typically designed under Shannon's rate-distortion theory. However, relying on generic distortion metrics fails to capture complex human visual perception, often resulting in blurred or unrealistic reconstructions. In this paper, we propose Joint Source-Channel-Generation Coding (JSCGC), a generative communication paradigm that replaces the conventional decoder with a generative model at the receiver. The received signal is treated as a condition that controls the sampling process into the learned conditional distribution, reformulating communication from deterministic reconstruction for distortion minimization to controlled generation for mutual information maximization under perceptual constraints. Based on this formulation, we develop a unified joint training and efficient stochastic sampling framework, and provide theoretical analysis of its effectiveness in both learning and inference stages. Extensive experiments on latent-space image transmission demonstrate that the JSCGC consistently improves feature-based, semantic-level, and distributional quality across diverse channel conditions, while exhibiting a distinct error behavior characterized by semantic inconsistency rather than distortion.

05.
arXiv (CS.AI) 2026-06-16

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

arXiv:2605.11047v2 Announce Type: replace-cross Abstract: Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in OpenClaw. DeepTrap formulates adversarial context manipulation as a black-box trajectory-level optimization problem that balances risk realization, benign-task preservation, and stealth. It combines risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing to identify high-value compromised contexts. We construct a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, and evaluate nine target models using attack and utility grading scores. Results show that contextual compromise can induce substantial unsafe behavior while preserving user-facing task completion, demonstrating that final-response evaluation is insufficient. The findings highlight the need for execution-centric security evaluation of agentic AI systems. Our code is released at: https://github.com/ZJUICSR/DeepTrap

06.
arXiv (CS.LG) 2026-06-16

Continual Backdoor Training in IoT/CPS

arXiv:2606.14987v1 Announce Type: cross Abstract: Internet of Things (IoT) and Cyber-physical systems (CPS) increasingly rely on continual learning (CL) to adapt to evolving environments, device heterogeneity, and concept drift, thereby improving overall utility. While continual adaptation is essential for long-lived IoT deployments where data patterns evolve, it also introduces new security vulnerabilities. In particular, backdoor attacks can exploit incremental updates, replay buffers, and representation reuse to implant persistent malicious behaviors that remain dormant during normal operation but activate upon specific triggers. In this paper, we present a backdoor attack in continual learning used in IoT/CPS systems. To this end, we formalize an IoT/CPS-specific threat model, analyze why continual learning amplifies backdoor persistence in IoT pipelines, and evaluate our technique under varying conditions. Our analysis highlights critical open challenges in securing lifelong learning in IoT/CPS and industrial IoT (IIoT) environments, as well as the need for heightened security controls.

07.
arXiv (CS.CL) 2026-06-12

HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue

Persona-grounded dialogue systems aim to produce responses consistent with a speaker's persona, yet existing methods treat personas as a flat set of sentences and fail to model the high-order relations among persona attributes-e.g., that several persona sentences share a topical category. We propose HyPE (Hypergraph Persona Encoder), a framework that (i) analyzes each persona-bearing text as a (Core, Expression, Sentiment, Category) quadruple, and (ii) organizes persona elements into a hypergraph whose hyperedges are induced by shared category labels. An HyperGCN hypergraph neural network propagates this structure into a persona summary vector and a soft-memory bank that condition the response generator. We further propose Persistent Edge Embeddings (PEE), lightweight per-category learnable priors fused into the HyperGCN message-passing step. On PersonaChat under greedy decoding, HyPE consistently outperforms sentence-level pooling baselines across GPT-2, LLaMA-3.2-3B, and Qwen2.5-3B backbones by demonstrating that structured hyperedge-level persona encoding provides a transferable advantage across model scales.

08.
PLOS Computational Biology 2026-06-15

Environmental “knees” and “wiggles” as strong stabilizers of species’ range limits set by interspecific competition

by Farshad Shirani, Benjamin G. Freeman Whether interspecific competition is a major contributing factor to setting species’ range limits has been debated for a long time. Theoretical studies have proposed that the interactions between interspecific competition and disruptive gene flow along an environmental gradient can halt range expansion of ecologically similar species where they meet. However, the stability of such range limits has not been well addressed. We use a deterministic mathematical model of adaptive range evolution over a continuous habitat to show that the range limits set by interspecific competition are unlikely to be evolutionarily stable if the environmental optima for fitness-related traits vary (almost) linearly in space. That is, in a linear environment without a dispersal barrier or a third (or more) species, the range borders formed between two competing species constantly move towards the weaker species. We demonstrate that environmental nonlinearities such as “knees” and “wiggles”—wherein an isolated sharp change or a step-like change occurs in the steepness of a trait optimum—can strongly stabilize competitively formed range limits. The stabilization mechanism relies on the contrast that such nonlinearities create in the level of disruptive gene flow to the peripheral population of each species, and succeeds when an additional process, such as Allee effects, prevents the establishment of an infinitesimal population in the presence of an abundant competitor. We show that the stability of the range limits at these nonlinearities is robust against moderate environmental disturbances. Whether strong disturbances such as rapid high-amplitude climate changes can destabilize such range limits depends on how the competitive dominance of the species changes across the nonlinearity. Therefore, our findings underscore the importance of assessing species’ competitive ability when predicting responses to climate change, and identify geographic regions where established range limits are likely to persist as well as regions where shifting limits may eventually stabilize.

09.
arXiv (CS.AI) 2026-06-17

Surrogate Assisted Pedestrian Protection Design via a Foundation Model Orchestrated Workflow

arXiv:2606.17577v1 Announce Type: new Abstract: AI-driven engineering workflows face particular challenges in crash safety design: unlike aerodynamics, crash events involve highly nonlinear contact dynamics, material nonlinearity, and discrete state transitions that are difficult to capture with data-driven surrogate models. To the best of our knowledge, we present the first foundation model–orchestrated workflow for crash safety design that enables surrogate-assisted exploration for pedestrian protection, reducing evaluation time from hours per CAE simulation to seconds. The workflow integrates four components: (1) a surrogate trained on CAE crash simulations to predict pedestrian leg injury metrics from design parameters, achieving an average $R^2=0.87$ and providing distribution-free conformal prediction intervals; (2) multiobjective evolutionary search (NSGA-II) to discover diverse feasible parameter sets under user-specified constraints; (3) a morphing-based geometry generator that maps parameters to topology-preserving 3D shapes; and (4) a natural-language interface in which an LLM orchestrates the workflow and a vision–language model supports semantic comparison of generated designs. In an automotive front-bumper case study, the workflow produces 35 distinct safety-compliant alternatives from a single exploration, a process that would require weeks with conventional CAE iteration. These results suggest that foundation models can serve as integration layers between ML surrogates and physics-based simulation, helping bring AI capabilities to safety-critical engineering domains.

10.
arXiv (CS.LG) 2026-06-18

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

arXiv:2606.18857v1 Announce Type: new Abstract: Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.

11.
arXiv (CS.CV) 2026-06-16

Semantic Editing with Coupled Stochastic Differential Equations

Editing the content of an image with a pretrained text-to-image model remains challenging. Existing methods often distort fine details or introduce unintended artifacts. We propose using coupled stochastic differential equations (coupled SDEs) to guide the sampling process of any pre-trained generative model that can be sampled by solving an SDE, including diffusion and rectified flow models. By driving both the source image and the edited image with the same correlated noise, our approach steers new samples toward the desired semantics while preserving visual similarity to the source. The method works out-of-the-box, without retraining or auxiliary networks, and achieves high prompt fidelity along with near-pixel-level consistency. These results position coupled SDEs as a simple yet powerful tool for controlled generative AI. Project page: https://z-jianxin.github.io/syncSDE-release/. Code: https://github.com/Z-Jianxin/syncSDE-release.

12.
arXiv (CS.CL) 2026-06-11

LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

Structured sequence generation often requires a model to satisfy several input-derived constraints in a single output. Standard decoding methods may assign high probability to fluent continuations while placing low mass on continuations that realize all required anchors jointly. We study this regime as a rare-event sequential inference problem. LatticeBridge combines a compact prefix language model, instance-compiled surface automata, and a twisted sequential Monte Carlo (SMC) decoder with resampling, multilevel splitting, and a source-support proposal term derived from instance-provided phrases. The constraint representation is compiled from each input instance and does not rely on manually curated lexical classes. On 2,610 attainable validation tasks spanning CommonGen, E2E NLG, and WikiBio, the particle decoder improves exact anchor satisfaction and mean anchor coverage over greedy, beam-filtered, and best-of-k ancestral baselines under a shared proposal model. Since exact anchor satisfaction alone does not rule out unsupported attribute substitutions, the evaluation reports required-anchor coverage, source coverage, source-intrusion diagnostics, overlap, runtime, and particle statistics jointly. The benchmark characterizes the faithfulness-overlap-latency frontier under a fixed proposal model.

13.
arXiv (CS.CL) 2026-06-16

JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory

Standard LLM evaluation practices compress diverse abilities into single scores, obscuring their inherently multidimensional nature. We present JE-IRT, a geometric item-response framework that embeds both LLMs and questions in a shared space. For question embeddings, the direction encodes semantics and the norm encodes difficulty, while correctness on each question is determined by the geometric interaction between the model and question embeddings. This geometry replaces a global ranking of LLMs with topical specialization and enables smooth variation across related questions. Building on this framework, our experimental results reveal that out-of-distribution behavior can be explained through directional alignment, and that larger norms consistently indicate harder questions. Moreover, JE-IRT naturally supports generalization: once the space is learned, new LLMs are added by fitting a single embedding. The learned space further reveals an LLM-internal taxonomy that only partially aligns with human-defined subject categories. We also show that simple linear probes of the embedding space recover cross-subject ability directions, such as an arithmetic axis that highlights quantitatively demanding questions in seemingly distant subjects like virology and global facts. JE-IRT thus establishes a unified and interpretable geometric lens that connects LLM abilities with the structure of questions, offering a distinctive perspective on model evaluation and generalization.

14.
arXiv (CS.AI) 2026-06-17

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv:2606.17399v1 Announce Type: cross Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.

15.
arXiv (CS.AI) 2026-06-19

Analyzing the Narration Gap in LLM-Solver Loops

arXiv:2606.19588v1 Announce Type: new Abstract: Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question can be formulated in logic. Unlike chain of thought whose steps are sampled from the model distribution without formal guarantee, a solver produces a sound and independently verifiable answer. However, the soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied the formalization and decision, but not narration, which is the step that turns a formal tool's output into the user answer. To fill the narration gap, we first model the LLM-solver loop as a verified decision procedure. We further evaluate five open-sourced models under prompt injection, and we find certificate gating makes the solver verdict sound, while an adversary can invert a verified conclusion across phrasings and channels. We study the mitigation through hardened prompt that reduces injection significantly but cannot eliminate it and still suffers under adaptive attack. Combining the formal analysis and empirical studies, we show in the LLM-solver loop, robustness does not reach to the answer that the user finally reads.

16.
arXiv (CS.CL) 2026-06-12

LAUKIN: A Multi-jurisdictional Common Law Contract Dataset

Multinational companies increasingly require cross-jurisdictional contract review, yet existing legal NLP datasets are largely restricted to a single jurisdiction. We introduce LAUKIN (Legal equivalence dataset of Australia, UK, and INdia), a dataset of clause pairs (AU-UK, UK-IN, IN-AU) labelled for boolean legal equivalence. We develop a novel multi-stage retrieval and reranking pipeline to construct the initial clause pair mapping, with a subset of clause pairs subsequently annotated by legal experts as Equivalent or Not Equivalent. The dataset comprises 14,727 clause pairs from 204 contracts across 8 agreement types, of which 3,000 are manually labelled: 900 train, 600 dev, and 1,500 test. We evaluate 12 models across 4 techniques, achieving a best macro-F1 of 65.11%, establishing LAUKIN as a challenging benchmark. Results reveal that, despite shared legal heritage, drafting conventions diverge significantly across jurisdictions, making cross-jurisdictional equivalence classification non-trivial. LAUKIN also includes 11,727 unlabelled training pairs to support future semi-supervised learning research in legal NLP.

17.
arXiv (CS.AI) 2026-06-16

Harnessing cortical geometry, wiring, and function as inductive biases for recurrent neural networks

arXiv:2606.14975v1 Announce Type: cross Abstract: How the wiring and functional organization of cortex shape recurrent computation remains a central question in both neuroscience and machine learning. Here, we leverage data released through the Machine Intelligence from Cortical Networks (MICrONS) program–a functional connectomics resource spanning multiple areas of mouse visual cortex, in which dense calcium imaging is co-registered with high-resolution electron microscopy reconstruction from the same animal–to build biologically grounded recurrent neural networks. Using neuronal spatial coordinates, anatomical connectivity, and function-derived relationships from nearly 12,000 coregistered excitatory neurons, we initialize recurrent weights and impose communication-aware spatial constraints during learning. Across three cognitive decision-making tasks, networks constrained by cortical structure and function consistently outperform baseline and partially constrained models. Functional weight initialization provides the largest gain, while real spatial embedding yields robust additional improvements across conditions. These biologically grounded networks also develop low-entropy, modular, and small-world organization, and retain strong performance even when recurrence is restricted to positive weights. Together, our results show that the machinery of cortex–its geometry, wiring, and functional structure–can be harnessed as a powerful inductive basis for building recurrent networks that learn more effectively while converging toward key organizational principles of biological computation.

18.
arXiv (CS.AI) 2026-06-16

Visual-Seeker: Towards Visual-Native Multimodal Agentic Search via Active Visual Reasoning

arXiv:2606.15231v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have demonstrated impressive capabilities in many visual tasks, but they often struggle with factual grounding when confronted with complex, open-world scenarios. While recent multimodal deep search agents attempt to address this issue by utilizing external tools, the visual-native search paradigm remains underexplored. Existing methods primarily rely on simple images with explicit semantics and text-only evidence trajectories, limiting the agent's ability to perform multi-hop, cross-modal reasoning and search. To address these limitations, we propose Visual-Seeker, a visual-native multimodal deep search agent via active visual reasoning. Rather than treating vision as a static input, our agent actively attends to fine-grained visual details, dynamically harvests visual evidence throughout the search process. To unlock its visual-native potential, we design an active visual reasoning data pipeline and synthesize 5K high-quality multimodal trajectories for model training. Extensive experiments demonstrate the state-of-the-art performance across five challenging multimodal search benchmarks, even surpassing several proprietary models, validating robust visual-native reasoning and search in real-world web environments. The code and data can be accessed at: https://github.com/ZhengboZhang/Visual-Seeker.

19.
arXiv (CS.LG) 2026-06-16

Contextual Bandits for Maximizing Stimulated Word-of-Mouth Rewards

arXiv:2606.15146v1 Announce Type: new Abstract: Stimulated word-of-mouth is a strategy that promotes information sharing through prompts or incentives. Optimizing stimulated word-of-mouth through social networks requires identifying and targeting connected users who are most susceptible to spillover, a phenomenon where the influence of recommendations extends beyond the immediate audience to impact their connected users. The probability of spillover varies across individuals, and their connections, leading to heterogeneity. Understanding and accurately estimating the spillover probabilities among users in social networks is crucial for improving the effectiveness of stimulated word-of-mouth. To address this, we present a novel contextual multi-armed bandit framework that learns individual spillover probabilities and ranks connected users to maximize rewards from stimulated word-of-mouth. Experiments on real-world network datasets demonstrate that accounting for spillover heterogeneity enhances the targeting precision of top-$k$ connected users, boosting rewards and outperforming baseline methods that do not learn individual spillover effects.

20.
arXiv (quant-ph) 2026-06-17

Effects of Josephson Junction Non-idealities on Adiabatic Quantum Flux Parametron Circuits

arXiv:2606.17338v1 Announce Type: new Abstract: Adiabatic quantum flux parametron (AQFP) gate is a promising approach to scale up the cryogenic microwave electronics for superconducting qubit multiplexed control. However, the performance of these circuits depends on the quality of the Josephson junctions which are ideally superconductor-insulator-superconductor (SIS) type following the ideal sinusoidal relation between current and quantum phase. We demonstrate how the non-sinusoidal current-phase relation in Superconductor-Normal metal-Superconductor (SNS) and weak link (WL) junctions affects the speed, delay, and margin of the AQFP gates. The JJ models are defined in the Keysight ADS simulator using symbolically defined device (SDD) method.

21.
arXiv (CS.CL) 2026-06-11

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual construction methods suffer from linear scaling limits, thereby hindering scalable reasoning generalization. This paper introduces RACES (Recursive Automated Composition for Environment Scaling), a framework that conceptualizes verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, they can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines a set of composition operators (\textsc{SEQUENTIAL}, \textsc{PARALLEL}, \textsc{SORT}, and \textsc{SELECT}) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently enhances reasoning generalization. Specifically, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) and boosts Qwen3-14B performance from 58.8 to 61.1 on six benchmarks, which are unseen during the construction of training environments. Moreover, RACES achieves performance comparable to training on 300 individual environments using only 50 base environments, demonstrating significant efficiency in environment utilization.

22.
medRxiv (Medicine) 2026-06-12

Does the method matter? Evaluating the effectiveness, efficiency and ease of hearing-aid gain self-adjustment

In conventional hearing-aid personalisation, clinicians cannot hear what their patients hear, and patients cannot often reliably detect or describe what they hear. Self-adjustment avoids this issue but requires user controls that adjust hearing-aid signal processing parameters to be effective, efficient and easy. In this study, we explored (a) the roles of interface complexity and stimulus type in the self-adjustment of hearing-aid gain, and (b) how well individuals can adjust one sound to match another to assess the same interfaces and stimuli. Adult hearing-aid users with mild to moderate symmetrical sensorineural hearing loss repeatedly adjusted the gain (a) to their preference from individual prescription (n = 41) and (b) to match their previous preferences from a random starting point (n = 32) using three interfaces representing different bass/mid/treble configurations and three stimuli (music, speech and speech-in-noise). The large interindividual variability in self-adjusted gains clustered into three patterns of deviation from initial prescription: increased relative bass, overall gain reduction, and close to initial prescription. There were no substantial effects of interface nor stimulus on self-adjustment reliability (median {sigma} = 2.8 dB), whereas absolute sound-matching error increased with increasing interface complexity and centre frequency. Neither individual matching accuracy nor questionnaire responses predicted either self-adjusted gains or reliability. Overall, these results show that many - but not all - hearing-aid users can adjust gains with reasonable reliability, and while it can be difficult to predict the behaviour from the individual, the individual applies a similar self-adjustment behaviour across different interfaces and stimuli.

23.
arXiv (CS.CL) 2026-06-11

Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases

Natural Language Querying for Time Series Databases (NLQ4TSDB) aims to assist non-expert users retrieve meaningful events, intervals, and summaries from massive temporal records. However, existing Text-to-SQL methods are not designed for continuous morphological intents such as shapes or anomalies, while time series models struggle to handle ultra-long histories. To address these challenges, we propose Sonar-TS, a neuro-symbolic framework that tackles NLQ4TSDB via a Search-Then-Verify pipeline. Analogous to active sonar, it utilizes a feature index to ping candidate windows via SQL, followed by generated Python programs to lock on and verify candidates against raw signals. To enable effective evaluation, we introduce NLQTSBench, the first large-scale benchmark designed for NLQ over TSDB-scale histories. Our experiments highlight the unique challenges within this domain and demonstrate that Sonar-TS effectively navigates complex temporal queries where traditional methods fail. This work presents the first systematic study of NLQ4TSDB, offering a general framework and evaluation standard to facilitate future research.

24.
arXiv (CS.LG) 2026-06-17

Perron–Frobenius Operator Matching for Generative Modeling

arXiv:2606.17465v1 Announce Type: new Abstract: We introduce Perron–Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. %On Gaussian mixtures and two-moons, PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

25.
arXiv (CS.AI) 2026-06-11

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

arXiv:2606.11357v1 Announce Type: cross Abstract: With the growing demand for on-device LLM inference, edge SoCs increasingly integrate NPUs to improve performance and energy efficiency under tight power and thermal budgets. However, practical LLM deployment on current client NPUs remains difficult: widely used quantization formats such as AWQ do not map cleanly onto many existing NPU software stacks, which are often proprietary and expose limited low-level control. In this work, we present TileFuse, a close-to-metal mixed-precision kernel library for AMD XDNA2 NPUs that targets transformer linear layers in quantized LLM inference. TileFuse brings practical low-bit formats such as AWQ-style W4A16 and W8A16 directly onto XDNA2, rather than forcing the model to be reshaped around an NPU-specific quantization scheme. TileFuse co-designs weight layout, metadata placement, mixed-precision microkernels, and array-level dataflow. Specifically, it fuses unpacking, dequantization, and GEMM/GEMV execution into a single kernel flow, introduces an interleaved pre-tiling layout that supports GEMM dimensions up to 32K, and redesigns GEMV dataflow to utilize the full 4x8 AIE array. Across kernel-level evaluations, TileFuse improves performance by up to 121.6% for GEMM and 281% for GEMV over full-precision baselines, while delivering more than 2x performance and energy-efficiency gains over strong iGPU baselines on GEMM. In end-to-end LLM experiments on Ryzen AI laptops, TileFuse achieves up to 2.0x lower prefilling latency with more than 64.6% lower energy consumption. Together, these results show that XDNA2 is a practical target for AWQ-style edge LLM inference and that native NPU support for off-the-shelf quantization can make NPUs substantially more usable in real client deployments.