Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-19

Triangular Consistency as a Universal Constraint for Learning Optical Flow

arXiv:2606.19938v1 Announce Type: cross Abstract: We propose triangular consistency as a first-principled constraint for optical flow, which is agnostic to network architecture, supervision type, and dataset, and applies to both image-pair and multi-frame settings. This simple but powerful constraint is to compose two flows to induce a third flow and enforce consistency among the three. The composed flows may arise from (i) image pairs, yielding cycle consistency; (ii) multiple video frames, producing longer-range motion through temporal chaining; or (iii) image pairs combined with controlled synthetic transformations, which becomes data augmentation. This triangular consistency introduces negligible computational overhead and requires no additional annotations. Since it is derived directly from the geometry of optical flow, it does not rely on model-specific assumptions and serves as a ``universal'' plug-and-play component for optical flow training. Experiments show consistent improvement across supervised, unsupervised, and transfer learning settings.

02.
arXiv (CS.AI) 2026-06-15

Metabolic cost of information processing in Poisson variational autoencoders

arXiv:2602.13421v2 Announce Type: replace-cross Abstract: Computation in biological systems is fundamentally energy-constrained, yet standard theories of computation treat energy as freely available. Here, we argue that variational free energy minimization under a Poisson assumption offers a principled path toward an energy-aware theory of computation. Our key observation is that the Kullback-Leibler (KL) divergence term in the Poisson free energy objective becomes proportional to the prior firing rates of model neurons, yielding an emergent metabolic cost term that penalizes high baseline activity. This structure couples an abstract information-theoretic quantity – the *coding rate* – to a concrete biophysical variable – the *firing rate* – which enables a trade-off between coding fidelity and energy expenditure. Such a coupling arises naturally in the Poisson variational autoencoder (P-VAE) – a brain-inspired generative model that encodes inputs as discrete spike counts and recovers a spiking form of *sparse coding* as a special case – but is absent from standard Gaussian VAEs. To demonstrate that this metabolic cost structure is unique to the Poisson formulation, we compare the P-VAE against Grelu-VAE, a Gaussian VAE with ReLU rectification applied to latent samples, which controls for the non-negativity constraint. Across a systematic sweep of the KL term weighting coefficient $\beta$ and latent dimensionality, we find that increasing $\beta$ monotonically increases sparsity and reduces average spiking activity in the P-VAE. In contrast, Grelu-VAE representations remain unchanged, confirming that the effect is specific to Poisson statistics rather than a byproduct of non-negative representations. These results establish Poisson variational inference as a promising foundation for a resource-constrained theory of computation.

03.
arXiv (CS.CL) 2026-06-18

BCL: Bayesian In-Context Learning Framework for Information Extraction

Existing information extraction (IE) tasks increasingly adopt in-context learning (ICL) with large language models. However, current approaches either show inconsistent performance across model scales or lack systematic optimization and generalizability. Building on this, we propose BCL (Bayesian In-Context Learning Framework for Information Extraction), the first optimization framework that uses particle filtering with Bayesian updates to systematically refine label representations across IE tasks. Through four steps initialization, observation, weight update, and resampling, BCL generalizes to both sequence labeling and relation classification paradigms. Extensive experiments demonstrate substantial and consistent improvements over existing approaches.

04.
arXiv (CS.LG) 2026-06-18

Task-Restricted Symmetries in Recurrent Weight Space

arXiv:2606.18457v1 Announce Type: new Abstract: Recurrent networks can contain substantial functional redundancy in weight space: changing a recurrent matrix may leave the input-output rollout nearly unchanged on a task distribution, while similar-scale changes can destroy the same behavior. We study this redundancy in one-layer tanh RNNs using ordered real Schur coordinates. The Schur form separates spectral blocks from directed nonnormal couplings, giving a diagnostic basis for structured ablations that keep the input and readout maps fixed. In a fixed-length copy task, selected nonnormal Schur couplings can be removed with little loss in some trained solutions, whereas other couplings are necessary for accurate autonomous replay. Across flip-flop, sine generation, and context-dependent integration, the loss-preserving ablation profile varies across tasks and trained solutions. These results identify candidate approximate functional invariances, not universal symmetries of recurrent weight space. Schur-coordinate ablations provide a practical diagnostic for which structured perturbations preserve a trained recurrent solution and which ones disrupt its computation.

05.
bioRxiv (Bioinfo) 2026-06-24

Generative Modeling of Mouse Embryogenesis for Fate and Disease Prediction

Embryonic development is orchestrated by complex gene regulatory networks, and learning regulatory dynamics from developmental data could allow us to understand, predict, and ultimately engineer cell fates. Here we introduce Navigo (https://github.com/aristoteleo/Navigo-release), a biologically grounded generative modeling framework that learns a developmental vector field by integrating flow matching at the population level with RNA kinetics modeling at the molecular level. Navigo accurately maps developmental trajectories across lineages on a mouse embryogenesis scRNA-seq atlas spanning 43 time points and comprising 12.4 million cells. Applied to cardiac development, Navigo enables disease modeling by mechanistically resolving regulatory networks that distinguish congenital heart disease subtypes. Navigo also predicts perturbation effects in a zero-shot manner, as validated on independent in vivo data from six knockout genotypes without perturbation-specific training, uncovering lineage-specific gene-compensation mechanisms. Moreover, Navigo guides rational cell-fate engineering, exemplified by fibroblast reprogramming analyses, including identifying pro-fibrotic barriers to cardiac fates and evaluating hundreds of pairwise transcription factor combinations for neuronal fate, each consisting of one bHLH factor and one POU factor. Overall, Navigo provides a generalizable AI platform for perturbation-effect prediction, disease modeling, and rational cell-fate engineering, advancing toward AI-based virtual embryos for developmental biology and regenerative medicine.

06.
arXiv (CS.CL) 2026-06-19

Large Language Models Do Not Always Need Readable Language

Large language models (LLMs) are commonly prompted and interfaced with human-readable natural language, even when the intended reader is another model. This paper investigates whether semantic information can be encoded in compact, non-standard textual forms that sacrifice human readability while remaining recoverable by LLMs. We refer to this class of model-centric textual representations as BabelTele, approached here not as a fixed protocol but as an empirical probe into LLMs' capacity to generate and interpret such representations. Through readability diagnostics, model likelihood measures, human questionnaires, and downstream task evaluations, we find that BabelTele can substantially depart from ordinary natural language while preserving core semantics for instruction-tuned LLMs. As a task-agnostic representational paradigm, BabelTele demonstrates high information density, maintaining 99.5% semantic fidelity even when the text volume is condensed to 27.9% of its original length. We further evaluate its semantic robustness in cross-model transfer, agent memory, and multi-agent communication. Results suggest that BabelTele can reduce context overhead while generally maintaining reliable downstream performance, although its effectiveness depends on the compressor-reader pair and task setting. These findings indicate that human readability, natural-language typicality, and model-side semantic recoverability can be partially decoupled, opening a path toward model-native representations in future exploration of LLM systems.

07.
arXiv (CS.LG) 2026-06-24

WiFi-Based People Counting Using Beam-Steerable Antennas: A Test-bed Study

arXiv:2606.23710v1 Announce Type: cross Abstract: Ubiquitous perception through RF signals is a pivotal opportunity for future technology: it enables personalized services such as smart living, remote healthcare, automated logistics or interaction through free-space gestures. The ubiquity of Wi-Fi and cellular networks presents a promising platform for the development of innovative sensing tools. Future standards will also introduce dedicated sensing features which, for example, will allow routers to work as frequency modulated continuous wave radios targeting radar applications. Most of the current chip designs support ad-hoc firmware for CSI extraction with MIMO arrangements of the transmitter (TX) and receiver (RX) antennas and OFDM subcarriers. The CSI describes the phase shift and amplitude attenuation of multiple propagation paths on each subcarrier. The latest IEEE 802.11be standard (Wi-Fi 7) offers a wider subcarrier bandwidth of 160MHz (up to 320MHz), providing at least 120 usable pilot subcarriers for CSI or CIR estimation. Additionally, Wi-Fi signals have been recently exploited to track daily human movements and behaviors, while Wi-Fi signal variations have been shown to differ between different people and can consequently be used for their re-identification.

08.
arXiv (CS.CL) 2026-06-24

SHERLOC: Structured Diagnostic Localization for Code Repair Agents

LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedicated localization frameworks have emerged, yet are still evaluated as file retrieval rather than actionable diagnosis, producing locations without the diagnostic context a repair agent needs. We introduce SHERLOC (Structured Hypothesis-driven Exploration and Reasoning for Localization), a training-free framework pairing a reasoning LLM with compact repository tools and self-recovery, without fine-tuning or multi-agent orchestration. SHERLOC reaches state-of-the-art localization across model scales: 84.33% accuracy@1 on SWE-Bench Lite and 81.27% recall@1 on SWE-Bench Verified; at ~30B parameters, it matches or outperforms other agentic methods. Injecting our locations and diagnostic findings into repair agents yields, on average, +5.95 pp resolve rate on SWE-Bench Verified while cutting localization and total tokens by 36.7% and 23.1%.

09.
arXiv (CS.AI) 2026-06-19

Tri-Info: Generalizable, Interpretable Failure Prediction for VLA Models via Information Theory

arXiv:2606.19998v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models are increasingly deployed across diverse tasks, yet they remain black boxes whose physical interactions can cause irreversible harm, making generalizable and interpretable failure detection essential. We observe that successful and failed rollouts carry systematically different information-theoretic signatures. Building on this, we formalize VLA control as a closed-loop information pipeline and derive the Triple Information-theoretic (Tri-Info) signals that capture whether actions remain diverse, temporally consistent, and coupled to state transitions. Across six VLA models and three benchmark environments, Tri-Info matches the strongest baselines in-domain. Moreover, Tri-Info transfers across architectures, environments, and the sim-to-real gap without retraining, reaching 83\% accuracy on real-world tasks where prior detectors collapse to chance. This establishes Tri-Info as a simple yet powerful method that not only detects failures with strong cross-domain generalization, but also delivers interpretable diagnostics of the underlying failure modes.

10.
arXiv (CS.LG) 2026-06-12

From Uncertain Judgments to Calibrated Rankings: Conformal Elo Estimation for LLM Evaluation

arXiv:2606.13221v1 Announce Type: new Abstract: Evaluating new large language models typically requires costly human annotation campaigns at scale. LLM-as-a-judge offers a cheaper alternative, but judge scores carry systematic errors - such as position bias, self-preference, or intransitivity - that can strongly miscalibrate the resulting rankings. We quantify the resulting judge-human disagreement at two complementary levels. At the local level, we estimate per-battle uncertainty from the judge's own score differences by propagating calibrated win probabilities rather than hard labels into the Bradley-Terry procedure. This alone provides a drastic improvement to Elo estimation accuracy, bringing LLM-derived ratings within 17.9 Elo MAE of human-derived ones when averaged over 55 held-out models on LMArena. At the global level, we apply split conformal prediction to the residual gap between LLM-derived and human-derived Elo ratings across held-out models, producing prediction intervals with distribution-free marginal coverage guarantees that account for irreducible LLM-human disagreement. Together, these two layers yield a low-cost evaluation tool that provides developers with calibrated Elo estimates and honest uncertainty bounds, without access to large-scale human annotations.To facilitate reproducibility, we release our code at https://github.com/kargibora/SoftElo .

11.
arXiv (CS.LG) 2026-06-17

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

arXiv:2605.29669v2 Announce Type: replace-cross Abstract: Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). Such equivalents make theoretical predictions tractable by reducing a complex model to a simpler one with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful for classification of high-dimensional nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a one-layer feedforward NN, under a canonical nonlinearly separable dataset for the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent of the CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. We identify regimes in which these knobs move the CK beyond the linear equivalent and produce BBP-type transitions to label-aligned outlier eigenspaces. Our analysis helps bring deterministic-equivalence tools from RMT to bear on problems of practical relevance in ML.

12.
arXiv (CS.CV) 2026-06-24

High-Fidelity Synthetic Transmission Electron Microscopy Image Generation Using Diffusion Probabilistic Models for Data-Limited Semiconductor Metrology

Advanced semiconductor nodes drastically increased demand for Transmission Electron Microscopy (TEM), yet destructive sample preparation, slow imaging and high costs severely limit the availability of diverse datasets needed for downstream machine learning (ML). Synthetic data generation is becoming essential, but current generative models often miss TEM-specific noise, structural detail, and stochastic variability crucial for evaluation. We present a Denoising Diffusion Probabilistic Model (DDPM) framework for synthetic TEM image generation under extreme data scarcity. A progressive patch-based training strategy scales from low-resolution patches to full images, enabling from-scratch training with only 15 samples. We integrate a custom TrivialAugment adaptation, cross-process domain transfer, classifier guidance, and RePaint-style inpainting, culminating in full-image generation that preserves global structural and spatial relationships in compliance with FAB metrology requirements. Beyond synthesis, we repurpose DDPM feature representations for segmentation, partitioning encoder feature maps to obtain coherent region masks. Our synthetic images achieve up to MS-SSIM > 0.98 and qualitative expert assessment consistent with structural similarity results, facilitating downstream ML training for defect detection, segmentation, and metrology while preserving statistical and physical realism.

13.
arXiv (CS.LG) 2026-06-19

Streaming Interventions: Can Video Large Language Models Correct Mistakes as They Occur?

arXiv:2606.09547v2 Announce Type: replace-cross Abstract: Learning everyday skills, like cooking a dish, relies increasingly on instructional media such as online videos. This opens the door to the use of video (and multimodal) large language models (LLMs) as task guidance assistants. A crucial capability for the real-world success of a prospective task guidance assistant is it's ability to intervene proactively as soon as a mistake is apparent in order to guide the user. To evaluate this crucial capability, we introduce Ego-MC-Bench (Mistake Corrections), a benchmark for evaluating reactive, step-by-step task guidance in realistic cooking scenarios. Extensive experiments show that Ego-MC-Bench is highly challenging for state-of-the-art video LLMs. We argue that a key reason is the limited availability of training data for fine-tuning models on this task. Although there exists a wide range of cooking video datasets, existing datasets lack examples of mistakes along with appropriately timed interventions. To help address this data limitation, we also introduce Ego-CoMist, a counterfactual synthetic dataset created by transforming non -interactive cooking videos into supervised training examples showing proactive interventions. We show that fine-tuning on Ego-CoMist yields performance gains especially for smaller and more efficient video LLMs that are well suited for delivering assistance on edge devices.

14.
arXiv (CS.AI) 2026-06-25

Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces

arXiv:2605.10913v3 Announce Type: replace Abstract: As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that create, operate on and manage other agents. Meta-agent operations such as coordinating agents, halting risky actions before execution, or repairing failed runs, require runtime manipulation of agentic execution. Yet existing agentic substrates make this difficult: they expose only transcripts and environment snapshots, forcing meta-agents to build ad hoc tooling to reconstruct and operate over full execution state. Therefore, we introduce Shepherd, a Python substrate grounded in functional programming principles, where an agent's execution is itself a first-class object that a meta-agent can easily inspect and transform. Every model action, tool call, and environment change becomes a structured event in a reversible, Git-like execution trace, where any past state can be reverted 5x faster than docker commit and fork. Three example use cases show Shepherd's versatility: (1) a supervisor meta-agent prevents conflicts among parallel coding agents, lifting pair-coding pass rate from 28.8% to 54.7% on CooperBench; (2) a counterfactual optimization meta-agent repairs agent workflows by proposing edits and replaying runs from the point of changed behavior, outperforming MetaHarness on Terminal-Bench 2.0 by 12.8% with 58% lower wall-clock; (3) a training meta-agent picks fork points during rollouts to improve credit assignment in long-horizon agentic RL, doubling GRPO's uplift on Terminal-Bench 2.0. We open-source Shepherd to enable principled and efficient operations over agentic execution for both users and meta-agents.

15.
arXiv (CS.LG) 2026-06-15

Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

arXiv:2606.14095v1 Announce Type: new Abstract: We study the sample complexity of learning in average-reward weakly-coupled Markov decision processes (WCMDPs) and Restless Bandits (RBs) under a generative model. Naive reduction to a tabular MDP leads to high complexity bounds as the state-action space is exponentially large in the number of arms $N$. By exploiting the weakly coupled structure, we show that near-optimal policies can be learned with sample and computational complexities that are polynomial in $N$. Specifically, we analyze the plug-in approach, which applies an efficient planning algorithm to an empirical model estimated from data. For fully heterogeneous WCMDPs, we establish the first finite-sample PAC guarantee with polynomial complexity and an $O(1/\sqrt{N})$ optimality gap. For homogeneous RBs, we further prove that a smaller optimality gap is achievable under mild structural assumptions. A primary technical contribution of our work is a novel Lyapunov-based analysis framework. Unlike classical approaches that rely on the difficult-to-control bias function, our framework uses an explicitly constructed Lyapunov function along with a drift transfer technique between the true and empirical models. A key step of independent interest in our framework is a fine-grained perturbation analysis for the underlying linear programming (LP) relaxation, which provides a general tool for analyzing LP-based policies and weakly-coupled systems.

16.
arXiv (CS.CV) 2026-06-11

Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning

Neural network pruning reduces model size by removing less important parameters while aiming to preserve predictive performance. Although the Lottery Ticket Hypothesis (LTH) shows that sparse subnetworks can match dense networks when trained from suitable initializations, its iterative pruning procedure requires multiple complete training cycles. This work evaluates progressive magnitude-based pruning as a single-cycle alternative. The method gradually increases sparsity during training using a linear schedule and updates pruning masks based on active weight magnitudes. We conduct systematic experiments on CIFAR-10 and MNIST across ResNet, VGG-style, and LeNet architectures, comparing the proposed method with representative iterative and initialization-based pruning baselines, including LTH, SNIP, and GraSP. On CIFAR-10, the method achieves 95.12\% accuracy on ResNet-18 at 72.9\% sparsity, compared with 90.5\% reported for LTH. At extreme sparsity, it achieves 93.13\% accuracy on a VGG-like architecture at 97\% sparsity, compared with approximately 92.0\% for SNIP, and 93.44\% accuracy on VGG-19 at 97.97\% sparsity, compared with 92.19\% for GraSP at 98\% sparsity. A sparsity-accuracy analysis on ResNet-18 further shows that accuracy remains within 0.1 percentage points of the dense baseline across 70–85\% sparsity. These results indicate that progressive magnitude-based pruning provides an effective single-cycle approach for neural network sparsification under the evaluated settings.

17.
arXiv (CS.AI) 2026-06-16

Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments

arXiv:2505.19699v2 Announce Type: replace-cross Abstract: Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy. However, the coexistence of model and data heterogeneity gives rise to inconsistent representations and divergent optimization dynamics across clients, ultimately hindering robust global performance. To transcend these challenges, we propose Mosaic, a novel data-free knowledge distillation framework tailored for heterogeneous distributed environments. Mosaic first trains local generative models to approximate each client's personalized distribution, enabling synthetic data generation that safeguards privacy through strict separation from real data. Subsequently, Mosaic forms a Mixture-of-Experts (MoE) from client models based on their specialized knowledge, and distills it into a global model using the generated data. To further enhance the MoE architecture, Mosaic integrates expert predictions via a lightweight meta model trained on a few representative prototypes. Extensive experiments on standard image and multimodal benchmarks demonstrate that Mosaic consistently outperforms state-of-the-art approaches under both model and data heterogeneity. The source code has been published at https://github.com/Wings-Of-Disaster/Mosaic.

18.
medRxiv (Medicine) 2026-06-19

Hyperleukocytosis and outcomes in pediatric B-cell acute lymphoblastic leukemia: A report from the REDIAL Consortium

Hyperleukocytosis (white blood cell [WBC] count >100 000/uL) at diagnosis is an important prognostic risk factor in pediatric acute lymphoblastic leukemia (ALL), though its significance with contemporary therapy is unclear. We analyzed 1 826 pediatric ALL patients from a multi-institution cohort to determine whether hyperleukocytosis independently predicts outcomes using multivariable Cox proportional hazard modeling. Hyperleukocytosis occurred in 211 patients (12%), with 121 having B-ALL, and showed no prognostic significance in T-ALL patients. In B-ALL, 5-year event-free survival (EFS) was 65% versus 89% for non-hyperleukocytosis patients, and overall survival (OS) was 78% versus 93%. After adjustment for age, cytogenetic risk, central nervous system disease status, and treatment site, hyperleukocytosis remained an independent predictor of end-of-induction minimal residual disease (MRD) positivity (odds ratio 2.53 [95% confidence interval [CI]: 1.71-3.94; p

19.
arXiv (CS.CV) 2026-06-16

Mutual Distillation of Dual-Foundation Models for Semi-Supervised PET/CT Segmentation

Organ segmentation from PET/CT is critical for quantitative analysis and radiotherapy planning in oncology. To ease the high annotation cost of PET/CT segmentation, semi-supervised learning (SSL) provides a practical and effective solution for developing deep models with limited labeled data. Recent developments in visual foundation models have demonstrated remarkable adaptability with improved efficiency. In this work, we propose a mutual distillation framework that seamlessly exploits both structural and functional foundation models, which act as modality-specific generalists for distilling knowledge from structural CT and metabolic PET imaging. By bridging the gap between the task-specific precision of student models and the segmentation priors of generalist foundation models, we propose MuDuo, a mutual distillation framework that synergistically leverages SAM-Med3D for CT and SegAnyPET for PET to distill their knowledge into a lightweight student network. Our approach eliminates the need for manual prompts while maximizing the utility of unlabeled data for automatic segmentation, achieving state-of-the-art performance on the AutoPET dataset with only 5 labeled cases. Our source code is available at https://github.com/Wu-beining/MuDuo.

20.
arXiv (CS.CV) 2026-06-25

A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks

The laser welding full-penetration is of critical importance, as it constitutes one of the fundamental factors in achieving defect-free welded joints. Accurate prediction of the penetration state is therefore essential for ensuring weld quality. To this end, this paper introduces SimPhysNet, a novel algorithm that achieves high classification accuracy in laser welding penetration prediction using only a limited number of labelled images. This approach effectively overcomes the limitations of supervised learning classification algorithms, which are hindered in industrial applications by their dependence on extensive, high-quality labelled data. The core of SimPhysNet is a unique self-supervised learning paradigm that embeds physical priors into a contrastive learning framework. By incorporating a physics-informed neural network (PINN), the model is guided to extract physically meaningful features of the molten pool and keyhole from a large set of unlabelled data, while three image augmentation tasks further enhance its generalization capabilities. Subsequently, a few-shot learning strategy, based on prototypical networks, enables robust classification by constructing class representations from a minimal set of labelled images. Experimental results demonstrate that SimPhysNet achieves a classification accuracy of 96.06% using only 200 labelled images (approximately 5% of the total labelled dataset), which is comparable to the performance of conventional supervised learning algorithms that utilize the entire labelled dataset. This work presents a new, efficient, and highly accurate method, providing the way for the intelligent automation of laser welding.

21.
arXiv (CS.LG) 2026-06-15

When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing

作者:

arXiv:2606.14668v1 Announce Type: new Abstract: Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central design question is not only how to write an edit, but also when to suppress it. We introduce \method{}, a route-specialized dual-adapter editor. A relevance router first decides whether a prompt should receive an edit memory. Routed prompts use an edit adapter trained to prefer the new object over the original object; unrouted non-direct prompts use a separate locality adapter trained to preserve or restore the original-object preference. We evaluate \method{} on three 1,000-case protocols, \cf{}, \zsre{}, and \mquake{}, under the same memory protocol and two 7B/8B base models. On Llama-3.1-8B-Instruct, \method{} obtains the best overall probability-preference accuracy on all three benchmarks: 0.8180 on \cf{}, 0.8946 on \zsre{}, and 0.9922 on \mquake{}. The same trend holds on Qwen3-8B. Router ablations show that the relevant memory boundary differs across datasets: a lexical neural router is safest on \cf{}, while BGE embedding routing is better on \zsre{} and \mquake{}. Component and module ablations show that the gain mainly comes from separating edit injection from off-route suppression rather than from simply increasing LoRA capacity.

23.
arXiv (CS.AI) 2026-06-24

VeriPilot: An LLM-Powered Verilog Debugging Framework

arXiv:2606.23759v1 Announce Type: cross Abstract: Verilog debugging remains one of the most time-consuming stages in digital circuit design. Recent advances in Large Language Models (LLMs) have enabled automated debugging; however, most existing approaches rely solely on test outputs and compiler feedback in an end-to-end manner, limiting their effectiveness on complex bugs. A key challenge is that the root cause of an error may be far removed from its observable outputs, making it difficult for LLMs to trace long dependency chains in code. This challenge is further exacerbated in large codebases, where long context lengths hinder efficient reasoning. To address these limitations, we propose VeriPilot, an LLM-powered debugging framework that leverages golden reference models to enable fine-grained bug localization and repair. VeriPilot goes beyond output-level comparison by aligning internal variable semantics between the Verilog design and its corresponding golden model through LLM-based analysis. It then performs step-by-step signal tracing using Control-Data-Flow Graphs (CDFGs) derived from static analysis, identifying a minimal set of suspicious code regions along with their correct counterparts from the golden model. These structured insights are subsequently provided to the LLM to guide reasoning and automated code repair. Experimental results on the Comprehensive Verilog Design Problems (CVDP) benchmark from NVIDIA demonstrate that VeriPilot improves the repair success rate of GPT-4o from 54.3\% to 85.71\%, significantly enhancing both bug localization accuracy and repair effectiveness for complex Verilog designs. The source code and benchmark are publicly available at Github https://github.com/YihanWn/VeriPilot.git.

24.
arXiv (CS.AI) 2026-06-24

Ten Digits on a Train: AI-Assisted Verification of Two Eigenvalue Problems

arXiv:2606.23821v1 Announce Type: cross Abstract: Accurate numerical eigenvalues are often difficult to certify, especially in singular or non-normal settings. This article reports a human–AI collaboration on two such computations. For a singular self-adjoint Schrödinger operator, a verified zero count and Dirichlet–Neumann bracketing certify the complete negative spectrum to ten decimal places. For a delicate non-normal atom–molecule benchmark, a previously unresolved resonance pair is separated, with each member enclosed to ten digits. The second result is achieved not by increasing the precision of one-way shooting, but by reformulating the problem as a global matching system for projective solution lines. The infinite tail is encoded as uncertainty in the terminal projective data, and a componentwise, tail-robust Krawczyk–Brouwer inclusion supplies the certificate. This gives a reusable architecture for analytic boundary-value systems with ill-conditioned propagation and uncertain asymptotic data. The collaboration also exposes the strengths and limits of AI assistance. AI rapidly produced accurate candidates and plausible proof strategies, but several failed, including one apparently complete tail argument that omitted the componentwise check required by a nonuniform polydisc. Validated computation is a stringent test of AI-assisted mathematics: the output is not merely a number, but a number with a proof. These examples show why the proof object matters, and why human mathematical judgment remained decisive. More broadly, as AI makes code, exposition, and plausible numerical claims inexpensive, standards for verification, attribution, peer review, and training must adapt. The implications are unsettling; the opportunity is extraordinary.

25.
arXiv (CS.AI) 2026-06-11

TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

arXiv:2606.11637v1 Announce Type: new Abstract: Touch is a key modality for embodied agents to understand the physical world. Although recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, scaling such systems to realistic open-world settings remains challenging due to two key bottlenecks: (1) current tactile reasoning datasets remain limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; (2) Tactile signals are inherently redundant and action-specific, yet existing methods often overlook these properties, resulting in inefficient representations with limited semantic expressiveness. To address these limitations, we propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types, providing a solid data foundation for open-world generalization. We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Then, we propose action-aware modeling mechanism to improve tactile representation efficiency and enable efficient reasoning. Experimental results demonstrate that TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets. Our code and dataset will be made available at: https://github.com/lvkailin0118/TouchThinker.