Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-16

G-Loss: Graph-Guided Fine-Tuning of Language Models

Traditional loss functions, including cross-entropy, contrastive, triplet, and su pervised contrastive losses, used for fine-tuning pre-trained language models such as BERT, operate only within local neighborhoods and fail to account for the global semantic structure. We present G-Loss, a graph-guided loss function that incorporates semi-supervised label propagation to use structural relationships within the embedding manifold. G-Loss builds a document-similarity graph that captures global semantic relationships, thereby guiding the model to learn more discriminative and robust embeddings. We evaluate G-Loss on five benchmark datasets covering key downstream classification tasks: MR (sentiment analysis), R8 and R52 (topic categorization), Ohsumed (medical document classification), and 20NG (news categorization). In the majority of experimental setups, G-Loss converges faster and produces semantically coherent embedding spaces, resulting in higher classification accuracy than models fine-tuned with traditional loss functions.

02.
arXiv (CS.LG) 2026-06-17

AIMER: Calibration-Free Task-Agnostic MoE Expert Pruning

arXiv:2603.18492v3 Announce Type: replace Abstract: Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token computation, yet deployment still requires storing the full expert pool, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert-pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, making pruning decisions sensitive to calibration-data variation while introducing substantial preprocessing cost. We propose AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that identifies more distinct experts by capturing the concentration pattern of expert weights, making it well suited for task-agnostic expert pruning. Across 7B to 47B MoE language models with distinct architectures and 16 diverse benchmarks, AIMER consistently delivers stronger capability balance across diverse tasks than existing calibration-free methods. Surprisingly, AIMER also achieves better balance than strong calibration-based expert-pruning baselines calibrated on the widely used task-agnostic C4 corpus, while requiring only 0.22–2.06 seconds to score all experts.

03.
arXiv (quant-ph) 2026-06-25

A phase-space approach for performing continuous-variable quantum teleportation with a non-Gaussian resource

arXiv:2606.25471v1 Announce Type: new Abstract: We present a comprehensive phase-space analysis of continuous-variable quantum teleportation employing a photon-subtracted two-mode squeezed Fock state (PS-TMSFS) as an entangled resource. We investigate the usefulness of PS-TMSFS within the Braunstein-Kimble teleportation protocol. We explain the generation scheme for the resource state and derive the analytical expression for the success probability associated with the photon-subtraction process. The Wigner characteristic function of PS-TMSFS is calculated and then employed to determine the fidelity for coherent and squeezed state inputs. The dependence of the success probability and teleportation fidelity on the squeezing parameter and beam-splitter transmissivity is analyzed in detail for both symmetric and asymmetric photon-subtraction scenarios. We find that the teleportation fidelity exhibits a strong dependence on the resource parameters and is highly sensitive to variations in the subtraction process. The photon-subtraction process modifies the non-Gaussianity of the resource state, but no substantial enhancement of the teleportation fidelity is detected. Despite the non-Gaussian character of the resource state, fidelity above the classical coherent-state benchmark is observed only for the symmetric $(1,1) $ photon-subtraction configuration in the low-squeezing regime that decreases with increasing squeezing. The remaining configurations remain below the classical threshold throughout the parameter range considered. These findings indicate that the PS-TMSFS may not be a suitable resource for continuous-variable quantum teleportation and offers insight into the limitations of this class of non-Gaussian states.

04.
arXiv (CS.AI) 2026-06-24

TIP-Search: Time-Predictable Inference Scheduling for Market Prediction under Uncertain Load

Authors:

arXiv:2506.08026v4 Announce Type: replace Abstract: Real-time market prediction services need correct predictions before a decision deadline; a correct prediction delivered late is not usable. TIP-Search studies time-predictable inference scheduling over fixed market predictors under uncertain load. It filters conformal latency-quantile feasible models, dispatches over finite workers, and uses shielded constrained online experts to trade accuracy, queue pressure, and deadline risk. On the optimized deployable pool, TIP-Search reaches 0.994 raw accuracy and 0.991 timely accuracy. On official TLOB FI-2010 h=10, TIP-Search++ raises timely accuracy from 0.156 to 0.239 and deadline satisfaction from 0.391 to 0.962. In matched h10 profiled systems replay, OCO-ACPO reaches 0.303 timely accuracy and 0.951 deadline satisfaction, with paired gains over RAMSIS/SneakPeek/utility-style comparators of $+0.00285$ timely accuracy ($p=0.0118$) and $+0.0146$ deadline satisfaction ($p=1.5{\times}10^{-5}$). SA-OCO-ACPO improves timely/deadline service by 0.188–0.417 over CPO under nonstationary stress. The claim is a systems scheduling result, not a broad LOB classifier leaderboard.

05.
arXiv (CS.LG) 2026-06-16

Self-Supervised Learning of Iterative Solvers for Constrained Optimization

arXiv:2409.08066v3 Announce Type: replace Abstract: The real-time solution of parametric optimization problems is critical for applications that demand high accuracy under tight real-time constraints, such as model predictive control. To this end, this work presents a learning-based iterative solver for constrained optimization, comprising a neural network predictor that generates initial primal-dual solution estimates, followed by a learned iterative solver that refines these estimates to reach high accuracy. We introduce a novel loss function based on Karush-Kuhn-Tucker (KKT) optimality conditions, enabling fully self-supervised training without pre-solved optimizer solutions. Theoretical guarantees ensure that the training loss function attains minima exclusively at KKT points. A convexification procedure enables application to nonconvex problems while preserving these guarantees. Experiments on two nonconvex case studies demonstrate speedups of up to one order of magnitude compared to state-of-the-art solvers such as IPOPT, while achieving orders of magnitude higher accuracy than competing learning-based approaches.

06.
arXiv (CS.CL) 2026-06-25

BitNet Text Embeddings

LLM-based text embedders have substantially improved retrieval and semantic representation quality, but their deployment remains costly: large backbone models slow down embedding inference, while high-dimensional full-precision embeddings impose substantial storage and bandwidth overhead on large-scale indexes. In this paper, we present BITEMBED, an extreme low-bit framework for LLM-based text embedding that jointly targets encoding efficiency and vector storage. BITEMBED converts pretrained LLM backbones into BitNet-style embedding encoders with ternary weights, quantized activations, and lightweight normalization refinement. The converted model is adapted to representation learning through continual contrastive pre-training, followed by supervised contrastive fine-tuning with both similarity-distribution distillation and attention-relation distillation from a full-precision teacher. Beyond quantizing the backbone, BITEMBED further trains output embeddings to support multiple storage precisions meeting different storage needs in various scenarios. Experiments on MMTEB (eng, v2) with Qwen3-0.6B and Gemma3-270M show that BITEMBED is largely comparable to full precision teacher embedders. Moreover, BITEMBED flexibly obtains text embeddings of various precisions, achieving a trade-off between performance and storage cost.

08.
arXiv (CS.AI) 2026-06-24

Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs

arXiv:2511.20892v4 Announce Type: replace Abstract: Large language models (LLMs) often produce incorrect or outdated content after being employed. Efficient and accurate knowledge updates without costly retraining are a major challenge. This problem is particularly challenging in lifelong settings, where complex, unstructured knowledge must coexist without interference. We introduce RILKE (Representation Intervention for Lifelong KnowledgE Control), a robust and scalable method that treats knowledge control as interventions within the model's representation space. Leveraging representation-space expressiveness, we identify two key properties enabling RILKE to achieve fine-grained control over complex, unstructured knowledge while maintaining general utility with frozen base weights. During training, RILKE learns paraphrase-robust and edit-localized modules that limit each update to a low-dimensional subspace to minimize cross-edit interference. At inference, a query-adaptive router selects the appropriate module to guide the model's generation. Across LLaMA and Qwen models, RILKE scales effectively to large-scale benchmarks, demonstrating high edit success and strong paraphrase generalization while preserving general utility with modest memory overhead. These results show RILKE is an effective and scalable solution for lifelong knowledge control in LLMs.

09.
medRxiv (Medicine) 2026-06-17

Womens intentions and motivations towards health behaviour change before pregnancy: a cross-sectional survey of pregnant women in Australia

Introduction: The preconception period (i.e. the weeks and months before pregnancy) is a critical window during which parental health behaviours can influence pregnancy outcomes and the childs long-term health. Modifiable factors such as nutrition, physical activity, substance use, and environmental exposures play a key role, yet womens ability to adopt and sustain healthy behaviours is shaped by complex psychological, social and environmental influences. This study applies the Theory of Planned Behaviour to identify the beliefs underpinning womens preconception behaviours, with the aim of informing support for effective and sustained health behaviour change. Methods: An Australian national retrospective cross-sectional survey of pregnant women (18-49 years), recruited through social media platforms. The 92-item survey captured respondent socio-demographics, pregnancy status and health conditions, health behaviours, and beliefs regarding preconception health behaviours. Respondents level of pregnancy planning was categorised using the London Measure of Unplanned Pregnancy (LMUP). Items regarding preconception beliefs were structured in accordance with the Theory of Planned Behaviour, with a focus on regular exercise, healthy diet, and alcohol avoidance. These beliefs variables were analysed using structured equation modelling to identify paths between latent variables and the items used to estimate each concept. Results: The study was completed by 430 pregnant women of whom 72.7% had a planned pregnancy. Most had a partner, were university educated and in good health. Structural equation modelling showed intention strongly predicted exercise ({beta}=0.65), healthy diet ({beta}=0.54) and alcohol avoidance ({beta}=0.64). Perceived control and partner norms influenced intentions, whereas health professional norms had limited effect. Positive beliefs were associated with folate supplement use and smoking cessation. Conclusion: These findings highlight intention as a key driver of preconception health behaviours, with perceived control and partner influences playing a more significant role than individual beliefs or health professional input. Effective interventions should therefore address structural barriers and actively involve partners, while respecting womens autonomy. Overall, couples-focused, multi-level strategies are likely essential to support meaningful and sustained preconception health behaviour change.

10.
arXiv (CS.LG) 2026-06-11

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

arXiv:2601.08136v2 Announce Type: replace Abstract: Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty that distinguishes online RL from standard generative modeling is the lack of direct samples from the target Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which uses a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. However, it remains unclear how these objectives are formally related, or whether they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that share the same expectation. We show that existing noise-expectation and gradient-expectation methods are simply two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and it enables the principled combination of Q-value and Q-gradient information to form an effective estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.

11.
arXiv (CS.LG) 2026-06-18

Some Complexity Results for Robustness Verification for Binarized Neural Networks

arXiv:2606.18918v1 Announce Type: new Abstract: This paper studies the computational complexity of verification problems for Binarized Neural Networks (BNNs), where activations (and sometimes weights) are binary. We analyze two problems: satisfiability and robustness under uniform image occlusion. We show that BNN satisfiability is NP-complete via a reduction from Boolean satisfiability problem (SAT), and that uniform occlusion induces a piecewise-constant structure in the network output, enabling a polynomial-time robustness-checking algorithm.

12.
arXiv (CS.CL) 2026-06-16

Formalize Once, Edit the Rest: Efficient Lean-Based Answer Selection for Math Reasoning

With large language models (LLMs) increasingly applied to mathematical reasoning, formal proof assistants such as Lean can be leveraged to verify reasoning outputs with machine-checkable rigor, enabling use cases such as answer selection in test-time scaling with K sampled candidate answers. However, employing Lean requires that LLM outputs, originally in natural language, first be formalized. Existing Lean-based answer-selection work uses an autoformalization model to generate a formal statement in Lean for each candidate answer independently, incurring a significant computational cost. We propose BASE, a base-and-edit pipeline that formalizes a single base candidate per problem and derives the remaining K-1 statements by editing the answer expression in place. To facilitate this, we train a rewriter model LEANSCRIBE to localize the answer in the base formalization and generate a reusable edit function for the other K-1 candidates. BASE simultaneously improves selection accuracy and reduces formalization cost - a Pareto improvement that holds on all 12 (dataset, solver) configurations across four benchmarks and three solvers, cutting autoformalizer calls by about 5x at K=8, with the reduction expected to become larger as K grows. Code is available at https://github.com/ucr-rai/base-and-edit.

13.
arXiv (CS.AI) 2026-06-24

The Latent Bridge: A Continuous Slow-Fast Channel for Real-Time Game Agents

arXiv:2606.24470v1 Announce Type: new Abstract: A real-time agent for general computer use - with games as the most demanding case - must act within tens of milliseconds while still planning over seconds. These two regimes sit at opposite ends of the latency-quality tradeoff. A reasoning VLM (Qwen3-VL-8B-Thinking) deliberates effectively but requires ~1.5 s per response - far too slow for a 15 Hz control loop. In contrast, a reactive VLM (MiniCPM-o 4.5) acts in milliseconds but underperforms on planning-heavy tasks. We couple two frozen models of matched scale (9B reactive, 8B reasoning), leaving the communication channel as the sole trainable component. The standard coupling is a Text Bridge (T): the slow model writes a suffix the fast model reads. We introduce a learned continuous Latent Bridge (L) that projects the slow model's residuals into the fast model's input-embedding space in a LLaVA-style manner, avoiding any text round-trip; both are compared against Fast-Only (F). On 7 Atari games and a driving domain (MetaDrive), tuning the action decoder per channel on held-out seeds, the Latent Bridge matches or beats the Text Bridge in every domain: it significantly improves two games (MsPacman +57%, RoadRunner +28%) and is a safe drop-in elsewhere. Combining both channels interferes destructively (RoadRunner -96%), so only one should be used. The benefit is highly predictable: the bridge helps if and only if slow reasoning already beats fast reaction (T > F) - the Latent and Text gains over Fast-Only move together at r=0.93. MetaDrive is the controlled negative, where the Latent Bridge is demonstrably inert because the Text Bridge adds no value. We release replay recordings and reproducible pipelines.

14.
arXiv (CS.AI) 2026-06-16

SPARK: Security Knowledge Priming and Representation-Guided Knowledge Activation for LLM-based Secure Code Generation

arXiv:2606.16244v1 Announce Type: cross Abstract: Large language models routinely generate code with exploitable security flaws. Prior literature attributes this limitation to a lack of security expertise, steering current defense mechanisms toward heavy fine-tuning or external knowledge retrieval, which introduces significant computational overhead and data bias through redundant code examples. Contrary to this view, we argue that pretraining corpora are already rich in security material. The bottleneck is activation: without an explicit and brief cue, statistical pressure toward common training-distribution patterns suppresses the model's safety-relevant representations. We present SPARK, an inference-time security harness that activates this latent knowledge without any retraining. The harness has two parts. Component~I retrieves a few of the relevant Common Weakness Enumeration (CWE) entries for each coding task and appends a short structured cue to the prompt; this alone is enough to surface the model's existing security representations. Component~II adds a precomputed token bias to the logits at every decoding step. We obtain the bias by projecting a safe-direction vector, the unit difference between the mean safe and mean unsafe last-layer hidden states, through the language model head. The bias is computed once offline; applying it costs a single vector addition per generated token. We evaluate SPARK on 9 open-source models across C++, Java, and Python, and compare with 7 baselines spanning fine-tuning and retrieval-augmented methods. SPARK matches or improves on the best baseline in every setting while preserving HumanEval utility. We further test Component~I in a black-box setting on 7 of today's strongest models, including Claude, DeepSeek, and GPT, demonstrating the bottleneck of insecure code generation and the improvements enabled by our method.

15.
arXiv (CS.CV) 2026-06-18

Geometry-Aware Dataset Condensation for Diffusion Model Training

Dataset condensation aims to construct compact datasets from real data via synthesis or selection. However, existing approaches are ill-suited for diffusion model training: synthetic data generation often yields low-fidelity samples unsuitable for authentic modeling, while real subset selection typically fails to preserve the distributional geometry required by diffusion likelihood objectives. To address this, we propose to reformulate real subset selection as a geometry-aware distribution alignment problem. By incorporating one-sided partial optimal transport, our method selectively aligns a compact subset with the full data distribution while allowing unmatched mass in low-density regions, ensuring the preserved geometric structure necessary for effective diffusion model training. To further ensure distributional fidelity, we complement geometric alignment with lightweight feature-statistics and semantic consistency regularization. An efficient two-stage discrete optimization strategy is proposed to achieve this alignment objective. Extensive experiments across diffusion variants, subset sizes, image resolutions, and training rounds show that our method achieves superior fidelity and distributional coverage in diffusion model training. Codes are available at https://github.com/2018cx/GADC.

16.
arXiv (CS.CV) 2026-06-11

MSUE: Multi-Modal Soccer Understanding Expert

This paper presents our solution to the 2026 SoccerNet VQA Challenge. We first develop a cost-effective data synthesis pipeline driven by a Vision-Language Model (VLM), which systematically restructures raw domain data into diverse VQA samples, including concise answers and long-form responses. Second, we propose MSUE, a multi-expert question answering architecture that employs a Large Language Model (LLM) to dynamically dispatch questions to text, image, and video experts. These experts are instantiated as a strong text baseline Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, respectively, working collaboratively to enhance VQA performance. MSUE achieves an accuracy of 0.95 on the challenge benchmark, securing third place in the leaderboard.

17.
arXiv (CS.AI) 2026-06-25

EPTS: Elastic Post-Training Sparsity for Efficient Large Language Model Compression

arXiv:2606.25285v1 Announce Type: cross Abstract: Post-Training Sparsity (PTS) has emerged as a crucial paradigm for compressing Large Language Models to facilitate efficient deployment on resource-constrained devices. However, existing PTS methodologies are typically confined to Single-Sparsity optimization, necessitating a separate, time-consuming optimization session for each specific sparsity level. This rigid paradigm significantly hinders flexible deployment across diverse hardware scenarios, as adapting to a new sparsity requirement mandates a complete re-optimization process. To address these limitations, we propose Elastic Post-Training Sparsity (EPTS), a unified Multi-Sparsity framework that produces a single elastic model capable of maintaining robust performance across diverse sparsity configurations through a one-shot optimization process. Specifically, we design a Multi-Sparsity Hierarchy LoRA (MS-HiLoRA) mechanism that facilitates knowledge inheritance from low- to high-sparsity groups, effectively mitigating the competition for parameter reconstruction. Furthermore, we introduce a Multi-Sparsity Feature Mixer (MSFM), which significantly enhances the model's adaptability to pruning perturbations by dynamically fusing feature representations of varying sparsity granularities. Extensive experiments on LLaMA and OPT families demonstrate that EPTS achieves competitive performance compared to state-of-the-art methods like SparseGPT and Wanda, while offering significant efficiency gains by enabling multi-scenario deployment from a single optimization. our source code is available at https://github.com/xuke225/EPTS.

18.
arXiv (CS.CL) 2026-06-25

When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance

Authors:

Can a statistically significant, large-effect-size finding in computational social science be entirely an artifact of the measurement instrument? We present a case where the answer appears to be yes. Analyzing 85 interviews across four public intellectuals (2016–2026), we find a robust negative-affect/emphatic-certainty lexical co-occurrence pattern under keyword-based scoring ($r = 0.72$–$0.93$, $p < 0.01$ for all four speakers). Replacing keyword counting with LLM-based zero-shot semantic classification on the complete diarized corpus (32,625 sentences) dramatically reduces this correlation: Dalio's $r = 0.851$ drops to $r = 0.206$, with two speakers showing negative $r(neg, emphatic)$ and one showing null. In contrast, the LLM reveals a strong negative-hedging coupling across speakers – Rogoff's $r(neg, hedged) = 0.875$ ($p = 0.001$) and Zeihan's $r(neg, hedged) = 0.722$ ($p = 0.008$) – consistent with the conventional expectation that pessimistic discourse attracts hedging, not certainty. Sentence-level error analysis traces this discrepancy to three structural failure modes in keyword lexicons – syntactic blindness, polysemy blindness, and categorical absence – illustrated through cases where keyword counting inverts semantic meaning (e.g., ''never absolutely totally confident'' scored as high-certainty). We argue that keyword lexicons measure a universal lexical co-occurrence tendency – negative discourse naturally attracts emphatic vocabulary – that is orthogonal to, and can systematically invert, rhetorical stance. Treating keyword counts as measurements of epistemic certainty is a category error: a finding that appears to be about a speaker's psychology may be entirely about the counting of words.

19.
arXiv (quant-ph) 2026-06-19

On chip, multifunctional quantum sensing using single spins in a van der Waals crystal

arXiv:2606.19978v1 Announce Type: new Abstract: Nanoscale thermometry and magnetometry are in high demand across a wide range of scientific and technological applications. In this context, optically addressable spins in solids have emerged at the forefront of on-chip quantum sensing. However, simultaneous quantum sensing of multiple parameters (e.g., temperature and magnetic field) using the same spin sensor remains challenging due to cross-sensitivity to multiple physical quantities. Here, we demonstrate independent dual sensing of temperature and magnetic field using single quantum emitters in hexagonal boron nitride (hBN). We experimentally verify the independent response of the zero-phonon line (ZPL) position to temperature and of optically detected magnetic resonance (ODMR) to magnetic fields. Furthermore, we demonstrate local temperature sensing of a microcircuit while simultaneously measuring an external magnetic field. Our results establish quantum emitters in hBN as a robust platform for multifunctional quantum sensing under realistic operating conditions.

20.
arXiv (CS.LG) 2026-06-19

The Significance of Style Diversity in Annotation-Free Synthetic Data Generation

arXiv:2606.20400v1 Announce Type: new Abstract: Generating high-utility synthetic data for intent classification typically requires human-annotated seed data, which is often unavailable in fast-paced industrial settings. In this paper, we propose a framework for synthetic dialogue generation that works entirely without human-annotated data, relying solely on intent definitions. Our proposed dialogue generation framework utilizes two different types of topic and style attributes to improve data diversity. Also, we propose two novel post-hoc stylization models called Univ and Exam to transform synthetic LLM-generated utterances into more varied, human-like linguistic styles. To enhance data quality, we utilize an LLM-as-a-judge filtering process. Experimental results on both industrial and public datasets demonstrate that the proposed approach achieves up to 93.3% of the performance obtained using human-annotated training data. Crucially, the findings reveal that style diversity is more critical than topic diversity for synthetic data utility, as it prevents models from learning spurious stylistic correlations. Furthermore, the study shows that incorporating style attributes during the generation process is more effective than post-hoc style adaptation.

21.
arXiv (CS.AI) 2026-06-17

PIVOT: Bridging Black-Scholes Implied-Volatility and Price Objectives via Differentiable Jäckel Operator

arXiv:2606.17065v1 Announce Type: cross Abstract: Modern option-learning systems operate in two coordinates: price space, where markets quote and no-arbitrage constraints are most naturally enforced, and implied volatility (IV) space, where volatility surfaces are smoothed, regularized, and evaluated. The bottleneck is interface, not approximation: Jäckel's seminal "Let's Be Rational" (LBR) solver already inverts the Black-Scholes price to machine precision efficiently. What is missing is a differentiable layer that preserves LBR in the forward pass and avoids backpropagating through its branch logic. Such a layer must also confront the unavoidable singularity of the inverse map in the low-vega regime, where the sensitivity 1/vega diverges as vega -> 0. We close this gap with PIVOT, the Price-Implied-Volatility Objective Translator. PIVOT keeps the LBR forward pass intact and supplies the backward pass by implicit differentiation through the smooth Black-Scholes/Black-76 price map, with an explicit gating contract: invalid domains return NaN, well-conditioned rows receive the exact 1/vega gradient, and low-vega rows are attenuated rather than silently regularized. On a single H100, a fused Triton kernel reaches 1.79e9 IV/s at machine precision (9.3e-14 max relative error vs. the reference C solver); end-to-end label generation sustains 48.9M/s on synthetic chains and 16.6M/s on SPX OptionMetrics. In a HyperIV-style one-day reproduction on SPX, PIVOT-augmented objectives Pareto-dominate the baselines, reducing held-out price MAE by up to 43.4% and the strongest three-seed gated objective improving price MAE by 38.8% and IV MAE by 21.3% jointly; cross-asset results on RUT, VIX, and NDX show directional price-MAE gains of 40.1%, 24.2%, and 16.7%, while an ungated IV-roundtrip control collapses to a degenerate near-zero surface, confirming the gate as a correctness contract rather than a tuning knob.

23.
arXiv (CS.CL) 2026-06-17

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards. Intuitively, this overlooks the rich environment dynamics information contained in rollout interaction trajectories. We argue that the interaction experience inherently serves as an implicit supervision signal, reveals the underlying transition mechanisms of the environment, and enables the agent to construct a more accurate internal model of the environment.. Therefore, in this work, we investigate how to leverage this additional signal to improve policy learning. Specifically, we propose EnvRL, a framework that incorporates environment dynamics learning into agentic RL via two auxiliary objectives: state prediction and inverse dynamics. By jointly optimizing with the primary RL objective, we encourage the agent to internalize environment dynamics from its own interaction experience. Extensive experiments on two long-horizon agentic benchmarks demonstrate that EnvRL achieves significant improvements on success-rates over RL-only baselines, e.g., when trained with GRPO, lifting Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld, and from 56.8% to 67.0% on WebShop.

24.
arXiv (CS.CL) 2026-06-16

TokenPilot: Cache-Efficient Context Management for LLM Agents

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This reveals a critical trade-off between text sparsity and prompt cache continuity. To address this, we present TokenPilot, a dual-granularity context management framework. Globally, Ingestion-Aware Compaction acts as a framework harness to stabilize prompt prefixes and eliminate open-world environmental noise at the ingestion gate. Locally, Lifecycle-Aware Eviction monitors the ongoing residual utility of context segments, enforcing a conservative batch-turn schedule to offload content segments only when task relevance expires. Experiments on PinchBench and Claw-Eval under both isolated and continuous modes demonstrate that TokenPilot reduces costs by 61% and 56% in isolated mode, and 61% and 87% in continuous mode, while maintaining competitive performance compared to prior systems. TokenPilot has been integrated into LightMem2 at https://github.com/zjunlp/LightMem2.

25.
arXiv (CS.AI) 2026-06-16

Demystifying Variance in Circuit Discovery of LLMs

arXiv:2606.16920v1 Announce Type: cross Abstract: Circuit discovery is a key technique in mechanistic interpretability to pinpoint the model components that are crucial for performing a given task. Although the current state-of-the-art method (EAP-IG) performs well on the metric of (un)faithfulness, it suffers from substantial variability. This includes resampling variance, where the circuit changes when we probe with a new batch of data from the same distribution; rephrasing variance, where the discovered circuit shifts when the prompts are rephrased; and sample-wise variance, where a circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples. This paper studies the roots of these variances. We demonstrate that CEAP, our new circuit discovery method that improves upon EAP-IG with a theoretical guarantee, can substantially lessen resampling variance. We further show that rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. This leads us to argue that it may be challenging to find a comprehensive circuit that explains and controls the model's behavior on a task, which can be expressed in countless templates, suggesting that LLMs may be inherently hard to steer. We show that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem. Regarding sample-wise variance, we argue that it is largely benign: extremely poor unfaithfulness scores often stem from how unfaithfulness is defined, rather than from defects in the measured circuits. We show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.