Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-19

How to sketch a learning algorithm

Authors:

arXiv:2604.07328v3 Announce Type: replace Abstract: How does the choice of training data influence an AI model? This broad question is of central importance to interpretability, privacy, and basic science. At its technical core is the data deletion problem: after a reasonable amount of precomputation, quickly predict how the model would behave in a given situation if a given subset of training data had been excluded from the learning algorithm. We present a data deletion scheme capable of predicting model outputs with vanishing error $\varepsilon$ and failure probability $\delta$ in the deep learning setting. Our precomputation and prediction algorithms are only $\tilde{O}(\log(1/\delta)/\varepsilon^2)$ factors slower than regular training and inference, respectively. The storage requirements are those of $\tilde{O}(\log(1/\delta)/\varepsilon^2)$ models. Our proof is based on an assumption that we call stability. In contrast to the assumptions made by prior work, stability appears to be fully compatible with learning powerful AI models. In support of this, we show that stability is satisfied in a minimal set of experiments with microgpt. Our code is available at https://github.com/SamSpo1/microgpt-sketch. At a technical level, our work is based on a new method for locally sketching an arithmetic circuit by computing higher-order derivatives in random complex directions. Forward-mode automatic differentiation allows cheap computation of these derivatives.

02.
arXiv (CS.LG) 2026-06-18

Fair Online Resource Allocation

arXiv:2606.18679v1 Announce Type: cross Abstract: We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

04.
arXiv (CS.LG) 2026-06-19

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption

arXiv:2605.30089v2 Announce Type: replace Abstract: Standard Set Representation Learning methods typically excel on curated data but often overlook the challenge of inference-time element corruption. This refers to scenarios where deployed models encounter element-level degradations, such as outliers or missing components, that may distort set representation and degrade performance. We propose SW-DRSO, a distributionally robust optimization framework tailored for sets. Rather than minimizing loss solely on observed training data, SW-DRSO optimizes a tractable surrogate of the worst-case expected loss over a family of plausible inference-time variations. We introduce a barycentric adversary that approximates the intractable search over corrupted sets by a differentiable training-time optimization over simplex weights. Extensive experiments across four tasks demonstrate that SW-DRSO effectively enhances robustness against corruption while maintaining high overall performance.

05.
arXiv (math.PR) 2026-06-16

Excursion Fluctuations and Spectral Universality in Gaussian Fields

arXiv:2606.15630v1 Announce Type: new Abstract: We study the large-scale spatial fluctuations of excursion volumes for a class of smooth stationary Gaussian fields. In the case of Berry's random wave model in dimension $d \geq 2$, we show that the spatial fluctuations for fixed $u>0$ converge to the fractional Gaussian field $(-\Delta)^{-1/4}W$ in the space of tempered distributions $\mathcal S'(\mathbb{R}^d)$, where $W$ is the $d$-dimensional Gaussian white noise. This explains the long-range correlations in the apparent filament structure of the Random Plane Wave model. For a class of smooth planar Gaussian fields whose spectral density has a power-law singularity at the origin, we prove convergence to fractional Gaussian fields with an index determined by the singularity exponent. More generally, the results illustrate that, for stationary random measures, large-scale spatial fluctuations are determined by the behaviour of the spectral measure density exponent near zero.

06.
arXiv (CS.AI) 2026-06-18

DRIFT: Refining Instruction Data via On-Policy Data Attribution

arXiv:2606.18307v1 Announce Type: cross Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they are less suited to elevating the capability upper bound. The challenge here is no longer to identify a smaller subset that preserves performance, but to refine the data distribution toward instances most capable of improving the final model. To address this problem, we explore instance-level data attribution using Influence Functions (IF). We identify that standard IF formulations struggle in this setting due to two structural limitations: a proximity gap caused by off-policy validation targets, and a severe bias towards gradient norm. We propose DRIFT (Data Refinement via On-Policy Influence Functions for Supervised Fine-Tuning). Instead of relying on external reference data, DRIFT utilizes the model's on-policy rollouts as validation targets, which empirically minimizes the parameter proximity gap and better aligns with the local neighborhood assumption of IF. It further applies signed weighting based on trajectory correctness and debiases influence scores against the gradient hacking issue, allowing a small set of validation queries to act as reliable anchors for attributing the full dataset. Experiments on 7B-parameter instruction and reasoning models show that DRIFT consistently raises the performance ceiling on both, outperforming existing data curation baselines.

07.
arXiv (CS.LG) 2026-06-17

VISTA: Scale-Aware Visual Navigation via Action History Conditioning

arXiv:2606.17294v1 Announce Type: cross Abstract: Vision Navigation Foundation Models (VNMs) promise end-to-end learned navigation policies capable of zero-shot deployment across diverse embodiments and environments. To maintain generality, many vision-based navigation models predict normalized actions. However, this normalization introduces a critical deployment vulnerability: applying different scaling factors to the same normalized trajectory alters its physical geometry, which degrades navigation performance and increases collision risks. We address this vulnerability by conditioning the model on normalized action histories alongside image observations, providing explicit context on the relationship between the model's predictions and the robot's actual physical displacement. Furthermore, current VNMs often struggle in visually repetitive environments that lack distinct features. To resolve this issue, we integrate a DINOv3 encoder, whose richer representations enable our model to capture both spatial and geometric dimensions between observations. VISTA generalizes robustly to out-of-distribution environments, achieving 100% goal prediction accuracy in zero-shot, real-world deployment in Outdoor, Forest and Office settings, and an average of 95% checkpoints crossed, demonstrating consistent path following in unseen environments.

08.
arXiv (CS.CL) 2026-06-16

How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation

Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. We evaluate 13 LLM backends on 308 cases each. Results show that vulnerability patterns vary across backends: overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, the strongest attack mode differs by model family, and the same deployment scaffold could amplify or decrease ASR on different backends. An auxiliary agent-skill probe, where endorsement becomes an install command, exposes a sharp split among otherwise robust backends: Claude over-rejects while GPT over-trusts. These findings argue for treating recommendation reliability under adversarial search content as a first-class dimension of backend safety evaluation.

09.
arXiv (CS.CL) 2026-06-16

Oops, Wait: Discourse Tokens Matter in Reasoning Model

Recent studies suggest that even data-efficient training with ($\simeq$1K) reasoning trajectories can induce non-trivial reasoning capabilities in large language models through post-training. Such training corpora often contain iconic tokens such as "wait", "so", and "alternatively", which frequently appear in reasoning trajectories and may play a role in this process. This paper focuses on characterizing observable token-level patterns in post-training and a case study of how data-efficient supervised fine-tuning (SFT) differs from, and falls short of, large-scale post-training. To this end, we first identify tokens that correlate with correct answers along reasoning trajectories across models and training setups. We then focus on the distribution and (functional) roles of the "wait" token to primarily study the model trained in a data-efficient manner compared with the counterpart. Our study finds that discourse tokens are associated with correctness and a reasoning accuracy jump, even in data-efficient SFT. This suggests data-efficient SFT can partially reproduce discourse-token patterns to mimic meaningful reasoning behavior, but the patterns are less aligned with high-confidence answer transitions than those from large-scale post-training.

10.
arXiv (CS.AI) 2026-06-16

Multi-Granular Node Pruning for Causal Circuit Discovery

arXiv:2512.10903v2 Announce Type: replace Abstract: Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which is computationally expensive and limited to coarse-grained units such as attention heads or MLP blocks, overlooking finer structures like individual neurons. We propose a node-level pruning framework for circuit discovery that addresses both scalability and granularity limitations. Our method introduces learnable masks across multiple levels of granularity, from entire blocks to individual neurons, within a unified optimization objective. Granularity-specific sparsity penalties guide the pruning process, allowing a comprehensive compression in a single fine-tuning run. Empirically, our approach identifies circuits that are smaller in nodes than those discovered by prior methods; moreover, we demonstrate that many neurons deemed important by coarse methods are actually irrelevant, while still maintaining task performance. Furthermore, our method has a significantly lower memory footprint, 5-10x, as it does not require keeping intermediate activations in the memory to work.

11.
arXiv (CS.CL) 2026-06-11

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions. Existing token-level extensions typically decompose a sequence-level Bradley-Terry objective across timesteps, leaving per-prefix (state-wise) optimality implicit. We study how to recover token-level preference optimality using only standard sequence-level pairwise comparisons. We introduce Token-level Bregman Preference Optimization (TBPO), which posits a token-level Bradley-Terry preference model over next-token actions conditioned on the prefix, and derive a Bregman-divergence density-ratio matching objective that generalizes the logistic/DPO loss while preserving the optimal policy induced by the token-level model and maintaining DPO-like simplicity. We introduce two instantiations: TBPO-Q, which explicitly learns a lightweight state baseline, and TBPO-A, which removes the baseline through advantage normalization. Across instruction following, helpfulness/harmlessness, and summarization benchmarks, TBPO improves alignment quality and training stability and increases output diversity relative to strong sequence-level and token-level baselines.

12.
arXiv (CS.CV) 2026-06-17

Where Should Action Generation Begin? A Learnable Source Prior for Generative Robot Policies

Generative robot policies typically begin action generation from an observation-independent standard Gaussian distribution, leaving the choice of source distribution underexplored. This work asks a simple question: where should action generation begin? We propose LeaP, a Learnable source Prior that replaces the standard Gaussian with a proprioception-conditioned diagonal Gaussian over action chunks. Parameterized by a lightweight MLP, LeaP jointly predicts the mean and state-adaptive variance of the source distribution, while keeping the downstream generator architecture and inference solver unchanged. This design provides an observation-informed yet stochastic initialization, allowing the generator to focus on precise action refinement rather than transporting samples from an uninformed noise source. On 15 RoboTwin manipulation tasks, LeaP achieves an average success rate of 81.6%, outperforming four representative baselines – including deterministic-source methods, a no-prior counterpart, and a diffusion-bridge policy – by 6.5 to 25.5 percentage points. The same prior consistently improves both flow-matching and diffusion-bridge generators, while using fewer parameters and converging faster. The advantage carries over to real-world deployment, where LeaP attains the best performance. These results suggest that the source distribution is an independent and reusable design axis for generative robot policies, complementary to the choice of generative dynamics.

13.
arXiv (CS.LG) 2026-06-17

MiniFool – Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks

arXiv:2511.01352v2 Announce Type: replace Abstract: In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. While we initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory, we apply it to further data from other science domains, thus demonstrating its general applicability. Here, we apply the algorithm to the well-known MNIST data set and furthermore, to Open Data data from the CMS experiment at the Large Hadron Collider. The algorithm is based on minimizing a cost function that combines a $\chi^2$ based test-statistic with the deviation from the desired target score. The test statistic quantifies the probability of the perturbations applied to the data based on the experimental uncertainties. For our studied use cases, we find that the likelihood of a flipped classification differs for both the initially correctly and incorrectly classified events. When testing changes of the classifications as a function of an attack parameter that scales the experimental uncertainties, the robustness of the network decision can be quantified. Furthermore, this allows testing the robustness of the classification of unlabeled experimental data.

14.
bioRxiv (Bioinfo) 2026-06-10

Promera: a unified model for biomolecular structure prediction, filtering, and design

Generative models have become staple tools for modeling and designing biomolecular structures. However, although these tools have improved in structural prediction accuracy, their ability to filter designed binders—an essential use case—remains insufficient; whereas design methods have focused more on unconstrained binder generation rather than capabilities enabled by controllable design. We introduce Promera, a unified generative model that combines all-atom structure prediction with improved filtering and controllable design. We find that Promera's confidence metrics are more accurate for filtering binders from non-binders for both miniproteins and nanobodies, while its co-folding performance surpasses popular open-source models (OpenFold3-p2, Boltz-2) on therapeutically relevant categories. As a design model, Promera generates binders by predicting masked protein sequences with optional epitope, paratope, and template constraints. Remarkably, our nanobody designs match the in silico success rates from backprop-based techniques (mBER) when evaluated under co-folding confidence filters. We further provide two in silico demonstrations of the the versatile capabilities of our design method: epitope targeting of the Andes hantavirus glycoprotein with VHHs and active state stabilization of the beta-2 andrenergic GPCR. We conclude by proposing a scaling law for co-folding models, suggesting a path for further performance improvement.

15.
medRxiv (Medicine) 2026-06-11

Corticospinal tract risk modifies motor recovery after minimally invasive surgery for intracerebral hemorrhage: a secondary analysis of MISTIE-III

Objective: Outcome after surgical hematoma evacuation for intracerebral hemorrhage (ICH) depends on hematoma location. As corticospinal tract (CST) integrity affects motor recovery after stroke, we hypothesized that CST integrity drives heterogeneity in surgical outcomes and investigated this in a secondary analysis of MISTIE-III participants. Methods: Risk of CST injury was categorized into four levels, based on the interaction between the CST, the hematoma, and perihematomal edema (PHE) on automatically segmented stability CT: no risk, PHE infiltration, hematoma infiltration, and complete interruption of the CST. Associations with outcome were tested using multivariable linear regression for motor National Institutes of Health Stroke Scale (NIHSS) at day 180 and ordinal regression for modified Rankin Scale (mRS) at day 365, introducing an interaction term between CST risk and treatment group. Results: Day 180 motor NIHSS was significantly lower for 'no risk' ({beta}:-3.77, [95% confidence interval [CI]: -5.8 to -1.70], p=0.0003) and 'PHE infiltration' ({beta}:-2.3, [95%CI: -3.5 to -1.1]; p=0.0002) vs. 'complete interruption'. Surgery was associated with lower Day 180 motor NIHSS in participants with hematoma infiltration ({beta}:-2.07, [95%CI: -3.8 to -0.4], p=0.016). Compared to complete interruption, 'no risk' (adjusted odds ratio [aOR]:0.27, [95%CI: 0.10 to 0.74], p=0.01) and 'PHE infiltration' (aOR:0.41, [95%CI: 0.23 to 0.74]; p=0.003) were associated with lower odds of unfavorable day 365 mRS. Surgery was associated with lower mRS in participants with no risk (aOR:0.23, [95%CI: 0.05 to 0.97, p=0.045). Interpretation: Increasing CST risk is associated with worse motor recovery (day 180) and disability (day 365). CST risk modifies the effect of the MISTIE-III procedure on motor recovery and disability.

16.
arXiv (quant-ph) 2026-06-19

Sparse Configuration Interaction for the Electronic Schrödinger Equation Revisited: Complete Basis Set Limit Complexity and Quantum-Encoding Impact

arXiv:2606.20385v1 Announce Type: new Abstract: In this article we revisit regularity results for eigenfunctions in the discrete spectrum of the electronic Schrödinger equation and study their consequences for approximation complexity. In particular, for the convergence to the complete basis set limit, it can be shown that the curse of dimensionality in the leading algebraic exponent can be mitigated. That is, for general sparse grid constructions, the main term of the convergence rate with respect to the number of degrees of freedom is independent of the number of electrons. These insights indicate potential benefits for classical numerical solvers of the electronic Schrödinger equation and also for quantum-computing approaches through new qubit-efficient wavefunction encodings.

17.
arXiv (CS.AI) 2026-06-19

Library-Aware Doubles and Iterative Repair for Large Language Model-Generated Unit Tests in OpenSIL Firmware

arXiv:2606.19725v1 Announce Type: cross Abstract: Validating changes in low-level C firmware is expensive because unit tests (UTs) are fragile under strict build constraints, where missing headers, unresolved symbols, and dependency mismatches frequently prevent compilation and linking. This study introduces an automated UT authoring workflow for the Open-Source Silicon Initialization Library (openSIL) firmware codebase maintained by Advanced Micro Devices (AMD) that reduces manual effort through a large language model (LLM) guided multi-agent pipeline. The workflow combines automated generation of test scaffolds, library-aware creation or reuse of stubs, mocks, and fakes, and an iterative compile-dispatch repair loop driven by build logs and line-coverage feedback. We evaluate the approach using compilation success, repair iterations, dispatch success, and line coverage, with time, cost, and token usage as secondary measures. Across 76 functions under test, the workflow generated compilable UTs for 73 functions. In a configuration without line coverage guidance or retrieval augmentation, mean line coverage reached 73.9%. On a 48-function subset evaluated under both configurations, mean line coverage reached 98.8% with line-coverage guidance alone and reached 94.7% when combined with vector-database retrieval. Results show that automated generation-and-repair pipelines can substantially improve UT creation efficiency and coverage for constrained firmware environments while reducing manual debugging effort.

18.
arXiv (CS.CV) 2026-06-19

CalTennis: Large Multi-View Tennis Video Dataset and Benchmark of Monocular-to-3D Pose Estimation

The Caltech Tennis Dataset (CalTennis) is a large-scale video benchmark for evaluating monocular-to-3D pose estimation in the wild. CalTennis comprises over 11 million frames (51 hours) of tennis practice and match play from 40 players, captured with 2-6 synchronized cameras at 60 Hz. It is 10 times larger than existing in-the-wild human motion video datasets and 3 times larger than existing MOCAP-ground-truthed datasets, and it is the first large-scale benchmark to provide synchronized multi-view recordings of expert athletic motion. The multi-view setup enables inexpensive, label-free evaluation of monocular-to-3D pose estimation algorithms. We describe a simple, standardized protocol that enables data collection without specialized equipment or expertise, along with fully automated video calibration and synchronization. Benchmarking state-of-the-art monocular-to-3D pose methods on CalTennis, we find that while 3D joint angle recovery is now quite accurate, all models struggle to estimate depth and foot contact consistently. We further propose two novel performance metrics, footwork and stability, as well as qualitatively study body shape inconsistency. These metrics expose previously underexplored failure modes and point to concrete opportunities for improvement in pose estimation and action analysis.

19.
arXiv (CS.AI) 2026-06-16

Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

arXiv:2605.30208v2 Announce Type: replace-cross Abstract: AI-assisted coding tools have altered software production. At Meta, significant lines of code per human-landed diff grew by 105.9% year over year and per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth. Meanwhile, the share of diffs receiving timely review has declined, exposing a widening gap between code supply and reviewer bandwidth. We ask three questions that progress from feasibility through calibration to impact: (1) can risk-stratified automation operate at scale across diverse organizations, (2) how does tuning the risk threshold affect the trade-off between automation yield and safety, and (3) to what extent does automated review reduce end-to-end latency for AI-generated changes? We deployed RADAR (Risk Aware Diff Auto Review), a multi-stage funnel that classifies each diff by authorship and source type, applies eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based Automated Code Review, and deterministic validation before landing qualifying changes. We evaluate RADAR through telemetry covering 535K+ RADAR-reviewed diffs, observational before-after comparisons for policy changes, and difference-in-differences analysis of efficiency outcomes. RADAR has reviewed 535K+ diffs and landed 331K+. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The revert rate for RADAR-reviewed diffs is 1/3 that of non-RADAR diffs, and the Production Incident rate is 1/50 that of non-RADAR diffs. RADAR reduces median time to close by over 330% and median diff review wall time by 35%. Risk-aware layered automation can materially reduce review bottlenecks created by AI-driven code growth without compromising production safety.

20.
bioRxiv (Bioinfo) 2026-06-16

Expanding gene regulatory networks from transcriptome data through graphical modeling with heterogeneous priors

Gene regulatory network inference is widely used to reconstruct large-scale networks and identify functional genes from transcriptome data. Meanwhile, in many biological fields, core regulatory genes have been extensively studied, leading to the establishment of small-scale gene regulatory networks, and novel genes connected to these networks remain to be identified. However, methods for expanding existing gene networks by identifying novel regulatory interactions, rather than reconstructing the entire network, are not well established. Here, we propose a method for gene network expansion that incorporates known regulatory relationships and evaluates each candidate gene individually to infer its regulatory connections to the existing network. Using simulated datasets from the DREAM4 benchmark and the PRECISE-1K experimental dataset, our method outperformed conventional methods by incorporating prior knowledge. In particular, it improved the ability to distinguish true regulatory interactions from indirect associations arising from strong correlations among genes in the existing network. The method also showed strong performance for interactions involving genes with high outdegree or centrality. Furthermore, it maintained stable performance as the size of the existing network increased and was robust to noise in prior information. These results demonstrate that our method provides an effective framework for expanding existing gene regulatory networks by leveraging prior knowledge.

21.
arXiv (CS.CV) 2026-06-19

SAFE-Cascade: Cost-Adaptive Vision-Language Routing for Chart Question Answering

Vision-language models (VLMs) are powerful for chart question answering, but invoking a VLM for every query can be unnecessarily expensive when many questions are answerable from OCR text and lightweight language reasoning. We demonstrate SAFE-Cascade, an interactive system for cost-adaptive chart question answering. Given a chart image and a natural-language question, SAFE-Cascade first extracts chart text with OCR, obtains a provisional answer from a text-only language model, and then uses a learned router to decide whether to accept the text answer or escalate to a VLM. The demo exposes this decision process to users: OCR evidence, text-only answer, routing probability, escalation decision, final answer, estimated cost, and estimated latency are shown side by side. SAFE-Cascade is designed as a transparent interface for understanding when visual grounding is actually needed. Users can upload or select charts, ask questions, inspect the evidence used by each pathway, compare text-only and VLM answers, and adjust the escalation threshold to explore the accuracy-cost frontier. The system is implemented with Azure Document Intelligence for OCR, gpt-5-mini as the text-only model, gemini-2.5-flash-image as the VLM, and a Random Forest router trained on inference-time features. On a held-out ChartQA test split of 375 examples from a 2,500-example experiment, SAFE-Cascade achieves 69.1% unified accuracy with 73.1% VLM invocation, compared with 67.7% accuracy and 100% VLM invocation for the full-VLM baseline. The observed +1.4 percentage-point difference is statistically uncertain, so we interpret SAFE-Cascade as matching full-VLM performance while reducing VLM calls by 26.9% and estimated cost by 9.3%. The demonstration shows how selective modality routing can make multimodal knowledge systems more transparent, tunable, and cost-aware.

22.
arXiv (CS.AI) 2026-06-11

Offline Diffusion Policy for Multi-User Delay-Constrained Scheduling

arXiv:2501.12942v2 Announce Type: replace Abstract: Effective multi-user delay-constrained scheduling is crucial in various real-world applications, including embodied AI, instant messaging, live streaming, and data center management, where efficient resource allocation is required among users with diverse delay sensitivities. In these scenarios, schedulers must make real-time decisions to satisfy both delay and resource constraints without prior knowledge of system dynamics, which are often time-varying and challenging to estimate. {Current learning-based methods typically require online interactions with actual systems during the training stage. Therefore, these approaches are often difficult or impractical, as they can significantly degrade system performance and incur substantial service costs.} To address these challenges, we propose a novel offline reinforcement learning-based algorithm, named \underline{S}cheduling By \underline{O}ffline Learning with \underline{C}ritic Guidance and \underline{D}iffusion Model (SOCD), to learn efficient scheduling policies purely from pre-collected offline data. SOCD innovatively employs a diffusion policy, complemented by a sampling-free critic network for policy guidance. By integrating the Lagrangian multiplier optimization into the offline reinforcement learning, SOCD efficiently trains high-quality constraint-aware policies exclusively from available datasets, eliminating the need for online interactions with the system. Experimental results demonstrate that SOCD is resilient to various system dynamics, including partially observable and large-scale environments, and delivers superior performance compared to existing methods.

23.
arXiv (CS.CL) 2026-06-16

LoLA: Low-Rank Linear Attention With Sparse Caching

The per-token cost of transformer inference scales with context length, preventing its application to lifelong in-context learning. Linear attention is an efficient alternative that maintains a constant memory footprint, even on infinite context lengths. While this is a potential candidate for lifelong learning, it falls short in memory capacity. In this paper, we propose LoLA, a training-free augmentation to linear attention that boosts associative recall. LoLA distributes past key-value pairs from context into three memory systems: (i) recent pairs in a local sliding window cache; (ii) difficult-to-memorize pairs in a sparse, global cache; and (iii) generic pairs in the recurrent hidden state of linear attention. We show through ablations that our self-recall error metric is crucial to efficiently manage long-term associative memories. On pass-key retrieval tasks, LoLA improves the base model's performance from 0.6% to 97.4% accuracy. This is achieved with a 4.6x smaller cache than Llama-3.1 8B on 4K context length. LoLA also outperforms other 1B and 8B parameter subquadratic models on zero-shot commonsense reasoning tasks.

24.
arXiv (CS.CL) 2026-06-11

Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.

25.
arXiv (math.PR) 2026-06-11

Hilbert space embeddings of independence tests and interaction measures of several variables

arXiv:2411.08653v2 Announce Type: replace-cross Abstract: We present a unified theoretical framework for kernel-based measures of dependence on product spaces. Building on the ideas underlying distance covariance, distance multivariance, and the Hilbert-Schmidt Independence Criterion (HSIC), we define a new family of kernels on an $n$-fold Cartesian product, termed positive definite independent of order $k$ (PDI$_{k}$ kernels). These kernels extend the concepts of positive definite and conditionally negative definite kernels to higher orders and provide the foundation for generalized independence and interaction tests, such as the generalized Lancaster interaction of order $k$ ($\Lambda_{k}^{n}$), and the Streitberg interaction ($\Sigma$). Our analysis focuses on the continuous setting, where we prove a Kernel Mean Embedding Theorem for PDI$_{k}$ kernels and establish the corresponding integrability restrictions. Based on these results, we characterize how the Kronecker products of PDI kernels behave.