Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-12

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

arXiv:2606.13501v1 Announce Type: cross Abstract: Diffusion Transformers (DiTs) have become the dominant architecture for image and video generation, creating growing demand for efficient DiT serving. Existing systems assign each request a fixed parallel configuration throughout its lifetime. However, DiT workloads exhibit substantial heterogeneity across requests, execution stages, and system conditions, making static parallelism inefficient and often leading to poor GPU utilization and degraded service quality. This paper argues that DiT serving should treat GPU parallelism as a first-class schedulable resource. We present GF-DiT, a policy-programmable runtime for elastic DiT serving that dynamically adapts the parallelism of running requests according to workload demands and service objectives. GF-DiT introduces an asynchronous execution abstraction that decomposes requests into independently schedulable trajectory tasks and enables online GPU reallocation. To make elastic parallelism practical, GF-DiT further proposes group-free collectives, a lightweight communication abstraction that supports low-overhead online formation and reconfiguration of arbitrary execution groups. We implement GF-DiT in vLLM-Omni and evaluate it on representative image and video diffusion workloads. Compared with fixed-pipeline execution with static parallelism, GF-DiT improves throughput by up to 6.01$\times$, reduces mean latency by up to 95%, lowers SLO violation rates by up to 90%, and reduces communication-group setup overhead from 778 ms to approximately 60 $\mu$s.

02.
arXiv (math.PR) 2026-06-16

Cluster sizes in subcritical soft Boolean models

arXiv:2404.13730v2 Announce Type: replace Abstract: We consider the soft Boolean model, a model that interpolates between the Boolean model and long-range percolation, where vertices are given via a stationary Poisson point process. Each vertex carries an independent Pareto-distributed radius and each pair of vertices is assigned another independent Pareto weight with a potentially different tail exponent. Two vertices are now connected if they are within distance of the larger radius multiplied by the edge weight. We determine the tail behaviour of the Euclidean diameter and the number of points of a typical maximally connected component in a subcritical percolation phase. For this, we present a sharp criterion in terms of the tail exponents of the edge-weight and radius distributions that distinguish a regime where the tail behaviour is controlled only by the edge exponent from a regime in which both exponents are relevant. Our proofs rely on fine path-counting arguments identifying the precise order of decay of the probability that far-away vertices are connected.

03.
arXiv (CS.CV) 2026-06-11

Understanding Cross-Sensor Feature Variations for Generalizable 3D Perception

Radar-camera BEV perception often suffers from degraded performance when evaluated across datasets, as changes in driving scenes, sensor configurations, and environmental conditions can alter both the input observations and the internal fused representations. This work studies this issue from the perspective of source-domain variation modeling, aiming to improve the robustness of BEV-based 3D detectors without relying on target-domain samples. We introduce a framework that characterizes visual scene variations in the frequency domain and uses them to synthesize diverse source-domain views. By comparing the resulting fused BEV representations, the framework further captures how image-level variations influence multi-modal BEV features. These variation patterns are then used to regularize the detector, encouraging the learned fusion space to remain stable under latent scene changes. The proposed method is applied only during training and leaves the inference pipeline unchanged. Experiments on cross-dataset radar-camera 3D detection between View-of-Delft and TJ4DRadSet demonstrate consistent improvements over multiple BEV fusion backbones, and the gains remain effective when a small amount of target-domain data is available.

05.
arXiv (CS.AI) 2026-06-16

Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models

arXiv:2606.15436v1 Announce Type: cross Abstract: Respiratory acoustic foundation models (FMs) excel at cough classification, yet their ability to predict continuous health quantities from cough audio remains largely unexplored, despite the clinical value of passive age, BMI, and disease probability estimation in settings where physical measurements are unavailable. We introduce the multi-model, multi-target cough regression benchmark evaluating five FMs (OPERA-CT, OPERA-CE, OPERA-GT, HeAR, M2D+Resp) across six targets on three datasets under subject-disjoint protocols, comparing linear, MLP-small, and full MLP regression heads. MLP-small beats the mean-predictor baseline on all tasks and linear probing in 23 of 30 model x task cases, with full MLP overfitting on small clinical data but recovering on larger sets, revealing a dataset size x head-capacity trade-off. HeAR leads within-dataset age regression on Coswara (9.12 yr MAE); its CIDRZ result is excluded from headline claims owing to possible HeAR-CIDRZ pretraining overlap. OPERA-GT is favored over OPERA-CT on age in all three datasets, with the CIDRZ margin within seed variance, extending a generative-pretraining advantage from breath to cough. HeAR and M2D+Resp reach near-full performance at N = 50 samples while OPERA models require N = 400. Cross-dataset transfer is strongly asymmetric as large diverse data generalises to small clinical populations (CoughVID to CIDRZ: -0.17 yr) but not vice versa (CIDRZ to Coswara: +2.43 yr, +26.6%).

06.
arXiv (CS.LG) 2026-06-17

Dropout Neural Network Training Viewed from a Percolation Perspective

arXiv:2512.13853v2 Announce Type: replace Abstract: In this work, we investigate the existence and effect of percolation in training deep Neural Networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, first introduced by G. Hinton et al. (2012). These methods temporarily remove connections in the NN, randomly at each stage of training, and update the remaining subnetwork with Stochastic Gradient Descent (SGD). The process of removing connections from a network at random is similar to percolation, a paradigm model of statistical physics. If dropout were to remove enough connections such that there is no path between the input and output of the NN, then the NN could not make predictions informed by the data. We study new percolation models that mimic dropout in NNs and characterise the relationship between network topology and this path problem. The theory shows the existence of a percolative effect in dropout. We also show that this percolative effect can cause a breakdown when training NNs without biases with dropout; and we argue heuristically that this breakdown extends to NNs with biases.

07.
arXiv (CS.CV) 2026-06-16

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differences, while reasoning-based generative rewards provide stronger judgments but are costly to deploy and difficult to use as direct optimization signals. We propose Z-Reward, a teacher-student reward modeling framework that decouples reasoning-heavy judgment from efficient reward deployment. The teacher is a large VLM that uses reasoning to infer rubric-aligned score distributions, and is trained with Group-wise Direct Score Optimization (GDSO), which combines policy-gradient rewards from distribution expectations with direct pointwise and pairwise supervision on score distributions and score gaps. The student is trained with Reasoning-Internalized Score Distillation (RISD), which transfers the teacher's reasoning-conditioned score distribution into a compact VLM without requiring explicit reasoning chains at inference time. On our internally annotated evaluation set, the 27B GDSO teacher reaches 89.6% human preference accuracy, outperforming SFT, RewardDance, and GRPO, while the 9B RISD student reaches 88.6%, outperforming the OPD baseline and closely matching the larger teacher. We further show that Z-Reward can serve as a differentiable reward signal for text-to-image optimization, yielding a 41.3% net human-preference improvement over the SFT baseline.

08.
bioRxiv (Bioinfo) 2026-06-16

DMcloud: Macromolecular Structure Modeling Using Local Structure Fitting for Medium to Low Resolution cryo-EM maps

Cryogenic electron microscopy (cryo-EM) has become an essential experimental approach in structural biology for determining macromolecular structures. When the resolution of a cryo-EM map is worse than approximately 5[A], fitting known or predicted molecular models into the map becomes a common strategy for interpretation. However, accurately fitting biomolecular models into cryo-EM maps, particularly for large macromolecular complexes, remains challenging when the input structure models contain errors or are in a conformation different from that represented in the map. Here, we present DMcloud, a method for local structure fitting of proteins and nucleic acids in cryo-EM maps. Instead of forcing an entire input model into the map, DMcloud divides input structures into local regions, identifies regions that are supported by the density, removes unsupported regions, and assembles the retained regions into a final model. We benchmarked DMcloud on 176 cryo-EM maps, including intermediate and high-resolution maps that include proteins, DNAs, or RNAs. For EM maps in the 5.0-10.0 [A] and 2.5-5.0 [A] resolution ranges, DMcloud achieved average sequence modeling coverage of 0.49 and 0.70, respectively. For DNA/RNA maps, DMcloud achieved an average sequence coverage of 0.75. Across all datasets, DMcloud consistently outperformed existing methods in model accuracy, map-model correlation, and modeling coverage.

09.
arXiv (CS.AI) 2026-06-16

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

arXiv:2605.26595v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regularization, or online monitoring can neutralize. In this paper, we propose a data poisoning method that teaches an LLM an information hiding scheme reliably and stealthily through semantic associations between shared knowledge such as facts or concepts and attacker-chosen phrases. The induced hiding scheme can encode and decode arbitrary malicious instructions, thus revealing a new and subtle poisoning-induced vulnerability: covert control attacks. We precisely characterize covert control attacks and evaluate them across $5$ LLMs, $3$ backdoor defenses, and $4$ prompt injection defenses. With a small poisoned fraction, covert control attacks outperform heuristic-based prompt injection attacks in average attack success rate by about $40\%$ relative to clean fine-tuned models. They also circumvent defenses based on detection and fine-tuning, maintaining up to $93\%$ attack success rate after backdoor defenses and up to $98\%$ after prompt injection defenses.

10.
arXiv (CS.CV) 2026-06-11

MFEN:Multi-Frequency Expert Network for Visible-Infrared Person Re-ID

Visible-infrared person re-identification (VI-ReID) is challenging due to the large modality discrepancy between visible and infrared images. We contend that this discrepancy is largely related to differing lighting conditions, including differences in light wavelength and light source type. Recently, frequency-based VI-ReID approaches have achieved notable success because frequency information can better extract identity-relevant contours and details while excluding irrelevant lighting and color. However, existing methods either do not distinguish different frequency bands or focus on only one band, which is insufficient under diverse lighting conditions. To perform comprehensive frequency domain learning, we propose a Multi-Frequency Expert Network (MFEN) that enables multi-frequency modulation and adaptively combines different bands through a mixture-of-experts design. We further introduce Random Frequency Augmentation (RFA) and Frequency Auxiliary Optimization (FAO) to better train MFEN. The three modules are complementary and jointly capture critical frequency-domain details for robust representation learning. Extensive experiments on three VI-ReID datasets demonstrate the effectiveness of our approach.

11.
arXiv (CS.CV) 2026-06-16

Vision-Language Models as Zero-Annotation Oracles in Histopathology

Foreground segmentation is the critical first step of every computational pathology pipeline, yet existing methods rely on hand-tuned heuristics or supervised models that overfit to narrow stain and scanner distributions, failing silently on specialised stains such as Jones silver or Elastica van Gieson. We propose a coarse-to-fine approach that recasts foreground segmentation as a visual perception task and leverages general-purpose vision-language models (VLMs) as zero-annotation oracles. Our key insight is that tissue-versus-background discrimination is a natural-image recognition problem, not a histopathological one, so VLMs trained on internet-scale corpora generalise where domain-specific models cannot. We introduce Leica-75, a benchmark of 75 renal transplant whole-slide images spanning three stain families. On Leica-75, our method achieves the highest segmentation quality on out-of-distribution stains (Dice 0.858 +/- 0.027 on Jones, 0.853 +/- 0.041 on EVG) with 7x lower cross-stain variance than the best supervised baseline, while remaining competitive on in-distribution H&E. Few-shot prompting with automatically curated exemplars (Auto-context) rescues hard cases on Stress-32 (n=32), a curated stress-test subset (Dice 0.470 to 0.819 for the 2B model). VLM-based annotation review matches human expert consensus (kappa=0.989 for blur detection; mean precision/recall grading accuracy 0.708 vs. human 0.646 for segmentation mask review). The resulting pseudo-labels are used to distil lightweight student models that are as performant as the teacher model while running for a fraction of the cost. Our framework provides a principled, scalable solution to a persistent infrastructure bottleneck in digital pathology.

12.
arXiv (CS.CV) 2026-06-15

ADAPT: An Autonomous Forklift for Construction Site Operation

Efficient material logistics play a critical role in controlling costs and schedules in the construction industry. However, manual material handling remains prone to inefficiencies, delays, and safety risks. Autonomous forklifts offer a promising solution to streamline on-site logistics, reducing reliance on human operators and mitigating labor shortages. This paper presents the development and evaluation of ADAPT (Autonomous Dynamic All-terrain Pallet Transporter), a fully autonomous off-road forklift designed for construction environments. Unlike structured warehouse settings, construction sites pose significant challenges, including dynamic obstacles, unstructured terrain, and varying weather conditions. To address these challenges, our system integrates AI-driven perception techniques with traditional approaches for decision making, planning, and control, enabling reliable operation in complex environments. We validate the system through extensive real-world testing, comparing its continuous performance against an experienced human operator across various weather conditions. Our findings demonstrate that autonomous outdoor forklifts can operate near human-level performance, offering a viable path toward safer and more efficient construction logistics.

13.
arXiv (CS.LG) 2026-06-17

Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?

arXiv:2606.18209v1 Announce Type: new Abstract: Dataset distillation (DD) has emerged as a prominent approach in data centric machine learning, aiming to synthesize compact training sets for efficient training by compressing the information in large datasets into a small number of synthetic samples. However, DD methods are often evaluated under inconsistent evaluation protocols, ranging from standard ERM to single/multi-teacher supervision, making it difficult to isolate the effectiveness of distilled data from evaluation. Moreover, many prior methods claim that DD outperforms data pruning approaches such as coreset selection (CS), based on the assumption that restricting condensed datasets to subsets of real samples fundamentally limits their expressiveness. In this work, we critically evaluate DD methods through large-scale experiments using standardized datasets and evaluation protocols to assess their intrinsic effectiveness. We benchmark seven state-of-the-art (SOTA) DD methods on ImageNet-1K, ImageNet100, and ImageNette, using three widely adopted training protocols against three CS strategies. Our results show that while some DD methods fail to outperform even simple random subsets, the SOTA DD approaches are comparable to or worse than coresets on large-scale datasets and incur a substantially higher cost for construction. Beyond accuracy, we also evaluate the representativeness, diversity, and quality of condensed sets, and find that coresets consistently achieve better coverage of the original data distribution. These findings highlight the limited practical advantages of current DD methods and show that coresets remain competitive and are often a more computationally efficient alternative for data-centric learning.

14.
arXiv (CS.AI) 2026-06-17

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

arXiv:2602.10635v3 Announce Type: replace Abstract: Socially intelligent AI systems must reason across diverse human behavioral tasks and generalize to new social contexts. However, behavioral data is inherently heterogeneous, comprising diverse modalities and prediction targets that produce uneven training signals across samples, creating imbalanced learning dynamics that challenge existing AI models. To address this, we develop Omnisapiens-7B 2.0, a foundation model for social behavior processing that explicitly addresses learning from heterogeneous behavioral data. This is enabled through Heterogeneity-Aware Relative Policy Optimization, a new RL method that rebalances learning signals across samples by approximating each sample's contribution to the policy update and using these estimates to drive geometrically centered, inertially smoothed advantage modulation for stable training. Omnisapiens-7B 2.0 achieves the best and most consistent performance across 10 behavioral tasks, while also attaining the best performance on all five held-out benchmarks, with gains of up to +12.02% and +9.37% respectively. Furthermore, it demonstrates more consistent and interpretable reasoning traces, supporting reliable real-world behavioral applications. Our model is available at https://github.com/MIT-MI/human_behavior_atlas.

15.
arXiv (CS.AI) 2026-06-12

Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization

arXiv:2606.13658v1 Announce Type: new Abstract: This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture important dimensions of AI's influence on individual reasoning and collective epistemic practices, System 0 occupies a theoretically distinctive position that neither can fully replicate. The paper introduces the concept of cognitive colonization, according to which AI systems can embed external interests within the architecture of the self in ways that are difficult for users to perceive. Because such systems are already widely deployed, understanding these invisible forms of influence is an urgent philosophical and practical task.

16.
arXiv (CS.AI) 2026-06-17

A T-API-Compliant ReAct Agentic Loop for Optical Networks: Generic vs. Domain-Specific Tool Abstractions

arXiv:2606.18000v1 Announce Type: cross Abstract: Optical networks need intent-driven, closed-loop agentic management, a key enabler for higher autonomy levels. We present the first T-API-compliant reasoning and act (ReAct) loop. We show that domain-specific composite tools achieve 90% oracle-validated correctness with threefold token savings compared to generic tools.

17.
arXiv (quant-ph) 2026-06-11

Holographic Complexity, Extremality, and Cosmic Censorship

arXiv:2604.20170v2 Announce Type: replace-cross Abstract: We propose a holographic complexity origin for the third law of black-hole mechanics and weak cosmic censorship. In both complexity equals action and complexity equals volume prescriptions, the relative complexity between subextremal and extremal AdS black holes diverges logarithmically. For overcharged RN-AdS, explicit calculations in both prescriptions show that the near-singularity action terms are power-law divergent or finite, while the maximal-volume contribution is finite. Thus, the extremal-to-naked relative complexity also diverges, obstructing finite-time transitions.

18.
arXiv (CS.AI) 2026-06-11

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

arXiv:2606.11543v1 Announce Type: new Abstract: Agent Skills augment large language model (LLM) agents with procedural knowledge at inference time, but current benchmarks rarely distinguish what a Skill says from how it is organized. We study this distinction through Progressive Disclosure, where a concise root file points agents to supporting resources on demand, and compare it with a normalized flat baseline. We present SkillJuror, a framework for evaluating Skill writing paradigms through semantically controlled variants, matched multi-trial evaluations, and trajectory evidence while holding task knowledge fixed. In an 82-task SkillsBench study, Progressive Disclosure changes runtime behavior before aggregate outcomes: distinct Skill resources touched per trajectory rise from 1.18 to 3.85, and effective uptake events rise from 1.33 to 3.92. It also yields 17 additional verifier-passing trials out of 410 matched trials (+4.1%) over the normalized flat baseline. The benefit is task-dependent. Progressive Disclosure helps when supporting resources guide implementation, checking, or repair, but is weaker when success hinges on exact output conventions, numerical thresholds, or long artifact-generation pipelines. These results show that Skill organization is not mere presentation: it can change how agents search and apply procedural knowledge, while outcome gains depend on whether the exposed resources are actionable for the task. Code is available at https://github.com/zhiyuchen-ai/skill-juror.

19.
arXiv (CS.AI) 2026-06-17

Learning to Decide with AI Assistance under Human-Alignment

arXiv:2605.12646v2 Announce Type: replace-cross Abstract: It is widely agreed that when AI models assist decision-makers in high-stakes domains by predicting an outcome of interest, they should communicate the confidence of their predictions. However, empirical evidence suggests that decision-makers often struggle to determine when to trust a prediction based solely on this communicated confidence. In this context, recent theoretical and empirical work suggests a positive correlation between the utility of AI-assisted decision-making and the degree of alignment between the AI confidence and the decision-makers' confidence in their own predictions. Crucially, these findings do not yet elucidate the extent to which this alignment influences the complexity of learning to make optimal decisions through repeated interactions. In this paper, we address this question in the canonical case of binary predictions and binary decisions. We first show that this problem is equivalent to a two-armed online contextual learning problem with full feedback, and establish a lower bound of $\Omega (\sqrt{|H| \cdot |B| \cdot T} )$ on the expected regret any learner can attain, where $H$ and $B$ denote the sets of human and AI confidence values. We then demonstrate that, under perfect alignment between AI and human confidence, a learner can attain an expected regret of $O(\sqrt{|H| \cdot T\log T})$ and, when $\sqrt{|H|} = O(\log T)$ and $B$ is countable, a non-trivial generalization of the Dvoretzky-Kiefer-Wolfowitz inequality improves the regret bound to $O(\sqrt{T\log T})$. Taken together, these results reveal that alignment can reduce the complexity of learning to make decisions with AI assistance. Experiments on real data from two different human-subject studies where participants solve simple decision-making tasks assisted by AI models show that our theoretical results are robust to violations of perfect alignment.

20.
arXiv (CS.AI) 2026-06-16

Green AI Carbon Optimizer: Carbon-Efficient Training Location Recommendation and Global AI Energy Demand Forecasting

arXiv:2606.14707v1 Announce Type: cross Abstract: AI training and deployment consume substantial electricity, but carbon outcomes remain weakly integrated into routine model development decisions. This paper presents Green AI Carbon Optimizer with two primary contributions: (i) a carbon aware cloud region recommendation method for training workloads, and (ii) a power law forecasting pipeline for global AI energy demand. For location recommendation, we combine regional grid carbon intensity, renewable share, and data center Power Usage Effectiveness (PUE) into a unified scoring model across 100+ regions from major cloud providers. For a reference workload (8*A100, 100h), estimated emissions in our sampled regions range from 7.74kg to 272.00kg CO2. Selecting the best region instead of the worst corresponds to a 97.2% reduction relative to the worst case. Ablation shows that ranking by renewable share alone can select regions with higher CO2 emissions than rankings that include grid carbon intensity. For forecasting, we fit a power law relation between parameter count and training energy using 26 anchor models. We combine this fit with scenario assumptions on model growth, hardware efficiency, and training frequency, and evaluate sensitivity to inference ratio and ecosystem scaling. Across scenarios, projected 2030 demand ranges from 7TWh to 1,436TWh under the stated assumptions, highlighting the importance of deployment choices, model scaling discipline, and transparent energy reporting.

21.
bioRxiv (Bioinfo) 2026-06-18

Calculation of sequence space coverage in a mutagenesis library

Directed evolution requires screening of large mutagenesis libraries, but accurate calculation of library sizes needed to discover functional variants remains challenging. Existing models provide baseline estimates, yet current computational approaches for finding the best variants scale poorly with library complexity. Here, we introduce a scalable algorithmic framework to compute exact discovery probabilities in saturation mutagenesis libraries with no requirement for explicit sequence enumeration. By aggregating variants into a composition log–sum distribution and applying log-space convolution across randomisation blocks, it is possible to extend this to massive sequence spaces and mixed codon schemes. By inverting these calculations, absolute mathematical ceilings for experimental design are established. Ultimately, this framework provides a rapid, quantitative tool to balance the statistical coverage-diversity trade-off within the limitations of laboratory screening. Finally, this is implemented as an open-source web application (SSCC) that allows researchers to construct heterogeneous library designs and compute required sampling depths, coverage probabilities, and absolute randomisation limits.

22.
arXiv (quant-ph) 2026-06-16

Intrinsic preservation of plasticity in continual quantum learning

arXiv:2511.17228v2 Announce Type: replace Abstract: Artificial intelligence in dynamic, real-world environments requires the capacity for continual learning. However, standard deep learning suffers from a fundamental issue: loss of plasticity, in which networks gradually lose their ability to learn from new data. Here we show that quantum learning models naturally overcome this limitation, preserving plasticity over long timescales. We demonstrate this advantage systematically across a broad spectrum of tasks from multiple learning paradigms, including supervised learning and reinforcement learning, and diverse data modalities, from classical high-dimensional images to quantum-native datasets. Although classical models exhibit performance degradation correlated with unbounded weight and gradient growth, quantum neural networks maintain consistent learning capabilities regardless of the data or task. We identify the origin of the advantage as the intrinsic physical constraints of quantum models. Unlike classical networks where unbounded weight growth leads to landscape ruggedness or saturation, the unitary constraints confine the optimization to a compact manifold. Our results suggest that the utility of quantum computing in machine learning extends beyond potential speedups, offering a robust pathway for building adaptive artificial intelligence and lifelong learners.

23.
arXiv (CS.CL) 2026-06-15

Sentinel: Decoding Context Utilization via Attention Probing for Efficient LLM Context Compression

Retrieval-augmented generation (RAG) often suffers from long and noisy retrieved contexts. Existing context compression methods typically rely on heuristic relevance estimation or supervised compression models rather than on how LLMs utilize retrieved context during inference. We propose Sentinel, a lightweight sentence-level compression framework that decodes inference-time contextual utilization behaviors from head-wise attention patterns of frozen LLMs. To ground supervision in retrieval-dependent answering behavior, Sentinel trains a lightweight probe using QA examples where the model succeeds only when retrieved context is available. Sentinel performs compression using only a single non-autoregressive forward pass without dedicated compression training or autoregressive scoring. Empirically, we find that effective contextual utilization signals remain accessible even in compact proxy models. On LongBench, Sentinel with a 0.5B proxy model achieves up to 5$\times$ compression while attaining question-answering performance competitive with compression methods built on 7B-scale models. Despite being trained only on English QA data, Sentinel also generalizes effectively to Chinese and out-of-domain settings.

24.
arXiv (CS.CV) 2026-06-15

Context-Guided Semantic Alignment for Feature Fusion Networks

Feature fusion networks are fundamental components in modern object detectors, aggregating multi-scale features to detect objects of varying sizes. However, directly fusing features from different pyramid levels often introduces semantic inconsistency due to their heterogeneous representations. In this paper, we propose Feature Interaction NEtwork (FINE), a lightweight semantic alignment module that refines low-level features via high-level contextual guidance using cross-level attention prior to fusion. To bridge the structural gap and ensure computational efficiency, we introduce an Alignment-Aware Token Sampling that aligns corresponding spatial regions across scales, reducing the attention complexity by an order of magnitude. The resulting attention weights generate a spatial-channel modulation map that is upsampled and applied to the low-level features via residual element-wise modulation. This mechanism ensures that the network selectively enhances semantically relevant pixels while preserving the sub-pixel localization accuracy necessary for dense prediction tasks. FINE is generally applicable to various detectors and consistently improves detection accuracy without compromising efficiency.

25.
arXiv (CS.CV) 2026-06-16

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

作者:

Approximate nearest-neighbour search underpins large-scale retrieval and retrieval-augmented generation, yet its methods are studied in communities that seldom read one another. We argue that they form one field with three design choices. We develop the projection-quantisation-organisation lens: every method places its projections, places its quantisation thresholds, and organises the resulting codes for search. We test the lens with a reproducible measurement, released as the open BitBudget benchmark, and report three findings. First, the quantisation axis delivers the largest memory savings: a one-bit code with full-precision re-ranking matches uncompressed quality for six of seven embedders, the scanned code one thirty-second of the float's size. Second, the orderings the lens anticipates, including a learned-embedding regime where binary codes overtake an inverted-file product quantiser at a matched byte budget, recur as the embedding is enlarged. Third, given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float it replaces. We also recast the semantic identifiers of generative retrieval as quantisation codes. The main contribution is a single, tested account of compact-code search, from random projections to the retrieval-augmented era.