Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (math.PR) 2026-06-15

Stability of Synthetic Ricci Curvature Lower Bounds for Inverse Limit Extended Metric Measure Spaces

arXiv:2606.14322v1 Announce Type: cross Abstract: We show that every Polish extended metric measure space arises as an inverse limit of metric measure spaces up to isomorphism. We then prove that synthetic Ricci curvature lower bounds and several functional inequalities, including the log-Sobolev, Talagrand, Poincaré, and dimension-free Harnack inequalities are stable under inverse limit. We discuss applications to infinite-dimensional spaces, including abstract Wiener spaces and their quotient spaces.

02.
arXiv (CS.LG) 2026-06-17

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

arXiv:2502.18049v5 Announce Type: replace-cross Abstract: Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation, generalized linear models, and nonparametric estimation. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and model performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.

03.
arXiv (CS.AI) 2026-06-12

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

arXiv:2606.12629v1 Announce Type: cross Abstract: We show that the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs and confidence via their magnitudes, functioning as independent binary registers. We validate this Bag of Dims framework across three model families (Qwen 3.5-4B, Gemma 3-4B, Mistral 7B) through four progressive experiments. Sign patterns alone carry predictive content: replacing all magnitudes with unity achieves 72-93% top-5 next-token accuracy through the LM head, and pure Hamming scoring without any decoder reaches 80-90% top-4096. These sign patterns organize into semantic features: using a single-token type cache (one forward pass per vocabulary token, no context), we discover 175 categories via per-dimension sign consistency (mean AUC 0.80) from 50 anchors with zero training. A trained probe adds only +0.018 AUC and converges to axis-aligned weights, confirming negligible cross-dimension structure. This structure extends to attention: all 175 categories remain discoverable in K and V projections. On the write side, static FFN weight inspection links 20% of features to individual writer neurons (>0.70 agreement; random controls: 0%), with top-200 neuron coalitions achieving >0.70 agreement on 99.9% of prototypes via majority vote. Fully unsupervised discovery (random seeds, no labels) scales to 1500 features at 100% yield and 99% sparsity across all three models, with pairwise MI of 0.0014 bits confirming low inter-dimension coupling. These results establish that the standard basis already suffices for feature reading throughout the transformer compute pathway, requiring no training, no optimization, and no GPU-days beyond a single forward pass per vocabulary token.

04.
arXiv (quant-ph) 2026-06-17

Quantum Chip Paradigm Framework

arXiv:2606.17899v1 Announce Type: new Abstract: Quantum Electronic Design Automation (Q-EDA) is emerging as quantum chips move from laboratory prototypes to scalable engineering systems. This paper argues that superconducting quantum chip design is approaching a "SPICE moment" similar to early classical EDA, where growing qubit scale, control complexity, frequency planning, packaging, process variation, and cryogenic measurement feedback require a shift from experience-based design to model-driven engineering. We propose a Quantum Chip Paradigm Framework that treats Q-EDA not only as software, but as part of the quantum chip development paradigm. Unlike classical HDL-first design, quantum chip design must begin with physical structures such as Josephson junctions, resonators, couplers, readout elements, control lines, and packaging environments. The framework emphasizes PCell-based modeling, SPICE-Q simulation, Quantum PDKs, and design-technology-measurement co-optimization. We further outline a hierarchical Q-EDA system spanning physical structures, qubit PCells, logical qubits, quantum arithmetic, functional quantum IP, and Quantum SoC systems. The key goal is to turn physical models, layout rules, simulation results, fabrication data, and measurement feedback into reusable and auditable engineering objects for large-scale quantum processors and fault-tolerant quantum computing.

05.
arXiv (CS.LG) 2026-06-16

Multi-Scale Separable Fourier Neural Networks for Solving High-Frequency PDEs

arXiv:2605.31027v2 Announce Type: replace Abstract: We propose a novel neural network architecture, termed Multi-Scale Separable Fourier Neural Networks (MS-SFNN), for the accurate and efficient solution of linear and nonlinear high-frequency partial differential equations (PDEs). MS-SFNN exploits a separable representation: given a $d$-dimensional input, it employs $d$ independent subnetworks – each acting on a single coordinate – and constructs basis functions via element-wise multiplication of their outputs. The PDE solution is approximated as a linear combination of these basis functions, with coefficients determined by least squares. Critically, all network weights and biases are randomly initialized once, from a uniform distribution with unit variance, and remain fixed thereafter. To enhance expressivity, a tunable scaling factor is introduced in each subnetwork to modulate the frequency content of the resulting basis functions. Fourier features are explicitly embedded through cosine activations, endowing the method with strong spectral approximation capabilities. To mitigate the memory bottleneck associated with dense collocation in high-frequency or three-dimensional problems, we replace automatic differentiation with analytically derived basis function derivatives and develop a memory-efficient batched QR decomposition algorithm for solving large-scale least-squares systems. Numerical experiments demonstrate that MS-SFNN achieves unprecedented accuracy across a range of challenging PDEs, significantly outperforming state-of-the-art methods such as Physics-Informed Neural Networks (PINN) and Separated-Variable Spectral Neural Networks (SV-SNN).

07.
arXiv (CS.LG) 2026-06-18

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.

08.
arXiv (CS.LG) 2026-06-16

SPICE: Synergy and Partial Information Based Curriculum Evolution

arXiv:2606.16639v1 Announce Type: new Abstract: Multimodal learning exploits complementary information across heterogeneous modalities. The informativeness of each modality can vary widely across samples and training stages. Existing multimodal curriculum learning strategies often assume that the relative complexity of samples remains unchanged throughout training and therefore cannot adapt to model evolution. We propose SPICE (Synergy and Partial Information based Curriculum Evolution), a novel progressive curriculum framework for multimodal interaction learning. Guided by Partial Information Decomposition (PID) theory, our approach decomposes multimodal interactions into redundant, unique, and synergistic information components, enabling an interpretable and dynamic characterization of sample complexity. Building on this decomposition, we design a progressive curriculum that evolves throughout training, allowing the model to transition from learning shared cross-modal cues to modality-specific patterns and, finally, to complex synergistic interactions. Adapting to model evolution, sample ordering is refined in real-time using PID information estimates derived from unimodal and multimodal predictions. Experiments across multiple multimodal benchmarks demonstrate consistent improvements over conventional training and state-of-the-art baselines, highlighting the effectiveness of PID information decomposition and adaptive sample ordering for multimodal curriculum learning.

09.
arXiv (math.PR) 2026-06-16

An Algebraic Matrix Spencer Theorem

arXiv:2606.16005v1 Announce Type: new Abstract: We develop an algebraic approach to matrix discrepancy based on the representation theory of finite-dimensional C$^*$-algebras. As an application, we resolve a substantial structured special case of the Matrix Spencer conjecture. In particular, we show that for every family of contractions $A_1,\ldots,A_n$ that are contained in a finite-dimensional $C^*$-algebra $\mathcal A$ with $dim_{\mathbb C} (\mathcal A) \lesssim n$, there exists signs $x\in\{\pm1\}^n$ such that $\|\sum_{i=1}^n x_i A_i\| \le O(\sqrt n)$. As a noteworthy special case, our main result also resolves the Group Spencer conjecture of (Bandeira'24). We furthermore prove that Matrix Spencer continues to hold for low-rank perturbations of matrix families coming from an $C^*$-algebra of small dimension.

10.
arXiv (CS.CL) 2026-06-16

CHILLGuard: Towards Fine-Grained Chinese LLM Safety Guardrail with Scalable Data Construction and Model-aware Preference Alignment

Malicious content generated from large language models (LLMs) could pose severe safety risks and ethical concerns. While existing LLM safety guardrails excel in English or multilingual settings, they lack adaptation to Chinese-specific regulatory policies, cultural context and linguistic nuances, failing to support fine-grained risk classification for diverse deployment needs. In this paper, we introduce a 5-macro, 31-micro category fine-grained risk taxonomy for Chinese scenarios, and build CHILLGuard: a dedicated Chinese LLM content safety guardrail. To address the critical scarcity of high-quality annotated Chinese safety data, we propose a scalable multi-stage data construction pipeline: we expand multi-source corpus via retrieval-augmented generation, generate implicit harmful samples through prompt engineering rewriting, and refine high-quality data via multi-model voting-based label calibration. Based on this, we build CHILLGuardTrain, a large-scale training set with 405,007 samples, and CHILLGuardTest, a rigorously curated annotated test set with 51,745 samples. We then train CHILLGuard on CHILLGuardTrain under a generator-classifier collaborative framework via Model-aware Direct Preference Optimization. Extensive experiments under multiple settings demonstrate the state-of-the-art performance of CHILLGuard, e.g., a 15.92% improvement of F1 score over Qwen3Guard-8B-Strict on our benchmark. We will release our resources at https://github.com/cswbyu/CHILLGuard.

11.
arXiv (CS.AI) 2026-06-18

Correcting Sensor-Induced Distribution Drift with Wasserstein Adversarial Learning

arXiv:2606.18561v1 Announce Type: cross Abstract: The quality of recorded data depends on the stability of the sensor system that acquires it. Sensor motion and aging can degrade the performance and stability of downstream data-driven methods. We present a Wasserstein-GAN-inspired approach for unsupervised inference of physically interpretable transformation parameters that map a changed detector response distribution back to a nominal reference distribution. In contrast to standard generative modeling, the generator is used as a learnable calibration transformation whose trainable weights represent the sought parameters, while the critic provides a distributional distance signal via the Wasserstein objective. We validate the approach on a tracking-detector toy model with controlled layer shifts and demonstrate its application on high-granularity Geant4-simulated calorimeter data with cell-wise aging effects. The method recovers aging coefficients for individual cells with correlation to ground truth and improves agreement between calibrated and reference energy-sum distributions, while exhibiting the expected degradation at increasing channel-to-channel noise levels. These results indicate that adversarial distribution matching can serve as a data-driven component of calibration strategies in settings where direct labels for degradation parameters are unavailable.

12.
arXiv (CS.CV) 2026-06-16

Momentum-Guided Semantic Forecasting (MoFore) for Self-Supervised Video Representation Learning

Authors:

Self-supervised video representation learning has recently advanced through contrastive learning, masked reconstruction, and predictive representation learning. Reconstruction-based approaches such as MAE and VideoMAE learn representations by recovering masked visual content [he2022mae,tong2022videomae], while contrastive methods such as CLIP learn semantically meaningful embedding spaces through representation alignment [radford2021clip]. In this work, we introduce a Momentum-Guided Semantic Forecasting framework (MoFore) for self-supervised video representation learning. Instead of optimizing for pixel-level reconstruction or task-specific semantic alignment, the proposed method learns temporally predictive video representations by forecasting future latent embeddings from temporally distant context clips. To improve robustness across temporal scales, we further introduce randomized temporal-gap forecasting during training. The framework combines predictive latent forecasting with contrastive regularization to encourage temporal consistency while preventing representation collapse. Experiments on the UCF101 dataset demonstrate that the proposed framework learns temporally consistent and semantically meaningful video representations without using action labels during training. Quantitative analysis shows strong temporal stability and emergent category-level structure in the learned embedding space, while qualitative retrieval experiments reveal motion-aware organization across related activities. Overall, the results suggest that long-range latent forecasting provides an effective and computationally efficient approach for self-supervised video representation learning without relying on reconstruction-based objectives.

13.
arXiv (CS.CV) 2026-06-18

Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation

Multimodal Large Language Models (MLLMs) are increasingly deployed in stateless systems, such as autonomous driving and robotics. This paper investigates a novel threat: Semantic-Aware Hijacking. We explore the feasibility of hijacking multiple stateless decisions simultaneously using a single universal perturbation. We introduce the Semantic-Aware Universal Perturbation (SAUP), which acts as a semantic router, "actively" perceiving input semantics and routing them to distinct, attacker-defined targets. To achieve this, we conduct theoretical and empirical analysis on the geometric properties in the latent space. Guided by these insights, we propose the Semantic-Oriented (SORT) optimization strategy and annotate a new dataset with fine-grained semantics to evaluate performance. Extensive experiments on three representative MLLMs demonstrate the fundamental feasibility of this attack, achieving a 66% attack success rate over five targets using a single frame against Qwen.

14.
arXiv (CS.CL) 2026-06-19

Clusters are All You Need: Pre-Training the Tsetlin Machine with Semantic Clusters from Language Models for Interpretability

Pre-trained language models such as BERT achieve strong text classification performance but lack transparency, limiting their use in high-stakes settings. The Tsetlin Machine (TM) offers fully interpretable, clause-based reasoning but captures little semantic information, and prior attempts to bridge the two rely on static word embeddings that miss contextual meaning. We propose a semantic pre-training framework that transfers knowledge from a pre-trained language model into a TM without using embeddings. Text samples are grouped into semantically coherent clusters with K-means or Top2Vec, and the resulting cluster-sample pairs pre-train a non-negated TM with enhanced Type I feedback. The TM thereby learns interpretable semantic keywords that are fine-tuned on downstream tasks. Across five datasets, our method substantially outperforms vanilla and embedding-based TMs and reaches performance competitive with BERT while remaining interpretable.

15.
arXiv (CS.AI) 2026-06-12

The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review

arXiv:2606.12423v1 Announce Type: cross Abstract: The rapid integration of artificial intelligence (AI) into critical infrastructure including healthcare, finance, energy, and defense, offers transformative benefits but also conflicts with evolving regulatory and governance frameworks. This paper presents a systematic literature review (SLR) to examine the challenges of balancing AI compliance and technological innovation across critical infrastructure sectors. The review follows established SLR guidelines to extract and synthesize insights from peer-reviewed articles, report, and institutional sources published between 2020-2025. The study identifies three interrelated challenges: fragmented regulations, excessive compliance burdens for smaller to medium enterprises (SMEs), and misaligned governance models. To address these challenges, the study highlights practical governance strategies, including risk-tiered regulation, compliance by design, and explainable AI, to support scalable and trustworthy AI deployment in critical sectors. Key contributions include a concise mapping of core AI-governance challenges and a conceptual diagram illustrating their overlap, as well as actionable strategies for policymakers and practitioner to harmonize oversight with innovation.

16.
arXiv (quant-ph) 2026-06-24

Quantum Metric Bound State of Light

arXiv:2606.22479v2 Announce Type: replace-cross Abstract: The spatial confinement of defect-induced bound states is conventionally governed by the effective mass in dispersive bands. More recently, Compact Localized States (CLSs) arising from exact destructive interference have been utilized to achieve confinement in flat bands. However, CLSs rely on pristine lattice symmetries and fine-tuned defect profiles. The introduction of a generic local impurity inevitably breaks these strict phase-matching conditions, resulting in extensive bound states whose fundamental length scale has remained an open question. Here, we establish a third regime of confinement: the quantum metric bound state. We provide a rigorous mathematical proof demonstrating that in the absence of kinetic energy and CLS protection, the exponential decay length of these states is lower-bounded by the quantum metric of the unperturbed flat band. We demonstrate the tightness of this geometric limit by constructing a family of highly tunable flat-band generators, and we verify its universality across diverse realistic architectures. Ultimately, this classification establishes the independently measurable quantum metric as a predictive design principle for engineering confined modes in synthetic wave platforms.

17.
arXiv (math.PR) 2026-06-16

Eyring-Kramers asymptotics for infinite-dimensional stochastic gradient systems

arXiv:2606.16083v1 Announce Type: new Abstract: We study small-noise asymptotics for a class of reversible stochastic evolution equations in infinite dimensions. The dynamics are of the form \[ dX_t=-A\nabla F(X_t)\,dt+\sqrt{2\beta^{-1}A}\,dW_t, \] where $F$ is a regular multi-well potential, $A$ is a selfadjoint mobility operator, $W$ is a cylindrical Brownian motion and $\beta\gg 1$ is the inverse noise strength. The invariant measure is a Gibbs perturbation of a Gaussian reference measure, and the resulting framework covers, in particular, the stochastic Allen-Cahn and stochastic Cahn-Hilliard equations on bounded intervals. In the double-well case, we derive a sharp asymptotic formula for the first nonzero eigenvalue of the generator. This gives an infinite-dimensional Eyring-Kramers law for the spectral gap, with exponential rate determined by the communication height and leading prefactor determined by the local quadratic behavior at the relevant minima and saddle points. Our approach provides a general strategy for lifting finite-dimensional Eyring-Kramers analysis to infinite-dimensional stochastic gradient systems.

18.
arXiv (CS.AI) 2026-06-15

When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks

Authors:

arXiv:2606.14629v1 Announce Type: cross Abstract: Verifier-driven self-DPO is a common recipe for self-improving production visual-language models. In this setup, a frozen verifier scores candidate generations, the top- and bottom-scoring candidates form a preference example, and DPO updates the learner. The deployment-time assumption is monotone: a stronger verifier should yield a stronger student. We show that this assumption can fail because verifier quality is highly task-specific. On a four-rung open-source verifier ladder across MathVista, MMMU, and BLINK, the same verifiers that are above-threshold and improve a Qwen-3-VL-2B student on MathVista become sub-threshold on MMMU, where their task-rubric accuracy drops to 8% to 23%. In this regime, every verifier we tested silently regresses the student, producing drops of 3.4 to 10.9 percentage points below the frozen baseline while the DPO training loss continues to decrease. The regression replicates on a second student, Qwen-2.5-VL-3B. Moreover, within the failure regime, damage is confidence-inverted: the more accurate-but-still-wrong verifier causes larger regression than a near-random verifier, suggesting that progress-gated replay amplifies confidently wrong preference pairs. We give a compact mechanistic explanation via a variance theorem for progress-gated replay and its direction-mismatch failure mode. The deployment message is operational rather than purely diagnostic: before running any verifier-driven loop, teams should measure target-task rubric accuracy, rank verifiers by target-task rubric quality rather than parameter count, and treat diminishing returns in above-threshold regimes as a verifier-side compute budget cap.

19.
medRxiv (Medicine) 2026-06-16

The biological clock of multimorbidity: temporal dynamics of disease co-occurrence in primary care

Multimorbidity is the dominant clinical reality of primary care, yet the temporal dynamics governing when and how persistent comorbidity associations emerge remain poorly characterised. Most large-scale comorbidity studies adopt a single observation window after an index diagnosis, implicitly assuming that associations detectable at one year are equally detectable at five. Using 11 years of electronic health records from 5,821,197 individuals in Catalan primary care, we applied a matched cohort design across nine complementary follow-up windows, five cumulative (0-1 to 0-5 years) and four conditional (1-2 to 4-5 years), to 1,315 index diseases, identifying 144,030 significant directed comorbidity associations in the five-year network. We found that 60.1% of these associations required at least three years of follow-up and were undetectable in shorter-window analyses, demonstrating that observation window length is a primary determinant of which comorbidities can be observed. To organise this temporal heterogeneity, we introduce the biological clock of multimorbidity: a two-dimensional framework that positions ICD-10 disease categories according to their rates of cumulative signal attenuation and the persistence of conditional risk. This framework identifies four reproducible temporal patterns (episodic, chronic stable, chronic progressive, and transient-persistent) that are robust under bootstrap resampling, leave-one-disease-out sensitivity analysis, and alternative clustering approaches. The biological clock is systematically modulated by sex, with Blood/Immune and Musculoskeletal disorders showing the largest sex differences in temporal dynamics. Network analysis identified 19 disease "initiators" that generate broad downstream comorbidity burdens and 21 "sinks" representing convergent endpoints of multiple disease trajectories. Comparison with hospital-based Danish data from 6,909,676 individuals showed that shared associations were 2.7-fold enriched over chance expectation (hypergeometric test, p

20.
arXiv (CS.AI) 2026-06-24

BioPIE: A Biomedical Protocol Information Extraction Dataset for Experiment Understanding

arXiv:2601.04524v2 Announce Type: replace Abstract: Understanding biomedical experiments provides a foundation for downstream tasks, e.g., laboratory automation, and facilitates effective cross-disciplinary communication. Two challenges, High Information Density (HID) and Multi-Step Reasoning (MSR), pose unique difficulties for precise experimental understanding. Extracting structured knowledge, e.g., Knowledge Graphs (KGs), is an effective approach to address the HID and MSR. However, existing biomedical datasets for structured knowledge information extraction are limited to a general or coarse-grained level, hindering fine-grained experimental understanding. To address this gap, we introduce Biomedical Protocol Information Extraction Dataset (BioPIE), a dataset providing procedure-centric KGs that capture entities, actions, and relations at a scale sufficient for reasoning across biomedical protocols. We evaluate information extraction methods on BioPIE and implement a question answering system leveraging the dataset for validation, demonstrating improved understanding performance on test sets as well as on the HID and MSR question sets.

21.
arXiv (CS.AI) 2026-06-19

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

arXiv:2604.13416v2 Announce Type: replace-cross Abstract: Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.

22.
arXiv (CS.CL) 2026-06-16

ACC: Compiling Agent Trajectories for Long-Context Training

Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.

23.
arXiv (CS.CL) 2026-06-19

Large Language Models Do Not Always Need Readable Language

Large language models (LLMs) are commonly prompted and interfaced with human-readable natural language, even when the intended reader is another model. This paper investigates whether semantic information can be encoded in compact, non-standard textual forms that sacrifice human readability while remaining recoverable by LLMs. We refer to this class of model-centric textual representations as BabelTele, approached here not as a fixed protocol but as an empirical probe into LLMs' capacity to generate and interpret such representations. Through readability diagnostics, model likelihood measures, human questionnaires, and downstream task evaluations, we find that BabelTele can substantially depart from ordinary natural language while preserving core semantics for instruction-tuned LLMs. As a task-agnostic representational paradigm, BabelTele demonstrates high information density, maintaining 99.5% semantic fidelity even when the text volume is condensed to 27.9% of its original length. We further evaluate its semantic robustness in cross-model transfer, agent memory, and multi-agent communication. Results suggest that BabelTele can reduce context overhead while generally maintaining reliable downstream performance, although its effectiveness depends on the compressor-reader pair and task setting. These findings indicate that human readability, natural-language typicality, and model-side semantic recoverability can be partially decoupled, opening a path toward model-native representations in future exploration of LLM systems.

24.
bioRxiv (Bioinfo) 2026-06-16

Accelerating String Comparison in RLZ Compressed Sequences via LCE Jumps

Relative Lempel-Ziv (RLZ) is an effective compression method for large, repetitive collections; however, the fundamental primitives required to elevate it from a passive archival format to a tractable representation for compressed construction have yet to be fully established. In this paper, we introduce an algorithmic framework for structurally comparing and lexicographically sorting sequences of RLZ factors. We characterize when direct factor comparisons are necessary and when they can be bypassed using RLZ specific shortcuts. We further introduce a method for extending truncated factors into right-maximal matches, enabling the recovery of matching statistics from the RLZ parse. Experimentally, RLZ sorting achieved speedups of up to 3.93x over character-based sorting. Together, these results advance the use of the RLZ format as a foundation for compressed construction.