Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-17

WeaveLA: Event Driven Cross-Subtask Latent Memory Weaving for Repetitive Robot Manipulation

Vision-Language-Action (VLA) policies have achieved remarkable single-step manipulation, yet they remain brittle precisely where each stage depends on what was just completed. The core issue is structural: short-window VLAs lack an explicit channel for rouxting information across sub-task boundaries, and existing memory-augmented variants either write at every frame, retrieve from demonstration-time stages, or fire at sub-goal events without performing an explicit sub-task-to-sub-task hand-off into the action expert. We identify the sub-goal completion event as the natural temporal unit for cross-subtask memory hand-off, and present WeaveLA (Weave Latent memory for Vision-Language-Action policies), a cross-subtask memory interface that, on top of a frozen VLA backbone, compresses each completed segment into latent tokens via query-driven attention pooling and routes them directly into the action-generation path of the next sub-task. This event-triggered, action-side design preserves the base policy's short-window interface while adding a lightweight cross-subtask channel. Through stratified evaluation on RoboMME with a $\pi_{0.5}$ backbone, WeaveLA's gains land exactly where the channel is needed: on the hardest repetition slice (SwingXtimes, $N{=}3$), success rises from $0\%$ to $47.8\%$, while single-execution episodes remain unchanged. Per-episode paired analysis confirms the gains are confined to tasks whose causal structure requires cross-subtask information.

02.
bioRxiv (Bioinfo) 2026-06-10

ECMME: an atlas of selection pressures on the mammalian extracellular matrix reveals contrasting evolutionary dynamics

The extracellular matrix (ECM) is a fundamental metazoan innovation that provides structural support and regulatory cues essential for multicellular life. While core matrisome components are subject to strong functional constraints, their evolutionary dynamics at the molecular level remain incompletely characterized. Here, we present a comprehensive per-residue analysis of selection pressures across 272 human core matrisome proteins using high-quality orthologous sequences from up to 228 placental mammal species. We developed an automated pipeline integrating ortholog identification, codon-aware alignments, and site-specific selection analyses with the MEME and FUBAR methods from the HyPhy suite. Results reveal pervasive strong purifying selection across the matrisome, consistent with its structural and functional indispensability. This is accompanied by episodic positive selection and rarer pervasive positive selection, with collagens exhibiting significantly elevated episodic positive selection compared to glycoproteins and proteoglycans. To facilitate community access, we developed ECMME (ECM Molecular Evolution) browser, an intuitive open-access web resource that visualizes selection metrics plotted directly onto protein topologies. ECMME allows researchers to seamlessly browse and investigate the data, providing a powerful framework for interpreting functional sites. It is available online and requires no local installation or set-up (https://izzilab-ecmme.share.connect.posit.cloud/).

03.
medRxiv (Medicine) 2026-06-22

Rare loss-of-function variants in POLD1, PMS1 and FAN1 modify age at onset of motor symptoms in Huntington's disease

Huntington's disease is a rare neurodegenerative disease whose primary risk factors are inherited expansions of a CAG repeat tract in the HTT gene. Somatic expansion of these tracts leads to neuronal toxicity, neuronal death and clinical disease progression. To identify genetic factors with a major impact on disease onset and progression, we genome sequenced 18,825 individuals for the ENROLL-HD study. Our results show rare inactivating mutations in three genes, all involved in DNA damage repair, are major determinants of age of onset for motor symptoms (n=10,610) and other clinical manifestations. Heterozygote carriers of predicted loss-of-function (pLoF) variants in POLD1 and PMS1 developed motor symptoms an average 20 years (n=3; P=1x10-5) and 7 years (n=6; P=2x10-3) later than non-carriers, respectively. Conversely, heterozygote carriers of pLoF variants in FAN1 (n=30) developed symptoms 10 years earlier (P=2x10-10). Our findings highlight therapeutic strategies and help predict age of onset for at-risk individuals.

04.
arXiv (CS.AI) 2026-06-12

Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots

arXiv:2606.12439v1 Announce Type: cross Abstract: Large language model (LLM) answer engines are increasingly used for information seeking, shifting visibility from ranked lists to synthesized answers. This enables Generative Engine Optimization (GEO), which targets LLM answer engines' evidence pool and generation. We analyze the search engine optimization (SEO) to GEO transition to identify two risks: (i) concentrated influence from low contestability and system sensitivity, and (ii) undisclosed commercial influence embedded in evidence and reasoning. We then formalize a general GEO pipeline to locate where optimization acts and compare academic and industry practices, revealing a third risk: (iii) academic-industry blind spots driven by visibility and evaluation asymmetries between offline setups and deployed systems. This position argues the need for answer-level governance and measurement: stronger contestability, high-precision disclosure, black-box auditing of material influence, and deployment-aligned metrics for exposure persistence.

05.
arXiv (CS.CV) 2026-06-11

Cross-Modal Benchmarking for Robotic Perception in Natural Environments

Natural environments present a complex challenge to robotics perception systems. Current models, particularly vision foundation models, are largely trained on structured, urban environments leading to weaknesses in their perception for field robotics tasks. We showcase the limitations of current models using our recently released WildCross benchmark, a new cross-modal benchmark for place recognition and metric depth estimation in large-scale natural environments. WildCross comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, each aligned with accurate 6DoF pose and synchronized dense lidar submaps. In this work, we provide an expanded analysis of the benchmark results from the recent WildCross benchmark, with particular emphasis on expanded metric depth estimation experiments. Access to the code repository and dataset for this work can be found at https://csiro-robotics.github.io/WildCross.

06.
bioRxiv (Bioinfo) 2026-06-12

From Proteome Mining to Structural Validation: Phosphopyruvate Hydratase as a Structurally Tractable Drug Target in Kinetoplastid Parasites

Chagas disease, caused by Trypanosoma cruzi, demands novel therapeutic strategies that overcome the toxicity and limited efficacy of current treatments. To address this need, herein we report an integrative, target-centric strategy that combines parasite proteome mining, structural modeling, and experimental validation. Functional enrichment and druggability analyses identified phosphopyruvate hydratase (PPH) as a promising candidate due to its essential metabolic role and limited similarity to human homologs. Notably, proteome mining revealed the presence and conservation of PPH across kinetoplastid parasites, including Leishmania donovani, supporting its evaluation beyond T. cruzi. For the selected PPH sequences, AlphaFold-derived three-dimensional models underwent extensive molecular dynamics refinement, yielding stable conformational ensembles suitable for structure-based studies. Using this validated model, virtual screening of the Latin American Natural Products Database - LANaPDB - identified aptosimon as a top-ranked compound candidate. Molecular dynamics simulations further showed ligand-dependent binding behavior, suggesting alternative binding modes distinct from the canonical substrate configuration. In vitro assays demonstrated consistent antiparasitic activity against intracellular T. cruzi amastigotes (IC50 = 3.52 ug/mL) and Leishmania donovani promastigotes (IC50 = 13.06 ug/mL), supporting the biological relevance of the aptosimon-related lignan chemotype, hinokinin, across two kinetoplastid parasite models. Together, these results support PPH as a structurally tractable and biologically relevant candidate target, while identifying an aptosimon-related lignan chemotype, represented experimentally by hinokinin, as a cross-species antiparasitic scaffold that warrants further biochemical target-validation studies.

07.
medRxiv (Medicine) 2026-06-10

Frozen elephant trunk repair in heritable thoracic aortic disease: Impact of genetic aortopathy on long-term outcomes - A multicenter analysis

Aims This multicenter study aims to compare outcomes of total aortic arch replacement (TAR) using the frozen elephant trunk (FET) technique in patients with and without heritable thoracic aortic disease (HTAD) and to assess whether HTAD influences postprocedural adverse aortic events (AAEs). Methods From 06/2007 to 05/2024, aortic databases from 13 European centers were screened for HTAD patients undergoing TAR with FET. All consecutive dissection and aneurysm non-HTAD patients from the four core centers served as comparator. The primary outcome was AAE, a composite of diameter progression, distal stent graft induced new entry (dSINE), malperfusion, rupture and pseudoaneurysm at 5 years after FET implantation. Results Of 2739 FET patients, 196 (7.2%) were diagnosed with HTAD. The control group consisted of 867 non-HTAD FET patients. Marfan syndrome was the most common condition (72%), followed by Loeys-Dietz syndrome (11%), vascular Ehlers-Danlos syndrome (5.6%) and Turner syndrome (2.0%). Seventeen (8.8%) patients were diagnosed with ns-HTAD. At 5 years 46 (24%) AAEs occurred in the HTAD group, 169 (20%) in the non-HTAD group (p=0.2). Diameter progression was the most common event (10% vs. 12%; p=0.6), followed by dSINE (5.8% vs. 4.5%; p=0.5), malperfusion (4.2% vs. 3.3%; p=0.5), rupture (2.1% vs. 0.7%; p=0.09) and pseudoaneurysm (0.5% vs. 0.2%; p=0.5). Conclusions The FET technique appears safe and effective for acute and chronic aortic disease in HTAD patients, with outcomes comparable to non-HTAD cases and no increase in graft-related complications, challenging traditional concerns about stent graft use in genetically mediated aortic disease.

08.
arXiv (CS.AI) 2026-06-16

CmdNeedle: Measuring the Incompleteness of Command Denylists for AI Agents

arXiv:2606.15549v1 Announce Type: cross Abstract: The adoption of AI agents is increasing rapidly. Terminal AI agents, i.e., AI agents that run in terminal environments, are a widely used type of AI agents. Terminal AI agents rely heavily on shell command execution to interact with the host systems. They adopt a three-list command-gating mechanism to mitigate security risks introduced by command execution, with denylists serving as the load-bearing component. However, modern operating systems often ship a large, ever-expanding set of shell commands with complex functionalities. Our observation is that even a built-in denylist of Claude Code, well-maintained by its developers, can overlook bypass commands that invalidate its effectiveness. Such negligence leads to fragile command denylists that cannot even block operations that practitioners expect them to block. This paper presents the first systematic characterization of command denylist fragility in terminal AI agents. The paper formalizes the command denylist fragility problem and proposes an LLM-driven pipeline, CmdNeedle, to detect such fragility. It prompts the LLM to propose possible bypasses and iteratively repairs them using feedback from a validator that executes them in a sandbox. In the evaluation, we applied CmdNeedle to 1,709 real-world command denylists (containing 13,332 denylist rules) collected from GitHub. The evaluation shows several key findings, including that 69.0–98.6% of the denylists are fragile, that this fragility occurs consistently across projects and agents, and the validity of several possible root causes for this fragility. Our pipeline and findings will hopefully facilitate future research and practice regarding the command denylists used by AI agents.

09.
bioRxiv (Bioinfo) 2026-06-11

Revealing trajectories of multi-modal voxel-level changes in neurodegenerative diseases using latent event mapping

Neurodegenerative diseases are driven by pathological mechanisms that can be indirectly measured in vivo using multi-modal neuroimaging. However, current computational methods that aim to reconstruct trajectories of voxel-level changes in the brain are either not computationally scalable or fully interpretable, limiting their ability to reveal associations between disease progression and underlying mechanisms. Here we introduce Latent Event Mapping (LEMING), a generative unsupervised modelling technique that learns a latent map of disease events along a common pseudo-timeline of events. We apply LEMING to amyloid PET and structural MRI data from the Alzheimer's Disease Neuroimaging Initiative to reveal the first voxel-level trajectories of events in Alzheimer's disease. Notably, we show how LEMING can provide new insights into progression-dependent disease mechanisms. We find that acetylcholine receptor density is significantly positively associated with both late-stage amyloid and atrophy events, suggesting that either these receptors are targeted later in disease progression, or that amyloid does not play an active role. This has strong implications for therapeutics that target acetylcholine receptors, particularly for early-stage intervention strategies.

10.
arXiv (CS.CV) 2026-06-18

Improving Visual Token Reduction via Rectifying Distortions for Efficient Multimodal LLM Inference

Recent advancements in Multimodal Large Language Models (MLLMs) have achieved remarkable success in vision-language tasks, yet the quadratic computational complexity arising from the vast number of visual tokens incurs significant memory and latency bottlenecks. While visual token reduction (VTR) strategies have been explored to mitigate this burden, existing methods overlook the positional and attentional consistency between the full and reduced sequences, resulting in a distorted representation. To this end, we propose RESTORE, a novel VTR framework that rectifies the positional and attentional distortions while maintaining efficiency. Specifically, we present a simple yet effective calibration method that restores lost visual attention by augmenting attention weights based on relative distances. We also introduce a distinctive anchor selection for token merging to mitigate information loss during feature averaging. Experimental results on multiple benchmarks demonstrate that our method consistently improves the accuracy of various reduction methods, achieving state-of-the-art performance while maintaining computational efficiency. Project page is available at https://cvlab.yonsei.ac.kr/projects/RESTORE

11.
arXiv (CS.AI) 2026-06-17

From Noise to Order: Learning to Rank via Denoising Diffusion

arXiv:2602.11453v3 Announce Type: replace-cross Abstract: Learning-to-rank (LTR) methods have traditionally been limited to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. We propose an alternative denoising diffusion-based generative approach to LTR that instead models the full joint distribution over features and relevance labels. While in discriminative LTR, an over-parameterized ranking model may find different ways to fit the training data, we posit that candidate solutions that can explain the full data distribution under the generative setting maybe better at estimating relevance. Thus, we propose DiffusionRank that extends TabDiff, an existing diffusion model for tabular datasets, to create generative alternatives to classical discriminative pointwise and pairwise LTR objectives. Our work demonstrates improvements from DiffusionRank over discriminative counterparts on four standard LTR datasets and points to a rich space for future exploration to leverage ongoing advancements in deep generative models for LTR. Our code is publicly available at https://github.com/sadjadeb/DiffusionRank.

12.
arXiv (CS.AI) 2026-06-18

Scaling Learning-based AEB with Massive Unlabeled Data

arXiv:2606.18864v1 Announce Type: cross Abstract: This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over \$10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.

13.
arXiv (CS.LG) 2026-06-19

Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting

arXiv:2606.19825v1 Announce Type: new Abstract: Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs–such as Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph–as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and are also far superior to Long Short-Term Memory (LSTM) model in dust source emission forecasting.

14.
arXiv (CS.CL) 2026-06-19

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence systems. Locally deployed large language models (LLMs) offer a privacy-preserving alternative, but their suitability for pharmaceutical manufacturing tasks remains underexplored. This study evaluates four open-source LLMs (Qwen 2.5 Coder 7B, Llama 3.1 8B, Mistral 7B, and Meditron 7B) deployed locally via Ollama for natural-language-to-SQL generation over a pharmaceutical manufacturing database. A FastAPI-based evaluation platform, PharmaBatchDB AI, was developed using a synthetic Microsoft SQL Server database containing approximately 63,000 records across Batch, Manufacturing Execution System (MES), and Clean-In-Place (CIP) modules. Models were benchmarked on 60 domain-specific natural-language questions using metrics including SQL extraction rate, SQL compliance, factual consistency, ROUGE-L, hallucination rate, throughput, and latency. Qwen 2.5 Coder 7B, Llama 3.1 8B, and Mistral 7B generated SQL for all evaluation tasks, while Meditron 7B failed on nearly all tasks due to context-window limitations and poor SQL generation capability. Llama 3.1 8B achieved the highest SQL compliance, whereas Qwen 2.5 Coder 7B achieved the strongest overall text similarity and factual consistency. Performance differences between the two leading models were not statistically significant. The results show that code-tuned general-purpose LLMs outperform a domain-specific biomedical model on structured query generation for pharmaceutical manufacturing data. Although fully local, GxP-aligned NLQ systems are feasible on consumer hardware, current performance levels still require human oversight and downstream validation for regulated use.

15.
arXiv (CS.AI) 2026-06-19

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

arXiv:2601.00014v2 Announce Type: replace-cross Abstract: Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To research this, the Technion-Leumit Holter ECG (TLHE) dataset, including 69,663 recordings from 47,729 patients, collected over 20 years was used. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80 that outperformed a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had a two-fold chance of hospitalization or death incidents. Explainability analysis showed DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.

16.
arXiv (CS.LG) 2026-06-11

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

arXiv:2606.12360v1 Announce Type: new Abstract: Language-model post-training is the main stage at which model behavior is shaped, yet it still largely involves optimization of scalar rewards that summarize diverse desiderata. This abstraction gives practitioners little visibility into what their data actually teaches models, allowing spurious correlations to be learned by a model and inducing undesirable behaviors such as over-stylization and sycophancy. To address this problem, we ask: can we inspect a preference dataset before optimization and decide, at the level of concepts, which behaviors a model should be allowed to learn? Motivated by this, we introduce a data-centric post-training pipeline that uses interpretability protocols to develop statistical hypotheses for the latent concepts separating preferred from dispreferred generations, making them explicit for fine-grained user feedback. Building on this view, we unify several interpretability-based training protocols as ways of shaping rewards via feature or data interventions. Empirically, we show that our pipeline diagnoses undesirable signals in existing preference data, mitigates off-target learning, and can also help amplify or shape desired properties such as safeguards and model personality. More broadly, our results suggest that interpretability can turn post-training from optimizing opaque proxy rewards into a process of auditing and sculpting the learning signal itself.

17.
arXiv (CS.LG) 2026-06-11

Point-Identification of a Robust Predictor Under Latent Shift with Imperfect Proxies

arXiv:2603.15158v2 Announce Type: replace Abstract: Addressing the domain adaptation problem becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches that address latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that proxies have sufficient information about variations in latent confounders. For imperfect proxies the mapping from confounders to the space of proxy distributions is non-injective, and multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption and observed data are consistent with multiple potential predictors (set-identified). To address this, we introduce latent equivalent classes (LECs). LECs are defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point-identification for the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active learning (PQAL) framework, which actively queries a small, targeted set of diverse domains that satisfy this rank condition. PQAL can recover the point-identified predictor, demonstrates robustness to varying degrees of shift and outperforms previous methods on synthetic data and semi-synthetic dSprites, IHDP, ACS Folktables datasets.

18.
arXiv (CS.CV) 2026-06-16

MMDiff: Extending Diffusion Transformers for Multi-Modal Generation

Diffusion transformers have demonstrated remarkable generative capabilities, yet the rich perceptual representations computed across their denoising trajectory are discarded once the content is rendered. We present MMDiff, a framework that transforms a frozen diffusion transformer into a multi-modal generative system that jointly produces images alongside any combination of dense perceptual modalities using lightweight decoder heads. Our central finding is that perceptual information is temporally distributed along the denoising trajectory, and that multi-timestep feature fusion with spatially varying aggregation weights is essential, improving semantic segmentation results by up to 28.7% mIoU over single-timestep extraction. We further adopt concept-driven attention extraction for interpretable spatial guidance, and show that frozen diffusion features are competitive with and complementary to state-of-the-art encoders such as DINOv3. By training only lightweight decoder heads on a frozen backbone, we achieve strong performance in semantic segmentation, salient object detection, and depth estimation, and demonstrate that this framework enables effective synthetic data generation at scale.

19.
arXiv (CS.AI) 2026-06-17

LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline

arXiv:2606.17507v1 Announce Type: new Abstract: Generative AI and large language models (LLMs) are increasingly applied to question generation and automated assessment. However, deploying LLMs in preparation for high-stakes exams requires more than prompt engineering; it demands software pipelines that systematically ground model outputs in authorised curriculum artefacts and marking guidelines issued by education authorities. This paper presents a curriculum-grounded, configurable LLM-as-Judge pipeline for question-level marking, co-developed with an industrial partner, to support exam preparation for university admission. The pipeline identifies the relevant topics, subtopics, and cognitive demand of a question, and assembles verifiable and authorised context to support LLM judgement. Curriculum intent is operationalised through concrete syllabus artefacts, including prescribed verbs and outcomes, performance band descriptors, glossary definitions, and marking-guideline principles. A staged LLM workflow is employed to first generate question-specific rubrics, capturing structured expectations of performance, and then derive and evaluate marking criteria used to allocate marks to student responses. This design improves consistency, transparency, and alignment with official marking practices. Preliminary evaluation shows that the proposed LLM-as-Judge pipeline delivers marking outcomes comparable to human tutors, while yielding justifications that are more traceable to authorised curriculum artefacts and marking standards. The pipeline has also been integrated into an online study platform, where early deployment data provide initial insights into operational usage and manual overrides.

20.
arXiv (CS.AI) 2026-06-18

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

arXiv:2606.19004v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training of Diffusion Transformers (DiTs) is prohibitively expensive, requiring thousands of high-end GPUs. Existing works explore two directions to reduce cost: seed exploration improves training convergence by selecting high-contrast samples, yet adds compute to the critical path; spot GPUs offer 69–77\% lower cost, yet sit idle during training because DiT rollouts finish nearly simultaneously, which prevents LLM-style pipelining of rollout with training. Spot preemptions further break Sequence Parallelism (SP) groups, fragmenting GPU topology. We present Spotlight, the first system that harvests spot GPUs for DiT RL post-training. Spotlight rests on two key insights we devise: (1)~we show that exploration can tolerate stale model weights because exploration that uses the model weights from the previous iteration preserves the relative ranking of random seeds, allowing exploration to run on idle spot GPUs during training. (2)~SP reconfiguration can reuse on-node state, reducing group recovery from minutes to sub-second launches. Built on these insights, Spotlight introduces three techniques: a bandit-based exploration planner that maximizes reward variance within the training time budget, elastic sequence parallelism that reconfigures SP groups on the fly via persistent schedulers and intra-node weight copying, and a preemption-aware pull-based request scheduler that balances load and commits in-flight state upon preemption. We implement Spotlight on the open-source RL platform ROLL and evaluate it on Qwen-Image post-training. Spotlight reaches the same target validation score $4\times$ faster than baselines, reducing total cost by $1.4$-$6.4\times$ while achieving superior image quality on DeepSeek-OCR and Geneval datasets with resolution $512\times512$ and $1280\times1280$.

21.
arXiv (CS.LG) 2026-06-16

Transformers Learn the Mestre-Nagao Heuristic

arXiv:2606.15036v1 Announce Type: new Abstract: We train a two-layer transformer encoder to classify rational elliptic curves $E/\mathbb{Q}$ of conductor $\leq 10000$ as either rank 0 or rank 1 from the first 128 normalized Frobenius traces. We achieve >99% accuracy on both classes, and accuracy is essentially unchanged on test curves with no isogeny or quadratic-twist relative in the training set. We then apply techniques from mechanistic interpretability such as attention analysis, linear probing, activation patching, logit attribution, and neuron-level circuit analysis to reverse-engineer the algorithm the (centroid in function space) model learned. We find that a sparse circuit of 20 out of 512 layer-1 MLP neurons is sufficient for rank prediction under a linear probe with an AUROC of 0.992 at plateau, implementing a push-pull detector architecture of rank-0 and rank-1 detectors with a one-sided readout. However, we notice that the model has sub-optimal readout problems indicating a mismatch in rank-order between the readout pathway and the discriminative circuit. Critically, the learned input weights of the top discriminating neuron match the Mestre-Nagao sum heuristic weights $\log(p)/(p\cdot \log{B})$ with a Spearman coefficient $r = 0.997$ and Pearson coefficient $r = 0.952$: the model has learnt a result from analytic number theory from the Frobenius trace data alone. We additionally find that all 50 independently trained models concentrate CLS attention on prime positions at 2-50$\times$ the rate of composite positions. The CLS embedding encodes $\log{L(E,1)}$ with $R^2 = 0.962\pm 0.011$ across the 50 models (after controlling for the conductor). Activation patching analysis reveals that attention weights are dissociated from causal information flow. Additionally, the 50 solutions from training are near-identical in function space (with pairwise agreement $>$98.8%) despite large weight space barriers.

22.
arXiv (CS.LG) 2026-06-12

Geometry of Lightning Self-Attention: Identifiability and Dimension

arXiv:2408.17221v3 Announce Type: replace Abstract: We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.

23.
arXiv (CS.CL) 2026-06-18

Compact Geometric Representations of Hierarchies

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $\Omega(n)$ is necessary for general DAGs and $\Omega(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.

24.
arXiv (CS.LG) 2026-06-16

Priority-Aware Shapley Value

arXiv:2602.09326v2 Announce Type: replace Abstract: Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".

25.
arXiv (quant-ph) 2026-06-19

Solving Nonequilibrium Dynamics via Influence Matrix Bootstrap: Floquet-PXP Model

arXiv:2606.19430v1 Announce Type: new Abstract: Studies of integrable systems have profoundly deepened the fundamental understanding of quantum many-body physics. While equilibrium properties such as ground states and thermodynamics can often be characterized efficiently, accurately characterizing nonequilibrium integrable dynamics remains a significant challenge. Here, we address this problem in the "Rule 201" quantum cellular automaton, an integrable Trotterization of the PXP Hamiltonian. Using the tensor-network approach of the influence matrix, we develop local conditions called generalized zipper conditions that allow exact solutions of local dynamics. We also introduce a numerical bootstrap method for solving influence matrices with finite but relatively large bond dimensions. This uncovers a rich landscape of nonequilibrium behavior exhibiting initial-state dependence. As an example, we investigate the fate of persistent oscillating dynamics under local non-integrable perturbations, and present analytical results for non-thermal relaxation constrained by conservation laws. We also obtain numerically exact results for entanglement growth across a broad class of initial states. Furthermore, from an information-theoretic perspective, we identify a refined structure of multitime correlations termed the hidden Markov order: the memory encoded in the dynamics separates into finite-length and long-range distributed components, which becomes transparent in an exact split-index matrix-product-state representation of the influence matrix. Our approach enables unified investigations of nonthermalizing and thermalizing regimes of nonequilibrium dynamics within a single analytically tractable model, and can be tested experimentally in state-of-the-art quantum simulators such as Rydberg atom arrays.