Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-12

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types – nearly 4$\times$ the depth of existing multilingual benchmark coverage for Slovak. Our evaluation of 31 embedding models reveals that large instruction-tuned multilingual models achieve the strongest performance, while existing Slovak-specific models trained for NLU tasks transfer poorly to embedding tasks. To address the need for efficient, locally-deployable Slovak embeddings, we develop \texttt{e5-sk-small} (45M parameters) and \texttt{e5-sk-large} (365M) by applying vocabulary trimming and fine-tuning to Multilingual E5 models. Despite size reductions of up to 62\%, our open-source models achieve competitive performance with proprietary APIs while remaining locally deployable for semantic search and retrieval-augmented generation (RAG). We release the benchmark, models, datasets, and code openly, hoping our approach offers a replicable path for other under-resourced languages.

02.
bioRxiv (Bioinfo) 2026-06-11

DeePEn - A Depth sensitive benchmark for Protein Engineering

Recent progress in modeling techniques and high-throughput screening has significantly enhanced the accessibility of protein engineering. Nevertheless, further progress gets hindered by the lack of robust benchmarks that capture the practical challenges for real-world protein engineering. Here, we introduced DeePEn, a Depth-sensitive benchmark for Protein Engineering that quantifies a models generalization capabilities when predicting protein fitness at increasing mutational distance from the wildtype or training data. We defined distance as the number of simultaneous point mutations, i.e., single amino acid variants (SAVs), moving from wild-type to mutant (edit distance in computer science jargon). Specifically selecting four deep mutational scanning (DMS) datasets with sufficient multi-mutation data points from ProteinGym, we assessed recent predictive models, including general and biophysics-informed protein Language Models (pLMs), and a non-transformer neural network. Our results highlight how the performance of all models deteriorates with increasing mutational distance and that no single metric sufficiently captures the diverse requirements of protein engineering. To overcome these shortcomings, DeePEn provides a readily available resource for multi-metric benchmarking that focuses on the prediction of distant variants.

03.
Nature (Science) 2026-06-10

Light slows down carbon nanotubes in water

Water-suspended carbon nanotubes move more slowly in green light, suggesting that excited electrons in the tubes couple to the water through ‘quantum friction’. Water-suspended carbon nanotubes move more slowly in green light, suggesting that excited electrons in the tubes couple to the water through ‘quantum friction’.

04.
arXiv (CS.LG) 2026-06-12

GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis

arXiv:2606.05860v2 Announce Type: replace Abstract: Designing neural architectures for time-series forecasting and anomaly detection remains a resource-intensive task that often requires substantial domain expertise. Traditional Automated Machine Learning (AutoML) systems typically rely on static, predefined search spaces, limiting their ability to adapt to diverse data characteristics. We present GenAutoML, an agentic framework that leverages Large Language Models (LLMs) as neural architects to bridge natural-language requirements and executable PyTorch implementations. The framework incorporates a Sandboxed Reflection Loop for autonomous code refinement and a Signature-Aware Runtime that enforces architectural consistency and execution safety. To improve robustness under non-stationary conditions, we further introduce a Dynamic Reversible Instance Normalization (Dyn-RevIN) wrapper. Experiments on the ETTh1, ETTm1, and Weather benchmarks demonstrate that GenAutoML can dynamically generate task-specific neural architectures tailored to dataset characteristics. Among the generated models, WaveInterferenceNet achieves inference latency below 0.01 ms per sample while maintaining competitive predictive performance. By emphasizing computational efficiency, architectural adaptability, and stable optimization behavior, GenAutoML enables the creation of ultra-lightweight neural networks suitable for resource-constrained and latency-sensitive Edge AI deployments.

05.
arXiv (CS.AI) 2026-06-17

LLM-Aided Joint Secrecy Precoding and Trajectory for RSMA-Based Heterogeneous UAV Networks

arXiv:2507.17188v3 Announce Type: replace-cross Abstract: This paper investigates secure communications in rate-splitting multiple access (RSMA) enabled heterogeneous UAV networks, where multiple UAVs collaboratively serve ground terminals in the presence of eavesdroppers. By jointly considering secrecy rate maximization and propulsion energy consumption minimization, we formulate a multi-objective optimization problem involving UAV trajectory design, service association, power allocation, and secrecy precoding under mobility, collision-avoidance, service-capacity, and communication constraints. The formulated problem is highly non-convex due to the coupling among UAV trajectories, RSMA transmission variables, and secrecy constraints.To address the resulting non-convex and highly coupled optimization problem, we propose a hierarchical optimization framework. The inner layer uses a semidefinite relaxation (SDR)-based S2DC algorithm combining penalty functions and difference-of-convex (D.C.) programming to solve the secrecy precoding problem with fixed UAV positions. The outer layer introduces a Large Language Model (LLM)-guided heuristic multi-agent reinforcement learning approach (LLM-HeMARL) for trajectory optimization. LLM-HeMARL efficiently incorporates LLM-generated expert heuristic policy, enabling UAVs to learn energy-aware, security-driven trajectories without the inference overhead of real-time LLM calls. The simulation results show that our method outperforms existing baselines in secrecy rate and energy efficiency, with consistent robustness across varying UAV swarm sizes and random seeds.

06.
arXiv (CS.CV) 2026-06-18

CAOA – Completion-Assisted Object-CAD Alignment

Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central challenge in 3D semantic reconstruction. The task requires estimating a 9-Degree-of-Freedom (DoF) pose-position, rotation, and scale along three axes-but is hindered by noisy and incomplete scans, as well as segmentation errors that cause geometric distortions. We present Completion-Assisted Object-CAD Alignment (CAOA), a method that integrates a semantically and contextually aware point cloud completion module with a symmetry-aware relative pose estimation algorithm, enabling precise alignment of CAD models to scanned objects. Existing completion methods are typically trained and evaluated on synthetic datasets, which often fail to generalize to real-world scans. To bridge this gap, we introduce a synthetic data generation strategy tailored to indoor scenes, significantly reducing the synthetic-to-real domain gap-validated through quantitative comparisons with widely used completion datasets. In addition, we release S2C-Completion, an expert-annotated dataset of over 8,500 object-CAD pairs from Scan2CAD, created for real-world indoor single-object completion and intended as a new benchmark for this task. For object-CAD alignment, we incorporate symmetry information via a symmetry-aware loss, improving robustness to symmetric ambiguities. On the Scan2CAD benchmark, CAOA achieves a 17% accuracy improvement over state-of-the-art methods.

07.
arXiv (CS.AI) 2026-06-19

UniMM: A Unified Mixture Model Framework for Multi-Agent Simulation

arXiv:2501.17015v2 Announce Type: replace Abstract: Simulation plays a crucial role in assessing autonomous driving systems, where the generation of realistic multi-agent behaviors is a key aspect. In multi-agent simulation, the primary challenges include behavioral multimodality and closed-loop distributional shifts. In this study, we formulate a unified mixture model (UniMM) framework for generating multimodal agent behaviors, which can cover the mainstream methods including regression-based mixture models and discrete NTP models. Furthermore, we introduce a closed-loop sample generation approach tailored for mixture models to mitigate distributional shifts. Within the UniMM framework, we recognize critical configurations from both the model and data perspectives. We conduct a systematic examination of various model configurations, and comprehensively characterize their effects. Moreover, our investigation into the data configuration highlights the pivotal role of closed-loop samples in achieving realistic simulations. To extend the benefits of closed-loop samples across a broader range of mixture models, we further introduce a temporal disentanglement-and-alignment mechanism to address the shortcut learning and off-policy learning issues. Leveraging insights from our exploration, the distinct variants proposed within the UniMM framework, including discrete, anchor-free, and anchor-based models, all achieve state-of-the-art performance on the WOSAC benchmark.

08.
arXiv (CS.CL) 2026-06-17

Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication

Two recent studies (Jones et al. (2026); Zeng et al. (2026)) reach apparently contradictory conclusions about whether LVLMs can coordinate on efficient referring expressions. We control for task differences between the studies while directly comparing their prompting styles. We replicate the finding that models can coordinate efficient referring expressions when explicitly prompted to do so, suggesting that other task differences are not responsible for divergent results. However, we also find that the same models fail to infer the need for communicative efficiency from a more implicit prompt, highlighting critical differences between how humans and AI systems communicate.

09.
arXiv (quant-ph) 2026-06-12

Quantum Stochastic Inflation

arXiv:2606.12636v1 Announce Type: cross Abstract: We formulate stochastic inflation in an open quantum system framework. The field coarse-grained in a patch of fixed physical size, and the total momentum of that patch, form a canonical pair and act on a one-mode Fock space which we identify as the "bulk". At each time step, new comoving modes join the coarse-grained patch and the bulk has to be redefined. This redefinition produces an entangled mode that is traced over, yielding a non-unitary evolution equation for the bulk's density matrix. For a free test field in de Sitter, one obtains GKLS dynamics, generated by an effective Hamiltonian and a single non-Hermitian Lindblad operator, hence diffusion and Hubble friction originate from the same quantum channel. The Wigner-Weyl transform of the GKLS equation leads to a Fokker-Planck equation for the Wigner function, which matches the one that applies to the classical phase-space distribution of stochastic inflation. We also provide several schemes under which one can unravel the GKLS dynamics into stochastic Schrodinger equations when continuous measurements of the decoupled mode are performed, making contact with Langevin formulations of stochastic inflation. In the light-field regime, an additional overdamped reduction can be performed by integrating out the momentum variable in the Wigner distribution, leading to Starobinsky's slow-roll Fokker-Planck equation. In that regime, the purity of the patch is strongly suppressed. In contrast, for heavy fields, field diffusion is suppressed and the coarse-grained patch remains close to a pure underdamped oscillator, which prevents a classical stochastic treatment.

10.
arXiv (CS.CL) 2026-06-16

Understanding the Behaviors of Environment-aware Information Retrieval

Recent retrieval-augmented generation (RAG) approaches have demonstrated strong capability in handling complex queries, yet current research overlooks a critical challenge: different retrievers require fundamentally different query formulation strategies for optimal performance. In this work, we present the first systematic analysis of how LLMs can learn to adapt their query formulation strategies for different retrievers via reinforcement learning (RL). Our empirical study reveals that RL effectively teaches an LLM to tailor its queries to specific retriever characteristics. We discover that different retrievers exhibit surprisingly distinct optimal query styles (e.g., descriptive vs. question-like), suggesting strategies learned for one retriever ineffective for another. We further show that performance can be enhanced by incorporating retriever-specific human guidance and by scaling model size. To facilitate learning over multi-retrieval-step trajectories, we introduce a branching-based rollout technique that improves training stability. Our work provides the first empirical evidence and actionable insights for building truly retriever-aware RAG systems. Code and resources are available at https://github.com/LCO-Embedding/Envs-aware-Information-Retrieval.

11.
arXiv (CS.CV) 2026-06-16

Bridging Geographic Bias in Urban Streetscape Inference via Lifelong Learning with Visual-Semantic Pivoting

作者:

Visual perception of urban streetscapes underpins evidence-based decisions in landscape planning, public health, and place-making. Yet models trained on a few well-photographed metropolises systematically misjudge underrepresented districts, propagating geographic bias into downstream policy. We address this gap with HVSP-LL, a lifelong learning framework that couples a stratified visual-semantic pivoting module with an equity-aware rehearsal mechanism. The pivoting module organises landscape concepts along a three-tier ontology (macro structure, meso composition, micro element) and aligns image features to learnable semantic anchors at each tier, providing transferable representations that resist distributional drift. The lifelong adaptation component sequentially absorbs new urban regions while constraining inter-region perception gaps through a worst-region sample-reweighting objective and a structurally-aware exemplar buffer. We evaluate HVSP-LL on a panoramic streetscape benchmark assembled from twelve cities across four continents and seven perceptual dimensions. The framework attains 0.834 Spearman correlation on the held-out city sequence, an absolute 6.1 point improvement over the strongest continual baseline, and shrinks the inter-city perception gap to 0.094 – a 38% reduction relative to the strongest continual baseline (0.151) and a 57% reduction relative to a representative regularisation baseline (0.218). Ablations confirm that each tier of the pivoting hierarchy contributes monotonically, and the equity-aware rehearsal converts mean backward transfer from -0.038 (without retention) to +0.013, eliminating catastrophic forgetting on the held-out sequence. Our results indicate that hierarchical anchoring is a practical pathway toward geographically equitable streetscape inference at city scale.

12.
arXiv (CS.CV) 2026-06-16

OneFocus: Enabling Real-World X-ray Security Screening with a Unified Vision-Language Model

X-ray contraband detection is critical for security in large-scale logistics and transportation, yet conventional detectors struggle to adapt to emerging contraband types and lack fundamental visual understanding. Vision-language models (VLMs) offer strong generalization but are hindered by the scarcity of high-quality X-ray image-caption data. To bridge this critical gap, we present MMXray, a meticulously curated benchmark of 52,124 image-caption pairs spanning 28 fine-grained classes of X-ray contraband. To enrich MMXray with realistic occlusion patterns, we further introduce CleanDET, a dedicated synthesis dataset containing clean foreground contraband images from 28 categories and background images with diverse density levels, together with AnyContraSyn, a controllable synthesis method designed to operate on CleanDET. We also develop OnePipe, an extensible pipeline for systematic data curation. Built on MMXray, we propose OneFocus, a unified VLM that supports four core tasks: visual question answering, contraband localization, classification, and image understanding. OneFocus achieves state-of-the-art performance in X-ray contraband understanding and demonstrates robust cross-domain generalization, establishing a strong vision-language baseline for security screening.

13.
arXiv (CS.CL) 2026-06-16

CAF-Gen: A Multi-Agent System for Enriching Argumentation Structures

Formalizing complex reasoning from natural text is one of the central challenges in computational linguistics. It requires systems to understand not just keywords but also the context and complex reasoning embedded in a text. Current Argument Mining (AM) techniques identify basic claims and premises, yet they often struggle to capture the richer structural information required by advanced schemas such as the Carneades Argumentation Framework (CAF), which incorporates features such as premise types, proof standards, and argument schemes. We address this limitation by introducing CAF-Gen, an automated multi-agent framework designed to enrich shallow argument structures into CAF-compliant argument models. By employing an iterative Creator-Reviewer pipeline, a creator agent's output is validated by a critical agent to ensure structural integrity. This multi-agent collaboration is crucial for mitigating the structural instability typical of single-pass generative models. Our experiments demonstrate that the iterative feedback loop improves the quality of the resulting data and achieves strong alignment with the original annotations, while producing structurally richer models. Our findings show that the multi-agent system can overcome the limitations of single-pass generation, providing a robust methodology for the automated modeling of formal argumentation.

14.
arXiv (CS.AI) 2026-06-17

A Machine-Learned Comorbidity Index

arXiv:2606.17450v1 Announce Type: new Abstract: Traditional comorbidity scores (e.g., Charlson and Elixhauser) are widely used for risk adjustment and patient stratification, but they have two key limitations: (i) they are largely mortality-centric and do not align well with other clinical outcomes, and (ii) their linear, rule-based structure cannot capture nonlinear, outcome-specific risk relationships. We propose a Machine-Learned Comorbidity Index (MLCI) that maps diagnosis codes to a single scalar by maximizing the normalized Hilbert-Schmidt Independence Criterion (nHSIC) between the learned score and multiple clinical outcomes. MLCI captures nonlinear risk-outcome dependence and is supported by a theory that characterizes when a unified, informative admission-level ordering can be achieved across outcomes. Empirical results on multiple benchmark electronic health record (EHR) datasets show that MLCI outperforms strong baselines across multiple evaluation metrics.

15.
arXiv (CS.CV) 2026-06-16

DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment

Recent multimodal large language models (MLLMs) have shown promising performance on video quality assessment (VQA) tasks. However, adapting them to new scenarios remains expensive due to large-scale retraining and costly mean opinion score (MOS) annotations. In this paper, we argue that a pretrained MLLM already provides a useful perceptual prior for VQA, and that the main challenge is to efficiently calibrate this prior to the target MOS space. Based on this insight, we propose DPC-VQA, a decoupling perception and calibration framework for video quality assessment. Specifically, DPC-VQA uses a frozen MLLM to provide a base quality estimate and perceptual prior, and employs a lightweight calibration branch to predict a residual correction for target-scenario adaptation. This design avoids costly end-to-end retraining while maintaining reliable performance with lower training and data costs. Extensive experiments on both user-generated content (UGC) and AI-generated content (AIGC) benchmarks show that DPC-VQA achieves competitive performance against representative baselines, while using less than 2% of the trainable parameters of conventional MLLM-based VQA methods and remaining effective with only 20% of MOS labels. The code will be released upon publication.

16.
arXiv (CS.AI) 2026-06-19

Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts

arXiv:2402.14035v4 Announce Type: replace-cross Abstract: Knowledge distillation from foundation models to compact domain models is challenging due to substantial gaps in capacity, architecture, and modality. For example, in our experiments, distilling from a 76M-parameter language model to a 2M-parameter recommender closes less than 40% of the performance gap between the undistilled student and the teacher. We show that introducing domain-specific experts – which share the student's architectural characteristics – alongside the foundation model as a diverse teacher committee significantly improves transfer. However, standard multi-teacher methods fail to exploit this diversity: naively combining heterogeneous teachers can degrade performance below single-teacher distillation. To address this, we propose DiverseDistill, an interactive distillation framework that employs a learnable Question-Answer mechanism to generate teacher-conditioned queries and align heterogeneous teacher outputs into the student's representation space. Unlike methods requiring gradient-based co-optimization or architectural modification of teachers, DiverseDistill operates with frozen teachers using only forward-pass inference through their intermediate layers: no parameter updates, no co-training, and no architectural surgery. A dynamic teacher importance mechanism further reduces training cost by filtering low-relevance teachers per sample (e.g., ~30% fewer forward passes with no quality loss for recommendation tasks), while the entire Distillation Module is discarded after training, adding zero inference overhead. Evaluations on recommendation (38x compression) and vision (3.6x compression) tasks demonstrate that DiverseDistill recovers 73-114% of the teacher-student performance gap, consistently outperforming all single- and multi-teacher baselines.

17.
arXiv (CS.AI) 2026-06-12

Mechanical Conscience: A Mathematical Framework for Dependability of Machine Intelligenc

arXiv:2605.03847v2 Announce Type: replace Abstract: Distributed collaborative intelligence (DCI), encompassing edge-to-edge architectures, federated learning, transfer learning, and swarm systems, creates environments in which emergent risk is structurally unavoidable: locally correct decisions by individual agents compose into globally unacceptable behavioral trajectories under uncertainty. Existing approaches such as constrained optimization, safe reinforcement learning, and runtime assurance evaluate acceptability at the level of individual actions rather than across behavioral trajectories, and none addresses the multi-participant, uncertainty-laden nature of DCI deployments. This paper introduces mechanical conscience (MC), a novel concept and simplified mathematical framework that operationalizes trajectory-level normative regulation for both single-agent and distributed intelligent systems. Mechanical conscience is defined as a supervisory filter that minimally corrects a baseline policy's actions to reduce cumulative deviation from a normatively admissible region, while accounting for epistemic uncertainty. We introduce associated constructs, conscience score, mechanical guilt, and resonant dependability, that provide an interpretable vocabulary and computable governance signals for this emerging field. Core theoretical properties are established: admissibility equivalence, existence of optimal regulation, and monotonic deviation reduction. Illustrative results demonstrate that MC-regulated agents maintain trajectory-level normative acceptability where conventional controllers drift outside admissible bounds, and that the framework naturally extends to suppress interaction-induced emergent risk in multi-agent DCI settings.

19.
arXiv (CS.LG) 2026-06-17

Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

arXiv:2606.17233v1 Announce Type: new Abstract: In many engineering applications, a single high-fidelity model produces multiple quantities of interest (QoIs) under the same input parameters, e.g. finite element models of complex physical systems. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design (ED). However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized for vector valued QoIs. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance-based exploration of the input space and exploitation of aggregated variance information across all outputs. Its performance is compared with non-sequential Latin Hypercube Sampling through several numerical examples from engineering problems. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second-order statistics.

20.
arXiv (CS.LG) 2026-06-16

In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning

arXiv:2510.10981v3 Announce Type: replace-cross Abstract: This paper develops a finite-sample statistical theory for in-context learning (ICL), analyzed within a meta-learning framework that accommodates mixtures of diverse task types. We introduce a principled risk decomposition that separates the total ICL risk into two orthogonal components: Bayes Gap and Posterior Variance. The Bayes Gap quantifies how well the trained model approximates the Bayes-optimal in-context predictor. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts and their context length. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty. Our key finding is that this term is determined solely by the difficulty of the true underlying task, while the uncertainty arising from the task mixture vanishes exponentially fast with only a few in-context examples. Together, these results provide a unified view of ICL: the Transformer selects the optimal meta-algorithm during pretraining and rapidly converges to the optimal algorithm for the true task at test time.

21.
arXiv (CS.AI) 2026-06-16

Visualizing Uncertainty: Spatial Maps of Missing and Conflicting Evidence in Deep Learning

arXiv:2606.15767v1 Announce Type: cross Abstract: Understanding when and why deep neural networks are uncertain is crucial for deploying reliable machine learning systems in safety-critical domains. While existing uncertainty quantification methods provide scalar measures of model confidence, they offer limited insight into which spatial regions of an input contribute to different types of uncertainty. We propose a novel visualization framework, Uncertainty Activation Map (UAM), that combines Evidential Deep Learning (EDL) with Full-Gradient Class Activation Mapping (FullGrad) to generate interpretable spatial uncertainty activation maps. Our approach distinguishes between two fundamental types of uncertainty: vacuity, representing lack of evidence, and dissonance, capturing conflicting evidence between competing hypotheses. By leveraging the complete gradient decomposition property of FullGrad and the principled uncertainty quantification of Subjective Logic, our method produces theoretically grounded visualizations that highlight specific image regions responsible for model uncertainty. With this framework, vacuity and dissonance activation maps are generated by computing belief-weighted attributions, enabling identification of where models lack knowledge versus where they encounter ambiguous evidence. Extensive evaluations across multiple benchmark datasets demonstrate that the proposed framework effectively addresses the critical gap between uncertainty quantification and explainability, providing intuitive visual feedback to assess model reliability in complex visual recognition tasks.

22.
arXiv (CS.CV) 2026-06-17

TerraTransfer: Learning End-to-End Driving Policies Without Expert Demonstrations

End-to-end autonomous driving has achieved state-of-the-art performance on benchmarks and real-world deployments. Its standard training recipe, however, is expensive across all stages: collecting and labeling millions of driving frames is costly, and closed-loop RL on images is bottlenecked by the per-step cost of photorealistic rendering plus a forward pass through a large vision backbone. Self-play in vectorized simulators changes the economics: millions of rollout steps per second, and a state distribution naturally rich in collisions, near-misses, and recoveries that no driving log contains. Our approach exploits this asymmetry by decoupling learning to drive from learning to see. We pretrain a single policy by self-play, then align its latent space with a pretrained vision backbone, through the action KL divergence and a batch-relational low-rank structural loss. The action target comes from the self-play policy, so alignment never supervises against a logged trajectory: a paired dataset of (image, scene-state) frames suffices, with no need for the curated expert demonstrations that imitation pretraining is built on. On photorealistic 3D Gaussian splatting closed-loop scenarios, the resulting end-to-end policy matches or exceeds prior end-to-end methods.

23.
arXiv (CS.CL) 2026-06-12

Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models

Latent reasoning models (LRMs) replace explicit chain-of-thought with continuous thoughts. Recent work treats observable latent-state patterns, such as BFS-like frontiers and decodable arithmetic computation, as evidence for internal reasoning mechanisms. Evaluating two LRMs (Coconut and CODI) against controls lacking the proposed recurrence or curriculum, we find these patterns also appear in the controls and do not always causally affect behavior. Causal interventions reveal that latent-thought utilization is not binary but graded, scaling with a thought's causal effect on model behavior. Geometric analyses reveal this effect concentrates in low-rank directions whose step-to-step geometry grows more structured as their behavioral influence increases. Latent thoughts should therefore be treated as hidden computation, not hidden explanation: decodability, attention, or static structure alone cannot establish mechanism. LRM interpretability thus requires matched controls and causal tests.

24.
medRxiv (Medicine) 2026-06-11

Long-term Penetrance of Disease Variants in Genes Prioritized for Genomic Newborn Screening: Evidence from Adult Biobanks

Importance: Genomic newborn screening (gNBS) is a potential public health intervention, but its positive predictive value (PPV) remains uncertain. Estimating the prevalence and penetrance of pathogenic and likely pathogenic (P/LP) variants in genes prioritized for screening may clarify the long-term PPV and clinical utility of gNBS. Objective: To compare ICD-based ascertainment, electronic medical record (EMR) review, and clinical assessment of genetic disorders in adults with P/LP variants in 54 genes prioritized for gNBS. Design: Two-cohort observational study with EMR review and clinical assessment in the hospital-based cohort. Setting: The U.K. Biobank (UKB) and Mass General Brigham Biobank (MGBB). Participants: 451,877 adults from the UKB and 53,371 from the MGBB, all with exome sequencing data. Exposures: P/LP variants in 54 genes prioritized through expert consensus for gNBS, in genotypes consistent with each gene's inheritance pattern. Main outcomes and measures: The primary outcome was the absolute difference in the proportion of MGBB participants identified as affected by ICD versus EMR ascertainment. Secondary outcomes included findings from clinical assessments of undiagnosed MGBB participants, corrected UKB penetrance estimates, and extrapolation to U.S.. annual birth cohorts and living adults. Results: P/LP variants were identified in 665 UKB participants (0.15%) and 82 MGBB participants (0.15%), approximately 1 in 650. In MGBB, EMR review revealed that 58/82 individuals (70.7%) were undiagnosed, although 25 of 58 (43.1%) had documented symptoms. Disease-associated ICD codes were found in 39.0% (32/82) of participants, whereas EMR review identified symptoms in 59.8% (49/82, McNemar P

25.
arXiv (CS.LG) 2026-06-16

When to use what Schatten-$p$ norm in deep learning?

arXiv:2606.15268v1 Announce Type: new Abstract: Schatten-$\infty$ based optimizers such as Muon have shown promising empirical performance, but there remains seemingly conflicting observations regarding whether they are beneficial. We resolve this conflict by showing that the conclusion is regime dependent. Even when the objective is smooth in the Schatten-$\infty$ geometry, smaller Schatten-$p$ geometries can be optimal, specifically in the low-dimensional regime, which we show includes Chinchilla scaling. This conclusion follows from a new noise-robust acceleration result for the SODA framework for $p>2$. The same analysis explains why Muon-like methods do not require warmup, why they naturally favor large batches, and yields a batch size scaling rule for arbitrary $p$.