Academic Intelligence · Curated Daily

Explore the frontiers of global academic research

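AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across intersecting fields.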

01.
arXiv (CS.CV) 2026-06-16

Effective and Low-cost Lane-based Map Localization for Vehicle-Centric Route Generation

Driver-centric route representation plays a vital role in intuitive driving guidance systems. This paper presents OLRA, a low-cost, map-localization-based framework that derives driver-view-aligned routes by matching map-based navigation routes with camera-detected lane markings. This alignment process mutually enhances vehicle localization accuracy and visual route consistency. To bridge the evaluation gap across different paradigms, we introduce practical route evaluation metrics and benchmark OLRA against OpenPilot, a representative direct-generation approach. Experimental results on the nuScenes dataset demonstrate that OLRA outperforms OpenPilot in complex road segments and in route estimation at distances beyond 20 meters, achieving lower overall Euclidean error. This study is expected to promote future research in low-cost, map-localization-based route generation methods.

02.
arXiv (CS.AI) 2026-06-12

TimeROME-DLM: Temporal Causal Tracing and Low-Rank Inference-Time Knowledge Editing for Masked Diffusion Language Models

arXiv:2606.12841v1 Announce Type: cross Abstract: Masked diffusion language models (MDLMs) such as LLaDA now rival autoregressive (AR) LLMs, but every existing knowledge-editing and unlearning method (ROME, MEMIT, etc.) targets AR transformers and either makes assumptions that fail under iterative denoising, or requires gradient updates whose backward-pass activations cost tens of GB of extra VRAM and which collapse MDLMs at standard learning rates. We introduce TimeROME-DLM, the first training-free, gradient-free, inference-time knowledge-editing framework for MDLMs. It couples two components: a Temporal Indirect Effect (TIE) causal-tracing protocol that identifies, for each fact, the coordinate whose intervention most strongly drives the object prediction at later denoising steps; and a closed-form, low-rank residual edit memory that aggregates subject keys and target deltas across all forget facts and applies a single ridge-regularised update at that coordinate at every diffusion forward, with sparsification to limit utility spillover. Backbone weights stay frozen; only three hyperparameters (alpha, lambda, q) are tuned on a small validation split. On TOFU forget01 with TOFU-finetuned LLaDA-8B-Base, TimeROME-DLM cuts forget-set log-probability by roughly 83 nats. The same configuration transfers to LLaDA-8B-Instruct, Dream-7B, MMaDA-8B, DiffuLLaMA-7B, and LLaDA-MoE-1.4B. It keeps retain-set log-probability nearly flat (within ~1 nat at the utility-safe operating point) across 50 sequentially inserted facts, delivers a four- to fourteen-fold wall-clock speedup with zero additional VRAM over the strongest converged training-time baseline, and scales sub-linearly to 400 facts. TimeROME-DLM closes the locate-then-edit gap between AR LLMs and MDLMs at a fraction of the computational cost.

03.
arXiv (CS.LG) 2026-06-12

From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints

arXiv:2512.23566v2 Announce Type: replace-cross Abstract: How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations, or rely on geometric arguments that apply only to conservative systems, limiting the range of dynamics they can recover. Here, we present a new framework that reconciles these two perspectives by reformulating inference as a stochastic control problem. Our method uses geometry-driven path augmentation, guided by the geometry in the system's invariant density to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models. Applied to overdamped Langevin systems, our approach accurately recovers stochastic dynamics even from extremely undersampled data, outperforming existing methods in synthetic benchmarks. This work demonstrates the effectiveness of incorporating geometric inductive biases into stochastic system identification methods.

04.
medRxiv (Medicine) 2026-06-15

Differential DNA Methylation and Delirium After Anesthesia and Surgery

Background: DNA methylation is an epigenetic modification that regulates gene expression in response to environmental exposures. We measured differential DNA methylation levels in blood before and after general anesthesia and surgery in participants with and without postoperative delirium (POD) and postoperative neurocognitive disorder (PNCD). Methods: Blood sampling, delirium assessment and cognitive testing were prospectively performed at baseline before non-cardiac, non-neurologic surgery, and at 24 hours (24h) and 6 weeks (6wk) thereafter in 94 participants comprising 13 with POD and 81 without POD, and 40 with PNCD and 54 without PNCD 6wk after surgery who were matched for age and sex in the INTUIT and MADCO cohorts. DNA methylation was assessed using the Illumina Infinium MethylationEPIC Beadchip. Results: 132 differentially methylated positions (DMPs) annotated to 198 differentially methylated genes (DMGs) were identified in 94 participants 24h after surgery compared to baseline with a local false discovery rate (LFDR)

05.
arXiv (CS.CL) 2026-06-16

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Recent progress in large language models (LLMs) has focused on test-time scaling to improve reasoning via increased inference computation, but often at the cost of efficiency. We revisit test-time behavior and uncover a simple yet underexplored phenomenon: reasoning uncertainty is highly localized, and only a small subset of high-entropy tokens dominantly affects output correctness. Motivated by this, we propose Minimal Test-Time Intervention (MTI), a training-free framework that enhances reasoning accuracy and stability with minimal overhead. MTI includes: (i) Selective CFG intervention, applying classifier-free guidance only at uncertain positions; and (ii) Lightweight negative-prompt guidance, reusing the main model's KV cache to approximate unconditional decoding efficiently. MTI yields consistent gains across general, coding, and STEM tasks (e.g., +9.28% average improvement on six benchmarks for DeepSeek-R1-7B and +11.25% on AIME2024 using Ling-mini-2.0) while remaining highly efficient.
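The selective intervention described above can be illustrated with a minimal sketch. It assumes the standard classifier-free guidance combination (negative-prompt logits plus a scaled difference) and an entropy threshold; the function names, the `threshold` and `scale` values, and the toy logits are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def selective_cfg(cond_logits, neg_logits, threshold=1.0, scale=1.5):
    """Apply guidance only where the conditional distribution is uncertain.

    threshold and scale are hypothetical values chosen for illustration.
    """
    if entropy(softmax(cond_logits)) < threshold:
        # Confident position: leave the logits untouched.
        return list(cond_logits)
    # Uncertain position: standard CFG mix of negative and conditional logits.
    return [n + scale * (c - n) for c, n in zip(cond_logits, neg_logits)]

confident = selective_cfg([10.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
uncertain = selective_cfg([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
print(confident, uncertain)
```

In this toy case the peaked distribution is passed through unchanged, while the uniform one (entropy ln 4) is pushed away from the token favored by the negative prompt.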

06.
Science (Express) 2026-06-18

Indium-free perovskite/silicon tandem solar cells with tin oxide recombination layer and electrodes

Author: unknown

Indium-based transparent conductive oxides are widely used as electrodes and recombination layers in perovskite/silicon tandem solar cells, yet their scalability is constrained by indium scarcity and sputtering-induced damage. Here we report high-efficiency, stable indium-free perovskite/silicon tandem solar cells enabled by reactive plasma deposited tin oxide (RPD-SnO$_x$). With RPD-SnO$_x$ as the recombination layer, a certified efficiency of 33.6% is achieved. Fully indium-free tandems that used RPD-SnO$_x$ as both recombination layer and electrodes delivered a champion PCE of 33.2% (1 cm$^2$) and a mini-module with a certified efficiency of 31.0% (207.9 cm$^2$). Dense and uniform self-assembled monolayer anchoring enabled by RPD-SnO$_x$ suppressed non-radiative recombination and reduced halide migration. Indium-free mini-modules exhibited high thermal, damp-heat, and outdoor operational stability and retained 65% of their maximum initial efficiency after 105 days of outdoor operation.

07.
arXiv (CS.LG) 2026-06-17

Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability

arXiv:2606.18080v1 Announce Type: new Abstract: Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/\eta$, where $\eta$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here, motivating the search for a continuous-time model valid at EoS. We propose Edge Flow, a system of three coupled ordinary differential equations that provides a tractable, faithful, and predictive model of gradient descent dynamics at EoS. Edge Flow decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude. The center follows a modified gradient flow on a symmetrized loss; the direction tracks a top eigenvector of the Hessian via Rayleigh quotient dynamics; and the magnitude grows or decays exponentially depending on whether the sharpness exceeds or falls below the threshold $2/\eta$. Crucially, sharpness stabilization emerges from the coupled dynamics via a self-stabilization feedback loop. Discretizing Edge Flow only requires two gradient evaluations and one Hessian–vector product at each iteration. We demonstrate empirically that Edge Flow tracks the dynamics of gradient descent at least as faithfully as previously proposed continuous-time EoS models, while in addition resolving the oscillation of the sharpness at the onset of EoS, and that it provides a principled framework for understanding and mitigating instabilities in this regime.
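Why the threshold sits at $2/\eta$ can be seen in the textbook one-dimensional case, sketched below. This is not Edge Flow itself: it is plain gradient descent on the quadratic $f(x) = \tfrac{1}{2}\lambda x^2$, where each step multiplies $x$ by $(1 - \eta\lambda)$, so the iterate shrinks when $\lambda < 2/\eta$ and grows when $\lambda > 2/\eta$. The function name and the chosen values are illustrative.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2; return |x| at the end."""
    x = x0
    for _ in range(steps):
        x -= lr * sharpness * x  # gradient of f is sharpness * x
    return abs(x)

lr = 0.1
threshold = 2.0 / lr  # stability threshold 2 / eta

below = gd_on_quadratic(0.9 * threshold, lr)  # per-step factor |1 - 1.8| = 0.8
above = gd_on_quadratic(1.1 * threshold, lr)  # per-step factor |1 - 2.2| = 1.2
print(below, above)
```

The oscillation magnitude decays geometrically just below the threshold and grows geometrically just above it, which matches the exponential growth or decay of the magnitude variable described in the abstract.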

08.
arXiv (CS.AI) 2026-06-16

SPARK: Security Knowledge Priming and Representation-Guided Knowledge Activation for LLM-based Secure Code Generation

arXiv:2606.16244v1 Announce Type: cross Abstract: Large language models routinely generate code with exploitable security flaws. Prior literature attributes this limitation to a lack of security expertise, steering current defense mechanisms toward heavy fine-tuning or external knowledge retrieval, which introduces significant computational overhead and data bias through redundant code examples. Contrary to this view, we argue that pretraining corpora are already rich in security material. The bottleneck is activation: without an explicit and brief cue, statistical pressure toward common training-distribution patterns suppresses the model's safety-relevant representations. We present SPARK, an inference-time security harness that activates this latent knowledge without any retraining. The harness has two parts. Component I retrieves a few of the relevant Common Weakness Enumeration (CWE) entries for each coding task and appends a short structured cue to the prompt; this alone is enough to surface the model's existing security representations. Component II adds a precomputed token bias to the logits at every decoding step. We obtain the bias by projecting a safe-direction vector, the unit difference between the mean safe and mean unsafe last-layer hidden states, through the language model head. The bias is computed once offline; applying it costs a single vector addition per generated token. We evaluate SPARK on 9 open-source models across C++, Java, and Python, and compare with 7 baselines spanning fine-tuning and retrieval-augmented methods. SPARK matches or improves on the best baseline in every setting while preserving HumanEval utility. We further test Component I in a black-box setting on 7 of today's strongest models, including Claude, DeepSeek, and GPT, demonstrating the bottleneck of insecure code generation and the improvements enabled by our method.
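The logit-bias construction in the second component can be sketched on toy data as follows. This is a minimal illustration of the description in the abstract (unit difference of mean hidden states, projected through the language model head, added to the logits), not the authors' code; the helper names, the `strength` parameter, and all numbers are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def safe_direction(safe_states, unsafe_states):
    """Unit vector pointing from the mean unsafe state to the mean safe state."""
    diff = [s - u for s, u in zip(mean_vector(safe_states), mean_vector(unsafe_states))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def token_bias(lm_head, direction, strength=1.0):
    """Project the direction through the LM head (one row per vocabulary token).

    strength is an assumed scaling knob, not a parameter named in the abstract.
    """
    return [strength * sum(w * d for w, d in zip(row, direction)) for row in lm_head]

def apply_bias(logits, bias):
    """Per-step cost: a single vector addition."""
    return [l + b for l, b in zip(logits, bias)]

# Toy example: hidden size 2, vocabulary size 3.
safe = [[1.0, 0.0], [3.0, 0.0]]
unsafe = [[0.0, 0.0], [0.0, 0.0]]
head = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

direction = safe_direction(safe, unsafe)
bias = token_bias(head, direction)       # computed once, offline
biased = apply_bias([0.0, 0.0, 0.0], bias)
print(direction, bias, biased)
```

Tokens whose head rows align with the safe direction are boosted, and tokens aligned against it are suppressed.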

09.
arXiv (CS.LG) 2026-06-11

RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

arXiv:2606.11761v1 Announce Type: new Abstract: Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss by periodically selecting representative subsets of input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes samples having high loss for populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only $10\%$ data, RCAP delivers $>1\%$ improvement in performance on class-imbalanced datasets compared to full data training while providing an average $8.69\times$ speedup. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning

10.
medRxiv (Medicine) 2026-06-11

Long-term Penetrance of Disease Variants in Genes Prioritized for Genomic Newborn Screening: Evidence from Adult Biobanks

Importance: Genomic newborn screening (gNBS) is a potential public health intervention, but its positive predictive value (PPV) remains uncertain. Estimating the prevalence and penetrance of pathogenic and likely pathogenic (P/LP) variants in genes prioritized for screening may clarify the long-term PPV and clinical utility of gNBS. Objective: To compare ICD-based ascertainment, electronic medical record (EMR) review, and clinical assessment of genetic disorders in adults with P/LP variants in 54 genes prioritized for gNBS. Design: Two-cohort observational study with EMR review and clinical assessment in the hospital-based cohort. Setting: The U.K. Biobank (UKB) and Mass General Brigham Biobank (MGBB). Participants: 451,877 adults from the UKB and 53,371 from the MGBB, all with exome sequencing data. Exposures: P/LP variants in 54 genes prioritized through expert consensus for gNBS, in genotypes consistent with each gene's inheritance pattern. Main outcomes and measures: The primary outcome was the absolute difference in the proportion of MGBB participants identified as affected by ICD versus EMR ascertainment. Secondary outcomes included findings from clinical assessments of undiagnosed MGBB participants, corrected UKB penetrance estimates, and extrapolation to U.S. annual birth cohorts and living adults. Results: P/LP variants were identified in 665 UKB participants (0.15%) and 82 MGBB participants (0.15%), approximately 1 in 650. In MGBB, EMR review revealed that 58/82 individuals (70.7%) were undiagnosed, although 25 of 58 (43.1%) had documented symptoms. Disease-associated ICD codes were found in 39.0% (32/82) of participants, whereas EMR review identified symptoms in 59.8% (49/82, McNemar P

11.
arXiv (CS.LG) 2026-06-11

Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

arXiv:2603.05573v2 Announce Type: replace Abstract: Scalable sequence models, such as Transformer variants and structured state-space models, often trade expressive power for sequence-level parallelism, which enables efficient training. Here we examine the bounds on error and how error scales when models operate outside of their expressivity regimes using a Lie-algebraic control perspective. Our theory formulates a correspondence between the depth of a sequence model and the tower of Lie algebra extensions. Echoing recent theoretical studies, we characterize the Lie-algebraic class of constant-depth sequence models and their corresponding expressivity bounds. Furthermore, we analytically derive an approximation error bound and show that error diminishes exponentially as the depth increases, consistent with the strong empirical performance of these models. We validate our theoretical predictions using experiments on symbolic word and continuous-valued state-tracking problems.

12.
arXiv (math.PR) 2026-06-15

Limiting partition function for the Mallows model: a conjecture and partial evidence

arXiv:2406.18855v2 Announce Type: replace Abstract: Let $S_n$ denote the set of permutations of $n$ labels. We consider a class of Gibbs probability models on $S_n$ that is a subfamily of the so-called Mallows model of random permutations. The Gibbs energy is given by a class of right invariant divergences on $S_n$ that includes common choices such as the Spearman foot rule and the Spearman rank correlation. Mukherjee in 2016 computed the limit of the (scaled) log partition function (i.e. normalizing factor) of such models as $n\rightarrow \infty$. Our objective is to compute the exact limit, as $n\rightarrow \infty$, without the log. We conjecture that this limit is given by the Fredholm determinant of an integral operator related to the so-called Schrödinger bridge probability distributions from optimal transport theory. We provide partial evidence for this conjecture, although the argument lacks a final error bound that is needed for it to become a complete proof.

13.
arXiv (CS.CV) 2026-06-19

Contour-Constrained Deformable Registration with Parameter Characterization for Head and Neck Surgical Guidance

With 890,000 annual new cases globally, head and neck squamous cell carcinoma has one of the highest recurrence rates among solid malignancies. Although frozen section analysis is the standard of care for intraoperative margin assessment, accurately relocating detected positive margins on the resection bed remains challenging due to imprecise alignment between resected specimens and their resection bed, compounded by post-resection mucosal tissue shrinkage. We present a biomechanics-driven deformable registration framework that corrects post-resection tissue deformation to provide intraoperative guidance. Our approach registers 3D specimen meshes to intraoperative resection bed point clouds using a deformable registration approach based on regularized Kelvinlet basis functions. The registration matches surface point clouds, fiducial landmarks, and boundary contour constraints that directly penalize perpendicular distance-to-agreement between specimen and resection bed boundaries. Across nine specimens from skin, buccal mucosa, and tongue sites, the overall mean target registration error was $11.11 \pm 4.07$ mm using rigid registration, which decreased to $8.20 \pm 2.68$ mm (26.19% reduction) using deformable registration without contour constraint. The proposed contour-constrained deformable registration further reduced the error to $5.62 \pm 2.28$ mm, a 49.41% reduction relative to rigid registration. We observed the largest reduction in the most clinically challenging tongue specimens. We also performed a systematic two-stage parameter search to characterize the relative importance of surface alignment, fiducial correspondences, contour constraint, and strain energy regularization. This search revealed that contour weighting dominates registration accuracy for tissue types with large lateral deformation, while the algorithm operates over a broad range of parameter combinations.
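The two reported reductions follow directly from the mean target registration errors, as the short arithmetic check below shows. Only the three mean values come from the abstract; the function and variable names are illustrative.

```python
def percent_reduction(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline

rigid = 11.11        # mm, rigid registration
deformable = 8.20    # mm, deformable registration without contour constraint
contour = 5.62       # mm, contour-constrained deformable registration

no_contour_gain = round(percent_reduction(rigid, deformable), 2)
contour_gain = round(percent_reduction(rigid, contour), 2)
print(no_contour_gain, contour_gain)
```

Both percentages are measured against the rigid baseline, so the contour constraint accounts for the difference between the two figures.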

14.
arXiv (CS.AI) 2026-06-16

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

arXiv:2603.00680v4 Announce Type: replace Abstract: Long-horizon agents face the challenge of growing context size during interaction with the environment, which degrades performance and stability. Existing methods typically introduce an external memory module and look up the relevant information from the stored memory, which prevents the model itself from proactively managing its memory content and aligning with the agent's overarching task objectives. To address these limitations, we propose the self-memory policy optimization algorithm (MemPO), which enables the agent (policy model) to autonomously summarize and manage its memory during interaction with the environment. By improving the credit assignment mechanism based on memory effectiveness, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments and analyses confirm that MemPO achieves absolute F1 score gains of 25.98 over the base model and 7.1 over the previous SOTA baseline, while reducing token usage by 67.58% and 73.12%. The code is released at https://github.com/TheNewBeeKing/MemPO.

15.
arXiv (CS.CV) 2026-06-16

XMedFusion: A Knowledge-Guided Multimodal Perception and Reasoning Framework for Autonomous Medical Systems

Autonomous medical and robotic systems increasingly rely on intelligent perception and reasoning capabilities to interpret visual data and support clinical decision making. Radiology report generation represents a critical component of such automated diagnostic workflows, yet existing end-to-end multimodal models often suffer from weak visual grounding, resulting in unreliable interpretations and omission of subtle clinical findings. This paper presents XMedFusion, a modular AI framework designed as an intelligent perception and reasoning module for autonomous medical systems. The proposed framework decomposes visual information into coordinated functional components that emulate expert-driven analysis, including a visual perception agent that extracts image-grounded evidence, a knowledge graph construction agent that structures clinically relevant findings, and a retrieval-guided drafting process that ensures a consistent reporting structure. A synthesis agent iteratively integrates visual and structured evidence through reasoning-driven verification to produce reliable and interpretable diagnostic outputs. Experimental evaluation on a public chest radiograph dataset demonstrates significant improvements over baseline vision-language models, achieving gains from 0.0493 to 0.3359 in BLEU-1, 0.0863 to 0.2440 in ROUGE-L, and 0.0829 to 0.1708 in METEOR, along with substantial improvements in semantic evaluation metrics such as Consistency (2.38 to 7.80) and Accuracy (2.34 to 6.93). The results highlight the effectiveness of structured multi-agent perception and reasoning for enhancing robustness, transparency, and automation in intelligent medical imaging systems, enabling integration into autonomous healthcare and robotic diagnostic workflows.

16.
arXiv (CS.AI) 2026-06-16

Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades

arXiv:2606.15308v1 Announce Type: new Abstract: While multimodal large language models (MLLMs) have shown strong visual reasoning abilities, serving a large model for every query is computationally expensive. MLLM cascades mitigate this cost by first querying a weak but cheaper model and deferring to a strong model when the weak model's output is unconfident. However, since the weak model's confidence directly controls compute allocation, these systems expose a new attack surface: an adversary can manipulate confidence so that their queries are consistently deferred to the strong model. Motivated by this vulnerability, we introduce the Forced Deferral Attack (FDA), an adversarial image attack that lowers the weak model's confidence and causes cascades to route queries to the strong model. FDA learns a universal border trigger by optimizing a temperature-flattened objective. This objective pushes the weak model's token distribution on triggered inputs toward less concentrated targets constructed from its clean responses. Across datasets, model families, and deferral metrics, FDA consistently increases strong-model routing while outperforming image-perturbation and prompt-injection baselines. These results show that MLLM cascades are vulnerable to attacks that manipulate compute allocation, forcing unintended strong-model usage without directly targeting answer correctness.
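The routing rule that the attack targets can be sketched as a simple confidence threshold. This is a minimal illustration of a two-model cascade, assuming a scalar confidence score per query; the threshold and all confidence values are made up for illustration and are not results from the paper.

```python
def route(weak_confidence, threshold=0.7):
    """Answer with the weak model if it is confident enough, otherwise defer."""
    return "weak" if weak_confidence >= threshold else "strong"

def strong_model_rate(confidences, threshold=0.7):
    """Fraction of queries deferred to the strong model."""
    routed = [route(c, threshold) for c in confidences]
    return routed.count("strong") / len(routed)

# Hypothetical weak-model confidences on the same four queries.
clean_confidences = [0.90, 0.80, 0.95, 0.60]
attacked_confidences = [0.50, 0.40, 0.65, 0.30]  # confidence lowered by a trigger

clean_rate = strong_model_rate(clean_confidences)
attacked_rate = strong_model_rate(attacked_confidences)
print(clean_rate, attacked_rate)
```

Because the threshold is the only gate, anything that lowers the weak model's confidence raises compute cost without needing to change the final answer.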

17.
arXiv (CS.LG) 2026-06-18

Unreduced Persistence Diagrams for Topological Machine Learning

arXiv:2507.07156v2 Announce Type: replace-cross Abstract: Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore this dynamic, we introduce several methods to generate topological feature vectors from unreduced boundary matrices and investigate their theoretical and computational properties. We compared the performance of pipelines trained on vectorizations of unreduced PDs to vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par with, and on some tasks even outperform, those trained on fully-reduced diagrams. We also benchmarked the computational performance of an algorithm for computing unreduced diagrams, which was implemented as a heavily modified version of Ripser. These computations are parallelizable and required an order of magnitude less memory on average compared to computing full persistence diagrams. Our results suggest that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.

18.
arXiv (CS.CL) 2026-06-16

SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark signals by changing sentence order. In this work, we propose SAMark, a self-anchored watermarking framework that removes the dependency on sentence order by establishing a step-independent green region in semantic space. To improve detectability, we introduce a multi-channel hyperbolic scoring mechanism that amplifies watermark signals while suppressing noise from weakly aligned candidates. We further propose a diversity-aware filtering strategy that combines hard filtering with soft regularization, extending beyond simple n-gram repetition filters to address semantic redundancy. Experimental results show that SAMark achieves up to 90.2% TP@FP1% under typical paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average, while maintaining generation quality competitive with unwatermarked text and breaking the robustness-quality trade-off that limits prior methods.

19.
arXiv (quant-ph) 2026-06-16

A Gauge-Covariant Geometric Framework for Non-Hermitian Quantum Systems

arXiv:2606.15922v1 Announce Type: new Abstract: We develop a comprehensive, gauge-covariant geometric framework for non-Hermitian quantum systems in the quasi-Hermitian regime, that is, the region of parameter space where the non-Hermitian Hamiltonian admits a real spectrum and a positive-definite metric operator. We build this framework by elevating the Dyson map to a central geometric object. This map is the transformation that converts a non-Hermitian Hamiltonian into an equivalent Hermitian one. From it we construct the Dyson connection and decompose it into Hermitian and anti-Hermitian parts, identified respectively as "stretching" and "rotation" components. This decomposition cleanly separates the genuine physical metric deformations from the unitary gauge redundancies. Working with manifestly gauge-covariant states, we then derive the complex non-Hermitian Berry phase and the quantum geometric tensor (QGT), and show that the non-Hermitian geometric curvature originates from the non-commutativity of the stretching components at the operator level. We further analyse the geometric singularities near an exceptional point (EP) and uncover a distinct hierarchy of divergences. For a general two-level non-Hermitian model, the quantum metric tensor (QMT) exhibits a leading-order divergence $\sim |\epsilon_\mu|^{-2}$, while the Berry curvature shows a weaker, subleading divergence $\sim |\epsilon_\mu|^{-3/2}$, with $\epsilon_\mu$ denoting the parameter displacement from the EP along an individual parameter axis $\mu$. Finally, we examine physical realizations of this model, including the non-Hermitian Su–Schrieffer–Heeger (SSH) and Hatano–Nelson (HN) models, where exact analytical results confirm the predicted critical scaling laws and illustrate the metric-deformation-driven non-Hermitian geometries.

20.
arXiv (CS.CL) 2026-06-18

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

This paper proposes a unimodal aggregation (UMA) based non-autoregressive model for both English and Mandarin speech recognition. The original UMA explicitly segments and aggregates acoustic frames (with unimodal weights that first monotonically increase and then decrease) of the same text token to learn better representations than regular connectionist temporal classification (CTC). However, it only works well in Mandarin. It struggles with other languages, such as English, for which a single syllable may be tokenized into multiple fine-grained tokens, or a token spans fewer than 3 acoustic frames and fails to form unimodal weights. To address this problem, we propose allowing each UMA-aggregated frame to map to multiple tokens, via a simple split module that generates two tokens from each aggregated frame before computing the CTC loss.

21.
arXiv (CS.LG) 2026-06-18

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

arXiv:2606.18734v1 Announce Type: cross Abstract: Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from the reference signal received power (RSRP) measurement, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present point-cloud-assisted tangent Gaussian splatting (PC-TGS), the first framework to extrapolate APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance than state-of-the-art baselines and faster inference on the APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.

22.
arXiv (CS.AI) 2026-06-15

Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

arXiv:2606.14409v1 Announce Type: cross Abstract: In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.

23.
arXiv (CS.CL) 2026-06-11

Agent Skill Evaluation and Evolution: Frameworks and Benchmarks

The growth of agent skills has transformed how agentic systems are built, evaluated, and deployed. As skill libraries continue to scale, rigorous evaluation becomes critical to ensuring their utility, quality, and safety in real-world applications. Consequently, the field is undergoing an emerging paradigm shift from isolated skill creation to automated, evaluation-driven skill evolution. In this survey, we systematically examine the landscape of skill evolution and evaluation beyond foundational skill creation. We categorize evolution into four distinct paradigms, spanning execution feedback, trajectory distillation, compression, and reinforcement learning, showing how each element contributes to improving skill utility and reliability. We also provide an analysis of six skill-centric benchmark categories, identifying structural gaps in benchmark coverage, trade-offs, and metric richness to advance skill research. Finally, we identify open directions for building skill ecosystems that are generalizable, efficient, and verifiably safe. The project URL is https://github.com/Cassie07/AgentSkill_Survey

24.
arXiv (quant-ph) 2026-06-15

Quantum gates with parametrically driven multi-qubit couplers

arXiv:2606.14522v1 Announce Type: new Abstract: Superconducting quantum processors could significantly profit from enhanced connectivity together with precise control of interactions and gates between qubits. Here we investigate plaquettes of four qubits that are coupled via a central tunable coupling circuit, so that not only gates between qubits connected by an edge of the plaquette can be executed but also between qubits across the diagonal. By numerically and analytically analyzing parametrically driven processes, we explore $\sqrt{iSWAP}$-gates between any pair of qubits, also across the diagonal, as well as three-qubit interactions and gates. For experimentally available circuit parameters, we find, for example, $\sqrt{iSWAP}$-gates with a gate time of 50 ns and 99.9% fidelity, which is decreased to 99.4% if two such gates are executed in parallel on disjoint qubit pairs in the plaquette. For three-qubit gates we find a fidelity of 95% at a gate time of 200 ns.
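For readers unfamiliar with the gate, the sketch below writes out the $\sqrt{iSWAP}$ matrix in one common convention (basis order |00>, |01>, |10>, |11>) and checks numerically that applying it twice yields iSWAP. The convention is an assumption, since the abstract does not state the matrix, and the code says nothing about the coupler physics or the reported fidelities.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 1 / math.sqrt(2)

# Square root of iSWAP: acts only on the {|01>, |10>} subspace.
SQRT_ISWAP = [
    [1, 0,      0,      0],
    [0, s,      1j * s, 0],
    [0, 1j * s, s,      0],
    [0, 0,      0,      1],
]

# iSWAP: swaps |01> and |10> and attaches a phase of i.
ISWAP = [
    [1, 0,  0,  0],
    [0, 0,  1j, 0],
    [0, 1j, 0,  0],
    [0, 0,  0,  1],
]

product = matmul(SQRT_ISWAP, SQRT_ISWAP)
max_error = max(abs(product[i][j] - ISWAP[i][j]) for i in range(4) for j in range(4))
print(max_error)
```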

25.
arXiv (CS.LG) 2026-06-19

Model soups need only one ingredient

arXiv:2602.09689v2 Announce Type: replace Abstract: Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weigh these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
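The notion of an entropy-based effective rank can be illustrated with a small sketch. It uses a common definition, the exponential of the Shannon entropy of the normalized singular values; whether MonoSoup uses exactly this form, and how it maps the result to layer-wise coefficients, is not stated in the abstract, so the function below is an assumption for illustration only.

```python
import math

def effective_rank(singular_values):
    """exp(entropy) of the singular values normalized to sum to one."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    shannon = -sum(p * math.log(p) for p in probs)
    return math.exp(shannon)

flat = effective_rank([1.0, 1.0, 1.0, 1.0])     # energy spread evenly
peaked = effective_rank([1.0, 0.0, 0.0, 0.0])   # energy in one direction
mixed = effective_rank([4.0, 1.0, 0.5, 0.5])    # somewhere in between
print(flat, peaked, mixed)
```

A flat spectrum gives an effective rank equal to the number of directions, a single dominant direction gives one, and intermediate spectra fall in between, which is what makes the quantity usable as an automatic weighting signal.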