Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-11

MASK: Multi-Agent Semantic K-Scheduling for Risk-Sensitive 6G Robotics

arXiv:2606.11249v1 Announce Type: cross Abstract: Realizing the vision of 6G connected robotics requires reconciling high-performance collaborative control with the rigid spectral limitations of physical wireless channels. In realistic collaborative sensing scenarios, spectral resources are quantized into finite physical resource blocks or orthogonal subcarriers, rendering simultaneous transmission by all agents infeasible. To address this, we propose Multi-Agent Semantic K-Scheduling (MASK), a control architecture designed to sustain robust, risk-aware coordination under strict instantaneous bandwidth caps. We introduce Arbiter-Assisted Semantic Information Gating (A-SIG), a lightweight coordination mechanism that enforces hard access constraints by scheduling only the top-K agents based on locally computed semantic importance scores. By aggregating these prioritized observations into a compact latent state, a self-supervised global encoder enables a distributional policy to mitigate tail risks despite data sparsity. We evaluate MASK across diverse benchmarks, demonstrating that it matches the performance of communication-unconstrained baselines even when channel access is restricted to a small fraction of the swarm size. Furthermore, the framework exhibits inherent resilience to packet erasures, validating semantic scheduling as a critical enabler for resource-constrained 6G systems.

02.
arXiv (CS.LG) 2026-06-19

Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

arXiv:2605.31158v3 Announce Type: replace-cross Abstract: Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.

03.
arXiv (CS.LG) 2026-06-16

Schattor: Schatten-family methods for deep learning optimization

arXiv:2606.15702v1 Announce Type: cross Abstract: Modern deep learning optimization features heterogeneous parameter structures, noisy gradients, and highly nonconvex landscapes, posing significant challenges for both algorithm design and theoretical analysis. Motivated by the limitations of SGD and the success of adaptive optimizers, we propose {\it Schattor}, a family of adaptive first-order methods based on Schatten norms. Schattor unifies SGD and the recently proposed matrix-variate adaptive optimizer Muon within a single Schatten-norm-based framework. We establish dimension-free stationarity guarantees for methods in the Schattor family for stochastic matrix optimization problems via a novel matrix martingale moment bound. We also develop multi-block extensions that adaptively balance block-wise optimization progress and prove dimension-free stationarity guarantees in this more general setting.

04.
arXiv (quant-ph) 2026-06-16

Counterdiabatic Raman Atom Optics for Compact High-Sensitivity Gravimetry

arXiv:2606.16945v1 Announce Type: new Abstract: Large-momentum-transfer (LMT) atom interferometry provides a route toward enhanced inertial sensitivity in compact quantum sensors, but its scalability is limited by the accumulation of pulse-transfer errors across long Raman pulse sequences. We investigate theoretically the use of stimulated Raman shortcut-to-adiabatic passage (STIRSAP) for high-fidelity LMT atom optics in a Mach–Zehnder interferometer geometry. The counterdiabatic correction is encoded directly into the Raman pulse envelopes, eliminating the need for auxiliary microwave or radio-frequency control fields. Numerical simulations based on an effective Raman model show that $1~\mu\mathrm{s}$ STIRSAP pulses achieve single-pulse transfer fidelities of $F_\pi = 0.99902$ while maintaining negligible pulse-time overhead even at high momentum order. We analyze the resulting tradeoff between interferometric phase enhancement and compound contrast decay and identify an unconstrained shot-noise optimum near $n\approx270$. The analysis further shows that practical operation at extreme LMT order is constrained by wave-packet separation, vibration noise, Doppler detuning, and accumulated systematic effects rather than by pulse duration itself. These results establish superadiabatic Raman control as a promising approach for scalable high-fidelity atom optics and clarify the physical limitations governing compact high-order atom interferometers.

05.
arXiv (quant-ph) 2026-06-11

Coupled integrated photonic quantum memristors using a single photon source made of a colour center

arXiv:2602.14736v2 Announce Type: replace Abstract: Photonic quantum memristors provide a measurement-induced route to nonlinear and history-dependent quantum dynamics. Experimental demonstrations have so far focused on isolated devices or simple cascaded devices configurations. Here, we experimentally realize and characterize a network of two coupled photonic quantum memristors with crossed feedback, implemented on a silicon nitride photonic integrated circuit and fed by a room-temperature single-photon source based on a silicon-vacancy color center SiV$^-$ in a nanodiamond. Each memristor consists of an integrated Mach-Zehnder interferometer whose transfer function is adaptively updated by photon detection events on another memristor, thus generating novel non-Markovian input-output dynamics with an enhanced memristive behaviour compared to single devices. In particular, we report inter-memristor input-output hysteresis curves exhibiting larger form factors and displaying self-intersecting loops, respectively revealing marked bistability and self-intersecting hysteresis geometry. Furthermore, numerical simulations show how these features emerge from the interplay between memory depth and relative input phase, for both intra- and inter-memristor input-output relations. We experimentally test the performance of our system in the NARMA task. Our results establish coupled integrated photonic quantum memristors as scalable nonlinear building blocks and highlight their potential for implementing compact quantum neuromorphic and reservoir computing architectures.

06.
arXiv (CS.AI) 2026-06-16

Rescaling Confidence: What Scale Design Reveals About LLM Metacognition

arXiv:2603.09309v2 Announce Type: replace Abstract: Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0–100) is rarely examined. We show that this design choice is not neutral. Across six LLMs and three datasets, verbalized confidence is heavily discretized, with more than 78\% of responses concentrating on just three round-number values. To investigate this phenomenon, we systematically manipulate confidence scales along three dimensions: granularity, boundary placement, and range regularity, and evaluate metacognitive sensitivity using $meta-d'$. We find that a 0–20 scale consistently improves metacognitive efficiency over the standard 0–100 format, while boundary compression degrades performance and round-number preferences persist even under irregular ranges. These results demonstrate that confidence scale design directly affects the quality of verbalized uncertainty and should be treated as a first-class experimental variable in LLM evaluation.

07.
arXiv (CS.AI) 2026-06-16

Autonomous End-to-End SOH Prediction Services for Battery Systems via Temporal-Contrastive Representation Learning

arXiv:2606.16434v1 Announce Type: cross Abstract: Accurate state of health (SOH) estimation is a critical diagnostic service for lithium-ion battery management. However, reliance on labor-intensive manual feature engineering and opaque black-box models hinders scalable industrial deployment. To address this, we introduce TC-SOH: a modular, plug-and-play service architecture for autonomous, end-to-end SOH prediction. TC-SOH employs a temporal-contrastive mechanism and a cross-window prediction pretext task to extract degradation-relevant representations directly from raw operational data. To improve transparency, we connect model efficacy with representation diagnostics: visualization, sensitivity analysis, redundancy analysis, bidirectional probing, future-SOH probing, and temporal shuffling show that learned features overlap with selected expert descriptors while retaining additional SOH-relevant variation, and that ordered temporal context improves subsequent-SOH prediction. Across four public datasets, TC-SOH outperforms the considered physics-informed and data-driven baselines, reducing MAPE by 1.91 times and RMSE by 2.13 times.

08.
arXiv (CS.AI) 2026-06-19

Augmenting Game AI with Deep Reinforcement Learning

arXiv:2606.20210v1 Announce Type: new Abstract: Immersion in video games depends not only on graphics, audio, and game mechanics, but also on the quality of in-game characters. Producing believable characters, or game AI, remains a significant challenge as behavioral complexity is hard to capture with hand-coded systems. Game AI is a source of immersion and engagement; however, the limitations stemming from the challenges of creating game AI often lead to frustration and the breaking of the illusion of realism within the game. The introduction of machine learning models opens the door to creating more believable, authentic, and relatable characters in games. The promise is that they either learn from interacting with the game, or from player data, to develop true human-like behavior. In this paper, we envision more applications of reinforcement learning for game AI in the future. For this to materialize, current research limitations are prohibitive to broad deployment across game genres. Therefore, we propose a framework for training reinforcement learning models with a set of requirements in mind that are suited towards game AI and game development. We present examples of games with reinforcement learning-augmented game AI and describe the practicalities of deploying player-facing machine learning agents in modern games. Furthermore, we identify bottlenecks and hard problems in these areas, which we believe offer promising research directions to accelerate the adoption of machine learning in game AI for the video game industry.

09.
arXiv (CS.AI) 2026-06-16

Scaling Adaptive Depth with Norm-Agnostic Residual Networks

arXiv:2606.16112v1 Announce Type: cross Abstract: Residual architectures are ubiquitous in deep learning, but they suffer from a subtle structural limitation: the norm of the residual stream can grow rapidly with depth. As a result, updates from later layers become small relative to the accumulated residual state. This reduces their impact on the representation and limits the benefits of scaling models in depth. To address this, we introduce NAG, a norm-agnostic residual architecture that separates magnitude from directional information in the residual stream, preserving meaningful layer contributions throughout depth and preventing later updates from being systematically suppressed by residual-norm growth. Importantly, NAG introduces only a negligible number of additional parameters and relies on simple operations that are easily kernel-fusible, preserving training efficiency in practice. We show that this architecture outperforms baseline Transformers, with gains that increase substantially as depth grows, enabling effective training of much deeper models. The norm-agnostic formulation also leads to an interpretable Mixture-of-Depths (MoD) mechanism that adaptively skips both attention and MLP layers. Beyond serving as a post-training accuracy-compute tradeoff, this mechanism can be used as a pretraining-time scaling strategy: under iso-FLOP training, compute saved by reducing per-token forward-pass cost can be reinvested into training on more tokens while keeping the total parameter count and KV-cache budget fixed. In our experiments, moderate Mixture-of-Depths rates of approximately 20%-25% match full-depth baseline performance under equal training compute while substantially reducing the number of executed layer parameters and forward-pass FLOPs. These results identify sparsity in depth as a new scaling axis for fixed-compute training, enabling very deep yet FLOP-efficient models.

10.
arXiv (quant-ph) 2026-06-15

Fourier analysis of quantum neural network with non-linear data embedding

arXiv:2606.14206v1 Announce Type: new Abstract: Fourier analysis has become a crucial tool for understanding the expressivity of Variational Quantum Circuit (VQC) models, as well as an important indicator of barren plateaus (BP). While existing literature has only studied angle-embedded VQCs in a noiseless environment, here we develop the Fourier analysis of VQCs with non-linear data embedding, with particular focus on amplitude embedding, which provides a naturally compact encoding scheme. We first investigate a subtle difference in the domain of input features within amplitude embedding that leads to a distinct expressivity of the zero-frequency Fourier coefficient. By assuming that the ensemble of unitaries generated from the parameter space forms at least a 2-design with respect to the unitary group, we derive, via Weingarten calculus, that the mean of the Fourier coefficients is concentrated at zero, and the variance scales at an exponentially decaying order with respect to the multi-dimensional frequency magnitude. When a noise channel with unitary Kraus operators and probabilities $\{p_k\}$ is taken into account, the variance is further suppressed by a factor $\left(\sum_k p_k^2\right)^{Q}

11.
arXiv (quant-ph) 2026-06-15

Tantalum as a base material for superconducting integrated circuits

arXiv:2606.13750v1 Announce Type: new Abstract: The performance of superconducting integrated circuits for quantum applications is fundamentally limited by material-related losses. Tantalum, as an emerging material for next-generation quantum circuits, has attracted considerable attention in recent years after demonstrating breakthrough performance in both superconducting microwave resonators and qubits. Concurrently, a growing body of work is devoted to the operation of tantalum-based circuits and related fabrication techniques. This interest is further stimulated by tantalum thin films polymorphism resulting in a variety of its crystalline structure, superconducting properties, coherence, etc. Furthermore, tantalum circuits exhibit distinctive features in cryogenic experiments, which have not been observed in aluminum- or niobium-based ones. In this review, we summarize the recent research of tantalum thin films growth and phase selection mechanisms on various substrates, key aspects of fabrication and performance of superconducting circuit, including a material first-principles theoretical study. In conclusion, we address a number of open issues, including the role of \b{eta}-phase impurities, the effect of hydrofluoric acid solutions on chain characteristics, and the anomalous behavior of {\alpha}-tantalum chains at cryogenic temperatures.

12.
arXiv (CS.CV) 2026-06-17

Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering

Open-weight video diffusion models can generate photorealistic unsafe content, from violence to misinformation, yet existing defenses either require expensive safety fine-tuning that degrades general capability, or apply external filters that are trivially bypassed by adversarial prompts. We present REINS (REpresentation-space INference-time Safety steering), a training-free method that aligns video diffusion models at inference time by steering their internal representations toward safe generation. Our key finding is that safety-relevant structure is linearly encoded in the hidden-state activations of video diffusion transformers, and a single direction, discovered via Supervised PCA on binary safety labels, suffices to separate safe from unsafe generation trajectories. At inference, adding this direction to hidden states at an intermediate transformer layer redirects generation from harmful content to semantically related safe alternatives, with no weight updates, no concept enumeration, and negligible computational overhead. Through mechanistic analysis, we reveal that while safety information accumulates monotonically with transformer depth, steering effectiveness peaks at intermediate layers (~50% depth), exposing a fundamental tradeoff between information availability and downstream propagation capacity. We evaluate REINS across 9 video diffusion models, multiple parameter scales (1.3B-5B), and both text-to-video and image-to-video generation, to our knowledge, the broadest safety evaluation suite in the video generation literature.

13.
arXiv (quant-ph) 2026-06-19

Many-Body Protection of Topological Edge Memory in Strong Interacting Quenches

arXiv:2606.19437v1 Announce Type: cross Abstract: Quantum quenches drive edge states far from equilibrium, yet whether the memory of a topological initial state survives in a non-integrable, interacting system has remained largely unexplored. We study this question in the bond-alternating XXZ chain – an interacting Su–Schrieffer–Heeger model hosting symmetry-protected topological edge modes with markedly enhanced boundary magnetization – and analyze quenches across all combinations of single-particle and many-body initial and final Hamiltonians. The results organize by a single distinction as we rigorously establish in this work: whether the post-quench Hamiltonian is free or genuinely interacting. For a free post-quench Hamiltonian, the dynamics is solved exactly by a correlation-matrix approach; the boundary-mode return amplitude decays as $t^{-3/2}$, and initial interactions enter only through a dressed one-body density matrix. For a genuinely interacting post-quench Hamiltonian, finite-time stability bounds prove that away from local resonances the first-dimer magnetization remains stable on time windows growing as arbitrarily large powers of the inverse inter-dimer coupling. Matrix product state simulations across all four protocols show that interactions in the final Hamiltonian markedly extend finite-time boundary memory – with local suppression near the isotropic $SU(2)$ point – revealing a many-body protection mechanism in a non-integrable system where scrambling would otherwise wash out initial-state memory fast.

14.
arXiv (CS.AI) 2026-06-15

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

arXiv:2602.03120v2 Announce Type: replace-cross Abstract: Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and continuous weights to compute gradients. Thus they cannot be used on quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradient estimation. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision weight updating signals, and (2) it utilizes a stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning methods on a variety of tasks, making direct fine-tuning for quantized models possible. It therefore opens up the possibility for scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies .

15.
arXiv (CS.CV) 2026-06-17

Geometric Consistency Protocol for Foundation Model Features in Multi-View Satellite Imagery

Standardized evaluation protocols are indispensable for robust benchmarking in remote sensing, particularly as foundation features are increasingly transferred across diverse sensors and complex imaging geometries. In satellite multi-view reconstruction, conventional evaluations relying on unconstrained 2D global matching are often misleading. The Rational Function Model (RFM) and its Rational Polynomial Coefficients (RPC) dictate a curved, height-dependent epipolar geometry that render flat 2D search spaces physically inconsistent. We propose a geometry-faithful and reproducible protocol tailored for the RPC framework. Our approach integrates an RPC-projected 3D consistency metric with a geometry-constrained dense matching proxy, specifically evaluating whether similarity responses remain localized and unique under physically plausible search manifolds. A pivotal finding of our joint reporting strategy is the decoupling of semantic agreement and geometric localization: high cross-view similarity at a projected 3D point does not guarantee reliable matchability in practical inference. Our benchmark demonstrates that incorporating geometric constraints is fundamental to the problem definition in satellite imagery. Furthermore, we show that state-of-the-art 2D backbones remain remarkably competitive against specialized 3D-aware models when subjected to this RPC-consistent evaluation.

16.
medRxiv (Medicine) 2026-06-15

High Demand, Low Possession: Dilemmas and Strategies for Research Capability Cultivation in Clinical Medicine Postgraduates

Most previous studies have examined medical postgraduate research training from a single dimension, lacking a full-chain analysis that integrates capability demand, actual possession, obstacles, and output. Consequently, the measurement of capability gaps and the analysis of underlying training model deficiencies remain insufficient. To address this gap, we administered a self-designed multidimensional questionnaire to 86 clinical medicine postgraduates at a medical school, covering research cognition, interest, capability demand and possession, participation pathways, difficulties, and outputs. The aim was to systematically characterize the current situation, identify problems, and propose optimization strategies. Over 90% of participants expressed interest in research, yet only 1.16% self-rated as very knowledgeable. The largest demand-possess gap was for writing and publication (86.05% vs. 16.28%), followed by independent research capability (75.58% vs. 11.63%). A total of 59.30% cited lack of foundational knowledge, making experiments very difficult, as the greatest challenge, and 66.28% had no research achievements. The primary source of research topics was supervisor assignment (54.65%), with only 4.65% choosing topics independently. No statistically significant differences were found across grades or training types (P > 0.05). These findings reveal a structural high demand, low possession gap in medical postgraduate research training, with early research experience deficit and a passive research model as key constraining factors. Accordingly, an integrated bachelor-postgraduate progressive research competency training system is proposed.

17.
arXiv (CS.AI) 2026-06-17

A Machine-Learned Comorbidity Index

arXiv:2606.17450v1 Announce Type: new Abstract: Traditional comorbidity scores (e.g., Charlson and Elixhauser) are widely used for risk adjustment and patient stratification, but they have two key limitations: (i) they are largely mortality-centric and do not align well with other clinical outcomes, and (ii) their linear, rule-based structure cannot capture nonlinear, outcome-specific risk relationships. We propose a Machine-Learned Comorbidity Index (MLCI) that maps diagnosis codes to a single scalar by maximizing the normalized Hilbert-Schmidt Independence Criterion (nHSIC) between the learned score and multiple clinical outcomes. MLCI captures nonlinear risk-outcome dependence and is supported by a theory that characterizes when a unified, informative admission-level ordering can be achieved across outcomes. Empirical results on multiple benchmark electronic health record (EHR) datasets show that MLCI outperforms strong baselines across multiple evaluation metrics.

18.
medRxiv (Medicine) 2026-06-16

Supplementation with Arabinoxylan Dietary Fiber at Low Doses Produces Behavioral, Metabolic, and Gut Microbial Changes in Healthy, Overweight Adults: A Randomized Placebo-Controlled Trial

Background: Dietary fiber comprises a heterogeneous group of compounds with distinct physicochemical properties and biological effects. As such, functional outcomes observed for one fiber cannot be generalized to others. Some fermentable fibers, such as arabinoxylan, may exert biologically selective effects across multiple physiological domains, highlighting the need to evaluate individual ingredients for their domain-specific activity in controlled human studies. Methods: In this randomized, double-blind, parallel, 3-arm, placebo-controlled trial, healthy, overweight adults were assigned to consume one of two low doses of an arabinoxylan dietary fiber (3.5g or 5g) or placebo over the intervention period. Self-reported appetite sensations were assessed as the primary outcome using validated visual analogue scales. Secondary and exploratory endpoints included lipid parameters, gastrointestinal outcomes, mood-related measures, and gut microbiota composition and fermentation-derived metabolites. Analyses were conducted in the full analysis set and a high-compliance population to assess responses under sustained intake conditions, as per the intended dosing regimen. Results: The primary endpoint of appetite sensations did not differ between either arabinoxylan group and placebo. In contrast, evidence of microbial fermentation and selective microbiota engagement was observed. These responses occurred alongside consistent and favorable changes in lipid parameters under conditions of sustained intake, including reductions in low-density lipoprotein cholesterol and triglycerides. Additional outcomes, including gastrointestinal symptoms and mood, demonstrated domain-specific responses. Conclusion: This study demonstrates that supplementation with low doses of arabinoxylan dietary fiber elicit biologically selective, domain-specific effects across metabolic, microbial, gastrointestinal, and behavioral outcomes, particularly under conditions of sustained intake. These responses occurred independently of changes in appetite sensation, indicating that functional effects were not mediated through appetite-related pathways. Collectively, the findings highlight the ingredient's biological versatility and contextual responsiveness across physiological systems, and suggest its prebiotic potential through alignment with ISAPP's definition of a prebiotic, supporting further investigation of specific mechanistic pathways. Clinical trial registration: https://clinicaltrials.gov/study/NCT06884449, identifier: NCT06884449

19.
arXiv (CS.LG) 2026-06-12

CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees

arXiv:2606.12840v1 Announce Type: new Abstract: Regression trees are among the most interpretable yet expressive model classes in machine learning. Historically, greedy induction has been the dominant approach for constructing well-performing regression trees. While optimal methods based on dynamic programming and branch-and-bound exist, they are computationally prohibitive for general linear regression trees, despite often achieving substantially better performance than greedy approaches. Recent work has shown that specialized lookahead strategies can dramatically improve runtime while maintaining near-optimal performance, primarily in classification settings. In this work, we develop a novel algorithm for near-optimal, sparse, piecewise linear regression trees that combines a lookahead-style search strategy with efficient rank-one Cholesky updates of the Gram matrix. We demonstrate, both theoretically and empirically, that our method achieves a favorable trade-off between computational efficiency, predictive accuracy, and sparsity, and scales significantly better than the current state of the art.

20.
arXiv (CS.AI) 2026-06-17

PearlVLA: Progressive Embodied Action-Plan Refinement in Latent Space

arXiv:2606.17924v1 Announce Type: cross Abstract: Current Vision-Language-Action (VLA) models face a trade-off between efficient action generation and explicit deliberation. Directly decoding actions from vision-language backbone representations enables low-latency control, whereas explicit reasoning through textual chains, pixel-level subgoals, or action search can improve planning but incurs substantial latency and computational cost. We propose PearlVLA, a VLA framework that moves deliberation into the latent space of a vision-language model (VLM). PearlVLA separates VLM meta-query representations into a fixed visual grounding branch and an iterative latent plan branch. At each refinement round, a plan-conditioned world query probes a lightweight frozen latent world model for an action-free future observation latent, which is fed back to guide plan refinement. A future-guided RefineNet then applies scheduled residual updates to progressively refine a coarse semantic draft into a fine-grained latent action plan. The refined plan after K rounds is then decoded in parallel into an action chunk for low-latency execution. We further introduce Causal Refinement-Grouped Process-Reward RL to optimize the latent refinement process with rewards from longer-horizon imagined futures induced by latent plan edits. Empirical evaluations on the LIBERO benchmark demonstrate that PearlVLA achieves state-of-the-art performance among existing methods.

22.
arXiv (CS.LG) 2026-06-19

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

arXiv:2606.19413v1 Announce Type: new Abstract: Multimodal time series forecasting, which pairs numerical sequences with domain-relevant textual reports, promises to inject world knowledge into forecasting pipelines. However, we uncover a critical failure mode in existing frameworks that we term text collapse: the text branch converges to a content-independent transformation, contributing negligible discriminative signal regardless of the input description. We argue that text collapse is a consequence of a fundamental asymmetry in time series forecasting: the numerical input is strongly autocorrelated with the output, making the numerical backbone inherently dominant, while the text branch, despite carrying complementary and often critical information, is insufficiently utilized, leading to its systematic underexploitation. To address this, we propose REST-TS (Residual-Exclusive Supervision for Text in Time Series), which turns the asymmetry into a design principle: the numerical backbone produces its own independent numerical forecast, and the text branch is exclusively supervised to predict the structured components of the residual, the prediction gap that numbers cannot explain. Because no numerical pathway can reduce these losses, the text branch must extract genuine content from the input description. Evaluated across diverse real-world domains and backbone architectures, REST-TS achieves state-of-the-art performance and consistently demonstrates greater text-branch utilization than existing frameworks, providing strong empirical evidence that supervising the text branch on the residual compels it to extract genuine content from the input.

23.
arXiv (CS.LG) 2026-06-16

Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals

arXiv:2606.16950v1 Announce Type: cross Abstract: Nanopores are versatile single-molecular sensors, but their utility is fundamentally constrained by stochastic translocation dynamics warping any encoded information. We resolve it by shifting from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained exclusively on simulated signals from a physics-informed model. This encoder maps solid-state nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system. The learned representation is responsive to structural barcode parameters while remaining invariant to acquisition conditions and translocation conformation, allowing data pooling across devices. Molecule identification requires a single pass through the encoder, reducing computational cost by three orders of magnitude relative to alignment-based methods. We experimentally validate through mixture quantification, rare-variant detection, consensus barcode reconstruction, and real-time signal acquisition. This shift from temporal analysis to mapping structural coordinates into a latent space changes the paradigm behind analyzing stochastic sensor signals by linking classification to interpretable encoded molecular information.

24.
arXiv (CS.LG) 2026-06-15

Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

arXiv:2606.14095v1 Announce Type: new Abstract: We study the sample complexity of learning in average-reward weakly-coupled Markov decision processes (WCMDPs) and Restless Bandits (RBs) under a generative model. Naive reduction to a tabular MDP leads to high complexity bounds as the state-action space is exponentially large in the number of arms $N$. By exploiting the weakly coupled structure, we show that near-optimal policies can be learned with sample and computational complexities that are polynomial in $N$. Specifically, we analyze the plug-in approach, which applies an efficient planning algorithm to an empirical model estimated from data. For fully heterogeneous WCMDPs, we establish the first finite-sample PAC guarantee with polynomial complexity and an $O(1/\sqrt{N})$ optimality gap. For homogeneous RBs, we further prove that a smaller optimality gap is achievable under mild structural assumptions. A primary technical contribution of our work is a novel Lyapunov-based analysis framework. Unlike classical approaches that rely on the difficult-to-control bias function, our framework uses an explicitly constructed Lyapunov function along with a drift transfer technique between the true and empirical models. A key step of independent interest in our framework is a fine-grained perturbation analysis for the underlying linear programming (LP) relaxation, which provides a general tool for analyzing LP-based policies and weakly-coupled systems.

25.
arXiv (CS.LG) 2026-06-12

Robust State-Conditional Feature-Weighted Jump Models for Temporal Clustering

arXiv:2606.13146v1 Announce Type: cross Abstract: We propose a robust feature-weighted jump model for time-dependent clustering. A penalty is used to encourage smoothness of transitions over time, while robustness is achieved through the use of a Tukey's biweight loss function. An additional parameter controls the variability of feature weights across states, allowing the model to assign state-specific relevance to each feature. We illustrate in simulation how the method accurately recovers the true cluster sequence and reliably identifies relevant features, outperforming competing approaches, particularly in the presence of outliers. We conclude with two empirical applications, one on the number of conflict-related homicides in Kosovo in the period 1998-2000, and another on macroeconomic performance of twelve European countries in the period 1949-2024.