Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-19

Triangular Consistency as a Universal Constraint for Learning Optical Flow

arXiv:2606.19938v1 Announce Type: cross Abstract: We propose triangular consistency as a first-principled constraint for optical flow, which is agnostic to network architecture, supervision type, and dataset, and applies to both image-pair and multi-frame settings. This simple but powerful constraint is to compose two flows to induce a third flow and enforce consistency among the three. The composed flows may arise from (i) image pairs, yielding cycle consistency; (ii) multiple video frames, producing longer-range motion through temporal chaining; or (iii) image pairs combined with controlled synthetic transformations, which becomes data augmentation. This triangular consistency introduces negligible computational overhead and requires no additional annotations. Since it is derived directly from the geometry of optical flow, it does not rely on model-specific assumptions and serves as a ``universal'' plug-and-play component for optical flow training. Experiments show consistent improvement across supervised, unsupervised, and transfer learning settings.

02.
arXiv (CS.AI) 2026-06-16

ArtNet: A JEPA-Like Articulatory Predictive Framework for Robust Zero-Shot Phoneme Recognition

arXiv:2606.16595v1 Announce Type: cross Abstract: Zero-shot cross-lingual phoneme recognition is often hindered by the fragility of direct acoustic-to-symbol mapping, which is susceptible to language-specific variations. Echoing joint-embedding predictive architecture (JEPA) work in vision, we propose ArtNet, a framework that explores a structured feature prediction task based on articulatory features to enhance acoustic robustness. Specifically, ArtNet integrates an articulatory predictor, designed to extract universal articulatory representations from self-supervised learning (SSL) features, with a variational information bottleneck (VIB) to suppress language-specific variations. Experiments on seven unseen languages demonstrate that ArtNet, particularly when synergized with the proposed vector-space inventory alignment (VSIA) strategy, significantly outperforms competitive baselines, achieving a 20.56\% relative reduction in phoneme error rate (PER) and 7.01\% in phoneme feature error rate (PFER).

03.
arXiv (CS.LG) 2026-06-16

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

arXiv:2605.01702v2 Announce Type: replace Abstract: Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm $D^\mathtt{AD}$. We first show that given a floating-point function $\phi$ (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network $f$ and $D^\mathtt{AD}(\phi\circ f)$, respectively. We further extend this result: given $\phi_1,\dots,\phi_n$, $D^\mathtt{AD}(\phi_i\circ f)$ can simultaneously represent arbitrary gradients while $f$ represents the target values, under mild conditions. Our results hold for practical activation functions, e.g., $\mathrm{ReLU}$, $\mathrm{ELU}$, $\mathrm{GeLU}$, $\mathrm{Swish}$, $\mathrm{Sigmoid}$, and $\mathrm{tanh}$.

04.
arXiv (CS.LG) 2026-06-19

Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States

arXiv:2606.20053v1 Announce Type: new Abstract: The Doyle-Fuller-Newman (DFN) model resolves internal electrochemical states in lithium-ion batteries with high fidelity. However, the numerical solution of its governing equations is computationally prohibitive for real-time deployment, limiting scalability from individual cells to pack and fleet-scale applications. While machine learning surrogates can substantially reduce inference latency through GPU acceleration, most existing approaches learn solution approximations tied to specific operating conditions rather than learning generalizable state-evolution dynamics. This work presents a systematic comparison of four neural network architectures (MLP, ResNet, U-Net, FNO) formulated as autoregressive state-transition operators that predict full DFN internal states across a wide range of operating conditions. To ensure a controlled architectural comparison, all models are trained under a unified framework using multi-step unrolling and current-conditioning, isolating the impact of spatial inductive bias. Results demonstrate that the U-Net's multi-scale feature hierarchy achieves a mean final-step nRMSE of 3% averaged across all internal state variables after 300-step autoregressive rollouts, while providing a 5.38x speed-up over the numerical solver. These findings highlight spatial inductive bias as a critical determinant of surrogate performance, advancing the development of surrogates for internal state observability for next-generation battery management systems and digital twins.

05.
arXiv (CS.LG) 2026-06-11

OmniLoc: A Geometry-Aware Foundation Model for Anchor-Free UE Localization Across Diverse Indoor Environments

arXiv:2606.11490v1 Announce Type: new Abstract: Indoor localization from wireless measurements remains challenging in large-scale deployments due to substantial variation in building geometry, the set of detectable access points (APs), and the heterogeneity of received signals. Existing learning-based methods often perform well only in limited settings and degrade under environmental shifts, making robust anchor-free localization across diverse indoor environments notoriously difficult. In this paper, we present OmniLoc, an environment-interactive foundation model for anchor-free user equipment localization across diverse indoor environments. To the best of our knowledge, OmniLoc is the first foundation-model-based approach built directly on wireless measurements for this task. OmniLoc is built on three key designs. First, a unified input tokenization module converts heterogeneous wireless measurements into a common representation that is more amenable to learning. Second, a geometry-aware Transformer performs AP-aware feature extraction by emphasizing dominant APs while aggregating complementary evidence from supporting APs. Third, a geometry-aware location estimation module conditions regression on geometric embeddings to produce geometrically consistent location predictions. We evaluate OmniLoc on both a large-scale in-house dataset and a public benchmark dataset. Results show that OmniLoc significantly outperforms existing methods, consistently improves existing backbones when its design components are integrated, and demonstrates strong generalization in cross-environment evaluations.

06.
Nature Medicine 2026-06-12

The Hong Kong Genome Project is a flagship initiative for precision medicine in Chinese populations

作者: 未知作者

The Hong Kong Genome Project established a genome sequencing database that provides improved diagnoses for patients and more efficient, population-tailored carrier status screening. Actionable pharmacogenomic variants were identified in almost all participants, informing drug prescriptions. This work establishes a genomic resource and a transferable model for equitable precision medicine in underrepresented populations worldwide.

07.
arXiv (CS.AI) 2026-06-16

FastMix: Fast Data Mixture Optimization via Gradient Descent

arXiv:2606.14971v1 Announce Type: cross Abstract: While large and diverse datasets have driven recent advances in large models, identifying the optimal data mixture for pre-training and post-training remains a significant open problem. We address this challenge with FASTMIX, a novel framework that automates data mixture discovery while training only a single proxy model. Instead of relying on predefined heuristics or resource-intensive simulations, FASTMIX jointly optimizes mixture coefficients and model parameters, substantially improving efficiency and scalability over prior approaches. At the core of FASTMIX is a reformulation of mixture selection as a bilevel optimization problem. Under this reformulation, we show that optimizing mixture ratios is mathematically equivalent to assigning per-source loss weights under uniform source sampling. This embeds the mixture coefficients directly into the differentiable iterative optimization objective, enabling efficient, gradient-based optimization of both mixture and model. To solve the optimization problem, FASTMIX implements an approximate iterative optimization procedure, alternating between (i) updating model parameters on data sampled according to current mixture ratios (inner loop) and (ii) updating mixture ratios based on validation feedback (outer loop). Across pre- and post-training, FASTMIX outperforms baselines while drastically reducing search cost. Code (https://github.com/hrtan/fastmix)

08.
arXiv (CS.CV) 2026-06-16

CausalDrive: Real-time Causal World Models for Autonomous Driving

World models have emerged as a promising paradigm for scaling autonomous driving (AD) data, yet existing video generative models fall short as interactive simulators. Layout-conditioned renderers rely on "oracle" future trajectories of all background agents, rendering them strictly non-reactive. Conversely, pure action-conditioned predictors lack semantic control over complex interactions and suffer from prohibitive diffusion latencies, hindering closed-loop policy learning. To bridge this gap, we present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. By excluding future NPC layouts, we compel the model to intrinsically predict causal interactions, enabling text-driven control over Driving Sociology, allowing users to dynamically orchestrate diverse counterfactual reactions to identical ego-actions. To overcome the efficiency bottleneck and address the covariate shift in autoregressive generation, we propose a novel Context-Forced DMD architecture. This combines continuous flow-matching with a self-correcting distillation objective, achieving interactive speeds of 12 FPS. This breakthrough transforms the passive video generator into a playable neural simulator. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation. Extensive experiments validate that policies trained within CausalDrive's reactive scenarios exhibit superior interaction capabilities in the real world.

09.
arXiv (CS.CV) 2026-06-17

Reasoning Text-to-Video Retrieval for Operating Room Clips via Action-Driven Digital Twins

Text-to-video retrieval in operating rooms (OR) is an enabling technology for OR safety, as it allows stakeholders to retrieve and inspect recordings of specific events. However, because the most safety-critical events may not follow the common structure, to unlock its full potential text-to-video retrieval must be able to handle implicit queries that require reasoning to identify the right video (e.g., the step right before clipping). However, existing methods rely on global embeddings that cannot reason over such queries. We propose OR3, a text-to-video retrieval method that converts clips into action-driven digital twins (ActDTs), grouping concurrent subject-action-object triplets under non-overlapping temporal intervals. Moreover, rather than cross-modal matching through paired encoders, OR3 performs imagination-based retrieval where an LLM generates hypothetical ActDTs from queries. This enables intra-modal matching via a single encoder trained with ActDT-tailored hard negatives. Finally, evidence-grounded refinement revises imagined ActDTs based on discrepancies with top candidates to capture procedure-specific patterns. We construct a benchmark from MM-OR with 276 implicit queries across four reasoning categories over 386 clips from robotic knee procedures. OR3 achieves 57.6 R@1 and 77.3 R@5, outperforming the strongest baseline. These results demonstrate that OR3 enables fine-grained discrimination between visually similar OR video clips through temporal action reasoning.

10.
arXiv (CS.LG) 2026-06-15

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

arXiv:2606.14130v1 Announce Type: new Abstract: Safe coordination problems surface in multi-agent reinforcement learning when global safety cannot be enforced by any agent unilaterally: the admissibility of one agent's action may depend on the dynamics of other agents. Decentralised shields can enforce safety at runtime, but purely factorised permissions often exclude optimal team behaviour that is safe only through coordination. We study deterministic safety guarantees for agents trained and deployed under decentralised execution, recovering team-optimal safe behaviour without centralised runtime control. Agents have a shared global specification $\phi$ in the safety fragment of Linear Temporal Logic ($\mathsf{LTL}_{\mathsf{safe}}$ ), and select among tuples of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations whose conjunction implies the global specification $\phi$. Each agent may rely on the other agents' local obligations as assumptions because the whole contract tuple is certified simultaneously and allows projection into local action masks. At learning time, a non-stationary multi-armed bandit chooses among a library of local $\mathsf{LTL}_{\mathsf{safe}}$ obligations to select the tuple that optimises team reward, all without forgoing end-to-end safety. We evaluate the approach across 6 environments and 15 algorithmic variants.

11.
arXiv (CS.LG) 2026-06-16

Brownian Kernel Ladders

arXiv:2606.15812v1 Announce Type: new Abstract: Constructing mathematically tractable function spaces that capture hierarchical compositional representations remains a central challenge in statistical learning theory. We introduce Brownian kernel ladders (BKLs), a recursively defined hierarchy of integral reproducing kernel Hilbert spaces generated through Brownian-kernel integral constructions. Starting from linear functionals, each layer is obtained by integrating Brownian kernels over probability measures supported on subsets of the previous layer, yielding a recursive function-space model in which depth is encoded directly through the hierarchy. Based on this framework, we define canonical BKL spaces together with an associated complexity functional. We establish several analytical and statistical properties of these spaces. In particular, we show that BKL spaces form quasi-Banach spaces, satisfy depth-dependent Hölder regularity estimates, and exhibit strict monotonicity with respect to depth. We further prove existence results for regularized empirical risk minimization and derive Gaussian complexity bounds that remain uniformly controlled with respect to both the ambient dimension and the hierarchy depth. A key ingredient of the analysis is a combinatorial proof technique based on recursive subset decompositions and Brownian-kernel threshold representations. These estimates yield excess-risk guarantees of near-parametric order for regularized empirical risk minimization over BKL spaces. Our results provide a mathematically tractable hierarchical function-space framework for studying compositional representations in deep learning.

12.
arXiv (CS.AI) 2026-06-15

Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward

arXiv:2602.00845v3 Announce Type: replace Abstract: Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, but yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees, including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner

13.
arXiv (CS.AI) 2026-06-16

Phantoms and Disclosures: a Causal Framework for Auditing Synthetic Data

arXiv:2606.16952v1 Announce Type: cross Abstract: The rapid adoption of generative AI and Large Language Models (LLMs) has spurred interest in synthetic data as a privacy-preserving alternative to sensitive real-world datasets. However, generating high-utility synthetic data often carries the risk of memorizing and regurgitating private information from the training corpus. In this work, we present a customizable empirical auditing framework designed to detect and explain such data disclosures. Our framework introduces a mechanism to distinguish between "true disclosures"-where the system directly reproduces a user's information-and "phantom disclosures''-where the system incidentally generates a user's data. By partitioning input data into training and holdout sets and applying rigorous statistical hypothesis testing, we determine if observed disclosures are consistent with strict privacy baselines, such as zero-learning or specific Differential Privacy (DP) bounds. Crucially, this approach requires no model access, no canary insertion, and no reference model training -only the synthetic output and a held-out control set. We demonstrate that this framework effectively functions as a membership inference attack, providing empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. Our approach is model-agnostic, applies to any synthetic data generation mechanism, and requires orders of magnitude fewer computational resources than shadow-model or canary-based alternatives.

14.
arXiv (CS.LG) 2026-06-15

On Rate-Optimal Partitioning Classification from Observable and from Privatised Data

arXiv:2312.14889v4 Announce Type: replace-cross Abstract: In this paper we revisit the classical method of partitioning classification and prove novel convergence rates under relaxed conditions, both for observable (non-privatised) and for privatised data. We consider the problem of classification in a $d$ dimensional Euclidean space. Previous results on the partitioning classifier worked with the strong density assumption (SDA), which is restrictive, as we demonstrate through simple examples. Here, we study the problem under much milder assumptions. We presuppose that the distribution of the inputs is a mixture of an absolutely continuous and a discrete distribution, such that the absolutely continuous component is concentrated on a $d_a$ dimensional subspace. In addition to the standard Lipschitz and margin conditions, a novel characteristic of the absolutely continuous component is introduced, by which the convergence rate of the classification error probability is computed, both for the binary and for the multi-class cases. This bound can reach the minimax optimal convergence rate achievable using SDA, but under much milder distributional assumptions. Interestingly, this convergence rate depends only on the intrinsic dimension of the continuous inputs, $d_a$, and not on $d$. Under privacy constraints, the data cannot be directly observed, and the constructed classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. In this paper we add Laplace distributed noises to the discretisations of all possible locations of the feature vector and to its label. Again, tight upper bounds on the convergence rate of the classification error probability can be derived, without using SDA, such that this rate depends on $2d_a$.

15.
arXiv (CS.CV) 2026-06-16

SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking

Rapid urban expansion has fueled the growth of informal settlements in major cities of low- and middle-income countries, with Lahore and Karachi in Pakistan and Mumbai in India serving as prominent examples. However, large-scale mapping of these settlements is severely constrained not only by the scarcity of annotations but by inherent data quality challenges, specifically high spectral ambiguity between formal and informal structures and significant annotation noise. We address this by introducing a benchmark dataset for Lahore, constructed from scratch, along with companion datasets for Karachi and Mumbai, which were derived from verified administrative boundaries, totaling approximately 900 $km^2$ of urban area. This collection is supplemented by four cities from prior literature across Sub-Saharan Africa and Latin America, with comprehensive data quality assessments provided for each city. We also propose a semi-supervised segmentation framework designed to mitigate the class imbalance and distribution mismatch inherent in standard semi-supervised learning pipelines. Our method integrates a Class-Aware Adaptive Thresholding mechanism that dynamically adjusts confidence thresholds to prevent minority class suppression, and a DINOv2-based unlabeled pool filter that removes out-of-distribution tiles prior to training to reduce covariate shift. Extensive experiments across seven cities spanning three continents, repeated over five random seeds, demonstrate gains of up to +5.9 pp mIoU over state-of-the-art semi-supervised baselines, with both components being architecture-agnostic and adding no inference overhead.

16.
medRxiv (Medicine) 2026-06-16

Diurnal variation in brain-derived tau and five other blood-based biomarkers for dementia and their association with cognitive performance

Blood-based biomarkers of dementia are a promising scalable tool for early diagnosis, tracking disease progression, and evaluating therapeutic efficacy. Utility of these biomarkers will not only be dependent on the reliability of their association with pathology but also contingent on their ability to track cognitive status. Previously, we demonstrated diurnal variation in several biomarkers (amyloid beta (A{beta}) 42 and 40, 42/40 ratio, glial fibrillary acidic protein (GFAP), neurofilament light (NfL), and phosphorylated-Tau 217 (p-Tau217)) which has implications for their reliability. Here, we extend these observations to a larger cohort, include brain-derived tau (BD-Tau), which is assumed to be produced exclusively in the brain, and report endocrine measures of circadian rhythmicity. We not only assessed whether these biomarkers vary with time of day, but also whether they associate with daytime function and whether these associations vary with cognitive domain and number of repeated assessments. Data collected in 20 PLWA (72.4{+/-}5.9 years, mean{+/-}SD) and 19 controls (68.9{+/-}9.8 years) were analysed. Participants completed 14 days of home monitoring and one laboratory assessment of sleep and daytime function: mood, daytime sleepiness, reaction time, immediate and delayed memory recall, everyday memory errors. During the 27-hour residential laboratory session, 3-hourly blood samples were collected and analysed for the six blood-based biomarkers of dementia as well as melatonin and cortisol. Rhythmicity of melatonin and cortisol did not differ between groups. P-Tau217 and GFAP (p

17.
medRxiv (Medicine) 2026-06-12

Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-{beta} signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

18.
medRxiv (Medicine) 2026-06-20

EpiLink: a simulation-based compatibility model for genomic transmission clustering in infectious disease surveillance

Identifying recently linked infections from pathogen genome sequences is central to infectious disease surveillance, yet many clustering approaches rely on fixed genetic distance thresholds whose relationship to transmission is often unclear. This limitation is especially important in rapidly growing outbreaks and superspreading events, where many cases may be sampled close together in time and share little genetic variation, making true transmission links difficult to distinguish from other closely related infections. Supervised models can improve discrimination, but they require labelled transmission data that are rarely available during outbreak response. We developed EpiLink, a threshold-free method that estimates whether two cases are compatible with recent transmission. Here, compatibility means how well the observed genetic distance and sampling-time difference between two cases fit what would be expected if they were linked by defined recent transmission scenarios. EpiLink simulates plausible recent transmission histories while accounting for uncertainty in infection timing, testing delay, and mutation accumulation, then assigns higher scores to pairs whose observed differences are typical of those simulations. EpiLink was evaluated using both synthetic and empirical SARS-CoV-2 outbreak data from the 2020 Boston epidemic. Two EpiLink variants were compared to a logistic regression model trained on labelled transmission data. One EpiLink variant assumed deterministic mutation accumulation, with genetic differences proportional to elapsed evolutionary time; the other accounted for stochasticity by sampling mutation counts from a Poisson distribution. The logistic regression model performed better at distinguishing linked from unlinked pairs, but EpiLink achieved comparable clustering accuracy. In the Boston data, EpiLink recovered clusters enriched for documented conference and skilled nursing facility outbreaks. EpiLink thus provides an interpretable, simulation-based approach for identifying recent transmission clusters when fixed thresholds are difficult to justify and labelled transmission data are unavailable.

19.
arXiv (CS.LG) 2026-06-16

Bayesian Optimization for Learning Nonlinear MPC in Autonomous Agent Navigation

arXiv:2606.14763v1 Announce Type: cross Abstract: Real-time autonomous navigation in dynamic, unknown environments remains a fundamental challenge for mobile robotics. We propose a map-free framework that tightly integrates reactive rolling-horizon planning with nonlinear Model Predictive Control (MPC). At each control cycle, a LiDAR-based Gaussian occupancy representation is constructed and used to generate collision-free trajectories via A* search, which are then tracked by a CasADi/IPOPT MPC formulation incorporating a smooth sigmoid obstacle barrier. To improve robustness to parameter sensitivity, we adopt an offline Bayesian optimization scheme based on Tree-structured Parzen Estimators (TPE), which identifies near-optimal controller parameters with respect to a composite navigation objective. In addition, a Gaussian Process surrogate is used to analyze parameter sensitivity and provide insight into the optimization landscape. The proposed framework is robot-agnostic and is evaluated on the Unitree Go2 quadruped in simulation using Gazebo, followed by deployment on the physical robot. Experimental results show that parameters tuned in simulation transfer effectively to hardware, maintaining comparable performance without additional tuning. The full system achieves up to a 90.0\% navigation success rate when deployed, along with a 38.9\% average improvement in the evaluation metrics across simulated environments.

20.
arXiv (CS.CV) 2026-06-16

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

Vision-Language-Action (VLA) models have shown remarkable progress in embodied tasks recently, but most methods process visual observations independently at each timestep. This history-agnostic design treats robot manipulation as a Markov Decision Process, even though real-world robotic control is inherently partially observable and requires reasoning over past interactions. To address this mismatch, we reformulate VLA policy learning from a Partially Observable Markov Decision Process perspective and propose AVA-VLA, a framework that conditions action generation on a recurrent state that serves as a neural approximation to the agent's belief over task history. Built on this recurrent state, we introduce Active Visual Attention (AVA), which dynamically reweights visual tokens in the current observation to focus on regions most relevant given both the instruction and execution history. Extensive experiments show that AVA-VLA achieves state-of-the-art performance on standard robotic benchmarks, including LIBERO and CALVIN, and transfers effectively to real-world dual-arm manipulation tasks. These results demonstrate the effectiveness of temporally grounded active visual processing for improving VLA performance in robotic sequential decision-making. The project page is available at https://liauto-dsr.github.io/AVA-VLA-Page.

21.
arXiv (CS.CV) 2026-06-11

Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion

Diffusion models provide a powerful generative prior for perceptual reconstruction at ultra-low bitrates, but effective video compression requires controlling the generative process using highly compact conditioning signals. In this work, we present ActDiff-VC, a diffusion-based video compression framework for the ultra-low-bitrate regime. Our method partitions videos into variable-length segments, transmits keyframes only when needed, and summarizes temporal dynamics using a compact set of tracked point trajectories. Conditioned on these sparse signals, a conditional diffusion decoder synthesizes the remaining frames, enabling perceptually realistic reconstruction under severe rate constraints. To support this design, we introduce two mechanisms: content-adaptive keyframe selection and budget-aware sparse trajectory selection, which together enable compact yet effective conditioning for generative reconstruction. Experiments on the UVG and MCL-JCV benchmarks show that ActDiff-VC achieves up to 64.6\% bitrate reduction at matched NIQE, improves KID by up to 64.6\% and FID by up to 37.7\% at comparable bitrates against strong learned codecs, and delivers favorable perceptual rate–distortion trade-offs relative to learned and diffusion-based baselines in the ultra-low-bitrate regime.

22.
arXiv (CS.LG) 2026-06-19

FloatDoor: Platform-Triggered Backdoors in LLMs

arXiv:2606.19535v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in sensitive settings such as software engineering, where their outputs directly shape downstream artifacts. Recent work has shown that an identical model can produce measurably different outputs depending on the deployment platform, a consequence of non-associative floating-point arithmetic and divergent kernel implementations. We study the security implications of this platform-dependent variability and uncover a novel attack surface on LLM deployments. We introduce FloatDoor, the first input-independent, platform-triggered backdoor attack against generative LLMs. The compromised model exhibits adversary-chosen behavior when served on a target platform and is otherwise benign. FloatDoor is realized through two lightweight LoRA adapters, one that amplifies inter-platform numerical divergence and one that binds the resulting platform signature to a malicious downstream task, while leaving aggregate model utility largely intact. FloatDoor exploits a pronounced time-of-check, time-of-use gap between model auditing and serving. We demonstrate FloatDoor on Qwen3-4B across a broad range of deployment targets, including NVIDIA GPUs, Google TPUs, AWS Graviton, and Alibaba Yitian-710. As a final case study, we show that FloatDoor reliably induces exploitable code vulnerabilities on a chosen target platform. Our results establish a new class of attacks on LLM deployments and underscore the pressing need for trusted model supply chains in sensitive, LLM-powered applications.

24.
arXiv (CS.CV) 2026-06-18

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

25.
arXiv (CS.AI) 2026-06-12

An Explainable AI Assistant for Introductory Programming Education: Improving Feedback Reliability with Instructor-AI Collaboration

arXiv:2606.12425v1 Announce Type: cross Abstract: Active learning is widely recognized as an effective approach for improving learning outcomes in introductory programming courses. However, insufficient instructional support often limits students' access to timely, personalized feedback, which is crucial for mastering foundational programming concepts. Although recent advances in AI, particularly large language models, offer scalable opportunities for feedback, concerns about explainability and reliability remain. In this paper, we present an AI-driven classroom assistant that leverages an explainable AI model to analyze student code, map logical errors to instructor-identified misconceptions, and deliver instructor-authored feedback, thereby grounding reliability in instructor-defined pedagogical knowledge. To evaluate the effectiveness of our framework, we conducted an expert evaluation to examine its alignment with instructor-verified feedback and deployed the system in a classroom setting to assess students' perceptions of its usability. Results indicate that the assistant can provide accurate, instructor-verified feedback to students while fostering a positive experience.