Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-18

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

arXiv:2606.18436v1 Announce Type: cross Abstract: Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.

02.
arXiv (CS.AI) 2026-06-16

Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap

Authors:

arXiv:2606.15091v1 Announce Type: cross Abstract: Millions of individuals worldwide suffer from sensory and communication deficits caused by neurodegenerative diseases, stroke, or trauma. Brain-computer interfaces (BCIs) offer a promising avenue for sensory and motor restoration. However, the scientific literature remains highly fragmented between invasive neuroprosthetics and non-invasive electrophysiological decoders, with a lack of consistent terminology and comparison metrics. This chapter proposes a unified 2 x 2 framework categorizing BCIs along two axes: degree of invasiveness (invasive vs. non-invasive) and signal direction (afferent sensory-IN vs. efferent sensory-OUT). We define and distinguish the paradigms of restoration, substitution, and augmentation. Furthermore, we outline a structural roadmap for the convergence of these modalities over near-, medium-, and long-term horizons, focusing on physical limits and the integrative role of machine learning foundation models.

03.
arXiv (CS.AI) 2026-06-19

GDGU: A Gradient Difference-based Graph Unlearning Method for Cyberattack Localization in Electric Vehicle Charging Networks

arXiv:2606.19566v1 Announce Type: cross Abstract: Electric vehicle charging stations (EVCSs) can expose distribution feeders to cyberattacks. While machine learning methods, including graph neural networks, can localize which bus is compromised, significant challenges remain in data sharing and model training. For example, privacy regulations grant EVCS owners the right to delete their training data from a deployed model, yet retraining from scratch on every request is computationally prohibitive. To address this, we study graph unlearning (GU) for EVCS cyberattack localization, formulated as a feature-level unlearning problem on a graph-level multi-label classification task. Specifically, we propose gradient difference-based graph unlearning (GDGU), which removes the influence of the requested deletion data through a first-order parameter correction. The correction is computed from the gradient difference between the original training data and a modified dataset in which only the charging power features at the requested EVCS buses are unlearned. Then, a batch-normalization recalibration and a brief recovery fine-tuning step are applied to restore localization utility. We benchmark GDGU against two second-order GU baselines on the IEEE 34-bus, 123-bus, and 8500-node distribution networks across three graph neural network backbones and cumulative unlearning scenarios. GDGU matches the strongest baseline on localization utility and reaches forgetting fidelity close to full-retraining, while unlearning 10 to 12 times faster than retraining from scratch and using far less memory than the second-order GU baselines.

04.
medRxiv (Medicine) 2026-06-11

Plasma protein prioritisation in rheumatoid arthritis reveals druggable targets and shared biology with cardiovascular diseases

Abstract Background Rheumatoid arthritis (RA) is an autoimmune inflammatory disease with complex and incompletely understood molecular mechanisms. Understanding circulating proteins associated with RA may improve understanding of disease biology and clarify its pathological links with cardiometabolic comorbidities. Methods A proteome-wide two-sample Mendelian randomisation (MR) drug target analysis was conducted using plasma proteins measured in 54,219 participants from the UK Biobank Pharma Proteomics Project as exposures and RA and cardiometabolic diseases as the outcomes. Summary statistics for RA included 53,663 cases and 1,070,200 controls. Colocalisation analysis was performed to confirm shared single causal variants and prioritise RA proteins supported by both MR and colocalisation. The prioritised proteins were then evaluated in the Accelerating Medicines Partnership RA Phase II synovial single-cell dataset for cell-type expression patterns. Druggability was then assessed followed by analysis of genetic overlap between RA-associated proteins and cardiometabolic diseases. Results 37 plasma proteins had a causal effect on RA risk, supported by combined evidence from MR and conditional colocalisation. In synovial tissue, TPPP3, RARRES2, AKAP12, and GGT5 were predominantly expressed in stromal and endothelial cell clusters. Druggability assessment identified IFNGR2, IL6R, CD40, and FCGR2B as Tier 1 targets. However, several biologically relevant proteins, including RARRES2, AKAP12, TPPP3, and SNX2, had limited available druggability data. Genetic overlap analysis demonstrated shared protein signals between RA and cardiovascular diseases, including overlap of RARRES2 and TPPP3 with coronary artery disease (CAD) and FCGR2B with atrial fibrillation (AF). To approximate the therapeutic effect of target inhibition, the direction of effect estimates for proteins showing overlap between RA-CAD and RA-AF was reversed. Conclusion This study identified circulating proteins involved in RA pathogenesis and reveals shared mechanisms between RA and cardiovascular diseases. While some proteins showed clear translational potential targets, several prioritised proteins had limited available druggability information and could not be confidently classified. Addressing these gaps may help identify new targets relevant to RA management. Future work should also use phenome-wide MR studies to evaluate potential on-target adverse effects of protein inhibition across RA-CAD and RA-AF.

05.
arXiv (CS.AI) 2026-06-24

VoltanaLLM: Energy-Efficient and SLO-Aware Disaggregated LLM Serving via Adaptive Frequency Control and State-Space Routing

arXiv:2509.04827v3 Announce Type: replace-cross Abstract: The energy cost of Large Language Model (LLM) inference is rapidly becoming a barrier to sustainable and scalable deployment. Although modern serving architectures expose distinct prefill and decode behaviors, existing systems fail to exploit these phase differences for energy-efficient serving under strict latency SLOs. This paper introduces VoltanaLLM, the first system that explicitly targets and reduces the energy bloat in modern prefill-decode (P/D) disaggregated LLM serving. Guided by a control-theory perspective, VoltanaLLM separates two levers: per-instance operating-point selection (GPU frequency per iteration) and system-level state-space routing of requests. We empirically observe that LLM inference exhibits a U-shaped energy-frequency curve creating "sweet spots" that depend on phase behavior and load. VoltanaLLM exploits this by combining phase-specific, iteration-level frequency selection driven by a lightweight, online-adaptive latency predictor, with a decode state-space guided router that avoids architectural granularity-induced inefficiencies, all while meeting desired SLOs. We implement VoltanaLLM using SGLang and evaluate it across multiple models and real-world workloads. Our results show VoltanaLLM reduces end-to-end energy by up to 36.3% versus a static max-frequency baseline while maintaining high SLO attainment, and generalizes to newer GPUs. These results point to sustainable LLM serving via phase-aware, iteration-level frequency selection coupled with architecture-aware routing. Source code is available in https://github.com/Supercomputing-System-AI-Lab/VoltanaLLM.

06.
arXiv (CS.AI) 2026-06-24

TIP-Search: Time-Predictable Inference Scheduling for Market Prediction under Uncertain Load

Authors:

arXiv:2506.08026v4 Announce Type: replace Abstract: Real-time market prediction services need correct predictions before a decision deadline; a correct prediction delivered late is not usable. TIP-Search studies time-predictable inference scheduling over fixed market predictors under uncertain load. It filters conformal latency-quantile feasible models, dispatches over finite workers, and uses shielded constrained online experts to trade accuracy, queue pressure, and deadline risk. On the optimized deployable pool, TIP-Search reaches 0.994 raw accuracy and 0.991 timely accuracy. On official TLOB FI-2010 h=10, TIP-Search++ raises timely accuracy from 0.156 to 0.239 and deadline satisfaction from 0.391 to 0.962. In matched h10 profiled systems replay, OCO-ACPO reaches 0.303 timely accuracy and 0.951 deadline satisfaction, with paired gains over RAMSIS/SneakPeek/utility-style comparators of $+0.00285$ timely accuracy ($p=0.0118$) and $+0.0146$ deadline satisfaction ($p=1.5{\times}10^{-5}$). SA-OCO-ACPO improves timely/deadline service by 0.188–0.417 over CPO under nonstationary stress. The claim is a systems scheduling result, not a broad LOB classifier leaderboard.

07.
arXiv (CS.CV) 2026-06-16

ReGenHuman: Re-Generating Human Appearances for Realistic Full-Body Video Anonymization

Anonymizing human-centric video data is an understudied problem. Prior anonymization techniques either blur or redact pixels at the cost of realism and downstream utility, or generate frame-by-frame at the cost of temporal coherence. We introduce ReGenHuman, the first full-body video anonymization pipeline that is simultaneously realistic, temporally consistent, and anonymous by construction. Contrary to past approaches which redact or edit the inputs directly, we propose a regenerate, don't edit paradigm. Our approach composites 2D pose, segmentation, and monocular depth into two complementary conditioning streams - StructAll and StructHuman, which are used to fine-tune a video-to-video diffusion backbone on in-the-wild human videos, synthesizing the human regions entirely from identity-free structural cues. We evaluate our model on privacy, quality, and utility, and show that our ReGenHuman achieves the best tradeoff across all three axes against current baselines. We further show that our anonymized videos remain effective for downstream tasks, including video question answering.

08.
arXiv (CS.AI) 2026-06-18

Examining Human-Like Behaviors in LLMs: A Multi-Dimensional Analysis of Model Behaviors, User Factors, and System Prompts

arXiv:2606.18258v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit a wide range of human-like behaviors, from expressing thoughts and emotions, to engaging in relationship-building with users, to refusing requests and maintaining boundaries. Despite their prevalence, researchers and practitioners lack methods and empirical insights to make informed decisions about when and what types of human-like behaviors LLMs should exhibit. To fill this gap, we present a multi-dimensional analysis of the prevalence, potential effects, and controllability of these behaviors using LLM-as-a-judge and human evaluation. Across 21,000 multi-turn conversations from four widely used models (gpt-4o, gpt-4.1-mini, claude-sonnet-4.6, gemini-2.5-flash), we find that human-like behaviors are pervasive but vary across models and user factors (conversation goals and user profiles). In terms of perceived appropriateness, human evaluators judged self-referential and relationship-building behaviors as less appropriate from LLMs than from humans, but boundary-maintaining behaviors more appropriate from LLMs than from humans. Finally, we show that system prompting can control these behaviors, though it requires careful evaluation to avoid unintended effects. We discuss the implications of our findings and provide recommendations for responsible LLM design and evaluation.

09.
arXiv (CS.CV) 2026-06-12

Masked and Predictive Self-Supervised Foundation Models for 3D Brain MRI

Self-supervised foundation models have shown strong promise in medical imaging. However, existing MRI foundation-model studies have primarily emphasized segmentation and dense prediction tasks, while systematic investigation of self-supervised foundation models for MRI-based disease detection remains limited. In this work, we investigate two major self-supervised pretraining paradigms for MRI-based disease detection: reconstruction-based learning via Masked Autoencoders (MAE) and predictive representation learning via Joint Embedding Predictive Architectures (JEPA). We study the role of auxiliary objectives by introducing a novel spectral-domain reconstruction loss for MAE to enhance sensitivity to fine-grained anatomical structure, and by integrating variance–covariance regularization (VCR) within our JEPA framework to encourage decorrelated latent representations. Our models are pretrained on heterogeneous single-contrast MRI volumes in a contrast-agnostic setting, without modality concatenation. Across five downstream disease detection tasks, our results highlight the importance of self-supervised objective design for medical foundation model pretraining, demonstrating that the downstream benefit of each objective is determined by its relevance to the task's structure. Specifically, spectral regularization yields the largest improvements when the downstream discriminative signal is characterized by strong high-frequency anatomical structures, while covariance regularization is most beneficial when discriminative information spans multiple decorrelated feature dimensions. MAE with spectral-domain supervision consistently achieves superior downstream performance for MRI-based disease detection. These findings suggest that self-supervised objectives in medical imaging encode specific biases, and their downstream benefit is fundamentally conditioned on the task's structure.

10.
arXiv (quant-ph) 2026-06-24

Teleportation-based quantum state tomography

arXiv:2511.18621v2 Announce Type: replace Abstract: We explicitly show that the quantum teleportation protocol can be employed to completely reconstruct arbitrary two- and three-qubit density matrices. We also extend the present analysis to n-qubit density matrices. The only quantum resources needed to implement the teleportation-based quantum state tomography protocol are the ability to make Bell measurements and the ability to prepare a few different single qubit states to be teleported from Alice to Bob.

11.
arXiv (quant-ph) 2026-06-11

Sharing quantum indistinguishability with multiple parties

arXiv:2512.15199v3 Announce Type: replace Abstract: Quantum indistinguishability of non-orthogonal quantum states is a valuable resource in quantum information applications such as cryptography and randomness generation. In this article, we present a sequential state-discrimination scheme that enables multiple parties to share quantum uncertainty, in terms of the max relative entropy, generated by a single party. Our scheme is based upon maximum-confidence measurements and takes advantages of weak measurements to allow a number of parties to perform state discrimination on a single quantum system. We review known sequential state discrimination and show how our scheme would work through a number of examples where ensembles may or may not contain symmetries. Our results will have a role to play in understanding the ultimate limits of sequential information extraction and guide the development of quantum resource sharing in sequential settings.

12.
arXiv (CS.AI) 2026-06-16

Optimizing Health Coverage in Ethiopia: A Learning-augmented Approach and Persistent Proportionality Under an Online Budget

arXiv:2509.00135v2 Announce Type: replace Abstract: As part of nationwide efforts aligned with the United Nations' Sustainable Development Goal 3 on Universal Health Coverage, Ethiopia's Ministry of Health is strengthening health posts to expand access to essential healthcare services. However, only a fraction of this health system strengthening effort can be implemented each year due to limited budgets and other competing priorities, thus the need for an optimization framework to guide prioritization across the regions of Ethiopia. In this paper, we develop a tool, Health Access Resource Planner (HARP), based on a principled decision-support optimization framework for sequential facility planning that aims to maximize population coverage under budget uncertainty while satisfying region-specific proportionality targets at every time step. We then propose two algorithms: (i) a learning-augmented approach that improves upon expert recommendations at any single-step; and (ii) a greedy algorithm for multi-step planning, both with strong worst-case approximation estimation. In collaboration with the Ethiopian Public Health Institute and Ministry of Health, we demonstrated the empirical efficacy of our method on three regions across various planning scenarios.

13.
medRxiv (Medicine) 2026-06-17

Postoperative Cognitive Decline in Older Patients with Cardiovascular Disease and Preoperative Mild Cognitive Impairment

Objective. Older adults undergoing cardiac surgery may be vulnerable to postoperative cognitive decline. However, no studies have examined postoperative cognitive outcomes in older patients with cardiovascular disease (CVD) according to preoperative mild cognitive impairment (MCI). This study examined 12-month postoperative cognitive outcomes in older CVD patients according to preoperative MCI diagnosis and explored predictors of postoperative cognitive decline. Method. Twenty-two older CVD patients ([≥]65 years) and twenty-five controls were included. Neuropsychological assessment was conducted at baseline in both groups and repeated 12 months after surgery in the CVD group. MCI was diagnosed using current clinical criteria. Postoperative cognitive change was examined across preoperative MCI groups. Results. Fifty percent of patients met criteria for postoperative MCI, showing high diagnostic stability relative to preoperative frequency (45.5%). The preoperative CVD-MCI group showed a decline in working memory, executive functions, visual memory, and naming, whereas CVD-nMCI group declined only in verbal memory. Furthermore, CVD-MCI showed more heterogeneous postoperative cognitive trajectories of change than CVD-nMCI, who showed stability. Estimated IQ, APACHE-II score, and postoperative frailty were important variables in predicting the postoperative pattern. Conclusions. MCI frequency remained high and stable in older CVD patients across the preoperative and one-year postoperative period. However, this apparent diagnostic stability masks subclinical cognitive decline, particularly among patients with preoperative MCI, who showed greater susceptibility to further impairment. Estimated IQ, APACHE-II score, and postoperative frailty may be considered relevant predictors of outcome. These results highlight the value of preoperative neuropsychological assessment for characterizing postoperative cognitive risk in older CVD patients.

14.
arXiv (CS.CV) 2026-06-24

PatternGSL: A Structured Specification Language for Template-Free and Simulation-Ready 3D Garments

Reconstructing realistic, physically plausible garments from a single image remains a fundamental challenge. Template-free methods capture surface geometry but lack explicit sewing structure for simulation; while programmatic systems are simulation-ready but constrained by predefined templates. This reveals a fundamental representation gap between geometric reconstruction and structured garment construction. We present PatternGSL, a structured garment representation in the form of a template-free and learnable specification language that encodes complete sewing patterns, including panel boundaries, parameterized seams, and explicit stitch topology, in a compact and standardized form. PatternGSL preserves the physical rigor of pattern-based models while removing template dependence, elevating sewing structure as a first-class target for generative modeling. We further propose a vision-language framework that predicts PatternGSL specifications directly from a single image and decodes them into garments using lightweight deterministic validity handling, without optimization-based refinement or manual cleanup. In addition, we introduce PatternGSLData, the first large-scale image-to-GSL paired dataset comprising 300K samples with complete sewing pattern annotations, enabling supervised VLM training for structured garment reconstruction. Experiments demonstrate improved pattern accuracy over prior baselines, explicit sewing-structure recovery, reliable cloth simulation, and pattern-level editing through the same deterministic decoding pipeline. Code and data-processing scripts will be released at https://github.com/PatternGSL/PatternGSL.

15.
arXiv (CS.CV) 2026-06-11

Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity

Hallucination detection in large language and vision-language models is increasingly framed as selective prediction, where a detector assigns a confidence score and abstains when confidence is low. Unsupervised sampling detectors (Semantic Entropy) avoid labels but plateau in quality, while supervised probes attain stronger in-distribution scores yet degrade sharply when calibration labels are scarce. We recover the response manifold of an LLM as the density ridge of a kernel density estimate built on a six-dimensional kinematic feature map of hidden state generation trajectories. A test generation is scored by the negated Euclidean distance from its projected feature point to the nearest ridge vertex, yielding a low-dimensional geometric skeleton of the stochastic output distribution. We evaluate against Semantic Entropy, topological methods, and log-probability on six QA benchmarks (HaluEval-QA, TriviaQA, GSM8K, POPE, ScienceQA, A-OKVQA) using eight text and vision LLMs in a deliberately label-scarce protocol ($n_{cal}{=}200$ queries, $N{=}5$ generations). Our ridge-based score beats on AUROC with 5-20 points gain, while demonstrating tempered degradation under calibration-label scarcity.

16.
arXiv (CS.CL) 2026-06-17

Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose

Authors:

LLM agents increasingly rely on external skills – reusable tool specifications – but real-world tasks often require composing multiple skills, not just selecting one. We formalize this as the Compositional Skill Routing problem: given a complex user query and a large skill library, decompose the query into atomic sub-tasks, retrieve the appropriate skill for each sub-task, and compose an executable plan. We present SkillWeaver, a decompose-retrieve-compose framework combining an LLM task decomposer, a bi-encoder skill retriever with FAISS indexing, and a dependency-aware DAG planner. To support evaluation, we introduce CompSkillBench, a benchmark of 300 compositional queries over 2,209 real MCP server skills spanning 24 functional categories, sourced from the public MCP ecosystem. Our experiments reveal that task decomposition quality is the primary bottleneck: standard LLM decomposition reaches only 34.2% category recall at the step level. To address this, we propose Iterative Skill-Aware Decomposition (SAD), a retrieval-augmented feedback loop that iteratively aligns decomposition with available skills. SAD improves decomposition accuracy from 51.0% to 67.7% (+32.7%, Wilcoxon p < 10^-6) in a single iteration; DA-conditioned analysis confirms that correct granularity is the prerequisite for effective retrieval (CatR@1 rises from 34% to 41% when DA=1). SkillWeaver reduces context window consumption by over 99%, and transfer experiments confirm generalization (+35.6% relative DA gain even when target categories are absent from the retrieval pool).

17.
medRxiv (Medicine) 2026-06-15

Supporting people to access social security payments through the Special Rules for End of Life: a qualitative study of the perspectives of patients, carers and health care professionals

Background: People living with terminal illness face a double financial burden from additional costs and loss of earning for themselves and their carers. Social security benefits are intended to help alleviate some of this financial pressure, and in the UK and other countries people are eligible for fast-tracked access to financial support via the Special Rules for End of Life. One in 3 people who are eligible miss out on this support, yet there is limited evidence on the reasons for this take-up deficit. Objectives: The aim of this study is to understand the barriers and facilitators to claiming benefits for terminally ill people from the perspectives of patients, carers, and health care professionals. Methods: This is a qualitative study combining i) focus groups with healthcare professionals recruited via professional networks and social media, and ii) interviews with patients and carers recruited in hospital and hospice settings. We analysed the data using Practical Thematic Analysis Results: Fifty-five multidisciplinary healthcare professionals participated in 11 focus groups, and we interviewed 10 patients and carers. We constructed five descriptive themes to summarise the data: Navigating priorities and uncertainty; positive impacts alongside a sense of shame and stigma; talking about money, difficulties and dividends; everybodys, yet nobodys, responsibility; and sticking points in the system. Conclusion: The themes reveal several challenges that may contribute to people not taking up this financial support. However, discussions about access to benefits were also seen as a core part of holistic care, a positive way to offer support and a gateway to other discussions about end-of-life care preferences and decisions. Recommendations for policy and practice include evaluating the adoption of a diagnostic rather than a prognostic eligibility criteria, integrating discussions about benefits into existing processes such as advance care planning, and improving education and support for clinicians.

18.
arXiv (CS.AI) 2026-06-12

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

arXiv:2606.12808v1 Announce Type: cross Abstract: Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.

19.
arXiv (CS.AI) 2026-06-16

Evidence of an Emergent "Self" in Continual Robot Learning

arXiv:2603.24350v3 Announce Type: replace-cross Abstract: A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self", and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive skills - because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions: One robot learns a constant task, while a second undergoes continual learning under variable tasks. We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important: preserving it aids adaptation while damaging it impairs performance. We validate this pattern across three different robots spanning locomotion and manipulation.

20.
arXiv (CS.CL) 2026-06-17

Speaking in Self-Assessing Tongues: On the Verbalized Confidence of LLMs in Machine Translation

The rapid rise in popularity of large language models (LLMs) for translation calls for a thorough study of the reliability of their confidence in their own outputs. Unlike many generation tasks, translation errors and confidence levels can be useful at different levels of granularity (tokens, words, or spans). Unsupervised approaches based on internal signals like predicted probabilities can be misleading because they reflect certainty among alternatives rather than correctness. In addition, they require access to such internal signals. Here, we devise five verbalized methods of extracting an LLM's per-token confidence without those shortcomings and compare their reliability with that of the model's internal signals of certainty. We evaluate reliability using two forms of alignment: fine-grained error detection and calibration. For both, internal and verbalized methods perform similarly, although results vary by model. Interestingly, we find little to no correlation between internal and verbalized methods.

21.
arXiv (CS.LG) 2026-06-12

ProtoX-AD: Self-Explainable Time Series Anomaly Detection and Characterization

arXiv:2606.13277v1 Announce Type: cross Abstract: Recent advances in time series anomaly detection (TSAD) have highlighted the effectiveness of self-supervised classification-based approaches. These methods apply transformations to normal training samples, training a classifier to recognize transformation-specific patterns that help identify anomalies through increased classification errors. Despite their strong performance, a significant challenge is their lack of explainability, as they provide limited insight into the characteristics of flagged anomalies. To address this limitation, we propose ProtoX-AD, a prototype-based self-explainable framework for self-supervised TSAD. ProtoX-AD learns transformation-aware latent representations alongside interpretable prototypes, enabling both accurate anomaly detection and the identification of distinct anomalous profiles through prototype-based explanations. Additionally, it allows for systematic analysis of how transformation design impacts detection performance and explainability. Experimental results on synthetic and real-world datasets demonstrate that ProtoX-AD achieves detection performance comparable to its black-box counterparts while offering more consistent and semantically meaningful explanations than existing explainable baselines. Our code is publicly available at https://github.com/Aitorzan3/ProtoX-AD.

22.
arXiv (CS.AI) 2026-06-19

Before the Pull Request: Mining Multi-Agent Coordination

arXiv:2606.19616v1 Announce Type: cross Abstract: Autonomous coding agents now open millions of pull requests, yet large-scale studies find their PRs are produced faster but accepted less often - a coordination and trust gap that pull-request-level telemetry cannot explain. We argue the missing signal lives before the PR, in how concurrent agents claim, divide, and collide over shared work. We study this process through grite, our open-source coordination substrate that needs no central server and stores its records inside git itself, so its append-only, signed event log captures the coordination process directly. We show that (i) this shared substrate reduces duplicate and conflicting work at bounded overhead - the share of work that merely re-does a teammate's task falls from 78% to 0% while useful throughput more than triples; (ii) every agent's copy of the log converges to the same state with no write silently dropped, where a file-based tracker loses concurrent writes; and (iii) the log is a mineable artefact from which concrete failure modes - conflicting edits, lock starvation, redundant rediscovery, race-to-close - are automatically recoverable with provenance, several invisible in pull-request history. We release the dataset, harness, and mining toolkit.

23.
arXiv (CS.LG) 2026-06-16

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

arXiv:2606.17011v1 Announce Type: cross Abstract: Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand control. Consequently, the collected intervention trajectories are often suboptimal, and methods that rely on human interventions as expert supervision can absorb hesitant, inefficient, or even erroneous behaviors. To address both the system and algorithmic challenges, we propose ROVE, a reinforcement learning framework for humanoid VLA post-training with imperfect human interventions. First, ROVE introduces a human-in-the-loop pipeline capable of collecting deployment and intervention data for humanoid manipulation. Second, it utilizes Optimistic Value Estimation (OVE) to prioritize high-value behaviors from mixed-quality trajectories. To further robustify value estimation, we incorporate cross-embodiment human experience videos to provide rich supervision for long-tailed failure and recovery modes. The resulting critic yields informative advantage signals, steering the VLA actor to focus on high-value behaviors rather than indiscriminately imitating all actions. On challenging real-world contact-rich and fine-grained humanoid manipulation tasks, ROVE outperforms experience-learning baselines and consistently improves across multiple rollout-intervention iterations.

24.
arXiv (CS.CV) 2026-06-15

PMOF: A Dataset and Benchmark for Passenger Monitoring Using Overhead Fisheye Cameras

Autonomous staff-free public transport requires reliable in-vehicle passenger monitoring. However, perception inside moving vehicles is challenged by confined spaces, variable illumination, motion-induced background variation, occlusion, and limited viewpoints. To mitigate these spatial constraints, ceiling-mounted fisheye cameras provide full-scene coverage from a single viewpoint. Yet existing public overhead fisheye datasets are recorded in static environments and do not capture the domain shift introduced by vehicle motion. To fill this gap, we introduce PMOF, Passenger Monitoring using Overhead Fisheye cameras, the first public dataset of top-view fisheye imagery captured inside a moving vehicle, comprising over 19k manually annotated frames. PMOF provides rotated bounding boxes, tracking identifiers, and action labels, supporting object detection, tracking, and action recognition. We benchmark PMOF using YOLO26m-obb models fine-tuned under multiple dataset configurations that combine PMOF with existing overhead fisheye datasets. Cross-domain fine-tuning with custom rotation-aware augmentation achieves 94.8% AP50 on PMOF and 96.5% AP50 on an unseen overhead fisheye dataset from a different domain. Our results highlight the domain gap between static and moving environments and show that incorporating PMOF improves detection performance and advances generalization beyond passenger monitoring to broader fisheye-based person detection tasks. The dataset and code are available at https://swermuth.github.io/pmof/.

25.
arXiv (CS.AI) 2026-06-24

Exploring the relationship between human-centric AI and firm idiosyncratic risks

arXiv:2606.24224v1 Announce Type: new Abstract: Despite the extensive discussions of human-centric AI (HCAI) in Industry 5.0, its effects on firms' idiosyncratic risks (IR) remains underexplored. This is an imperative issue for firms navigate financial risks during the current technological revolution, as IR reflects investor reactions to corporate heterogeneous AI strategies and implementations by isolating firm-level stock volatility from systematic factors. Integrating situated AI theory with social-technical systems theory, we conceptualise HCAI as a situated AI strategy that reduces AI-related ethical risks and fosters AI-Human synergies in firms' business operations, ultimately reducing IR by aligning with stakeholders' diverse expectations. Moreover, socio-technical factors, namely digitalisation, operational efficiency, executive shareholding, and CEOs with IT background, may moderate the HCAI-IR relationship. Using a multi-source panel dataset of Chinese listed firms from 2015 to 2023, we find that HCAI is associated with lower firm IR. Furthermore, digitalisation and executive shareholding strengthen this risk-reducing effect, whereas operational efficiency and CEOs with IT background surprisingly attenuate it. Our findings offer theoretical contributions and practical insights for both ethical AI governance and firm financial risk management in the AI era.