Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-15

Mirage Probes: How Vision Models Fake Visual Understanding

Vision-language models (VLMs) can answer image-based questions confidently, and often correctly, even when no image is provided. This mirage behavior inflates benchmark scores without reflecting visual grounding. Prior work treats this as a single failure mode. We argue it is two. Using Mirage Probes, a contrastive probing framework that pairs paraphrased question variants with matched mirage and non-mirage labels on the same image, we show that mirage behavior is linearly decodable from internal activations across residual stream, MLP, post-attention, and attention-head sites in two open-source VLMs. We demonstrate that a Naive Bayes text baseline cannot recover this signal, ruling out surface lexical confounds. Cross-benchmark separability patterns, together with a novel Prior Harnessing Index (PHI) measuring how much a model can answer from text alone, expose two distinct regimes: textual biases, where the model answers from language priors without engaging visual representations, and spurious images, where it constructs false visual content in latent space and answers as if grounded. The distinction has direct mitigation consequences: text-distribution cleaning can address the first regime but cannot reach the second, since spurious-image mirages live in the model's visual representations rather than its text. Faithful visual grounding will require interventions at the representational level.

02.
arXiv (CS.CV) 2026-06-11

Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation

Authors:

PCA can be used for rotation invariant features, describing a shape with its $p_{ab}=E[(x_i-E[x_a])(x_b-E[x_b])]$ covariance matrix approximating shape by ellipsoid, allowing for rotation invariants like its traces of powers. However, real shapes are usually much more complicated, hence there is proposed its extension to e.g. $p_{abc}=E[(x_a-E[x_a])(x_b-E[x_b])(x_c-E[x_c])]$ order-3 or higher tensors describing central moments, or polynomial times Gaussian allowing decodable shape descriptors of arbitrarily high accuracy, and their analogous rotation invariants. Its practical applications could be rotation-invariant features to include shape modulo rotation e.g. for molecular shape descriptors, or for up to rotation object recognition in 2D images/3D scans maybe also for 3D scene understanding, or shape similarity metric allowing inexpensive comparison of objects modulo rotation avoiding costly optimization over rotations.

03.
arXiv (CS.LG) 2026-06-24

A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

arXiv:2307.13127v3 Announce Type: replace-cross Abstract: Data used to train predictive models via empirical risk minimization (ERM) often contain sensitive personal information. While differential privacy (DP) provides mathematically provable bounds to protect such data, previous work has focused almost exclusively on unweighted ERM. We consider weighted ERM (wERM) – an important generalization where individual contributions to the objective function vary. We propose the first DP algorithm for general wERM with formal privacy guarantees and derive both its empirical and population excess risk bounds. Crucially, this general wERM framework provides a pathway for deriving privacy-preserving learning methods for individualized treatment rules, including the popular outcome-weighted learning (OWL) approach. We evaluate DP-wERM applied to OWL in simulated and real data experiments. Our empirical results demonstrate that training OWL models via wERM provides strong DP guarantees while maintaining robust performance, proving the method is practical for sensitive, real-world data.

04.
arXiv (CS.CL) 2026-06-18

Phonikud: Overcoming Phonetic Underspecification for Hebrew Text-To-Speech

Text-to-speech (TTS) for Modern Hebrew is challenged by the language's orthographic complexity, with existing solutions ignoring underspecified phonetic features such as stress. We present a framework for more phonetically accurate Hebrew TTS with four contributions: (1) Phonikud, an open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified International Phonetic Alphabet (IPA) transcriptions, designed by augmenting a base diacritizer. (2) The ILSpeech corpus of paired Hebrew audio, text, and expert IPA annotations. (3) A benchmark for the previously unmeasured task of Hebrew G2P conversion. (4) Hebrew audio-to-IPA models capturing previously disregarded phonetic details for automatic TTS evaluation. Our results show that Phonikud more accurately predicts Hebrew phonemes than prior methods, and that small, local TTS models with phonetic input from Phonikud approach large proprietary systems. We release our code, data, and models at https://phonikud.github.io.

05.
arXiv (CS.CV) 2026-06-17

Disentangling Perception and Reasoning in Multimodal LLMs via Reward Design

Reinforcement learning with verifiable rewards has driven major gains in LLM reasoning, and it is intuitive to assume this recipe will transfer well to multimodal models. However, multimodal models do two things: first, perceive what is in an image, then reason about what it implies. Because these stages are graded jointly, it is hard to tell how much room reasoning alone has to grow. We study this on algorithmic visual puzzles, where both components are necessary and show that perception, not reasoning, is the binding constraint. Replacing images with simple textual descriptions raises performance by over 20 points on average for Claude models. We then evaluate six reward designs aimed at inducing visual grounding during reasoning without chain-of-thought supervision. Training Qwen-2.5-VL-7B with GRPO, reward design induces long, structured reasoning with self-reflection and visual references, yielding a 5.56-point gain over the base model. These gains are, however, uneven; no single reward improves all categories, and rewards with verifiable accuracy signals trade out-of-domain transfer for in-domain accuracy. These results point to perception-aware reward design as a path forward, so that signals correct perception at its source rather than the reasoning that inherits its errors.

06.
arXiv (quant-ph) 2026-06-16

Single-Image Entanglement Verification with Spatially Encoded Measurement Contexts

arXiv:2606.15382v1 Announce Type: new Abstract: Entangled photon pairs produced by spontaneous parametric down-conversion exhibit rich spatial entanglement structure that is often difficult to probe with conventional measurements. Here, we show that spin-orbit optical elements can convert this spatial structure into directly observable quantum interference patterns. Using a $q$-plate, we demonstrate that the relative wavefront curvature of biphoton states generated by a pair of nonlinear crystals can be retrieved from the spatial modulation of coincidence images. Building on this principle, we introduce a liquid-crystal metasurface that performs spatially multiplexed Bell measurements across the transverse profile of the photon field. The device, which we call a Clauser-Horne-Shimony-Holt (CHSH) plate, assigns different polarization projections to different azimuthal sectors of the beam, allowing the sixteen joint measurements required for a CHSH test to be realized simultaneously in a single acquisition. In this architecture, the spatial coordinate acts as a classical register selecting the measurement context, while photon pairs sample these contexts according to their emission directions. We further demonstrate that the same measurement concept can be implemented using a programmable spatial light modulator, providing a dynamically reconfigurable realization of the scheme. Our results show that spatially structured optical elements can transform Bell tests into parallel measurements distributed across the transverse plane, enabling rapid characterization of spatially varying entanglement. This approach opens new possibilities for structured-light quantum measurements, Bell-inequality-based imaging, and the study of spatially engineered entangled photon sources.

07.
arXiv (CS.LG) 2026-06-24

Task Vector Bases: A Unified and Scalable Framework for Compressed Task Arithmetic

arXiv:2502.01015v5 Announce Type: replace Abstract: Task arithmetic, representing downstream tasks through linear operations on task vectors, has emerged as a simple yet powerful paradigm for transferring knowledge across diverse settings. However, maintaining a large collection of task vectors introduces scalability challenges in both storage and computation. We propose Task Vector Bases, a framework compressing $T$ task vectors into $M < T$ basis vectors while preserving the functionality of task arithmetic. By representing each task vector as a structured linear combination of basis atoms, our approach supports standard operations such as addition, negation, as well as more advanced arithmetic ones. The framework is orthogonal to other efficiency-oriented improvements in task arithmetic and can be used in combination with them. We provide theoretical analysis showing that basis compression retains addition generalization guarantees and enables principled unlearning, with error bounds depending on reconstruction quality. Empirically, our proposed basis construction methods consistently outperform heuristic basis construction baselines and, in some cases, even surpass the performance of full task vector collections across diverse downstream applications while reducing storage and computational requirements. The code is available at https://github.com/uiuctml/TaskVectorBasis.

08.
arXiv (math.PR) 2026-06-17

Killed resolvents and measure-valued stopping gains for reflected optimal stopping with max-type rewards

arXiv:2606.17517v1 Announce Type: new Abstract: We study an infinite-horizon optimal stopping problem for a normally reflected two-dimensional diffusion in the positive quadrant with nonsmooth max-type reward \(G(x_1,x_2)=x_1\vee \alpha x_2\). The paper develops a conditional measure-theoretic framework for the associated reflected obstacle problem. The main innovation is to show that the stopping gain \(\Gamma=c+rG-\mathcal LG\) is a signed measure, not a function: the kink of \(G\) generates an explicit negative surface measure on \(\Delta=\{x_1=\alpha x_2\}\). We then prove that the correct potential representation uses the resolvent of the reflected diffusion killed on first entry into the stopping set, rather than the unrestricted reflected resolvent. Under explicit monotonicity, regularity, and measure-superharmonicity assumptions, we derive an epigraph representation, a continuation-side boundary-trace condition, and a candidate verification theorem. The framework clarifies hidden regularity and uniqueness assumptions in multidimensional nonsmooth optimal stopping.

09.
arXiv (CS.LG) 2026-06-16

Closing the Approximation Gap in Simulation-free Latent SDEs

arXiv:2606.16138v1 Announce Type: cross Abstract: Recovering dynamical systems from noisy observations is a recurring challenge across scientific domains, including neuroscience and physics. Latent stochastic differential equations (SDEs) address this by modeling the system as an unobserved state that evolves according to a learnable SDE and generates the observations. Variational inference (VI) provides a tractable objective for fitting latent SDEs. Traditional VI algorithms evaluate this objective by numerical simulation over a time discretization, trading fidelity for computational cost. A recent class of algorithms, simulation-free VI, sidesteps this tradeoff by parameterizing the posterior through its instantaneous marginals rather than its drift. In this work, we show that the efficiency of existing simulation-free VI algorithms comes at a price: their parameterizations restrict the approximate posterior to a subset of the SDEs available to simulation-based methods, degrading posterior inference and parameter learning. We propose Helmholtz-SDE, a simulation-free VI algorithm that closes this gap by optimizing over path laws compatible with a prescribed collection of marginals. Helmholtz-SDE recovers dynamics more faithfully than prior simulation-free methods, with the largest gains under high posterior uncertainty. It further matches the performance of simulation-based VI at a fraction of the runtime.

10.
medRxiv (Medicine) 2026-06-11

Validity and Limitations of the Empatica E4 Wristband for Autonomic and Thermoregulatory Sleep Monitoring Against Concurrent Polysomnography: A Wearanize+ Dataset Study

The Empatica E4 wristband provides continuous multi-modal physiological monitoring including blood volume pulse (BVP), electrodermal activity (EDA) and skin temperature (TEMP) but its validity for sleep-stage-specific autonomic and thermoregulatory monitoring has not been systematically evaluated against concurrent polysomnography (PSG). Using the Wearanize+ dataset which provides synchronised PSG, Empatica E4, and Zmax EEG recordings from 100 home-recorded participants; a systematic validation of Empatica E4 physiological signals against PSG ground truth across five sleep stages was conducted. Of 100 participants, 92 had Empatica data; 69 met Zmax EEG signal quality criteria and formed the analysis sample. Heart rate (HR) from the pre-computed Empatica HR channel showed valid stage-specific patterns (Wake: 70.9 bpm, N3: 61.2 bpm) and moderate inter-device MeanNN correspondence with PSG ECG (Spearman r=0.35-0.42 across stages). Skin temperature showed the expected thermoregulatory pattern (Wake: 33.92C, N3: 35.48C) and is recommended for downstream analyses. Tonic EDA showed an inverted stage pattern attributable to wrist sweat accumulation during deep sleep, representing a known confound for wrist-worn EDA during sleep. Phasic EDA showed plausible patterns and may be used with caution. These findings establish a validated feature set for Empatica E4 sleep research and directly inform multimodal psychiatric biomarker studies using the Wearanize+ dataset.

11.
arXiv (CS.AI) 2026-06-16

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

arXiv:2606.15306v1 Announce Type: cross Abstract: We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future decisions. This cross-task experiential learning capability is pivotal in domains such as personalization and interactive assistance, but existing training/evaluation frameworks do not provide shared, controllable latent structures and cannot measure whether or why agents improve. We introduce LatentGym: a controllable suite in which each environment is organized around a ground-truth latent variable governing the structure across tasks. Our construction yields metrics that separate exploration (whether the agent's actions gather information about the latent) from exploitation (whether the agent uses what it has gathered). We demonstrate our suite on empirical studies addressing three questions: how and why frontier models fail to adapt across related tasks; whether post-training on related task sequences improves general cross-task adaptation, and where those gains come from; and how design choices such as inter-task feedback shape training dynamics and generalization. Together, these results establish a controlled foundation for studying how LLM agents learn from experience across tasks, and for designing agents that adapt more reliably in sequential, personalized, and interactive settings.

12.
arXiv (CS.AI) 2026-06-11

Bridging the Morphology Gap: Adapting VLA Models to Dexterous Manipulation via Intent-Conditioned Fine-Tuning

arXiv:2606.12109v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable zero-shot generalization in robotic manipulation, yet the vast majority of pre-trained pipelines remain strictly confined to low-DoF parallel grippers. Adapting these rich semantic priors to high-DoF dexterous hands introduces a severe morphology gap, direct end-to-end joint fine-tuning inherently causes catastrophic forgetting of spatial reasoning and acute action manifold collapse due to data scarcity. In this paper, we present InDex, a novel, data-efficient adaptation framework rooted in cross-morphology semantic inheritance. Rather than discarding the pre-trained 1-DoF parallel grasp output, we repurpose it as a continuous, macroscopic virtual grasp intent proxy to sequentialize the control topology. We implement a two-stage decoupled learning architecture: the first stage parameter-efficiently aligns the VLA backbone to predict continuous arm trajectories and the scalar grasp intent; the second stage freezes this spatial backbone and leverages an intent-conditioned denoising diffusion head to decode fine-grained joint articulations for multi-fingered end-effectors. Extensive simulation benchmarks across a suite of multi-stage, contact-rich dexterous manipulation tasks demonstrate that InDex effectively masters intricate skills with minimal demonstration data, substantially outperforming monolithic baselines while preserving the robust spatial generalizability of the original VLA prior.

13.
arXiv (CS.LG) 2026-06-12

Physics-Aware Auxiliary Losses Improve Out-of-Distribution Generalization of a GNN Synthesizability Filter

arXiv:2606.12651v1 Announce Type: new Abstract: Machine-learning drug-discovery pipelines increasingly rely on generative models that propose molecules far from the data used to train downstream synthesizability filters. Existing filters (SAScore, SCScore, RAscore, DeepSA) are purely statistical and degrade in exactly this out-of-distribution (OOD) regime. We ask whether cheap, closed-form physical priors, used as auxiliary supervision on a graph neural network (GNN), improve OOD generalization. We add two auxiliary losses to a GINE backbone: a topological complexity regression supervised by the Bertz index, and a strain-energy soft penalty supervised by MMFF94 force-field energy. On a 65,177-molecule corpus (HIV, Tox21, COCONUT) labeled by SAScore thresholds we reproduce a strong in-distribution baseline, then evaluate a 4-way ablation (baseline / +complexity / +strain / +both) on a single-source OOD split (train on drug-like HIV+Tox21, test on COCONUT natural products), repeated over 5 seeds with paired bootstrap confidence intervals. All three physics-aware variants give a small but statistically significant OOD improvement over the baseline (mean OOD AUC 0.9774): +complexity Delta = +0.0060 (95% CI [+0.0023, +0.0102]), +strain Delta = +0.0032 ([+0.0008, +0.0052]), +both Delta = +0.0066 ([+0.0038, +0.0093]); every interval excludes zero, and the combination is best. The variants are indistinguishable in-distribution, so the effect is visible only under OOD evaluation. We are explicit that the effects are modest, and we report a cautionary methodological finding: a single-seed version of this experiment produced a qualitatively different (non-monotone) story that did not survive multi-seed evaluation.

14.
arXiv (CS.AI) 2026-06-24

AI-Driven Analytics of Team-Teaching Talk: Acoustic Patterns across Experience, Cohorts and the Learning Design

arXiv:2606.09831v2 Announce Type: replace-cross Abstract: As classroom cohorts expand, team teaching is increasingly used to integrate the expertise and pedagogical perspectives of multiple teachers. Yet, there is limited empirical understanding of how team teaching unfolds in practice, particularly regarding differences in teachers' contributions across experience levels, student cohorts, and learning task design. Prior research on team teaching has largely relied on retrospective self-reports or small-scale observations, offering limited insight into the micro-level processes through which team teaching is enacted. Teacher talk offers a scalable lens on these processes. While research in individual teaching contexts shows that acoustic features of speech (e.g., voice quality, intonation, and loudness) can shape student learning, evidence from team-teaching settings remains scarce. Moreover, capturing such features through manual observation or transcription is especially challenging in team-teaching classrooms, where multiple teachers speak across extended sessions and spatial locations, limiting scalability without automation. Grounded in spatial pedagogy theory and team-teaching research, this paper presents an AI-based speech processing approach to analyse classroom talk in team-teaching settings. We analysed 36 recorded undergraduate and postgraduate sessions involving 12 teachers. Spatial pedagogy behaviours were coded and acoustic features extracted to examine variation across teachers' experience, student cohorts, and the learning task design. The results reveal systematic differences, most notably in loudness dynamics: high-experience teachers, undergraduate classes and collaborative learning tasks exhibited greater loudness variation, suggesting more frequent modulation of volume to foreground key information and support classroom interaction and engagement.

15.
Nature (Science) 2026-06-10

A prognostic human brain network for diffuse midline glioma

Authors:

Diffuse midline gliomas (DMGs) are near-universally lethal tumours of the&nbsp;childhood central nervous system1,2. In animal models, DMGs form brain-wide integrated networks through neuron-to-glioma synapses3–6 and glioma-to-glioma gap junctional coupling3. This extensive connectivity robustly promotes the growth and invasion of DMG3–9 and other glial malignancies10–12 through paracrine mechanisms and direct neuron-to-glioma synapses. However, the organization and clinical implications of these connections in the living human brain remain to be elucidated. Here, we develop tumour network mapping to compute the brain-wide connectivity profile of DMG, defining a conserved brain network across pontine and thalamic DMG associated with patient short-term survival (DMG network). Tumour functional connectivity with the DMG network was independently predictive of patient overall survival across two external validation cohorts. Tumour growth mapped to DMG&nbsp;network-specific trajectories and peak in-network neurometabolic changes across development spatiotemporally aligned with the peak age incidence of DMG. Analyses of single-nucleus RNA&nbsp;sequencing data&nbsp;confirmed diverse synaptic gene enrichment in high-connectivity DMG. Strikingly, incidental surgical resection of high-connectivity thalamic DMG tissue conferred a significant survival advantage. Collectively, these data define a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth. Tumour network mapping of diffuse midline glioma&nbsp;(DMG) defines a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth.

16.
arXiv (CS.AI) 2026-06-19

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

arXiv:2602.04396v2 Announce Type: replace-cross Abstract: Distributed training of foundation models via $\texttt{DDP}$ is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose $\texttt{LoRDO}$, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. $\texttt{LoRDO}$ achieves near-parity with low-rank $\texttt{DDP}$ in language modeling and downstream tasks at model scales of $125$M–$720$M, while reducing communication by $\approx 10 \times$. Finally, we show that $\texttt{LoRDO}$ improves performance even more in very low-memory settings with small rank/batch size.

17.
PLOS Computational Biology 2026-06-24

The transcriptional gradient in negative-strand RNA viruses suggests a common RNA transcription mechanism

by Connor R. King, Casey-Tyler Berezin, Brian Munsky, Jean Peccoud Nonsegmented negative-strand RNA viruses (NNSV) are a diverse class of medically relevant viruses which display a conserved attenuation gradient in the transcription of their genomes. This gradient has been traditionally explained by the Stop-Start model which attributes attenuation to polymerase behavior at gene junctions. In this article, we evaluate an alternative explanation where the gradient arises from polymerase dynamics during transcription. We introduce the RNA Polymerase Association Mechanism (RAM) model, a coarse-grained stochastic framework that describes transcription using two parameters related to polymerase processivity and the ability of the polymerase to backtrack. The RAM model accurately reproduces transcriptional gradients across diverse NNSVs as well as in gene-shuffled VSV variants. Additionally, the inferred polymerase processivity appears correlated to the length of the viral genomes suggesting a conserved constraint on transcription across these viruses. While the RAM model does not account for all known molecular features of NNSV transcription, it provides a parsimonious and predictive framework for relating genome architecture and transcription. These results support the view that, in tandem with the traditional junction-centric mechanisms governing transcription, nonspecific attenuation mechanisms contribute to the NNSV transcriptional gradient and warrant closer inspection in future studies which could lead to better rational genome design in viral studies and biomedical applications.

18.
arXiv (quant-ph) 2026-06-25

Fast and Parallel High-Rate STAR Architecture for Megaquop Quantum Simulation

arXiv:2606.25011v1 Announce Type: new Abstract: Fault-tolerant quantum simulation is approaching a phase where encoding overhead, logical Clifford operations, magic-state preparation, and rotation synthesis must be optimized together for efficient implementation. Space-Time efficient Analog Rotation (STAR) architectures reduce two of these costs by preparing small-angle rotation magic states directly, and the transversal STAR variant further lowers the Clifford overhead. Existing concrete implementations, however, largely inherit the low $O(1/d^2)$ encoding rate of the surface code, while high-rate codes have not yet been integrated into comparably explicit architectures. Here, we introduce a high-rate STAR architecture for local lattice Hamiltonian simulation based on a symmetry-driven co-design of the algorithm, QEC code, and neutral-atom hardware. Translation symmetries of the target lattice determine the choice of bicycle chain codes, a tunable family of self-dual bivariate bicycle codes that natively implement Clifford gates required for lattice simulation. Disjoint logical representatives allow STAR injections to be performed in parallel on all $k$ logical qubits in a code block, amortizing resource state preparation and enabling practical post-selection rates. On neutral-atom platform, the same translation symmetry compiles the key logical operations into low-depth, hardware-native acousto-optic-deflector shifts. End-to-end estimates show that an $8 \times 8$ transverse-field Ising simulation to $T^* \approx 8 (zJ)^{-1}$ requires $2240$ physical qubits and $\sim 200$ s per shot, a $\sim 5.5\times$ space reduction relative to a surface code STAR baseline at comparable speed; for Fermi-Hubbard dynamics to $T^* \approx 4 (zt)^{-1}$, the corresponding estimates are $\sim 6300$ physical qubits and $\sim 200$ s per shot. These results provide a concrete route toward early fault-tolerant quantum simulation with high-rate codes.

19.
medRxiv (Medicine) 2026-06-24

Barriers and facilitators to diabetes management among adults and healthcare providers in a peri-urban Ugandan health facility: A qualitative study

Diabetes mellitus is an increasing public health challenge in Uganda and other low- and middle-income countries, where health systems face growing demands for chronic disease care. Although quantitative studies have documented poor glycemic control and health system constraints, less is known about how patients and healthcare providers experience diabetes management in peri-urban public health settings. This study explored barriers and facilitators to diabetes management among adults with diabetes mellitus and healthcare providers at a peri-urban health facility in Uganda. We conducted a qualitative descriptive study at Kasangati Health Centre IV, Wakiso District, Uganda, between February and March 2025. Data were collected through 15 in-depth interviews with adults living with diabetes mellitus and 8 key informant interviews with healthcare providers involved in diabetes care. Participants were purposively selected based on their experience with diabetes management and service delivery. Interviews were audio-recorded, transcribed verbatim, translated where necessary, and analyzed using a hybrid inductive-deductive thematic approach informed by the Theoretical Domains Framework. Five interrelated themes were identified: (1) institutional and environmental factors influencing access to diabetes care; (2) cognitive and informational factors influencing medication adherence; (3) social influences on diabetes management; (4) emotional experiences of patients and healthcare providers; and (5) self-management strategies and continuity of care. Across these themes, participants identified barriers including resource limitations, communication challenges, medication management difficulties, stigma, emotional distress, and weak follow-up systems. Facilitators included peer support, religious and community networks, health education, provider flexibility, and patient-developed adherence strategies. Diabetes management was influenced by interacting health-system, social, informational, and behavioural factors. Resource constraints, limited health literacy, stigma, and weak follow-up systems hindered effective management, while social support, health education, and patient self-management strategies facilitated continued engagement in care. Interventions that strengthen chronic care services, patient education, and community support may improve diabetes outcomes in similar resource-constrained settings.

20.
arXiv (CS.CV) 2026-06-16

Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing

Recent advancements in diffusion-based image editing pose a significant threat to the authenticity of digital visual content. Traditional embedding-based watermarking methods often introduce perceptible perturbations to maintain robustness, inevitably compromising visual fidelity. Meanwhile, existing zero-watermarking approaches, typically relying on global image features, struggle to withstand sophisticated manipulations. In this work, we uncover a key observation: while individual image patches undergo substantial alterations during AI-based editing, the relational distance between patch pairs remains relatively invariant. Leveraging this property, we propose Relational Zero-Watermarking (Rel-Zero), a novel framework that requires no modification to the original image but derives a unique zero-watermark from these editing-invariant patch relations. By grounding the watermark in intrinsic structural consistency rather than absolute appearance, Rel-Zero provides a non-invasive yet resilient mechanism for content authentication. Extensive experiments demonstrate that Rel-Zero achieves substantially improved robustness across diverse editing models and manipulations compared to prior zero-watermarking approaches.

21.
arXiv (CS.LG) 2026-06-19

A graph neural network surrogate model for mesh-based crashworthiness prediction of vehicle panel components

arXiv:2503.17386v2 Announce Type: replace-cross Abstract: Crashworthiness is a key performance measure in the design of safety-critical vehicle panel components such as B-pillars. Finite element (FE) simulations are widely used to evaluate crash responses but remain computationally expensive for large-scale, nonlinear impact scenarios, particularly when integrated into iterative design and optimisation processes. Although machine learning-based surrogate models have been developed for rapid crashworthiness analysis, they exhibit limitations in detailed representation of complex 3-dimensional components. Graph Neural Networks (GNNs) have emerged as a promising solution for processing data with complex structures. However, existing GNN models often lack sufficient accuracy and computational efficiency to meet industrial demands. This paper proposes Recurrent Graph U-Net (ReGUNet), a graph-based surrogate model for crashworthiness analysis of vehicle panel components. By representing FE meshes in graph form, the model naturally accommodates complex irregular structural geometries. Its hierarchical architecture improves computational efficiency and accuracy, while the introduction of recurrence enhances stability of temporal predictions over multiple time steps. A side-impact case study of hot-stamped steel B-pillars with varying geometries is used to generate training dataset. The trained model demonstrates high accuracy in predicting the dynamic deformation behaviour and crashworthiness indicators of previously unseen component designs. ReGUNet achieves over a 52% reduction in the average deformation prediction error relative to baseline methods, together with markedly improved computational efficiency. ReGUNet provides rapid and reliable crashworthiness assessments, which in turn accelerates the design cycle of vehicle panel components.

22.
arXiv (math.PR) 2026-06-25

Degree-preserving conservative processes and a unified approach for their hydrodynamics

arXiv:2604.03548v2 Announce Type: replace Abstract: We investigate a broad class of large-scale one-dimensional interacting systems characterized by a single conservation law and satisfying the "degree-preserving property". Under mild and natural assumptions, we establish a unified framework for the analysis of both invariant measures and hydrodynamic limits. In particular, we prove that when the generator preserves the degree of polynomials of the state variables up to order two, the marginals of any product invariant measure must belong to a family of six specific distributions. This classification is shown to be consistent with a classical result on univariate natural exponential families due to C.N. Morris, which we apply here for the first time in the context of microscopic stochastic systems. As a consequence, we construct a new interacting particle system whose invariant measure is given by the generalized hyperbolic secant distribution. Furthermore, we prove that, despite the generality of the dynamics, the macroscopic behavior of all models in this class is governed by the classical heat equation, with a diffusion coefficient depending explicitly on the underlying microscopic interactions.

23.
arXiv (CS.LG) 2026-06-11

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

arXiv:2605.03065v2 Announce Type: replace Abstract: Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly-initialized behavior cloning policies to near full task-success with no expert data in the online replay buffer, and does so with few task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms methods alternatives on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilization tricks, including success-buffer regularization, two-sided conservative advantages, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.

24.
arXiv (CS.CL) 2026-06-19

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Automatic Speech Recognition (ASR) has become a key technology for human–AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets. Nevertheless, these approaches face an inherent scalability limitation, as support for CS must be developed separately for language pairs whose number grows combinatorially with the number of supported languages. In this work, we investigate whether CS capabilities learned from a limited set of seen language pairs can generalize to unseen language pairs through model merging and domain generalization methods. Our experiments show that merged bilingual CS-ASR models modestly generalize to unseen language pairs, suggesting limited transfer of bilingual CS capabilities across language pairs.

25.
arXiv (CS.CV) 2026-06-16

Multi-Modal Attention for Automated Disaster Damage Assessment Using Remote Sensing Imagery and Deep Learning

Timely and accurate disaster damage assessment is crucial for effective emergency response, resource allocation, and recovery. Traditional methods, which often rely on manual inspections or sparse data, are typically slow and error-prone. This paper introduces a novel framework leveraging remote sensing imagery and deep learning to automate building damage classification. Using pre- and post-disaster satellite imagery, our model categorizes buildings into four damage levels: no damage, minor damage, major damage, and destroyed. The core innovation is a multi-modal attention mechanism that fuses bi-temporal features to explicitly detect and assess structural changes. We employ a lightweight ConvNeXT-Tiny backbone to ensure efficient processing without compromising performance. Key contributions include: (1) a cross-attention module for multi-modal data fusion, (2) an optimized preprocessing pipeline for large-scale datasets, and (3) robust data augmentation techniques. Experiments on a large-scale disaster dataset demonstrate an overall classification accuracy of 94.90%. The model effectively discriminates between damage categories and remains resilient to incomplete data. This system significantly improves assessment speed and accuracy, aiding emergency responders in prioritizing interventions. This work advances automated disaster damage detection by integrating multi-temporal imagery with deep learning, offering a scalable solution for real-time response.