Academic Intelligence · Curated Daily

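Explore the Frontiers of Global Scholarship

AcademicHub brings together real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature briefings.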

01.
arXiv (CS.LG) 2026-06-16

Robust Neural Tucker Factorization with Bias Correction and Adaptive Initialization

arXiv:2606.16388v1 Announce Type: new Abstract: High-dimensional incomplete (HDI) tensors are widely used in traffic and climate applications, but sparse observations make accurate completion difficult. The intrinsic non-linear dynamics and non-stationary variations across distinct multi-modal fields severely hinder the efficacy of conventional linear reconstruction frameworks. Neural Tucker factorization provides an effective framework for modeling high-order interactions among tensor modes. By parameterizing underlying structural characteristics into continuous latent spaces, neural representations circumvent the rigid low-rank constraints of classical algebra. However, its performance can still be affected by implementation-level choices, especially parameter initialization and the bias configuration of the final output mapping. Suboptimal initializations frequently lead to variance explosion across the cubically expanded interaction spaces, driving the subsequent non-linear activation boundaries into severe gradient saturation zones, while the omission of a dedicated translation parameter forces interaction weights to implicitly absorb global statistical deviations. This paper proposes a simple yet effective neural Tucker factorization model with Kaiming initialization and bias correction (KaBiN) for HDI tensor completion. The proposed model utilizes Kaiming uniform initialization for the embedding and Tucker linear parameters, and adopts a simple bias correction in output mapping. By elegantly decoupling global mean shifts from local structural representations, the framework provides a highly stable and well-conditioned optimization landscape. Experiments on three real-world HDI tensor datasets show that KaBiN achieves better performance than the original NeuTucF, while introducing minimal computational overhead.
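
The two implementation choices named in the abstract above can be illustrated with a minimal sketch. It assumes the common Kaiming uniform rule (weights drawn from U(-b, b) with b = gain * sqrt(3 / fan_in)) and a ReLU gain of sqrt(2). The abstract does not state the gain, the rank, or the layer shapes, so `gain`, `rank`, and both function names are hypothetical and this is not the authors' implementation.

```python
import math
import random


def kaiming_uniform(fan_in, fan_out, rng, gain=math.sqrt(2.0)):
    """Draw a fan_out x fan_in weight matrix from U(-bound, bound).

    bound = gain * sqrt(3 / fan_in); the ReLU gain sqrt(2) is an assumption.
    """
    bound = gain * math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


def output_mapping(hidden, weights, bias):
    """Final scalar prediction: weighted sum of the interaction features plus a bias.

    The explicit bias lets a global offset be learned separately from the weights.
    """
    return sum(w * h for w, h in zip(weights, hidden)) + bias


rng = random.Random(0)          # fixed seed so the sketch is repeatable
rank = 8                        # hypothetical number of interaction features
W = kaiming_uniform(fan_in=rank, fan_out=1, rng=rng)
bound = math.sqrt(2.0) * math.sqrt(3.0 / rank)

# With all-zero features the prediction reduces to the bias alone.
baseline = output_mapping([0.0] * rank, W[0], bias=2.5)
```

Scaling the bound by fan_in keeps the variance of the weighted sum roughly independent of layer width, which is the usual motivation for this family of initializers.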

02.
arXiv (quant-ph) 2026-06-24

Discovery of connectivity-trainability trade-off of IQP Circuits for Hamiltonian Optimization

arXiv:2606.24264v1 Announce Type: cross Abstract: Instantaneous Quantum Polynomial-time (IQP) circuits are promising candidates for near-term quantum advantage due to the conjectured classical hardness of their sampling task. However, their capabilities for optimization remain largely unexplored. We present a systematic investigation of the performance and trainability of IQP circuits for Hamiltonian optimization. Our results reveal a trade-off between optimization performance and circuit connectivity, demonstrating that the circuit structure plays a key role in determining the ability of IQP circuits to reach low-energy states.

03.
arXiv (CS.CV) 2026-06-16

Effective and Low-cost Lane-based Map Localization for Vehicle-Centric Route Generation

Driver-centric route representation plays a vital role in intuitive driving guidance systems. This paper presents OLRA, a low-cost, map-localization-based framework that derives driver-view-aligned routes by matching map-based navigation routes with camera-detected lane markings. This alignment process mutually enhances vehicle localization accuracy and visual route consistency. To bridge the evaluation gap across different paradigms, we introduce practical route evaluation metrics and benchmark OLRA against OpenPilot, a representative direct-generation approach. Experimental results on the nuScenes dataset demonstrate that OLRA outperforms OpenPilot in complex road segments and in route estimation at distances beyond 20 meters, achieving lower overall Euclidean error. This study is expected to promote future research in low-cost, map-localization-based route generation methods.

04.
arXiv (CS.CV) 2026-06-12

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark

Existing benchmarks for Vision-Language Models (VLMs) primarily evaluate spatio-temporal understanding on simple single-action videos, closed attribute sets and restricted entity types, failing to capture the freeform, multi-action interactions between diverse entities which characterize real-world video understanding. Furthermore, the lack of a systematic framework for analyzing model failures across complementary spatio-temporal axes hinders comprehensive evaluation. To address these gaps, we introduce VISTA, a Video Interaction Spatio-Temporal Analysis benchmark designed for open-set, multi-entity and multi-action spatio-temporal understanding in VLMs. VISTA decomposes videos into interpretable entities, their associated actions, and relational dynamics, enabling multi-axis diagnostics and unified assessment of relational, spatial, and temporal understanding. Our benchmark integrates multiple datasets into a single interaction-aware taxonomy and comprises ~12K curated video-query pairs spanning diverse scenes and complexities. We systematically evaluate 11 state-of-the-art VLMs on VISTA, and break down aggregate performance across our taxonomy to reveal shortcomings and pronounced spatio-temporal biases obscured by traditional metrics. By providing detailed, taxonomy-driven diagnostics on a challenging dataset, VISTA offers a nuanced framework to guide advances in model design, pretraining strategies, and evaluation protocols. Overall, VISTA is the first, large-scale, interaction-aware diagnostic benchmark for spatio-temporal understanding in VLMs.

05.
arXiv (CS.LG) 2026-06-24

XConv: Low-memory stochastic backpropagation for convolutional layers

arXiv:2106.06998v5 Announce Type: replace Abstract: Training convolutional neural networks at scale demands substantial memory, largely because intermediate activations must be stored for backpropagation. Existing remedies (checkpointing, invertible architectures, or gradient-approximation methods such as randomized automatic differentiation) either add significant computation, impose architectural constraints, or require non-trivial code changes. We propose XConv, a near-drop-in replacement for standard 2D and 3D convolutional layers that addresses all three: it preserves standard backpropagation, imposes no architectural constraints, and integrates into existing codebases with minimal changes. XConv exploits the algebraic structure of convolutional weight gradients, storing highly compressed projections of the activations rather than the full tensors and approximating the gradients via multi-channel randomized trace estimation. The number of probing vectors sets a memory-accuracy tradeoff and recovers the exact gradient in the limit. We establish convergence guarantees and error bounds for the estimator, showing that its gradient-error variance is comparable to that of stochastic gradient descent. Empirically, XConv matches exact-gradient methods across classification, generative modeling, super-resolution, inpainting, and segmentation, with gaps that narrow as the number of probing vectors grows, while reducing activation memory by a factor of two or more when convolutional activations dominate, and remaining computationally competitive with optimized convolution kernels at larger batch sizes. At half precision the gradient-approximation error falls to the rounding floor, so XConv adds essentially no error beyond that of low-precision arithmetic. The savings matter most where activation memory rather than compute is the binding constraint, such as high-resolution and volumetric training and on-device finetuning.

06.
arXiv (quant-ph) 2026-06-12

Positive Conserved Quantities in the Klein-Gordon Equation

arXiv:2410.04666v3 Announce Type: replace Abstract: We introduce an embedding of the Klein-Gordon equation into a pair of coupled equations that are first-order in time. The existence of such an embedding is based on a positivity property exhibited by the Klein-Gordon equation. These coupled equations provide a more satisfactory reduction of the Klein-Gordon equation to first-order differential equations in time than the Schrödinger equation. Using this embedding, we show that the "negative probabilities" associated with the Klein-Gordon equation do not need to be resolved by introducing matrices as Dirac did with his eponymous equation. For the case of the massive Klein-Gordon equation, the coupled equations are equivalent to a forward Schrödinger equation in time and a backward Schrödinger equation in time, respectively, corresponding to a particle and its antiparticle. We show that there are two positive integrals that are conserved (constant in time) in the Klein-Gordon equation and thus provide a concrete resolution of the historical puzzle regarding the previously supposed lack of a probabilistic interpretation for the field governed by the Klein-Gordon equation. A significant consequence is that the Schrödinger equation is given a relativistic formulation, which does not require creation and annihilation operators, i.e. quantum fields. Physically, this corresponds to a theory in which the positive and negative energy parts do not directly interact, hence there will be no annihilation events (for example, particle-antiparticle collisions which do not result in photon emission). Thus, one practical consequence of this relativistically consistent theory is a simple explanation for dark matter.

07.
arXiv (CS.LG) 2026-06-19

Score Approximation for Diffusion Models on Arbitrary Low-Dimensional Structures

arXiv:2606.19894v1 Announce Type: new Abstract: The remarkable success of score-based diffusion models has spurred significant efforts to establish their theoretical foundations. However, existing complexity bounds for score approximation rely heavily on restrictive assumptions like Lipschitz continuous densities or smooth manifold supports, which are routinely violated by the singularities, sharp boundaries, and disjoint clusters inherent to real-world perceptual data. This work establishes a universal score approximation theorem that works for any distribution supported on any compact set of upper Minkowski dimension $d$. Using a novel discrete-mixture formulation, we prove that the score function can be approximated with a ReLU network whose complexity grows exponentially only with $d$, thus breaking the exponential curse of ambient dimensionality. Combined with existing theories on accurately solving the backward diffusion SDE for arbitrary compact distributions, our work shows that diffusion models readily adapt to irregular, non-smooth data structures, explaining their competence in real-world generative tasks.

08.
medRxiv (Medicine) 2026-06-17

Diagnostic Concordance of Immediate Versus 1-Hour Technetium-99m Hydroxydiphosphonate Scintigraphy in Suspected Transthyretin Amyloid Cardiomyopathy

Background: Bone-avid tracer myocardial scintigraphy for the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) has traditionally employed imaging at 1- or 3-hour intervals. Technetium-99m hydroxydiphosphonate (99mTc-HDP) has unique characteristics that may enable earlier imaging. We investigated the diagnostic concordance of immediate versus 1-hour acquisitions. Methods: Consecutive patients with suspected ATTR-CM underwent planar imaging and SPECT/CT immediately and at 1 hour following the administration of 99mTc-HDP. Perugini grades and heart to contralateral lung (H/CL) ratios were assessed. Target-to-background ratios (TBRs) were calculated on the SPECT/CT acquisitions using the left ventricular (LV) septum and three background regions: aorta, LV blood-pool, and vertebrae. We assessed diagnostic concordance using Cohen's kappa (κ), temporal stability using paired t-tests, and correlation between timepoints using Pearson's coefficient (r). The 1-hour SPECT/CT interpretation served as the protocol reference standard. Results: Forty-eight patients (83% male; median age, 80 [73-85] years) were evaluated. One-hour SPECT/CT identified 19 positive and 29 negative cases. Immediate SPECT/CT demonstrated 100% diagnostic concordance with the 1-hour reference standard (κ = 1.000; 95% CI: 1.00 to 1.00; p < 0.001). The LV Septum/LV Blood-Pool TBR showed the highest correlation (r = 0.956; 95% CI: 0.922 to 0.975; p < 0.001). The LV Septum/Aorta TBR demonstrated high correlation (r = 0.918; 95% CI: 0.857 to 0.953; p < 0.001) and remained stable in the ATTR-negative cohort (-0.02; 95% CI: -0.08 to 0.04; p = 0.54). A significant decrease in the LV Septum/Vertebrae TBR was observed in the ATTR-negative (-0.55; 95% CI: -0.64 to -0.47; p < 0.001) and ATTR-positive cohorts (-1.14; 95% CI: -1.39 to -0.89; p < 0.001). Conclusions: Immediate 99mTc-HDP SPECT/CT is diagnostically concordant with standard 1-hour protocols. By leveraging SPECT/CT and the favorable kinetics of 99mTc-HDP, immediate-phase imaging can accurately reproduce 1-hour acquisitions in cases of suspected ATTR-CM. This expedited approach may improve nuclear laboratory throughput and patient satisfaction.
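
As a worked check of the reported agreement statistic, the sketch below computes Cohen's kappa from a 2x2 agreement table. The cell counts are inferred from the abstract (19 positive and 29 negative cases at 1 hour, with 100% concordance, implies no discordant readings). The function and argument names are illustrative and are not taken from the study.

```python
def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two binary ratings summarized as a 2x2 table."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n            # raw agreement
    a_pos = (both_pos + a_pos_b_neg) / n            # positive rate, rating A
    b_pos = (both_pos + a_neg_b_pos) / n            # positive rate, rating B
    # Agreement expected by chance if the two ratings were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Immediate vs 1-hour SPECT/CT: 19 concordant positives, 29 concordant negatives.
kappa = cohens_kappa(both_pos=19, a_pos_b_neg=0, a_neg_b_pos=0, both_neg=29)
print(f"kappa = {kappa:.3f}")
```

With no discordant readings, observed agreement is 1 and kappa is 1 regardless of the class balance, which matches the reported value.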

09.
arXiv (CS.CL) 2026-06-15

Fusing Stylometric and Embedding Systems to Estimate Authorship Likelihood Ratios in Japanese

The likelihood ratio framework is widely recognized as the logically and legally sound basis for evidential analysis across forensic sciences, and its importance is increasingly acknowledged in analyses of authorship in textual evidence. To date, however, its application has been confined to English-language texts. Meanwhile, authorship attribution has traditionally relied on a diverse array of stylometric features, even as the rise of pre-trained large language models enables new contextual-embedding approaches. Combining these diverse approaches through fusion promises enhanced performance, yet it has not been applied to integrate stylometric-feature systems with embedding-based systems within the likelihood ratio paradigm. This study is the first to apply likelihood ratio-based forensic text comparison to Japanese digital texts, using ~1,000-character excerpts from blogs, to 1) evaluate system performance and likelihood ratio magnitudes and 2) assess the impact of fusing stylometric-feature systems with embedding-based systems. The results demonstrate that the fused system maintains excellent calibration while 1) increasing consistent-with-fact likelihood ratio magnitudes; 2) decreasing contrary-to-fact likelihood ratio magnitudes and 3) improving overall discriminability. The best-performing fusion achieved a log-likelihood-ratio cost of 0.32484, illustrating both the feasibility of likelihood ratio framework for Japanese and the benefits of fusion across heterogeneous systems.
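
For readers unfamiliar with the log-likelihood-ratio cost quoted above, here is a minimal sketch of its standard definition: the mean of log2(1 + 1/LR) over same-author comparisons and the mean of log2(1 + LR) over different-author comparisons, averaged together. The likelihood-ratio values in the example are made up for illustration and are not results from the study.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost; lower is better, 1.0 means uninformative."""
    # Same-author comparisons are penalized when the LR is small.
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_term /= len(same_author_lrs)
    # Different-author comparisons are penalized when the LR is large.
    diff_term = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_term /= len(different_author_lrs)
    return 0.5 * (same_term + diff_term)


# A system that always outputs LR = 1 carries no information.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])

# Hypothetical well-behaved system: large LRs for same-author pairs,
# small LRs for different-author pairs.
informative = cllr([20.0, 50.0], [0.05, 0.02])
```

Because the cost penalizes both poor discrimination and poor calibration, a value well below 1 indicates that the system's likelihood ratios are informative in both respects.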

10.
arXiv (CS.CV) 2026-06-16

Temporally Consistent and Controllable Video Generation of 2D Cine CMR via Latent Space Motion Modeling

Cine cardiac magnetic resonance is the gold standard for assessing cardiac function, but the scarcity of public datasets limits the development of advanced data-driven models. To address this limitation, we propose a generative method for synthesizing temporally coherent and anatomically consistent cardiac sequences. Our text-to-video framework decouples cardiac spatial structure from temporal motion. First, a fine-tuned diffusion model synthesizes an initial frame from a clinical text prompt, controlling anatomical features. Then, a latent flow model conditioned on a cardiac phase embedding generates the complete cardiac motion, ensuring spatial consistency and temporal control. Our model generates anatomically and pathologically diverse sequences with high temporal coherence and strong fidelity to input prompts, achieving a FID of 31.68 for image realism and a CLIP score of 31.04 for text-image alignment. These experimental results highlight its potential to produce high-fidelity, on-demand medical data, offering a scalable solution to data scarcity.

11.
arXiv (quant-ph) 2026-06-19

Measuring Rényi entropy with an Echo Protocol

arXiv:2504.05237v3 Announce Type: replace Abstract: We present efficient and practical protocols to measure the second Rényi entropy, whose exponential is known as the purity. Our approach is based on expressing the purity in terms of transition probabilities generated by an echo-type forward-backward evolution sequence, making it applicable to quantum many-body systems. Notably, our approach does not rely on random-noise averaging, a feature that can be extended to protocols to measure out-of-time-order correlation functions, as we demonstrate. By way of example, we show that our protocols can be practically implemented in superconducting qubit-based platforms, as well as in cavity-QED trapped ultra-cold gases.
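
The quantity being measured can be made concrete with a small numerical sketch. It assumes the usual convention S2 = -ln Tr(rho^2), under which the purity Tr(rho^2) equals exp(-S2). It evaluates the definition directly from a known density matrix and does not implement the echo protocol; the function names are illustrative.

```python
import math


def purity(rho):
    """Tr(rho @ rho) for a square density matrix given as nested lists."""
    n = len(rho)
    # Tr(rho^2) = sum over i, j of rho[i][j] * rho[j][i]
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real


def renyi2(rho):
    """Second Renyi entropy, S2 = -ln Tr(rho^2)."""
    return -math.log(purity(rho))


pure_state = [[1.0, 0.0], [0.0, 0.0]]        # |0><0|
mixed_state = [[0.5, 0.0], [0.0, 0.5]]       # maximally mixed qubit

s2_pure = renyi2(pure_state)     # zero entropy for a pure state
s2_mixed = renyi2(mixed_state)   # ln 2 for a maximally mixed qubit
```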

12.
arXiv (quant-ph) 2026-06-16

How Many Shots Are Enough for a Quantum Circuit?

arXiv:2606.16965v1 Announce Type: new Abstract: Quantum algorithms require repeated circuit executions, known as shots, to estimate output distributions accurately. Determining the minimal number of shots needed to meet a target accuracy is crucial to reduce costs and resource usage, especially on today's noisy and expensive quantum hardware. In this paper, we address the shot optimisation problem in a black-box setting, where no assumptions are made about the structure of the quantum circuit or the noise model of the backend. We introduce IncrementalExecution, a novel online framework that dynamically determines when to stop executing shots based on the principle of point of diminishing returns: the point at which additional shots no longer significantly alter the empirical distribution of a fixed circuit. The framework supports customisable policies for shot management, enabling flexible trade-offs between execution cost and result fidelity within static execution scenarios. We assess our proposal through an extensive experimental evaluation spanning 33,750 framework configurations across 180 unique static quantum circuit-backend combinations, for a total of 7.3M independent experiments. Unlike prior work that relies on problem-specific knowledge or algorithm-dependent assumptions (e.g., variational or adaptive workflows), our approach is applicable to a large set of static circuits and immediately deployable on current quantum cloud platforms.
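
The diminishing-returns idea described above can be sketched as a simple stopping loop. This is only one possible policy, under stated assumptions: shots are run in fixed-size batches, and execution stops once the total variation distance between successive empirical distributions stays below a tolerance for several consecutive batches. The names `run_incrementally` and `fake_backend`, the tolerance, and the batch size are all hypothetical and do not reflect the actual IncrementalExecution API or its policies.

```python
import random
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two outcome distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def run_incrementally(run_batch, batch_size=100, max_shots=10_000,
                      tol=0.01, patience=3):
    """Execute batches until the empirical distribution stops moving."""
    counts, prev, calm, shots = Counter(), None, 0, 0
    while shots < max_shots:
        counts.update(run_batch(batch_size))   # run_batch returns a list of outcomes
        shots += batch_size
        dist = normalize(counts)
        if prev is not None and total_variation(prev, dist) < tol:
            calm += 1                          # another batch changed little
            if calm >= patience:
                break
        else:
            calm = 0                           # distribution still shifting
        prev = dist
    return counts, shots


# Stand-in for a noisy backend: a Bell-like circuit with a small error rate.
rng = random.Random(0)


def fake_backend(n):
    return rng.choices(["00", "11", "01", "10"],
                       weights=[0.48, 0.48, 0.02, 0.02], k=n)


counts, shots = run_incrementally(fake_backend)
```

Because the rule only looks at observed counts, it needs no knowledge of the circuit structure or the noise model, which is in the spirit of the black-box setting the abstract describes.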

13.
arXiv (CS.AI) 2026-06-19

FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS

arXiv:2606.20518v1 Announce Type: new Abstract: Flow-matching text-to-speech systems achieve remarkable zero-shot quality but remain static after deployment: pronunciation errors on out-of-vocabulary proper nouns persist unless the model is retrained. We introduce FlowEdit, a life-long adaptation framework for frozen flow-matching TTS that learns pronunciation corrections as latent conditioning edits rather than weight updates. When corrective feedback is provided, FlowEdit optimizes a token-level perturbation in the text embedding space, then stores the correction in a Modern Hopfield Network serving as content-addressable episodic memory. At inference, corrections are retrieved via soft attention with a similarity gate, enabling fuzzy morphological matching. On our curated benchmark of 312 multilingual proper nouns across 18 language families, FlowEdit reduces target-word Phoneme Error Rate by 92.7% relative to the zero-shot baseline while maintaining identical general-speech quality. Corrections complete in approximately 15 seconds on a single GPU.

14.
arXiv (CS.CV) 2026-06-16

Instance-Aware Knowledge Distillation for Semi-Supervised Learning of an On-Board Multi-Task Dense Prediction Model for Collision Avoidance System

Collision avoidance systems have evolved toward camera-based deep learning approaches for driving scene understanding. However, deployment in edge environments such as country clubs is constrained by limited computational resources and unreliable communication infrastructure. Moreover, constructing large-scale datasets for the target domain involves substantial annotation cost. To address these limitations, we propose an instance-aware knowledge distillation framework for semi-supervised learning. Specifically, we generate pseudo labels that mitigate teacher bias by leveraging domain priors from the teacher and instance-centric knowledge from foundation models. The trained lightweight student is deployed in the proposed collision avoidance system and performs multiple dense prediction tasks in real-time. The system detects frontal obstacles and encodes their spatial information into controller area network messages for automated guided vehicle operation. To achieve this, we construct a large-scale country club dataset and perform field validation of the proposed system. Experimental results demonstrate that the student outperforms the large teacher in instance segmentation while mitigating performance degradation in monocular depth estimation. Compared with the teacher, the student reduces FLOPs by 22.68$\times$ and parameters by 14.33$\times$, achieving 6.46 FPS on a low-cost edge device.

15.
arXiv (CS.AI) 2026-06-24

BioMedArena: An Open-source Toolkit for Building and Evaluating Biomedical Deep Research Agents

arXiv:2605.06177v2 Announce Type: replace Abstract: Reproducing and comparing deep research agents today is hard: the same backbone evaluated on the same benchmark can report different accuracies across papers because the harness and tool registry differ, and integrating a new model into a comparable evaluation surface costs weeks of model-specific engineering. These are symptoms of a broader reproducibility problem in deep research agent research. Here, we introduce BioMedArena, an open-source toolkit that addresses this reproducibility gap and provides an arena for comparing deep research agents under a shared evaluation environment. BioMedArena decouples six layers of biomedical agent evaluation – benchmark loading, tool exposure, tool selection, harness mode, context management, and scoring – and exposes 166 biomedical benchmarks and 75 biomedical tools across 9 functional families. Adding a new model, benchmark, or tool can be accomplished with a few-line provider adapter. Beyond evaluation infrastructure, BioMedArena ships a library of high-quality reference components: 6 agent harnesses (including our proposed Mutual-Evolve) and 6 context-management strategies, any of which can be equipped on any backbone. Equipping these components substantially improves all 12 backbones; on each of 8 representative biomedical benchmarks, the best equipped backbone surpasses prior state-of-the-art (SOTA), by 15.01 percentage points on average. The toolkit, configurations, and per-task traces are available at https://github.com/AI-in-Health/BioMedArena.

16.
arXiv (quant-ph) 2026-06-17

Probes of chaos over the Clifford group and approach to Haar values

arXiv:2603.29695v3 Announce Type: replace Abstract: Chaotic behavior of quantum systems can be characterized by the adherence of the expectation values of given probes to moments of the Haar distribution. In this work, we analyze the behavior of several probes of chaos using a technique known as Isospectral Twirling [1]. This consists in fixing the spectrum of the Hamiltonian and picking its eigenvectors at random. Here, we study the transition from stabilizer bases to random bases according to the Haar measure by T-doped random quantum circuits. We then compute the average value of the probes over ensembles of random spectra from Random Matrix Theory, the Gaussian Diagonal Ensemble and the Gaussian Unitary Ensemble, associated with non-chaotic and chaotic behavior respectively. We also study the behavior of such probes over the Toric Code Hamiltonian.

17.
arXiv (CS.AI) 2026-06-15

A Comparative Study of Deep Learning Architectures for Multi-Horizon Behavioural Forecasting for Mobile Health

arXiv:2606.14604v1 Announce Type: cross Abstract: Wearable devices and smartphones generate rich behavioural time series that can support proactive health interventions, yet systematic comparisons of modern forecasting architectures for these data are lacking. In particular, it remains unclear how models generalise across populations, how different architectures respond to participant-level fine-tuning and how forecasting accuracy degrades across multi-day horizons. We benchmark six deep learning architectures, two zero-shot Foundation Models (FM) and statistical baselines on three public datasets encompassing over 800 participants, reporting per-feature metrics for step counts, screen time and sleep duration across 1-8 day horizons. We further conduct a per-feature personalisation study across all six architectures and assess FM transferability across dataset sizes and temporal granularities. Our key findings are: (i) no single architecture dominates, PatchTST leads among trained models while the three runners-up (TCN, MLP, Transformer) show no meaningful performance difference; (ii) the FM TimesFM matches or exceeds trained models zero-shot, especially in low-data regimes and (iii) participant-level fine-tuning reduces per-feature RMSE by 16-60%, with sleep benefiting most and step counts least. These results provide practical guidance on architecture selection, FM applicability and personalisation strategies for mobile health forecasting. To the best of our knowledge, this is the first study to jointly evaluate modern deep learning, FMs and personalisation for multi-horizon behavioural forecasting from wearables.

18.
medRxiv (Medicine) 2026-06-15

Diabetes and the Life-Course: Evidence from Panel Data and Electronic Health Records

Incidence of type 2 diabetes is increasing at ages when education, work, family, and financial transitions are taking place, yet we lack robust evidence of whether earlier treatment changes life-course outcomes and over which time span this takes place. This paper uses the medical cutoff for diabetes diagnosis (HbA1c of 6.5 percent) as a natural experiment to study the effects of diabetes treatment using electronic health records (EHR) and panel data. This paper has three main findings. First, using EHR data, we find that there is a sharp increase in the probability of both diagnosis of diabetes and prescription when the HbA1c equals 6.5 percent. Second, we find that treating diabetes reduces HbA1c levels, weight, BMI, and blood pressure and increases the amount of care received, proxied by the number of HbA1c tests. Both the diagnosis and a prescription are independently able to produce positive changes in metabolic health, although a prescription is more effective in this regard. Third, we conclude that treating diabetes does not have a significant effect on life-course outcomes for a cohort of young Americans aged 24-32, although it does result in a reduction in HbA1c levels that are seen even eight years after the intervention. Taken together, these findings suggest that receiving a diagnosis and prescription are both effective treatments for diabetes, but they do not translate to significant alterations in the lives of young adults in the medium-term.

19.
arXiv (CS.CL) 2026-06-24

Balalaika: Data-Centric, Prosody-Aware Annotation Pipeline for Russian Speech

We introduce Balalaika, an open-source, data-centric pipeline for processing audio and producing prosody-aware annotations. It combines semantic VAD for context-preserving segmentation and multi-ASR ensembling with ROVER consensus decoding, while retaining optional word-level timestamps, followed by automatic quality and speaker-purity filtering. The text is further enriched with punctuation restoration, lexical stress and "е/ё" normalization, and IPA phonemes. Using Balalaika, we build a 5.1k-hour multi-source Russian corpus with rich annotations, and show consistent gains under equalized training budgets for both speech denoising and TTS; ablations confirm complementary benefits of stress and punctuation and improved synthesis with stricter MOS filtering. The datasets are publicly available on HuggingFace at https://huggingface.co/collections/lab260/balalaika-dataset

20.
arXiv (CS.AI) 2026-06-12

Evaluation Sovereignty in Metadata-Driven Classification: A Multi-Track Framework for Weakly Supervised Information Systems

arXiv:2606.13436v1 Announce Type: new Abstract: Evaluation in machine learning is typically treated as a neutral measurement process. However, in operational information systems, evaluation outcomes are often conditioned by the processes used to generate labels. This paper does not seek to improve classification performance. Instead, it examines the validity of performance measurement under differing label-authority regimes. This issue is particularly relevant in large-scale metadata-driven systems, where labels are often incomplete, inconsistent, or weakly supervised. We introduce evaluation sovereignty, defined as the degree to which performance metrics are independent of label authority and supervision regime, and propose a multi-track evaluation framework that systematically varies training and evaluation label sources. Using hierarchical multi-label classification on large-scale scientific metadata, we demonstrate that models exhibiting strong performance under operational ("silver") evaluation degrade substantially under independent ("gold") evaluation, particularly for fine-grained classification. For example, Micro-F1 decreases from approximately 0.54 to 0.03. Notably, ranking-based metrics remain above baseline, revealing a divergence between latent model signal and classification validity. These findings suggest that commonly reported performance metrics may reflect alignment with labeling processes rather than true predictive capability. We therefore reconceptualize evaluation validity as a system-level property shaped by label governance and provide a practical methodology for auditing intelligent systems operating under weak supervision.

21.
medRxiv (Medicine) 2026-06-22

Vaccine introductions in the WHO African Region, 2023-26: a country-level ecological analysis by Gavi eligibility and conflict-affected status

Background. The Immunization Agenda 2030 (IA2030) tracks new and underused vaccine introduction as an access metric, and its mid-term review calls for stronger country ownership, prioritisation, data use and tailored support in conflict-affected and resource-constrained settings; however, national launch status does not measure recurrent financing, implementation, safety or equity. We examined how recent vaccine-introduction activity was distributed across the WHO African Region. Methods. We conducted a descriptive country-level ecological analysis of all 47 Member States from January 2023 to June 2026. The country was the unit of analysis and contributed one cumulative, unweighted count of nationally endorsed vaccine-introduction and programme-change events. Counts were linked to Gavi eligibility, World Bank FY26 conflict-affected status, broader fragile and conflict-affected situation status in sensitivity analysis, and concurrent system-performance indicators, and modelled with Poisson regression using HC1 robust standard errors. Two Expanded Programme on Immunization (EPI) manager survey waves were summarised at country level. Reporting followed STROBE and RECORD. Results. Seventy-two events were recorded across 38 of 47 Member States: 48 new-antigen introductions, 20 dose or schedule expansions and four combination-vaccine introductions; malaria vaccines accounted for 21. Gavi-eligible conflict-affected countries averaged 2.50 events per country versus 1.27 in both comparison groups. Gavi-eligible conflict-affected status was associated with a higher count (incidence rate ratio [IRR] 1.97, 95% confidence interval [CI] 1.38-2.81; p

22.
arXiv (CS.CV) 2026-06-17

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time. In this work, we present Phys4D, a pipeline for learning physics-consistent 4D world representations from video diffusion models. Phys4D adopts a three-stage training paradigm that progressively lifts appearance-driven video diffusion models into physics-consistent 4D world representations. We first bootstrap robust geometry and motion representations through large-scale pseudo-supervised pretraining, establishing a foundation for 4D scene modeling. We then perform physics-grounded supervised fine-tuning using simulation-generated data, enforcing temporally consistent 4D dynamics. Finally, we apply simulation-grounded reinforcement learning to correct residual physical violations that are difficult to capture through explicit supervision. To evaluate fine-grained physical consistency beyond appearance-based metrics, we introduce a set of 4D world consistency evaluations that probe geometric coherence, motion stability, and long-horizon physical plausibility. Experimental results demonstrate that Phys4D substantially improves fine-grained spatiotemporal and physical consistency compared to appearance-driven baselines, while maintaining strong generative performance. Our project page is available at https://sensational-brioche-7657e7.netlify.app/

23.
arXiv (CS.LG) 2026-06-16

Dual-Network PINNs for Optimal Control: A Reproducible Benchmark on the Mass-Spring-Damper System

arXiv:2606.15271v1 Announce Type: cross Abstract: This work presents a transparent and reproducible benchmark study of a direct dual-network Physics-Informed Neural Network (PINN) formulation for the optimal control of a mass-spring-damper system. The classical linear-quadratic optimal control problem is solved by two independent classical methods – Pontryagin's Minimum Principle with single shooting, and direct transcription through trapezoidal collocation – and recast as a constrained optimization problem solved by two feedforward neural networks: a state network whose boundary conditions are enforced exactly through a composite cubic-and-mask ansatz, and an unconstrained control network. The composite loss combines the physics residual at the collocation points with a trapezoidal approximation of the cost functional, weighted by a single scalar hyperparameter. On the benchmark considered, the PINN reproduces the classical optimal cost to four significant digits, satisfies the terminal state constraints exactly by construction, and produces pointwise state and control errors that fall within the spread of the two classical references. Training is approximately two orders of magnitude slower than classical shooting on this benchmark, which is honestly reported. The contribution is methodological clarity rather than methodological novelty: the formulation and the accompanying Google Colab implementation are intended to lower the barrier to entry for practitioners exploring PINN-based optimal control without prior exposure to adjoint methods or two-point boundary value problems.
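The two ingredients named in the abstract, a state ansatz that satisfies the boundary conditions by construction and a trapezoidal approximation of the cost functional, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's formulation: the cubic is taken to be a Hermite interpolant of the endpoint positions and velocities, the mask is taken to be t²(T − t)², the running cost is assumed to be q·x² + r·u², and `net` is a hypothetical stand-in for the trained state network.

```python
import numpy as np

def hermite_cubic(t, T, x0, v0, xT, vT):
    """Cubic polynomial matching position and velocity at t=0 and t=T."""
    s = t / T
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * x0 + h10 * T * v0 + h01 * xT + h11 * T * vT

def mask(t, T):
    """Assumed mask: vanishes, together with its first derivative, at both ends."""
    return t**2 * (T - t) ** 2

def state_ansatz(t, T, bc, net):
    """Cubic plus masked network output; boundary values hold for any `net`."""
    return hermite_cubic(t, T, *bc) + mask(t, T) * net(t)

def trapezoid_cost(t, x, u, q=1.0, r=1.0):
    """Trapezoidal rule for the assumed running cost q*x^2 + r*u^2."""
    integrand = q * x**2 + r * u**2
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

# Hypothetical boundary data: start at x=1 at rest, end at x=0 at rest.
T = 2.0
bc = (1.0, 0.0, 0.0, 0.0)
net = lambda t: np.sin(3.0 * t) + 5.0  # placeholder for a neural network
t = np.linspace(0.0, T, 201)
x = state_ansatz(t, T, bc, net)
```

Because the mask multiplies the network output, the endpoint values of `x` do not depend on `net` at all, which is the sense in which such a constraint is enforced exactly rather than penalised in the loss.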

24.
arXiv (CS.LG) 2026-06-17

Domain-Validity-Gated Metamorphic Testing of Scientific ML Surrogates

arXiv:2606.17529v1 Announce Type: cross Abstract: Scientific machine-learning (SciML) surrogates approximate expensive simulations, but exact expected outputs for arbitrary inputs are unavailable (the oracle problem). Metamorphic testing checks relations across executions, yet a candidate relation is not automatically valid: its preconditions, output mapping, and the numerical floor of the scoring operator determine whether a violation is meaningful. We study how candidate metamorphic relations (MRs) can be screened for domain validity and turned into executable, oracle-free test assets for SciML surrogates. We propose (i) a domain-validity rubric that admits a candidate only when its tolerance dominates the operator's numerical floor and its preconditions hold; (ii) an MR-card executable-asset format recording source cases, transformations, metrics, tolerances, and typed relation-level verdicts; and (iii) a case-study protocol on MeshGraphNets cylinder-flow surrogates, with a claim ledger binding every result to a tracked artifact. On a MeshGraphNets checkpoint, node permutation holds to machine precision, mirror-y is a bounded out-of-distribution stress finding rather than an exact symmetry, and absolute conservation stays deferred while a reference-relative guard passes. The same readings hold across held-out trajectories, a checkpoint roster, three further architectures, and PhysicsNeMo. On a second CFD task (compressible airfoil) the predicate instead rejects incompressible continuity on physical grounds, showing it reasons about domain validity rather than running a fixed checklist. On a second PDE family, FNO Burgers and heat surrogates run full admit/reject/execute verdicts. The evidence spans two CFD tasks and a second PDE family, supporting a validity-aware bridge from candidate MRs to auditable SciML test assets that separates model-level violations from out-of-domain applications.
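A node-permutation relation gated by a tolerance check, as described above, might look like the following minimal sketch. The function names, the toy message-passing model, and the verdict strings are all hypothetical; they are not the paper's MR-card format or its MeshGraphNets setup.

```python
import numpy as np

def toy_surrogate(node_features, adjacency):
    """Hypothetical one-step message-passing model (permutation-equivariant)."""
    messages = adjacency @ node_features
    return np.tanh(node_features + 0.5 * messages)

def permutation_relation(model, x, adj, perm, tolerance, numerical_floor):
    """Oracle-free check: relabelling nodes should relabel outputs the same way."""
    # Validity gate: a tolerance at or below the floor cannot give a meaningful verdict.
    if tolerance <= numerical_floor:
        return "rejected"
    baseline = model(x, adj)
    permuted = model(x[perm], adj[np.ix_(perm, perm)])
    violation = np.max(np.abs(permuted - baseline[perm]))
    return "pass" if violation <= tolerance else "fail"

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
adj = (rng.random((6, 6)) < 0.4).astype(float)
perm = np.array([2, 0, 3, 1, 5, 4])

verdict = permutation_relation(toy_surrogate, x, adj, perm,
                               tolerance=1e-10, numerical_floor=1e-14)
```

No reference solution is needed: the verdict compares two executions of the same model, and the gate separates "the relation cannot be scored at this tolerance" from "the model violated the relation".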

25.
arXiv (CS.AI) 2026-06-19

ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

arXiv:2606.19538v1 Announce Type: new Abstract: Convolutional networks, recurrent networks, and transformers each encode different inductive biases – locality, sequential memory, and content-dependent pairwise interaction – and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but rather incomplete views of a single underlying mathematical object: a learnable integral transform. We introduce the Integral Transform Network (ITNet), a unified architecture built around a learnable kernel that depends jointly on positions and features. This kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions, enabling the model to adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, enabling efficient and scalable computation. A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2 and NLVR2. The results demonstrate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.
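The idea of one kernel over positions and features covering several architectures can be illustrated on a discrete grid. The sketch below is an assumption-laden toy, not ITNet: the kernels are hand-written rather than learned by an MLP, the integral is a plain sum over all pairs, attention uses identity query/key projections, and the row normalisation that produces a softmax is added as an explicit flag.

```python
import numpy as np

def integral_transform(kernel, positions, features, values, normalize=False):
    """Discrete transform: y_i = sum_j k(p_i, p_j, f_i, f_j) * v_j."""
    n = len(positions)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(positions[i], positions[j], features[i], features[j])
    if normalize:
        K = K / K.sum(axis=1, keepdims=True)  # row-normalise, as in softmax
    return K @ values

def attention_kernel(p_i, p_j, f_i, f_j):
    """Content-dependent kernel: ignores positions, uses scaled dot products."""
    return np.exp(f_i @ f_j / np.sqrt(len(f_i)))

def make_conv_kernel(weights):
    """Position-only kernel: depends on the offset p_j - p_i within a window."""
    half = len(weights) // 2
    def kernel(p_i, p_j, f_i, f_j):
        offset = int(p_j - p_i)
        return weights[offset + half] if abs(offset) <= half else 0.0
    return kernel

rng = np.random.default_rng(0)
n, d = 5, 4
pos = np.arange(n)
X = rng.normal(size=(n, d))
V = rng.normal(size=(n, 3))

attn_out = integral_transform(attention_kernel, pos, X, V, normalize=True)

w = np.array([0.25, 0.5, 0.25])
signal = rng.normal(size=n)
conv_out = integral_transform(make_conv_kernel(w), pos, X, signal)
```

With the normalised exponential kernel the output coincides with single-head softmax attention over `X`, and with the offset-only kernel it coincides with a zero-padded 1D cross-correlation of `signal` with `w`. The quadratic double loop is for clarity only and says nothing about how the paper achieves efficiency.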