Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-11

Nonlocal continuous-variable gates by amplified optical connections

arXiv:2603.12866v2 Announce Type: replace Abstract: Nonlocal quantum gates, coupling quantum systems located at a distance, are crucial for distributed quantum computing. To this aim, high-capacity optical noiseless connections between different processing units are essential for transmitting large amounts of information per mode. Simultaneously, optical quantum computing offers future high-speed multimode quantum processors. We propose a library of feasible protocols to implement a necessary nonlocal continuous-variable (CV) quantum nondemolition (QND) gate between two distant users sharing a quantum channel and exploiting classical communication. The users are endowed with a newly achieved high-fidelity and large-bandwith element - single-pass phase-sensitive optical parametric amplifier (OPA), that allows for both online squeezing and channel-loss compensation. The use of OPAs enhances quality of the resulting gate in terms of both excess noise and entangling capability. The proposed schemes are also applicable to CV cluster state fusion, providing a first step towards development of distributed CV measurement-based quantum computation.

02.
arXiv (CS.CL) 2026-06-11

Massive Open-Vocabulary Keyword Spotting

Automatic speech recognition systems have been shown to under-perform when it comes to transcribing words rarely seen in the training data, namely specialized terminology. Open-vocabulary keyword spotting, combined with contextual biasing, has been shown to mitigate this issue. However, existing systems can only handle glossaries of a few hundred terms without becoming an infeasible bottleneck. We propose a system that stores features with a memory footprint up to 128 times smaller than a comparable baseline and allows users to process massive databases while remaining open-vocabulary. Without fine-tuning the speech recognition model, our system achieves a comparable entity recall as uncompressed solutions, even in languages not seen during training.

03.
arXiv (CS.AI) 2026-06-17

CMIP-Forge: An Agentic System that Retrieves, Computes, and Self-Reviews Climate Science

arXiv:2606.17076v1 Announce Type: cross Abstract: The Coupled Model Intercomparison Project Phase 6 (CMIP6) has generated thousands of peer-reviewed publications documenting model configurations, evaluation procedures, emergent constraints, and projection uncertainties. As the community transitions toward CMIP7, efficiently extracting and operationalizing this unstructured knowledge alongside live data analysis represents a critical bottleneck. Here we present CMIP-Forge, a hybrid retrieval-augmented generation (RAG) and autonomous analysis system that bridges the gap between scientific literature and Earth System Grid Federation (ESGF) data archives. The system pairs a curated corpus of 6,581 CMIP6-related open-access publications (101,828 indexed chunks) with an agentic pipeline in which a tool-augmented worker plans and executes Python workflows over live climate data, while a panel of independent reviewer models audits its methodology end to end. CMIP-Forge introduces a multi-layered Defense-in-Depth architecture that enforces physical and methodological invariants through executable mechanisms: Abstract Syntax Tree (AST) static analysis, audited scientific primitives, and an autonomous adversarial peer-review protocol. We demonstrate the system's capabilities through end-to-end autonomous research pipelines spanning atmospheric teleconnections, ocean dynamics, regional extremes, and global warming projections. An agentic analysis system grounded in peer-reviewed literature, constrained by automated code guardrails, and audited by an independent adversarial review loop can complete complex climate-research workflows autonomously. The same experiments expose concrete failure modes of the review loop (sycophantic regression, REVISE verdicts that are never resolved, and the submission of stub code for review), each diagnosable from the immutable telemetry and provenance record released with the article.

04.
arXiv (CS.CV) 2026-06-16

SiGnature: Explicit Motion Diffusion for Stylized Semantic Gesture

While recent advances in co-speech gesture generation have achieved impressive rhythmic synchronization, synthesizing gestures that are both semantically meaningful and faithful to a speaker's unique non-verbal style remains an open challenge. Semantic gestures, such as iconic shapes or deictic pointing, are statistically sparse, making them difficult to learn effectively within standard generative models. We present SiGnature, a framework for Stylized and Semantic Gesture generation that reconciles precise semantic control with high-fidelity style preservation. Unlike prevalent methods that rely on entangled latent representations, SiGnature operates in an explicit joint-rotation space. This design enables our core contribution, Joint Motion Integration (JMI), a training-free inference mechanism capable of injecting any external motion sequence, particularly in-the-wild semantic gestures, directly into the diffusion process. JMI automatically identifies the specific ``active joints'' conveying a semantic action and injects them into the generation, while relying on the diffusion backbone to synthesize the remaining body dynamics, including posture and flow, in accordance with the pre-learned style of the target speaker. This allows for the plug-and-play integration of arbitrary motions, including complex semantic gestures, without retraining or introducing the ``Frankenstein'' artifacts typical of cut-and-paste methods. Extensive experiments and perceptual studies demonstrate that SiGnature offers superior semantic motion control while maintaining smooth and natural co-speech gesture generation and preserving the distinct characteristics of the speaker, thereby outperforming state-of-the-art baselines.

05.
arXiv (CS.CL) 2026-06-24

Progressive Alignment Objectives for Aligner-Encoder based ASR

Aligner-Encoders are recently proposed seq2seq end-to-end ASR models that replace decoder attention by predicting the uth token directly from the u-th encoder position, so the encoder must learn the alignment internally without cross-attention or a transducer lattice. In practice, this alignment often forms abruptly in the upper layers, making training sensitive and brittle on long utterances. We propose InterAligner, which adds an intermediate Aligner objective so alignment can form progressively across depth, together with an intermediate CTC loss (InterCTC) to stabilize optimization. On LibriSpeech with a 17-layer Conformer, a final-only Aligner reaches 5.0/7.8 WER (test-clean/other). InterCTC improves to 3.4/6.0, and InterAligner further reduces WER to 3.1/5.6 with the largest gains on long utterances.

06.
arXiv (CS.CL) 2026-06-24

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.

07.
arXiv (CS.AI) 2026-06-12

EWAM: An Enhanced World Action Model for Closed-Loop Online Adaptation in Embodied Intelligence

arXiv:2606.12690v1 Announce Type: cross Abstract: In this paper, we propose the Enhanced World Action Model (EWAM), a closed-loop online adaptation architecture built upon a pretrained and fully frozen Cosmos3 backbone network. Evaluated entirely under a zero-shot task protocol, EWAM is centrally focused on reducing the amount of additional deployment data required to adapt to new task layouts. Notably, no extra task-specific demonstration sets were introduced in any of the evaluations, and no fine-tuning was performed on the backbone network. Its performance gains stem entirely from an inference-time co-reasoning mechanism composed of four inserted lightweight neural layers: the Neural Experience Memory Layer located in the intermediate layers of the Diffusion Transformer (DiT) provides task-relevant execution context; the Neural Anomaly Detection Layer after the state prediction head monitors the divergence between predicted and actual states in real time; the Neural Policy Routing Layer dynamically selects direct execution, conservative replanning, or rollback recovery based on the anomaly severity; and the Neural Action Correction Layer refines the generated action chunks using execution diagnostics. Unlike naive feature fusion, the memory, anomaly detection, and correction modules are deeply integrated into the Cosmos3 forward path in a differentiable manner, with only the final routing decision being a discrete supervised one.

08.
arXiv (CS.CL) 2026-06-24

Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to Text to Speech(TTS) introduces high variance due to linguistic diversity and multidimensional nature of speech perception. We present a controlled multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state-of-the-art TTS systems and collect over 120K pairwise comparisons from over 1900 native raters. In addition to overall preference, raters provide judgments across 6 perceptual dimensions: intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations. Using Bradley-Terry modeling, we construct a multilingual leaderboard, interpret human preference using SHAP analysis and analyze leaderboard reliability alongside model strengths and trade-offs across perceptual dimensions.

09.
Science (Express) 2026-05-28

A Hormone Cell Atlas maps the human endocrine system at cellular resolution | Science

Authors: Unknown Author

Hormones act across tissues and organs to coordinate physiological functions. Drawing inspiration from the Human Cell Atlas, we analyzed expression of 379 hormone and receptor genes in a transcriptomic dataset comprising 14 million single cells and nuclei across 47 human tissues. Using hormone2cell, we mapped putative hormone-producing and hormone-receiving cell types, defining tissue-specific and cross-tissue endocrine signatures. We predicted non-classical sites of hormone expression, including secretin in plasmacytoid dendritic cells, inferred convergent hormone action and endocrine feedback loops, and implicated cell populations in monogenic endocrine disorders. In a cross-tissue integration of adipocyte datasets, we uncovered dynamic endocrine programs across depots, within adipocyte subtypes and through adipogenic differentiation. Cumulatively, the Hormone Cell Atlas ( hormonecellatlas.org.uk ) provides a comprehensive framework for dissecting hormonal impact on health and disease.

10.
arXiv (CS.LG) 2026-06-16

HAPI-EP: Towards Hybrid, Adaptive, and Predictive Digital Twins of Cardiac Electrophysiology

arXiv:2606.15637v1 Announce Type: new Abstract: A digital twin (DT) of a patient-specific heart offers significant potential in personalized medicine. However, its rapid and dynamic adaptation to an individual's live data and its predictive capability after adaptation remains central challenges. We examine this challenge from its two building blocks: DT formulation where mechanistic and data-driven models show competing merits and limitations, and DT optimization strategies that are largely driven by a reconstruction objective leading to un-identifiable models. We address both bottlenecks via HAPI – an AI framework for building hybrid, adaptive, and predictive DTs with three key enablers. First, HAPI constructs a physics-integrated gray-box model in which an interpretable mechanistic backbone is augmented by a neural component that models its residual to the observed data. Second, rather than attempting to pre-encode all possible variations in a static hybrid model, HAPI enables rapid on-the-fly adaptation of the hybrid model to few-shot live data, achieved by feedforward meta-learners realizing amortized inference of both mechanistic and neural parameters of the hybrid model trained with predictive objectives. Finally, we show that this adaptivity corresponds to the construction of a conditional generative model (i.e., the hybrid DT) that endows it with theoretical identifiability and thus strong performance in predictive scenarios. We demonstrate the proof-of-concept of HAPI in cardiac electrophysiology using a hybrid monodomain model with mechanistic reaction kinetics and neural graph diffusion. Across synthetic and real-data studies, we show that HAPI's mechanistic-neural hybridization and predictive adaptation are critical for obtaining identifiable DTs with strong predictive and out-of-distribution capabilities.

11.
arXiv (CS.CL) 2026-06-16

MedSynth: Realistic, Synthetic Medical Dialogue-Note Pairs

Physicians spend significant time documenting clinical encounters, a burden that contributes to professional burnout. To address this, robust automation tools for medical documentation are crucial. We introduce MedSynth – a novel dataset of synthetic medical dialogues and notes designed to advance the Dialogue-to-Note (Dial-2-Note) and Note-to-Dialogue (Note-2-Dial) tasks. Informed by an extensive analysis of disease distributions, this dataset includes over 10,000 dialogue-note pairs covering over 2000 ICD-10 codes. We demonstrate that our dataset markedly enhances the performance of models in generating medical notes from dialogues, and dialogues from medical notes. The dataset provides a valuable resource in a field where open-access, privacy-compliant, and diverse training data are scarce. Code is available at https://github.com/ahmadrezarm/MedSynth/tree/main and the dataset is available at https://huggingface.co/datasets/Ahmad0067/MedSynth.

12.
arXiv (CS.LG) 2026-06-18

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

arXiv:2606.18390v1 Announce Type: new Abstract: Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

13.
medRxiv (Medicine) 2026-06-18

Hospital-Level Variation in Antenatal Corticosteroids for Late Preterm Births

Objective: To determine whether and to what extent hospitals across the United States vary in their use of late-preterm steroids using a novel data set in which the timing of steroid administration relative to delivery can be observed. Methods: This was a retrospective cohort study of singleton births with known gestational ages identified in the Premier Healthcare Database from 2015 to 2022. The primary variable of interest was hospital-level adoption of antenatal corticosteroids for late-preterm singleton deliveries, calculated as the proportion of late-preterm singleton births (34-36 completed weeks of gestation) with any betamethasone exposure during the same late-preterm period. Hospital adoption was defined as the weighted average rate of ALPS administration among late-preterm infants across the entire post-period. Hospitals were ranked by their late-preterm steroid adoption rates and categorized by quartile based on the empirical distribution. Temporal trends were assessed using annual hospital-level adoption rates and visualized using time-series plots and distributional plots. A logistic regression model was constructed to determine hospital characteristics associated with being a highest-quartile adopting hospital. Results: The analysis cohort included 728 hospitals and 5,452,791 births, of which 361,006 (6.6%) were singleton late preterm births. Hospital steroid exposure rates ranged from 0 to 82% and were categorized into quartiles based on overall exposure rate, with cutoffs at 20.6%, 29.8%, and 40.1%. Median exposure rates increased progressively across quartiles from 14.1% (IQR 9.3-17.4%) in the lowest adopting hospitals (Q1) to 47.6% (IQR 43.7-53.2%) in the highest adopting hospitals (Q4), with substantial within-quartile variation. In the multivariable model, urban location was a strong predictor of high adoption after adjustment (aOR 2.05; 95% CI 1.11-3.83, p=0.02). Compared to Midwest hospitals, Southern hospitals had significantly lower odds of being high adopters (aOR 0.37; 95% CI 0.20-0.69, p

14.
arXiv (CS.CL) 2026-06-12

M\"OVE: A Holistic LLM Benchmark for the German Public Sector

We present M\"OVE (Modelle für die \"Offentliche Verwaltung Evaluieren), a holistic benchmark for evaluating large language models (LLMs) in the context of the German public sector. While LLMs are increasingly adopted in public administration, model selection remains largely ad hoc, and existing benchmarks offer limited guidance: they are predominantly English-centric, US-centric in content, and focus exclusively on task performance. M\"OVE addresses these gaps by evaluating 39 models across two complementary dimensions. Performance criteria cover summarization, question answering, and topic extraction. Governance criteria assess hallucination tendencies, energy consumption, provider transparency, and alignment with German constitutional values and knowledge about positions by German political parties. In total, we utilize ten German-language datasets, including gold- and silverstandard datasets that we constructed to reflect public-administration domains. We employ a multi-metric evaluation strategy combining classical NLP metrics, embedding-based methods, and LLM-as-a-judge approaches. Our results show that no single model dominates across all criteria: top performers differ between tasks, and model size alone is a poor predictor of quality. We further evaluate the benchmark itself, analyzing its statistical precision, LLM judge reliability, the impact of our private datasets on model rankings, the sensitivity of our results to prompt formulation, and the validity of our energy consumption estimates. M\"OVE is designed as a living benchmark under active development; results are publicly available at https://moeve.bundesdruckerei.de/.

15.
arXiv (quant-ph) 2026-06-19

Progress on the Kretschmann-Schlingemann-Werner Conjecture

arXiv:2308.15389v4 Announce Type: replace Abstract: Given any pair of quantum channels $\Phi_1,\Phi_2$ such that at least one of them has Kraus rank one, as well as any respective Stinespring isometries $V_1,V_2$, we prove that there exists a unitary $U$ on the environment such that $\|V_1-({\bf1}\otimes U)V_2\|_\infty\leq\sqrt{2\|\Phi_1-\Phi_2\|_\diamond}$. Moreover, we provide a simple example which shows that the factor $\sqrt2$ on the right-hand side is optimal, and we conjecture that this inequality holds for every pair of channels.

16.
arXiv (CS.CL) 2026-06-18

Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining

Classifying implicit hate speech remains a challenge, as intent is often masked through insinuation and context rather than explicit slurs. Prior supervised contrastive approaches improve in-domain detection but can overfit surface cues and struggle to transfer across datasets. We propose ImpSH, a triplet-based framework that aligns posts with implied statements when available and uses context-bounded semi-hard negatives to focus learning on near confusions. We also examine AugSH, which forms positives via data augmentation. In controlled evaluations on IHC, SBIC, and DynaHate with BERT and HateBERT, ImpSH is a viable alternative to standard supervised contrastive baselines and often improves cross-domain performance under matched preprocessing and tuning budgets. Representation analysis using alignment and uniformity indicates tighter positive pairs with balanced global spread, and qualitative nearest-neighbor case studies illustrate typical false negatives under domain shift. These results demonstrate that aligning posts with their implied statements via context-bounded mining provides a more stable, bijective-like mapping to related insinuations, overcoming the volatility inherent in traditional clustering-based representation learning.

17.
Nature (Science) 2026-06-09

Good recycling starts at home — and benefits the world

Authors: Unknown Author

New research supports the value of household-level waste separation. But policies must also carefully consider consumer behaviours to maximize the quality of material collected. New research supports the value of household-level waste separation. But policies must also carefully consider consumer behaviours to maximize the quality of material collected.

18.
arXiv (CS.LG) 2026-06-24

Managing Task Execution for Unknown Workloads in Batteryless IoT: A Hardware-Agnostic Evaluation

arXiv:2606.24340v1 Announce Type: new Abstract: In recent years, the Internet of Things (IoT) paradigm has been shifting toward batteryless, energy-harvesting architectures. Sustaining reliable operation in these systems requires intelligent management of highly volatile stored energy. As edge applications grow in complexity, traditional energy-aware schedulers struggle with unpredictable workloads due to their reliance on static execution thresholds or pre-measured, hardware-specific task profiles. To overcome this, we propose two novel, hardware-agnostic dynamic scheduling strategies treating applications as a "black box," requiring no prior energy information: a model-free Reinforcement Learning (RL) agent and an on-the-fly Approximated Prediction (AP) method. We evaluate these methods against an adaptive task rate approach (AsTAR) and optimized static thresholds using a custom-built, physically accurate simulation framework driven by real-world solar data and dynamic LoRa transmission profiles. Rather than claiming universal superiority, our analysis exposes the distinct operational trade-offs of each method: the AP approach delivers lightweight, near-oracle task throughput; the RL agent provides tunable survival-execution balancing; and AsTAR excels at execution pacing across long energy gaps. Finally, we demonstrate that while these advanced strategies provide critical resilience for severely constrained systems with small capacitors, devices with larger energy buffers can efficiently rely on simpler, less computationally expensive static policies.

19.
medRxiv (Medicine) 2026-06-22

Effectiveness of Stress Management to Reduce Stress Eating for Women: A Systematic Review and Meta-analysis of Intervention Studies

Objective: This systematic review and meta-analysis examined 1) the effects of stress management interventions on changes in stress eating for women, and 2) the longevity of these effects, by summarizing and assessing evidence from controlled and non-equivalent pretest-posttest intervention studies. Method: Five databases (PsycINFO, PubMed, Medline, Web of Science, CINAHL), existing sources, and grey literature were searched (February - June 2025). Studies that assessed stress eating or emotional eating, included a stress management intervention, and comprised at least 70% women were included. The primary outcome was reduction in stress eating. Data were pooled in meta-analyses using multi-level random-effects models and subset by follow-up period. Risk of bias was assessed via funnel plots and sensitivity analyses. Results: Sixty studies with 119 effect size estimates were included in the primary analysis. Pooled estimates indicated that stress management interventions significantly reduced stress eating (Hedges g = -0.4174, p < 0.001), with pre-post designs having larger effects than controlled trials. Subgroup analyses of follow-up periods found small effects in the short-term (before 3 months; Hedges g = -0.4202, p < 0.0001) and moderate effects for mid-term (3-6 months; Hedges g = -0.5886, p < 0.0001). Effects beyond 6 months were small and nonsignificant (Hedges g = -0.4370, p = 0.0660). Conclusion and Relevance: Stress management interventions appear to be effective for reducing stress eating for women, suggesting the potential to incorporate stress management in interventions targeting obesity. Effects may be only sustained 6 months post-intervention, suggesting the need for strategies to bolster long-term effectiveness.

20.
arXiv (CS.LG) 2026-06-11

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

arXiv:2606.11284v1 Announce Type: cross Abstract: Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $\Phi$-Actor-Critic ($\Phi$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $\Phi$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $\Phi$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.

21.
arXiv (CS.AI) 2026-06-17

Software Delegation Contracts: Measuring Reviewability in AI Coding-Agent Work

arXiv:2606.17099v1 Announce Type: cross Abstract: AI coding agents increasingly accept assigned software tasks, modify repositories under bounded authority, and return work packages for review. Prior work proposed the software delegation contract, covering the task, authority, returned work package, and acceptance context, as the unit of analysis for delegated coding work, but did not measure its effects. This paper reports a controlled pilot study of explicit delegation contracts for coding agents. We built a dependency-free TypeScript API task environment with seeded defects and documentation gaps, authored ten tasks across five families, and ran 64 agent executions across two model tiers under three conditions: a realistic issue-style prompt, an explicit delegation contract, and a contract with a required evidence bundle. Each run was scored with hidden acceptance tests, mutation checks, and scope analysis, then reviewed by three independent condition-blinded model-based reviewers using a fixed rubric, for 192 reviews. Explicit contracts did not improve objective task outcomes: all 64 runs passed hidden acceptance checks, with zero scope violations. They did improve reviewability. Evidence sufficiency improved in 22 of 30 paired comparisons and worsened in none (+0.83 on a 5-point scale, p < 0.0001, Cliff's delta = 0.66); reviewer ambiguity decreased (p = 0.035); changed-file lists, known-limitations sections, residual-risk sections, and reviewer checklists appeared mostly or only when demanded by the contract. Contracts cost +13% agent tokens and +38% wall-clock time, with larger effects for the weaker model tier. On these small tasks, delegation contracts bought reviewability rather than correctness.

22.
arXiv (CS.LG) 2026-06-19

Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting

arXiv:2606.19825v1 Announce Type: new Abstract: Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs–such as Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph–as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and are also far superior to Long Short-Term Memory (LSTM) model in dust source emission forecasting.

23.
Nature (Science) 2026-06-17

Optical metasurfaces for general vision processing on the edge

Authors:

Large-scale artificial intelligence (AI) models achieve notable performance in computer vision but require substantial computational resources, limiting their deployment on edge devices1,2. Optical neural networks (ONNs) promise reduced latency and energy consumption by making use of the inherent parallelism of light3. However, present ONNs struggle to scale and are confined to simple tasks, owing to the challenges of replicating exact algebraic operations of digital models using physical (analogue) systems. This work introduces a new paradigm that directly embeds core computer vision principles, including similarity-based recognition, attention-guided perception and detail–context fusion, into a large-scale optical metasurface. By unifying optical physics with these computer vision fundamentals, we develop a photonic–electronic engine that overcomes scalability and generality barriers, enabling high-accuracy, general-purpose computer vision at the edge. The resulting system combines a 41-million-parameter optical metasurface front end with a co-designed, ultraefficient 87,000-parameter digital back end, outperforming many digital models with tens of millions of parameters across object detection, segmentation, 3D reconstruction and video understanding. We build a deployable prototype and demonstrate real-time edge visual processing in natural scenes. This work represents a path towards practical optical computing for general vision tasks in complex natural environments, enabling a new paradigm for low-energy, low-latency, real-time on-device vision intelligence. By embedding core computer vision principles into a large-scale optical metasurface, an efficient vision processing system using far fewer parameters is demonstrated to outperform many digital models and enables deployment on edge devices.

24.
arXiv (CS.AI) 2026-06-19

A Tool for the Synthesis of Adaptive Probabilistic Processors Based on the Ising Model

arXiv:2606.19533v1 Announce Type: cross Abstract: This work presents a tool for the synthesis and simulation of probabilistic architectures for solving combinatorial optimization problems by mapping them to the Ising model. The proposed approach automatically constructs the Ising Hamiltonian and determines the number of probabilistic elements (p-bits) based on problem characteristics such as size and topology. Furthermore, the tool introduces an adaptive strategy for selecting the most suitable update algorithm among Gibbs Sampling, Simulated Annealing (SA), Simulated Quantum Annealing (SQA), and cluster-based methods. Experimental results using benchmark problems demonstrate improved convergence behavior and flexibility compared to fixed approaches. The proposed framework enables systematic evaluation of probabilistic computing strategies and supports the development of future hardware implementations based on MTJs and p-bits.

25.
arXiv (CS.AI) 2026-06-24

Structural Kolmogorov-Arnold Convolutions: Learnable Function on the Values or the Filter Shape as Parameter-Efficient Alternative to Per-Edge Convolutional KANs

arXiv:2606.24371v1 Announce Type: cross Abstract: Convolutional Kolmogorov–Arnold Networks (KANs) replace the fixed weights of a convolutional kernel with learnable univariate functions. The dominant formulation attaches one such function to every kernel entry and lets it act on pixel values, expressive but parameter-heavy and prone to overfitting. We argue that the learnable functions are better placed in the structure of the convolution than on each edge, and we organise the design space along a single axis: whether the function acts on the pixel values or on the filter shape. We study three realisations. SV-KAN applies one shared univariate function to the values and leaves the spatial filter free and static, aa classical convolution with a single learnable shared activation. AG-KAN keeps the shared value function but supplies the spatial structure through a content-adaptive Gaussian gate. RF-KAN instead moves the learnable functions onto the filter shape, building each filter from oriented ridge profiles expanded in a localised oscillatory (Morlet) wavelet basis with content-adaptive amplitudes. Under a matched four-layer protocol with in-run references and three seeds, RF-KAN and SV-KAN reach $88.47\pm0.10\%$ and $88.20\pm0.31\%$ on CIFAR-10 and $64.40\pm0.19\%$ and $64.57\pm0.30\%$ on CIFAR-100, at about $0.4$M parameters. At this matched scale the shape model and the simplest value model meet at the top, both above a plain convolution and every per-edge KAN we tested, including the official Gram variant, at roughly a fifth of the parameters. A controlled study attributes the RF-KAN gain to an intrinsically localised oscillatory basis and to content adaptivity, and an ablation that removes the learned shape entirely, leaving only the shared value function, collapses accuracy by over forty points, identifying the learned shape as the load-bearing ingredient at this scale.