Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-16

MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

Islamic inheritance law is challenging for large language models because solving inheritance cases requires complex, structured, multi-step reasoning and the correct application of juristic rules to compute heirs' shares. We introduce MAWARITH, a large-scale annotated dataset of 12,500 Arabic inheritance cases for training and evaluating models on the full reasoning chain: (i) identifying eligible heirs, (ii) applying blocking (\d{hajb}) and allocation rules, and (iii) computing exact inheritance shares. To the best of our knowledge, MAWARITH is the first Arabic corpus and benchmark designed for end-to-end Islamic inheritance reasoning. Unlike prior datasets that restrict inheritance case solving to multiple-choice questions, MAWARITH supports the full reasoning chain and provides step-by-step solutions with justifications grounded in classical juristic sources and established inheritance rules, as well as exact share calculations. This enables models to learn how to generate detailed, step-by-step responses to user queries that reflect real-world Islamic inheritance cases. To evaluate models beyond final-answer accuracy, we propose MIR-E (Mawarith Inheritance Reasoning Evaluation), a weighted multi-stage metric that scores key reasoning stages and captures error propagation across the pipeline. We evaluate six large language models in a zero-shot setting. A commercial model achieves about 90\%, whereas all evaluated open-source models remain below 50\%. Our error analysis identifies recurring failure patterns, including scenario misinterpretation, errors in heir identification, errors in share allocation, and missing or incorrect application of key inheritance rules such as \textquotesingle awl and radd. The MAWARITH dataset is publicly available at https://gitlab.com/nlpresearcher/mawarith.

02.
arXiv (CS.CV) 2026-06-16

MamBOA: State-Space Architecture for Video Recognition

Fine-grained action recognition demands temporal reasoning that general-purpose architectures address through different cost-accuracy tradeoffs: 3D dense operators couple computation to the input volume, while difference-based methods approximate motion through rigid, hand-crafted subtraction of uncontextualized features - each reflecting a deliberate design choice with corresponding limitations in expressiveness or flexibility. We present MamBOA, a backbone-agnostic temporal framework built upon a novel interleaved scan structure that recasts the selective state-space recurrence (S6) as a native motion synthesizer. By interleaving consecutive feature representations extracted from a pretrained backbone into a single alternating sequence, the proposed scan structurally drives the recurrence to encode both temporal observations of each position within a shared hidden state, separated by only a single decay step - rendering the inter-frame transition an intrinsic component of the state dynamics rather than an externally computed quantity. A cascade of dedicated alignment and decoding operations then distills this joint encoding into an explicit motion representation, which a dual-path pooling mechanism adaptively aggregates by balancing attention-driven selection with uniform temporal coverage. The framework interfaces seamlessly with CNN, Transformer, and Mamba backbone families, adding only ~2.1 GFLOPs per feature pair. On Diving48, MamBOA achieves 85.02% Top-1 accuracy with an image-pretrained backbone and 86.24% with a video-pretrained backbone processing the entire video in a single forward pass - demonstrating that structurally induced state-space dynamics constitute a principled and general foundation for motion modeling.

03.
arXiv (quant-ph) 2026-06-17

Broadband High-Level Squeezed Light using Waveguide Optical Parametric Amplifiers with External Dispersion Compensation

arXiv:2606.17422v1 Announce Type: new Abstract: We demonstrate broadband phase-sensitive amplification (PSA) measurement of squeezed light generated by a waveguide optical parametric amplifier (OPA) with external dispersion compensation. In broadband systems, group velocity dispersion (GVD) induces a frequency-dependent rotation of the squeezing axis, which limits the observable bandwidth in PSA measurements. To overcome this limitation, we introduce external dispersion compensation between two OPAs and suppress the quadrature rotation over a wide frequency range. As a result, we observe a maximum squeezing of 5.9 dB near the carrier frequency and more than 5 dB of squeezing up to a frequency offset of 4.5 THz from the carrier. Furthermore, squeezing below the shot-noise level is confirmed up to a frequency offset of 6 THz from the carrier, corresponding to the accessible phase-matching bandwidth of the waveguide OPA. Our results establish a practical method for broadband characterization of squeezed light and provide a key step toward ultrafast continuous-variable quantum information processing.

04.
bioRxiv (Bioinfo) 2026-06-10

GEOAgent: An AI-driven Autonomous Framework for Intelligent GEO Data Retrieval and Standardized Preprocessing

Datasets in the Gene Expression Omnibus (GEO) remain difficult to reuse at scale because sample annotations are heterogeneous and raw sequencing data require assay-specific preprocessing. We present GEOAgent, an AI-driven autonomous framework designed for intelligent dataset retrieval and standardized preprocessing by coupling autonomous semantic governance with an automated Nextflow pipeline named bioStream. Metadata from 181,760 sequencing series and 84,756 associated PubMed records were organized in a relational database and semantic index to support natural-language dataset retrieval. The framework automatically determines assay modalities, resolves experimental design pairings, and standardizes sample naming to minimize manual curation overhead. Based on these parsed attributes, the framework generates deployment-ready manifests to automatically execute containerized workflows across bulk and single-cell omics modalities. In expert-curated benchmarks, the workflow achieved 96% retrieval precision alongside 100% accuracy in assay classification and sample relationship resolution. The web platform is publicly accessible, while the source code and associated databases are openly available via GitHub and Zenodo.

05.
arXiv (quant-ph) 2026-06-15

Interpreting Bohm-like quantum potentials in "Computing quantum waves exactly from classical action"

arXiv:2605.20443v3 Announce Type: replace Abstract: The recent posting arXiv:2605.02621 [14], commenting on the article rspa.2025.0413 [7], argues that the proof of Lemma 3.1 in [7] is missing the spatial derivative of the density, which would lead to a Bohm-like quantum potential. This technical note shows why the propagated density is independent of space in the Feynman propagator construction of Lemma 3.1. This is done by extending the proof of Lemma 3.1 explicitly with Bohm-like quantum potential terms along the stationary action paths, and then showing that these terms are exactly zero. In [7], this property can also be verified directly on most examples (double slit, Aharonov-Bohm, potential well, harmonic oscillator, tunneling, EPR, QED), as well as in the derivations of the Pauli, Dirac, and Maxwell equations. For more general nonlinear actions, a time rescaling may be required to guarantee this space independence along stationary paths. In the hydrogen atom example, this time rescaling can be computed in closed form. In contrast to the general wave of the Madelung solution [9] Lemma 3.1 of [7] is defined first for a propagator, and a general wave is then constructed in a second step. Recall that a propagator is a specific quantum wave, which is initialized at $t=0$ with a Dirac impulse at a given initial position or momentum. In turn, a general wave is constructed in a second step by superposing a distribution of initial conditions using the propagator. This key difference is why the Bohm-like quantum potential terms disappear in the construction [7] (specifically, in the first step) while the Bohm potential in the Madelung analysis does not. This fundamental difference is also consistent with the fact that the wave construction in [7] extends naturally to relativistic contexts, while Bohmian non-locality notoriously prevents such extensions. Keywords - Response to arXiv:2605.02621, in relation to rspa.2025.0413

06.
arXiv (CS.LG) 2026-06-18

Anomaly Detection for Sparse and Irregular Multivariate Time Series with Latent SDEs

arXiv:2606.18898v1 Announce Type: new Abstract: Multivariate time series anomaly detection (MTSAD) is critical for a wide range of application areas, such as industrial monitoring, cybersecurity, or healthcare. Real-world data is often sparse, irregularly sampled or partially observed, yet existing methods assume uniformly sampled time series. We propose a generative approach based on Latent SDEs that projects the observed time series on a continuous-time stochastic dynamical system, directly being able to handle missing observations and irregular sampling, while also naturally capturing possible cyclic behavior that many real-world use cases inherently possess. Experiments on six anomaly benchmark datasets show that our proposed method ranks first among state-of-the-art baselines. We further demonstrate that our method remains robust under severe data sparsity, while performance significantly degrades for the tested baseline methods. These results highlight latent SDEs as a natural inductive bias for anomaly detection in multivariate time series, especially in presence of real-world irregularities.

07.
arXiv (math.PR) 2026-06-16

Super-Arrhenius relaxation of the triangular plaquette model in any dimension

arXiv:2606.16259v1 Announce Type: new Abstract: Consider the following plaquette model from statistical physics: a lamp lies at every vertex of the triangular lattice and a switch lies at every even vertex of the (bipartite) dual hexagonal lattice. Each switch toggles the three lamps on its face. The energy of a configuration is the number of ON lamps. For the Glauber dynamics associated with the Gibbs measure defined by this Hamiltonian at any inverse temperature $\beta>0$, we show that, in any dimension $d\ge 2$, the infinite volume relaxation time satisfies \[e^{\beta^2/C}/C \le T_{\mathrm{rel}}\le Ce^{e^{C\beta}}\] for some $C>0$. Our result entails that the Gibbs measure is unique. The $e^{\beta^2}$ scaling was conjectured by Newman and Moore in 1999 and matches the behaviour of supercritical rooted kinetically constrained models such as the East model, thus recovering fragile glass phenomenology in the absence of kinetic constraints. More precisely, we show that, on a torus of side length $2^k$, when $\beta\to\infty$ and $k/\beta\to0$, we have $T_{\mathrm{rel}}=e^{2\beta k(1+o(1))}$. Quite surprisingly, however, we also prove that, on non-periodic finite domains of size $n\le e^{\beta/C}$ for large $C>0$, we have the much larger asymptotics $\ln T_{\mathrm{rel}}=\beta n^{\Theta(1)}$. The main ingredients of the proofs are new results in extremal and enumerative combinatorics and rely on renormalisation ideas for the dynamics and its groundstates also known as the Ledrappier subshift. We note consequences of our results to geometric group theory (more precisely to the complexity of the word problem for the Baumslag finitely presented group) and to ergodic theory.

08.
arXiv (CS.LG) 2026-06-16

Descriptive versus Regulatory Uncertainty in Bounded Predictive Systems

arXiv:2605.18909v2 Announce Type: replace Abstract: Any system that models the world under finite representational capacity must compress; any compression entails a prior; and the prior is the system's bias. What has not been established is whether uncertainty participates in the dynamics governing future behavior, or merely describes the output distribution without consequence. We introduce a structural distinction between descriptive uncertainty, which does not recursively modulate the system's policy, and regulatory uncertainty, which directly enters the optimization landscape and drives persistent adaptive restructuring. We prove formally that current transformer architectures are confined to descriptive uncertainty at inference. We ground this in thermodynamics via Landauer's principle: for uncertainty to be regulatory, epistemic error must cost real energy; in a decoupled system, hallucinations and correct derivations dissipate identical energy. We test this empirically across three locally-deployed language models (3B, 8B, 70B parameters). Token-level Shannon entropy is statistically invariant across tasks spanning pattern retrieval, causal operator application, and out-of-distribution causal generalization in all three models (all pairwise p >= 0.568; within-model ranges 0.011-0.028 nats), while task accuracy varies substantially across the same conditions (0%-100%). Entropy and accuracy are orthogonal. The decoupling is scale-invariant: larger models achieve higher accuracy but identical entropy flatness. This structural incapacity is not resolvable by additional parameters or training data. Genuine epistemic grounding requires physical coupling between thermodynamic substrate state and information processing cost.

09.
arXiv (CS.CV) 2026-06-16

Mitigating Visual Hallucinations in Multimodal Systems through Retrieval-Augmented Reliability-Aware Inference

Multimodal large language models (MLLMs) have demonstrated strong capabilities in vision-language understanding and natural-language response generation. However, these systems can still produce overconfident predictions and hallucination-like outputs, particularly when the visual evidence is weak, ambiguous, or semantically inconsistent. Most existing approaches focus on improving multimodal representation alignment or retrieval-augmented generation, while providing limited mechanisms to quantify instance-level prediction reliability or identify incorrect visual outputs. This work proposes a retrieval-augmented reliability-aware inference framework for trustworthy multimodal visual understanding. The proposed framework constructs an external visual evidence database using pretrained visual embeddings and nearest-neighbor retrieval over normalized feature representations. Retrieved evidence is used to estimate prediction trustworthiness through multiple reliability indicators, including similarity strength, class-support agreement, evidence margin, entropy-based uncertainty, and an aggregate reliability score. Based on these signals, a decision gate determines whether the system should accept the prediction, answer with caution, or abstain/fallback when evidence is insufficient. A multimodal response-generation layer then produces a final user-facing response conditioned on the reliability decision. Experiments on ImageNet-100 demonstrate that the proposed reliability-aware framework improves accepted prediction accuracy from 85.84\% to 88.88\% at 89.04\% coverage. The hallucination-like accepted wrong-answer rate is reduced from 14.16\% to 11.12\%. These results show that integrating retrieval evidence, reliability estimation, and selective decision gating can improve calibration and reduce overconfident visual errors without retraining large multimodal models.

10.
arXiv (CS.CV) 2026-06-11

Frozen Multimodal Embeddings for Personality and Cognitive Ability Assessment in Asynchronous Video Interviews

Predicting psychological traits from asynchronous video interviews (AVIs) is a challenging multimodal learning problem because labeled datasets are limited while each response contains high-dimensional visual, acoustic, and verbal signals. This paper presents our solution for the ACM Multimedia AVI Challenge 2026, which evaluates two tasks: Track~1 predicts self-reported HEXACO personality traits from personality-related interview responses, and Track~2 classifies cognitive ability levels from structured AVI responses. We treat the problem as a small-sample representation learning task. Instead of fine-tuning large pretrained models, we use frozen multimodal encoders, including CLIP for visual features, Whisper for acoustic features and transcripts, and RoBERTa, E5, and DeBERTaV3 for textual representations, followed by low-capacity downstream models. For Track~1, our trait-specific regression and late-fusion system achieves an average validation MSE of 0.2696, improving over the official baseline of 0.3334. Ablation results show a three-step improvement from a global model (0.3189), to per-trait modeling (0.2871), to per-trait late fusion (0.2696), corresponding to a 19.1\% relative MSE reduction over the official baseline. For Track~2, a compact subject-attribute baseline reaches 0.5781 accuracy, while our multimodal ensemble reaches 0.5313, both above the official baseline of 0.4062. We interpret this result as evidence of possible subject-attribute shortcuts in the validation split rather than robust cognitive inference from AVI content. Overall, our findings suggest that AVI-based psychological assessment benefits from trait-specific multimodal modeling, but cognitive ability prediction requires careful control of dataset shortcuts.

11.
bioRxiv (Bioinfo) 2026-06-11

Integrating Spatially Adjusted Protein Summaries for Survival Prediction in Spatial Proteomics

Recent advances in spatial proteomics, particularly imaging mass cytometry, enable the measurement of protein expression at the single-cell level while preserving a spatial context. Conventional survival analyses, however, typically rely on patient-level averages of protein intensities and therefore overlook spatial heterogeneity and tissue architecture. To address this limitation, we introduce a framework that incorporates spatial information into survival modeling by generating spatially adjusted protein summaries (SAPS). In this approach, cell-level protein intensities within each patient are modeled using spatial spline regression to capture spatial trends. From these models, we extract two complementary features: a spatially adjusted mean expression and a residual variance that reflects cell-to-cell variability unexplained by spatial effects. These summaries are then incorporated into Cox proportional hazards models in combination with clinical covariates. In simulation studies, our proposed framework achieved improved predictive performance compared to other alternative methods. The application of the method to breast cancer imaging mass cytometry data indicate that spatially adjusted summaries may enhance survival prediction and reveal biologically interpretable spatial protein patterns, suggesting high translational potential. This methodology offers an efficient means of translating complex spatial proteomics data into patient-level features, providing both improved survival prediction and new insights into the role of spatial heterogeneity in cancer outcomes.

12.
PLOS Medicine 2026-05-12

Social contact patterns in the United Kingdom following the COVID-19 pandemic: The Reconnect cross-sectional survey

by Lucy Goodfellow, Billy J. Quilty, Kevin van Zandvoort, W. John Edmunds Background Close-contact and respiratory infectious diseases are spread through social interactions. Measuring these interactions has transformed our ability to understand transmission and control these infections. Social contact patterns were disrupted during the COVID-19 pandemic and have been affected by wider demographic, cultural, and workplace changes since then. Methods and findings To estimate post-pandemic social contact patterns in the United Kingdom, we conducted a cross-sectional social contact survey from November 2024 to March 2025 on a nationally representative sample of participants. Interactions were captured by age, gender, and across socioeconomic status (SES) and ethnic groups. We calculated the mean number of daily contacts and contact matrices, stratified by variables of interest, using a negative binomial regression model weighted by age, gender, ethnic group, and weekday/weekend. 13,238 participants were recruited, 3,019 of whom were aged under 18 years old; survey response rates were 36% and 27% for adults and children, respectively. The mean number of daily contacts was 9.1 (95% confidence interval (CI): 8.7, 9.5); this figure was 13.8 (95% CI: 12.8, 14.9) for children, and 7.8 (95% CI: 7.4, 8.2) for adults. Higher numbers of contacts were positively associated with employment, household income, and educational qualifications held. Contact matrices showed high levels of age-assortativity, as well as inter-generational contacts in the home. Contacts were assortative between ethnic groups and SES in all settings; this effect was strongest between ethnic groups in the home, and between SES in the workplace. We constructed socially-stratified next-generation matrices for a novel respiratory pathogen, projecting that the majority White ethnic group would account for the largest share of new infections (76.7% (95% CI: 75.5, 77.9) of cases), but that per-capita infection risk would disproportionately affect minority ethnic groups, with the risk for the Black population being 2.27 (95% CI: 2.06, 2.51) times that of the White population. This study may be limited by the inherent recall biases and reporting fatigue involved with self-reporting contacts. Conclusions This study provides crucial data to inform post-pandemic mathematical models of infectious disease transmission, and allows ethnicity and SES to be incorporated in such models.

13.
arXiv (CS.CV) 2026-06-18

Multi-Modal Hyper-Graph Fusion for Low-Light Crowd Counting

Crowd counting is a fundamental task in computer vision. However, crowd counting in low-light environments remains largely underexplored, despite its practical importance in the real world. Existing methods mainly focus on well-lit scenes or rely on single-modality Red-Green-Blue (RGB) representations, which often become unreliable under extreme darkness and complex non-uniform illumination. To handle this problem, we construct three new low-light crowd counting benchmarks, which consist of two synthetic datasets, SHA\_Dark and SHB\_Dark, and a real-world benchmark LC-Crowd (Low-light Crowd Dataset). Inspired by Retinex-based physical modeling, we introduce depth and Canny edge cues as complementary geometric and structural priors to enhance the intrinsic reflectance representation under low-light conditions. We propose a Multi-Modal Hyper-Graph Fusion module, which formulates RGB appearance, depth geometry, and edge structure cues as nodes in a unified hyper-graph and explicitly captures their high-order complementary relationships via dynamic hyperedge construction and message passing. Furthermore, to adaptively allocate computation in dense prediction, we propose a Deformable Rectangular Sparse Attention (DRSA) module, which concentrates computation on informative regions through anchor-aware estimation and adaptive rectangular window modeling. Based on these designs, we develop a unified Low-Light Counting Network (LCNet) for robust low-light crowd counting. Extensive experiments on three benchmarks demonstrate that the proposed method achieves the best overall performance against existing state-of-the-art (SOTA) methods. The code is in the supplementary material. The datasets will be made public upon acceptance.

14.
arXiv (CS.CL) 2026-06-18

Beyond Tokenization: Direct Timestep Embedding and Contrastive Alignment for Time-Series Question Answering

Recent advances in large language models (LLMs) have given rise to time-series question answering (TSQA), which formulates time-series analysis as natural-language question answering. However, directly feeding raw numerical series into LLMs suffers from a tokenization bottleneck: Byte Pair Encoding fragments continuous values into unstable tokens whose embeddings lack meaningful metric structure, resulting in the loss of magnitude, scale, and trend information. Prior methods use patch-based encoders that split the series into fixed windows, locking in one granularity that breaks patterns and hides exact timesteps, through a separate module that rarely transfers across datasets with different lengths or sampling rates. To address this challenge, we propose CADE (Contrastive Alignment with Direct Embedding), a novel framework for TSQA built upon two key components: direct timestep embedding and semantic alignment. The proposed framework maps each timestep directly into the LLM embedding space through a point-wise linear encoder and MLP projector, preserving exact index-level access while eliminating the need for patching and padding. To further bridge the semantic gap between time-series and language representations, we introduce a novel one-directional supervised contrastive loss that aligns time-series embeddings with frozen class-name text anchors. Experimental results on the public Time-MQA benchmark demonstrate that our framework consistently improves performance across six TSQA tasks, outperforming both open-source and proprietary LLM baselines.

15.
arXiv (CS.CL) 2026-06-12

EDEN: A Large-Scale Corpus of Clinical Notes for Italian

We present EDEN (Emergency Department Electronic Notes), a new and unique large-scale corpus of clinical notes produced in Emergency Departments of Italian hospitals. The corpus, in its current version, is composed of approximately 4 million clinical notes fully anonymized, covering diverse phases of patient care during the stay in the emergency department. In addition, a subset of about six thousand notes has been manually annotated by clinical experts through a structured Case Report Form (CRF) containing 132 items relevant for two patient situations in emergency departments, dyspnea and loss of consciousness. Items may assume numerical values (e.g., for blood saturation), categorical (e.g., for level of consciousness ), binary (e.g., for presence of traumas), and mixed value types. The annotation process involved multiple clinicians and underwent iterative revision to resolve ambiguities in item formulation, resulting in a richly structured (although high imbalanced) resource. The dataset aims to fill a relevant gap of data able to support both the development and the use of Large Language Models in concrete medical applications. We describe the data collection protocol, the on-site anonymisation pipeline, corpus statistics, and the annotation scheme. Finally, we propose CRF-filling as a novel structured information extraction benchmark, and provide zero-shot baseline resulting from Gemma-27B and MedGemma-27B. To the best of our knowledge, the EDEN dataset is the largest freely available corpus of clinical notes existing for the Italian language.

16.
arXiv (CS.LG) 2026-06-11

Time-multiplexed layer reuse for physical neural networks

arXiv:2511.00044v3 Announce Type: replace Abstract: Physical neural networks (PNNs) are promising candidates for next-generation computing, but existing demonstrations remain several orders of magnitude smaller than modern digital neural networks, whose recent advances have been driven by rapid growth in trainable parameters. This situation resembles the constraints of early digital neural networks, which led to ideas around parameter reuse. We investigate what similarly efficient hardware architectures may look like, focusing specifically on the common bottleneck of slow re-adjustment of the weights in PNNs. We propose the Time-Indexed Deep Alternating Layers Network (TIDAL-Net), which occupies an intermediate regime between recurrent and deep neural networks, specifically aimed at the scales and restrictions of common PNN prototypes. TIDAL-Net leverages the timescale separation found in many PNNs between fast forward dynamics and slowly trainable weights and biases, using layer-by-layer time multiplexing to increase effective depth while limiting implementation cost. Numerical experiments on image classification and natural language processing tasks show that TIDAL-Net improves performance with only minor modifications to conventional PNNs.

17.
arXiv (CS.LG) 2026-06-11

Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

arXiv:2511.21594v3 Announce Type: replace Abstract: Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We demonstrate experiments on GPT-2 and LLaMa models, where we uncover interesting geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern not documented in prior work to our knowledge. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we demonstrate the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. We make our code available at https://github.com/Vainateya/Feature_Geometry_Visualization. A better formatted blog-post with identical content is available at https://iclr-blogposts.github.io/2026/blog/2026/vis-llm-latent-geometry/.

18.
bioRxiv (Bioinfo) 2026-06-19

OmniPath Metabo: chemical structures, interactions and mechanisms to study the metabolome

Mechanistic and functional analysis of omics data largely relies on the incorporation of prior knowledge; however, connecting metabolomics data and knowledge is a major methodological challenge. This is largely driven by the diverse prior knowledge being fragmented across many databases requiring the merging of different database records across chemical structures, identifiers, and varying levels of structural specificity. Hence, this limits mechanistic interpretation and functional characterisation of the metabolome. Here, we present OmniPath Metabo, a comprehensive, harmonized, metabolome-centric database covering metabolites, lipids, food-derived compounds, and small molecule drugs, along with their associated receptors, transporters, enzymes, reactions, allosteric regulators, and disease associations. OmniPath Metabo harmonizes attributes using controlled vocabularies and ontologies, structures and built-in cheminformatics to map identifiers and track ambiguity. OmniPath Metabo is built directly from 40+ original resources and is freely accessible via an interactive web app and API at metabo.omnipathdb.org. OmniPath Metabo enables dynamic, context-specific construction of subnetworks to serve dedicated purposes, such as cell-cell communication or integrated multi-omics metabolite-driven regulation, connecting reactions, allosteric regulation, metabolite-receptor and metabolite-transporter interactions. Combining it with the over 170 other resources in OmniPath, it can be used for integrated networks of signaling, gene regulation, and metabolism. We showcase the application of OmniPath Metabo by analysing publicly available metabolomics data of lung cancer cell lines and metabolic footprints to mutational patterns. In summary, OmniPath Metabo transforms fragmented resources into a harmonised prior knowledge framework for a mechanistic and functional analysis of the metabolome.

19.
Nature (Science) 2026-06-17

<i>CHPO</i> coordinates chilling recovery and nitrogen use in rice

作者:

Global rice production faces mounting challenges from abnormal temperature fluctuations and nitrogen-fertilizer-driven environmental pollution1–7. Developing varieties that balance chilling resilience and nitrogen-use efficiency (NUE) offers a promising solution, but the molecular networks coordinating these traits remain poorly understood. Here we identify CHILLING PHOENIX (CHPO), a major gene underlying the quantitative trait locus shared by both chilling tolerance and resilience. It encodes a MYB transcription factor that acts as a key regulator coordinating post-chilling recovery with nitrogen use in rice. Natural variation in a GCG-repeat-encoded polyalanine tract alters CHPO DNA-binding preference and redirects regulatory outputs between the japonica-type (CHPOjap) and indica-type (CHPOind), causing opposing effects on chilling tolerance and resilience. This allelic variation is shaped by domestication selection, with the CHPOjap allele probably derived from Chinese wild rice. CHPOjap directly targets OsTCP19 and OsNRT2.4 to fine-tune NUE, thereby enhancing chilling tolerance and resilience. These findings provide a mechanistic framework for a chilling-induced high-nitrogen-utilization module that alleviates the damage caused by chilling stress, and a potential molecular design&nbsp;strategy for breeding rice varieties with both chilling resilience and high NUE at the&nbsp;recovery stage. A rice gene, CHPO, links chilling resilience with nitrogen-use efficiency, revealing a domestication-shaped regulatory mechanism that could guide breeding of climate-resilient, sustainable rice varieties.

20.
arXiv (CS.CV) 2026-06-16

Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging

Brain MRIs are routinely acquired as multiple complementary sequences with unique contrast weighting, including T1-weighed imaging (T1w) anatomic and fluid-sensitive T2-weighted (T2w) contrasts. However, methods for learning unified representations across the multitude of MRI contrast mechanisms at health-system scale are lacking. In this study, we introduce Neuro-JEPA, a sparse multimodal neuroimaging foundation model that combines a latent predictive objective with a Mixture-of-Experts architecture to encode brain MRI across core T1w, T2w, and fluid-suppressed FLAIR imaging (FLAIR). We further provide a systematic methodological study of architectural, masking, objective, and sparsity design choices beneficial for robust neuroimaging multimodal representation learning. Neuro-JEPA was pretrained on 1,551,862 scans from 428,647 studies after modality-specific preprocessing with data curation across three core structural brain MRI sequences. We evaluated the learned representations across clinical and research settings, including 25 tasks from three health systems: NYU Langone, NYU Long Island, and Massachusetts General Hospital, and 22 tasks from 12 public datasets, covering unimodal, multimodal and cross-domain evaluation configurations. Across these benchmarks, existing neuroimaging foundation models showed inconsistent gains over a simple convolutional neural network (CNN) baseline, whereas Neuro-JEPA achieved stronger and more consistent performance across all evaluated settings. These results establish a scalable methodological framework for multimodal neuroimaging representation learning and highlight the need for foundation model evaluation protocols that include simple baselines, clinically heterogeneous cohorts and controlled multimodal comparisons.

21.
arXiv (CS.CV) 2026-06-16

Temporal Difference Learning for Diffusion Models

Diffusion models are typically trained with objectives that focus on local denoising targets at individual time steps (or adjacent pairs), which do not enforce consistency between predictions along the denoising trajectory. This lack of cross-time consistency can degrade performance, especially for few-step samplers. We introduce a temporal difference (TD) objective that penalizes inconsistency of the model's multi-step progress along the denoising path. By reformulating the diffusion process as a Markov reward process and casting denoising as a policy evaluation problem in reinforcement learning, we derive a unified TD approach that applies to both discrete- and continuous-time diffusion formulations. We further propose a principled sample-based reweighting method that stabilizes training. Empirically, we show that using our TD training can significantly improve sample quality measured by FID, with stronger advantages when the number of sampling steps is small, highlighting its practical utility under low-computation-budget scenarios. We provide ablation studies to justify our design choices, including pairwise loss reweighting, regularization weight, and one-step stride. Overall, our TD approach can be a general drop-in that enforces cross-time consistency and improves generation quality across different diffusion generative models.

22.
arXiv (CS.CL) 2026-06-17

Self-Generated Error Training for Token Editing in Diffusion Language Models

作者:

Token-to-token (T2T) editing lets LLaDA2.1 revise committed tokens during block-diffusion decoding. The released recipe trains this editor on random vocabulary corruptions, but at inference the editor sees the model's own fluent, high-confidence draft errors instead. We study this training-inference mismatch and propose self-generated T2T, which performs a no-gradient draft pass, fills masked positions with predicted tokens, and supervises recovery in a second pass under these self-generated corruptions. We implement the update as a short LoRA continued-pretraining pass on LLaDA2.1-mini and evaluate on several benchmarks under the official Q-Mode T2T procedure with unchanged inference parameters. The method generally improves accuracy while reducing T2T edit intensity, mitigating failure modes such as final-digit transcription errors after otherwise correct reasoning and excessive self-correction before short factual answers.

23.
arXiv (CS.AI) 2026-06-15

SpheriCity: Designing Trustworthy Conversational AI for Sustainability Decision Support

arXiv:2606.13854v1 Announce Type: cross Abstract: We present SpheriCity, an expert-grounded conversational prototype designed to support trustworthy knowledge sensemaking from sustainability reports. City-level circularity assessment reports contain rich information about materials, infrastructure, and policy interventions, yet their length and heterogeneous structure make cross-document synthesis and comparison difficult for practitioners and researchers working on circular economy initiatives. While large language models (LLM) promise faster knowledge access and synthesis, their opaque reasoning, hallucinations, and lack of source transparency introduce risks for trust and interpretability, and require verification in high-stakes sustainability contexts. SpheriCity addresses these challenges through a provenance-first conversational agent that foregrounds evidence traceability, structured synthesis, and interaction scaffolds to support exploratory querying and cross-document synthesis across sustainability reports. We conducted a formative expert review with six sustainability experts using representative queries spanning cross-city comparison, policy summarization, and recommendation-oriented tasks. Experts evaluated responses across dimensions and provided qualitative reflections on the system's usefulness for sustainability knowledge work. Our results reveal that transparent sourcing, contextual explanation, interpretability, and alignment with expert workflow strongly shape expert trust and judgments of system usefulness. This work contributes (1) a conversational prototype for sustainability knowledge sensemaking, (2) an expert-grounded evaluation framework for assessing AI responses in high-stakes knowledge domains, and (3) design insights into how provenance, uncertainty communication, and integration in workflow influence expert users' trust in AI assistance for sustainability decision support.

24.
arXiv (CS.AI) 2026-06-16

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

arXiv:2604.13085v2 Announce Type: replace-cross Abstract: Autonomous AI agents operating in dynamic environments face a persistent challenge: acquiring new capabilities without erasing prior knowledge. We present Adaptive Memory Crystallization (AMC), a memory architecture for progressive experience consolidation in continual reinforcement learning. AMC is conceptually inspired by the qualitative structure of synaptic tagging and capture (STC) theory, the idea that memories transition through discrete stability phases, but makes no claim to model the underlying molecular or synaptic mechanisms. AMC models memory as a continuous crystallization process in which experiences migrate from plastic to stable states according to a multi-objective utility signal. The framework introduces a three-phase memory hierarchy (Liquid–Glass–Crystal) governed by an Itô stochastic differential equation (SDE) whose population-level behavior is captured by an explicit Fokker–Planck equation admitting a closed-form Beta stationary distribution. We provide proofs of: (i) well-posedness and global convergence of the crystallization SDE to a unique Beta stationary distribution; (ii) exponential convergence of individual crystallization states to their fixed points, with explicit rates and variance bounds; and (iii) end-to-end Q-learning error bounds and matching memory-capacity lower bounds that link SDE parameters directly to agent performance. Empirical evaluation on Meta-World MT50, Atari 20-game sequential learning, and MuJoCo continual locomotion consistently shows improvements in forward transfer (+34–43\% over the strongest baseline), reductions in catastrophic forgetting (67–80\%), and a 62\% decrease in memory footprint.

25.
arXiv (CS.CV) 2026-06-15

Naive Visual Memory is Not Enough: A Failure-Mode Study of GUI Agents

Graphical User Interface (GUI) agents are increasingly used to automate complex computer tasks across applications, websites, and operating systems. To improve their reliability, recent work has introduced experiential memory, where agents retrieve prior trajectories to guide decision-making in similar states. More recent approaches further extend this idea to visual memory by storing and retrieving screenshots from past interactions, providing agents with richer contextual information than text-only memories. However, the effect of visual memory in GUI agents remains insufficiently understood: it is unclear which failures visual memory mitigates, or which failures it exacerbates. To systematically analyze the effect of visual memory, we introduce a taxonomy of four GUI agent failures (i.e., cognitive failure, visual state misunderstanding, hidden operation blindness, and grounding error) that map to distinct stages of the perception-reasoning-action pipeline. We find that prepending full-image memory has a divergent effect on the failure distribution: it reduces state-level failures but worsens action-level ones, and increases hidden operation blindness and grounding error. Motivated by this finding, we propose Action-Grounded Visual Memory (AGMem), an action-grounded memory framework for GUI agents. The core idea of AGMem is to store image crops that capture the local GUI region closely related to a successful action or a recovery, rather than storing full screenshots. Experiments on OSWorld show that AGMem improves task success rates by 33.3 % over full-image memory. These results demonstrate that AGMem is an effective representation for visual memory in GUI agents.