Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-18

Rescaling MLM-Head for Neural Sparse Retrieval

arXiv:2606.18811v1 Announce Type: cross Abstract: Learned sparse retrieval (LSR) models such as SPLADE have traditionally used BERT-style masked language models as backbone encoders. A natural expectation is that replacing BERT with stronger pretrained encoders should improve retrieval effectiveness. However, we find that under standard SPLADE training recipes, backbones with large MLM-head L2 norms can suffer performance degradation and even training collapse under standard SPLADE training recipes. We identify this failure as a scale mismatch in the MLM head: SPLADE directly uses MLM-head outputs to construct sparse lexical representations, and query-document relevance is computed by an unnormalized dot product over these representations. As a result, an inflated MLM-head scale can amplify sparse activations, distort matching scores, and destabilize contrastive training under common training settings. To address this issue, we introduce a simple initialization-time correction that rescales the MLM-head projection by a constant factor before SPLADE training. This zero-cost adjustment improves training stability without modifying the model architecture or training objective. Across both in-domain and out-of-domain retrieval benchmarks, this simple correction substantially improves large-norm backbones such as ModernBERT and Ettin, turning unstable training runs into competitive sparse retrievers. In several settings, the corrected models further match or surpass the classic BERT-SPLADE baseline. These findings suggest that the bottleneck in adapting pretrained encoders to LSR is not encoder capacity alone, but the calibration of the MLM-head scale used to construct sparse lexical representations.

02.
arXiv (CS.LG) 2026-06-17

Physics-Constrained Neural Networks for Improved Short-Term Weather Forecasting: A Case Study over the South Pacific

arXiv:2606.17659v1 Announce Type: new Abstract: This study introduces enhancements to physics-constrained neural networks (PCNNs) that improve the accuracy and stability of hybrid short-term weather forecasting models. Building on the WeatherGFT architecture, three innovations are proposed. First, an upgraded numerical solver, combining a fifth-order weighted essentially non-oscillatory scheme (WENO-5), a beta-plane approximation, and subgrid-scale viscosity, permits a fourfold increase in the integration time step to 1200 s while reducing the daily mean squared error by up to 26%. Second, a unified autoregressive hybrid block replaces the original chain of 24 specialised modules, eliminating overfitting to specific lead times. Third, the physical core is integrated with two state-of-the-art neural backbones, resulting in PI-PredFormer and PI-IAM4VP. Evaluation on the WeatherBench South Pacific subset from 2000 to 2004 shows that these hybrids reduce root mean squared error at 1-12 h lead times by 8-22% compared to purely neural counterparts, while better preserving physical consistency. These results demonstrate that incremental refinement of hybrid components offers a practical route toward more accurate and efficient short-range weather forecasting.

03.
medRxiv (Medicine) 2026-06-11

Dissecting the functional landscape of rare diseases through genomic variation in a heterogeneous cohort of 11,000 patients

Rare diseases (RDs) remain a major diagnostic challenge. Genetic and phenotypic heterogeneity, incomplete knowledge of disease mechanisms, and limitations in variant clinical interpretation leave many patients without a molecular diagnosis. Meanwhile, the growing volume of genomic data generated in clinical practice offers an opportunity to develop data-driven methodologies for exploring disease mechanisms and improving the reanalysis of unsolved cases. We aggregated real-world genomic data from 11,084 unrelated patients with suspected RD. Patients were clinically classified into 122 diseases. We built a multi-disease genomic variant frequency database (FJD-DB), which enabled the development of variant and gene-disease association scores by means of case-control subcohort comparisons across 32 disease groups. Functional enrichment analyses were then used to highlight disease-associated protein domains, pathways, biological processes, and phenotypes. Finally, the resulting knowledge was integrated into a data-driven framework for the guided reanalysis of unsolved RD patients applied to Inherited Retinal Dystrophies (IRD) patients as first use case. FJD-DB contained more than 45 million unique variants, including ~185,000 potentially pathogenic variants. Disease-specific analyses identified disease-associated pathogenic variants and highlighted both established and candidate disease genes. We detected 179 significantly enriched protein domains across 23 diseases, 124 Human Phenotype Ontology terms across 13 diseases, 79 Reactome pathways across 10 diseases, and 72 Gene Ontology biological processes across 8 diseases, revealing highly disease-specific functional signatures. Integration of disease-specific variant, gene, and functional association signals enabled the development of a data-driven framework for guided reanalysis of unsolved RD cases. Applied to more than 1,100 unsolved IRD cases, the framework generated clinically relevant findings in 26 patients, including four molecular diagnoses, seven candidate diagnoses, and 15 cases upgraded from non-informative findings to variants of uncertain significance. Aggregated real-world genomic data can be leveraged to identify disease-associated molecular signals generating novel biological hypotheses. A unified analytical framework provides a scalable strategy for knowledge discovery and guided reanalysis, facilitating the identification of overlooked and potentially novel genetic causes of RDs.

04.
arXiv (CS.CL) 2026-06-12

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction cases, Mem0 memory still leaves 57.5% of applicable preference checks violated. We introduce Test-time Rule Acquisition and Compiled Enforcement (TRACE), a drop-in skill-layer pipeline for coding-agent runtimes that mines user corrections, rewrites them as atomic rules, and compiles them into runtime checks that must pass before an agent completes future tasks. Unlike runtime checks written ahead of time by developers, TRACE skills come from the user's own chat corrections. We evaluate TRACE with simulated user-in-the-loop experiments on ClawArena coding-agent tasks and MemoryArena-derived memory-intensive tasks. On ClawArena, TRACE reduces held-out preference violation from 100.0% to 37.6% on in-distribution tasks and from 100.0% to 2.0% on out-of-distribution tasks. On MemoryArena-derived tasks, TRACE reduces in-distribution violation from 100.0% to 60.5% while matching or exceeding the strongest memory baseline on task pass. These results suggest that compiling corrections into runtime enforcement can address a repeated-friction failure mode that memory alone does not reliably solve, reducing the need for users to restate the same correction across future sessions. Experiment code is available at https://github.com/YujunZhou/TRACE_exp, and the deployable skill is available at https://github.com/YujunZhou/tellonce.

05.
arXiv (CS.AI) 2026-06-12

Algorithmic Constitutionalism

arXiv:2606.12437v1 Announce Type: cross Abstract: The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

06.
medRxiv (Medicine) 2026-06-22

Vaccine introductions in the WHO African Region, 2023-26: a country-level ecological analysis by Gavi eligibility and conflict-affected status

Background. The Immunization Agenda 2030 (IA2030) tracks new and underused vaccine introduction as an access metric, and its mid-term review calls for stronger country ownership, prioritisation, data use and tailored support in conflict-affected and resource-constrained settings; however, national launch status does not measure recurrent financing, implementation, safety or equity. We examined how recent vaccine-introduction activity was distributed across the WHO African Region. Methods. We conducted a descriptive country-level ecological analysis of all 47 Member States from January 2023 to June 2026. The country was the unit of analysis and contributed one cumulative, unweighted count of nationally endorsed vaccine-introduction and programme-change events. Counts were linked to Gavi eligibility, World Bank FY26 conflict-affected status, broader fragile and conflict-affected situation status in sensitivity analysis, and concurrent system-performance indicators, and modelled with Poisson regression using HC1 robust standard errors. Two Expanded Programme on Immunization (EPI) manager survey waves were summarised at country level. Reporting followed STROBE and RECORD. Results. Seventy-two events were recorded across 38 of 47 Member States: 48 new-antigen introductions, 20 dose or schedule expansions and four combination-vaccine introductions; malaria vaccines accounted for 21. Gavi-eligible conflict-affected countries averaged 2.50 events per country versus 1.27 in both comparison groups. Gavi-eligible conflict-affected status was associated with a higher count (incidence rate ratio [IRR] 1.97, 95% confidence interval [CI] 1.38-2.81; p

07.
arXiv (CS.CV) 2026-06-16

MVEB: Massive Video Embedding Benchmark

We introduce the Massive Video Embedding Benchmark (MVEB), a 23-task benchmark for video embeddings spanning classification, zero-shot classification, clustering, pair classification, retrieval, and video-centric question answering. We evaluate 33 models and find that no single model dominates: MLLM-based embeddings lead on classification, clustering, pair classification, and QA; multimodal binding leads on retrieval and zero-shot classification; generative MLLMs without contrastive adaptation collapse on cross-modal tasks. Paired video-only vs. audio+video evaluations show that audio's contribution depends on dataset annotation provenance: audio helps when labels were produced from both modalities and hurts when they were produced from visuals alone, a six-point gap consistent across model families. MVEB is derived from MVEB+, a 184-task pool, and is designed to maintain task diversity while reducing evaluation cost. It integrates into the MTEB ecosystem for unified evaluation across text, image, audio, and video. We release MVEB and all 184 tasks along with code and a leaderboard at https://github.com/embeddings-benchmark/mteb.

08.
arXiv (CS.CV) 2026-06-12

SAM-Deep-EIoU: Selective Mask Propagation for Multi-Object Tracking

Multi-object tracking has a heavy-tailed difficulty distribution: most frames are easy for a lightweight base tracker, while a small fraction are intrinsically hard. Video object segmentation (VOS) models can often preserve identity through the hard frames where the base tracker fails, but they are much more expensive in compute and memory. We propose selective mask propagation, a tracking algorithm that dispatches from a base tracker to a VOS model only on windows where an assignment-uncertainty signal fires. The base tracker's output is modified only when the VOS model makes a confident prediction that contradicts the base tracker's identity assignment; weak or inconclusive predictions preserve the base output. The method is training-free, treats both the base tracker and the VOS model as black boxes, and can benefit from replacing the VOS component with a more capable model. On DanceTrack, selective mask propagation improves three different base trackers. On SportsMOT, where identity preservation is central to sports analytics, SAM3-Deep-EIoU with global track association achieves state-of-the-art performance on the benchmark with 86.8 HOTA.

09.
arXiv (CS.CV) 2026-06-11

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with an emerging strategy of scaling test-time compute to improve capability. However, we observe that doing so increases latency, token usage, and FLOPs while yielding uneven, often diminishing gains in downstream success, limiting where embodied agents can be deployed. We argue that choosing when and where to spend test-time compute is central to bringing frontier performance to the real world. We introduce DIRECT, a routing framework that uses multimodal scene context to allocate compute per prompt, improving the success–cost Pareto frontier over fixed model selection. Across three dominant scaling axes, namely chain-of-thought depth, model size, and memory history, our experiments on VLABench and RoboMME show that test-time compute is not a uniform lever: different axes yield qualitatively distinct capability gains. We validate these insights on a physical Franka arm in a DROID setup spanning zero-shot manipulation and long-horizon chaining, where our router matches or exceeds a stronger model's success rate at up to 65% lower average latency. Ultimately, our results show that naively scaling test-time compute is wasteful, and that DIRECT can provide frontier-level embodied planning in robotic systems at a fraction of the cost. Project page can be found at jadee-dao.github.io/direct/.

10.
bioRxiv (Bioinfo) 2026-06-18

Looking beyond stereotyped neuron structures reveals links between beading and morphological rearrangements in aging phenotypes.

Understanding how neuronal morphology changes during aging and acute stress is essential for elucidating mechanisms of neurodegeneration. The highly branched PVD neuron of Caenorhabditis elegans provides a powerful model for studying dendritic remodeling and degeneration-associated phenotypes such as dendritic beading. However, the complexity of this arbor presents substantial challenges for automated segmentation and quantitative analysis. In this study, we adapted a convolutional neural network (CNN)-guided region growing framework for automated dendrite tracing, coupled with two topology-based algorithms for categorizing dendritic segments by branching degree. The segmentation algorithm achieved high accuracy relative to manual tracing, with a median Dice coefficient of 0.82, while reducing analysis time by approximately tenfold. Automated dendrite categorization demonstrated strong agreement with manual annotations across branching orders, though position-based mapping performance declined with age due to progressive morphological distortion. Leveraging this platform, we investigated mechanistic differences in dendritic beading patterns observed during aging and cold shock. Consistent with prior work, aging was associated with decreased inter-bead spacing, whereas cold shock produced increased bead dispersion with stress severity. Structural analysis revealed that these trends were not driven by dendritic pruning or reduced arbor complexity. Instead, while a traditional anatomically unflexible paradigm falsely implicated lower-degree dendrites as highly vulnerable, our branching-informed framework revealed that age-dependent beading is fundamentally dictated by a segments history of successive branching events. Conversely, acute cold shock triggered systemic beading that expanded across all dendritic orders in a severity-dependent manner. Together, these findings demonstrate that chronic aging and acute stress engage distinct degenerative pathways (compartment-specific lineage vulnerability versus global architectural collapse) rather than gross morphological loss, as well as highlighting the need for paradigms that enable reliable analysis of changing morphologies.

11.
Nature (Science) 2026-06-10

Efficient and accurate neural-field reconstruction using resistive memory

Authors:

Applications such as medical imaging, augmented and virtual reality, and embodied artificial intelligence (AI) depend on the ability to reconstruct complex signals from sparse observations. These applications are characterized by incomplete measurements and limited computational resources. Traditional approaches to digital hardware face the following challenges: explicit signal representations require heavy sampling and storage, data movement across the von Neumann bottleneck dominates energy and latency, and CMOS (complementary metal–oxide–semiconductor)-based circuits offer limited parallel efficiency. Here we present a software–hardware co-optimization framework for sparse-input signal reconstruction. At the software level, we use neural fields1 to implicitly represent signals using neural networks, which are further compressed by low-rank decomposition and structured pruning. At the hardware level, we design a resistive-memory-based computing-in-memory platform, featuring a Gaussian encoder and a multi-layer perceptron processing engine. The Gaussian encoder leverages the intrinsic stochasticity of resistive memory for efficient encoding, whereas the processing engine enables precise weight mapping through a hardware-aware quantization circuit. On a 40-nm 256 Kb resistive-memory macro, the system delivers 23.5×, 21.0× and 32.3× gains in projected energy efficiency, together with 10.8×, 38.8× and 6.2× gains in projected parallelism, for three-dimensional computed tomography sparse reconstruction, novel view synthesis and dynamic-scene novel view synthesis, without compromising on reconstruction quality. This work advances AI-driven signal reconstruction technology and paves the way for future efficient and robust medical AI and three-dimensional vision applications. A co-optimized AI hardware–software system using resistive-memory computing improves energy efficiency and parallelism for sparse signal reconstruction in imaging and three-dimensional vision applications.

12.
arXiv (CS.LG) 2026-06-15

Zero-shot generalization of transformer neural operators to larger domains

arXiv:2606.14597v1 Announce Type: new Abstract: Transformer-based neural operators have shown remarkable performance for approximating solution operators of partial differential equations on complex geometries. However, existing approaches implicitly assume a fixed domain size, which limits their ability to generalize at inference. In this work, we investigate domain extension, namely zero-shot inference on spatial domains that are significantly larger than those encountered during training. We argue that this setting fundamentally requires spatial locality and translation equivariance. We propose to implement this locality via a decomposable bias in the attention logits computation, enabling finely controllable locality while remaining fully decomposable into query-key inner products and directly compatible with optimized attention kernels. Combined with rotary positional embeddings, it enables expressive embeddings with controllable spatial support without altering the transformer architecture. We empirically show that our approach substantially improves zero-shot generalization to larger domains across two PDE benchmarks and a 3D industrial atmospheric flow application. Our code and datasets are available at https://github.com/cerea-daml/domain-extension.

13.
arXiv (CS.AI) 2026-06-11

DuoBench: A Reproducible Benchmark for Bimanual Manipulation in Simulation and the Real World

arXiv:2606.11901v1 Announce Type: cross Abstract: Bimanual robot systems substantially expand manipulation capabilities, but coordinating two arms introduces additional control complexity and failure modes that are not well captured by existing benchmarks. We introduce DuoBench, an extensible benchmarking framework for bimanual manipulation policies on the FR3 Duo platform. DuoBench comprises eleven tasks spanning four coordination categories, implemented in simulation and partially reproduced in the real world through reproducible task recipes with 3D-printable assets. In addition, we propose a stage-based evaluation scheme that supports fine-grained semantic failure analysis beyond binary success and provide human-teleoperated datasets for all benchmark tasks. We benchmark several dual-arm imitation-learning and vision-language-action policies in simulation and on real hardware. Our results show that current policies remain challenged by bimanual manipulation, particularly in early interaction stages, parallel arm execution, and transfer between simulation and real-world settings. DuoBench provides a reproducible testbed for diagnosing these failure modes and studying future methods for dual-arm policy learning. Code, datasets, and videos are available at https://duobench.github.io/

14.
arXiv (quant-ph) 2026-06-16

Real-space spectral functions of three-dimensional billion-size topological non-Hermitian matter with tensor networks

arXiv:2606.16424v1 Announce Type: cross Abstract: Non-Hermitian systems host a wide range of unconventional topological phenomena while large-scale simulations in finite three dimensional systems remain challenging because of the rapidly growing number of sites. In particular, higher-order topological corner modes are often studied only in small lattices, where strong finite-size effects can mask their intrinsic behavior. Here, we develop a tensor-network framework that combines quantics tensor cross interpolation with the kernel polynomial method, enabling compact representations of large non-Hermitian tight-binding Hamiltonians and direct calculations of real-space spectral functions for systems exceeding one billion lattice sites. Using this approach, we investigate three-dimensional non-Hermitian higher-order topological insulators with with structured real-space geometries. The unprecedented system size enables direct access to the macroscopic regime and allows corner-mode spectral responses to be resolved in genuinely three-dimensional systems.By tuning the loss strength, we identify distinct in-gap corner modes across weak- and strong-loss regimes.Our results establish tensor-network algorithms as a powerful strategy to perform real-space spectral calculations in exceptionally large non-Hermitian systems.

15.
arXiv (CS.AI) 2026-06-18

DeepInflation: an AI agent for research and model discovery of inflation

arXiv:2601.14288v2 Announce Type: replace-cross Abstract: We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built upon a multi-agent architecture, DeepInflation integrates Large Language Models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework enables the agent to automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in established theoretical literature. We demonstrate that DeepInflation can successfully discover simple and viable single-field slow-roll inflationary potentials consistent with the latest observations (with the ACT DR6 results taken as an example) or any given $n_s$ and $r$, and provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, which enables researchers and non-experts alike to explore the inflationary landscape using natural language. This agent is available at https://github.com/pengzy-cosmo/DeepInflation.

16.
arXiv (CS.CV) 2026-06-11

CellNet – Localizing Cells using Sparse and Noisy Point Annotations

Counting living cells is an important step in many biological research workflows. Our collaborators at the Wellcome Sanger Institute study vital genes in humans via large scale saturation genome editing screening, which requires repeatedly counting cells a great number of times. Computer Vision based automation is crucial for high throughput and resource efficiency. In this work, we develop a regression-based deep learning computer vision algorithm to detect and count cells in phase-contrast microscopy images. To reduce annotation effort, which in practice often becomes a bottleneck, we focus on counting cells only using sparse point annotations, which are fast and easy to acquire. By comparison to state-of-the-art 0-shot methods, we show that regression-based counting is a promising alternative in low data regimes. Through developing methods to automatically count living cells in microscopy images, we contribute to valuable research on the human genome. The code is available at https://github.com/beijn/cellnet.

17.
bioRxiv (Bioinfo) 2026-06-15

SMLMFlow: Improving Structural Resolution in Single Molecule Localization Microscopy with Flow Matching

While Single Molecule Localization Microscopy (SMLM) aims to generate precise coordinates of molecular targets in cells, the resulting point clouds are inherently blurred by additive noise sources across the experimental, imaging, and processing workflow. This blurring often limits SMLM's ability to accurately quantify complex assembled structures required to address biological issues, despite reported localization precision down to a couple of nanometers. Here, we present SMLMFlow, a machine learning framework for improving structural resolution in SMLM datasets that combines a graph neural network and a hierarchical transformer with flow matching. We show that SMLMFlow improves structural resolution and downstream quantification across different structures, including filaments and protein nano-clusters, and generalizes to new unseen photophysics models.

18.
arXiv (CS.LG) 2026-06-12

The Metric Picks the Winner: Evaluation Choice Flips Model Rankings for Drug-Response Prediction in Unseen Chemistry

arXiv:2606.12639v1 Announce Type: new Abstract: Predicting how a cell's transcriptome responds to a drug it has never seen is a core, hard problem in computational cell biology: recent benchmarks show complex models often fail to beat trivial baselines once test compounds are held out by chemistry. We study one cell line and assay, THP-1 cells profiled by DRUG-seq, scored by the active-compound weighted MSE(wMSE) of the VCPI prediction contest. We propose a staged approach: dumb baselines (untreated control and mean training-compound response) that the field keeps failing to beat; non-parametric retrieval (a Tanimoto-weighted average of a held-out compound's nearest training compounds); and a fusion stage combining a frozen chemistry embedding with retrieval-support features to predict the residual over the mean, with an uncertainty head and gene programs. On the released VCPI THP-1 drug-seq data (14,026 training compounds), under a Bemis-Murcko scaffold split, the model ranking inverts depending on the metric. Under an inverse-variance per-gene proxy, a regularized linear regression on Morgan fingerprints appears to win over the deep models, retrieval, and ChemBERTa – the textbook "simple baselines win" result. But under the contest's true active-set metric (per-(gene, compound) Mejia weights, validated against the official scorer; mean baseline 0.535 vs the organizers' 0.507 reference), that reverses: the deep models win, our fusion decoder significantly beats the linear fingerprint baseline (-0.012 wMSE, paired bootstrap p < 10^-4), and the proxy's winner becomes the worst chemistry-aware predictor. Picking the metric picks the winner – to our knowledge the first demonstration on real held-out drug chemistry of the metric-calibration effect established largely on genetic perturbation. We release a reproducible pipeline wired to the official scorer that emits a valid submission over the real 1064 x 12,995 grid.

19.
arXiv (CS.LG) 2026-06-16

Enhancing Physics-Informed Neural Networks Through Feature Engineering

arXiv:2502.07209v4 Announce Type: replace Abstract: Physics-Informed Neural Networks (PINNs) seek to solve partial differential equations (PDEs) with deep learning. Mainstream approaches that deploy fully-connected multi-layer deep learning architectures require prolonged training to achieve even moderate accuracy, while recent work on feature engineering allows higher accuracy and faster convergence. This paper introduces SAFE-NET, a Single-layered Adaptive Feature Engineering NETwork that achieves orders-of-magnitude lower errors with far fewer parameters than baseline feature engineering methods. SAFE-NET returns to basic ideas in machine learning, using Fourier features, a simplified single hidden layer network architecture, and an effective optimizer that improves the conditioning of the PINN optimization problem. Numerical results show that SAFE-NET converges faster and typically outperforms deeper networks and more complex architectures. It consistently uses fewer parameters – on average, 65% fewer than the competing feature engineering methods – while achieving comparable accuracy in less than 30% of the training epochs. Moreover, each SAFE-NET epoch is 95% faster than those of competing feature engineering approaches. These findings challenge the prevailing belief that modern PINNs effectively learn features in these scientific applications and highlight the efficiency gains possible through feature engineering.

20.
arXiv (CS.LG) 2026-06-12

Feature-preserving Latent-EnKF for Data Assimilation of Flows with Shocks

arXiv:2606.12559v1 Announce Type: cross Abstract: The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to physical state through a shared decoder for all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and Mach 2 shock interaction with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.

21.
bioRxiv (Bioinfo) 2026-06-12

Systematic functional annotation of thousands of BAHD acyltransferases in plant genomes using Protein Language Model and phylogenomic tools

The functional annotation of plant genes lags significantly behind their genomic annotation. Closing this gap requires thorough cataloging of reported protein activities alongside predictive methods that scale beyond sequence-similarity inference. Focusing on the BAHD acyltransferase enzyme family as a model, we assembled FuncZymeDB-BAHD, a large database of 2,705 LLM-retrieved and curated enzyme-acceptor-donor activities covering 336 BAHDs from 156 plant species, a 2-to-6-fold expansion over Swiss-Prot and prior compilations. We further developed FuncPred-OG, which maps queries to orthologous groups and previously characterized enzymes in FuncZymeDB-BAHD, returning hits with high evidence provenance. FuncPred-OG enabled functional prediction of over half of BAHDs across 85 plant proteomes, of which five novel predictions were validated via in vitro assays and recent studies. For the remaining BAHDs without FuncPred-OG annotation, we developed FuncPred-AI, where logistic-regression classifiers trained on protein language model embeddings achieved high Area-Under-the-Precision-Recall-curve (AUPR) scores and correct-hit rates up to 93%. FuncPred-AI yielded >1 probable donor/acceptor annotation for 99.9% (8894/8897) of BAHDs in our pan-plant dataset. Finally, the FuncPred workflow and datasets were deployed on a web portal for broader utilization, potentially reducing experimentalist efforts for selecting candidates from days to minutes. Overall, this framework provides a generalizable template for functional annotation of entire enzyme families.

22.
arXiv (CS.AI) 2026-06-19

Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation

arXiv:2606.19636v1 Announce Type: cross Abstract: Math and science reasoning benchmarks rely on pass@k, the fraction of sampled chains that reach gold, as the canonical per-example difficulty signal. The same signal drives RL with verifiable rewards, math data curation, synthetic curricula, and verifier training. We show this proxy has a persistent blind spot on its hardest stratum: on the eight free-form math cells we test (GSM8K and MATH across four open-weight models), 10.3-22.9% of the examples that no sampling seed solves in six tries are instead solved at matched compute by a six-chain deterministic regime. These are greedy decoding plus five cheap residual-stream perturbations applied via activation grafting, while greedy alone solves at most 6% on these math cells. Recovery scales with the additional budget, across perturbations whose mechanistic distinctness we verify across all twelve cells (cross-kind fix-set Jaccard

23.
arXiv (CS.AI) 2026-06-17

Learning Cardiac Electrophysiology Digital Twins Through Agentic Discovery of Hybrid Structure

arXiv:2606.18154v1 Announce Type: new Abstract: Building personalized cardiac electrophysiology (EP) digital twins requires identifying the appropriate model structure for each patient, not merely fitting parameters. Traditional methods rely on experts to manually prescribe hybrid physics-neural architectures, which requires deep domain expertise and does not transfer across patients. Recent works have applied large language models (LLMs) to generate or act as hybrid models. However, despite their promising generalization capacity, these LLM-based methods lack the structural priors needed for stable cardiac simulations. Hence, we propose LEADS, a framework that formulates cardiac EP domain knowledge as a structured action space and utilizes an LLM agent to discover hybrid models. The agent follows an iterative reasoning-and-action loop to select, combine, and refine hybrid models, whilst gradient descent handles parameter fitting. The proposed LEADS designs every candidate model towards physically grounded, interpretable, and numerically stable, while allowing open-ended architectural discovery. We validate LEADS on synthetic data with three ground-truth reaction models and on real cardiac EP data, demonstrating that it outperforms both human-designed hybrid models and other LLM-based hybrid modeling.

24.
arXiv (CS.CL) 2026-06-11

Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills

Scientific discovery workflows usually contain and rely heavily on lab notes, where researchers record observations, interpret uncertain results, and plan follow-up experiments. Such informative lab notes preserve evolving scientific reasoning and author uncertainty, rather than polished final results exhibited in publications, providing a valuable opportunity for AI to engage in scientific exploration at a more comprehensive and deeper level. However, most prior work on scientific text focuses on papers, protocols, or structured databases, leaving informal laboratory notes underexplored as inputs to AI agents for science. This gap matters because lab notes often intermingle validated observations, tentative judgments, and possible experimental next steps within the same passage. If these signals are conflated, an AI agent may mistake uncertain scientific judgments for confirmed conclusions or executable actions. To this end, we present Notes2Skills, a two-stage framework for turning lab notebooks into verifiable skills for scientific AI agents while preserving the author's certainty. Across seven conditions and three wet-lab sessions, Notes2Skills is the only configuration that neither mistakes uncertain notes for firm instructions nor discards firm ones. We show that certainty preservation is the missing piece between lab notebooks and reliable agent skills, opening a path toward safer AI co-scientist systems.

25.
arXiv (CS.CL) 2026-06-17

Scaling Enterprise Agent Routing: Degradation, Diagnosis, and Recovery

Production LLM assistants route user requests to growing libraries of specialized tools, but how does routing accuracy degrade as the catalog scales? We study single-step routing on a 110-agent, 584-tool catalog from a deployed enterprise productivity assistant, evaluating three frontier models from 10 to 110 agents. Routing F1 on under-specified requests drops 16–23 percentage points across models. An oracle analysis decomposes the degradation into a retrieval gap (the model cannot surface the right tool) and a confusion gap (even with perfect retrieval, the oracle ceiling drops 10pp). Embedding-based shortlisting recovers +10–11pp F1 at full scale across all three models and two providers. A production annotation study (1,435 human-labeled utterances, three annotators) confirms the recovery on real traffic at +10–17pp despite 10–15pp lower absolute performance.