Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis

arXiv:2508.10967v3 Announce Type: replace-cross Abstract: Retrosynthesis prediction aims to infer the reactant molecules based on a given product molecule, which is a fundamental task in chemical synthesis. However, existing methods rely on a static pattern-matching paradigm, which limits their ability to perform effective logical decision-making from chemical data, leading to a black-box process. We propose Retro-Expert, an interpretable retrosynthesis framework that performs collaborative reasoning by combining the complementary strengths of Large Language Models and specialized models via pure reinforcement learning. It outputs natural language explanations grounded in chemical logic through three components: (1) specialized models provide chemical knowledge that is distilled into a high-quality chemical decision space, (2) LLM-driven critical reasoning to generate predictions with an interpretable reasoning path, and (3) knowledge-grounded policy optimization refines the interpretable decision policy. Experiments show that Retro-Expert surpasses both LLM-based and specialized models across different metrics, while generating chemically grounded explanations that enhance chemists' trust in practice. The source code for this paper is available at https://github.com/MagixRab-ll/Retro-Expert.

02.
arXiv (math.PR) 2026-06-12

Diffusion approximations for interacting stochastic systems with reflection and control

arXiv:2601.05895v2 Announce Type: replace Abstract: We study diffusion approximations for a class of interacting stochastic systems with reflection and control. Motivated by interacting stochastic dynamics subject to feedback mechanisms and boundary constraints, we consider diffusion-scaled stochastic processes incorporating stochastic fluctuations, state-dependent interactions, and reflection. Under suitable assumptions, we establish convergence in distribution of the scaled processes to systems of interacting reflected stochastic differential equations of Ornstein-Uhlenbeck type. The limiting dynamics capture key features of constrained multi-agent systems, including mean-reverting behavior, interaction effects, and confinement within bounded domains through Skorokhod reflection. The analysis combines diffusion-scaling arguments, stability estimates, and continuity properties of the Skorokhod map to connect discrete stochastic systems with their reflected diffusion limits. To illustrate the framework, we present numerical examples motivated by crowd dynamics and neural population dynamics. The simulations demonstrate qualitative agreement between the finite stochastic systems and the corresponding reflected diffusion models and illustrate how diffusion approximations can provide tractable descriptions of interacting stochastic systems with constraints.

03.
arXiv (quant-ph) 2026-06-16

Hyperinvariant Spin Network States – An AdS/CFT Model from First Principles

arXiv:2510.06602v2 Announce Type: replace Abstract: We study the existence and limitations of hyperinvariant tensor networks incorporating a local SU(2) symmetry. As discrete implementations of the anti de-Sitter/conformal field theory (AdS/CFT) correspondence, such networks have created bridges between the fields of quantum information theory and quantum gravity. Adding SU(2) symmetry to the tensor network allows a direct connection to spin network states, a basis of the kinematic Hilbert space of loop quantum gravity (LQG). We consider a particular situation where the states can be interpreted as kinematic quantum states for three-dimensional quantum gravity. We show that important aspects of the AdS/CFT correspondence are realized in certain quantum states of the gravitational field in LQG, thus justifying, from first principles, a class of models introduced by [F. Pastawski et al., JHEP 06, 149 (2015)]. We provide examples of hyperinvariant tensor networks, but also prove constraints on their existence in the form of no-go theorems that exclude absolutely maximally entangled states as well as general holographic codes from local SU(2)-invariance. We calculate surface areas as expectation values of the LQG area operator and discuss further possible constraints as a consequence of a decay of correlations on the boundary.

04.
arXiv (CS.AI) 2026-06-17

Dissecting model behavior through agent trajectories

arXiv:2606.17454v1 Announce Type: new Abstract: AI agent performance is not just a modeling problem, it is fundamentally a systems problem. The advanced capabilities of models are realized through agent harnesses. Therefore, a gap between model assumptions and harness behavior can easily prevent the model's full capabilities from translating into agent performance. We formalize this as the `intent-execution' gap: the mismatch between what the model intends and what the harness executes, and vice versa. We argue that minimizing this intent-execution gap is as important as other aspects of harness design such as tools and execution loops. To illustrate the impact of this harness-model alignment, we develop a simple and customizable harness called `Simple Strands Agent' (SSA). SSA aims to find the bulk of common patterns which generalize across different model families (such as Claude, Gemini, GPT, Grok, Qwen), as well as a small number of model-specific preferences. We make two contributions: (i) we $reproduce or improve on the pass@1$ performance reported by diverse model-provider families on popular agentic benchmarks (SWE-Pro, SWE-Verified and Terminal-Bench-2), and (ii) building on an $analysis of 138k trajectories generated by SSA$, we look beyond the $\texttt{pass@1}$ numbers which tend to be relatively even across frontier models. By representing agent trajectories in code state-spaces, we observe model-level differences in problem-solving behavior. Finer-grained metrics such as edit frequency, testing activity, and phase-transitions reveal how individual models allocate effort across different stages of autonomous problem solving.

05.
bioRxiv (Bioinfo) 2026-06-18

Trajectory inference of epithelial-centered neighborhood profiles reconstructs a pseudo-temporal continuum in idiopathic pulmonary fibrosis

Idiopathic pulmonary fibrosis (IPF) is characterized by complex lung architecture and spatially heterogeneous remodeling, which have hindered integrated analysis of cell-intrinsic activity and intercellular communication during disease progression. Here we profiled six IPF lung specimens comprising more than 630,000 cells using the Xenium 5k panel and developed an epithelial-centered neighborhood profiling framework based on the local cellular composition around each epithelial cell. This approach captured fibrosis-associated variation in epithelial niches without requiring predefined histological regions. Pseudo-temporal continuum inference of these profiles reconstructed a continuous axis that reflected the spatial progression of fibrotic remodeling from relatively preserved alveolar regions to fibrotic and airway-like remodeled regions. Within this spatial dataset, we mapped coordinated changes in epithelial states, local microenvironments, epithelial intracellular pathway activities, and directional interactions with neighboring cell types along the same axis. Our findings provide a spatial framework that generates testable hypotheses for progressive epithelial niche remodeling in IPF.

06.
arXiv (CS.LG) 2026-06-18

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

arXiv:2606.18467v1 Announce Type: cross Abstract: Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.

07.
Nature (Science) 2026-06-10

Daily briefing: Ancient ground squirrels ate like ‘zombies of the Pleistocene’

作者:

Evidence from fossilized poo reveals the diverse diet of ancient ground squirrels. Plus, the science behind the peptide craze and our innate tendency to wander anticlockwise. Evidence from fossilized poo reveals the diverse diet of ancient ground squirrels. Plus, the science behind the peptide craze and our innate tendency to wander anticlockwise.

08.
arXiv (CS.LG) 2026-06-16

Self-Supervised Learning of Iterative Solvers for Constrained Optimization

arXiv:2409.08066v3 Announce Type: replace Abstract: The real-time solution of parametric optimization problems is critical for applications that demand high accuracy under tight real-time constraints, such as model predictive control. To this end, this work presents a learning-based iterative solver for constrained optimization, comprising a neural network predictor that generates initial primal-dual solution estimates, followed by a learned iterative solver that refines these estimates to reach high accuracy. We introduce a novel loss function based on Karush-Kuhn-Tucker (KKT) optimality conditions, enabling fully self-supervised training without pre-solved optimizer solutions. Theoretical guarantees ensure that the training loss function attains minima exclusively at KKT points. A convexification procedure enables application to nonconvex problems while preserving these guarantees. Experiments on two nonconvex case studies demonstrate speedups of up to one order of magnitude compared to state-of-the-art solvers such as IPOPT, while achieving orders of magnitude higher accuracy than competing learning-based approaches.

09.
arXiv (CS.LG) 2026-06-12

Physics-Informed Neural Networks for Chemotherapy Pharmacokinetics: Benchmarking the Clinical Estimator and Exposing Parameter Identifiability

arXiv:2606.12658v1 Announce Type: new Abstract: Physics-Informed Neural Networks (PINNs) are an attractive tool for partial-observation problems in biology, where the governing dynamics are known but some compartments cannot be measured. Chemotherapy pharmacokinetics (PK) is a clean instance: drug concentration in plasma is routinely measured, but concentration in tissue – which determines tumour kill and off-target toxicity – is not. We benchmark a PINN against the standard clinical baseline (nonlinear least-squares on the analytical biexponential plasma solution, hereafter NLS) and a physics-agnostic neural baseline (a data-only MLP) on two PK problems. On the linear two-compartment problem, NLS is near-optimal; the PINN matches it to within a small constant factor while also producing the tissue curve in a single training pass, whereas the data-only MLP fails on tissue by roughly 10x. On a Michaelis-Menten extension (saturable elimination), the biexponential closed form no longer exists, so NLS is mis-specified and silently returns meaningless rate constants. The PINN instead exposes a deeper fact: the Michaelis-Menten two-compartment model is non-identifiable from plasma alone, and the PINN reports this honestly by converging to a basin with k12 -> 0. Adding two sparse tissue observations largely resolves identifiability: across five seeds the PINN recovers k21 to within 1% of truth and Vmax, Km to within one standard-deviation bar, while k12 moves in the correct direction (0.02 -> 0.82) but remains ~2 sigma below truth – a recovery the closed-form NLS estimator cannot attempt at all, because its biexponential ansatz describes only plasma. Our claim is not that PINNs beat NLS. It is that PINNs offer a uniform recipe that ties the textbook estimator on the textbook problem, exposes structural identifiability that the textbook estimator hides, and absorbs heterogeneous measurements within a single loss.

10.
arXiv (CS.LG) 2026-06-19

A Hybrid GNN-FEM Framework for Phase-Field Fracture Simulation. Physics-Preserving Hybridization for Generalizable Surrogate Modeling

arXiv:2606.19378v1 Announce Type: new Abstract: Scientific machine learning (SciML) has emerged as a promising approach for accelerating simulations of complex physical systems, yet achieving physically consistent and generalizable predictions for nonlinear, history-dependent problems remains a central challenge. In this study, we propose a hybrid GNN–FEM framework for efficient and generalizable phase-field fracture modeling. While phase-field approaches provide a robust variational framework for simulating complex crack evolution, their high computational cost limits practical applications because they require solving coupled, nonlinear, and history-dependent systems within an incremental finite element procedure. To address this challenge, a graph neural network surrogate is integrated into the conventional staggered scheme, replacing the phase-field update at each load increment while retaining the FEM-based displacement solver to enforce mechanical equilibrium and boundary conditions. By preserving the incremental solution structure, the framework remains consistent with history-dependent fracture evolution without requiring the surrogate to approximate the full solution trajectory. This selective surrogate strategy emphasizes the identification of a physically meaningful and incrementally structured learning target, rather than relying on brute-force data generation to learn the full fracture process. The proposed framework achieves strong generalization across varying geometries, loading conditions, material properties, and discretizations through dimensionless feature design, a graph-based formulation on mesh-based domains, and a physics-informed loss derived from the governing phase-field equation. Numerical experiments demonstrate that the hybrid approach reduces computational cost while maintaining accuracy compared with conventional FEM, and exhibits robust predictive performance across diverse problem settings.

11.
medRxiv (Medicine) 2026-06-15

Pulmonary extracellular vesicles drive alveolar macrophage dysfunction via microRNA transfer in Acute Respiratory Distress Syndrome

Background: Alveolar macrophage (AM) dysfunction contributes to Acute Respiratory Distress Syndrome (ARDS) pathogenesis. We investigated the role of extracellular vesicles (EVs) in mediating this dysfunction. Methods: Pulmonary EVs were isolated from broncho-alveolar lavage and non-directed bronchial lavage samples of ventilated sepsis patients with and without ARDS, and post-operative control patients via ultracentrifugation. AMs were isolated from lung tissue resections of lobectomy patients. AMs were treated with pooled EVs for 24 hours prior to functional, metabolic and autophagy profiling. EV cargo was profiled via small RNA transcriptomics and proteomics. Mechanistic role of EV microRNAs was assessed via mimic / antagomir transfection. Results: Pulmonary EVs from sepsis patients with ARDS impaired AM efferocytosis, and control EVs had no effect. ARDS EV treatment enhanced AM mitochondrial-linked respiration, but not glycolysis. ARDS EV treatment impaired LC3B-II and LAMP1 expression, indicating dysregulated AM autophagy-lysosomal machinery. Proteomics revealed downregulation of innate immune pathways in ARDS EVs. Transcriptomics revealed enrichment of 24 microRNAs in ARDS EVs; miR-652-3p was the most enriched, validated by RT-qPCR. EV miR-652-3p was associated with 90-day mortality (9.20 vs 0.59 RQ, p=0.0295) and inversely correlated with oxygenation (PaO2/FiO2). AM transfection with miR-652-3p mimic induced similar dysregulation of function and autophagy as ARDS EVs. Transfection of ARDS EVs with antagomirs to miR-652-3p prior to AM treatment partially rescued efferocytosis and autophagy. Conclusions: Targeting EV miR-652-3p may restore alveolar macrophage function and reduce excessive inflammation, thus offering a novel therapeutic strategy for patients with ARDS.

12.
arXiv (math.PR) 2026-06-18

Formation of clusters and coarsening in weakly interacting diffusions

arXiv:2510.17629v3 Announce Type: replace-cross Abstract: This paper studies the clustering behavior of weakly interacting diffusions under the influence of sufficiently localized attractive interaction potentials on the one-dimensional torus. We describe how this clustering behavior is closely related to the presence of discontinuous phase transitions in the mean-field PDE. For local attractive interactions, we employ a new variant of the strict Riesz rearrangement inequality to prove that all global minimizers of the free energy are either uniform or single-cluster states, in the sense that they are symmetrically decreasing. We analyze different timescales for the particle system and the mean-field (McKean-Vlasov) PDE, arguing that while the particle system can exhibit coarsening by both coalescence and diffusive mass exchange between clusters, the clusters in the mean-field PDE are unable to move and coarsening occurs via the mass exchange of clusters. By introducing a new model for this mass exchange, we argue that the PDE exhibits dynamical metastability. We conclude by presenting careful numerical experiments that demonstrate the validity of our model.

13.
arXiv (CS.AI) 2026-06-18

Beyond Similarity: Temporal Operator Attention for Time Series Analysis

arXiv:2605.11287v2 Announce Type: replace-cross Abstract: A persistent paradox in time-series forecasting is that structurally simple MLP and linear models often outperform high-capacity Transformers. We argue that this gap arises from a mismatch in the sequence-modeling primitive: while many time-series dynamics are governed by global temporal operators (e.g., filtering and harmonic structure), standard attention forms each output as a convex combination of inputs. This restricts its ability to represent signed and oscillatory transformations that are fundamental to temporal signal processing. We formalize this limitation as a simplex-constrained mixing bottleneck in softmax attention, which becomes especially restrictive for operator-driven time-series tasks. To address this, we propose $Temporal Operator Attention (TOA)$, a framework that augments attention with explicit, learnable sequence-space operators, enabling direct signed mixing across time while preserving input-dependent adaptivity. To make dense $N \times N$ operators practical, we introduce Stochastic Operator Regularization, a high-variance dropout mechanism that stabilizes training and prevents trivial memorization. Across forecasting, anomaly detection, and classification benchmarks, TOA consistently improves performance when integrated into standard backbones such as PatchTST and iTransformer, with particularly strong gains in reconstruction-heavy tasks. These results suggest that explicit operator learning is a key ingredient for effective time-series modeling.

14.
arXiv (CS.CL) 2026-06-11

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use, but the energy cost of CPU inference is a major barrier. We present what is, to our knowledge, the first end-to-end RAG pipeline that runs all neural stages – embedding, reranking, and LLM generation – on the Qualcomm Hexagon NPU of the Snapdragon X Elite. Profiling on a Dell XPS 13 laptop, we compare NPU-accelerated RAG against CPU and OpenCL/Adreno GPU baselines on indexing and query workloads. On indexing, the NPU achieves 9.1x higher embedding throughput and 12.3x less system energy. On a 120-query Wikipedia-passage benchmark, it delivers 18.1x faster LLM prefilling, 4.0x lower end-to-end query latency, and 4.0x less system energy than the CPU baseline; the same workload on the integrated GPU is 1.7x slower than CPU and uses 6.5x more energy than the NPU. A GPT-4.1 LLM-as-judge evaluation finds NPU answer quality on par with CPU and GPU within evaluator noise (mean 9.32 vs. 8.95 vs. 9.03 on a 1-10 rubric), with 86.7% of queries scoring identically across all three backends. On the Snapdragon X Elite / Hexagon class of laptop SoC, the NPU thus enables practical, energy-efficient on-device RAG without quality regression – a sustainable path toward green edge intelligence that we expect to generalize to comparable mobile NPUs (Apple Neural Engine, Intel NPU, MediaTek APU) as their software stacks mature.

15.
arXiv (CS.LG) 2026-06-19

Physics-Informed Neural Network with Squeeze-Excitation-like Attention

arXiv:2606.19853v1 Announce Type: new Abstract: We introduce SEA-PINN, a novel architecture that incorporates a Squeeze-Excitation-like attention mechanism into physics-informed neural networks to dynamically recalibrate the importance of neurons across layers. A key feature of SEA-PINN is its highly stable initialization. On 17 out of 20 benchmark problems, SEA-PINN exhibit nearly negligible variance and significantly reduced initial loss, establishing a quasi-deterministic and favorable starting point for optimization. Notably, without employing Fourier feature embeddings or periodic activation functions, SEA-PINN attained competitive accuracy (83\% vs. 90\% improvement relative to FNN-PINN on the high-frequency case 7) as compared with TSA-PINN-a model specifically engineered for high-frequency problems via learnable frequencies in sinusoidal activations. Furthermore, integrating SEA-PINN into TSA-PINN boosted performance by 42.49\%. These results underscore SEA-PINN as a lightweight plug-in module that enhances nonlinear representation power, promotes more robust and efficient convergence, and strengthens the overall reliability of physics-informed learning.

16.
arXiv (CS.CV) 2026-06-11

Continual Learning with Support Boundary Experience Blending

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated via differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, 2%, respectively.

17.
medRxiv (Medicine) 2026-06-15

Non-invasive intracranial pressure waveform reconstruction with deep learning

Purpose: Continuous intracranial pressure (ICP) monitoring requires invasive instrumentation, reaching only a narrow subset of critically ill patients. We tested whether deep learning models trained on routinely acquired extracranial signals can reconstruct continuous ICP waveforms at clinically relevant accuracy in an independent external cohort. Methods: In adults admitted to the ICU at a single quaternary health system, five deep learning architectures were trained on high-frequency arterial blood pressure (ABP), photoplethysmography (PPG), and electrocardiography (ECG) waveforms, using invasive (intraparenchymal) ICP as ground truth. Two fusion strategies (early and late) and three training objectives (waveform-morphology, baseline robust regression, and weighted robust regression) were evaluated. Models were externally validated on the held-out MIMIC-III Waveform Database. Performance was assessed by mean absolute error (MAE) and waveform similarity by Pearson correlation (r). Results: We analyzed data from 158 critically ill adults (~5,322 hours) across two quaternary health systems (Johns Hopkins Hospital, Baltimore; Beth Israel Deaconess Medical Center, Boston). Validation MAE ranged from 4.276 mmHg [95% CI 4.269, 4.283] (gated recurrent, late fusion) to 4.946 mmHg [95% CI 4.938, 4.956] (attention-based, early fusion), with Pearson r ranging from 0.599 [95% CI 0.599, 0.600] to 0.722 [95% CI 0.722, 0.723]. The multiscale encoder-decoder model demonstrated the most favorable MAE-correlation tradeoff. Conclusion: This is the first demonstration that continuous ICP waveform reconstruction from bedside signals generalizes across institutions at clinically relevant accuracy, establishing a foundation for non-invasive ICP monitoring and motivating validation across broader populations and ICP ranges.

18.
medRxiv (Medicine) 2026-06-16

Wildfire pollution exposure during childhood adversely affects cognitive and neural development

Air pollution has well-documented negative cardiovascular and respiratory consequences. However, the impact of particulate matter pollution (PM2.5) on brain development is unclear. Animal studies suggest that exposure to early-life PM2.5 can cause adverse neurodevelopmental outcomes, but in vivo human work has been hampered by cross-sectional designs and heavily confounded PM2.5 exposure measures. Here we use an innovative natural experimental design to isolate the effects of wildfire pollution on neurocognitive development in a large cohort of children (N>9000, 4 waves, age 9-16). Doing so, we find that greater wildfire PM2.5 exposure is robustly associated with slower brain development and shallower cognitive improvement across early adolescence. Our study underscores the urgent public health concern that wildfire PM2.5 poses for childhood development.

19.
arXiv (CS.CL) 2026-06-16

Understanding the Behaviors of Environment-aware Information Retrieval

Recent retrieval-augmented generation (RAG) approaches have demonstrated strong capability in handling complex queries, yet current research overlooks a critical challenge: different retrievers require fundamentally different query formulation strategies for optimal performance. In this work, we present the first systematic analysis of how LLMs can learn to adapt their query formulation strategies for different retrievers via reinforcement learning (RL). Our empirical study reveals that RL effectively teaches an LLM to tailor its queries to specific retriever characteristics. We discover that different retrievers exhibit surprisingly distinct optimal query styles (e.g., descriptive vs. question-like), suggesting strategies learned for one retriever ineffective for another. We further show that performance can be enhanced by incorporating retriever-specific human guidance and by scaling model size. To facilitate learning over multi-retrieval-step trajectories, we introduce a branching-based rollout technique that improves training stability. Our work provides the first empirical evidence and actionable insights for building truly retriever-aware RAG systems. Code and resources are available at https://github.com/LCO-Embedding/Envs-aware-Information-Retrieval.

20.
arXiv (CS.LG) 2026-06-11

TacCoRL: Integrating Tactile Feedback into VLA via Simulation

arXiv:2606.11743v1 Announce Type: cross Abstract: Vision-language-action (VLA) models provide strong visual, language, and action priors for robot manipulation, but visual observations alone often miss the local contact state required for contact-rich tasks. We present TacCoRL, a scalable framework that injects Tactile feedback into VLA policies and improves them through sim-real Co-training and simulation-based reinforcement learning (RL), without requiring large-scale tactile pretraining or extensive real-world contact exploration. The key idea is not only adding touch as an input, but learning how contact readings should modulate action responses in near-failure states that are rare in demonstrations and risky to collect on hardware. We use a real-aligned simulator as a closed-loop training environment for contact interaction. Mixed simulated and real trajectories first warm-start tactile-conditioned actions in the pretrained policy. Reinforcement learning with verifiable task rewards then optimizes the policy using simulated contact rollouts. It reinforces tactile-conditioned actions that lead to task completion, while a supervised objective on real trajectories keeps the refined policy anchored to deployment visual, tactile, and action distributions. The resulting policy transfers directly to the real robot without privileged simulation state or online real-world RL. Across four bimanual contact-rich tasks, the final visuo-tactile policy achieves an average success rate of 72.5%, compared to baseline of 50.0%. Result videos and more details are available at https://tac-corl.github.io/

21.
bioRxiv (Bioinfo) 2026-06-18

Elucidating the Design Space of Generative Models for Single-Cell Perturbation Prediction

Next-token prediction has produced predictable scaling in language, but the recipe presumes a sequence of tokens with a meaningful order. Single-cell RNA-seq counts have no natural gene ordering, so applying the recipe directly to raw expression fails under an ill-suited left-to-right bias. We instead ask whether a learned latent can supply the structure the recipe needs. We introduce texttt{ExpressionVAE} (eVAE), a discrete-latent perturbation model that compresses each cell into a short sequence of discrete codes through a finite-scalar-quantization (FSQ) bottleneck and trains a perturbation-conditioned discrete prior over those codes. On Replogle and Parse~1M, eVAE sets a new state of the art on every distributional metric and leads on most cell-eval perturbation metrics, with Fr'echet distance and $mathrm{MMD}^2$ roughly $3$ to $20times$ lower than the strongest continuous-latent baseline. Swapping the prior between autoregressive and masked discrete diffusion leaves performance near-identical, isolating the gain to the discrete latent itself rather than the prior family. A decoder-head ablation then exposes a single design axis, the richness of the predictive distribution at inference, that splits the standard metrics into two groups, variance-sensitive and mean-sensitive, which move in opposite directions along the axis. Finally, on a held-out CRISPRi reversion benchmark of $1{,}732$ perturbations under inflammatory cytokine stress, the frozen eVAE encoder outperforms UMAP and differential expression and matches scGPT on perturbation ranking at a fraction of the data.

22.
arXiv (math.PR) 2026-06-15

Stability of the $k$-Plane Transform on Measures and Hölder-Type Comparisons of Wasserstein Metrics

arXiv:2605.00375v2 Announce Type: replace-cross Abstract: We establish stability estimates for the $k$-plane transform on finite positive Radon measures, with emphasis on Fourier and Wasserstein metrics. We first introduce a metric on $k$-plane transform data and prove a bi-Lipschitz stability estimate showing that this metric is equivalent to a generalized Fourier metric obtained by augmenting the Fourier distance between centered normalized measures with separate barycenter and total mass difference terms. Building on a Hölder-type comparison between Fourier and Wasserstein metrics due to Carrillo and Toscani, we extend this comparison to positive Radon measures under uniform bounds on centered moments of order slightly larger than $2$. This yields Hölder-type stability for the $k$-plane transform in a generalized $2$-Wasserstein metric and, in particular, a $W_2$-stability estimate for centered probability measures. We also compare the $2$-Wasserstein distance with its max-sliced analogue. For centered probability measures with uniformly bounded moments of order slightly larger than $2$, we prove a two-sided Hölder-type comparison between these distances. We then extend the result to positive Radon measures by applying it to centered normalized measures and adding separate barycenter and mass terms. Finally, for absolutely continuous compactly supported probability measures with bounded densities, we prove a strong equivalence between the $2$-Wasserstein distance of the measures and the $(k/2-1)$-order Sobolev norm of the $k$-plane transform data of the difference of their densities.

23.
arXiv (CS.LG) 2026-06-12

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

arXiv:2601.09693v3 Announce Type: replace Abstract: Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

24.
arXiv (CS.CV) 2026-06-19

DeepForestVisionV2: Ecology-Driven Taxonomy Expansion for Camera-Trap Monitoring in African Tropical Forests

Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline workflow for both photographs and videos, and previous work showed that it outperformed other available baselines on a comparable benchmark. However, it was designed for closed-canopy, ground-level forest interiors and uses a 35-class prediction space that becomes too coarse when deployments encounter arboreal primates, birds, semi-aquatic taxa, or human-associated confounders such as livestock. We present DeepForestVisionV2, an ecology-driven expansion from 35 to 64 prediction classes (61 animal classes plus human, vehicle, and blank) designed to address three recurrent deployment gradients: vertical stratification, scene openness, and anthropogenic interfaces. DeepForestVisionV2 retains the same offline workflow and is trained on 1,535,010 photographs and 243,354 videos from multi-country African tropical-forest projects. Evaluation combines a cross-country cropped-photo validation set, used to assess robustness across sites and camera-trap settings, with three held-out Uganda video benchmarks spanning the targeted gradients. On the validation set, DeepForestVisionV2 reaches 0.86 accuracy, 0.82 macro-F1, and 0.81 balanced accuracy. On the deployment benchmarks, it preserves or improves baseline accuracy despite its harder classification task, while increasing the number of identified taxa from 22 to 29 in forest-interior videos and from 4 to 9 at riverbanks. In the park-edge use case, it raises accuracy from 0.62 to 0.86 and reduces false alarms from 11 to 0. These results show that DeepForestVisionV2 materially improves field utility while preserving robustness across sites, habitats, and camera-trap settings.

25.
arXiv (CS.CV) 2026-06-12

CD-RCM: Generalizable Continuous-Depth Novel View Synthesis for Reflectance Confocal Microscopy

Reflectance confocal microscopy (RCM) provides noninvasive, cellular-resolution "optical biopsies" of human skin in vivo by acquiring en-face images at successive depths, forming a sparse z-stack. Due to optical limitations, these stacks are anisotropic 3D volumes with lateral resolution (0.5 $\mu$m) $\sim$6 times higher compared to axial resolution, which is defined by the optical sectioning (3 $\mu$m), limiting the interpretation of tissue. Our goal is to provide continuous-depth visualization by interpolating intermediate sections and making the 3D volume isotropic. Such a representation permits arbitrary-direction sectioning, including histopathology-like cross-sectional examination, without requiring per-patient optimization. To that end, we introduce the first RCM-specific novel-view synthesis (NVS) approach, CD-RCM, a feedforward model that predicts realistic, unseen depths from sparsely sampled RCM stacks. Classical neural rendering methods focus on reconstruction from surface-level multi-view observations. In contrast to surface-level camera views, RCM can acquire optically sectioned en-face images of tissue beyond the surface up to 200 $\mu$m. However, during visualization of the RCM stacks, observations of the shallower sections (towards the surface) obscure the deeper ones. This unique axial imaging geometry and layer-dependent anatomical organization motivated our development of a tailored architectural and training framework that explicitly accounts for RCM's depth-resolved, occlusive imaging physics. Experiments demonstrate that CD-RCM achieves high-fidelity novel-view synthesis with sub-second inference time.