Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
bioRxiv (Bioinfo) 2026-06-12

From Proteome Mining to Structural Validation: Phosphopyruvate Hydratase as a Structurally Tractable Drug Target in Kinetoplastid Parasites

Chagas disease, caused by Trypanosoma cruzi, demands novel therapeutic strategies that overcome the toxicity and limited efficacy of current treatments. To address this need, herein we report an integrative, target-centric strategy that combines parasite proteome mining, structural modeling, and experimental validation. Functional enrichment and druggability analyses identified phosphopyruvate hydratase (PPH) as a promising candidate due to its essential metabolic role and limited similarity to human homologs. Notably, proteome mining revealed the presence and conservation of PPH across kinetoplastid parasites, including Leishmania donovani, supporting its evaluation beyond T. cruzi. For the selected PPH sequences, AlphaFold-derived three-dimensional models underwent extensive molecular dynamics refinement, yielding stable conformational ensembles suitable for structure-based studies. Using this validated model, virtual screening of the Latin American Natural Products Database - LANaPDB - identified aptosimon as a top-ranked compound candidate. Molecular dynamics simulations further showed ligand-dependent binding behavior, suggesting alternative binding modes distinct from the canonical substrate configuration. In vitro assays demonstrated consistent antiparasitic activity against intracellular T. cruzi amastigotes (IC50 = 3.52 ug/mL) and Leishmania donovani promastigotes (IC50 = 13.06 ug/mL), supporting the biological relevance of the aptosimon-related lignan chemotype, hinokinin, across two kinetoplastid parasite models. Together, these results support PPH as a structurally tractable and biologically relevant candidate target, while identifying an aptosimon-related lignan chemotype, represented experimentally by hinokinin, as a cross-species antiparasitic scaffold that warrants further biochemical target-validation studies.

02.
arXiv (CS.LG) 2026-06-19

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

arXiv:2606.20174v1 Announce Type: new Abstract: Cell-free DNA (cfDNA) is a promising avenue for non-invasive multicancer early detection (MCED), in that, it can enable multiple cancer detection simultaneously from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomics and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges into technical, computational, and methodological while outlining open problems in the field. This review shows that multimodal ensemble approaches have the strongest promise for clinical integration and the highest readiness. However, for better assessment of future work and side-by-side comparison, standardization of evaluation protocols and reporting results will be crucial.

03.
arXiv (CS.LG) 2026-06-11

Tensor Methods: A Unified and Interpretable Approach for Material Design

arXiv:2602.10392v2 Announce Type: replace Abstract: When designing new materials, it is often necessary to tailor the material design to have some desired properties. As the set of design parameters grow, the search space grows exponentially, making the actual synthesis and evaluation of all material combinations virtually impossible. Even using traditional computational methods such as Finite Element Analysis becomes too computationally heavy to search the design space. Recent methods use machine learning (ML) surrogate models to more efficiently determine optimal material designs; unfortunately, these methods often (i) are notoriously difficult to interpret and (ii) under perform when the training data comes from a non-uniform sampling of the design space. We suggest the use of tensor completion methods as an all-in-one approach for interpretability and predictions. We observe classical tensor methods are able to compete with traditional ML in predictions, with the added benefit of their interpretable tensor factors (which are given completely for free, as a result of the prediction). In our experiments, we are able to rediscover physical phenomena via the tensor factors, indicating that our predictions are aligned with the true underlying physics of the problem. This also means these tensor factors could be used by experimentalists to identify potentially novel patterns, given we are able to rediscover existing ones. We also study the effects of both types of surrogate models when we encounter training data from a non-uniform sampling of the design space. We observe more specialized tensor methods that can give better generalization in these non-uniforms sampling scenarios. We find the best generalization comes from a tensor model, which is able to improve upon the baseline ML methods by up to 5% on aggregate $R^2$, and halve the error in some out of distribution regions.

04.
arXiv (CS.LG) 2026-06-16

When Does q-error Predict Plan Regret? Three Regimes of Cardinality-Estimation Error

arXiv:2606.15600v1 Announce Type: cross Abstract: Cardinality-estimation (CE) research ranks estimators by q-error, yet it is well known that q-error is an imperfect proxy for query-plan quality. We give a measurement-driven account of when it is a good proxy and when it is not, and why. Modeling plan selection as an argmin over a piecewise-linear cost landscape, we find that plan regret (the cost of the chosen plan relative to the optimal, under true cardinalities) is governed by plan-cost geometry in a regime-dependent way. (i) For small errors, a true-point condition number kappa predicts regret and out-predicts q-error; its predictive power decays to zero as error grows, as a local linearization must. (ii) For large errors – where deployed learned estimators operate – an estimator-independent average-case sub-optimality measure ACS-infinity predicts which queries are regret-prone (Spearman rho ~ 0.54 on STATS-CEB), while q-error is nearly uninformative at the query level (rho ~ 0.05). (iii) The worst case is Haritsa's maximum sub-optimality (MSO). The three are one cost-ratio spectrum under three weightings. We prove a limit law ACS-infinity = sum_k r_k pi_k with cardinality-independent combinatorial weights, and validate every claim on STATS-CEB and JOB-light with four released estimators under pre-registered decision rules, and confirm on real PostgreSQL runtime that ACS-infinity predicts regret where q-error does not. The contribution is conceptual and empirical – an average-case companion to worst-case robust query optimization, and a characterization of when an accuracy metric tracks plan quality – rather than a new estimator. Code and the full pre-registration are public.

05.
arXiv (quant-ph) 2026-06-12

The Pound-Drever-Hall Method for Superconducting-Qubit Readout

arXiv:2512.03138v3 Announce Type: replace Abstract: Scaling quantum computers to large sizes requires the implementation of many parallel qubit readouts. Here we present an ultrastable superconducting-qubit readout method using the multi-tone self-phase-referenced Pound-Drever-Hall (PDH) technique, originally developed for use with optical cavities. In this work, we benchmark PDH readout of a single transmon qubit, using room-temperature heterodyne detection of all tones to reconstruct the PDH signal. We demonstrate that PDH qubit readout is insensitive to microwave phase drift, displaying $0.73^\circ$ phase stability over 2 hours, and capable of single-shot readout in the presence of phase errors exceeding the phase shift induced by the qubit state. We show that the PDH sideband tones do not cause unwanted measurement-induced state transitions for a transmon qubit, leading to a potential signal enhancement of at least $14$~dB.

06.
arXiv (CS.CV) 2026-06-12

RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation

Effective visuo-tactile integration is critical for robotic dexterous manipulation, especially when visual observations are unreliable or occluded. However, robustly aligning sparse, heterogeneous tactile measurements with dense visual representations remains a fundamental challenge. Most existing approaches require policies to learn cross-modal correspondences implicitly from limited demonstrations, without leveraging geometric priors. As a result, they are often data-inefficient and generalize poorly when visual observations are degraded. To address this limitation, we propose a framework that explicitly grounds physical contacts in the image domain. Using robot forward kinematics and camera calibration, we project tactile sensor locations directly onto the RGB image plane. We then render force-modulated Gaussian saliency maps to model spatial uncertainty arising from kinematic and calibration errors. By integrating these 2D spatial anchors through a zero-initialized conditioning architecture, our method injects physical contact priors into standard visual backbones while preserving pre-trained visual representations. We evaluate our method on six dexterous manipulation tasks in both simulation and the real world under severe visual occlusions. Real-world experiments show that explicit RGB-S grounding in the image domain improves real-world occluded manipulation success rates by $26.7$ percentage points over the strongest implicit visuo-tactile baseline, suggesting its improved spatial reasoning and robustness to occlusion. Project page: touch-as-saliency.github.io

07.
arXiv (CS.CL) 2026-06-12

When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on mMARCO that systematically evaluates retrieval performance by varying the mixing proportion of parallel query translations via embedding-level mixing – constructing mixed queries as an interpolation of monolingual embeddings. Experiments with BGE-M3 demonstrate that an optimal mixing ratio outperforms the best monolingual endpoint in 88/105 cases. We uncover a distinct asymmetry driven by English dominance: mixing is uniformly beneficial when retrieving from non-English document indices, whereas indices containing English are best served by pure English queries. Furthermore, English acts as the strongest mixing partner for every non-English document language. Finally, when controlling for English dominance, mixing gains correlate negatively with typological distance. We conclude that language-mix sensitivity is structured and predictable, and we validate the robustness of these patterns across model families and scales.

08.
arXiv (CS.LG) 2026-06-18

Investigating Faithfulness in Large Audio Language Models

arXiv:2509.22363v4 Announce Type: replace Abstract: Large Audio Language Models (LALMs) integrate audio encoders with pretrained Large Language Models to perform complex multimodal reasoning tasks. While these models can generate Chain-of-Thought (CoT) explanations, the faithfulness of these reasoning chains remains unclear. In this work, we propose a systematic framework to evaluate CoT faithfulness in LALMs with respect to both the input audio and the final model prediction. We define three criteria for audio faithfulness: hallucination-free, holistic, and attentive listening. We also introduce a benchmark based on both audio and CoT interventions to assess faithfulness\footnote{The benchmarking interface and evaluation results are available at https://poonehmousavi.github.io/faithfulness/. Experiments on Audio Flamingo 3 and Qwen2.5-Omni suggest a potential multimodal disconnect: reasoning often aligns with the final prediction but is not always strongly grounded in the audio and can be vulnerable to hallucinations or adversarial perturbations.

09.
arXiv (CS.CV) 2026-06-16

When Confidence Lacks Concepts: Interpretable OOD Detection via Representation Perturbations

Deep neural networks have achieved remarkable performance across medical imaging tasks, yet their tendency to overgeneralize under distributional shifts poses a major obstacle to safe clinical deployment. Out-of-Distribution (OOD) detection methods aim to mitigate this risk, but most existing approaches rely on opaque internal signals with poorly understood semantic meaning, limiting trust in safety-critical settings. In this work, we propose an interpretable OOD detection framework that probes the stability of model predictions under class-conditioned semantic perturbations. Leveraging sparse autoencoders (SAEs), we learn class-specific concept vectors from in-distribution data that disentangle dense intermediate representations into sparse, semantically meaningful components. At inference, we perturb deeper-layer representations using the concept vectors associated with the model's predicted class and measure the class logits stability. We hypothesize that in-distribution samples exhibit low sensitivity to such perturbations, as their representations align with class-specific semantic directions, whereas OOD samples show amplified deviations due to representational misalignment. By framing OOD detection as a concept conditioned stability analysis, our approach provides both a discriminative OOD signal and an interpretable lens into the internal mechanisms driving model uncertainty, making it particularly suitable for high stakes medical applications.

10.
arXiv (CS.AI) 2026-06-12

From Verdict to Process: Agentic Reinforcement Learning for Multi-Stage Fact Verification

arXiv:2606.13262v1 Announce Type: new Abstract: Recent approaches combining Large Language Models (LLMs) with retrieval-augmented reasoning have shown promise for automated fact verification. To process complex claims, these verification pipelines typically execute multi-stage workflows that coordinate tightly coupled modules, including claim decomposition, evidence gathering, and verdict prediction. However, existing methods optimize individual stages in isolation or rely on fixed heuristics, which limits adaptive coordination among stages and can lead to suboptimal outcomes. In this work, we propose ProFact, an agentic reinforcement learning framework for end-to-end optimization of multi-stage fact verification trajectories. ProFact trains a unified policy to coordinate claim decomposition, evidence seeking, answer generation, and verdict prediction. To address the sparse and delayed supervision provided by final veracity labels, ProFact introduces process-aware rewards that provide stage-level learning signals throughout the verification process. Empirical evaluation shows that ProFact consistently outperforms strong baselines in both verification performance and inference efficiency. These results highlight the effectiveness of process-aware trajectory optimization for multi-stage fact verification.

11.
arXiv (math.PR) 2026-06-16

Stochastic control with dividend payments and capital injections for Markov additive processes

作者:

arXiv:2604.00190v4 Announce Type: replace Abstract: Motivated by de Finetti's optimal dividend problem with capital injections, we study a stochastic control problem for the additive component of a Markov additive process (MAP). In contrast to previous studies, the modulating component is allowed to be a general right process on a Radon space, so the model is not restricted to finite-state regime switching and cannot in general be reduced to a finite collection of Lévy process control problems. Capital injections are allowed at arbitrary times. We first consider the case in which dividend payments are allowed only at prescribed discrete times and establish necessary and sufficient conditions for the optimality of a strategy. These conditions then yield the optimality of a class of Markov-modulated periodic–classical barrier strategies. Combining this optimality result with an approximation argument, we obtain insight into the possible form of optimal strategies in the case where dividend payments, like capital injections, may be made at arbitrary times. Because of the generality of the MAPs considered here, the proof techniques used in previous studies of similar problems are not directly applicable. We therefore develop an alternative argument based on the additive structure of MAPs and dynamic programming between dividend opportunities. The argument also suggests a possible approach to other stochastic control problems involving general MAPs.

12.
arXiv (CS.CL) 2026-06-19

Quality Over Clicks: Iterative Reinforcement Learning for Early-Stage E-Commerce Query Suggestion

Existing dialogue systems rely on query suggestion to enhance user engagement. Recent approaches mainly optimize generative models using click-through rate (CTR) models to align with user preferences. However, these methods are less effective in early-stage deployment scenarios, where click feedback is sparse and insufficient for training a reliable CTR model. To bridge this gap, we propose QualEQS, a quality-first iterative reinforcement learning framework for e-commerce query suggestion. We formalize actionable suggestion quality along three dimensions that directly affect downstream usability: answerability, factuality, and information gain. To continuously improve from online traffic without click supervision, we further propose group-level disagreement among candidate suggestions to identify ambiguous query contexts and mine hard training cases for iterative refinement. We also introduce EQS-Benchmark, a dataset of 16,949 real-world e-commerce queries for offline training and evaluation. Experiments show that our quality-based offline metrics correlate strongly with online performance, providing a practical evaluation recipe for sparse-feedback deployment. In both offline and online settings, QualEQS consistently outperforms strong baselines, yielding a 6.81% improvement in online ChatPV in a real-world enterprise-level conversational shopping assistant system.

13.
arXiv (quant-ph) 2026-06-12

Geometric Algebra Quantum Gate Decomposition

arXiv:2606.12480v1 Announce Type: new Abstract: Quantum gates are usually described through matrix and tensor-product formalisms that often obscure their geometric structure. In this work, we formulate the Pauli and Clifford groups within the complex Geometric Algebra (GA) framework. We show that the Pauli group is naturally identified with the group of blades up to a global phase, thereby providing a geometric interpretation of Pauli operators and their commutation relations in terms of oriented subspaces. We further prove that Clifford operators are generated by products of {\pi}/4-Pauli rotors and introduce a greedy Pauli rotor decomposition algorithm whose empirical behavior suggests unexpectedly compact decompositions for Clifford operators. Finally, we show that Clifford+T universality admits a natural geometric interpretation through {\pi}/8-rotors within this framework.

14.
arXiv (quant-ph) 2026-06-17

Quantum mechanics in configuration space in context

arXiv:2606.17622v1 Announce Type: new Abstract: To enhance the way in which wave-particle duality is implemented in the modelling of quantum mechanical systems, Bukhari et al. [New J. Phys. 27, 084501 (2025)] recently introduced an alternative approach to quantum mechanics, namely quantum mechanics in configuration space. This formalism is based on a physically motivated quantisation of Newtonian mechanics and promotes the classical position-velocity states (x,v) to pairwise distinguishable quantum states. The resulting |x,v> states form the basis of the Hilbert space of individual quantum mechanical particles and evolve along classical trajectories. In this paper, we consider the modelling of a mechanical particle in free space and put quantum mechanics in configuration space into context. It is shown that this formalism increases the continuity between quantum and classical mechanics by avoiding a conceptual inconsistency associated with the definition of momentum in canonical quantisation. In addition, we emphasise that standard quantum mechanics and quantum mechanics in configuration space are based on two distinct formulations of classical mechanics.

15.
arXiv (CS.CV) 2026-06-18

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

Human-centric video customization, particularly at the garment level, has shown significant commercial value. However, existing approaches cannot support low-latency and interactive garment control, which is crucial for applications such as e-commerce and content creation. This paper studies how to achieve interactive multi-garment video customization while preserving motion coherence using only single-garment video data. We present FashionChameleon, a real-time and interactive framework for human-garment customization in autoregressive video generation, where users can interactively switch garment during generation. FashionChameleon consists of three key techniques: (i) Instead of training on multi-garment video data, we train a Teacher Model with In-Context Learning on a single reference-garment pair. By retaining the image-to-video training paradigm while enforcing a mismatch between the reference and garment image, the model is encouraged to implicitly preserve coherence during single-garment switching. (ii) To achieve consistency and efficiency during generation, we introduce Streaming Distillation with In-Context Learning, which fine-tunes the model with in-context teacher forcing and improves extrapolation consistency via gradient-reweighted distribution matching distillation. (iii) To extend the model for interactive multi-garment video customization, we propose Training-Free KV Cache Rescheduling, which includes garment KV refresh, historical KV withdraw, and reference KV disentangle to achieve garment switching while preserving motion coherence. Our FashionChameleon uniquely supports interactive customization and consistent long-video extrapolation, while achieving real-time generation at 23.8 FPS on a single GPU, 30-180$\times$ faster than existing baselines.

16.
medRxiv (Medicine) 2026-06-17

Preserved Medial Temporal Lobe Flexibility Predicts Memory Generalization Only in the Context of Good Sleep Quality among Older African Americans

Objectives: Poor sleep quality is a risk factor for Alzheimer's disease (AD). Older African Americans experience disproportionately high rates of sleep disturbance and AD. Medial temporal lobe (MTL) flexibility reflects dynamic neural reorganization and may be a marker of generalization performance. This study examined whether sleep quality moderates the association between MTL flexibility and memory generalization. Methods: Fifty older African Americans (MeanAge=69.7{+/-}6.21 years; 80% women) underwent rs-fMRI to quantify MTL flexibility, Rutgers Acquired Equivalence Task for memory generalization, and Pittsburgh Sleep Quality Index for sleep quality. Results: Greater MTL flexibility was associated with better generalization (r=0.367, p=.017). Good sleepers showed higher MTL flexibility (F(1,44)=8.11, p2=.156, p=.007) and superior generalization (F(1,46)= 12.33, p2=.211, p=.001). Sleep quality significantly moderated the MTL flexibility and generalization relationship ({beta}=-1.519, p=.012). Conclusions: Preserved MTL flexibility may confer generalization only in good sleepers, suggesting that sleep disturbance may disrupt the MTL neural resilience among older African Americans.

17.
arXiv (math.PR) 2026-06-12

Explosion and non-explosion in pure birth Crump–Mode–Jagers branching processes

arXiv:2601.06850v2 Announce Type: replace Abstract: In this short note, we provide an explicit sufficient condition for non-explosion of Crump–Mode–Jagers branching processes with pure birth reproduction. It shows that the standard sufficient condition for explosion, namely the convergence of the series of reciprocals of the birth rates, is – at least for rate sequences without excessive oscillations – remarkably close to being necessary. At the same time, it is not necessary in full generality: we construct a counterexample which also yields a general preferential attachment tree without fitness with an infinite path and no vertices of infinite degree, thereby answering an open question previously raised in the literature.

18.
arXiv (math.PR) 2026-06-12

On McDiarmid's Inequality under Dependence via Approximate Tensorization of Entropy

arXiv:2606.12720v1 Announce Type: new Abstract: We argue that dependent versions of McDiarmid's inequality are a useful but underutilized tool in mathematical statistics, learning theory and theoretical computer science. To make this point, we first highlight that approximate tensorization of entropy (ATE) implies McDiarmid's via the Entropy Method. Second, we derive McDiarmid's inequality for non-isotropic Gaussian random vectors $X \sim \mathcal N(\mu, \Sigma)$ through ATE with a constant of the order of the condition number of $\Sigma$. We both independently obtain this ATE through a simple application of stochastic localization and also discuss how a more general ATE for the Gibbs sampler due to Ascolani et al., 2026 generalizes McDiarmid's-like concentration to strongly log-concave and log-smooth probability measures. We then apply the resulting concentration inequalities to resolve a question on the concentration of $\operatorname{sign}(X)$ posed by Simone Bombari, investigate Erdős-Rényi graphs under dependence and prove a Dvoretzky-Kiefer-Wolfowitz-type inequality for observations from a joint measure fulfilling ATE and continuous marginal CDFs. For the class of strongly log-concave and log-smooth measures, this result improves upon a prior Dvoretzky-Kiefer-Wolfowitz-type inequality for non-i.i.d. observations due to Bobkov and Götze, 2010, by establishing the expected $1/\sqrt{n}$-rate of convergence under weak dependence instead of $n^{-1/3}$.

19.
arXiv (CS.CV) 2026-06-11

Bridging the Modality Gap in Forensic Image Retrieval

Automated image retrieval plays an increasingly critical role in modern forensic analysis, supporting investigative workflows that rely on efficient comparison of visual evidence. While prior work has focused primarily on developing and optimizing multimodal retrieval systems, limited attention has been paid to evaluating the forensic applicability of these technologies across diverse real-world scenarios. In this study, we present a unified retrieval framework adapted to four key forensic tasks: (1) tattoo image retrieval given a tattoo query image; (2) tattoo retrieval guided by human-expert textual descriptions, modelling the common situation where a witness verbally describes a tattoo; (3) tattoo retrieval from hand-drawn sketches; and (4) face retrieval from forensic face sketches. Our system leverages a multimodal large language model (MLLM) to automatically generate structured textual descriptions for all queries and gallery images, followed by sentence-transformer embedding for text-based comparison. We evaluate retrieval using visual-only embeddings, text-only embeddings and a multimodal fusion strategy that combines text- and image-based similarity scores derived from state-of-the-art visual feature extractors relevant to each task. The fusion of modalities consistently improves retrieval precision and robustness, especially in scenarios where visual information is limited or noisy (e.g., sketches, partial tattoos, or fragmented witness statements). This work highlights the forensic value of a unified multimodal retrieval pipeline and demonstrates how modern MLLMs can operationalize challenging forensic tasks that traditionally rely on manual expert analysis. Our results position multimodal retrieval as a promising tool for supporting investigative workflows involving tattoos, facial composites, and witness descriptions.

20.
arXiv (CS.CL) 2026-06-12

When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering

Retrieval-Augmented Generation (RAG) extends large language models (LLMs) beyond parametric knowledge, yet it is unclear when iterative retrieval-reasoning loops meaningfully outperform static RAG, particularly in scientific domains with multi-hop reasoning, sparse domain knowledge, and heterogeneous evidence. We provide the first controlled, mechanism-level diagnostic study of whether synchronized iterative retrieval and reasoning can surpass an idealized static upper bound (Gold Context) RAG. We benchmark eleven state-of-the-art LLMs under three regimes: (i) No Context, measuring reliance on parametric memory; (ii) Gold Context, where all oracle evidence is supplied at once; and (iii) Iterative RAG, a training-free controller that alternates retrieval, hypothesis refinement, and evidence-aware stopping. Using the chemistry-focused ChemKGMultiHopQA dataset, we isolate questions requiring genuine retrieval and analyze behavior with diagnostics spanning retrieval coverage gaps, anchor-carry drop, query quality, composition fidelity, and control calibration. Across models, Iterative RAG consistently outperforms Gold Context, with gains up to 25.6 percentage points, especially for non-reasoning fine-tuned models. Staged retrieval reduces late-hop failures, mitigates context overload, and enables dynamic correction of early hypothesis drift, but remaining failure modes include incomplete hop coverage, distractor latch trajectories, early stopping miscalibration, and high composition failure rates even with perfect retrieval. Overall, staged retrieval is often more influential than the mere presence of ideal evidence; we provide practical guidance for deploying and diagnosing RAG systems in specialized scientific settings and a foundation for more reliable, controllable iterative retrieval-reasoning frameworks.

21.
Science (Express) 2026-05-07

TranscriptFormer: A generative cell atlas across 1.5 billion years of evolution | Science

作者: 未知作者

Single-cell transcriptomics is revolutionizing our understanding of cellular diversity, yet comparing transcriptional programs across the tree of life remains challenging. We developed TranscriptFormer, a family of generative foundation models trained on up to 112 million cells spanning 1.53 billion years of evolution across 12 species. We demonstrate state-of-the-art performance on cell type classification, even for species separated over 685 million years of evolution, and zero-shot disease state identification in human cells. Developmental trajectories, phylogenetic relationships and cellular hierarchies emerge naturally in TranscriptFormer’s representations without any explicit training on these annotations. This work establishes a powerful framework for quantitative single-cell analysis and comparative cellular biology, thus demonstrating that universal principles of cellular organization can be learned and predicted across the tree of life.

22.
arXiv (CS.AI) 2026-06-16

LatentGym: A Testbed For Cross-Task Experiential Learning With Controllable Latent Structure

arXiv:2606.15306v1 Announce Type: cross Abstract: We envision continually learning agentic systems that become more useful over time: as they encounter sequences of related tasks, they should infer the hidden structure shared across those tasks and use it to improve future decisions. This cross-task experiential learning capability is pivotal in domains such as personalization and interactive assistance, but existing training/evaluation frameworks do not provide shared, controllable latent structures and cannot measure whether or why agents improve. We introduce LatentGym: a controllable suite in which each environment is organized around a ground-truth latent variable governing the structure across tasks. Our construction yields metrics that separate exploration (whether the agent's actions gather information about the latent) from exploitation (whether the agent uses what it has gathered). We demonstrate our suite on empirical studies addressing three questions: how and why frontier models fail to adapt across related tasks; whether post-training on related task sequences improves general cross-task adaptation, and where those gains come from; and how design choices such as inter-task feedback shape training dynamics and generalization. Together, these results establish a controlled foundation for studying how LLM agents learn from experience across tasks, and for designing agents that adapt more reliably in sequential, personalized, and interactive settings.

23.
arXiv (CS.CL) 2026-06-11

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this paper, we reveal a counterintuitive risk: this reliability-oriented technique can itself become an attack surface. We uncover a new jailbreak attack, termed CodeSpear, that exploits GCD to induce LLMs into generating malicious code. Our experiments show that simply applying a benign code grammar constraint can effectively jailbreak LLMs. To address this vulnerability, we propose CodeShield, a safety alignment approach that robustly preserves safe behavior even under attacker-controlled grammar constraints. CodeShield aligns the model in the code modality by teaching it to generate honeypot code under GCD. Such code is semantically harmless, so it does not implement the malicious request, and structurally diverse, so it is difficult to suppress through grammar tightening. At the same time, CodeShield still preserves natural-language refusals when natural language is available. Experiments on 10 popular LLMs across 4 benchmarks show that CodeSpear outperforms representative jailbreak baselines and increases the attack success rate by more than 30 percentage points on average. CodeShield also restores safety under CodeSpear while preserving benign utility. Our findings reveal a fundamental risk of GCD and call for greater attention to its potential security implications.

24.
arXiv (CS.LG) 2026-06-17

Beyond Independent Genes: Learning Module-Inductive Representations for Single-Cell Gene Perturbation Prediction

arXiv:2602.04901v2 Announce Type: replace-cross Abstract: Predicting transcriptional responses to genetic perturbations is a central problem in functional genomics. In practice, perturbation responses are rarely gene-independent but instead manifest as coordinated, program-level transcriptional changes among functionally related genes. However, most existing methods do not explicitly model such coordination, due to gene-wise modeling paradigms and reliance on static biological priors that cannot capture dynamic program reorganization. To address these limitations, we propose scBIG, a module-inductive perturbation prediction framework that explicitly models coordinated gene programs. scBIG induces coherent gene programs from data via Gene-Relation Clustering, captures inter-program interactions through a Gene-Cluster-Aware Encoder, and preserves modular coordination using structure-aware alignment objectives. These structured representations are then modeled using conditional flow matching to enable flexible and generalizable perturbation prediction. Extensive experiments on multiple single-cell perturbation benchmarks show that scBIG consistently outperforms state-of-the-art methods, particularly on unseen and combinatorial perturbation settings, achieving an average improvement of 6.7% over the strongest baselines. The code is available at https://github.com/ttruan2426-dot/scBIG.