Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-12

Can I Buy Your KV Cache?

arXiv:2606.13361v1 Announce Type: new Abstract: Right now, across the world, AI agents are repeating the same absurd act: to read one document, they each recompute it from scratch. Every agent re-runs prefill, the most compute-intensive step a large model takes, over identical text, only to rebuild a key-value (KV) cache identical to the one the agent before it just built. The same answer, computed a million times. We make a proposal that is almost offensively simple: compute it once. Let a publisher precompute a document's KV cache, and let every other agent buy the right to load it and skip prefill. It works, and it is token-exact: loading a precomputed KV and continuing matches prefilling from scratch (24/24 greedy tokens, and at the logits level), with no accuracy cost. On Qwen3-4B, reuse is 9-50x cheaper in compute than prefill, and the gap widens with length (prefill's attention scales with L^2), so a single reuse already pays it back. Then the part that matters: where the KV lives. Shipping it fails, because KV is nearly incompressible, so per-load egress costs more than the prefill it saves. Hosting it provider-side, exactly as production prompt-caching works, removes egress entirely. The size of the prize is set by our measured compute saving: serving one hot 3774-token document to 80M agents costs ~$1.5M to re-prefill but only ~$0.03M of reuse compute (49.7x less). The 0.1x cache-read tariff APIs charge passes a 10x discount to users while sitting inside this measured envelope, so the 10x is a floor that the measured ~50x compute saving clears, and the gap to the physical ~50x is provider margin: millions of dollars per popular document. We frame the resulting agent-native prefill CDN and leave lossless KV compression and a cross-party payment layer as the open problems.

02.
arXiv (quant-ph) 2026-06-25

Quantum Error Correction-like Noise Mitigation for Wave-like Dark Matter Searches with Quantum Sensors

arXiv:2511.03253v2 Announce Type: replace-cross Abstract: We propose a quantum error correction-like noise mitigation protocol for enhancing the sensitivity of wave-like dark matter searches with quantum sensors. Our protocol uses multiple sensors to mitigate the noise affecting each sensor individually, allowing for the suppression of excitation noise that is parallel to the dark matter signal. We demonstrate that our protocol can improve the sensitivity to dark matter signals by a factor of $\sqrt{N}$, where $N$ is the number of sensors used, for small $N$. Furthermore, for sufficiently large $N$, we find that our protocol achieves the same performance as the standard quantum limit by the ideal measurement, which non-entangled sensors with parallel noise cannot reach due to the unknown phase of the dark matter field. Our work can be widely applied to various types of signals with unknown phases, and has the potential to enhance the sensitivity of quantum sensors such as arrays of resonant cavities.

03.
arXiv (CS.AI) 2026-06-24

MuTRAP: Multi-trigger Trojans Attacking Robot Task Planning Systems

arXiv:2504.17070v3 Announce Type: replace-cross Abstract: Robots need task planning methods to achieve goals that require more than one action. Recently, large pretrained models have demonstrated impressive performance in task planning. For instance, large language models (LLMs) can generate task plans using action and goal descriptions. Despite the rapid progress of large models in robot intelligence, their security implications remain only partially understood, leaving important gaps in the exploration of potential vulnerabilities in LLM-driven robotic planning systems. To investigate such risks, in this paper, we develop MuTRAP, the first multi-trigger trojan attack specifically designed and targeted for LLM-assisted robot task planners. MuTRAP follows the standard practice of LLM usage in robotics where the backbone LLM is typically frozen and hosted in a central server limiting attacker's reach. In contrast, MuTRAP injects backdoor using a small set of task-specific parameters. In addition, we develop a trigger optimization method for selecting multiple-trigger words that are most effective for different robot applications. For instance, one can use unique trigger word "herical" to activate a specific malicious behavior, e.g., cutting hand on a kitchen robot. Through MuTRAP that demonstrates the vulnerability of current LLM-based planners, our goal is to promote the development of secured robot intelligence. Details and demos are provided in: https://mutrap.github.io/MuTRAP/

04.
arXiv (CS.CL) 2026-06-17

Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping

Large language model applications build prompts from templates, and Handlebars is a widely used templating engine and the default prompt-template format in Microsoft Semantic Kernel. Its double-brace {{x}} expression HTML-escapes the interpolated value and is documented as the safe default; its triple-brace {{{x}}} expression inserts the value raw. We show that this choice silently governs an application's exposure to structural role injection, where attacker-controlled data carries chat role delimiters that forge a higher-privilege turn. A model-free analysis establishes the mechanism: Handlebars escaping rewrites angle brackets but not square brackets, colons, or Markdown hashes, so it neutralises ChatML, Llama-3, and XML role delimiters (survival rate 0.00) while leaving Llama-2 [INST], legacy Human:/Assistant:, and Markdown ### delimiters intact (survival rate 1.00 for the last two). We then run 5760 trials across seven delimiter families, two attack objectives, and four models (GPT-3.5 Turbo, GPT-4o mini, GPT-4.1 mini, Claude Haiku 4.5) at a combined API cost of 1.63 USD. GPT-3.5 Turbo follows the task-hijack instruction in 97% of raw and 91% of escaped trials, with the escaping protection concentrated in the angle-bracket families and absent for the colon- and Markdown-based families; the harder secret-exfiltration objective, which does not saturate, exposes the same family interaction more cleanly. Claude Haiku 4.5 resists both objectives almost entirely. The escaped default protects only the delimiter schemes whose characters HTML escaping happens to cover, gives no protection for the rest, and cannot substitute for a structural separation of instruction and data.

05.
arXiv (CS.CL) 2026-06-18

DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs

Diffusion large language models (dLLMs) have emerged as a promising alternative for text generation, distinguished by their native support for parallel decoding. In practice, block inference is crucial for avoiding order misalignment in global bidirectional decoding and improving output quality. However, the widely-used fixed, predefined block (naive) schedule is agnostic to semantic difficulty, making it a suboptimal strategy for both quality and efficiency: it can force premature commitments to uncertain positions while delaying easy positions near block boundaries. In this work, we analyze the limitations of naive block scheduling and disclose the importance of dynamically adapting the schedule to semantic difficulty for reliable and efficient inference. Motivated by this, we propose Dynamic Sliding Block (DSB), a training-free block scheduling method that uses a sliding block with a dynamic size to overcome the rigidity of the naive block. To further improve efficiency, we introduce DSB Cache, a training-free KV-cache mechanism tailored to DSB. Extensive experiments across multiple models and benchmarks demonstrate that DSB, together with DSB Cache, consistently improves both generation quality and inference efficiency for dLLMs. Code is released at https://github.com/lizhuo-luo/DSB.

06.
Nature (Science) 2026-06-17

Cucurbituril-based anion-conducting membranes with supramolecular nanopores

作者:

Nanoporous anion-conducting membranes have gained considerable interest for their potential to reduce resistance in electrochemical devices1–4. Current pore-forming methods, such as backbone engineering through polymers of intrinsic microporosity5,6 or covalent organic and metal–organic frameworks7,8, however, suffer from limited structural control, mechanical fragility or demanding synthesis. Here we establish a supramolecular strategy that overcomes these limitations by constructing uniform, dynamic nanopores. Co-assembly of the rigid macrocyclic host cucurbit[7]uril with the cationic polymer guest quaternized poly(piperidinium-terphenyl) yields a robust network of nanometre-scale channels while simultaneously enhancing mechanical and chemical stability. The dynamic host–guest interactions allow the pore structure to fluctuate on picosecond and angstrom scales. This transient environment supports low-friction hydroxide migration through a Grotthuss mechanism, producing a marked enhancement in ionic conductivity. This bottom-up design principle provides a versatile new tool for molecularly engineering transport pathways and promises to advance electrochemical reactors with respect to energy efficiency, operational stability and the production of high-purity products. A supramolecular strategy, in which uniform, dynamic nanopores are constructed, overcomes the limitations of limited structural control, mechanical fragility or demanding synthesis in nanoporous anion-conducting membranes, providing a versatile tool for molecularly engineering transport pathways.

07.
arXiv (CS.AI) 2026-06-16

Demystifying Variance in Circuit Discovery of LLMs

arXiv:2606.16920v1 Announce Type: cross Abstract: Circuit discovery is a key technique in mechanistic interpretability to pinpoint the model components that are crucial for performing a given task. Although the current state-of-the-art method (EAP-IG) performs well on the metric of (un)faithfulness, it suffers from substantial variability. This includes resampling variance, where the circuit changes when we probe with a new batch of data from the same distribution; rephrasing variance, where the discovered circuit shifts when the prompts are rephrased; and sample-wise variance, where a circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples. This paper studies the roots of these variances. We demonstrate that CEAP, our new circuit discovery method that improves upon EAP-IG with a theoretical guarantee, can substantially lessen resampling variance. We further show that rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. This leads us to argue that it may be challenging to find a comprehensive circuit that explains and controls the model's behavior on a task, which can be expressed in countless templates, suggesting that LLMs may be inherently hard to steer. We show that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem. Regarding sample-wise variance, we argue that it is largely benign: extremely poor unfaithfulness scores often stem from how unfaithfulness is defined, rather than from defects in the measured circuits. We show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.

08.
medRxiv (Medicine) 2026-06-24

TMPRSS2-Coagulation Nexus: A Novel Molecular Link Revealed by Pairwise Correlation Analysis Following AstraZeneca (ChAdOx1 nCoV-19) Vaccination in a Nigerian Cohort

Background: While haematological and coagulation changes following AstraZeneca vaccination have been described, the molecular mechanisms linking TMPRSS2 expression to coagulation remain underexplored, particularly in African populations. Methods: In this case-control study, 102 adults (51 vaccinated with AstraZeneca >=6 months prior, 51 unvaccinated controls) aged 18-65 years in Port Harcourt, Nigeria, were evaluated. Full blood count (Sysmex XN-1000), PT/aPTT (Erba Mannheim), RNA concentration, and qRT-PCR for ACE2/TMPRSS2 (normalized to GAPDH) were performed. Pearson correlations and t-tests were conducted (SPSS v26, p

09.
arXiv (CS.AI) 2026-06-15

Thinking Outside the [Chat]Box: Bridging Computer Science and Industrial Design for Cognitive-Inclusive Generative AI

arXiv:2606.14306v1 Announce Type: cross Abstract: Current Generative AI (GenAI) interfaces remain largely constrained to chatbox interaction, which can impose high cognitive demands on users and create substantial barriers for people with intellectual disabilities (ID), including prompt formulation difficulties, response overload, and limited mechanisms to assess information reliability. To explore alternative interaction models for cognitive accessibility, we conducted a cross-disciplinary co-design challenge in which two student cohorts (Computer Science and Industrial Design) developed interface concepts from the same set of functional requirements (e.g., prompt scaffolding, structured output, GUI-based refinement, transparency, and personalization). Comparing the resulting proposals reveals both convergence on foundational requirements (notably initial calibration, proactive prompting, and direct manipulation of response fragments) and complementary contributions that outline a multi-layered support system. Computer Science teams primarily produced structural scaffolding, emphasizing predictability, navigability, and trust through mechanisms such as reliability indicators, explicit sources, and context management for long conversations. Industrial Design teams emphasized experiential scaffolding, focusing on pacing, attention guidance, multimodality, and proactive agency, including step-by-step response flows, focus modes, and assistant-like integrations. We synthesize these findings into a dual-layer scaffolding framework that expands the design space for cognitively accessible GenAI interaction beyond chat-centric models and motivates future work on expert refinement, technical feasibility, and empirical validation with users with ID.

10.
Nature (Science) 2026-06-24

The mutational landscape of STING-induced immunity

作者:

Stimulator of interferon genes (STING) is an evolutionary conserved immune signalling protein with key roles in host defence, cancer, senescence and inflammation1–3. Downstream of STING, type I interferon, inflammatory cytokine signalling and non-canonical autophagy are governed by a multilayered mechanism integrating ligand-induced structural transitions, protein–protein interactions and coordinated intracellular trafficking4–13. Despite its central role in immunity and relevance as therapeutic target14, the sequence elements that govern STING (in)activation in cells remain incompletely understood. Here we developed a massively parallel assay to systematically chart the sequence-function landscape of STING. Profiling thousands of single amino-acid variants, we identified structural and functional determinants that shape the immunostimulatory capacity of STING and its ability to translate ligand recognition into distinct signalling outputs. Cryogenic-electron microscopy structures of select STING hyperactive variants revealed new regulatory principles dictating conformational transition from inactive to signalling-competent states of STING. Mutational effects are widespread across the functional landscape and can sensitize STING towards the natural ligand 2′3′-cGAMP15–18 or decouple interferon induction from non-canonical autophagy, demonstrating a diversity of possible responses that can be accessed through single point substitutions. Finally, our data showed the clinical and evolutionary relevance of naturally occurring STING protein variants. Collectively, these findings define molecular principles that tune STING activity and chart the landscape of its functional potential across immune contexts. A massively parallel assay systematically charts the sequence-function landscape of the STING signalling protein, and the findings define molecular principles that tune STING activity and show its functional potential across immune contexts.

11.
arXiv (CS.LG) 2026-06-15

Code Correctness Signals in LLM Hidden States: Pre-Generation Probing and Repair Geometry

arXiv:2606.14530v1 Announce Type: new Abstract: Large language models encode rich information in their hidden states. This work asks whether code correctness is legible in the hidden states of Qwen3-4B-Instruct-2507, before it generates and as it repairs a failed attempt, studied on 444 LiveCodeBench tasks. It reports two findings connected by a single confound-control tool: residualization. First, the correctness of the model's first-attempt code is linearly decodable from the prompt-final hidden state, with a leakage-free held-out AUC of 0.931 +/- 0.008 across 50 outer splits. After the linear effect of prompt length is removed from each hidden state dimension, the probe still reaches 0.911 +/- 0.010, well above a prompt-length baseline of 0.754 +/- 0.014. Second, on 236 cleaned cases where the model attempts to repair a failed first attempt, the hidden state shift from the failing attempt to its repair carries a statistically detectable contrastive direction, significant on both a magnitude and a split-half test against label-shuffled nulls. This direction does not survive a conditional residualization against repair-context covariates that differ between successful and failed repairs, marking it as a correlate of repair success driven by the repair context rather than an isolated repair-comprehension feature. The probe layer is selected by nested cross-validation, and the same residualization approach that upholds the pre-generation correctness result overturns the repair-direction interpretation. The contribution is as much methodological as empirical: a diagnostic honest enough to report a negative result alongside a positive one.

12.
arXiv (CS.CL) 2026-06-11

Augmenting Molecular Language Models with Local $n$-gram Memory

Transformer-based language models for SMILES strings suffer from a locality gap: standard character-level tokenization fragments chemically meaningful motifs, forcing models to repeatedly learn local syntax at the expense of long-range dependencies. To address this without disrupting standard tokenizers, we propose MolGram, which integrates a conditional $n$-gram memory module into molecular language models. MolGram maps local string patterns to learned embeddings via scalable hash lookups and dynamically injects this regional context into hidden states. Evaluations across three tasks, including unconditional molecule generation, forward reaction prediction, and single-step retrosynthesis, show that MolGram consistently improves performance. Crucially, our analyses demonstrate that MolGram outperforms baselines with 3$\times$ more parameters, establishing explicit local pattern memory as a highly efficient inductive bias.

13.
arXiv (CS.CV) 2026-06-25

SSMNBench: Diagnosing Image-based Cross-View Human-Object Understanding via Single-View Sufficiency and Multi-View Necessity

Multimodal Large Language Models (MLLMs) have shown remarkable progress in single-image perception, yet their ability to reason about complex cross-view human-centric scenes remains largely unverified. Current multi-view benchmarks evaluate models using a fixed "bag of frames" and thus conflate a model's robustness to visual distraction with its genuine ability to fuse fragmented cross-view evidence. To address this issue, we introduce SSMNBench, a diagnostic benchmark comprising 3,300 curated QA pairs for cross-view human and human-object understanding. SSMNBench uniquely categorizes tasks into Single-View Sufficiency (SVS) and Multi-View Necessity (MVN). By systematically perturbing view availability across 17 state-of-the-art MLLMs, critical limitations are revealed: models suffer from severe "distraction degradation" when presented with redundant views (SVS), and fail to integrate fragmented geometric evidence across cameras (MVN). Our evaluations demonstrate that modern MLLMs rely on multiple single-image semantic averaging and view preference rather than genuine cross-view synthesis. By exposing these fundamental vulnerabilities, SSMNBench provides a rigorous diagnostic framework to drive the advancement of future cross-view-aware multimodal architectures. The code is available at: $ \href{https://github.com/gtc-gh/SSMNBench}{SSMNBench} $

14.
arXiv (CS.AI) 2026-06-24

Are Safety Guarantees in Neural Networks Safe? How to Compute Trustworthy Robustness Certifications

arXiv:2606.23858v1 Announce Type: cross Abstract: A primary challenge in AI safety is the existence of adversarial examples – slightly distorted inputs that cause a neural network (NN) to misclassify. To mitigate this problem, recent research focuses on the computation of robustness certifications, which, for a given input, determine the largest distortion the input may receive without breaking the network's prediction. Robustness certifications can be interpreted as an axis-aligned hyper-rectangle (multi-dimensional intervals). Most existing approaches focus on maximizing the certification's volume, but recent intractability results prohibit the computation of volume-optimal certifications in reasonable time. We introduce the apothem measure and show how to compute apothem-optimal certifications in a linear number of calls to a NN verifier (oracle) w.r.t. the input domain's diameter. Moreover, we prove that we cannot have a volume-optimal, oracle-based algorithm, even if we discard the oracle costs. Also, we introduce dual certifications – an interval including all instances of a class – thus providing apothem-minimum upper bounds to a robustness certification. Further, we present the ParallelepipedoNN system, which we evaluate on the standard MNIST and Fashion MNIST benchmarks. A preliminary comparison with existing work on the same datasets reveals at least two-fold improvement w.r.t. the minimum edge length.

15.
medRxiv (Medicine) 2026-06-23

Sex-Specific Hemostatic Responses and Diagnostic Potential of Platelet Distribution Width (PDW) and D-Dimer in Mild COVID-19, Malaria, and Co-Infection in a Tropical Setting: A Case-Control Study in Port Harcourt, Nigeria

Background: In malaria-endemic tropical regions, the overlapping coagulopathy in COVID-19 and malaria poses diagnostic and prognostic challenges, particularly with potential sex differences. This study evaluated sex-specific variations in platelet indices and fibrinolytic markers and assessed the utility of Platelet Distribution Width (PDW) and D-dimer in mild/asymptomatic cases. Methods: A case-control study was conducted with 220 participants (55 each in healthy controls, malaria-positive, COVID-19-positive, and COVID-19+malaria co-infected groups), aged 20-65 years, in Port Harcourt, Nigeria. Platelet indices were analysed using Sysmex XP-300 haematology analyser, while D-dimer and fibrinogen were measured by ELISA. Data were analysed using SAS 9.4 with ANOVA, Tukey's HSD, Pearson correlation, and sex-stratified comparisons. Results: PDW was significantly elevated in all infected groups compared to controls (malaria: 15.21 +/- 0.22 fL; COVID-19: 15.21 +/- 0.22 fL; co-infection: 15.61 +/- 0.21 fL vs. control: 13.26 +/- 0.17 fL; F=25.850, p < 0.001). D-dimer levels were highest in the co-infected group (553.42 +/- 59.74 ng/ml, F=2.816, p = 0.040). No significant changes were observed in other platelet indices or fibrinogen across groups. No significant correlation existed between platelet indices and the fibrinolytic markers. Males exhibited significantly higher D-dimer levels across all infected groups (p < 0.05) and higher fibrinogen in COVID-19 subjects (p = 0.036). Sex exerted a stronger influence on parameters than age. Conclusion: Males show heightened fibrinolytic activation in COVID-19 and malaria co-infection. PDW and D-dimer are promising, cost-effective biomarkers for screening mild infections in resource-limited tropical settings.

16.
arXiv (CS.AI) 2026-06-11

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

arXiv:2507.21164v2 Announce Type: replace-cross Abstract: Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that couples representation learning with an analytically solvable One-Class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a \deleted{new} benchmark based on MNIST-C, and a challenging brain MRI \deleted{subtle} lesion detection task. Unlike most methods that focus on large, hyperintense lesions at the image level, our approach succeeds to target small, non-hyperintense lesions, while we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and texture or population age variations in MRI. Results demonstrate performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning.

17.
arXiv (CS.AI) 2026-06-17

Treatment Response Optimized Clinical Decision Support AI System via Digital Twin Simulation

arXiv:2606.17405v1 Announce Type: new Abstract: Clinical decision support AI systems (CDSASs) must adapt to evolving patient conditions in real-time while adhering to strict safety constraints. We present an online adaptive framework that integrates Treatment Effect (TE) estimation to quantify clinical benefits, a patient Digital Twin (DT) to simulate treatment trajectories, and Reinforcement Learning (RL) for sequential decision-making. The AI system is initially trained on historical medical records and operates in a continuous learning loop. To ensure safety, a rule-based module monitors vital signs and blocks contraindicated treatments. Cases with strong internal model disagreement are flagged for clinician review, simulated in our experiments via a pre-trained outcome model. We validate our framework using both a synthetic clinical simulator and a real-world ovarian cancer dataset from The Cancer Genome Atlas (TCGA). In both simulated and clinical settings, our method demonstrated superior effectiveness and stability in recommending treatments compared to standard computational baselines. Furthermore, the AI system maintains low latency and requires expert consultation for only a minority of cases in our experimental validation, demonstrating its potential as a safe, clinician-supervised tool for personalized medicine that continuously improves through practical use.

18.
arXiv (quant-ph) 2026-06-11

Additivity and chain rules for quantum entropies via multi-index Schatten norms

arXiv:2502.01611v3 Announce Type: replace Abstract: The primary entropic measures for quantum states are additive under the tensor product. In the analysis of quantum information processing tasks, the minimum entropy of a set of states, e.g., the minimum output entropy of a channel, often plays a crucial role. A fundamental question in quantum information and cryptography is whether the minimum output entropy remains additive under the tensor product of channels. Here, we establish a general additivity statement for the optimized sandwiched Rényi entropy of quantum channels. For that, we generalize the results of [Devetak, Junge, King, Ruskai, CMP 2006] to multi-index Schatten norms. As an application, we strengthen the additivity statement of [Van Himbeeck and Brown, 2025] thus allowing the analysis of time-adaptive quantum cryptographic protocols. In addition, we establish chain rules for Rényi conditional entropies that are similar to the ones used for the generalized entropy accumulation theorem of [Metger, Fawzi, Sutter, Renner, CMP 2024].

19.
arXiv (CS.LG) 2026-06-16

Robust Transformer-Based One-Step Stock Index Forecasting via Shifted Data Augmentation

arXiv:2606.15701v1 Announce Type: new Abstract: Transformers have shown remarkable success in sequence modeling, yet their direct application to financial time series remains challenging due to noisy signals, short-memory dynamics, and distributional shifts. This paper proposes a modified Transformer architecture for one-step stock index forecasting, combined with advanced learning-rate scheduling and a novel Shifted Data Augmentation (SDA) technique. We evaluate the proposed framework on two benchmark stock index datasets, VN30 and S&P 500. Experimental results demonstrate that cosine annealing with warmup consistently improves forecasting accuracy over the generalized inverse-power scheduler. Furthermore, SDA substantially reduces forecasting errors and run-to-run variability while improving robustness to hyperparameter selection. The combination of cosine annealing scheduling and SDA achieved the best performance on both datasets, indicating that data augmentation can play a more important role than increasing model complexity in Transformer-based financial forecasting. These findings provide a practical and computationally efficient approach for robust stock index forecasting in noisy financial environments.

20.
arXiv (CS.CL) 2026-06-25

Detect, Unlearn, Restore: Defending Text Summarization Models Against Data Poisoning

Training-time data poisoning during fine-tuning poses a significant threat to large language models (LLMs) deployed for abstractive text summarization, where small task-specific datasets exert disproportionate influence on model behavior. In this setting, adversaries manipulate fine-tuning data to induce persistent summarization failures, such as biased or harmful summaries, while preserving standard evaluation metrics. We present a unified post-hoc defense framework for detecting and remediating fine-tuning-stage poisoning in summarization models across the machine learning supply chain. Our experiments show that in white-box settings, poisoned document-summary pairs exhibit abnormally high training influence, enabling detection via influence-function analysis with semantic consistency checks. In black-box settings, poisoned models display two to three times greater sensitivity to semantics-preserving perturbations, enabling behavioral auditing without training data access. Beyond existing poisoning formulations, we introduce novel attacks targeting factual distortion and representational bias, showing that poisoning alters summarization behavior without triggering conventional alarms. Across nine architectures and six benchmark datasets under adaptive attacks, our defenses achieve 85-92% detection precision, while gradient-ascent unlearning restores up to 96% of original behavior with minimal utility loss (less than 0.6% ROUGE degradation). These results indicate that fine-tuning-time poisoning leaves persistent structural artifacts, enabling practical detection and post-deployment recovery without full retraining.

21.
bioRxiv (Bioinfo) 2026-06-18

Accounting for allelic diversity and multicopy gene detection improves the accuracy of antibiotic resistance genotypic determination

Background Genomic prediction of antimicrobial resistance (AMR) relies on the accurate detection of resistance genes or allelic variants of core genes from raw or assembled genomes sequences. For several bacterial species and antibiotics, AMR genotype-phenotype discrepancies are common, indicating that important sources of error remain unresolved. For Enterococcus faecium, we focused on identifying the sources of discrepancies for tetracycline resistance, for which genotypic detection had shown particularly low accuracy. We investigated the effect of structural variation in antibiotic resistance genes (ARGs), including gene duplications, truncations, interruptions, and mixed configurations of complete and partial gene copies, as a source of genotype-phenotype discrepancies from short-read data. We conduct further extended investigations to other antibiotic families and into another bacterial species: Escherichia coli. Methods We analyzed collections of E. faecium and E. coli genomes, integrating high-quality complete assemblies, simulated Illumina short reads, and matched AMR phenotypic data. The integrity, copy number, and allelic diversity of ARGs were examined for multiple antibiotic classes, and their impact on ARG detection and accuracy of AMR determination was assessed using several commonly used bioinformatic tools (SRST2, ARIBA and AMRFinderPlus). Results For E. faecium, after ruling out the effect of specific tet allelic variants on tetracycline susceptibility, we found that the integrity and copy number of tet(M) had a major effect on detection accuracy. Duplicated and incomplete ARGs are also common in E. faecium genomes, particularly for macrolides (erm(B)) and aminoglycosides (ant(6)-Ia and aph(3')-IIIa). In E. coli, similar patterns were observed for tet(A), erm(B) and aminoglycoside-associated genes (aph(3')-IIIa and ant(6)-Ia). Across ARGs in both species, short-read mapping methods wrongly reported interrupted genes as complete in some instances, while assembly-based methods often failed to resolve complete copies of duplicated genes. Detection accuracy improved when tools were adapted to account for gene integrity and when extended AMR databases incorporating species-specific alleles were included. Conclusions Our findings reveal that bioinformatic limitations in dealing with ARG copy number and completeness, and in accounting for allelic variation, underly a substantial source of genotype-phenotype errors, highlighting the need for improved AMR databases and bioinformatic tools that consider these factors to achieve reliable genomic prediction of AMR.

22.
arXiv (CS.AI) 2026-06-24

SP-Mind: An Autonomous Reasoning Agent for Spatial Proteomics Analysis

arXiv:2606.24235v1 Announce Type: new Abstract: Spatial proteomics enables single-cell-resolution characterization of protein expression within tissue architecture, playing a critical role in understanding tumor microenvironments and guiding precision medicine. However, current analysis workflows remain fragmented, requiring expert manual orchestration of heterogeneous tools and limiting research scalability and reproducibility. We present SP-Mind, the first autonomous AI agent designed to unify the spatial proteomics analysis pipeline, from raw multiplexed tissue imaging to downstream phenotype discovery. Equipped with expert-curated biological analysis skills and specialized computational tools, SP-Mind converts natural-language queries into end-to-end analytical workflows without task-specific fine-tuning. To rigorously evaluate its capabilities, we introduce SP-Bench, a comprehensive benchmark spanning diverse tissue types, comprising 102 tasks across 18 distinct categories. Through extensive evaluation on SP-Bench and established downstream tasks, SP-Mind achieves state-of-the-art performance compared to existing open-source biomedical agent baselines.

23.
arXiv (CS.CV) 2026-06-16

Shift-and-Sum Quantization for Visual Autoregressive Models

Post-training quantization (PTQ) enables efficient deployment of deep networks using a small set of data. Its application to visual autoregressive models (VAR), however, remains relatively unexplored. We identify two key challenges for applying PTQ to VAR: (i) large reconstruction errors in attention-value products, especially at coarse scales where high attention scores occur more frequently; and (ii) a discrepancy between the sampling frequencies of codebook entries and their predicted probabilities due to limited calibration data. To address these challenges, we propose a PTQ framework tailored for VAR. First, we introduce a shift-and-sum quantization method that reduces reconstruction errors by aggregating quantized results from symmetrically shifted duplicates of value tokens. Second, we present a resampling strategy for calibration data that aligns sampling frequencies of codebook entries with their predicted probabilities. Experiments on class-conditional image generation, inpainting, outpainting, and class-conditional editing show consistent improvements across VAR architectures, establishing a new state of the art in PTQ for VAR.

24.
arXiv (CS.CV) 2026-06-16

Multi-Modal Spatio-Temporal Graph Neural Network with Mixture of Experts for Soil Organic Carbon Prediction

Top-soil organic carbon (SOC) prediction is fundamental to agricultural sustainability, land use policy and fertilization planning. Existing approaches face two limitations: they pair hand-crafted covariates with classical ML or single-modal deep models that miss rich spectral and temporal information, and grid-based architectures ignore the irregular spatial structure of field measurements. We introduce SpTGNN, a multi-modal spatio-temporal graph neural network addressing both. SpTGNN represents soil measurements as nodes in a heterogeneous graph with three edge types (spatial proximity, spectral similarity, elevation), and applies relational graph attention to learn separate patterns per relation. A fine-tuned TerraMind encoder extracts node features from Sentinel-2, Sentinel-1 and DEM signals, combined with per-sample environmental covariates and learned positional and temporal embeddings. A sparse Mixture-of-Experts module fuses the four streams via top-$k$ routing. Uncertainty is captured by pairing heteroscedastic regression (aleatoric) with deep ensembles (epistemic), and a Moran's $I$ penalty regularizes spatial autocorrelation. We evaluate on a global SOC corpus split into three regional instances ($\sim$49k samples globally, Africa $\sim$26k, Europe $\sim$14k). Our 5-member deep ensemble reports $R^2=0.762$, RMSE $=3.51\pm0.48$ g/kg and MAPE $=22.9\%$ on the Africa test split, improving over a tabular XGBoost baseline; the best single checkpoint reaches validation $R^2=0.864$. Ablations confirm the heterogeneous graph, MoE fusion and fine-tuned backbone each contribute substantively, and the ensemble UQ stack achieves post-calibration ECE of $0.031$ (hybrid) and $0.026$ ($\beta$-NLL). To our knowledge, this is the first framework to unify foundation-model feature extraction, heterogeneous graph attention and decomposed uncertainty quantification for SOC estimation.

25.
arXiv (CS.AI) 2026-06-15

Rethinking Backdoor Adversarial Unlearning through the Lens of Catastrophic Forgetting in Continual Learning

arXiv:2606.14078v1 Announce Type: cross Abstract: Existing studies reveal that current backdoor defenses exhibit limited robustness and often fail against specific types of attacks. More concerningly, prevailing safety tuning strategies tend to provide only superficial safety protection, as they fall short of completely eliminating the backdoor effects. In this work, we present a novel formulation of backdoor learning and unlearning as a sequential, three-stage process from a continual learning perspective. Within this framework, we formally define complete backdoor unlearning and further derive the necessary conditions for achieving it based on the mechanism of catastrophic forgetting. Guided by these insights, we propose Blind Inversion-Backdoor Adversarial Unlearning (BI-BAU), which formulates the generation of adversarial examples satisfying the unlearning conditions as a blind inversion problem. We solve this by integrating the bi-level optimization process of adversarial training into an Expectation-Maximization (EM) algorithm framework to optimize the maximum a posteriori (MAP) objective. Furthermore, BI-BAU is extended to untargeted adversarial scenarios with unknown target classes, as well as to multi-modal contrastive learning tasks, enhancing its applicability to real-world deployment scenarios where pre-trained models may be compromised. Extensive experiments demonstrate that our method exhibits general applicability across a wide spectrum of backdoor attacks and can effectively and thoroughly eliminate the backdoor effects from a backdoor model.