Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-15

Safety-Contract Graph Multi-Agent Reinforcement Learning for Autonomous Network Security Response

arXiv:2606.13832v1 Announce Type: cross Abstract: Autonomous network-security response systems promise to reduce Security Operations Centre (SOC) reaction latency, but reward-only multi-agent reinforcement learning (MARL) can improve security reward while remaining non-deployable. We present a safety-contract graph MARL framework and instantiate it as ACD$^3$-GAT (Adaptive Constrained Counterfactual Decisioning with a Graph Attention Network encoder), an architecture that separates simulator observations from reusable operational budgets, constrained optimization, graph state encoding, and counterfactual action screening. We evaluate the method in CAGE Challenge 4, where agents operate under budgets for Mean Time to Recover (MTTR), false-positive response, and firewall change-management disruption. Across the benchmark, every unconstrained method violates the SOC downtime budget in 100% of evaluated episodes, with mean downtime proxy costs of 311-430 against a budget of 50. This complements prior CAGE Challenge 4 findings by showing that reward-only learning lacks operational discipline. Constrained MAPPO-GAT (C-MAPPO-GAT) isolates Lagrangian operational-cost control and budget-aware screening, while ACD$^3$-GAT adds budget context, CVaR tail-risk estimation, opponent-belief state, and Graph Counterfactual Risk Propagation (G-CRP). The replicated comparison includes three 200-episode seeds for IPPO, MAPPO-GAT, C-MAPPO-GAT, and ACD$^3$-GAT. C-MAPPO-GAT reduces downtime violation from 100% to 0.3% and mean downtime cost from 355.4 to 15.5 relative to MAPPO-GAT. ACD$^3$-GAT reduces mean downtime cost to 48.2 with a 13.8% violation rate, placing it on the safety-contract frontier rather than at the most conservative compliance point. Topology-seed and coupled adaptive Red-process stress tests preserve this contrast and show lower worst adaptive degradation for safety-constrained policies than reward-only MAPPO-GAT.

02.
arXiv (quant-ph) 2026-06-11

Permutation-Invariant N-body gates via Tavis-Cummings Hamiltonian

arXiv:2506.03453v3 Announce Type: replace Abstract: Global control provides a promising route to implementing multi-qubit gates without individual qubit addressing. This is especially appealing for permutation-invariant (PI) gates, whose symmetry is often broken when they are compiled into individually addressed one- and two-qubit gates. Important examples include SWAP, $\sqrt{iSWAP}$, and the n-qubit controlled-Z gate, which is equivalent, up to two single-qubit Hadamard gates, to the multi-qubit Toffoli gate. Motivated by this global-control perspective, we show that all PI unitaries on an arbitrary number of qubits can be realized using the Tavis-Cummings (TC) interaction, the multi-qubit version of the Jaynes-Cummings interaction, together with global uniform z and x fields. Here, the $n$ qubits are identically coupled to a single bosonic mode (oscillator), which is initialized in and returned to its vacuum state. A corollary is that all PI states, including GHZ and Dicke states, can be prepared using the same global control. For the case n=2 qubits, which is particularly important in quantum computing, we also find explicit pulse sequences for implementing all PI qubit unitaries that conserve angular momentum in the z direction, using only the TC interaction and global z fields. This includes controlled-Z, SWAP, and $\sqrt{iSWAP}$.

03.
arXiv (CS.CV) 2026-06-17

Two-Stage Fine-Tuning of ResNet50 for High-Sensitivity Melanoma Detection on Dermoscopic Images

作者:

Melanoma is the most dangerous form of skin cancer with five-year survival rates exceeding 99% when detected early but falling sharply once the disease spreads. This paper proposes and evaluates a two-stage fine-tuning approach for ResNet50 applied to binary melanoma classification on dermoscopic images. The core challenges addressed are class imbalance and suboptimal transfer learning from single-stage fine-tuning. After stratified train/validation/test splitting, random oversampling was applied exclusively to the training set to achieve a 1:1 class balance. Stage 1 trained only the classification head with the ResNet50 base frozen, while Stage 2 fine-tuned all layers jointly at a low learning rate of 1e-5 to prevent catastrophic forgetting of learned visual features. On an independent test set of 3,826 images, the model achieved an AUC-ROC of 0.9559, accuracy of 88.34%, sensitivity of 87.56%, specificity of 89.13%, and F1-score of 88.29%. An ablation study confirms the two-stage protocol significantly outperforms single-stage fine-tuning, with sensitivity gains of over 4%. Grad-CAM visualizations demonstrate correct lesion localization. A fully deployable Streamlit detection application is provided alongside all training code.

04.
arXiv (CS.CV) 2026-06-18

Pyramid Self-Contrastive Learning for Single-shot Test-time Ultrasound Image Denoising

The inherent electronic and speckle noise complicates clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions whose validity diminishes under composite noise conditions. Learning-based methods are usually pretrained in a limited image domain using a labeled dataset, which implies inevitable domain shift in complex in vivo environments. This study proposes a Pyramid Self-Contrastive Learning (PSCL) framework for test-time ultrasound image denoising without pretraining. Given multiple noisy samples from only one-shot imaging, PSCL disentangles anatomical similarity and noise randomness into separate pyramid latent spaces. The clean image is then decoded from the anatomy space while discarding the noise space. We first apply PSCL to synthetic aperture ultrasound (SAU), where an Aperture-to-Aperture loop serves as a self-supervised proxy task to ensure denoising fidelity. Simulation experiments, including noise levels from 0 to 30 dB and inclusion geometries from simple to complex, demonstrated improvements of 69.3% in SNR and 34.4% in CNR. The in vivo results showed 84.8% SNR and 25.7% CNR gains using only two aperture data of the heart in six echocardiographic views, liver, and kidney. PSCL delivers clear images across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization without domain shift and pretraining costs.

05.
arXiv (CS.AI) 2026-06-17

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

arXiv:2606.17409v1 Announce Type: cross Abstract: Planar path synthesis requires mechanisms whose coupler curves match a prescribed trajectory; the mapping from curve to linkage is inherently one-to-many across four-, six-, and eight-bar topologies. We address this design problem with simulation-grounded evaluation on a curated corpus of over one million mechanisms, reporting Chamfer distance and dynamic time warping after forward kinematics and geometric alignment. We formulate synthesis as conditional autoregressive sequence modeling: joint coordinates are uniformly quantized to tokens and generated by a decoder-only transformer with a variational-autoencoder (VAE) latent of the target curve and an explicit mechanism-type token. Training combines token cross-entropy with a Gaussian-smoothed bin auxiliary loss that respects ordinal structure among bins. At inference, a bounded latent-noise schedule decodes all mechanism types at each noise level; we retain the top five candidates by geometric error, yielding diverse accurate families without dataset lookup. On held-out tests, aggregate mean Chamfer distance is $0.0132$ and mean dynamic time warping is $0.153$; a latent $k$-nearest-neighbor baseline that conditions on training-set neighbor latents in VAE space achieves matched-topology mean Chamfer distance $0.0071$ and mean dynamic time warping $0.117$ using the same decoder.

07.
arXiv (CS.CL) 2026-06-18

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

Production LLM agents increasingly depend on real-time search, yet native search grounding bundles retrieval policy, provider choice, evidence injection, cost, latency, and generation behavior behind a single model-provider boundary. This coupling makes grounding hard to inspect, tune, reuse, or port, and can trigger Search-Induced Verbosity that breaks strict output contracts. We present Decoupled Search Grounding (DSG), a vendor-agnostic boundary that moves grounding outside the reasoning model through an MCP-compatible gateway, exposing provider routing, source-aware context rendering, configured fallback, retrieval-depth control, and exact plus semantic caching as first-class controls. Across five frontier models on SimpleQA, FreshQA, and HotpotQA, native search leads on recency-sensitive FreshQA, but DSG exposes a stronger frontier when control matters: on SimpleQA it nearly matches native accuracy (86.1% vs. 87.7%) at 91% lower search cost, preserves concise answer contracts, and reaches a 99.4% warm-cache hit rate with 68% lower latency. Deployed as a shared production grounding layer for large-scale agentic workloads with interchangeable models, DSG matches or slightly exceeds native-search accuracy on an e-commerce query-understanding (QIU) workload while cutting search cost by over 98%. Real-time grounding is best treated as an optimizable interface boundary, not a fixed model feature.

08.
PLOS Computational Biology 2026-06-22

Towards modeling phage therapy

by Rob J. de Boer, Robert Schooley, Alan S. Perelson Patients infected with life-threatening multi-drug resistant (MDR) bacteria have been treated with cocktails of bacteriophages. This is a complicated form of personalized medicine as the phages given to a patient have to be selected beforehand on the basis of their lytic capacity of the infecting bacteria. Because bacteria rapidly become resistant, the evolution of resistance to a diverse cocktail of phages is a complicated dynamical process, during which competing bacterial strains replace one another by accumulating several resistance mechanisms, each of which may involve a fitness cost. As a consequence, it is typically not known why a particular phage therapy succeeded or failed, and how one can optimize the composition of the cocktails to maximize the rate of success. To improve upon this, we extend an existing in vivo-calibrated mouse model into a novel mathematical model for the human situation, and include multiple phages infecting multiple bacterial strains, differing in their resistance to each of the phages. We adjust several parameter estimates of the bacterial model to the human situation, and use the model to describe a successful case of phage therapy involving several cocktails, each containing several phages. In the model, treatment success crucially depended on pretreatment resistance levels, and on the diversity and the timing of the cocktails. Once an appropriate cocktail is found, it is less important to further optimize the infection rates of the phages. Resistant bacterial strains expand rapidly when sensitive strains decline, and the higher the infectivity of the phages, the faster resistant strains expand. Because resistance evolves rapidly, it is best to provide a diverse set of phages right from the start of therapy, i.e., to hit hard and early, and create a high genetic barrier to bacterial resistance.

09.
arXiv (CS.AI) 2026-06-15

Aligning Quantum Operators with Large Language Models

arXiv:2606.13811v1 Announce Type: cross Abstract: Can Large Language Models (LLMs) understand and reason about quantum operators? Despite their remarkable capabilities in mathematics and symbolic reasoning, LLMs remain inherently blind to quantum representations such as unitary matrices. In this work, we take a step toward bridging this gap by introducing an approach that maps unitary operators into the latent space of an LLM, enabling unified modeling over quantum and linguistic inputs. We instantiate this idea on Clifford+T circuit synthesis over a Pauli rotation gate set, where our model achieves results competitive with state-of-the-art methods and scales consistently with training data, with no signs of saturation. Our approach further enables language-conditioned synthesis, allowing gate constraints unseen during training to be specified directly in natural language. This work suggests a path toward quantum–aware foundation models that can natively interpret and reason about quantum operations, which could have broader implications reaching across quantum compilation and algorithm discovery.

10.
medRxiv (Medicine) 2026-06-22

Between Patterns and Predictions: Interpretable Latent EEG Representations for Clinical Insights

Electroencephalography (EEG) captures rich brain dynamics, yet in clinical practice this complexity is often reduced to simplified summaries or categorical labels, limiting its interpretability for decision-making. We tested the hypothesis that a pretrained latent embedding framework, the Universal Map of EEG (UM-EEG), can preserve clinically meaningful structure across heterogeneous datasets and provide a generalizable representation of brain states. We applied UM-EEG, without retraining, to three independent cohorts spanning distinct clinical contexts: long-term EEG recordings from cardiac arrest patients (n = 576), subarachnoid hemorrhage (n = 100), and routine clinical EEG recordings containing physiological and pathological patterns (n = 141). EEG segments were projected into a shared 128-dimensional space anchored by expert-derived reference states, including wakefulness, sleep stages, ictal-interictal continuum activity, and burst suppression. Across datasets, favorable outcome or physiological recordings were consistently located closer to healthy reference states, whereas poor outcome and pathological recordings shifted toward pathological regions of the embedding space. Trajectory-derived geometric and temporal features discriminated outcome in cardiac arrest (ROC-AUC 0.83) and subarachnoid hemorrhage (ROC-AUC 0.76), and distinguished physiological from pathological routine EEGs (ROC-AUC 0.93). In routine EEG, similarity relationships derived from embedding trajectories correlated with those derived from structured clinical reports, indicating that the latent space recapitulates clinically relevant organization. These findings show that a fixed, semantically structured EEG embedding generalizes across etiologies and recording settings, enabling prognostic stratification and contextual interpretation while preserving the relational structure of brain states.

11.
arXiv (CS.CV) 2026-06-18

HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling

Visual Autoregressive (VAR) models have recently demonstrated impressive image generation quality while maintaining low latency. However, they suffer from severe KV-cache memory constraints, often requiring gigabytes of memory per generated image. We introduce HeatKV, a novel compression method that adapts cache allocation in each head based on its attention to previously generated scales. Using a small offline calibration set, the attention heads are ranked according to their attention scores over prior scales. Based on this ranking, we construct a static pruning schedule tailored to a given memory budget. Applied to the Infinity-2B model, HeatKV achieves $2 \times$ higher compression ratio in memory allocation for KV cache compared to existing methods, while maintaining similar or better image fidelity, prompt alignment and human perception score. Our method achieves a new state-of-the-art (SOTA) for VAR model KV-cache compression, showcasing the effectiveness of fine-grained, head-specific cache allocation. Code and calibration script available at https://github.com/arm-research/heatkv.

12.
medRxiv (Medicine) 2026-06-16

Reporting patterns of adverse drug withdrawal events using individual case safety reports in United States and European databases

Introduction: Adverse drug withdrawal events (ADWEs) are a key safety concern with deprescribing but are infrequently reported in trials. Although pharmacovigilance systems have advanced our understanding of medication-related harms, it is unclear how extensively these systems have been used for ADWEs. Objectives: To examine the reporting patterns of ADWEs for all drugs recorded in United States and European pharmacovigilance databases between 2004 and 2023. Methods: A retrospective study was conducted using two pharmacovigilance databases, the publicly available FDA-FAERS dataset and EMA-EV Level 2A (individual-level) dataset. ADWE cases were identified using relevant MedDRA preferred terms. Data on patient characteristics, reporter type, drugs, indication, ADWE outcomes, dechallenge/rechallenge, seriousness criteria, time to onset, duration, and causality were summarised. Results: A total of 158,505 ADWE reports were analysed (FDA-FAERS: 145,514; EMA-EV: 12,987), with mean ages of 46.1 (FDA; 55.3% female) and 45.5 years (EMA; 57.1% female). The frequently reported drug classes were opioids (FDA: oxycodone, 29.8%; EMA: buprenorphine, 19%), antidepressants (FDA: duloxetine, 32%; EMA: venlafaxine, 25.9%) and gabapentinoids (FDA: pregabalin, 6.7%; EMA: pregabalin, 6.0%). The most common adverse outcomes were other serious medical conditions (FDA=63.9%; EMA=46.0%), hospitalisation (FDA=15.9%; EMA=28.3%), and disability (FDA=13.3%; EMA=6.2%) and these outcomes varied significantly based on sex and age group (p

13.
arXiv (CS.CL) 2026-06-18

Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance

The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication. Physician assistance instead requires coordinating these capabilities within the same interaction, where physicians issue underspecified requests, patients describe symptoms ambiguously, and EHR systems demand precise tool use. We introduce PhysAssistBench, a benchmark for interactive doctor-patient-EHR assistance. Built from real MIMIC-IV cases, PhysAssistBench uses a scalable pipeline to construct agentic patients: interactive, record-grounded agents that turn static EHR records into multi-turn clinical scenarios while preserving clinical factuality. PhysAssistBench provides a curated bilingual evaluation set of 1,296 manually reviewed and physician-validated turns. Experiments with leading LLMs show that current models remain unreliable in this setting, which exposes a key bottleneck for clinical LLMs: reliable assistance requires coordination across knowledge, communication, and systems, not isolated gains in any of them.

14.
arXiv (CS.CL) 2026-06-19

Light-weight Pronunciation Assessment via Discrete Speech Token Surprisal

Training automated pronunciation assessment often relies on labeled learner errors or non-native corpora that are costly to collect. We propose a lightweight framework trained only on native speech resources, operating unsupervised or lightly calibrated with a small set of scored utterances. At inference, learner speech is discretized with an SSL encoder and a K-means codebook. A token language model trained on native sequences computes surprisal where higher surprisal indicates phonotactic deviation. We add a transcript-guided Text2DUnit–DTW module that predicts native token sequences from reference text and aligns them to acoustic tokens to derive error-sensitive features. Surprisal and alignment features are fused via simple regression. On SpeechOcean762, PCC improves from 0.60 to 0.66 with transcript guidance, near supervised baselines. Cross-dataset evaluation on L2-ARCTIC shows consistent gains.

15.
medRxiv (Medicine) 2026-06-18

The relationship between serotonin transporter occupancy and extracellular serotonin concentration is hyperbolic, not linear: implications for safely tapering antidepressants

Background: Hyperbolic tapering is an increasingly recognized approach for discontinuing serotonin reuptake inhibitor (SRI) antidepressants that involves non-linear dose reductions with equal stepwise reductions in serotonin transporter (SERT) occupancy to mitigate withdrawal symptoms. Its theoretical basis is the hyperbolic relationship between SRI dose and SERT occupancy reported in radioligand imaging studies. Hyperbolic tapering implicitly assumes that changes in SERT occupancy approximate changes in biologic effect and withdrawal risk. Because SERT occupancy plateaus across the therapeutic dose range of SRIs, this framework predicts relatively small biologic effects and withdrawal risk within this range. However, SERT occupancy influences serotonergic activity only indirectly via its effects on extracellular serotonin concentrations, and the relationship between these two variables is poorly characterized. Methods: We developed a two-pathway clearance model derived from mass-action kinetics to evaluate the steady-state relationship between SERT occupancy and extracellular serotonin concentrations under chronic SRI treatment. Results: Our analysis indicates that serotonin concentrations increase hyperbolically as transporter occupancy increases, suggesting that biologically meaningful differences in serotonergic signaling persist across the therapeutic dose range of SRIs despite plateauing occupancy. Conclusions: Our model predicts a hyperbolic relationship between SERT occupancy and extracellular serotonin concentrations, suggesting that changes in occupancy may not map proportionally onto serotonergic effect. These findings provide a potential mechanistic explanation for dose-dependent clinical effects of SRIs despite plateauing transporter occupancy and generate testable hypotheses regarding antidepressant tapering strategies. Empirical validation is warranted.

16.
bioRxiv (Bioinfo) 2026-06-22

From hotspot dependence to distributed robustness in resistance-aware lead optimization

Drug resistance remains a recurrent failure mode in targeted anticancer and antiviral therapy, and resistance evidence often enters only after compound selection. ResistAgent is an evidence-constrained framework that converts mutational liabilities into design-time objectives through site- and combo-aware resistance mapping, deterministic mechanism diagnosis and robust counter-design. In EGFR-Erlotinib and HIV-RT-Rilpivirine, the framework separated residue-level liabilities from observed HIV combination liabilities and linked prioritized mutations to anchor loss, pocket rearrangement, electrostatic shifts and contact redistribution. Same-budget paired searches showed that robust objectives changed lower-tail mutant-panel behavior and interaction-dependence profiles while prioritizing robustness over average-affinity behavior. Under predefined liability panels, selected robust-best trajectories shifted support away from mutable hotspot contacts toward more distributed interaction networks. Supplementary physical summaries and ranking-first benchmarks support the scope of this resistance-aware design strategy while preserving clear boundaries for prospective validation.

17.
arXiv (CS.CV) 2026-06-17

SierpinskiCam: Camera-Controlled Video Retaking with Sierpinski Triangle Pattern Cues

Generating novel renderings of a scene along user-defined camera trajectories from a single monocular video, dubbed video retaking, is a compelling but difficult problem in content creation and visual effects. Existing geometry-guided approaches reconstruct a 4D representation from the source video and render it along the target trajectory to condition video diffusion models. However, this guidance degrades as the target camera departs from the source trajectory, leaving newly revealed regions sparse or entirely missing. We propose SierpinskiCam, which addresses this limitation by augmenting geometry-based guidance with Sierpinski dome texture cues that contains rich trackable features even under large viewpoint changes. We further introduce a reference video conditioning mechanism that appends source-video tokens to the target-token sequence and separates the two streams with negative RoPE indices, enabling appearance grounding without architectural modification or per-video adaptation. Extensive experiments show that SierpinskiCam achieves significant gains in camera controllability, geometric consistency, and video quality across diverse and challenging retaking scenarios. Project page: https://hyelinnam.github.io/SierpinskiCam/.

18.
arXiv (CS.CV) 2026-06-16

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

Remote sensing vision-language models have advanced Earth observation understanding, but most existing work remains centered on RGB imagery, leaving the complementary information in infrared data underexplored. Infrared images provide distinctive cues, including thermal intensity structures, object boundaries, and illumination-invariant scene features, which can enrich visual-language learning beyond conventional RGB observations. However, a large-scale RGB-infrared-text dataset for remote sensing vision-language modeling is still absent. To address this gap, we introduce FusionRS, the first large-scale RGB-infrared-text dataset designed for dual-modal vision-language learning in remote sensing. FusionRS is constructed by translating diverse public RGB remote sensing images into infrared-style counterparts, forming aligned RGB-IR image pairs. Each pair is associated with conventional scene captions and IR-aware captions that explicitly describe infrared-specific visual properties while preserving semantic content. Based on FusionRS, we train dual-modal vision-language foundation models for RGB-IR joint understanding. We first train CLIP-style models for RGB-IR-text alignment, and then fine-tune generative VLMs for dual-modal RGB-IR captioning. Experiments show that FusionRS improves RGB-IR alignment, infrared-to-text retrieval, and dual-modal captioning over RGB-only and non-IR-aware training settings. Ablation studies further verify that IR-aware captions are crucial for strengthening infrared-language alignment, highlighting the importance of modality-specific textual supervision for more scalable RGB-infrared remote sensing vision-language representation learning.

19.
arXiv (math.PR) 2026-06-24

REM universality and Poisson-Dirichlet Gibbs weights for linear random energy

arXiv:2606.07757v2 Announce Type: replace Abstract: We study the Hamiltonian $H_n(h,\sigma)=\sum_{i=1}^n h_i(\sigma_i-m), $ where $(h_i)$ are i.i.d.\ real random variables and $(\sigma_i)$ are i.i.d.\ Ising spins. We consider the energy levels obtained after an independent thinning that retains an exponential number of configurations ($e^{O(n)}$). We prove that, after an $(h_i)$-dependent centering, the resulting point process converges in distribution to a Poisson point process with exponential intensity. Thus, the energy levels asymptotically has the one of the Random Energy Model (REM). Our results extend previous ones, where REM universality for this model was established only either for energy fluctuations of order $e^{-O(n)}$ or for $e^{o(\sqrt n)}$ randomly selected configurations. We also identify the limiting Gibbs weights, which converge to a Poisson–Dirichlet law, and the quenched free energy, which exhibits a freezing transition at $\beta=\tilde\lambda$. The proofs are presented here in compressed form; full details are given in the companion preprint.

20.
arXiv (CS.AI) 2026-06-16

ROSA-RL: Uncertainty-Aware Roundabout Optimized Speed Advisory with Reinforcement Learning

arXiv:2606.16558v1 Announce Type: new Abstract: Roundabouts challenge automated driving in mixed traffic, as heterogeneous and non-deterministic human behavior, unknown driving intentions, and high interaction complexity create uncertainty about whether the conflict zone will be blocked or available at the moment of entry. We present ROSA-RL – uncertainty-aware Roundabout Optimized Speed Advisory with Reinforcement Learning. It enables safe and efficient roundabout entry for automated and human-driven vehicles in mixed traffic through probabilistic conflict forecasting. A Transformer-based model predicts conflict zone occupancy over a five-second horizon, capturing multi-agent interactions to anticipate upcoming conflicts and available gaps. The prediction outputs encode uncertainty in future motion and intent, and augment the state of a classical RL framework, enabling uncertainty-aware speed coordination. Evaluated in simulations grounded in real-world data, ROSA-RL can effectively handle uncertainty and outperform a comparable model-based baseline, closing the gap to an ideal setting assuming fully known occupancy while improving traffic efficiency and safety. The source code of this work is available under: github.com/urbanAIthi/ROSA-RL.

21.
arXiv (CS.LG) 2026-06-18

Stochastic Adaptive Gradient Descent Without Descent

arXiv:2509.14969v2 Announce Type: replace Abstract: We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically-grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.

22.
arXiv (CS.CL) 2026-06-17

Adaptive Activation Steering for Efficient LLM Reasoning via Closed-Loop PID Control

Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy. Static activation steering (e.g.\ SEAL) suppresses such content with a fixed vector, but applies the same strength regardless of how redundant the current chunk actually is. We describe PID-steering, a training-free, decoding-time method that modulates the steering strength with a PID controller driven by a lightweight chunk-level redundancy classifier. On a subset of GSM8K with DeepSeek-R1-Distill-Qwen-1.5B, the method improves accuracy from 85.7\% to 89.6\% (+3.9 pp) while cutting average output length from 1026 to 790 tokens ($-$23\%). We report it as a small-scale proof of concept rather than a benchmark result.

23.
Nature Biotechnology 2026-06-19

Optimized R2 retroelement complexes for DNA insertion into plant genomes

Traditional approaches for DNA insertion into plant genomes using Agrobacterium tumefaciens result in random integration. Newer genetic engineering methods based on nucleases, prime editors, transposases and recombinases extend capabilities but remain constrained with low efficiencies, off-target integration or limited payload size. Here we adapt the avian Taeniopygia guttata R2 protein (R2Tg) for targeted DNA insertion into plant genomes by engineering R2Tg expression cassettes and RNA payloads carrying intron-disrupted reporters, with optimized ribosomal DNA homology arms and untranslated regions. In Arabidopsis thaliana protoplasts, Nicotiana benthamiana leaves and Solanum lycopersicum seedlings, our R2Tg editor system achieves targeted insertion of full-length payloads ranging from 2.2 kb to 5 kb. In Nicotiana benthamiana leaves, integration occurs, on average, at 1 copy per genome, which is 30 times more efficient than that achieved by Cas9 homology-directed repair. This work establishes an R2Tg ribonucleoprotein platform for targeted DNA insertion into plant genomes, using a multicopy genomic safe-harbor site to enable efficient addition of multikilobase genes. R2 retrotransposons are used to integrate DNA into plant and crop 25S ribosomal DNA sites.

24.
arXiv (CS.CV) 2026-06-24

An LMM for Precisely Grounding Elements in Documents

Visual grounding in documents is a crucial ability for Large Multimodal Models (LMMs) in areas such as document understanding, deep research and document error detection. However, existing approaches exhibit poor grounding precision in text-rich document images, often failing to accurately locate the critical document elements needed for reliable reasoning. To address this gap, we introduce PreciseDoc, an LMM specifically designed for precise element grounding and can be further optimized for Document VQA tasks. Specifically, to enhance the basic localization capability, we construct challenging training data by two pipelines capable of mass-producing high-quality documents with paired metadata of fine-grained coordinates, including synthetic hand-filled documents with camera effects. The model develops more real-world functions beyond straightforward localization of single text, such as locating personal information from CVs. Furthermore, we introduce a training paradigm for visual grounded reasoning where the grounding and reasoning are supervised jointly with reinforcement learning to improve the contribution of the grounded evidence. A comprehensive evaluation on various benchmarks demonstrates the advantage of the proposed data and methods in document spatial grounding and document understanding.

25.
arXiv (CS.LG) 2026-06-16

A polarity-aware multi-relational model for the signed interaction prediction in biological networks

arXiv:2407.07357v3 Announce Type: replace Abstract: Predicting signed interactions in biological networks is crucial for understanding drug mechanisms and facilitating drug repurposing. While deep graph models have demonstrated success in modeling complex biological systems, existing approaches often fail to distinguish between positive and negative interactions, limiting their utility for precise pharmacological predictions. In this study, we propose a novel deep graph model, PAMR (polarity-aware multi-relational model), designed to predict both polar (e.g., activation, inhibition) and non-polar (e.g., binding, affect) chemical-gene interactions. Our model integrates graph convolutional networks with tensor decomposition to enhance feature representation and incorporates a conflict-aware sampling strategy to resolve polarity ambiguities. We introduce new evaluation metrics, polarity discrimination score (PDS) and CP@100, to assess the model's ability to differentiate interaction types. Experimental results demonstrate that PAMR outperforms baseline models, achieving superior classification accuracy and improved discrimination of polar edges. Specifically, PAMR-CL attains a Macro AUROC of 0.9072 and CP@100 of 0.974, surpassing RGCN, GraphSAGE, TransE, and BioNet baselines. A case study on nicotine further identifies two novel chemical-gene suppression links, S100A6 and SPP1, that are corroborated by independent experimental literature. Furthermore, we analyze the impact of subgraph components on predictive performance, revealing that additional network structures do not always enhance accuracy. These findings highlight the importance of polarity-aware modeling in drug discovery and network pharmacology, providing a scalable computational framework for polarity-aware chemical-gene interaction prediction and network pharmacology analysis.