Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-15

Verbatim Chunks Beat Extracted Artifacts: A Controlled Ablation of Memory Representations for Long LLM Conversations

作者:

A growing class of conversational-memory systems compresses dialogue history into structured artifacts – extracted facts, decisions, or events – on the premise that distilled structure retrieves better than raw text. We test this premise with a controlled ablation: within one fixed retrieval-rerank-reasoning pipeline, we swap only the stored representation – LLM-extracted typed artifacts versus verbatim conversation chunks – holding the model, retriever, reranker, and judge constant. Verbatim chunks win by 15.9 points on LoCoMo (43.9% vs. 28.0%) and 22.0 points on LongMemEval-S (67.4% vs. 45.4%); a 1-hop semantic graph does not recover the gap, and five confound controls reproduce the effect. The mechanism is lossy distillation: extraction discards verbatim detail that chunks retain for free, and the extracted-artifact pipeline never beats naive RAG in overall accuracy. Concurrent positive results with near-verbatim, provenance-preserving units fit the same account: retrieval accuracy tracks how far the representation departs from the source. For the extraction designs we test, structured memory should augment verbatim text rather than replace it: a chunks $\cup$ artifacts union store matches chunks on both benchmarks while artifacts alone forfeit the gap. Code and data: https://github.com/tao-hpu/cog-canvas

02.
arXiv (CS.LG) 2026-06-15

Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction

arXiv:2606.14159v1 Announce Type: new Abstract: Protein-ligand binding affinity (PLA) prediction is critical in drug discovery. Despite the notable advancements in machine learning-based approaches, existing methods struggle to jointly characterize local geometric organization and globally coordinated cross-molecular interactions, limiting their ability to model complex binding mechanisms. Here, we propose RicciBind, a geometric representation framework that integrates curvature-guided hierarchical structure learning with optimal transport (OT)-based cross-domain alignment to model molecular interactions. Specifically, RicciBind leverages Ricci curvature to capture local interaction tightness within molecular structures, enhancing structural awareness and organizing atomic interactions into curvature-aware hierarchical representations. An OT-based cluster matching mechanism then aligns protein and ligand clusters across heterogeneous domains under geometric constraints, enabling globally consistent correspondences and revealing higher-order interaction patterns beyond local neighborhoods. By coupling curvature-guided structure encoding with OT-driven cross-domain alignment, RicciBind effectively models complex interaction semantics and substantially improves both the accuracy and interpretability of binding affinity prediction. Extensive experiments demonstrate that RicciBind achieved superior predictive performance and generalization across PLA benchmarks and virtual screening tasks. Ablation studies further confirmed the essential role of Ricci curvature in enhancing molecular interaction representations.

03.
arXiv (CS.CL) 2026-06-11

The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales

Spoken language, whether produced by humans or large language models (LLM), unfolds over time with varying semantic content. However, we still lack simple, interpretable time-series features that capture how generic versus specific content is distributed over time, and that can be used to compare human and AI-generated speech. We introduce a semantic-timescale analysis pipeline that turns word-level transcripts with timestamps into semantic time-series. For each spoken narrative, we compute (i) semantic specificity using WordNet-based word depth and (ii) contextual similarity using SBERT embeddings and quantify their temporal dependence using autocorrelation-window measures (ACW-0 and related metrics). We then compare original speech to multiple shuffled controls that selectively disrupt lexical identity, temporal order, and word duration. Across human-read autobiographical narratives, TTS readings, and LLM-generated texts rendered with TTS, we find that segments with longer ACW-0 in the semantic time-series tend to contain more generic vocabulary, whereas segments with shorter ACW-0 are enriched in more specific words. These associations are strongly attenuated or abolished when word order and timing are randomized, indicating that ACW-based measures capture non-trivial temporal organization of semantic content beyond static lexical distributions. Our results suggest that ACW-based semantic timescales are a useful family of features for analyzing and comparing the temporal structure of human and AI-generated speech.

04.
arXiv (quant-ph) 2026-06-16

Weak continuous measurements require more work than strong ones

arXiv:2502.09732v4 Announce Type: replace Abstract: Understanding the energy cost of quantum measurement process and its connection to the measurement performance faces the challenge of modeling the objectification process. The latter, turns the measurement result into an objective fact, available to independent observers, and is responsible for the measurement irreversibility. To address this issue, we propose and analyze a dynamical model of quantum measurement, able to capture nonideal (weak and inefficient) measurements. In this model, the objectification is induced by a contact with a macroscopic reservoir at equilibrium which is responsible for the redundant broadcast of the measurement outcome (producing a Spectrum Broadcast Structure (SBS) state) while inducing decoherence in the pointer basis, in the line of the theory of quantum Darwinism. We analyze the performance of the obtained measurement process by introducing figures of merit to quantify the strength of the measurement and its efficiency. We also derive and a lower bound on the measurement work cost that we can relate to the measurement quality. We take as an illustration the readout of a qubit via its coupling to a harmonic oscillator. We investigate the long sequences of extremely short and weak measurements (a.k.a continuous measurements), to find under which conditions they converge to an ideal (projective) measurement and analyze their work cost. Surprisingly, we find that a sequence converging to projective measurement has a much larger work cost than an equivalent strong measurement obtained from a single intense interaction with the apparatus. We extend this result to a large class of models owing to scaling arguments. Our analysis offers new insights into the trade-offs between measurement strength, energy consumption, and information extraction in quantum measurement protocols.

05.
arXiv (CS.CV) 2026-06-16

Comparing Human Gaze and Vision-Language Model Attention in Safety-Relevant Environments

Human visual attention plays an important role in how people perceive and respond to environments containing potential risks. This study investigates whether large vision-language models can identify the same regions of a scene that attract human attention in safety-relevant environments. Eye-tracking data were collected from ten participants viewing 33 scene images representing environments with varying levels of potential risk using Pupil Invisible wearable glasses. Gaze coordinates were mapped onto stimulus images to generate population-averaged human gaze heatmaps. In parallel, GPT-4o was prompted through the OpenAI Vision Application Programming Interface (API) to generate spatial predictions of visual attention, which were converted into saliency maps for comparison with human gaze patterns. Spatial alignment between human gaze heatmaps and model-generated saliency maps was evaluated using four complementary metrics: Pearson correlation (r = 0.515 +- 0.117), Normalised Scanpath Saliency (NSS = 0.988 +- 0.323), Kullback-Leibler divergence (KL = 1.766 +- 0.844), and Area Under the Receiver Operating Characteristic Curve using the Judd formulation (AUC-Judd = 0.806 +- 0.076). A cross-model comparison with Gemini Pro, Gemini Flash, and Claude showed that all models exceeded the AUC-Judd chance baseline of 0.5 and achieved positive NSS scores. Gemini Pro demonstrated the strongest spatial localisation according to three of the four metrics, whereas GPT-4o produced the closest distributional match to human attention as measured by KL divergence. These findings suggest that large vision-language models can identify regions that broadly correspond to where humans direct visual attention in safety-relevant scenes without requiring eye-tracking training data. The results highlight the potential of vision-language models as a scalable tool for approximating human attentional patterns.

06.
arXiv (CS.AI) 2026-06-16

EMS: Multi-Agent Voting via Efficient Majority-then-Stopping

arXiv:2604.02863v2 Announce Type: replace Abstract: Majority voting is the standard for aggregating multi-agent responses into a final decision. However, traditional methods typically require all agents to complete their reasoning before aggregation begins, leading to significant computational overhead, as many responses become redundant once a majority consensus is achieved. In this work, we formulate efficient multi-agent voting as a reliability-aware agent scheduling problem and propose Efficient Majority-then-Stopping (EMS) to improve reasoning efficiency. EMS first estimates a Task-Conditioned Reliability Ordering (TCRO) for each agent by retrieving its historical consensus evidence on semantically similar queries, and then invoking agents in descending reliability order. Next, Adaptive Incremental Voting (AIV) terminates the process once the current leading answer cannot be overturned by any possible votes from the remaining agents, and returns this answer. Finally, Reliability History Updating (RHU) updates only the invoked agents according to their consensus with the final decision. Extensive evaluations across five benchmarks show that EMS preserves the accuracy of Majority Voting while reducing the average number of invoked agents by 35% and token consumption by 44%, respectively. The code is available at https://github.com/fuyu66/EMS.

07.
arXiv (CS.AI) 2026-06-18

SHIFT: Semantic Harmonization via Index-side Feature Transformation for Multilingual Information Retrieval

arXiv:2606.18801v1 Announce Type: cross Abstract: With the rapid expansion of massive multilingual corpora, Multilingual Information Retrieval (MLIR) has emerged as a critical technology for global information access. MLIR enables users to retrieve semantically relevant documents from multilingual text collections using a single-language query. However, recent multilingual dense retrieval models often exhibit a strong preference for documents in the same language as the query. This leads to severe language bias, where top-ranked results are dominated by documents of specific languages, even when documents in other languages contain more semantically relevant information. To address this issue, we propose SHIFT, a training-free method applicable in the indexing stage. Specifically, SHIFT utilizes parallel translation pairs to estimate a relative language vector for each target language with respect to a source language. Subsequently, SHIFT corrects the language-specific offset by subtracting this relative language vector from document embeddings during indexing. Our comprehensive evaluation across four MLIR benchmarks and diverse dense retrieval models confirms that SHIFT can effectively mitigate language bias and enhance MLIR performance.

08.
arXiv (CS.LG) 2026-06-19

Weighted Bayesian Conformal Prediction

arXiv:2604.06464v2 Announce Type: replace Abstract: Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell \& Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose Weighted Bayesian Conformal Prediction (WBCP), which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\Dir(1,\ldots,1)$ with a weighted Dirichlet $\Dir(\neff \cdot \tilde{w}_1, \ldots, \neff \cdot \tilde{w}_n)$, where $\neff$ is Kish's effective sample size. We prove four theoretical results: (1)~$\neff$ is the unique concentration parameter matching frequentist and Bayesian variances; (2)~posterior standard deviation decays as $O(1/\sqrt{\neff})$; (3)~BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4)~the HPD threshold provides $O(1/\sqrt{\neff})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as Geographical BQ-CP, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.

09.
bioRxiv (Bioinfo) 2026-06-11

A quantitative coordinate system for developmental dynamics

Quantitative comparison of morphogenesis across individuals remains a fundamental challenge, as developing embryos vary in shape, orientation and developmental tempo. Moreover, real-time three-dimensional imaging generates large, heterogeneous four-dimensional datasets that are difficult to directly align. As a result, developmental variability is typically described qualitatively rather than measured. Here we introduce STERN, a quantitative framework that learns continuous spatiotemporal representations of morphogenesis directly from in vivo 4D imaging data. By embedding embryos into a shared spatiotemporal space, STERN defines a quantitative developmental coordinate system that enables direct comparison of developmental trajectories across individuals without requiring explicit registration or staging. Applied to mouse embryogenesis, STERN reveals that embryos follow conserved developmental trajectories while progressing at distinct temporal rates, providing a quantitative measure of developmental heterochrony. Extending this framework to zebrafish neural crest light-sheet timelapse imaging, we further show that developmental order is preserved across distinct imaging views even with altered anatomical coverage, supporting the generality of the learned representation across vertebrate imaging contexts. Finally, in developing mouse hearts, where morphogenesis proceeds through subtle and continuously evolving structural changes, STERN resolves fine-scale developmental dynamics at minute-scale temporal resolution that are difficult to localize reproducibly using human experts or general-purpose multimodal AI. Together, these results establish a shared quantitative coordinate system for morphogenesis, in which developmental trajectories become directly comparable across individuals and developmental variability becomes a measurable property.

10.
arXiv (CS.CV) 2026-06-15

Pix2Fact: When Vision Is Not Enough – Benchmarking Fine-Grained VQA with Web Verification on High-Resolution Real-World Scenes

Despite progress on general tasks, vision-language models (VLMs) still struggle with challenges that demand both fine-grained visual grounding and external knowledge, a synergy overlooked by existing benchmarks that evaluate these abilities in isolation. To fill this void, we introduce Pix2Fact, a visual question-answering benchmark designed to assess expert-level visual perception and knowledge search. Pix2Fact comprises 1,000 high-resolution (4K+) images spanning eight scenarios. Its questions and answers are meticulously crafted by PhD-holding annotators from top global universities across diverse disciplines. Each question requires detailed visual grounding and the integration of external knowledge. Evaluating ten state-of-the-art VLMs, including proprietary models such as Gemini-3.1-Pro and GPT-5.4, we find that Pix2Fact poses a formidable challenge: the most advanced model (Gemini-3.1-Pro) achieves only 51.7% average accuracy, even with access to visual ground truth and search tools. Our analysis attributes this low accuracy to three factors, frequent visual grounding errors even with visual ground truth, shallow search harnessing, and VLM's inability to retrieve long-tail, unstructured local information. This striking gap exposes the limitations of current models in assisting humans with real-world scenarios that demand overwhelming visual comprehension. We believe Pix2Fact will serve as a critical benchmark to drive the next generation of language-vision agents that seamlessly integrate fine-grained perception with robust knowledge search.

11.
arXiv (quant-ph) 2026-06-17

Quantum Chip Paradigm Framework

arXiv:2606.17899v1 Announce Type: new Abstract: Quantum Electronic Design Automation (Q-EDA) is emerging as quantum chips move from laboratory prototypes to scalable engineering systems. This paper argues that superconducting quantum chip design is approaching a "SPICE moment" similar to early classical EDA, where growing qubit scale, control complexity, frequency planning, packaging, process variation, and cryogenic measurement feedback require a shift from experience-based design to model-driven engineering. We propose a Quantum Chip Paradigm Framework that treats Q-EDA not only as software, but as part of the quantum chip development paradigm. Unlike classical HDL-first design, quantum chip design must begin with physical structures such as Josephson junctions, resonators, couplers, readout elements, control lines, and packaging environments. The framework emphasizes PCell-based modeling, SPICE-Q simulation, Quantum PDKs, and design-technology-measurement co-optimization. We further outline a hierarchical Q-EDA system spanning physical structures, qubit PCells, logical qubits, quantum arithmetic, functional quantum IP, and Quantum SoC systems. The key goal is to turn physical models, layout rules, simulation results, fabrication data, and measurement feedback into reusable and auditable engineering objects for large-scale quantum processors and fault-tolerant quantum computing.

12.
arXiv (CS.AI) 2026-06-16

Semantics-Enhanced Retrieval-Augmented Time Series Forecasting

arXiv:2606.14941v1 Announce Type: new Abstract: Time series forecasting models often benefit from historical patterns. Inspired by Retrieval-Augmented Generation (RAG), recent research explored retrieving relevant historical time series segments to enhance forecasting. However, relying solely on time series similarity is often insufficient for retrieval under non-stationarity. To address this, we propose a multimodal approach: a Semantics-Enhanced Retrieval-Augmented Time Series Forecasting framework, SERAF. Unlike mainstream approaches that depend only on time series similarity, SERAF conducts dual retrieval over the time series and their self-generated textual descriptions. It retrieves two complementary sets of historical patterns and corresponding futures, which are selectively and jointly used to guide future predictions. Experiments across seven real-world datasets demonstrate the effectiveness of SERAF in bridging numerical and semantic views of time series compared with state-of-the-art baselines.

13.
arXiv (quant-ph) 2026-06-11

TensorKit.jl: A Julia package for large-scale tensor computations, with a hint of category theory

arXiv:2508.10076v2 Announce Type: replace-cross Abstract: TensorKit$.$jl is a Julia-based software package for tensor computations, especially focusing on tensors with internal symmetries. This paper introduces the design philosophy, core functionalities, and distinctive features, including how to handle abelian, non-abelian, and anyonic symmetries through the ``TensorMap'' type. We highlight the software's flexibility, performance, and its capability to extend to new tensor types and symmetries, illustrating its practical applications through select case studies.

14.
Nature (Science) 2026-06-08

GPR15-guided CD8<sup>+</sup> T regulatory cells control intestinal inflammation

作者:

Inflammatory bowel disease (IBD) causes chronic suffering from gastrointestinal inflammation and dysfunction that can progress to colon cancer1,2. The disease prevalence is increasing and there is an urgent need to better understand its pathogenic mechanisms to improve treatment. We show that GPR15, a G protein-coupled receptor (GPCR) expressed in immune cells and previously described as an entry co-factor for human and simian immunodeficiency viruses3, is a marker and homing receptor for a subset of intramucosal GPR15-guided regulatory CD8+ T lymphocytes (CD8+ TIGR). Deleterious GPR15 gene variants in humans cause defective homing of CD8+ TIGR and are associated with severe early-onset IBD. Moreover, CD8+ TIGR cells are reduced in the intestinal mucosa of sporadic IBD patients. In mice, GPR15 deficiency impairs colonic homing of CD8+ TIGR cells, leading to accumulation of inflammatory macrophages and increased susceptibility to colitis. CD8+ TIGR cells potently kill macrophages activated by intestinal damage or disease using Fas ligand (FasL) and TNF-related weak inducer of apoptosis (TWEAK). The identification of CD8+ TIGR cells yields new insights into organ-specific immune regulation and potential therapeutics for IBD.

15.
arXiv (CS.AI) 2026-06-11

Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents

arXiv:2604.13733v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) enables high-frequency, closed-loop control for robotic manipulation, but scaling to long-horizon tasks with sparse or imperfect rewards remains difficult due to inefficient exploration and poor credit assignment. Vision-Language-Action (VLA) models leverage large-scale multimodal pretraining to provide generalist, task-level reasoning, but current limitations hinder their direct use in fast and precise manipulation. In this paper, we propose Vision-Language-Action Jump-Starting (VLAJS), a method that bridges sparse VLA guidance with on-policy RL to improve exploration and learning efficiency. VLAJS treats VLAs as transient sources of high-level action suggestions that bias early exploration and improve credit assignment, while preserving the high-frequency, state-based control of RL. Our approach augments Proximal Policy Optimization (PPO) with a directional action-consistency regularization that softly aligns the RL agent's actions with VLA guidance during early training, without enforcing strict imitation, requiring demonstrations, or relying on continuous teacher queries. VLA guidance is applied sparsely and annealed over time, allowing the agent to adapt online and ultimately surpass the guiding policy. We evaluate VLAJS on six challenging manipulation tasks: lifting, pick-and-place, peg reorientation, peg insertion, poking, and pushing in simulation, and validate a subset on a real Franka Panda robot. VLAJS consistently outperforms PPO and distillation-style baselines in sample efficiency, reducing required environment interactions by over 50% in several tasks. Real-world experiments demonstrate zero-shot sim-to-real transfer and robust execution under clutter, object variation, and external perturbations.

16.
arXiv (CS.LG) 2026-06-15

Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

arXiv:2606.14679v1 Announce Type: new Abstract: Online inventory optimization (OIO) is online convex optimization with physical memory: inventory carryover makes the feasible action set depend on the past. A natural principle, used in stochastic inventory learning and recently in OIO under a single linear capacity constraint, is to maintain a hidden target chosen by an online learner and implement its projection onto the currently feasible order-up-to set. We prove that this simple principle is optimal for OIO on arbitrary bounded convex capacity sets. With online gradient descent as the base learner, the method improves the best known regret guarantee for OIO on general convex sets from inverse to inverse-square-root dependence on the common-demand probability, and we prove a matching lower bound. The same principle gives the first polylogarithmic regret guarantee for strongly convex losses and the first dynamic regret guarantee adapting to Euclidean path variation on general convex capacity sets. The analysis introduces a norm alignment principle: the right state variable is the distance from the hidden target to the feasible set, measured in the same norm as the projection. Under norm alignment, this distance evolves pathwise as a scalar queue, with target movement as arrival and common demand as service. This reduction to one-dimensional queue control resolves the state dependence and extends the guarantees to general convex capacity sets, beyond the reach of prior productwise approaches. Experiments on synthetic and real-world inventory data corroborate the theory.

17.
arXiv (CS.CV) 2026-06-12

Zero-Shot Captioning for Cultural Heritage: Automated Image Analysis of Traditional Indonesian Clothing

This paper presents Custom ZeroCLIP, a retrieval-augmented vision-language framework for zero-shot captioning of Indonesian traditional garments. The dataset contains 3,800 expert-annotated images from all 38 Indonesian provinces. Using a province-level inductive zero-shot protocol, the model is trained on 24 seen provinces, validated on 6 seen provinces, and evaluated on 8 unseen provinces. The framework combines a frozen CLIP ViT-B/32 image encoder, a CLIP text encoder, a BERT text encoder, and an LSTM caption decoder. During inference, unseen-province labels and captions are unavailable, and retrieval uses only captions from training provinces. No unseen-province image, label, or caption is used during training, validation, or retrieval-bank construction. Custom ZeroCLIP achieves a CLIPScore of 0.8536, BLEU-4 of 0.3342, and METEOR of 0.4859, outperforming existing baselines. Ablation results show that retrieval improves cultural vocabulary recovery with a 19.3\% METEOR gain, while human evaluation confirms stronger cultural accuracy and fluency. The results demonstrate the effectiveness of retrieval-augmented domain adaptation for culturally grounded caption generation in low-resource heritage settings. The dataset is publicly available at https://github.com/AnugrahAidinYotolembah/Traditional-Indonesian-Clothing-Captioning-Dataset.

18.
arXiv (CS.CV) 2026-06-24

Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

Recent advances in stereo matching have achieved remarkable accuracy, but often rely on large models, heavy computation, or additional foundation-model priors, making them difficult to deploy on resource-constrained platforms. In contrast, efficient stereo models offer faster inference but are commonly considered less capable of strong zero-shot generalization. In this paper, we challenge this assumption by introducing Lite Any Stereo V2 (LAS2), an ultra-fast model series designed for efficient zero-shot stereo matching. LAS2 is developed from both architecture and training perspectives. Architecturally, we revisit efficient stereo design under practical deployment settings and propose a 2D-only cost aggregation framework, optimized for real inference latency rather than theoretical MACs alone. For training, we develop a three-stage strategy that combines synthetic supervision, self-distillation, and real-world knowledge distillation. To improve the reliability of real-world pseudo supervision, we further introduce pseudo-label filtering and an error-clamping operation, enabling smoother synthetic-to-real transfer. We instantiate LAS2 as a family of models, including feed-forward variants for different efficiency budgets and an iterative variant for higher accuracy. Extensive experiments show that LAS2 achieves state-of-the-art accuracy among efficient stereo methods while maintaining significantly lower latency. Specifically, LAS2-H achieves stronger overall zero-shot performance than the iterative method Fast-FoundationStereo, with 1.8x and 2.7x faster inference on H200 and Orin, respectively. The project page, demos, and code are available at https://tomtomtommi.github.io/LiteAnyStereoV2/.

19.
arXiv (CS.AI) 2026-06-11

Exploration Structure in LLM Agents for Multi-File Change Localization

arXiv:2606.11976v1 Announce Type: cross Abstract: Software engineering tools increasingly rely on LLM based agents to localize files to change to resolve a software issue. Most AI agents explore repositories linearly, that is, visiting one directory or file per step. We postulate that this is a structural mismatch for changes that span several subsystems. We compare linear sequential exploration against non-linear, domain-scoped parallel agentic exploration. Using SWE Bench Pro as initial benchmark, we focus on ansible as an exemplar. We construct an approach for persistent-session evaluation of GitHub issues anchored at a single base commit. We compare our non-linear domain-agent file traversal system against a base LLM without direct repository access, a single agent Recursive Language Model (RLM) baseline with a persistent Python REPL and an external CLI baseline using Codex 5.5 High. Domain scoped parallel agent spawning with a small Haiku-class model achieves the highest micro F1 among Haiku class models by a large margin. Domain-agents is the second highest behind only the much larger Codex 5.5 High on our own expanded benchmark including over more recent PRs from 2025 and 2026. On the original, curated, 2020 SWE-bench Pro benchmark, a larger Sonnet plain LLM baseline attains higher micro F1 by predicting few files, leading to higher precision, but at significantly lower all gold recall. We also present three additional findings. First, documentation evolution is a latent dependency unresolved by any approach. Second, naive file system access can degrade localization driven by test-file over prediction. Lastly, forced multi-agent consultation does not measurably help and raises token cost substantially.

20.
arXiv (CS.CL) 2026-06-11

Can News Predict the Market? Limits of Zero-Shot Financial NLP and the Role of Explainable AI

Can financial news reliably predict short-term stock movements? Despite advances in large language models, this question remains unresolved. We revisit this problem using a zero-shot natural language processing framework, investigating whether models can extract actionable signals from financial news without domain-specific training. We design a structured pipeline that combines zero-shot natural language inference with temporal aggregation, explicitly modelling recency and event-dependent impact horizons when integrating information across articles. To address the need for transparency in high-stakes settings, we introduce a multi-layered explainability framework that links predictions to token-level, article-level, and aggregate evidence, and produces grounded natural language rationales. Across multiple models and prediction horizons, we find that zero-shot approaches consistently fail to outperform simple baselines, with particularly weak performance on negative movements, suggesting deeper structural limitations in mapping news sentiment to short-term price dynamics. However, explainability signals reliably distinguish between trustworthy and unreliable predictions, offering practical value even when accuracy is limited. These findings highlight the limits of zero-shot financial NLP and motivate a shift toward decision-support systems that prioritise transparency and uncertainty awareness. Code: https://github.com/alimert05/zero-shot-stock-xai

21.
arXiv (quant-ph) 2026-06-16

Controlled Quantum Metrology with Anisotropic Heisenberg Spin Interactions under Intrinsic Decoherence

arXiv:2606.16918v1 Announce Type: new Abstract: We theoretically investigate quantum parameter estimation in a two-qubit anisotropic Heisenberg spin system with Dzyaloshinskii-Moriya (DM) interaction in the presence of intrinsic decoherence described by the Milburn model. Using the Quantum Fisher Information (QFI), we study the estimation of both the uniform magnetic field and the DM interaction strength. Analytical expressions for the time-evolved density matrix are obtained and used to explore the effects of exchange anisotropy, intrinsic decoherence, and probe-state preparation on the achievable estimation precision. Our results show that suitable tuning of the anisotropic exchange coupling and the initial entangled state can considerably enhance the estimation performance, with different optimal parameter regimes emerging for magnetic-field and DM-interaction sensing. To better understand the role of quantum resources in metrology, we also examine the behaviour of concurrence, quantum coherence, and von Neumann entropy. Overall, our findings demonstrate that anisotropic Heisenberg spin systems with DM interaction provide a promising and flexible platform for high-precision quantum metrology even in the presence of intrinsic decoherence.

22.
PLOS Medicine 2026-05-21

Novel symptoms associated with eclampsia could improve detection and save lives

by Alice Beardmore-Gray, Andrew Shennan Eclampsia is a life-threatening complication of pre-eclampsia, yet remains difficult to predict. In this Perspective, Alice Beardmore-Gray and Andrew Shennan highlight a recent study that identifies 10 novel prodromal symptoms of eclampsia, with potential to better predict which women are at risk and therefore reduce delays in intervention.

23.
arXiv (CS.AI) 2026-06-16

PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

arXiv:2606.15157v1 Announce Type: cross Abstract: KV cache compression is essential for reducing the memory cost of long-context large language model inference. Existing approaches, however, typically apply a single compression policy and a uniform cache budget across all transformer layers. This uniform design ignores the fact that different layers can play different roles during prefill and decoding, and may therefore require different eviction strategies and cache capacities. We present PolyKV, a layer-wise KV cache optimization framework that considers design space with method selection and budget allocation. PolyKV routes each layer to a suitable KV compression policy based on layer-level signals, while assigning non-uniform budgets under a fixed total budget. This formulation enables heterogeneous compositions of existing KV cache methods. Experiments on LLaMA-3.1-8B and Qwen3-8B show that, under the same 512-token average KV budget, PolyKV recovers 54.5% and 25.7% of the LongBench performance gap between the strongest single-policy baseline and FullKV, respectively. Across 128-1024 budget sweep, PolyKV consistently improves over the strongest baseline by 1.7%-6.4%, corresponding to 40.0%-54.5% recovery of the FullKV gap.

24.
medRxiv (Medicine) 2026-06-17

Efficacy of a Gamified Digital Platform for Substance Use Education and Overdose Prevention Among College Students: a Pilot and Feasibility Study

Background: For US young adults aged 18-25 in the 2018-2024 period, fentanyl was involved in 78.2% of the 44,020 unintentional or undetermined-intent overdose deaths, most often co-involving stimulants and other non-opioid substances. While fatal overdose rates in this age group have fallen to their lowest recorded level, emergency medical services-attended non-fatal overdose events have reached record highs, shifting the decisive variable toward bystander recognition and response. College students report near-universal alcohol education but minimal education on the substances actually driving overdose mortality. Methods: We conducted a single-group pre-post evaluation of the DopaGE Portal, a gamified, mastery-based digital platform covering cocaine, MDMA, benzodiazepines, and opioid overdose response, deployed at a public university (UNL) and a multi-campus volunteer network (TACO). Paired pre/post surveys (N=42) measured self-efficacy (7 items; primary), behavioral intentions, risk perception, and knowledge/attitudes on 5-point scales, plus four factual knowledge questions. Paired t-tests, exact McNemar tests, and Benjamini-Hochberg correction across eight primary tests were applied. Institutional naloxone distribution at UNL was tracked as an ecological behavioral outcome. A mandated high-school cohort (N=94) provided supplementary acceptability data. Results: Self-efficacy increased from 2.82 to 4.46 (d=2.00, 95% CI 1.46-2.55; adjusted p

25.
bioRxiv (Bioinfo) 2026-06-24

Generative Modeling of Mouse Embryogenesis for Fate and Disease Prediction

Embryonic development is orchestrated by complex gene regulatory networks, and learning regulatory dynamics from developmental data could allow us to understand, predict, and ultimately engineer cell fates. Here we introduce Navigo (https://github.com/aristoteleo/Navigo-release), a biologically grounded generative modeling framework that learns a developmental vector field by integrating flow matching at the population level with RNA kinetics modeling at the molecular level. Navigo accurately maps developmental trajectories across lineages on a mouse embryogenesis scRNA-seq atlas spanning 43 time points and comprising 12.4 million cells. Applied to cardiac development, Navigo enables disease modeling by mechanistically resolving regulatory networks that distinguish congenital heart disease subtypes. Navigo also predicts perturbation effects in a zero-shot manner, as validated on independent in vivo data from six knockout genotypes without perturbation-specific training, uncovering lineage-specific gene-compensation mechanisms. Moreover, Navigo guides rational cell-fate engineering, exemplified by fibroblast reprogramming analyses, including identifying pro-fibrotic barriers to cardiac fates and evaluating hundreds of pairwise transcription factor combinations for neuronal fate, each consisting of one bHLH factor and one POU factor. Overall, Navigo provides a generalizable AI platform for perturbation-effect prediction, disease modeling, and rational cell-fate engineering, advancing toward AI-based virtual embryos for developmental biology and regenerative medicine.