论文广场 - AcademicHub

01.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16050

ALCL: An Adaptive Log-Correntropy Loss for Robust Learning under Non-Gaussian Noise

作者:

Mainak Kundu ↗Ria Kanjilal ↗Ismail Uysal ↗

arXiv:2606.16050v1 Announce Type: cross Abstract: Robust deep learning under heavy-tailed and impulsive noise remains challenging because conventional losses such as mean squared error (MSE) exhibit unbounded sensitivity to outliers. Although correntropy-based objectives improve robustness, existing formulations rely on fixed kernel parameters that must be empirically tuned and remain static during training. To address these limitations, we propose an Adaptive Log-Correntropy Loss (ALCL), a heavy-tailed loss formulation that adaptively learns its robustness geometry during optimization. ALCL introduces a logarithmic residual model whose shape and scale parameters are learned jointly with network weights through differentiable reparameterization. This yields a principled maximum likelihood formulation whose influence function is formally bounded and redescending, allowing the loss geometry to adapt dynamically to evolving residual statistics while suppressing extreme outliers. Comparative experiments on four widely used benchmark datasets spanning grayscale and red-green-blue (RGB) image data under mixed heavy-tailed and impulsive noise demonstrate that ALCL consistently outperforms MSE and optimally tuned generalized correntropy losses in both reconstruction fidelity and downstream classification accuracy. While performance differences remain small under low-noise conditions, under high-noise regimes ALCL improves median accuracy by up to 4.75% on grayscale benchmarks and 4.51% on RGB datasets, with reduced variance across runs. These results demonstrate that adaptive robustness through joint learning of loss parameters provides a computationally efficient alternative to static correntropy-based losses for deep learning in non-Gaussian environments.

阅读与讨论 → 访问原文 →

02.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2601.21527

Sustainable Materials Discovery in the Era of Artificial Intelligence

作者:

Sajid Mannan ↗Rupert J. Myers ↗Rohit Batra ↗Rocio Mercado ↗Lothar Wondraczek ↗N. M. Anoop Krishnan ↗

arXiv:2601.21527v3 Announce Type: replace-cross Abstract: Artificial intelligence (AI) has transformed materials discovery, enabling rapid exploration of chemical space through generative models and surrogate screening. Yet current generative AI models for materials discovery, which now drive exploration of vast chemical and structural spaces, optimize candidates exclusively for structural stability and functional properties, with no integration of environmental assessment at any stage of the design loop. Prospective and ex-ante life cycle assessment methods exist and have been applied to emerging technologies, but they operate as standalone downstream analyses, not as active constraints within generative or active-learning pipelines. The result is that environmental feedback, even when produced, arrives after design decisions have been made rather than informing them. The disconnect between atomic-scale design and lifecycle assessment (LCA) reflects fundamental challenges: (i) data scarcity across heterogeneous sources, (ii) scale gaps from atoms to industrial systems, (iii) uncertainty in synthesis pathways, and (iv) the absence of frameworks that co-optimize performance with environmental impact. In this Perspective, we propose integrating upstream ML-assisted materials discovery with downstream LCA into the ML-LCA framework, comprising five components: information extraction for building materials-environment knowledge bases, harmonized databases linking properties to sustainability metrics, multi-scale models bridging atomic properties to lifecycle impacts, ensemble prediction of manufacturing pathways with uncertainty quantification, and uncertainty-aware optimization enabling simultaneous performance-sustainability navigation. Case studies spanning polymers, glass, photoresists, and cement demonstrate both necessity and feasibility while identifying material-specific integration challenges.

阅读与讨论 → 访问原文 →

03.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2501.05103

Exactly Solvable Quantum Model with Spin-Dependent Coulomb Interaction

作者:

Jiang-Lin Zhou ↗Yu-Xuan Zhang ↗Choo Hiap Oh ↗Jing-Ling Chen ↗

arXiv:2501.05103v5 Announce Type: replace Abstract: In this work, we report an exactly solvable quantum model featuring a spin-dependent Coulomb interaction, described by the spin vector potential $\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2$ together with a Coulomb-type scalar potential $\varphi = \kappa / r$ . The model is governed by the Schrödinger-type Hamiltonian $\mathcal{H}_S = \vec{\Pi}^2 / (2M) + q \varphi$ in nonrelativistic quantum mechanics and by the Dirac-type Hamiltonian $\mathcal{H}_D = c \vec{\alpha} \cdot \vec{\Pi} + \beta M c^2 + q \varphi$ in relativistic quantum mechanics, where $\vec{\Pi} = \vec{p} - (q/c)\vec{\mathcal{A}}$ is the canonical momentum. We demonstrate two main results: (i) Just as the Coulomb-type scalar potential $\mathcal{S}_Maxwell = \{\vec{\mathcal{A}} = 0,\ \varphi = \kappa / r\}$ is a local exact solution of Maxwell's equations on $r\neq0$, the gauge potential $\mathcal{S}_YM = \{\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2,\ \varphi = \kappa / r\}$ constitutes a local exact solution of the Yang–Mills equations on the punctured region $r\neq0$. (ii) Both Hamiltonians $\mathcal{H}_S$ and $\mathcal{H}_D$ can be solved exactly in the presence of this spin-dependent Coulomb interaction. The resulting energy spectra are derived, and they naturally reduce to those of the ordinary hydrogen atom when the spin-dependent terms are neglected. Finally, we clarify the quantization conditions and the fixed-background interpretation of the model.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.14948

Beyond Correctness: Enhancing Architectural Reasoning in Code LLMs via Scalable Labeling with Agentic Judgment

作者:

Kirill Vasilevski ↗Ximing Dong ↗Benjamin Rombaut ↗Ruochen Deng ↗Jiahuei Lin ↗Arthur Leung ↗Dayi Lin ↗Boyuan Chen ↗Shaowei Wang ↗Ahmed E. Hassan ↗

arXiv:2606.14948v1 Announce Type: cross Abstract: LLMs have substantially improved software engineering yet real-world development requires architectural understanding. Such understanding is prohibitively expensive to label manually and impossible to verify through tests alone. We propose an agentic judging pipeline using a strong LLM as a scalable proxy for expert architectural evaluation, comprising two judges: the Architecture Complexity Judge (ACJ), which estimates codebase-specific architectural understanding a task demands, and the Architecture Quality Judge (AQJ), which evaluates patch conformance to repository-specific architectural conventions via source-grounded rubrics. Fine-tuning Qwen3-8B/14B/32B on 3,360 curated instances achieves resolved rates of up to 27.2% on SWE-bench Verified - up to 540% over the base model and 256% over unfiltered fine-tuning. Meanwhile, the trained models achieve strong cross-language generalization and consistent improvements in architectural patch quality.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2606.13825

Scalable Deep Unfolding of Conic Optimizers

作者:

Alex Oshin ↗Rahul Vodeb Ghosh ↗Evangelos A. Theodorou ↗

arXiv:2606.13825v1 Announce Type: cross Abstract: Deep unfolding (DU) accelerates iterative optimizers by introducing learnable components and training them through unrolled iterations, but extending DU to the large-scale semidefinite programs (SDPs) common in robotics has remained limited. Unrolling a full-update conic solver such as COSMO exposes two obstacles that prior work on learned conic solvers has not: backpropagating through the per-iteration linear-system solve incurs memory quadratic in the problem size once the coefficient matrix is formed explicitly, and backpropagating through the positive semidefinite (PSD) cone projection becomes numerically unstable when eigenvalues coincide. We address the first obstacle with a matrix-free implicit differentiation rule that operates entirely through matrix-vector products, reducing memory from $O(n^2)$ to $O(n)$ and enabling backpropagation at scales where direct factorization runs out of memory. We address the second with a backward rule based on the Dalečkii–Krein representation of the Fréchet derivative, which remains well-defined under repeated eigenvalues. Together these make it possible to learn lightweight hyperparameter policies and warm-starts for a full-update conic solver. We evaluate on nonlinear covariance steering problems solved via sequential convex programming (SCP), as well as standalone SDPs and second-order cone programs ranging from max-cut and Lovász $\vartheta$ SDPs to robust estimation and control problems. The learned policies outperform state-of-the-art solvers across all problems, and can provide up to a 50$\times$ speedup depending on the class. When used as a subroutine in SCP, the learned approach delivers over a 30$\times$ speedup compared to COSMO.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2604.26940

Select to Think: Unlocking SLM Potential with Local Sufficiency

作者:

Wenxuan Ye ↗Yangyang Zhang ↗Xueli An ↗Georg Carle ↗Yunpu Ma ↗

Small language models (SLMs) offer efficient deployment, yet they often lag behind their larger counterparts (LLMs) in reasoning. Existing remedies either invoke an LLM at points of reasoning divergence, incurring substantial latency and cost, or rely on standard distillation, which is limited by the SLM's capacity to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token often resides within the SLM's top-K next-token predictions, even when failing to emerge as the SLM top-1 choice. We therefore propose Select to Think (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's proposals, simplifying the supervision signal to discrete candidate rankings. Leveraging this, we introduce S2T-Local, which distills the selection logic into the SLM, empowering it to perform autonomous re-ranking without inference-time LLM dependency. Empirically, a 1.5B SLM's top-8 candidates contain the 32B LLM's choice with a 95% hit rate, and S2T-Local improves the 1.5B SLM's Math Avg. over greedy decoding by 24.1% relative gain, matching the efficacy of 8-path self-consistency with single-trajectory efficiency.

阅读与讨论 → 访问原文 →

07.

Nature Biotechnology 2026-06-16 DOI: HASH:39dd3b98bf98ac0351f213df1edf9f2e

DeepMind spinout raises $2.1 billion

作者: 未知作者

该条目无摘要（多为勘误、社论或新闻类内容，出版方未提供摘要）

阅读与讨论 → 访问原文 →

08.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2604.05859

When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

作者:

Uljad Berdica ↗Fernando Acero ↗Anton Ipsen ↗Parisa Zehtabi ↗Michael Cashmore ↗Manuela Veloso ↗

arXiv:2604.05859v2 Announce Type: replace Abstract: We study Contextual Multi-Armed Bandits (CMABs) for non-episodic decision-making problems where the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, offer selection; all frequent problems in finance). While Large Language Models (LLMs) are increasingly applied to these settings, utilizing LLMs for reasoning at every decision step is computationally expensive, and uncertainty estimates are difficult to obtain. To address this, we introduce LLMP-UCB, a bandit algorithm that derives uncertainty estimates from LLMs via repeated inference. However, our experiments demonstrate that lightweight numerical bandits operating on text embeddings (dense or Matryoshka) match or exceed the accuracy of LLM-based solutions at a fraction of their cost. We further show that embedding dimensionality is a practical lever on the exploration-exploitation balance, enabling cost-performance tradeoffs without prompt complexity. Finally, to guide practitioners, we propose a geometric diagnostic based on the arms' embeddings to decide when to use LLM-driven reasoning versus a lightweight numerical bandit. Our results provide a principled deployment framework for cost-effective, uncertainty-aware decision systems with broad applicability across AI use cases.

阅读与讨论 → 访问原文 →

09.

arXiv (CS.LG) 2026-06-18 DOI: arXiv:2606.18524

On the Residual Scaling of Looped Transformers: Stability and Transferability

作者:

Shaowen Wang ↗Bingrui Li ↗Ge Zhang ↗Wenhao Huang ↗Shen Yan ↗Jian Li ↗

arXiv:2606.18524v1 Announce Type: new Abstract: Looped (weight-tied) Transformers apply a shared residual block $N$ times ($h \leftarrow h + \varepsilon\,f(h)$, same $f$ at each step), increasing effective depth without adding parameters. Prior depth-scaling analyses prescribe $\varepsilon = 1/\!\sqrt{L}$ for depth-$L$ residual networks. We show that this is insufficient for looped architectures: weight sharing makes residual updates correlated across iterations, requiring the stronger scaling $\varepsilon = 1/N$. For multi-layer blocks ($L$ unique layers looped $N$ times), we derive a factored parameterization $\varepsilon = \lambda/(N\!\sqrt{L})$ that separates the two sources of growth: $1/N$ controls the within-layer loop correlation, and $1/\!\sqrt{L}$ controls the across-layer variance. A key consequence is that the optimal learning rate depends only on the number of unique layers $L$, not on the loop count $N$, enabling direct hyperparameter transfer from small to large $N$ without retuning. Experiments on looped Transformers confirm that $1/N$ scaling improves trainability and yields better loss than $1/\!\sqrt{N}$ scaling across loop counts.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.LG) 2026-06-17 DOI: arXiv:2606.18218

Finite-Time Queue Peak Laws in Stochastic Networks: Logarithmic Scaling After Geometric Thresholds

作者:

Hao Liang ↗Cheng Tang ↗Yunzong Xu ↗

arXiv:2606.18218v1 Announce Type: cross Abstract: We study finite-horizon queue peaks in generalized switches, a standard stochastic-network model in which many queues share constrained service resources. Arrivals may be dependent, time-varying, and adapted to the past; the standing load condition is uniform interior slack, meaning the conditional mean arrival vector stays in a fixed contraction of the capacity region. We show that this slack reshapes the finite-time peak law for drift-minimizing scheduling policies such as MaxWeight. The square-root envelope that is sharp without slack persists only up to a geometry-dependent threshold; beyond that threshold, the running maximum grows only logarithmically with the horizon, both with high probability and in expectation. The mechanism is self-normalization: in the current queue direction, the projected fluctuation scale is normalized by the stabilizing drift scale. This removes capacity geometry from the logarithmic coefficient, while geometry remains in the threshold. Matching lower bounds show that both the logarithmic term and a geometric threshold are unavoidable. When finite-time state-space collapse is available, the threshold can be sharpened using local bottleneck geometry. For generalized input-queued switches, we obtain finite-time peak bounds with tight logarithmic coefficients. Simulations illustrate the two-phase envelope, local geometric refinements, and variance-sensitive improvements predicted by the theory.

阅读与讨论 → 访问原文 →

11.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.14119

FactoryLLM: A Safe and Open-Source AI Playground for Evaluating LLMs in Smart Factories

作者:

Yash Pulse ↗Yong-Bin Kang ↗Abhik Banerjee ↗Abdur Forkan ↗Prem Prakash Jayaraman ↗

arXiv:2606.14119v1 Announce Type: new Abstract: Fault diagnostics and recovery in smart factories is challenging because critical information is dispersed across manuals of multiple machines which are interconnected through the manufacturing process. Large Language Models (LLMs) can provide a promising approach. In this paper, we propose FactoryLLM, a safe and open-source AI playground designed for evaluating different LLM-based retrieval-augmented generation (RAG) models by analysing documents from multiple machines across the manufacturing process. FactoryLLM enables the user to configure the LLM, and assess performance when reasoning over multiple documents, through a dual evaluation setup using both RAGAS and NVIDIA's LLM-as-a-Judge metrics. FactoryLLM is safe because it allows users to run local or open-source LLMs without sharing sensitive industrial data, providing a controlled environment for experimentation. We demonstrate the efficacy of FactoryLLM through a case study which involves an Autonomous Intelligent Vehicle and its Mobile Planner software, evaluating three LLMs across 30 maintenance queries derived from approximately 600 pages of cross-machine documentation. The results suggest that FactoryLLM is effective in cross-machine document reasoning: every model achieved a groundedness score above 0.88. The full code and documentation for community to test FactoryLLM with their manufacturing specific scenarios are publicly available.

阅读与讨论 → 访问原文 →

12.

bioRxiv (Bioinfo) 2026-06-19 DOI: HASH:34815a2582d60266155ba622b0621bdd

Identification of Altered Potassium Channels for Drug Repurposing in Long COVID Patients

作者:

George ↗J. P ↗Gaikwad ↗K. B ↗Sharma ↗

Long COVID (LC) is a complex condition characterized by persistent, chronic multisystem manifestations, with a significant proportion of patients exhibiting neurological symptoms. Human ion channels (HICs), particularly potassium channels, are abundantly expressed in the nervous system and linked to key metabolic processes, making them potential candidates for understanding LC pathophysiology and drug repurposing. Meta-analysis of RNA-Seq datasets from COVID-19 recovered and LC patients was performed to identify altered HICs in LC. Differential gene expression analysis, functional enrichment analysis, and weighted gene co-expression network analysis (WGCNA) were performed to uncover key genes, pathways, and co-expression modules consisting of HICs, lipid metabolism-, and immune signaling-related genes. Drug-gene interaction analysis was performed to identify approved drugs targeting potential HICs. A total of 715 dysregulated genes, including eighteen HICs were identified, among which seven were potassium channels. Three significant modules containing HICs, lipid metabolism-, and immune signaling-related genes were identified and found to be associated with antigen processing and presentation, complement and coagulation cascades, and cytokine-related pathways. Approved drugs targeting KCNA6, KCNJ10, KCNN3, and KCNH4 were identified. With further experimental validation, these dysregulated potassium channels, supported by their co-expression networks and pathway associations, may act as potential candidates for drug repurposing in LC patients.

阅读与讨论 → 访问原文 →

13.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.16055

Readout-Induced Leakage in Superconducting Circuits with Nonlinear Couplings

作者:

Sumeru Hazra ↗Wei Dai ↗Daniel K. Weiss ↗Pranav D. Parakh ↗Luigi Frunzio ↗Michel H. Devoret ↗

arXiv:2606.16055v1 Announce Type: new Abstract: In superconducting circuits, drive-induced unwanted transitions limit the readout power, thereby constraining readout speed and fidelity. When such transitions excite the qubit into leakage states, they produce correlated errors that are particularly harmful for quantum error correction. Native nonlinear qubit-readout resonator coupling is a promising alternative to conventional linear hybridization because it provides intrinsic Purcell protection and stricter selection rules for multiphoton processes. In realistic devices, however, we show that such a coupling alone neither eliminates nor necessarily suppresses drive-induced transitions. Instead, if not appropriately engineered, these couplings often worsen the situation by introducing additional parasitic processes. Moreover, the rates of these unwanted transitions remain sensitive to the choice of readout frequency, regardless of the coupling mechanism. We demonstrate that readout-induced leakage can thus vary by orders of magnitude even when readout frequencies differ by less than ~7%. Our results establish that the benefits of native nonlinear couplings are realized only through informed device design, including the spectral placement of relevant auxiliary modes and elimination of parasitic ones.

阅读与讨论 → 访问原文 →

14.

bioRxiv (Bioinfo) 2026-06-16 DOI: HASH:d913354eca23c2730c762161634084e3

DynamicDemiLog: A Single Sketch for Ultrafast Similarity, Frequency, and Cardinality Estimation

作者:

Bushnell ↗B. J ↗

Probabilistic cardinality estimators (HyperLogLog), similarity sketches (MinHash), and frequency estimators (Count-Min Sketch) are fundamental approximate data structures that each target one primary problem. We present DynamicDemiLog (DDL), a sketch that unifies cardinality estimation, set similarity, containment, element frequency and composition in one tiny data structure built from a single pass over the input stream. Using an inverted index over 200,687 RefSeq sketches (159,567 organisms), DDL performs all-to-all sketch similarity comparison of the full database in 30 seconds (128 threads, indexed) - over 375x faster per query than Mash's brute-force all-to-all comparison of 91,282 sketches, or 31x faster without the index, at double the sketch resolution. DDL extends the LogLog register with a mantissa: each register stores a floating-point-encoded hash value consisting of an integer exponent (the leading-zero count) and a fractional mantissa (the sub-leading-zero bits), rather than the integer leading-zero count alone. This preserves enough hash information for meaningful register-by-register comparison - a property that standard 6-bit registers lack - while improving on LogLog's cardinality estimation machinery, including DynamicLogLog's early exit mask for high-throughput streaming. With a default 10 mantissa bits (16-bit registers, 2,048 buckets, 4 KB), DDL achieves a per-register false-match rate of 0.018% on unrelated random same-size sets (compared to 17.0% for LL6, a basic HyperLogLog implementation), enabling Weighted Kmer Identity (WKID), Average Nucleotide Identity (ANI), containment, and completeness estimation from register comparison alone. A 16-bit per-register observation counter provides element frequency information at trivial additional computation cost, and an additional byte tracks element composition (GC content, for biological data). Furthermore, DDL's high-specificity registers enable an inverted index structure (DDLIndex) that answers similarity queries against a database of N sketches in O(B + M) time, where M is the number of matching index entries, compared to O(NxB) for pairwise comparison.

阅读与讨论 → 访问原文 →

15.

arXiv (math.PR) 2026-06-18 DOI: arXiv:2501.17577

On the Singular Control of a Diffusion and its Running Infimum or Supremum

作者:

Giorgio Ferrari ↗Neofytos Rodosthenous ↗

arXiv:2501.17577v2 Announce Type: replace-cross Abstract: We study a class of singular stochastic control problems for a one-dimensional diffusion $X$ in which the performance criterion to be optimised depends explicitly on the running infimum $I$ (or supremum $S$) of the controlled process. We introduce two novel integral operators that are consistent with the Hamilton-Jacobi-Bellman equation for the resulting two-dimensional singular control problems. The first operator involves integrals where the integrator is the control process of the two-dimensional process $(X,I)$ or $(X,S)$; the second operator concerns integrals where the integrator is the running infimum or supremum process itself. Using these definitions, we prove a general verification theorem for problems involving two-dimensional state-dependent running costs, costs of controlling the process, costs of increasing the running infimum (or supremum) and exit times. Finally, we apply our results to explicitly solve an optimal dividend problem in which the manager's time-preferences depend on the company's historical worst performance.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2601.14764

An XAI View on Explainable ASP: Methods, Systems, and Perspectives

作者:

Thomas Eiter ↗Tobias Geibinger ↗Zeynep G. Saribatur ↗

arXiv:2601.14764v2 Announce Type: replace Abstract: Answer Set Programming (ASP) is a popular declarative reasoning and problem solving approach in symbolic AI. Its rule-based formalism makes it inherently attractive for explainable and interpretive reasoning, which is gaining importance with the surge of Explainable AI (XAI). A number of explanation approaches and tools for ASP have been developed, which often tackle specific explanatory settings and may not cover all scenarios that ASP users encounter. In this survey, we provide, guided by an XAI perspective, an overview of types of ASP explanations in connection with user questions for explanation, and describe their coverage by current theory and tools. Furthermore, we pinpoint gaps in existing ASP explanations approaches and identify research directions for future work.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2602.00845

Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward

作者:

Senkang Hu ↗Yong Dai ↗Yuzhi Zhao ↗Yihang Tao ↗Yu Guo ↗Zhengru Fang ↗Sam Tak Wu Kwong ↗Yuguang Fang ↗

arXiv:2602.00845v3 Announce Type: replace Abstract: Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, but yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees, including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner

阅读与讨论 → 访问原文 →

18.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2606.18216

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

作者:

Byung-Kwan Lee ↗Ximing Lu ↗Shizhe Diao ↗Minki Kang ↗Saurav Muralidharan ↗Karan Sapra ↗Andrew Tao ↗Pavlo Molchanov ↗Yejin Choi ↗Yu-Chiang Frank Wang ↗Ryo Hachiuma ↗

Knowledge distillation transfers a teacher's competence to a small student but is brittle in the small-student regime: forcing the student to imitate logits from a much larger teacher concentrates it on the teacher's sharpest modes, hurting generalization on benchmark families beyond the training corpus. Reinforcement learning (RL) avoids logit imitation by training on the student's own rollouts. However, on questions where every rollout fails-yielding zero advantage and being silently discarded-injecting a stronger teacher's response into the policy gradient breaks the on-policy assumption and induces drift. We introduce Zone of Proximal Policy Optimization (ZPPO), inspired by Vygotsky's zone of proximal development, which keeps the teacher inside the prompt rather than the policy gradient. On hard questions, ZPPO constructs two reformulated prompts: a Binary Candidate-included Question (BCQ) pairs one correct teacher response with one incorrect student response as anonymized candidates the student must discriminate, and a Negative Candidate-included Question (NCQ) aggregates the student's wrong rollouts into a single prompt to surface their shared failure modes. A prompt replay buffer recirculates each hard question until it either graduates-the student's mean rollout accuracy on it reaches half- or is FIFO-evicted under finite capacity, amplifying BCQ and NCQ inside the student's current zone of proximal development. On the Qwen3.5 family at four student scales (0.8B-9B) with a 27B teacher, post-trained as vision-language models and evaluated on a 31-benchmark suite (16 VLM, 10 LLM, 5 Video), ZPPO outperforms off/on-policy distillation and GRPO, with the largest gains at the smallest scale.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2602.08939

CausalT5k: Diagnosing Refusal and Failure Modes in Trustworthy Causal Reasoning Across Causal Rungs

作者:

Longling Geng ↗Andy Ouyang ↗Theodore Wu ↗Daphne Barretto ↗Matthew John Hayes ↗Rachael Cooper ↗Yuqiao Zeng ↗Sameer Vijay ↗Gia Ancone ↗Ankit Rai ↗Matthew Wolfman ↗Patrick Flanagan ↗…

arXiv:2602.08939v2 Announce Type: replace Abstract: Large language models increasingly produce fluent causal explanations, yet they often fail in ways aggregate accuracy cannot diagnose: confusing association with intervention, abandoning correct judgments under pressure, over-refusing valid claims, or answering when evidence is underdetermined. We introduce CTK, a diagnostic benchmark of 5,147 cases and growing, across 10 domains and all three levels of Pearl's Ladder of Causation. Unlike benchmarks that only score correctness, CTK reveals why a model failed by annotating causal rung, trap type, pressure sensitivity, refusal quality, and Utility-Safety tradeoffs. Its Sheep/Wolf taxonomy separates valid causal designs from inferential traps; paired neutral/pressure variants measure sycophantic drift through Bad Flip Rate; and Wise Refusal fields test whether a model identifies the missing information needed before endorsing a claim. CTK exposes failure modes hidden by aggregate accuracy: the Skepticism Trap, Rung Collapse under scaling, pressure-induced drift, Detection-Correction gaps, and counterfactual error modes. Rather than prescribing a correction method, it provides the diagnostic substrate for studying causal-reasoning failure profiles.

阅读与讨论 → 访问原文 →

20.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16475

AI systems out-persuade expert humans

作者:

Kobi Hackenburg ↗Caroline Wagner ↗Luke Hewitt ↗Ben M. Tappin ↗Ed Saunders ↗Hannah Rose Kirk ↗Helen Margetts ↗Christopher Summerfield ↗

arXiv:2606.16475v1 Announce Type: cross Abstract: Many societal decisions are settled by contests of persuasion. Conversational AI is a powerful new entrant in these contests, but whether it can out-persuade skilled and highly incentivized humans has remained unclear. Here, in a series of four preregistered experiments (n = 18,978 conversations from 6,923 people), we pitted AI systems against a range of human persuaders, including laypeople, winners of a separately preregistered four-round online persuasion tournament, professional canvassers, and world championship debaters. We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with {\pounds}1,000 cash bonuses. In a follow-up study, AI's advantage persisted after experts received a coaching tool that let them practice against the AI that beat them, review their performance history, and see what AI would have said at key moments. We found converging evidence that AI's advantage stemmed from rapidly deploying larger quantities of information: after coaching, expert humans could tie an AI constrained to respond at human speeds and with human-length messages. In a final study, we show that AI's advantage extends to consequential real-world behavior: AI was nearly 3x more effective than professional canvassers from a UK fundraising firm at raising real-money donations to Save the Children. Together, these results establish that frontier AI systems out-persuade expert humans in conversation, with significant implications for political communication.

阅读与讨论 → 访问原文 →

21.

bioRxiv (Bioinfo) 2026-06-20 DOI: HASH:2291c7631b4dce607c6fd937b22d42ec

Seed variation impacts clustering stability in Single-Cell RNA-Seq and can be mitigated by StAbility-BasEd-Reassignment (SABER)

作者:

Zagaria ↗Tini ↗Bonetti ↗Mazzarella ↗

Single-cell RNA-seq clustering is commonly treated as reproducible once a random seed is fixed, yet the choice of seed itself may alter cell assignments and downstream interpretation. We systematically quantified seed-induced clustering variability by running Louvain and Leiden clustering across 100 seeds in Seurat and Scanpy on 28 single-cell RNA-seq datasets from the Human Cell Atlas and IMMUcan. Using Element-Centric Consistency, we found that seed choice affected a substantial fraction of cells, with Scanpy showing more unstable assignments than Seurat on average, 40.46% versus 26.78% unstable cells, respectively. This increased stability came at a marked computational cost: Seurat required approximately 19-fold higher median memory than Scanpy. Seed-dependent clustering variability also propagated to cell-type annotation, particularly among transcriptionally related populations including macrophage/monocyte, endothelial/epithelial and T/NK cell states. To mitigate this instability, we developed StAbility-BasEd Reassignment (SABER), a Scanpy-based framework that identifies seed-sensitive cells across repeated clusterings and reassigns them to stable cluster cores using cosine similarity. SABER improved clustering quality while preserving annotation concordance and reduced median memory usage 3.5-fold compared with Seurat-Louvain. Our results identify seed choice as an underappreciated source of variability in single-cell analysis and provide a scalable strategy to improve clustering robustness.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.18932

TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches

作者:

Xingchen Yan ↗Jian Ge ↗Qingtian Liu ↗Kevin Willis ↗Quanquan Hu ↗Jiapeng Zhu ↗

arXiv:2606.18932v1 Announce Type: cross Abstract: Motivated by the observational incompleteness of intermediate-to-long-period Earth-size planets, we present TransitNet, a compact attention-augmented deep-learning framework for low-SNR transit blind searches. To enable realistic method development and objective threshold calibration under blind-search conditions, we develop a unified dataset construction, benchmarking, and threshold-selection framework. On recovery benchmarks constructed from unseen Kepler targets, TransitNet attains 95.2 percent accuracy in the challenging SNR range of 6 to 8 and outperforms both TLS and BLS, achieving ROC-AUC and PR-AP values of 0.974 and 0.982, respectively. In an injected Earth-size and sub-Earth-size transit recovery experiment, TransitNet achieves a recovery rate of 93.0 percent, substantially exceeding those of TLS (63.1 percent) and BLS (60.0 percent). In addition to detection, TransitNet provides attention-based estimates of transit windows and midpoints. On an independent evaluation set, 97.4 percent of injected transits are fully covered by the estimated transit window. Applied to real Kepler observations, the model successfully recovers all 34 selected confirmed Kepler planets, with a mean absolute transit midpoint error of 1.24 hours. The model combines a compact footprint of about 1.5 MB with high inference efficiency, yielding speed-ups of about 12 to 25 times relative to CPU-TLS and about 4 to 5 times relative to CPU-BLS. These results demonstrate that TransitNet provides an accurate, scalable, and computationally efficient framework for low-SNR transit blind searches in the tested regime and motivate its extension to longer-period Earth-size planet searches.

阅读与讨论 → 访问原文 →

23.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2606.12953

OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

作者:

Ibrahim Gulluk ↗Max Van Puyvelde ↗Olivier Gevaert ↗

We present OpenMedQ, a medical vision-language model pretrained on the broadest fully-open medical mix to date: 14 datasets totaling ~3.35M pretraining samples spanning pathology, radiology, microscopy, and text-only clinical QA. OpenMedQ reaches state-of-the-art BLEU-1 on PathVQA (75.9), beating Med-PaLM M variants up to 562B parameters (~80x larger), and matches the best reported VQA-MED BLEU-1 (64.5). Its vision encoder, transferred to 8 unseen medical classification benchmarks under an identical downstream recipe, obtains the highest average macro-F1 (0.757) among BiomedCLIP (0.745), PMC-CLIP (0.745), PubMedCLIP (0.746), and a from-scratch baseline (0.616). We release our code and an interactive demo is publicly available as a reproducible baseline for the community.

阅读与讨论 → 访问原文 →

24.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.14873

Finite-Element Matrix Product States for Continuum Models in One Dimension

作者:

Akshay Shankar ↗Karel Van Acoleyen ↗Jutho Haegeman ↗

arXiv:2606.14873v1 Announce Type: new Abstract: We present a matrix product state framework for simulating one-dimensional quantum many-body systems in the continuum using non-orthogonal single-particle basis sets. By mapping the physical problem to an auxiliary computational space, we show that the resulting many-body overlap operator can be efficiently encoded as a matrix product operator for sufficiently localized orbitals, thereby generalizing a construction that first appeared in [arXiv:2405.10285]. This construction recasts the variational ground-state search into a generalized eigenvalue problem, which can be solved using a generalized density matrix renormalization group algorithm. As a primary application, we employ a first-order finite-element expansion to study the ground state properties of the Lieb-Liniger gas in the presence of inhomogeneities. This approach also provides a natural setting for exactly refining the lattice, thereby enabling multigrid optimization strategies for matrix product states.

阅读与讨论 → 访问原文 →

25.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2501.19337

Token-Level Entropy Reveals Demographic Disparities in Language Models

作者:

Messi H. J. Lee ↗

We ask whether demographic identity, signaled by a name alone, systematically reshapes the generative distribution of a language model. Measuring full-vocabulary Shannon entropy at temperature zero across six open-weight base models and 5,760 implicit sentence-completion prompts (e.g., "Tanisha walked into the office on a Monday morning and"), we find that Black-associated names produce higher first-token entropy than White-associated names across all six architectures - opposite to the output-level homogeneity bias documented under explicit demographic prompting (Lee et al., 2024) - and Black-associated names always produce greater entropy above identity-neutral baselines than White-associated names ($\Delta\Delta > 0$ in all six models). Women-associated names co-occur with lower first-token entropy (DL-pooled $\hat\beta = -0.041, p = .019$) and more homogeneous outputs ($\hat\alpha = +0.024, p < .001$) than men-associated names - a pattern convergent with homogeneity bias; race and gender effects are additive. Instruction tuning does not attenuate the race gap (matched-format DL-pooled $\hat{\beta}=+0.153$). Running the same templates with explicit group labels instead of names yields null race effects in 10 of 12 models where implicit probing is significant - establishing that probing methodology is a primary determinant of which distributional structure is recovered.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络