Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-15

Supporting people to access social security payments through the Special Rules for End of Life: a qualitative study of the perspectives of patients, carers and health care professionals

Background: People living with terminal illness face a double financial burden from additional costs and loss of earning for themselves and their carers. Social security benefits are intended to help alleviate some of this financial pressure, and in the UK and other countries people are eligible for fast-tracked access to financial support via the Special Rules for End of Life. One in 3 people who are eligible miss out on this support, yet there is limited evidence on the reasons for this take-up deficit. Objectives: The aim of this study is to understand the barriers and facilitators to claiming benefits for terminally ill people from the perspectives of patients, carers, and health care professionals. Methods: This is a qualitative study combining i) focus groups with healthcare professionals recruited via professional networks and social media, and ii) interviews with patients and carers recruited in hospital and hospice settings. We analysed the data using Practical Thematic Analysis Results: Fifty-five multidisciplinary healthcare professionals participated in 11 focus groups, and we interviewed 10 patients and carers. We constructed five descriptive themes to summarise the data: Navigating priorities and uncertainty; positive impacts alongside a sense of shame and stigma; talking about money, difficulties and dividends; everybodys, yet nobodys, responsibility; and sticking points in the system. Conclusion: The themes reveal several challenges that may contribute to people not taking up this financial support. However, discussions about access to benefits were also seen as a core part of holistic care, a positive way to offer support and a gateway to other discussions about end-of-life care preferences and decisions. Recommendations for policy and practice include evaluating the adoption of a diagnostic rather than a prognostic eligibility criteria, integrating discussions about benefits into existing processes such as advance care planning, and improving education and support for clinicians.

02.
arXiv (CS.CL) 2026-06-19

Telenor Nordics Customer Service self-help corpus

作者:

This paper presents a multilingual customer service self-help corpus comprising 1,122 manually validated documents in Finnish, Danish, Norwegian, and Swedish, totaling 274,599 words and 1,884,833 characters. The documents have been sourced from the public self-help pages of four Nordic telecommunications operators and subsequently filtered for person-identifiable information and relevance through a combined LLM and human annotation pipeline. Domain-specific datasets for Nordic languages remain scarce, particularly in customer service: a domain of growing importance for retrieval-augmented generation, cross-lingual transfer learning, and emerging agent-based service architectures. An analysis of the corpus reveals substantial variation in document length and structure across operators, reflecting distinct editorial strategies, as well as broad topical coverage spanning network hardware, mobile services, TV and streaming, billing, and account management. The dataset is publicly available under a CC-BY-NC-SA-4.0 license at https://zenodo.org/records/20732652, intended to support reproducible research in Nordic NLP and information retrieval.

03.
arXiv (CS.AI) 2026-06-16

Artificial Intelligence Index Report 2026

arXiv:2606.15708v1 Announce Type: new Abstract: Welcome to the ninth edition of the AI Index report. As AI continues to advance rapidly, the question becomes whether the systems built around it can keep up. Governance frameworks, evaluation methods, education systems, and the data infrastructure needed to track AI's impact are struggling to match the pace of the technology itself. That gap between what AI can do and how prepared we are to manage it runs through every chapter of this year's report. New in this edition, the report tracks how AI is being tested more ambitiously across reasoning, safety, and real-world task execution, and why those measurements are increasingly difficult to rely on. It also features new estimates of generative AI's economic value alongside emerging evidence of its labor market effects, an analytical framework on AI sovereignty, and a science chapter developed in collaboration with Schmidt Sciences. For the first time, the report features standalone chapters on AI in science and AI in medicine, reflecting AI's growing impact across these two domains.

04.
arXiv (CS.CL) 2026-06-12

NaturalFlow: Reducing Disruptive Pauses for Natural Speech Flow in Simultaneous Speech-to-Speech Translation

Simultaneous speech-to-speech translation aims to enable near-real-time communication by minimizing latency, offering a compelling, real-time alternative to the high latency of consecutive translation. However, the excessive pursuit of low latency often results in fragmented chunk-wise speech. Consequently, listeners are subjected to an unnatural acoustic flow punctuated by frequent pauses, which could increase their cognitive load. To bridge this gap, we introduce a fluency-aware optimization framework designed to discover the sweet spot between the low-latency benefits of simultaneous translation and the natural flow of consecutive translation. Our framework minimizes inter-chunk silences by leveraging model-internal signals, including linguistic diversity and induced temporal variability in speech durations. Experiments on short- and long-form benchmarks show that our framework produces natural speech flow while maintaining competitive latency and translation quality.

05.
arXiv (quant-ph) 2026-06-12

Quantized time in quantum walks under weak rank-K measurements

arXiv:2606.13552v1 Announce Type: new Abstract: Measurements can be used to monitor the evolution of quantum systems and may lead to a universally quantized time statistics. It is known that the mean return time is quantized for strong and indirect monitoring through the winding number of the return amplitude in a one-dimensional space. Here we discuss that under multi-channel strong or indirect monitoring, where the latter is achieved through ancilla coupling, the mean return time of a quantum walk in the projected subspace is also quantized. This reflects a universal time quantization for a higher dimensional evolution.

06.
arXiv (CS.LG) 2026-06-19

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

arXiv:2606.10686v2 Announce Type: replace-cross Abstract: The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

07.
Nature (Science) 2026-06-10

Mutation-dependent responses to sleep and exercise in clonal haematopoiesis

Clonal haematopoiesis (CH) activates inflammation and increases the risk of atherosclerosis1,2. Whether lifestyle alters CH clone expansion or the phenotypic programming of CH mutant cells, thereby affecting atherosclerosis, is unknown. Here, in humans and mice and across mutations in Jak2, Tet2, Trp53 and Dnmt3a, we demonstrate mutation-dependent responses to sleep and exercise in CH and show that mutant cells are uniquely sensitive to lifestyle. In two human datasets, moderate-to-vigorous physical activity was associated with lower prevalence of non-DNMT3A-driven CH. In atherogenic mice with Jak2V617F or Tet2 loss of function (LOF), but not Trp53 LOF or Dnmt3aR878H CH, uninterrupted sleep or exercise curtails clone expansion. In CH with the Jak2V617F mutation, sleep and exercise reduces clone expansion by selectively reprogramming mutant, but not cohabitant wild type, haematopoietic progenitor cells towards antiproliferative and metabolically healthy phenotypes by tempering bone marrow macrophage–haematopoietic progenitor cell IL-1β signalling. Sleep or exercise also lessens Jak2V617F-driven, Tet2 LOF-driven and Trp53 LOF-driven, but not Dnmt3aR878H-driven, atherosclerosis by locally reprogramming mutant vascular macrophages, independent of peripheral clone dynamics. In Jak2V617F, but not adjacent wild type, aortic macrophages, uninterrupted sleep blunts CLEC4E-dependent inflammasome activation, consequently diminishing lesions. Exercise, meanwhile, activates PAC1+ neurons in the locus coeruleus, raising the levels of peripheral noradrenaline, which signals through adrenergic receptor β2 (ADRβ2) whose expression is preserved by exercise in Jak2V617F, but not cohabitant wild type, aortic macrophages, selectively repressing their inflammatory programming and atherosclerosis. Our findings establish that healthy lifestyles gene-specifically diminish CH and selectively reprogram mutant haematopoietic progenitor cells and macrophages to maintain cardiovascular health. Sleep and exercise can slow clonal haematopoiesis and limit mutant cell-driven atherosclerosis.

08.
arXiv (CS.LG) 2026-06-16

MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry

arXiv:2606.05693v2 Announce Type: replace Abstract: Large language models (LLMs) have shown promise for molecular property prediction, but their ability to reason over chemical structures remains limited, as molecular representations such as SMILES differ substantially from the natural language on which LLMs are primarily trained. To bridge this semantic and chemical knowledge gap, we propose MolE-RAG, a training-free, molecule-centric retrieval-augmented generation framework for LLM-based molecular property prediction. MolE-RAG augments each prediction with three complementary sources of inference-time context: retrieved chemistry literature, molecule-specific information including compound synonyms, identifiers, functional group annotations, and physicochemical descriptors, and structurally similar molecules retrieved from the training set. We evaluate MolE-RAG across nine molecular property prediction tasks using proprietary, chemistry-specialized, and open-source LLMs. Across general-purpose LLMs, MolE-RAG improves ROC-AUC by up to 28 percentage points on classification tasks and reduces regression RMSE by up to 67% relative to a SMILES-only baseline. We further find that the utility of each context source varies across models and tasks, with different models benefiting most from textual retrieval, molecular context, or structural retrieval. These results suggest that molecule-centric retrieval can improve LLM-based molecular property prediction without model fine-tuning while providing a flexible framework for integrating heterogeneous chemical knowledge at inference time.

09.
arXiv (CS.AI) 2026-06-17

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

arXiv:2606.18111v1 Announce Type: cross Abstract: Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximated Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies, augmented with accrued reward histories, and stochastic policies improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms, which include integrating GGF with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-statoinary policies, and its novel extension for learning stochastic policies. We evaluate our algorithms across various domains and compare our methods against the state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.

10.
arXiv (quant-ph) 2026-06-16

Minimum measurements quantum protocol for band structure calculation

arXiv:2511.04389v2 Announce Type: replace Abstract: Protocols for quantum measurement are an essential part of quantum computing. Measurements are no longer confined to the final step of computation but are increasingly embedded within quantum circuits as integral components of noise-resilient algorithms. However, each observable typically requires a distinct measurement basis, often demanding a different circuit configuration. As the number of such configurations typically grows with the number of qubits, measurements constitute a major bottleneck. Focusing on electronic structure calculations in crystalline systems, we propose a measurement protocol that restricts the required measurement configurations to an absolute minimum of just three, independent of the number of qubits. This makes it one of the few known protocols that do not scale with qubit number. In particular, we derive the measurement protocol from the symmetries of tight-binding (TB) Hamiltonians and implement it within the Orthogonal-Ansatz Variational Quantum Eigensolver (OA-VQE) algorithm. We demonstrate its performance on three systems, namely a two-dimensional CuO$_2$ square lattice (3 qubits), bilayer graphene with hexagonal (Honeycomb) lattice (4 qubits) and three-dimensional diamond lattice (10 qubits). Beyond tight-binding systems, the protocol can be extended to enable efficient initial state preparation for many-body Hamiltonians, such as multi-orbital Hubbard models in a momentum space.

11.
Nature (Science) 2026-06-10

Amplified Arctic iceberg traffic reshapes benthic biodiversity

The Arctic is undergoing rapid warming, resulting in retreating sea ice and glaciers1, yet how cryospheric changes propagate into the deep ocean remains poorly understood2. Here we identify a climate-driven mechanism linking accelerating glacier disintegration to an increase in deep-sea hard-bottom habitats far beyond calving fronts. Seafloor observations in Fram Strait show a localized increase in the density and patchiness of dropstones delivered by debris-laden icebergs. At the same time, four decades of shipboard records show that the occurrence of icebergs increased abruptly in the early 2000s. Backtracking links these icebergs to the main outlet glaciers in northeast Greenland and the Russian High Arctic. In northeast Greenland, the timing of glacier destabilization coincides with this rise, whereas sparse satellite coverage in the Russian sector limits temporal attribution despite indications of enhanced glacier activity. A model sensitivity study shows that, apart from intensified calving, a more dynamic sea ice cover enhances downstream transport of glacial ice. Along these pathways, increased iceberg activity could reshape deep-sea habitats through enhanced melt and associated lithogenic input, and elevate navigational hazards as maritime traffic expands in the Arctic. Although modest compared with the iceberg discharges of Pleistocene Heinrich events, this mechanism provides a modern analogue of long-range cryospheric influence on the seafloor in a warming climate. Accelerated Arctic glacier disintegration and a more dynamic sea ice cover are increasing iceberg-delivered dropstones in the deep ocean, reshaping seafloor habitats and extending cryospheric impacts far beyond glaciers.

12.
arXiv (CS.CL) 2026-06-15

Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.

13.
arXiv (CS.CL) 2026-06-11

Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation

While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation tasks, little is known about their ability to adhere to a provided reference. We identify a critical failure mode of such reference-based LLM QA evaluation: when the provided reference conflicts with the judge model's parametric knowledge, the resulting scores become unreliable, substantially degrading evaluation fidelity. To study this phenomenon systematically, we introduce a controlled swapped-reference QA framework that induces reference-belief conflicts. Specifically, we replace the reference answer with an incorrect entity and construct diverse pairings of original and swapped references with correspondingly aligned candidate answers. Surprisingly, grading reliability drops sharply under swapped references across a broad set of judge models. We empirically show that this vulnerability is driven by judges' over-reliance on parametric knowledge, leading judges to disregard the given reference under conflict. Finally, we find that this failure persists under common prompt-based mitigation strategies, highlighting a fundamental limitation of LLM-as-a-judge evaluation and motivating reference-based protocols that enforce stronger adherence to the provided reference.

14.
arXiv (CS.CV) 2026-06-12

From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning

Navigation foundation models trained on massive web-scale data enable agents to generalize across diverse environments and embodiments. However, these models, which are trained solely on offline data, often lack the capacity to reason about the consequences of their actions or adapt through counterfactual understanding. They thus face significant limitations in real-world urban navigation, where interactive and safe behaviors, such as avoiding obstacles and moving pedestrians, are critical. To tackle these challenges, we introduce the Seeing-to-Experiencing (S2E) learning framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pretraining on offline videos and post-training through reinforcement learning. It maintains the model's generalizability acquired from large-scale real-world videos while enhancing its interactivity through reinforcement learning in simulation environments. Specifically, we introduce two innovations: (1) an Anchor-Guided Distribution Matching strategy for offline pretraining, which stabilizes learning and models diverse motion patterns through anchor-based supervision; and (2) a Residual-Attention Module for reinforcement learning, which obtains reactive behaviors from simulation environments without erasing the model's pretrained knowledge. Moreover, we establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3D Gaussian Splatting reconstructions of real-world scenes that incorporate physical interactions. It can systematically assess the generalizability and safety of navigation foundation models.

15.
arXiv (CS.CL) 2026-06-19

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

As LLM agents increasingly select tools autonomously, their choices among tools with different privileges become safety-relevant. However, prior tool-selection studies focus on safety-agnostic metadata preferences, leaving privilege-sensitive choices underexplored. To address this gap, we study over-privileged tool selection, in which an agent selects or escalates to a higher-privilege tool despite a sufficient lower-privilege alternative. We introduce ToolPrivBench to evaluate whether agents choose higher-privilege tools despite sufficient lower-privilege alternatives, measuring both initial selection and escalation after transient tool failures. Across eight domains and five recurring risk patterns, we find that over-privileged tool selection is common among mainstream LLM agents and is further amplified by transient failures. We further find that general safety alignment does not reliably transfer to least-privilege tool choice, while prompt-level controls provide only limited mitigation under transient failures. We therefore introduce a privilege-aware post-training defense that teaches agents to prefer sufficient lower-privilege tools and escalate only when necessary. Our mitigation experiments show that this defense substantially reduces unnecessary high-privilege tool use while preserving general capabilities.

16.
arXiv (CS.CV) 2026-06-17

Neural Tree Reconstruction for the Open Forest Observatory

The Open Forest Observatory (OFO) is a collaboration across universities and other partners to make low-cost forest mapping accessible to ecologists, land managers, and the general public. The OFO is building both a database of geospatial forest data as well as open-source methods and tools for forest mapping by uncrewed aerial vehicle. Such data are useful for a variety of climate applications including prioritizing reforestation efforts, informing wildfire hazard reduction, and monitoring carbon sequestration. In the current iteration of the OFO's forest map database, 3D tree maps are created using classical structure-from-motion techniques. This approach is prone to artifacts, lacks detail, and has particular difficulty on the forest floor where the input data (overhead imagery) has limited visibility. These reconstruction errors can potentially propagate to the downstream scientific tasks (e.g. a wildfire simulation.) Advances in 3D reconstruction, including methods like Neural Radiance Fields (NeRF), produce higher quality results that are more robust to sparse views and support data-driven priors. We explore ways to incorporate NeRFs into the OFO dataset, outline future work to support even more state-of-the-art 3D vision models, and describe the importance of high-quality 3D reconstructions for forestry applications.

17.
arXiv (CS.CV) 2026-06-11

SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining

Mining hard, safety-critical scenes from driving logs is bottlenecked by the absence of difficulty labels, and no single proxy, collision risk, trajectory ambiguity, or semantic rarity suffices to find such scenes on its own. We present SceneMiner, a unified, camera-only bird's-eye-view pipeline that emits complementary mining signals from a frozen vision-language backbone in a single forward pass, with no LiDAR or radar: a retrieval embedding for text-prompted scenario search, a multi-label scene-tag distribution, and a continuous physics-based risk score (a motion forecast is a byproduct, not a contribution). Building such a multi-head model exposes our central finding, a failure mode we term cross-task interference: adding or upgrading one head shifts a shared activation stream and degrades weight-frozen sibling heads, so freezing parameters alone is insufficient. Our contribution, identity-preserving multi-task fine-tuning, removes this interference by zero-initializing every new sub-module and freezing every parameter that feeds the shared stream. The mining heads are thereby preserved bit-identically while training only ~102k parameters. The tagging head reaches mAP 0.4614 (micro-F1 0.5557) on 20 scene tags by pooling each scene into 32 visual tokens, and the embedding head supports text-prompted retrieval, validated qualitatively. Code is available at: https://anonymous.4open.science/r/sceneminer_anonymous-64E5

18.
bioRxiv (Bioinfo) 2026-06-11

DModE: An end-to-end framework for Differential Modification and Expression Analysis of Nanopore direct RNA sequencing data

Summary: Nanopore direct RNA sequencing (DRS) enables simultaneous quantification of transcript abundance and RNA modifications from native RNA molecules, providing a unique opportunity to study transcriptional and epitranscriptomic regulation within a single experiment. However, comprehensive analysis of DRS data remains challenging, as existing workflows typically focus on individual processing steps and often require manual integration of multiple software packages for expression analysis, modification detection, statistical testing, and visualization. Furthermore, integrated differential expression and differential RNA modification analysis at both gene and isoform resolution remains poorly supported by current workflows. Here, we present DModE (Differential Modification and Expression Analysis), an end-to-end framework for integrated analysis of Nanopore DRS data. DModE combines an Epi2ME-compatible Nextflow preprocessing workflow with a dedicated Python package for downstream statistical analysis, visualization, and reporting. The framework supports differential gene and isoform expression analysis, differential RNA modification analysis at genome and transcript level, metagene profiling, exploratory epitranscriptomic analyses, and integrated assessment of relationships between expression and modification dynamics. Results are automatically summarized in interactive HTML reports, facilitating reproducible and accessible data interpretation. By integrating transcriptomic and epitranscriptomic analyses within a single framework, DModE substantially simplifies comprehensive DRS data analysis and lowers the barrier for studying RNA modification biology using Nanopore sequencing.

19.
medRxiv (Medicine) 2026-06-18

Predicting Motor Recovery After Stroke: Utility and Limits of Corticospinal Tract Biomarkers

Background: Corticospinal tract (CST) damage is a major cause of post-stroke motor deficits. However, it remains unclear which estimates of CST damage best predict motor recovery, especially regarding different aspects of motor control. While conventional CST-lesion metrics offer superior feasibility, data-driven machine learning (ML) approaches may better capture patients propensity for task-specific recovery with important implication for their use as future clinical biomarkers. Methods: Providing the first direct longitudinal comparison of these approaches based exclusively on CST-lesion patterns, we evaluated six conventional CST-lesion metrics and a voxel-wise ML approach using clinical MRI data from 127 acute ischemic stroke patients. Acute impairment and outcome (>3 months post-stroke) were assessed for basal and complex motor functions. Conventional CST-lesion metrics and ML were used to predict task-specific motor impairment and outcome. Results: All conventional CST-lesion metrics correlated significantly with both acute impairment and motor outcome across motor domains, with metrics weighted for CST narrowing and tract probability performing best. However, predictive performance for unseen patients was low. ML outperformed conventional markers in predicting acute impairment across motor domains and basal motor outcome, but failed to predict complex motor outcome. Topographically, predictive voxels clustered within and above the posterior limb of the internal capsule, with distinct CST subregions associated with basal versus complex motor impairment, consistent with a task-specific somatotopic organization. Conclusions: The predictive utility of CST biomarkers was task- and timepoint-dependent. While ML may improve predictive performance, complex motor outcome remained difficult to predict, likely reflecting greater reliance on distributed cortical reorganization beyond the CST. By revealing task-specific CST subregions, voxel-wise ML provides an anatomically informed foundation for future predictive models. Such future models should combine CST biomarkers with measures of broader motor network integrity to enable individualized prognosis tailored to specific motor domains and recovery stages.

20.
arXiv (CS.CL) 2026-06-17

Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

Non-autoregressive (NAR) decoding generates output tokens in parallel, making speech recognition faster than autoregressive decoding, which generates them sequentially from left to right. However, the recognition performance is degraded because NAR decoding cannot resolve uncertainty by conditioning on previously generated tokens. To address this issue, we propose a novel NAR decoding framework based on minimum Bayes' risk (MBR) decoding, termed NAR-MBR decoding, that maximizes the expected utility calculated from samples drawn from the output probability of an NAR model rather than maximizing the output probability. Notably, by leveraging the nature of NAR models, multiple samples are obtained efficiently with a single forward computation. Our experiments across LibriSpeech, Switchboard, AMI, and web presentation corpus demonstrated that our NAR-MBR decoding outperformed previous NAR decoding and ran faster than AR decoding.

21.
bioRxiv (Bioinfo) 2026-06-16

Orion: Towards Lab Automation with Computer-Using Agents

Laboratory discovery increasingly depends on computational workflows that connect experimental data to analysis, interpretation and follow-up hypotheses. Yet these workflows remain constrained by labor-intensive use of specialized software, visual inspection through graphical user interfaces, and integration of knowledge across multiple sources. Here, we present Orion, a computer-using AI agent for biomedical image analysis and interpretation that moves towards lab automation by automating this computational layer of laboratory work. Orion combines large language models with terminal execution, GUI control and adaptive multi-step reasoning in a shared computing environment. It can inspect visual data, operate standard scientific software, mine web resources and conduct end-to-end analysis and interpretation workflows without requiring bespoke software integrations. Across benchmarks, Orion achieved over 90% accuracy on biomedical database and literature retrieval tasks, learned to use the popular tools CellProfiler and QuPath for quantitative analysis of cellular and tissue images, respectively, and facilitated autonomous discovery in experimental imaging data. In 100 hours of autonomous exploration of a large-scale perturbation imaging dataset, Orion generated 52 research reports, of which human scientist review prioritized 22 plausible mechanistic hypotheses. These results show that computer-using AI agents can substantially expand the reach of laboratory automation, providing a scalable and auditable route from experimental imaging data to quantitative analysis, reports and biologically grounded hypotheses.

22.
arXiv (CS.CL) 2026-06-12

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

The advancement of Large Reasoning Models (LRMs) has catalyzed a paradigm shift from reactive ``fast thinking'' text generation to systematic, step-by-step ``slow thinking'' reasoning, unlocking state-of-the-art performance in complex mathematical and logical tasks. However, the field faces the fundamental gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of reinforcement learning (RL) for reasoning optimization relying on costly external verifiers. We identify and formally define Entropy-Gradient Inversion, a robust negative correlation between token entropy and logit gradients that acts as a definitive geometric fingerprint for LRM reasoning capability. Building on this, we propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds this inversion signature into RL reward regularization. Extensive experiments on various reasoning benchmarks across multiple model scales show CorR-PO consistently outperforms state-of-the-art baselines, confirming that stronger inversion directly correlates with superior reasoning performance.

23.
arXiv (CS.LG) 2026-06-18

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

arXiv:2606.18305v1 Announce Type: cross Abstract: Operator learning is an emerging interdisciplinary field that integrates machine learning with scientific computing. By mapping infinite-dimensional function spaces, this approach provides an efficient surrogate modeling framework for high-dimensional partial differential equations (PDEs). Compared to traditional numerical solvers, it achieves a superior trade-off between computational complexity and approximation accuracy, demonstrating significant advantages in many-query tasks such as real-time prediction and parameter sweeps. Given the stringent accuracy requirements of both forward simulation and inverse inference, as well as the precision bottlenecks of existing operator learning methods in handling complex boundaries or long-term evolution, we propose the Starter-Iterator Neural Operator (SINO). Our framework reinterprets the initialization strategies and iterative formats of traditional iterative methods through neural networks, establishing an efficient approach for spectral-spatiotemporal collaborative modeling. Specifically, the frequency-domain initialization module captures globally stable low-frequency features, while the time-domain learning module focuses on optimizing local solution residuals, thereby effectively overcoming the inherent limitations of conventional single-domain modeling approaches. Extensive experiments on typical dynamical systems such as the Navier-Stokes equations and acoustic wave equations, as well as practical applications including super-resolution imaging and weather forecasting, demonstrate that SINO achieves outstanding performance in numerical accuracy, generalization capability, and robustness.

24.
arXiv (CS.LG) 2026-06-16

Repeated Bilateral Trade: The Quest for Fairness

arXiv:2606.15369v1 Announce Type: new Abstract: We study repeated bilateral trade from a fairness perspective. At each round, a fresh seller-buyer pair arrives, and the platform posts a price before observing the traders' valuations. Trade occurs only if both agents accept the price. Rather than maximizing only the gain from trade, we consider platforms that seek balanced divisions of the generated surplus. We show that natural fairness desiderata lead to a one-parameter Rawls-to-Nash family of fair-gain objectives, obtained by aggregating the seller's and buyer's net gains through nonpositive Hölder means. Unlike the standard gain-from-trade objective and the Rawlsian fair-gain objective studied in prior work, our proposed objectives induce a new statistical structure in which expected rewards are recovered from threshold feedback through a two-dimensional singular-kernel integral identity. This leads to a nonstandard pure-exploration problem whose natural estimators are rectangular double sums with row-column dependence and singular weights. Assuming independent i.i.d. seller and buyer valuation sequences with arbitrary unknown marginals, we characterize the optimal learning rates for the whole Rawls-to-Nash family of fair-gain objectives, giving matching fixed-confidence sample-complexity and regret bounds up to polylogarithmic factors.

25.
arXiv (CS.LG) 2026-06-17

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

作者:

arXiv:2606.17182v1 Announce Type: new Abstract: Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics – the regime durable-execution engines enforce by deterministic replay – and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, $L_0 \subsetneq \cdots \subsetneq L_4$, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified $L_0 \to L_1$ refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.