Academic Intelligence · Curated Daily

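Explore the Frontiers of Global Research

AcademicHub aggregates up-to-the-minute literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.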

01.
arXiv (CS.AI) 2026-06-16

FasterPy: An LLM-based Code Execution Efficiency Optimization Framework

arXiv:2512.22827v2 Announce Type: replace-cross Abstract: Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant loops, repeated computations), making them labor-intensive and limited in applicability. In recent years, machine learning and deep learning-based methods have emerged as promising alternatives by learning optimization heuristics from annotated code corpora and performance measurements. However, these approaches usually depend on specific program representations and meticulously crafted training datasets, making them costly to develop and difficult to scale. With the rapid rise of Large Language Models (LLMs), their remarkable capabilities in code generation have opened new avenues for automated code optimization. In this work, we propose FasterPy, a low-cost and efficient framework that adapts LLMs to optimize the execution efficiency of Python code. FasterPy combines Retrieval-Augmented Generation (RAG), supported by a knowledge base constructed from existing performance-improving code pairs and corresponding performance measurements, with Low-Rank Adaptation (LoRA) to enhance code optimization performance. Our experimental results on the Performance Improving Code Edits (PIE) benchmark demonstrate that our method outperforms existing models on multiple metrics. The FasterPy tool and the experimental results are available at https://github.com/WuYue22/fasterpy.

02.
medRxiv (Medicine) 2026-06-12

Heterogeneity of Treatment Effect of Aspirin and Clinically Significant Bleeding in Older Adults

Aim: The global population of older adults is growing, and older age is linked to higher bleeding risk. Although guidelines discourage aspirin for primary prevention in healthy older adults due to bleeding harms outweighing benefits, many continue taking it without a clear indication. It remains unclear whether all older adults face uniform aspirin-related bleeding risk or if certain subgroups are more vulnerable. Methods: We analyzed data from 19,114 ASPREE trial participants to develop machine learning models using 116 baseline variables. Random forest (RF) and random survival forest (RSF) models predicted 5-year bleeding risk, and participants were stratified into low, intermediate, and high-risk groups based on the 20th and 80th percentiles of predicted risk. We assessed heterogeneity of treatment effect (HTE) by testing treatment-by-risk group interactions on the relative scale using Fine-Gray models, and on the absolute scale using observed 5-year cumulative incidence rates. Results: Over a median follow-up of 4.7 years, 626 major bleeding events occurred. The RF model had moderate discrimination (AUC = 0.65, 95% CI: 0.63-0.67) and good calibration (Brier = 0.032, 95% CI: 0.029-0.034). Statistically significant HTE was observed on the relative scale, with the greatest relative increase in bleeding risk seen in the low-risk group (subdistribution hazard ratio = 2.26, 95% CI: 1.27-4.01). On the absolute scale, low-risk participants experienced higher bleeding with aspirin (absolute risk difference (ARD) = 1.17%, 95% CI: 0.37-1.95), but heterogeneity in ARDs was not statistically significant (Cochran's Q p > 0.45). Similar findings were observed when using the RSF model. Conclusion: Participants at lowest baseline bleeding risk experienced the greatest relative increase in bleeding risk with aspirin therapy. We found statistically significant heterogeneity in treatment effects on the relative but not absolute scale. 
These findings support an individualized, risk-based approach to aspirin therapy decision-making in older adults.
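
The stratification and absolute-scale comparison described in the Methods can be sketched in a few lines. This is an illustrative simplification, not the study's analysis code: the function names are hypothetical, the boundary handling at the percentile cut points is an assumption, and the crude event proportion used here stands in for the 5-year cumulative incidence (which, in the study, accounts for competing risks).

```python
from statistics import quantiles


def stratify_by_risk(predicted_risk):
    """Label each predicted risk as low / intermediate / high.

    Cut points are the 20th and 80th percentiles of the predicted risks.
    Treating values exactly at a cut point as "intermediate" is an assumption.
    """
    # quantiles(n=5) returns the 20th, 40th, 60th and 80th percentile cut points
    cuts = quantiles(predicted_risk, n=5, method="inclusive")
    p20, p80 = cuts[0], cuts[3]
    labels = []
    for risk in predicted_risk:
        if risk < p20:
            labels.append("low")
        elif risk > p80:
            labels.append("high")
        else:
            labels.append("intermediate")
    return labels


def absolute_risk_difference(events_treated, n_treated, events_control, n_control):
    """Absolute risk difference in percentage points (treated minus control).

    Uses crude proportions; a real analysis would use cumulative incidence.
    """
    return 100.0 * (events_treated / n_treated - events_control / n_control)
```

Computing the difference within each risk group and comparing across groups is what distinguishes heterogeneity on the absolute scale from heterogeneity on the relative (hazard ratio) scale.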

03.
arXiv (quant-ph) 2026-06-19

Nearest-neighbour gates are all you need: High-rate quantum low-density parity-check codes on a planar grid

arXiv:2606.19482v1 Announce Type: new Abstract: High-performance quantum low-density parity-check codes promise substantial reductions in the overhead of fault-tolerant quantum computation, but most constructions require long-range connectivity or qubit shuttling, both of which are difficult to realise in superconducting architectures. Here we introduce a family of quantum low-density parity-check codes that, for the first time, combines planar open-boundary layouts, finite-size advantages over surface codes, and syndrome extraction using only nearest-neighbour gates on a square grid of qubits. The key idea is to generate check-data connectivity dynamically: nearest-neighbour iSWAP walks both define the stabiliser supports and implement their measurement, avoiding the need for a long-range hardware graph. The resulting circuits achieve optimal constant-depth stabiliser measurement, independent of code size, and naturally remove leakage from the system by exchanging the role of check and data qubits at each syndrome extraction round. We find finite-size instances such as a [[323,14,15]] code, whose code-efficiency ratio is nearly an order of magnitude larger than that of rotated surface-code patches. At around 30 circuit qubits per logical qubit, the best directional tile-code layouts reduce the per-logical per-round logical error rate by up to a factor of 1000 relative to rotated surface-code memories. These results show that the advantages of quantum low-density parity-check codes can survive compilation into strictly planar nearest-neighbour circuits, bringing low-overhead fault-tolerant memories closer to near-term hardware.
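
The abstract does not define its "code-efficiency ratio". One common figure of merit for an [[n, k, d]] code is k·d²/n, and the sketch below assumes that definition purely for illustration. Under this assumption a rotated surface-code patch with parameters [[d², 1, d]] scores exactly 1, which makes the comparison easy to reproduce.

```python
def kd2_over_n(n, k, d):
    """Figure of merit k * d^2 / n for an [[n, k, d]] code.

    Assumption: this is the ratio the abstract has in mind; the abstract
    itself does not spell out the definition.
    """
    return k * d * d / n


# The [[323, 14, 15]] instance quoted in the abstract
tile_code = kd2_over_n(323, 14, 15)

# A rotated surface-code patch of the same distance: [[d^2, 1, d]]
surface_patch = kd2_over_n(15 * 15, 1, 15)

print(f"tile code: {tile_code:.2f}, rotated surface code: {surface_patch:.2f}")
```

Under the assumed definition the two values differ by a factor of just under ten, which is consistent with the "nearly an order of magnitude" wording.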

04.
arXiv (CS.LG) 2026-06-16

Using Reinforcement Learning to Optimize the Global and Local Crossing Number

arXiv:2509.06108v2 Announce Type: replace-cross Abstract: Graph drawing concerns the algorithmic visualization of graphs. A good drawing of a graph is easy to read and facilitates solving tasks on the graph. Several properties have been identified to occur in good drawings of graphs. Such properties include a low number of crossings, large angles between edges, short edges, and depicting symmetries. Many of these properties are explicitly measurable metrics. This brings us to the insight that graph drawing can be seen as a game. In this paper, we study a single-player optimization game in which the player iteratively moves vertices of a straight-line graph drawing to reduce edge crossings. This game arose naturally from the automatic track of the Graph Drawing Challenge, where solutions are obtained by repeatedly performing local vertex movements. We formalize this process as a game with full information and investigate whether reinforcement learning can discover effective strategies for playing it. Our reinforcement-learning agent observes the local geometric and structural context of a vertex and selects a movement direction with the goal of reducing either the global or the local crossing number, that is, the total number of crossings or the maximum number of crossings per edge. We compare the resulting strategies to existing methods and established crossing-minimization heuristics on standard benchmark graphs. While our approach does not out-compete state-of-the-art methods for minimizing the global crossing number, it is competitive and often superior for minimizing the local crossing number.
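
The two objectives named in the abstract can be made concrete with a small sketch that counts crossings in a straight-line drawing. The function names are hypothetical and the code assumes general position (no three collinear vertices, no overlapping edges); it illustrates the metrics only, not the paper's reinforcement-learning agent.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossing_numbers(pos, edges):
    """Return (global, local) crossing numbers of a straight-line drawing.

    pos maps each vertex to an (x, y) point; edges is a list of distinct vertex pairs.
    Global = total number of crossings; local = maximum crossings on any single edge.
    """
    per_edge = {e: 0 for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            continue  # edges sharing an endpoint do not cross in general position
        if segments_cross(pos[e[0]], pos[e[1]], pos[f[0]], pos[f[1]]):
            per_edge[e] += 1
            per_edge[f] += 1
    global_cr = sum(per_edge.values()) // 2  # each crossing was counted on both edges
    local_cr = max(per_edge.values(), default=0)
    return global_cr, local_cr
```

A single vertex move, as in the game described above, amounts to changing one entry of `pos` and re-evaluating either number as the reward signal.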

05.
arXiv (quant-ph) 2026-06-11

Rolling Stock Planning Using the Quantum Approximate Optimization Algorithm

arXiv:2606.11383v1 Announce Type: new Abstract: Rolling stock planning is a complex optimization problem in railway management that involves assigning physical trains to scheduled trips while minimizing operational costs. In this work, we address a specific instance of this problem featuring 190 trips over two days, subject to constraints such as mandatory maintenance stops. We reformulate the problem as a Maximum-Weight Independent Set (MWIS) problem on a graph where nodes represent feasible train cycles. To handle the computational complexity of the large search space, we propose a hybrid divide-and-conquer algorithm. This approach iteratively selects subgraphs and solves the MWIS problem using various solvers, including exact classical methods and the Quantum Approximate Optimization Algorithm (QAOA). We evaluate the algorithm's performance by comparing these methods and analyzing the scaling with respect to subgraph size, with QAOA assessed through both classical simulation and execution on a quantum device (IQM Emerald). Our results indicate that increasing the subgraph size generally improves solution quality, demonstrating that the hybrid framework can effectively bridge the gap between polynomial-time approximate solvers and exponential-time exact methods.
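
To illustrate the subproblem that each solver receives, here is a minimal exact MWIS solver by exhaustive search. It is a sketch under stated assumptions: the function name is hypothetical, weights are assumed non-negative, and edges are assumed to connect mutually incompatible nodes (for example, train cycles that cover the same trip), which the abstract does not spell out. Its running time grows exponentially with the number of nodes, so it is only practical on small subgraphs.

```python
from itertools import combinations


def max_weight_independent_set(weights, edges):
    """Exact MWIS by brute force.

    weights: dict mapping node -> non-negative weight
    edges:   iterable of node pairs that may not be selected together
    Returns (best_set, best_weight).
    """
    nodes = list(weights)
    conflicts = {frozenset(e) for e in edges}
    best_set, best_weight = set(), 0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # Skip any subset that contains two conflicting nodes
            if any(frozenset(pair) in conflicts for pair in combinations(subset, 2)):
                continue
            weight = sum(weights[v] for v in subset)
            if weight > best_weight:
                best_set, best_weight = set(subset), weight
    return best_set, best_weight
```

In a divide-and-conquer scheme of the kind the abstract describes, a routine like this (or QAOA in its place) would be called on each selected subgraph.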

06.
arXiv (CS.CL) 2026-06-11

Gumbel-BEARD: Automatic Layer Selection for Self-Supervised Adaptation of Whisper in Low-Resource Domains

Speech foundation models often struggle in low-resource domains due to domain mismatch and data scarcity. We propose Gumbel-BEARD, a domain adaptation framework that automates Whisper encoder layer selection via an end-to-end trainable hard Gumbel-Softmax selector. It enables self-supervised adaptation with a BEST-RQ objective that dynamically adapts to target acoustic characteristics without manual tuning. Experiments on the MyST child speech corpus demonstrate efficiency and scalability: with 10 h of labeled data for fine-tuning, our method matches a fully supervised baseline trained on the complete 133 h labeled set. We establish new state-of-the-art word error rates (WERs) of 8.21% using Whisper-medium on MyST and 11.06% using Whisper-small on the OGI Spontaneous dataset. Evaluation on CORAAL further confirms robustness to adult dialectal domain shifts, with up to 6% relative WER reduction, highlighting the generalizability of our approach to diverse low-resource conditions.
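
For readers unfamiliar with the hard Gumbel-Softmax trick, the forward pass of such a selector can be sketched with the standard library alone. This is a generic illustration, not the Gumbel-BEARD implementation: the function names, the temperature value and the per-layer logits are all hypothetical. In an autodiff framework the one-hot choice would be used in the forward pass while gradients flow through the soft probabilities (the straight-through estimator); that part is not shown here.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    return -math.log(-math.log(u))


def hard_gumbel_softmax(logits, tau, rng):
    """Return (one_hot, soft_probs) for a single Gumbel-Softmax draw.

    logits: one unnormalised score per candidate (e.g. per encoder layer)
    tau:    temperature; smaller values push soft_probs towards one-hot
    """
    perturbed = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    top = max(perturbed)
    exps = [math.exp(p - top) for p in perturbed]  # numerically stable softmax
    total = sum(exps)
    soft = [e / total for e in exps]
    chosen = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == chosen else 0.0 for i in range(len(soft))]
    return hard, soft


# Hypothetical logits for four candidate layers
hard, soft = hard_gumbel_softmax([0.1, 0.2, 0.3, 0.4], tau=0.5, rng=random.Random(0))
```

Because the noise is resampled on every call, different layers are explored early in training, while the learned logits increasingly dominate the choice.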

07.
arXiv (CS.CV) 2026-06-12

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices. In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive CAD generation and editing. We formulate the task as a multi-turn interaction between a multimodal agent and an executable CAD sandbox, covering three tasks: Drawing-to-Code, Text-to-Code, and Interactive Editing. To support this, we develop a data synthesis pipeline incorporating advanced industrial manufacturing features to generate standard-compliant multi-view engineering drawings, complex code-editing tasks, and high-fidelity interaction trajectories. We optimize the agent via progressive SFT followed by geometry-aware reinforcement learning with viable-prefix masking to enhance code executability and geometric fidelity. Finally, we introduce the IterCAD-Bench evaluation suite and propose the Chamfer Distance Tolerance-Recall (CD-TR) curve alongside its AUC-TR metric, establishing a survivor-bias-free standard that unifies code validity and geometric precision. Extensive experiments demonstrate that IterCAD achieves highly competitive performance across multiple benchmarks, significantly outperforming existing approaches in both code executability and geometric precision, while exhibiting superior capabilities in closed-loop iterative refinement.

09.
arXiv (CS.CV) 2026-06-16

Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation

Open-vocabulary 6D object pose estimation empowers robots to manipulate arbitrary unseen objects guided solely by natural language. However, a critical limitation of existing approaches is their reliance on unconstrained global matching strategies. In open-world scenarios, trying to match anchor features against the entire query image space introduces excessive ambiguity, as target features are easily confused with background distractors. To resolve this, we propose Fine-grained Correspondence Pose Estimation (FiCoP), a framework that transitions from noise-prone global matching to spatially-constrained patch-level correspondence. To systematically eliminate background interference, FiCoP first employs an object-centric disentanglement step to isolate the target from macro-level environmental noise. Building upon this localized region, our core methodological innovations are twofold. Firstly, a Cross-Perspective Global Perception (CPGP) module is proposed to fuse dual-view features, establishing structural consensus through explicit context reasoning and text-guided semantic injection. Secondly, we design a Patch Correlation Predictor (PCP) that leverages a patch-to-patch correlation matrix as a structural prior. This generates a precise block-wise association map, acting as a spatial filter to enforce fine-grained, noise-resilient matching. Experiments on the REAL275 and Toyota-Light datasets demonstrate that FiCoP improves Average Recall by 8.0% and 6.1%, respectively, compared to the state-of-the-art method, highlighting its capability to deliver robust and generalized perception for robotic agents operating in complex, unconstrained open-world environments. The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP.

10.
arXiv (CS.AI) 2026-06-16

Mind-Studio: Executable World Models with Lookahead Evaluation for Partially Observable Games

arXiv:2606.16070v1 Announce Type: new Abstract: World-model synthesis aims to turn interaction experience into an internal model of environment dynamics. Existing symbolic approaches often fit observed transitions or mixtures of local rules, but they do not produce a complete executable program that can run independently of the real environment. We present Mind-Studio, a framework that synthesizes executable pygame-style world models from state-action-next-state trajectories using large language models. Mind-Studio combines entropy-selected traces with a lightweight game skill file containing object, action, and static scene information extracted from screenshots. We evaluate synthesis quality with a K-step lookahead fidelity protocol that compares generated world-model rollouts against Real-ALE rollouts from the same state. On Montezuma's Revenge, Mind-Studio improves chosen-action next-state prediction from 0.3% for PoE-World to 48.7% while verifying 5 of 8 subgoals; across Alien, Assault, and Skiing, it achieves stronger branch-level fidelity than prior learned lookahead sources.

11.
arXiv (CS.LG) 2026-06-15

Multi-Variable Stellar Parameter Estimation Using Residual Multitask Neural Networks

arXiv:2606.13868v1 Announce Type: cross Abstract: We present an end-to-end pipeline for estimating stellar parameters from Sloan Digital Sky Survey Data Release 12 spectra using a fully connected multitask neural network with residual blocks, whose hyperparameters are tuned via Bayesian optimization. The preprocessing pipeline includes per-spectrum standardization, RobustScaler normalization of the target variables – effective temperature $T_{\mathrm{eff}}$, metallicity $[\mathrm{Fe/H}]$, and surface gravity $\log g$ – and data augmentation via Gaussian noise injection. On a held-out test set, the model achieved Mean Absolute Errors (MAE) of $59.76~\mathrm{K}$ for $T_{\mathrm{eff}}$, $0.103~\mathrm{dex}$ for $[\mathrm{Fe/H}]$, and $0.130~\mathrm{dex}$ for $\log g$. Normalized against the full-scale range of each parameter, these results represent range-normalized errors between $1\%$ and $3\%$, achieved with a highly efficient model complexity of approximately 540,000 trainable parameters. These results demonstrate that a compact residual multitask architecture, combined with principled signal preprocessing, provides a parameter-efficient solution for nonlinear parameter estimation in large-scale spectral datasets. In particular, the proposed model achieves competitive performance with substantially lower complexity than deeper neural network baselines.
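
The target scaling and the error metrics mentioned above can be sketched in plain Python. The helper names are hypothetical, the robust scaling follows the usual median-and-interquartile-range recipe (in the style of scikit-learn's `RobustScaler`), and the parameter range passed to `range_normalized_error` is something the reader must supply, since the abstract does not list the ranges.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    iqr = q3 - q1
    return [(v - med) / iqr for v in values]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def range_normalized_error(mae, lowest, highest):
    """MAE expressed as a fraction of the parameter's full-scale range."""
    return mae / (highest - lowest)
```

Scaling by median and IQR rather than mean and standard deviation keeps a few extreme stars from dominating the loss, which is the usual motivation for this choice.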

12.
medRxiv (Medicine) 2026-06-12

Association of circulating endothelial progenitor cell count and functional outcome in patients with acute ischemic stroke due to intracranial large vessel occlusion

Background: Circulating endothelial progenitor cells (cEPCs) contribute to vascular repair following an ischemic stroke. The aim of the study was to evaluate the association between cEPCs and functional outcomes in patients with acute ischemic stroke (AIS) due to large vessel occlusion (LVO) who received endovascular therapy (EVT). Methods: Prospective study of patients with LVO-AIS who received EVT. Blood samples were obtained within 24 ± 12 hours and on day 7 ± 1 from stroke onset. cEPCs were detected using flow cytometry (CD34+/VEGFR2+/CD133+). The primary endpoint was a favourable functional outcome (modified Rankin Scale 0-2) at three months of follow-up. Secondary endpoints included baseline to 24 hours/day 7 changes in the National Institutes of Health Stroke Scale (NIHSS) score and collateral circulation (CC) status. Bivariate and multivariable logistic regression analyses were performed. Results: Included were 90 patients (73.2 ± 12.7 years, 41.1% women), in 42 of whom (46.7%) cEPCs were detected at 24 hours. On day 7, cEPCs were detected in 27 (43.6%) of 62 patients for whom this information was available. Atrial fibrillation, prior anticoagulant treatment and stroke onset-to-door time

13.
arXiv (CS.CV) 2026-06-16

Temporal Difference Learning for Diffusion Models

Diffusion models are typically trained with objectives that focus on local denoising targets at individual time steps (or adjacent pairs), which do not enforce consistency between predictions along the denoising trajectory. This lack of cross-time consistency can degrade performance, especially for few-step samplers. We introduce a temporal difference (TD) objective that penalizes inconsistency of the model's multi-step progress along the denoising path. By reformulating the diffusion process as a Markov reward process and casting denoising as a policy evaluation problem in reinforcement learning, we derive a unified TD approach that applies to both discrete- and continuous-time diffusion formulations. We further propose a principled sample-based reweighting method that stabilizes training. Empirically, we show that using our TD training can significantly improve sample quality measured by FID, with stronger advantages when the number of sampling steps is small, highlighting its practical utility under low-computation-budget scenarios. We provide ablation studies to justify our design choices, including pairwise loss reweighting, regularization weight, and one-step stride. Overall, our TD approach can be a general drop-in that enforces cross-time consistency and improves generation quality across different diffusion generative models.

14.
arXiv (math.PR) 2026-06-15

Mixing Times for the Facilitated Exclusion Process

arXiv:2402.18999v2 Announce Type: replace Abstract: The facilitated simple exclusion process (FEP) is a one-dimensional exclusion process with a dynamical constraint. We establish bounds on the mixing time of the FEP on the segment, with closed boundaries, and the circle. The FEP on these spaces exhibits transient states. If the macroscopic density of particles is at least $1/2$, the process will eventually exit these states to reach an ergodic component. If the macroscopic density is less than $1/2$, the process will hit an absorbing state. We show that the symmetric FEP (SFEP) on the segment $\{1,\ldots,N\}$, with $k>N/2$ particles, has mixing time of order $N^{2}\log(N-k)$ and exhibits the pre-cutoff phenomenon. For the asymmetric FEP (AFEP) on the segment, we show that there exist initial conditions for which the hitting time of the ergodic component is exponentially slow in the number of holes $N-k$. In particular, when $N-k$ is large enough, the hitting time of the ergodic component determines the mixing time. For the SFEP on the circle of size $N$, and macroscopic particle density $\rho \in(1/2,1)$, we establish bounds on the mixing time of order $N^{2}\log N$ for the process restricted to its ergodic component. We also give an upper bound on the hitting time of the ergodic component of order $N^{2}\log N$ for a large class of initial conditions. The proofs rely on couplings with exclusion processes (both open and closed boundaries) via a novel lattice path (height function) construction of the FEP.

15.
bioRxiv (Bioinfo) 2026-06-17

MetaHarmonizer: robust biomedical metadata harmonization and a contamination control for inflated LLM performance on public benchmarks

Public biomedical repositories hold substantial reuse potential, but inconsistent metadata routinely blocks integration across studies. Recent LLM-based harmonization approaches address scale but suffer from non-determinism, hallucinated ontology terms, and, in their highest-accuracy configurations, dependence on proprietary APIs or labeled fine-tuning data. A more fundamental concern is that LLM accuracies on widely-used public benchmarks may substantially inflate transferable capability: under a contamination-controlled evaluation protocol we developed, the apparent LLM-only advantage on the GDC schema-mapping benchmark is inverted, and three out of five LLMs recover 80-100% of GDC identifiers from zero-schema context, suggesting direct memorization. Building on this insight, we present MetaHarmonizer, an automated metadata harmonization system designed to be robust by construction: SchemaMapper aligns attribute names across schemas, and OntologyMapper standardizes values to controlled vocabularies. Both modules implement a multi-stage cascade that escalates to more resource-intensive methods only when earlier stages fall short, with all candidates grounded in pre-defined controlled vocabularies to preclude hallucinated outputs and LLMs used only as bounded preprocessing components rather than inference-time dependencies. On the GDC schema-matching benchmark, SchemaMapper with the deployment-optimized LLM-generated alias dictionary achieved 71.6% Top-1 accuracy and higher Recall@GT than Magneto bipartite variants, recovering significantly more ground-truth mappings; with the best-performing alias dictionary, it reached the highest Top-1/Top-5/Recall@GT and matched the best Magneto reranker (fine-tuned LLM-reranker) on MRR; it also outperformed LLM-only approaches under contamination-controlled conditions. 
On four EFO benchmarks, OntologyMapper achieved 77.9 - 95.5% Top-1 accuracy, outperforming text2term by up to 16.4 pp and direct LLM inference (against the smaller corpus) by 19.2 pp because memorization is not a viable shortcut for this task. Across both modules, calibrated confidence scores separate correct from incorrect predictions (AUC 0.73 - 0.94), enabling principled human-in-the-loop triage. Inference is fully local, deterministic, and computationally efficient - seconds on schema mapping and under a minute for ontology mapping of up to ~7,000 terms against the pre-indexed 33,230-term corpus. Released as a Python package with a domain-agnostic architecture, MetaHarmonizer provides a scalable foundation for improving the FAIRness of biomedical data and enabling cross-study integration, alongside an evaluation methodology applicable to any LLM-augmented bioinformatics benchmark built on public benchmarks.

16.
arXiv (CS.LG) 2026-06-15

Curvature-Guided Geometric Representation for Protein-Ligand Binding Affinity Prediction

arXiv:2606.14159v1 Announce Type: new Abstract: Protein-ligand binding affinity (PLA) prediction is critical in drug discovery. Despite the notable advancements in machine learning-based approaches, existing methods struggle to jointly characterize local geometric organization and globally coordinated cross-molecular interactions, limiting their ability to model complex binding mechanisms. Here, we propose RicciBind, a geometric representation framework that integrates curvature-guided hierarchical structure learning with optimal transport (OT)-based cross-domain alignment to model molecular interactions. Specifically, RicciBind leverages Ricci curvature to capture local interaction tightness within molecular structures, enhancing structural awareness and organizing atomic interactions into curvature-aware hierarchical representations. An OT-based cluster matching mechanism then aligns protein and ligand clusters across heterogeneous domains under geometric constraints, enabling globally consistent correspondences and revealing higher-order interaction patterns beyond local neighborhoods. By coupling curvature-guided structure encoding with OT-driven cross-domain alignment, RicciBind effectively models complex interaction semantics and substantially improves both the accuracy and interpretability of binding affinity prediction. Extensive experiments demonstrate that RicciBind achieved superior predictive performance and generalization across PLA benchmarks and virtual screening tasks. Ablation studies further confirmed the essential role of Ricci curvature in enhancing molecular interaction representations.

17.
arXiv (CS.AI) 2026-06-15

Formalizing Numerical Analysis: An Agent Pipeline and Quality Audit Beyond Kernel Acceptance

arXiv:2606.14000v1 Announce Type: new Abstract: Recent work has demonstrated that coding agents can formalize entire advanced mathematics textbooks in Lean 4, yet existing efforts concentrate on branches of mathematics already well-represented in mathlib and measure success solely through kernel acceptance. We address both limitations by applying a coding agent to formalize Numerical Methods for Ordinary Differential Equations, a textbook in numerical analysis that is largely absent from mathlib, stressing the agent's capacity to develop new theory from scratch. We further introduce a systematic, reproducible three-dimensional framework for evaluating the quality of agent-produced formalizations beyond compilation: semantic correctness, Mathlib reuse, and cross-file reuse via LLM-as-judge methods. Applying this framework to our own formalization and to the released outputs of RepoProver and M2F, we uncover recurring unfaithful formalization patterns, including incomplete multi-part statements, added weakening hypotheses, and parameter restrictions, that kernel acceptance entirely obscures. Our results suggest that compilation-based metrics substantially overstate formalization quality, and we provide a reproducible audit methodology to support more rigorous evaluation of future autoformalization systems.

18.
arXiv (CS.AI) 2026-06-16

Hybrid NARX-LLM for Greenland Iceberg Discharge: Prompt-Driven Residual Correction

arXiv:2606.15288v1 Announce Type: cross Abstract: Greenland iceberg discharge exhibits complex nonlinear dynamics with limited observability, challenging traditional predictive models. We present a Hybrid NARX-LLM framework that combines a nonlinear autoregressive model with exogenous inputs (NARX) and a large language model (LLM) for residual correction. We further propose a Physics-Informed Prompt (PIP) method that transforms unstructured physical knowledge into structured prompts for zero-shot in-context reasoning. The primary objective is to explore the corrective potential of this framework for modeling Greenland iceberg discharge, rather than merely optimizing predictive accuracy. The NARX component captures intrinsic temporal dependencies, while the LLM, guided by PIP, encodes glacier dynamics and environmental drivers and perceives key trend patterns to correct systematic prediction errors. This integration allows the model to reason about unmodeled factors and produce interpretable residuals, enhancing overall predictive accuracy. Applied to Greenland iceberg discharge time series, our approach addresses extreme events that are difficult to predict due to rare variations and nonstationary trends, a limitation often overlooked by traditional methods. By fusing structured time-series modeling with knowledge-driven foundation AI, the framework offers a scalable and interpretable pathway to bridge data-limited climate forecasting with physics-informed LLM reasoning. The code is available.

19.
arXiv (CS.LG) 2026-06-15

Hybrid Uncertainty Sensitivity Analysis Based on the HSIC for High-Dimensional Responses with Aleatory–Epistemic Separation

arXiv:2606.14053v1 Announce Type: cross Abstract: Quantifying the influence of hybrid aleatory and epistemic uncertainties on high-dimensional system responses remains a major challenge in global sensitivity analysis (GSA). Existing Hilbert–Schmidt Independence Criterion (HSIC)-based approaches are primarily restricted to single-output settings and lack a rigorous decomposition of heterogeneous uncertainty sources and their interactions. To address this limitation, a novel double-space tensor-product RKHS framework is proposed for sensitivity analysis under hybrid uncertainty. By constructing factorized kernels over both the latent input space and the multidimensional output space, a concurrent double Möbius inversion is derived to orthogonally decompose the global dependence measure into pure aleatory effects, pure epistemic effects, and their interaction contributions. The resulting dimension-wise sensitivity indices preserve the uncertainty attribution structure across all output dimensions. To satisfy the independence assumptions required by the decomposition, an auxiliary-variable representation based on the inverse probability integral transform is introduced, enabling the treatment of hierarchical uncertainties and Copula-induced correlations within a unified latent space. A fully vectorized single-loop implementation is further developed to avoid the computational burden of nested Monte Carlo simulation. Statistical significance and estimation uncertainty are quantified through permutation testing and Bootstrap confidence intervals. Numerical studies on a modified multi-output Ishigami function and an aerodynamic pressure-field problem demonstrate the accuracy, scalability, and practical applicability of the proposed framework.
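
As background for the framework above, the basic single-input, single-output HSIC estimate can be sketched as follows. This is only the standard biased empirical estimator, tr(KHLH)/(n-1)², with Gaussian kernels; it does not implement the double-space decomposition or the aleatory/epistemic separation the abstract describes. The function names and the default bandwidths are assumptions made for illustration.

```python
import math


def gaussian_gram(xs, bandwidth):
    """Gram matrix of the Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]


def center(gram):
    """Double-centre a symmetric Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    # For a symmetric matrix the column means equal the row means
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic(xs, ys, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased empirical HSIC between paired scalar samples xs and ys."""
    n = len(xs)
    kc = center(gaussian_gram(xs, bandwidth_x))
    lc = center(gaussian_gram(ys, bandwidth_y))
    # tr(K H L H) equals the elementwise inner product of the centred matrices
    trace = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

A useful sanity check is that y = x² on a symmetric grid has zero linear correlation with x yet a clearly positive HSIC, which is the reason kernel dependence measures are attractive for sensitivity analysis.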

20.
arXiv (CS.CL) 2026-06-19

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

Effective Automated Essay Scoring (AES) systems are expected to support both reliable assessment and actionable instructional feedback. However, existing approaches often treat scoring and feedback as separate components: neural scoring models provide limited interpretability, while Large Language Model (LLM)-based feedback is typically insensitive to learners' proficiency levels. To address this fragmentation, this work proposes PsyScore, a psychometrically-aware framework that integrates diagnostic assessment with instructional scaffolding through a shared latent ability representation. PsyScore comprises three key modules: a Trait-Adaptive Neural IRT Scorer, which incorporates the Graded Partial Credit Model (GPCM) into a neural architecture, enabling the precise estimation of student ability while maintaining psychometric interpretability; a ZPD-Scaffolded Feedback Generator, which conditions multi-agent feedback strategies on the diagnosed ability parameter to adapt instructional focus across different proficiency levels; and a Multi-Perspective Feedback Evaluation Strategy, which assesses feedback quality via pairwise preference judgements and student revision simulations. Experiments on the ASAP++ dataset demonstrate that PsyScore achieves competitive scoring performance while providing more pedagogically aligned feedback.

21.
medRxiv (Medicine) 2026-06-15

Specialty Choice Attitudes Among Medical Interns: Evidence from Hormozgan University of Medical Sciences

Background: Choosing a medical specialty is a critical career decision that affects both physicians' future professional lives and the composition of the healthcare workforce. Specialty preferences are shaped by multiple personal, educational, and socioeconomic factors, yet evidence from senior medical students in southern Iran remains limited. This study aimed to assess willingness to pursue specialty training among medical interns at Hormozgan University of Medical Sciences, identify their preferred specialties, and examine factors associated with their decisions. Methods: This descriptive-analytical cross-sectional study was conducted in 2023 among medical interns at Hormozgan University of Medical Sciences in Bandar Abbas, Iran. Using a convenience census approach, all eligible interns were invited to participate, and 83 students completed an online questionnaire. The instrument collected demographic, academic, and occupational data, as well as reasons for willingness or unwillingness to pursue specialty training and specialty preferences. Content and face validity were assessed by faculty members and students, and internal consistency reliability in the present study was acceptable (Cronbach's alpha = 0.82). Data were analyzed using descriptive statistics and logistic regression in SPSS version 27. Results: Of the 83 participants, 50 (60.2%) reported willingness to pursue specialty training, while 33 (39.8%) did not. Among students willing to continue, the most frequently cited reasons were achieving a better economic position, broader job opportunities, and higher social status. Among those unwilling to continue, the most common reasons were fatigue from prolonged studying, financial problems, and the desire to start working after graduation. Radiology was the most common first-choice specialty, followed by otorhinolaryngology, dermatology, and cardiology. In regression analyses, no demographic or academic variable remained independently associated with willingness to pursue specialty training in the final multivariable model. Conclusions: A majority of medical interns were interested in pursuing specialty training, with preferences concentrated in a limited number of specialties perceived as offering favorable financial prospects, prestige, and lifestyle. Economic concerns and educational fatigue were the dominant factors influencing willingness and unwillingness to continue specialty education. These findings highlight the need for structured career counseling, broader exposure to different specialties, and policy measures to address financial and structural barriers to residency training. Keywords: medical specialty choice; medical interns; residency training; medical education; Hormozgan University of Medical Sciences

22.
arXiv (CS.CV) 2026-06-11

Atlas H&E-TME: Scalable AI-Based Tissue Profiling at Expert Pathologist-Level Accuracy

Hematoxylin and eosin (H&E) staining is the cornerstone of histopathology, yet scalable, quantitative analysis of H&E whole-slide images (WSIs) remains a central challenge in computational pathology. We present Atlas H&E-TME, an AI-based system built on the Atlas family of pathology foundation models that predicts tissue quality, tissue region, and cell type labels across multiple cancer types, yielding over 4,500 quantitative readouts per slide at cell-level resolution. A key challenge to validating such systems is overcoming morphological ambiguity inherent to H&E-only ground truth and the limited scalability of more informed references drawing on modalities such as immunohistochemistry (IHC). We address this with a dual validation framework combining biologically grounded depth with technical and morphological breadth. For depth, we propose an IHC-informed multi-pathologist consensus protocol that substantially improves inter-rater agreement over conventional H&E-only annotation. This yields a molecularly grounded reference against which we compare Atlas H&E-TME and pathologists working from H&E alone. For breadth, we benchmark Atlas H&E-TME on over 200,000 high-confidence H&E-only pathologist annotations across 1,500+ cases spanning eight cancer types and their most common metastatic sites, with subtypes covering >90% of clinical cases per cancer type, drawn from 25+ sources and 8+ scanner models. Benchmarked against the IHC-informed consensus, Atlas H&E-TME matches or exceeds pathologist H&E-only performance and generalizes consistently and robustly across this broad morphological and technical scope. In doing so, Atlas H&E-TME turns the H&E slide – the most ubiquitous data in pathology – into a scalable, quantitative window into the tumor and its microenvironment, laying a foundation for the next generation of tissue-based biomarkers in translational and clinical research.

23.
arXiv (CS.AI) 2026-06-16

JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks

arXiv:2602.06486v2. Evaluating agentic AI on open-ended professional tasks faces a fundamental dilemma between rigor and flexibility. Static rubrics provide rigorous, reproducible assessment but fail to accommodate diverse valid response strategies, while LLM-as-a-judge approaches adapt to individual responses yet suffer from instability and bias. Human experts address this dilemma by combining domain-grounded principles with dynamic, claim-level assessment. Inspired by this process, we propose JADE, a two-layer evaluation framework. Layer 1 encodes expert knowledge as a predefined set of evaluation skills, providing stable evaluation criteria. Layer 2 performs report-specific, claim-level evaluation to flexibly assess diverse reasoning strategies, with evidence-dependency gating to invalidate conclusions built on refuted claims. Experiments on BizBench show that JADE improves evaluation stability and reveals critical agent failure modes missed by holistic LLM-based evaluators. We further demonstrate strong alignment with expert-authored rubrics and effective transfer to HealthBench and DR.BENCH, covering medical and 10-domain professional evaluation settings. Code and data are available at https://github.com/smiling-world/JADE.

24.
arXiv (math.PR) 2026-06-16

The Winner Takes It All

arXiv:2606.16885v1. The winner-takes-all (WTA) process takes place on an arbitrary graph. There is an agent on each vertex of the graph, and active agents at neighboring vertices play games. In each game, a randomly chosen agent wins, while the loser is eliminated from subsequent games. The games are played at random times; each game finishes instantaneously, and the games cease when each active agent has only losers among its neighbors. On the one-dimensional lattice, the fraction of winners in the final state is $e^{-1}$, and we also determine the fractions $w_j$ of winners who won $j=0, 1, 2$ games. For the WTA process on a segment, we determine statistics of the total number of winners (the average, the variance, and all higher cumulants), the probabilities of reaching the final state with the minimum or maximum number of winners, and establish the behavior near the boundaries. For infinite regular trees with vertices of degree $d$, i.e., Bethe lattices with coordination number $d$, the fraction of winners is $(2/d)^{d/(d-2)}$.
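The one-dimensional result can be checked numerically with a small Monte Carlo sketch. It assumes that every edge carries an independent clock with the same rate, so that only the order in which edges first ring matters and that order is a uniformly random permutation. It also approximates the infinite lattice by a large ring. Both choices, and the function names, are assumptions made for illustration rather than details taken from the paper.

```python
import math
import random


def simulate_wta_ring(n, seed=0):
    """Run the WTA process on a ring of n vertices; return the list of active flags.

    Edge i joins vertices i and (i + 1) % n. Edges are visited in a uniformly
    random order (assumed equivalent to independent, identically distributed clocks).
    """
    rng = random.Random(seed)
    active = [True] * n
    edges = list(range(n))
    rng.shuffle(edges)
    for i in edges:
        j = (i + 1) % n
        # A game is played only if both neighbors are still active.
        if active[i] and active[j]:
            loser = i if rng.random() < 0.5 else j
            active[loser] = False
    return active


def winner_fraction(active):
    """Fraction of vertices that are still active in the final state."""
    return sum(active) / len(active)


if __name__ == "__main__":
    final_state = simulate_wta_ring(200_000, seed=0)
    print("simulated fraction:", winner_fraction(final_state))
    print("e^{-1}:            ", math.exp(-1))
```

Visiting each edge once is sufficient because after its first ring at least one endpoint is a loser, so no later game can occur on that edge. The tree formula is consistent with this case: $(2/d)^{d/(d-2)}$ tends to $e^{-1}$ as $d \to 2$.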

25.
arXiv (CS.CL) 2026-06-16

Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search

This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles, which are usually optimized to attract user clicks, ad descriptions have a longer text span and can incorporate world knowledge to address user search intents while presenting the fine-grained selling points of the ads. We propose Interactor, a multi-turn iterative creation framework optimized with agentic RL for ad description generation. The generation model acts as a policy that interacts with a customized environment consisting of multiple generative reward models (GenRMs). Given initial generations by the policy, the customized GenRMs evaluate multi-dimensional qualities including knowledge capacity and landing page consistency, providing both binary signals and reasoning feedback. The policy then iteratively refines the descriptions based on this feedback to ensure continuous improvement. Experiments on industrial datasets show that the Interactor framework significantly outperforms state-of-the-art approaches in generating knowledge-rich and faithful ad descriptions. Since May 2026, it has been deployed online in a leading search ads system, contributing to both ad revenue and user experience.