Paper Plaza - AcademicHub

01.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2605.02411

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

Authors:

Kyle Zheng, Han Zhang, Renliang Sun, Chenchen Ye, Wei Wang

A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We identify this retrieval interface, not planning, as the binding constraint on end-to-end agent performance, and introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText treats retrieval as test-time evolution of hypotheses: the agent generates natural-language pseudo-tool descriptions (revisable beliefs about the tool it needs), refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (three domains), FitText's reformulation strategies improve NDCG@5 by 2.7 to 10.6 points over static query retrieval across all base models; on StableToolBench (16,464 APIs) with GPT-5.4-mini, Memetic reaches an 84.3% pooled pass rate, a 26.7-point absolute gain over static query retrieval.
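For readers unfamiliar with the NDCG@5 metric quoted in this abstract, a minimal sketch of the standard definition (log2 rank discount) is given here. It is not code from the paper: the function names and the toy relevance lists are illustrative assumptions, and the paper's exact evaluation setup may differ.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked relevance labels."""
    # Rank 1 gets discount log2(2) = 1, rank 2 gets log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=5):
    """NDCG@k: DCG of the returned ranking divided by the best achievable DCG.

    ranked_relevances: relevance labels of the retrieved items, in ranked order.
    all_relevances: relevance labels of every candidate item (any order),
        used to build the ideal ranking.
    """
    ideal = sorted(all_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant item exists, so the score is defined as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical example: two relevant tools exist; the retriever ranks them 1st and 3rd.
score = ndcg_at_k([1, 0, 1, 0, 0], [1, 1, 0, 0, 0, 0], k=5)
```

Under the common convention that NDCG is reported on a 0 to 100 scale, a gain of one "point" corresponds to 0.01 in the value returned by `ndcg_at_k`; whether the abstract uses that convention is an assumption.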

02.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.03108

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Authors:

Guhong Chen, Yingcheng Shi, Yongbin Li, Binhua Li, Xander Xu, Hu Wei, Shiwen Ni, Min Yang, Jieping Ye

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions, and accumulates reusable skills. Evaluated on mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds the human-engineered RL references under the same data, codebase, and evaluation protocol, with the largest gain on long-horizon agentic SWE. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search. Autonomous LLM RL should move beyond recipe search toward joint evolution of policies and the training harnesses that interpret them.

03.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16794

LLM-Based Visual Explanation Evaluation Framework for Assessing the Explainability of Facial Skin Disease Classification Models

Authors:

Gyuyeon Na

This study proposes a domain-specific LLM-based Visual Explanation Evaluation Framework for assessing Grad-CAM explanations in facial skin disease diagnosis models. While previous studies have primarily focused on improving classification performance through data augmentation techniques, relatively few studies have systematically examined whether model explanations are grounded in clinically relevant lesion regions. In this study, geometric augmentation, color-based augmentation, and mixed augmentation strategies were applied to facial skin disease classification models based on EfficientNet-B0, MobileNetV3, and ResNet18. Grad-CAM was employed to generate visual explanations representing the models' decision-making processes. Furthermore, an LLM-as-a-Judge evaluation framework was designed using GPT-5.5, Gemini 3.5 Flash, and Claude Sonnet 4.6 to assess Grad-CAM explanations from the perspectives of lesion localization and explanation trustworthiness. To improve evaluation consistency and clinical grounding, a progressive prompt engineering strategy was introduced, incorporating evaluation rubrics, clinical knowledge, penalty rules, and structured output formats.

04.

arXiv (CS.CL) 2026-06-15 DOI: arXiv:2606.14047

Knowledge Graph Enhanced Memory-Augmented Retrieval for Long Context Modeling

Authors:

Ghadir Alselwi, Basem Suleiman, Hao Xue, Shoaib Jameel, Hakim Hacid, Flora D. Salim, Imran Razzak

Long-context language modeling requires not only extending context windows but maintaining coherent understanding of entity states and relationships across thousands of tokens – a challenge that semantic similarity alone cannot address. KGERMAR addresses this by constructing dynamic, context-specific knowledge graphs from input text during inference, enabling domain-adaptive retrieval that leverages both semantic similarity and explicit entity relationships. The framework performs real-time entity and relation extraction to build contextual knowledge graphs, then integrates graph-structural embeddings with textual semantics through a multi-component memory architecture. Three memory banks – contextual, semantic, and structural – are maintained with retrieval signals fused via learned weights to capture both surface-level semantics and deeper relational patterns. Evaluated on SlimPajama (84.7K training examples), WikiText-103 (4,358 examples), PG-19 (100 examples), and Proof-pile (46.3K examples), KGERMAR achieves up to 8.5% lower perplexity and 2–2.5x better memory efficiency than memory-augmented baselines across context lengths from 1K to 32K tokens, with superior in-context learning performance across five NLU tasks. The dynamic knowledge graph construction approach advances memory-augmented language modeling by enabling domain-specific knowledge representation that adapts to input contexts rather than relying on fixed knowledge bases.

05.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2602.06486

JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks

Authors:

Lanbo Lin, Jiayao Liu, Tianyuan Yang, Li Cai, Yuanwu Xu, Lei Wei, Sicong Xie, Guannan Zhang

Evaluating agentic AI on open-ended professional tasks faces a fundamental dilemma between rigor and flexibility. Static rubrics provide rigorous, reproducible assessment but fail to accommodate diverse valid response strategies, while LLM-as-a-judge approaches adapt to individual responses yet suffer from instability and bias. Human experts address this dilemma by combining domain-grounded principles with dynamic, claim-level assessment. Inspired by this process, we propose JADE, a two-layer evaluation framework. Layer 1 encodes expert knowledge as a predefined set of evaluation skills, providing stable evaluation criteria. Layer 2 performs report-specific, claim-level evaluation to flexibly assess diverse reasoning strategies, with evidence-dependency gating to invalidate conclusions built on refuted claims. Experiments on BizBench show that JADE improves evaluation stability and reveals critical agent failure modes missed by holistic LLM-based evaluators. We further demonstrate strong alignment with expert-authored rubrics and effective transfer to HealthBench and DR.BENCH, covering medical and 10-domain professional evaluation settings. Code and data are available at https://github.com/smiling-world/JADE.

06.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.18606

Steerable Cultural Preference Optimization of Reward Models

Authors:

Minsik Oh, Advit Deepak, Sophie Wu, Douwe Kiela, Ekaterina Shutova

It is essential for large language model (LLM) technology to serve many different cultural sub-communities in a manner that is acceptable to each community. However, research on LLM alignment has so far predominantly focused on predicting a unified response preference of annotators from certain regions. This paper aims to advance the development of alignment models with a more global outlook, that are able to accurately represent the preferences of subcommunities and do not exhibit excessive bias towards any of them. We focus on the development of reward models for this purpose and present a novel reward model training algorithm (SCPO) that can incorporate diverse cultural preferences in a balanced manner. Our method results in performance increases of the minority reward model of up to 7 points over the baseline model across two datasets, PRISM and GlobalOpinionQA, and across 7 countries. SCPO is up to 280% more training data-efficient than full-data finetuning of reward models. In addition, we perform analysis of bias by separately evaluating on the preference of subcommunities and show that excessive bias is mitigated via our weighting method. Our code is available at https://github.com/minsik-ai/Steerable-Cultural-Preference

07.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.12106

MSUE: Multi-Modal Soccer Understanding Expert

Authors:

Litao Li, Yibo Yu, Yufeng Hu, Zhuo Yang, Jiali Wen, Yixin Chen, Yixi Zhou

This paper presents our solution to the 2026 SoccerNet VQA Challenge. We first develop a cost-effective data synthesis pipeline driven by a Vision-Language Model (VLM), which systematically restructures raw domain data into diverse VQA samples, including concise answers and long-form responses. Second, we propose MSUE, a multi-expert question answering architecture that employs a Large Language Model (LLM) to dynamically dispatch questions to text, image, and video experts. These experts are instantiated as a strong text baseline Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, respectively, working collaboratively to enhance VQA performance. MSUE achieves an accuracy of 0.95 on the challenge benchmark, securing third place in the leaderboard.

08.

arXiv (CS.LG) 2026-06-18 DOI: arXiv:2412.16468

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

Authors:

HyunJin Kim, DongHyun Ryu, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

The emergence of large language models (LLMs) has sparked discussion on Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, discussing its potential and exploring its feasibility and potential risks is critical for the development of future AI systems. The idea of superalignment originates from scalable oversight, which studies how to supervise increasingly capable AI systems when direct human supervision becomes insufficient. In this paper, we focus on the superalignment problem: "The process of supervising, controlling, and governing artificial superintelligence." We first review scalable oversight paradigms (Sandwiching, Self-Enhancement, and Weak-to-Strong Generalization), then analyze the limitations of current paradigms through the lens of possibility and impossibility, discuss key challenges, and propose pathways for the safe and continual improvement of future AI systems.

09.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.20161

ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation

Authors:

Tong Wang, Siwen Wang, Yaolei Qi, Jinxing Zhou, Yuting He, Guanyu Yang, Yutong Xie

Imperfectly supervised video polyp segmentation (VPS) aims to learn dense, temporally consistent masks from inexpensive supervision, including weak annotations (points, scribbles) and semi-supervision with few densely labeled frames. This setting is clinically valuable but challenging due to weak contrast, ambiguous boundaries, motion blur, and specular highlights, compounded by sparse pixel-level guidance. While SAM2 can generate dense masks from sparse inputs, direct pseudo-labeling often yields geometry-degraded masks with boundary leakage, underutilizes temporal consistency, and ignores reliability. To address these issues, we propose ARTEMIS, a unified framework for imperfectly supervised VPS driven by agent-guided reliability-aware temporal mask evolution. ARTEMIS initializes coarse masks from available supervision: SAM2 converts points/scribbles, while dense labels serve as reliable anchors. A debate-and-judge vision-language agent selects reliable temporal anchors under weak supervision, which are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Finally, ARTEMIS trains the segmenter using temporal reliability-aware robust learning, incorporating reliability-guided reference selection, a Reference Prototype Transport Module, and reliability-aware robust loss. These components assess mask reliability, evolve anchors over time, transport target identity across frames, and down-weight noisy supervision instead of discarding difficult samples. Experiments on SUN-SEG and CVC-ClinicDB-612 under scribble, point, and limited-label settings demonstrate that ARTEMIS achieves state-of-the-art performance. Code will be released at https://github.com/wangtong627/ARTEMIS.

10.

arXiv (CS.LG) 2026-06-18 DOI: arXiv:2508.02158

Robust Detection of Planted Subgraphs in Semi-Random Models

Authors:

Dor Elimelech, Wasim Huleihel

Detection of planted subgraphs in Erdős–Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.

11.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2601.22003

Efficient Stochastic Optimisation via Sequential Monte Carlo

Authors:

James Cuin, Davide Carbone, Yanbo Tang, O. Deniz Akyildiz

The problem of optimising functions with intractable gradients frequently arises in machine learning and statistics, ranging from maximum marginal likelihood estimation procedures to fine-tuning of generative models. Stochastic approximation methods for this class of problems typically require inner sampling loops to obtain (biased) stochastic gradient estimates, which rapidly become computationally expensive. In this work, we develop sequential Monte Carlo (SMC) samplers for optimisation of functions with intractable gradients. Our approach replaces expensive inner sampling methods with efficient SMC approximations, which can result in significant computational gains. We establish convergence results for the basic recursions defined by our methodology, which SMC samplers approximate. We demonstrate the effectiveness of our approach on the reward-tuning of energy-based models within various settings.

12.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2603.16941

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs

Authors:

Shree Harsha Bokkahalli Satish, Christoph Minixhofer, Maria Teleki, James Caverlee, Ondřej Klejch, Peter Bell, Gustav Eje Henter, Éva Székely

Speech Large Language Models (SpeechLLMs) process spoken input directly, retaining cues such as accent and perceived gender that were previously removed in cascaded pipelines. This introduces speaker identity dependent variation in responses. We present a large-scale intersectional evaluation of accent and gender bias in three SpeechLLMs using 2,880 controlled interactions across six English accents and two gender presentations, keeping linguistic content constant through voice cloning. Using pointwise LLM-judge ratings, pairwise comparisons, and Best-Worst Scaling with human validation, we detect recurring directional disparities. Eastern European-accented speech receives lower helpfulness scores, particularly for female-presenting voices. Responses remain polite but differ in helpfulness. While LLM judges capture the directional trend of these biases, human evaluators exhibit significantly higher sensitivity, showing stronger accent-level contrasts.

13.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2402.14035

Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts

Authors:

Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

Knowledge distillation from foundation models to compact domain models is challenging due to substantial gaps in capacity, architecture, and modality. For example, in our experiments, distilling from a 76M-parameter language model to a 2M-parameter recommender closes less than 40% of the performance gap between the undistilled student and the teacher. We show that introducing domain-specific experts – which share the student's architectural characteristics – alongside the foundation model as a diverse teacher committee significantly improves transfer. However, standard multi-teacher methods fail to exploit this diversity: naively combining heterogeneous teachers can degrade performance below single-teacher distillation. To address this, we propose DiverseDistill, an interactive distillation framework that employs a learnable Question-Answer mechanism to generate teacher-conditioned queries and align heterogeneous teacher outputs into the student's representation space. Unlike methods requiring gradient-based co-optimization or architectural modification of teachers, DiverseDistill operates with frozen teachers using only forward-pass inference through their intermediate layers: no parameter updates, no co-training, and no architectural surgery. A dynamic teacher importance mechanism further reduces training cost by filtering low-relevance teachers per sample (e.g., ~30% fewer forward passes with no quality loss for recommendation tasks), while the entire Distillation Module is discarded after training, adding zero inference overhead. Evaluations on recommendation (38x compression) and vision (3.6x compression) tasks demonstrate that DiverseDistill recovers 73-114% of the teacher-student performance gap, consistently outperforming all single- and multi-teacher baselines.

14.

bioRxiv (Bioinfo) 2026-06-16 DOI: HASH:1960947082525af6446e27a951bbbb46

RetroMol: Parsing a shared encoding from natural products and their biosynthetic gene clusters

Authors:

Meijer; Williams, S. E.; Terlouw; Charusanti; Kok; Skinnider, M. A.; Weber; van der Hooft, J. J. J.; Healy; …

Natural products such as polyketides and nonribosomal peptides (NRPs) are important sources of bioactive compounds, including many antibiotics. Many of them are assembled by modular enzyme complexes and further modified and diversified by tailoring reactions encoded by biosynthetic gene clusters (BGCs). Although natural products and their coding BGCs describe different data modalities of the same biochemical process, a unified language to jointly describe their biochemistry is lacking. Here we introduce a sequence-based representation of the core biosynthesis of modular natural products, which we call primary sequences, that bridges chemical structures and BGCs. We also present RetroMol, an algorithm that parses either natural product structures or their encoding BGCs into their primary sequences of natural product building blocks. RetroMol allows for similarity scoring between natural products and BGCs, enabling the retrieval of compounds, BGCs, and a combination of the two, based on their biosynthetic similarity. This can, for instance, be used to retrieve biosynthetically similar but structurally dissimilar compounds, or link natural products to candidate coding BGCs in large experimental datasets. We demonstrate the latter by rediscovering the nocardichelin B BGC as a proof of principle. We also exemplify the utility of biosynthetic similarity by showing various pairs of biosynthetically similar compounds with low structural similarity. Together, these results establish primary sequences as a shared biosynthetic encoding for natural product comparison and BGC prioritization.

15.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2606.12082

Bound State Solutions of the Relativistic Finite-difference Equation for the Ring-shaped Quesne Oscillator Potential

Authors:

Sh. M. Nagiyev, Narmin Nasibova, V. A. Tarverdiyeva, G. H. Guliyeva

We solve exactly the relativistic finite-difference equation for the quantum three-dimensional ring-shaped Quesne oscillator potential. Our investigation is based on a finite-difference version of relativistic quantum mechanics. The so-called relativistic configurational r-space is a key concept here. We show that the radial wavefunctions and angular wavefunctions are expressed through the continuous dual Hahn polynomials and Jacobi polynomials, respectively. A discrete energy spectrum has been found. The radial wavefunctions and energy spectrum have the correct nonrelativistic limit. We also build a dynamical symmetry group SU(1,1) for the radial part of the equation of motion, which allows us to find the energy spectrum purely algebraically.

16.

medRxiv (Medicine) 2026-06-22 DOI: HASH:85178fcc727cd11c0d1bbbbb81fd73a7

Exploring the association of Obesity on Cold and Warm Autoimmune Hemolytic Anemia in San Joaquin Valley: A Retrospective Cross-Sectional Study

Authors:

Dao, K. T.; Sharma; Hariprasad; Indrawes; Montana, W. N.

The relationship between obesity and specific autoimmune diseases has been well established, largely because of obesity's role in promoting pro-inflammatory states. However, little literature documents an association between obesity and autoimmune hemolytic anemia (AIHA). This study therefore aims to assess any correlation between elevated body mass index (BMI) and AIHA. Here we present a retrospective cross-sectional study conducted over a four-year period across four medical centers, during which a new electronic medical record was implemented. The study included 25 patients who had a previously documented history of AIHA from another facility, were DAT positive with indicators of hemolysis, or were DAT positive with monomer specific antisera. Each patient's BMI was recorded at the time of presentation to the hospital. However, for patients with a prior history of AIHA or those transferred from another facility, the BMI recorded closest to the time of AIHA diagnosis was used as an adjunct. Our results show an association between elevated BMI (>25) and AIHA; however, various confounding variables should be taken into consideration, and further research is needed to establish a causal relationship.

17.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2606.13022

Quality-Preserving Imperceptible Adversarial Attack on Skeleton-based Human Action Recognition

Authors:

Ziyi Chang, Kanglei Zhou, Xiaohui Liang, Hubert P. H. Shum

Adversarial attacks on skeletal human action recognition have received significant attention. However, existing methods typically introduce noise-like perturbations that degrade motion quality post-attack, and thereby are inherently perceptible with recent advancements in S-HAR systems. We discover that this degradation stems from the gap between empirical and true risks during the optimization process of previous adversarial attacks. To address this issue, we propose an attack where adversarial motions are obtained without compromising their motion quality. To minimize the risk gap and preserve motion quality, we propose a distribution-based adversarial attack method without introducing noise-like perturbations. To faithfully evaluate the motion quality, we propose a new metric that aligns with human perception on real-world naturalness. Experiments have been conducted on the state-of-the-art S-HAR methods across two datasets, demonstrating the superiority of our method in both the attack success rate and the post-attack motion quality through qualitative and quantitative analyses. The success of our quality-preserving attack application and distribution-based method raises serious concerns about the robustness of action recognizers, highlighting the need for further enhancements in this domain.

18.

bioRxiv (Bioinfo) 2026-06-11 DOI: HASH:8a74a3bb8fdbe9c3ca06880fb9bed9aa

DivQuant: Estimation of Species Richness and Entropy from Small Samples

Authors:

Schmitz, J. E.; Rahmann

Estimating diversity properties of discrete distributions from a small observed sample is a fundamental problem in algorithmic statistics that has applications in many fields, in particular bioinformatics, but also in ecology or linguistics. The two most common diversity measures are the number of distinct elements in a multiset, also referred to as species richness in ecology or alpha diversity in microbial analysis, and the Shannon entropy, also referred to as evenness. Estimating these properties from a small sample is particularly challenging for distributions with many rare elements. Thus, many estimators have been proposed in the past that, in practice, work well for different types of distributions. We present DivQuant, an optimization-based, extrapolating richness and entropy estimator with three contributions. First, we formulate the upsampling problem as a convex quadratic program with a Neyman χ² objective. Unlike the linear program of its predecessor RichnEst, DivQuant admits confidence intervals via χ² test inversion that are empirically well-calibrated. Second, we replace RichnEst's fixed-threshold fingerprint truncation with the rare/abundant fingerprint split of Valiant and Valiant, which strongly reduces problem size and preserves enough degrees of freedom for the confidence-interval program to remain valid and feasible. Third, we plug the optimal population fingerprint returned by the program into Shannon's entropy formula to obtain an entropy estimate. DivQuant attains close-to-nominal 95% confidence intervals in essentially all tested regimes, including six simulated distribution families, Tara Oceans microbiome data, and 10X Genomics scRNA-seq data, while competing state-of-the-art methods (RichnEst, iNext, PreSeq) miss the true richness in up to 80% of instances, well above the nominal 5%. In addition, DivQuant outperforms classical asymptotic entropy estimators (Miller-Madow, CAE) and the extrapolating iNext estimator. Running times remain competitive, with DivQuant typically completing in seconds. DivQuant is available as a command-line tool at https://gitlab.com/rahmannlab/divquant.
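To make the terms "fingerprint", "richness", and "entropy" in this abstract concrete, here is a minimal sketch of the naive sample-based quantities. It is not DivQuant's quadratic program or its command-line interface: it only computes the observed fingerprint and the plug-in values that extrapolating estimators aim to improve on, and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def fingerprint(sample):
    """Map each multiplicity i to the number of distinct elements seen exactly i times."""
    multiplicities = Counter(sample).values()
    return dict(Counter(multiplicities))


def observed_richness(fp):
    """Number of distinct elements in the sample (a lower bound on true richness)."""
    return sum(fp.values())


def plugin_entropy(fp):
    """Shannon entropy (in nats) of the empirical distribution encoded by a fingerprint."""
    n = sum(i * count for i, count in fp.items())  # total sample size
    return -sum(count * (i / n) * math.log(i / n) for i, count in fp.items())


# Hypothetical sample: one element seen twice, two elements seen once.
fp = fingerprint(["a", "a", "b", "c"])
```

On small samples with many rare elements these naive values tend to underestimate the true richness and entropy, which is the gap that extrapolating estimators such as the one described above are designed to close.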

19.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2602.08735

From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models

Authors:

Masanari Oi, Koki Maeda, Ryuto Koike, Daisuke Oba, Nakamasa Inoue, Naoaki Okazaki

While multimodal large language models (MLLMs) have made substantial progress in single-image spatial reasoning, multi-image spatial reasoning, which requires integration of information from multiple viewpoints, remains challenging. Cognitive studies suggest that humans address such tasks through two mechanisms: cross-view correspondence, which identifies regions across different views that correspond to the same physical locations, and stepwise viewpoint transformation, which composes relative viewpoint changes sequentially. However, existing studies incorporate these mechanisms only partially and often implicitly, without explicit supervision for both. We propose Human-Aware Training for Cross-view correspondence and viewpoint cHange (HATCH), a training framework with two complementary objectives: (1) Patch-Level Spatial Alignment, which encourages patch representations to align across views for spatially corresponding regions, and (2) Action-then-Answer Reasoning, which requires the model to generate explicit viewpoint transition actions before predicting the final answer. Experiments on three benchmarks demonstrate that HATCH consistently outperforms baselines of comparable size by a clear margin and achieves competitive results against much larger models, while preserving single-image reasoning capabilities.

20.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2601.13467

Quantum Entanglement, Stratified Spaces, and Topological Matter: Towards Entanglement-Sensitive Langlands Data

Authors:

Kazuki Ikeda, Steven Rayan

Using the spinless Haldane model, we study the witness-filtered Berry curvature, quantum geometric tensor, and quantum Fisher information on the gapped strata of the parameter space and evaluate them through the Fukui-Hatsugai-Suzuki discretization. The filtered quantities isolate the part of the geometric response carried by sublattice coherence: they suppress contributions from regions where the occupied Bloch state is locally A/B-separable and emphasize regions where curvature and coherence coexist. We derive exact lattice identities, reconstruction formulas for the curvature-weighted coherence, and bounds relating the filtered quantum geometric tensor and quantum Fisher information to single-particle mode entanglement. Across the gap-closing stratum, the quantized response changes admit a natural description in terms of Hecke modifications. We elicit a corresponding Langlands viewpoint – not as a full correspondence, but as an organizational principle and as the mathematical shadow of these physical geometric constructions.

21.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.18746

What Must Generalist Agents Remember?

Authors:

Khurram Yamin, Namrata Deka, Maitreyi Swaroop, Albert Ting, Jeff Schneider, Bryan Wilder

This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.

22.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2604.08552

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

Authors:

Josef Hardi, Martin J. O'Connor, Marcos Martinez-Romero, Jean G. Rosario, Stephen A. Fisher, Mark A. Musen

arXiv:2604.08552v2 Announce Type: replace-cross Abstract: Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. Even when standard metadata reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model's training knowledge alone. We present an LLM-based metadata standardization system that queries standard reporting guidelines and authoritative biomedical terminology services in real time to retrieve canonically correct standards on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical approach to automated standardization of biomedical metadata.

23.

arXiv (math.PR) 2026-06-17 DOI: arXiv:2606.17912

Asymptotics of the number of labelled connected sparse multitype graphs

Authors:

Luisa Andreis, Mario Veshaj

arXiv:2606.17912v1 Announce Type: cross Abstract: We study the asymptotic enumeration of labelled connected multitype graphs in the sparse regime, where the numbers of vertices and edges both grow linearly and the excess is proportional to the size of the graph. Extending the classical theory of connected graph enumeration to the multitype setting, we consider graphs with prescribed numbers of vertices of each type and prescribed edge counts between each pair of types. Our approach is probabilistic and relies on the theory of inhomogeneous random graphs. In particular, we exploit large-deviation principles and asymptotic estimates for connectedness probabilities to relate the counting problem to the emergence of giant components in suitably tuned supercritical random graphs. From large-deviation asymptotics of connected components of inhomogeneous random graphs, we recognize that a connected graph with given edge statistics corresponds to the (unique) giant component of a larger inhomogeneous random graph with a suitably chosen connection kernel. This correspondence allows us to derive the leading exponential asymptotics for the number of connected multitype graphs with fixed type profile and edge matrix. The resulting formula generalizes the asymptotic enumeration results of Bender, Canfield, and McKay for connected sparse graphs to the multitype framework. More broadly, the paper illustrates how probabilistic techniques can provide transparent and effective tools for addressing new combinatorial enumeration problems.

24.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2606.13722

YeasierAgent: Agentic Social Sandbox as a Canvas for Intent-Driven Creation of Platform-Agnostic Symbiotic Agent-Native Applications

Authors:

Jory He

arXiv:2606.13722v1 Announce Type: new Abstract: This paper introduces YeasierAgent, an application-building paradigm based on symbiotic agents, narrative worlds, and scene-aware interaction. It challenges the conventional device-coupled model of software by redefining applications as collaborative spaces among users, agents, and worlds. We present a system architecture that achieves two primary contributions: (1) enabling the rapid, cross-platform construction of agent-native applications by utilizing platform-agnostic interactive units (agents, scenes, dialogue) rather than fixed graphical layouts; and (2) unifying the emotional companionship and practical tool execution attributes of intelligent agents within a single experiential sandbox. By integrating automated generation, user-created worlds, and spatial multi-agent collaboration, YeasierAgent formalizes the category of Symbiotic Agent-Native Applications, demonstrating a shift from isolated, tool-specific chatbots toward cohesive, socially embedded computational environments.

25.

arXiv (CS.LG) 2026-06-18 DOI: arXiv:2511.05221

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

Authors:

arXiv:2511.05221v3 Announce Type: replace Abstract: Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
