Paper Plaza - AcademicHub

01.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2510.07096

Modeling Sarcastic Speech: Semantic and Prosodic Cues in a Speech Synthesis Framework

Authors:

Zhu Li, Yuqing Zhang, Xiyuan Gao, Shekhar Nayak, Matt Coler

Sarcasm is a pragmatic phenomenon in which speakers convey meanings that diverge from literal content, relying on an interaction between semantics and prosodic expression. However, how these cues jointly contribute to the recognition of sarcasm remains poorly understood. We propose a computational framework that models sarcasm as the integration of semantic interpretation and prosodic realization. Semantic cues are derived from an LLaMA 3 model fine-tuned to capture discourse-level markers of sarcastic intent, while prosodic cues are extracted through semantically aligned utterances drawn from a database of sarcastic speech, providing prosodic exemplars of sarcastic delivery. Using a speech synthesis testbed, perceptual evaluations show that semantic and prosodic cues enhance perceived sarcasm, with the combined system achieving the best downstream F1 while maintaining high subjective sarcasm ratings. These findings highlight the complementary roles of semantics and prosody in pragmatic interpretation and illustrate how modeling can shed light on the mechanisms underlying sarcastic communication.

02.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2606.19894

Score Approximation for Diffusion Models on Arbitrary Low-Dimensional Structures

Authors:

Xinhe Mu, Zaijiu Shang, Zhaoqi Zhou, Chuan Zhou, Qi Meng, Guiying Yan, Zhiming Ma

The remarkable success of score-based diffusion models has spurred significant efforts to establish their theoretical foundations. However, existing complexity bounds for score approximation rely heavily on restrictive assumptions like Lipschitz continuous densities or smooth manifold supports, which are routinely violated by the singularities, sharp boundaries, and disjoint clusters inherent to real-world perceptual data. This work establishes a universal score approximation theorem that works for any distribution supported on any compact set of upper Minkowski dimension $d$. Using a novel discrete-mixture formulation, we prove that the score function can be approximated with a ReLU network whose complexity grows exponentially only with $d$, thus breaking the exponential curse of ambient dimensionality. Combined with existing theories on accurately solving the backward diffusion SDE for arbitrary compact distributions, our work shows that diffusion models readily adapt to irregular, non-smooth data structures, explaining their competence in real-world generative tasks.
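
For readers unfamiliar with the dimension notion used in this abstract, the standard definition of the upper Minkowski (box-counting) dimension is given below. The symbols $K$, $D$, and $N(K,\varepsilon)$ are notation chosen for this note and are not taken from the paper.

```latex
% K \subset \mathbb{R}^D is a compact set (D is the ambient dimension).
% N(K,\varepsilon) is the smallest number of balls of radius \varepsilon needed to cover K.
\[
  \overline{\dim}_M(K)
  \;=\;
  \limsup_{\varepsilon \to 0^{+}}
  \frac{\log N(K,\varepsilon)}{\log(1/\varepsilon)} .
\]
```

The definition requires no smoothness of $K$: a smooth $d$-dimensional submanifold has Minkowski dimension $d$, but so can sets with corners, sharp boundaries, or several disconnected pieces, which is why the quantity suits the irregular supports the abstract describes.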

03.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2606.14168

MUSE: Agentic 3D Scene Authoring via Memory-Grounded Incremental Requirement Satisfaction

Authors:

Ruijie Xu, Xinnan Zhu, Jiayu Ying, Daoguo Dong, Yuzhou Ji, Xin Tan

Text-driven 3D scene generation is a promising technique for digital content creation, embodied AI simulation, and interactive design, yet practical workflows often require refining, extending, or correcting existing scenes while preserving non-target content. Existing methods can produce realistic and structurally plausible scenes, but they generally lack editability with requirement-level state tracking, so part-level failures often lead to full-scene regeneration or manual intervention. To tackle this challenge, we formulate controllable 3D scene authoring as incremental requirement satisfaction, unifying construction and editing. In this paper, we present MUSE, a memory-grounded multi-agent framework in which an Architect compiles instructions into structured requirements, a Sculptor executes local scene operations, and an Inspector verifies each step while updating Working, Scene, and Skill Memory. To evaluate requirement-level controllability and preservation-aware editing, we introduce AuthorBench, offering 145 constrained construction cases and a 1,584-case preservation-aware editing pool paired with external structured checks. On full construction cases, MUSE improves All-Goal success from 37.9 to 80.7 and surface-constraint fulfillment from 35.0 to 92.6 over the strongest baseline. On a stratified 240-case editing test split, MUSE achieves 49.6 All-Goal success, 99.9 preservation rate, and only 0.6 unintended change rate. Beyond automated metrics, human evaluations on compared local-editing baselines support stronger alignment with user intent, and downstream navigation-proxy tests indicate stronger spatial stability. Combined with ablations validating our memory designs, these results establish MUSE as an effective framework for controllable 3D scene authoring.

04.

arXiv (math.PR) 2026-06-16 DOI: arXiv:2606.15850

A 0-1 Law for Multifractal Spectra via the HGDS Scale Derivative

Authors:

Jokubas Petkevičius

We prove that the multifractal spectrum $D(h,\omega)$ of a stochastic process is almost surely deterministic under a scale decorrelation condition on the HGDS scale derivative. The key difficulty is that the pointwise Hölder exponent lives in the germ $\sigma$-algebra, where classical 0-1 laws do not reach. We get around this by working with the geometry accumulation integral $G_\Lambda$, which is a genuine Lebesgue integral over scales and concentrates almost surely. The boundary case (log-correlated fields) is sharp: the variance summability condition fails exactly there.

05.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.18810

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

Authors:

Yingyu Shan, Yuhang Guo, Zihao Cheng, Zeming Liu, Xiangrong Zhu, Xinyi Wang, Jiashu Yao, Wei Lin, Hongru Wang, Heyan Huang

Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts: GRPO variants rely on process reward models or ground-truth answers, while knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation, OPD) or privileged information (On-Policy Self Distillation). These dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and prove that distilling from a self-teacher constructed from verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses this per-token KL divergence as a multiplicative weight on GRPO gradients. Across five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms GRPO by 8.1% and DAPO by 5.9%, with stronger OOD performance. Moreover, SC-GRPO achieves higher performance than OPD.
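
The idea of using a per-token KL divergence as a multiplicative weight on a group-relative advantage can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function names, the KL direction (conditioned against original), and the mean-one normalization of the weights are all assumptions made for this note, since the abstract does not specify them.

```python
import math


def per_token_kl(p_rows, q_rows):
    """KL(p || q) at each token position.

    Each row is a probability distribution over a (toy) vocabulary.
    Assumes q has no zero entries wherever p is positive.
    """
    return [
        sum(p * math.log(p / q) for p, q in zip(p_row, q_row) if p > 0.0)
        for p_row, q_row in zip(p_rows, q_rows)
    ]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: reward standardized within a group of rollouts."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def kl_weighted_token_advantages(advantage, kls):
    """Spread one sequence-level advantage over tokens in proportion to KL.

    ASSUMPTION: weights are normalized to have mean 1 so the average
    per-token advantage equals the original sequence-level advantage.
    """
    mean_kl = sum(kls) / len(kls)
    if mean_kl == 0.0:
        # No token stands out; fall back to uniform credit.
        return [advantage] * len(kls)
    return [advantage * k / mean_kl for k in kls]


if __name__ == "__main__":
    # Toy two-token rollout over a two-word vocabulary.
    original = [[0.5, 0.5], [0.9, 0.1]]
    conditioned = [[0.5, 0.5], [0.5, 0.5]]
    kls = per_token_kl(conditioned, original)
    adv = group_relative_advantages([1.0, 0.0])[0]
    print(kl_weighted_token_advantages(adv, kls))
```

In this toy case the first token has identical distributions with and without conditioning, so it receives zero credit, and all of the advantage is shifted onto the second token, where conditioning changes the prediction.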

06.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.11502

When Roleplaying, Do Models Believe What They Say?

Authors:

Benjamin Sturgeon, David Africa, Sid Black

Language models can state that "the Earth orbits the Sun" and, when role-playing Aristotle, assert the opposite. Recent work argues that persona adoption is fundamental to how language models operate, with models constantly selecting the most appropriate persona for a given context. Does such role-playing merely change the model's outputs, or does it also affect what the model internally represents as truthful? We study this question with linear truth probes, applying them to LLMs role-playing historical personas whose likely beliefs differ from modern consensus. For each persona, we compare false claims the persona would likely have endorsed (*era-believed*) with topic-matched false claims they would not have endorsed (*era-false*). Across prompting, in-context learning, and supervised fine-tuning, persona induction suppresses era-believed statements less than equally false alternatives, yet they remain classified as false overall. Role-play therefore shifts what these models say more than what they internally represent as true. We contrast this with models trained on harmful advice that exhibit Emergent Misalignment (EM). Across three model families (Qwen 2.5 14B, Qwen 3 8B, and Llama 3.3 70B), their false claims move substantially toward the true region of probe space, are defended under challenge roughly half the time versus about a sixth for role-play, and are used in downstream reasoning. Role-play and Emergent Misalignment thus are points on a spectrum of belief internalization, where role-play changes what a model says with little representational change, while Emergent Misalignment shifts the internal representation of false claims without fully marking them as true.

07.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20554

Structuring and Tokenizing Distributed User Interest Context for Generative Recommendation

Authors:

Ruizhong Qiu, Yinglong Xia, Dongqi Fu, Hanqing Zeng, Ren Chen, Xiangjun Fan, Hong Li, Hong Yan, Hanghang Tong

Generative recommendation is an emerging paradigm that has shown promise in industrial recommendation systems, aiming to predict users' next interactions from their historical behaviors. At the core of generative recommendation lies item tokenization, which bridges item semantics and recommendation models. However, existing methods often struggle to effectively organize and inject complex user-behavioral and item-semantic contexts into recommendation models simultaneously. On the one hand, existing graph-based integration methods, such as graph serialization and graph neural networks, either suffer from scalability issues or exploit only local graph information. On the other hand, existing semantic tokenization methods typically rely on heuristics and lack explicit supervision signals, which may lead to inaccurate or suboptimal semantic representations. To address these limitations in user interest context modeling, we propose G2Rec, a scalable framework that unifies holistic graph-based user co-engagement modeling with semantic tokenization for industrial-scale generative recommendation. Overall, G2Rec enables recommendation models to capture holistic and semantically grounded user interest prototypes without requiring ground-truth user interests, thereby providing more comprehensive and accurate modeling of user behavior contexts in industrial sequential recommendation. Online deployment across product surfaces and extensive experiments on public datasets demonstrate the superiority of G2Rec over existing methods.

08.

PLOS Computational Biology 2026-06-05 DOI: HASH:06fd5a830f633ecf602ccb5755184374

Heuristic multi-site optimization for protein sequence design using Masked Protein Language Models

Authors:

Lijuan Wang, Yuze Wang, Chen Qiu, Liwei Xiao, Xianliang Liu, Junjie Chen

Protein sequence design for tailored functional properties is a fundamental task in protein engineering, with critical applications in drug discovery and therapeutic development. Efficient navigation of the combinatorial vastness of protein sequence space to identify functional variants remains a formidable challenge. Conventional approaches, which predominantly rely on template-based local search or single-residue mutagenesis, are constrained by their susceptibility to local optima and their risk of destabilizing the native structure. In this study, we introduce ProtHMSO, a heuristic multi-site optimization framework leveraging masked protein language models (ProtLMs) for context-aware sequence exploration. ProtHMSO mimics natural evolutionary mechanisms by employing ProtLM-derived substitution probabilities to guide heuristic searches for synergistic mutations, thereby constraining combinatorial search spaces through evolutionary and biophysical priors. ProtHMSO is further applied to replace the exploration strategies in genetic algorithms (GAs) and Monte Carlo tree search (MCTS) to improve their convergence efficiency. Benchmark experiments demonstrate that protein sequences generated by ProtHMSO exhibit superior functional performance and closer alignment with the natural sequence distribution, compared with state-of-the-art methods. These advancements highlight that ProtHMSO has strong potential and compatibility to accelerate functional protein discovery, offering a robust framework for efficient and context-aware exploration of protein sequence space.

09.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2505.19937

ALAS: An Automatic Latent Alignment Score for Audio Language Models

Authors:

Pooneh Mousavi, Yingzhi Wang, Mirco Ravanelli, Cem Subakan

Large Language Models (LLMs) are extended into Speech-LLMs, and the quality of the audio–text alignment they learn affects most downstream Spoken Language Understanding (SLU) behavior. Yet despite a growing number of fusion strategies, there is no standard way to measure how well a Speech-LLM internally binds audio frames to text tokens. We introduce ALAS (Automatic Latent Alignment Score), a model- and task-agnostic metric that probes the LLM's per-layer hidden states, scoring the cross-modal cosine similarity between audio and text representations against a Whisper-derived reference. ALAS needs only a frozen forward pass and an off-the-shelf ASR reference, with no training or fitted classifier, and is calibrated to an interpretable uniform baseline comparable across tasks. Applying ALAS to four open-source Speech-LLMs (AF3, Qwen2-Audio, Qwen-Omni, SALMONN) across emotion recognition (IEMOCAP), open-ended SQA (LibriSQA), and multi-choice audio understanding (MMAU-speech), we find that the depth and strength of alignment reflect each model's audio-encoder design and the acoustic-versus-semantic demands of the task, and that ALAS tracks but does not duplicate task accuracy, exposing models that score well without genuinely grounding in the audio. We release ALAS as an open-source library so that practitioners can probe their own Speech-LLMs or try it on new tasks.

10.

Nature (Science) 2026-06-08 DOI: HASH:75b8d24b7d98e1b1c38f179aa17a78f0

GPR15-guided CD8+ T regulatory cells control intestinal inflammation

Authors:

Jing Cui

Inflammatory bowel disease (IBD) causes chronic suffering from gastrointestinal inflammation and dysfunction that can progress to colon cancer [1,2]. The disease prevalence is increasing and there is an urgent need to better understand its pathogenic mechanisms to improve treatment. We show that GPR15, a G protein-coupled receptor (GPCR) expressed in immune cells and previously described as an entry co-factor for human and simian immunodeficiency viruses [3], is a marker and homing receptor for a subset of intramucosal GPR15-guided regulatory CD8+ T lymphocytes (CD8+ TIGR). Deleterious GPR15 gene variants in humans cause defective homing of CD8+ TIGR and are associated with severe early-onset IBD. Moreover, CD8+ TIGR cells are reduced in the intestinal mucosa of sporadic IBD patients. In mice, GPR15 deficiency impairs colonic homing of CD8+ TIGR cells, leading to accumulation of inflammatory macrophages and increased susceptibility to colitis. CD8+ TIGR cells potently kill macrophages activated by intestinal damage or disease using Fas ligand (FasL) and TNF-related weak inducer of apoptosis (TWEAK). The identification of CD8+ TIGR cells yields new insights into organ-specific immune regulation and potential therapeutics for IBD.

11.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.11661

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

Authors:

Dong-Woo Kim, Tae-Kyun Kim

Clothes-changing person re-identification (CC-ReID) aims to recognize individuals despite drastic appearance changes caused by clothing variation. While existing methods rely on adversarial learning to disentangle clothing features, we propose Ortho-ReID, which explicitly models a low-rank clothing subspace from VLM text descriptions and extracts clothing-invariant representations via direct geometric constraints. A critical component is our transformer-based Basis Maker, which refines a shared, low-dimensional clothing prior into an instance-adaptive low-rank subspace through cross-attention with image patches, enabling robust clothing feature extraction even under varying visibility conditions. This instance-adaptive subspace is supervised via alignment with clothing text embeddings, while identity features are extracted via a learnable projection head and geometrically constrained to be strictly orthogonal to it. Extensive experiments demonstrate state-of-the-art performance on PRCC (+5.9% top-1), Celeb-reID-light (+3.5%), and LaST (+5.3%), with competitive results on LTCC.
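
The geometric idea behind making identity features orthogonal to a low-rank clothing subspace can be illustrated with an explicit projection. This is a sketch of the geometry only: per the abstract, Ortho-ReID enforces orthogonality through training-time constraints on a learnable projection head, whereas the code below swaps in a plain Gram-Schmidt projection, and all function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis, eps=1e-12):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace as `basis`."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors that are (numerically) linearly dependent
            ortho.append([wi / norm for wi in w])
    return ortho


def remove_subspace(feature, basis):
    """Project `feature` onto the orthogonal complement of span(basis)."""
    out = list(feature)
    for q in orthonormalize(basis):
        c = dot(out, q)
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out


if __name__ == "__main__":
    # Toy rank-2 "clothing" subspace inside a 3-dimensional feature space.
    clothing_basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    feature = [3.0, 4.0, 5.0]
    identity_part = remove_subspace(feature, clothing_basis)
    print(identity_part)
    print([dot(identity_part, b) for b in clothing_basis])
```

After the projection, the remaining vector has zero inner product with every clothing basis vector, which is the property the abstract describes as identity features being strictly orthogonal to the clothing subspace.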

12.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.15033

Cloze: An Open Research Platform for Studying Human-AI Conversations in Mental Health Contexts

Authors:

Matthew Flathers, Francesco Cipriani, John Torous

Cloze is an open-source web platform for conducting controlled, monitored studies of human-AI conversation in mental health research contexts. Consumer large language model (LLM) products such as ChatGPT, Claude, and Gemini are built for individual productivity, and offer researchers little experimental control, inconsistent data export, and no shared safety scaffolding that holds across providers. Cloze gives research teams a single environment in which they configure which models participants converse with, how the AI is instructed, how conversations are scheduled over time, and which safety constraints apply unconditionally, while every message is captured with full provenance (model version, prompt configuration, timing). The platform currently supports OpenAI, Anthropic, Google, and locally hosted open-weight models served through Ollama behind a unified interface, and runs in the cloud or fully on premises so that participant data need never leave an institution. Cloze is research infrastructure for building an evidence base on human-AI interaction in mental health contexts. It is not a therapeutic product.

13.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16870

Latent Space Reinforcement Learning for Inverse Material Estimation in Food Fracture Simulation

Authors:

Adrian Ramlal, Yuhao Chen, John S. Zelek

Realistic visual simulation of food manipulation requires accurate material parameters, yet these are difficult to measure directly and vary across the heterogeneous regions of a single food item. We address the inverse problem of estimating material parameters from a target description of fracture behavior in a non-differentiable continuum damage mechanics simulator. Using orange peeling as a test case, we train a neural surrogate on 2,000 forward simulations and compare Covariance Matrix Adaptation Evolution Strategy (CMA-ES, a gradient-free evolutionary optimizer) with Proximal Policy Optimization (PPO, a reinforcement learning algorithm) across the original 9-dimensional parameter space and two learned 4-dimensional latent representations. Since different oranges have different material properties, a practical inverse system must handle arbitrary targets without retraining. We train a goal-conditioned PPO policy that learns a general inverse mapping: given any target description of peeling behavior, the policy produces a material parameter estimate in a single forward pass (8 surrogate evaluations, approximately 10ms). Operating in a normalizing flow latent space with a shared surrogate evaluator, the goal-conditioned policy achieves 0.642 actual recovery when validated through the simulator, outperforming the original parameter space by 23%. A warm-start extension that initializes CMA-ES refinement from the policy's output further improves recovery to 0.828 with 540 evaluations. These findings provide a practical framework for inverse food physics and lay groundwork for vision-driven material identification from video observations of food manipulation.

14.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.14820

Spectro-Temporal Interference Confounds Phase Encoding in Spatial Audio Foundation Models

Authors:

Yuxuan Chen, Haoyuan Yu, Peize He

Recent spatial self-supervised audio models achieve high performance on localization tasks, raising questions about their encoding of microsecond interaural phase fine structures. We propose a psychoacoustic benchmark based on the binaural masking level difference (BMLD) to evaluate this. Using an equalization-cancellation baseline and a GCC-PHAT positive control, we evaluate nine frozen audio models spanning binaural SSL, monaural SSL, and neural audio codecs. Four monaural negative controls yield zero BMLD, confirming binaural specificity. Two general-purpose binaural SSL models exhibit minimal phase sensitivity, while dedicated binaural spatial SSL models achieve BMLD comparable to the analytical baseline. Progressive physical ablations show that general-purpose binaural SSL models rely on spectro-temporal interference textures rather than cross-channel phase computation. High detection rates in speech reflect a confounding reliance on broadband envelopes rather than genuine phase encoding.

15.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.12936

An Embodied Simulation Platform, Benchmark, and Data-Efficient Augmentation Framework for Wet-Lab Robotics

Authors:

Zhe Liu, Huanbo Jin, Zhaohui Du, Zhe Wang, He Xu, Peijia Li, Jiaming Gu, Quan Lu, Qi Wang, Bin Ji, Ting Xiao

Wet-lab robots can improve the reproducibility, throughput, and safety of biomedical experiments, but scaling their learning requires customizable simulators for safe and reproducible task generation, open editable laboratory assets, and efficient pipelines that turn limited demonstrations into usable training data. We present Pipette, an embodied simulation platform, benchmark, and data-efficient augmentation framework for wet-lab robot learning. Pipette releases over 43 open-source and re-editable wet-lab assets, together with an extensible asset-building pipeline. A key component of Pipette is its simulation-based data augmentation pipeline, which replays human demonstrations in simulation, applies lighting, camera, speed, and action perturbations, and filters generated episodes with automatic task success checks, rapidly expanding usable training data from limited manual demonstrations. We further introduce an 11-task wet-lab embodied benchmark covering sample handling, culture-ware manipulation, device operation, and precision placement. With only 30 demonstrations per task, ACT achieves a 65.5% average success rate, while simulation augmentation improves SmolVLA from 44.1% to 74.7% and $\pi_0$ from 40.4% to 46.5%, validating the effectiveness of Pipette for data-efficient VLA training and evaluation. Pipette also supports natural-language-driven scene construction and task registration, lowering the barrier for non-expert users to define new wet-lab robotic tasks.

16.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2606.10200

An Improved Generative Adversarial Network for Micro-Resistivity Imaging Logging Restoration

Authors:

Ahmed Faizul Haque, S. M. Riaz Rahman Antu, Saif Ahmed, Asadullah Hil Galib, Souvik Pramanik, Mohammad Ashrafuzzaman Khan, Mohammad Abdul Qayum, Mohsin Sajjad

This paper presents an improved GAN-based restoration method for micro-resistivity imaging logging images with partially missing regions. The method uses an FCN as the backbone of the generative network and adds a depthwise-separable convolutional residual block to learn and retain more effective pixel and semantic information. An Inception module is added to enlarge the multi-scale receptive field of the network and reduce its parameter count. A multi-scale feature extraction module and a spatial attention residual block, which combine the channel attention mechanism with the residual block, are added to achieve multi-scale feature extraction. A global discriminative network and a local discriminative network are designed to gradually improve the coherence of content and semantic structure between the restored parts and the whole image by competing with each other and with the generative network. According to the experimental results, the average structural similarity measure over the five sets of imaging logging images with different sizes of missing regions in the test set is 0.903, an improvement of about 0.3 compared with other similar methods. The results show that the method can restore micro-resistivity imaging log images with good improvement in semantic structural coherence and texture detail, thus providing a new deep learning method to support the subsequent interpretation of micro-resistivity imaging log images.

17.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16054

How to Detect and Measure the AI Dangers to Democracy

Authors:

Giulia Sandri, Claudio Novelli

Research on artificial intelligence and democracy has grown quickly over the last decade. A shared conclusion in this literature is that AI does not create new democratic problems so much as it makes old ones worse. We now see this across information ecosystems, in elections, and in public administration. However, despite growing evidence, we lack a clear way to prioritize risks in this area, compare them across domains, and identify where democratic control is most likely to break down. So, our problem is: How can we systematize the problems that AI systems pose to democratic processes? This paper argues that principal-agent theory may fit the task. In many phases of democratic systems, principals delegate key functions to AI systems and their providers without really being able to monitor how these systems operate or the outputs they produce. Treating AI as a delegation problem helps identify accountability gaps and other governance failures. Most importantly, as we shall illustrate, it provides metrics for empirical assessments of AI impact on democracy. As a second analytical element, we draw on the NIST AI Risk Management Framework and its seven characteristics of trustworthy AI, which supply substantive criteria for evaluating delegated tasks. Operationalized across the three domains through measurable indicators and domain-specific trustworthiness criteria, we propose an analytical framework that centers on institutional assessability as the central condition for democratic control over AI. However, we stress that how severe a harm is, and how much risk is acceptable, are evaluative judgments that current methodologies neither acknowledge nor operationalize. This becomes acute when such evaluative judgments are (silently) delegated to private vendors. We identify this as a strong limitation left for future work.

18.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.07591

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Authors:

Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, …

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.

19.

arXiv (math.PR) 2026-06-17 DOI: arXiv:2411.09058

Time-dependent averages of a critical long-range stochastic heat equation

Authors:

Sefika Kuzgun, Ran Tao

We study the time-dependent spatial averages of a critical stochastic partial differential equation, namely the stochastic heat equation in dimension $d\geq 3$ with noise white in time and colored in space with covariance kernel $\|\cdot\|^{-2}$. The solution to this SPDE is a singular measure and was constructed by Mueller and Tribe in [MT04]. We show that the time-dependent spatial averages of this SPDE over a ball of radius $R$ at time $t$ have different limits under different space-time scales. In particular, when $t\ll R^2$, the central limit theorem holds; when $t=R^2$, the spatial average is a non-Gaussian random variable; when $t\gg R^2$, the spatial average becomes extinct.

20.

bioRxiv (Bioinfo) 2026-06-16 DOI: HASH:5ce607d3ef039ca1adc4a6f51eb0fe42

Super Learner Ensemble Modeling of CPTAC Proteomic Data for Survival Prediction in Head and Neck Squamous Cell Carcinoma

Authors:

Park, Lee, Oh, E. J, Tham, Ahn

Survival analysis in head and neck squamous cell carcinoma (HNSCC) is traditionally performed using Cox proportional hazards models, alongside some exploration into black-box machine learning methods. The Super Learner (SL) algorithm addresses this model selection dilemma by combining diverse candidate algorithms into a weighted ensemble to perform comparably to the best candidate method. This study evaluates the performance of SL in HNSCC. Proteomic features as well as clinical covariates from 96 CPTAC HNSCC samples were modeled with three candidate algorithms (Cox LASSO, Cox Ridge, and Random Survival Forest) as well as the ensemble SL method. Models were optimized via Uno's time-dependent Concordance Index (C-index) and tested at 1- and 3-year time horizons using 2000 bootstrap resamples. The Cox Ridge regression model achieved the highest predictive accuracy among the four total methods. However, the SL demonstrated stable performance over both time horizons (1-year C-index: 0.985; 3-year C-index: 0.960). Variable importance analysis of the Cox Ridge model successfully identified malignant proteins (ATR, MAML1, MIEN1) alongside novel potential prognostic indicators (ZNF800, KERA). This analysis emphasizes the statistical necessity for larger cohorts for ensemble learning, while providing a benchmark of proteomic indicators in HNSCC.
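
To make the concordance metric concrete, the following sketch computes a basic concordance index on right-censored data. Note the substitution: the study uses Uno's time-dependent C-index, which additionally reweights pairs by inverse probability of censoring; the code below implements the simpler Harrell-style C-index instead, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored survival data.

    times       : observed follow-up time per subject
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output; higher means higher predicted risk

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. The pair is concordant
    when the model assigns i the higher risk; ties count one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


if __name__ == "__main__":
    # Toy cohort: the third subject is censored at t = 6.
    times = [2.0, 4.0, 6.0, 8.0]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.3, 0.1]
    print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 1-year and 3-year figures in the abstract should be read.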

21.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2603.12400

Generation of Maximal Snake Polyominoes Using a Deep Neural Network

Authors:

Benjamin Gauthier, Alain Goupil, Fadel Toure

Maximal snake polyominoes are difficult to study numerically in large rectangles, as computing them requires the complete enumeration of all snakes for a specific rectangle size, which amounts to a brute-force algorithm. This hinders the study of maximal snakes in larger rectangles. Moreover, most enumerable snakes lie in small rectangles, obscuring large-scale patterns. In this paper, we investigate the contribution of a deep neural network to the generation of maximal snake polyominoes through data-driven training, where the maximality and adjacency constraints are not encoded explicitly, but learned. To this end, we experiment with a denoising diffusion model, which we refer to as Structured Pixel Space Diffusion (SPS Diffusion). We find that SPS Diffusion generalizes from small rectangles to larger ones, generating valid snakes up to 28x28 squares and producing maximal snake candidates on squares close to the current computational limit. The model is, however, prone to errors such as branching, cycles, or multiple snake components. Overall, the diffusion model is promising and suggests that complex combinatorial objects can be understood by deep neural networks, which is useful in their investigation.


22.

bioRxiv (Bioinfo) 2026-06-16 DOI: HASH:65f9ed62ff370b453dc974174099bcbf

MetaPilot: genome-aware adaptive search-space refinement for unified DDA and DIA metaproteomics

Authors:

Cheng, Figeys

Metaproteomic peptide identification is constrained by the structure and size of the protein search space. Pooled gene catalogues provide coverage but obscure genome-level evidence, and current workflows for data-dependent (DDA) and data-independent (DIA) acquisition diverge in their database strategies. We present MetaPilot, a genome-aware workflow that uses conserved marker-protein evidence to rank candidate genomes from MGnify catalogues and construct adaptive, sample-specific search spaces. Applied to paired DDA/DIA datasets of defined mixtures and fecal samples, MetaPilot adapted genome selection to community complexity and reproduced published peptide evidence while expanding the detectable peptide space. In DDA-independent reanalysis of Orbitrap human gut DIA data, MetaPilot identified 24.4% more peptides than the published DDA-derived library and 2.06-fold more than the matched DDA-assisted DIA search. On timsTOF DIA-PASEF mouse intestinal data, it outperformed uMetaP by 41.8% to 119.7%, enabling genome-resolved functional interpretation without DDA-PASEF input.


23.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.04513

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

Authors:

Deguo Xia, Zihan Li, Haochen Zhao, Dong Xie, Yuyao Kong, Xiyan Liu, Jizhou Huang, Mengmeng Yang, Diange Yang

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directly from sensor data, but they typically treat mapping specifications and traffic regulations as implicit, dataset-dependent supervision. Moreover, in complex scenes (e.g., worn or missing markings and occlusions), correct lane configurations are often under-determined by visual evidence alone, making specification violations a major source of human post-editing. We propose MapAgent, an industrial-grade agentic architecture that augments a vectorization backbone for specification-compliant lane-map production. Rather than merely adding an agent loop to map prediction, MapAgent couples backbone perception with explicit specification verification, constraint-aware reasoning, and deterministic map editing under a bounded, verification-driven Judge-Planner-Worker loop. A vision-language Judge diagnoses errors by jointly inspecting visual evidence and draft vectors, while a tool-calling Planner generates minimal corrective edits with post-edit re-validation. To remain scalable for city-scale production, MapAgent is selectively triggered only on tiles with low backbone confidence, adding modest overhead while preserving throughput. Experiments on real-world datasets show consistent gains over strong production baselines, especially in complex and long-tail scenarios. Additionally, MapAgent has been integrated into Baidu Maps, supporting lane-level map generation for over 360 cities nationwide and elevating the overall production automation to over 95%, demonstrating MapAgent's practicality and effectiveness for large-scale lane-level map generation.


24.

arXiv (math.PR) 2026-06-19 DOI: arXiv:2606.20289

Dimension-free bounds for Riesz transforms on the Hamming cube via a Bellman function

Authors:

Komla Domelevo, Paata Ivanisvili, Stefanie Petermichl, Alexander Volberg

We give a Bellman-function proof of the dimension-free estimate \[ \Big\| \vec{R} f \Big\|_{L^p(\Omega;\,\ell^2)} \lesssim (p-1) \,\|f\|_{L^p(\Omega)}, \qquad 2\le p


25.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16414

Instance-Aware Knowledge Distillation for Semi-Supervised Learning of an On-Board Multi-Task Dense Prediction Model for Collision Avoidance System

Authors:

Gyutae Hwang, Sang Jun Lee

Collision avoidance systems have evolved toward camera-based deep learning approaches for driving scene understanding. However, deployment in edge environments such as country clubs is constrained by limited computational resources and unreliable communication infrastructure. Moreover, constructing large-scale datasets for the target domain involves substantial annotation cost. To address these limitations, we propose an instance-aware knowledge distillation framework for semi-supervised learning. Specifically, we generate pseudo labels that mitigate teacher bias by leveraging domain priors from the teacher and instance-centric knowledge from foundation models. The trained lightweight student is deployed in the proposed collision avoidance system and performs multiple dense prediction tasks in real time. The system detects frontal obstacles and encodes their spatial information into controller area network messages for automated guided vehicle operation. To achieve this, we construct a large-scale country club dataset and perform field validation of the proposed system. Experimental results demonstrate that the student outperforms the large teacher in instance segmentation while mitigating performance degradation in monocular depth estimation. Compared with the teacher, the student reduces FLOPs by 22.68$\times$ and parameters by 14.33$\times$, achieving 6.46 FPS on a low-cost edge device.
