Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

Weakly Supervised Segmentation as Semantic-Based Regularization

Weakly supervised semantic segmentation (WSSS) trains dense pixel-level segmentation models from partial or coarse annotations such as bounding boxes, scribbles, or image-level tags. While recent work leverages foundation models such as the Segment Anything Model (SAM) to generate pseudo-labels, these approaches typically depend on heuristic prompt choices and offer limited ways to incorporate prior knowledge or heterogeneous labels. We address this gap by taking a neurosymbolic perspective: integrating differentiable fuzzy logic with deep segmentation models. Weak annotations and domain-specific priors are unified as continuous logical constraints that fine-tune SAM under weak supervision. The refined foundation model then produces improved pseudo-labels, from which we train a second-stage prompt-free segmentation model. Experiments on Pascal VOC 2012 and the REFUGE2 optic disc/cup segmentation dataset show that our logic-guided fine-tuning yields higher-quality pseudo-labels, leading to state-of-the-art segmentation accuracy that often exceeds densely supervised baselines.

02.
medRxiv (Medicine) 2026-06-15

Anti-Platelet Factor 4 Antibody Clonal Heterogeneity and MGUS Status in HIT

Background Monoclonal gammopathy of thrombotic significance (MGTS) is a recently described chronic prothrombotic condition characterized by monoclonal anti-PF4 antibodies that are detected above the polyclonal antibody background in patient sera (i.e. present as monoclonal gammopathy of undetermined significance, MGUS). Due to conflicting data in the published literature on antibody clonality in heparin-induced thrombocytopenia (HIT), we evaluated clonality and abundance of anti-PF4 antibodies in HIT, including investigating whether an MGUS, if present in HIT, represents the causative anti-PF4 antibody. Methods Blood samples from 15 patients with HIT were subject to Platelet Factor 4-dependent antigen-based and functional tests. The unmanipulated serum antibody repertoire and isolated anti-PF4 antibodies were subjected to mass spectrometric evaluation. Results Two of the 15 HIT patients had an IgG MGUS. Notably, anti-PF4 antibodies were not synonymous with the MGUS antibody in either of the two patients. Eight of the 15 patients demonstrated monoclonal anti-PF4 antibodies, however, none of the anti-PF4 antibodies were detectable as an MGUS upon evaluation of the entire serum antibody repertoire, reflecting their low abundance. In the seven patients with multiple anti-PF4 antibodies, non-monoclonality was confirmed by analysis of deglycosylated antibody heavy chains. Conclusions Anti-PF4 HIT antibodies are monoclonal in approximately 50% of HIT patients, however, antibody abundance is low such that they are not detectable over the polyclonal IgG background (i.e. are MGUS-negative), differentiating HIT from MGTS. This observation helps explain the transient nature of HIT relative to the persistent prothrombotic state seen in MGTS.

03.
arXiv (CS.LG) 2026-06-19

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

arXiv:2606.04101v3 Announce Type: replace-cross Abstract: Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Leveraging the extended scale-up connectivity among dozens of GPUs within RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with an efficient quota-driven planner, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. We evaluate UltraEP in a multi-RSN deployment of up to 256 GPUs, using cutting-edge MoE models from 106B to 671B parameters. Averaged across training and serving, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering 1.49$\times$ improvement over no-balancing, while reducing the final inter-rank imbalance from 1.30$-$4.01 to 1.01$-$1.04.

04.
arXiv (CS.CV) 2026-06-15

Avatar V: Scaling Video-Reference Avatar Video Generation

Generating avatar videos that are not merely visually similar to a target individual but behaviorally recognizable, faithfully reproducing their talking rhythm, gestural tendencies, and expression dynamics, remains an open challenge. Existing methods predominantly condition on single static images, which provide insufficient identity information and cannot capture dynamic motion traits, while standard pixel-level objectives underserve the perceptually critical facial regions that determine avatar fidelity. We present Avatar V, a production-scale framework that addresses these limitations through video-reference-conditioned identity modeling. Rather than compressing identity into fixed-size embeddings, the model conditions directly on the full token sequence of a reference video, learning to reproduce both static identity attributes (facial geometry, skin texture) and dynamic behavioral patterns (talking rhythm, micro-expressions) through attention over the reference context. We introduce Sparse Reference Attention, an asymmetric mechanism achieving linear-complexity conditioning on arbitrarily long references; a motion representation stream enabling closed-loop talking style transfer; and an identity-aware super-resolution refiner inheriting the full reference conditioning. These are supported by a data engine curating 100M+ training clips from 50M raw videos, and a five-stage training pipeline with flow matching pre-training, personality fine-tuning, two-phase distillation (>10x acceleration), and RLHF alignment, deployed across thousands of GPUs. Avatar V generates 1080p videos of unlimited duration, achieving state-of-the-art identity preservation, lip synchronization, and generation quality on our cross-scene benchmark, consistently outperforming leading systems including Seedance 2.0, Kling O3 Pro, Veo 3.1, and OmniHuman 1.5 in both automated metrics and human evaluation.

05.
arXiv (CS.AI) 2026-06-17

Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics

arXiv:2606.17345v1 Announce Type: cross Abstract: Although pitch sequencing is a central topic in baseball analytics, previous studies have primarily focused on optimizing the final pitch within a single plate appearance, leaving the role of preceding setup pitches and their impact on long-term season-level performance insufficiently examined. To address these issues, this study conducted counterfactual analyses using MLB Statcast data. A Transformer-based machine-learning model was trained to predict whether a target pitch would result in an in-play outcome or swing-out. Counterfactual pitch sequences were then generated by replacing either the final pitch or the preceding setup pitch with alternative pitch types and locations while keeping the surrounding contextual information fixed. Optimal counterfactual selections were defined as those that minimized the predicted in-play probability, and their expected effects on pitchers' seasonal statistics were estimated using regression models linking model outputs to season statistics. The results suggest that the optimization of both final and setup pitches may substantially influence season-level performance, including improvements of more than 1.0 in K/9. The analyses also provided several practical insights, including velocity-band-specific effective locations, the importance of pitch commands, and the expansion of pitch-selection options through middle-velocity pitches. These findings quantitatively support the strategic importance of pitch sequencing in baseball.

06.
arXiv (math.PR) 2026-06-11

On the Wasserstein distance between a hyperuniform point process and its mean

arXiv:2404.09549v3 Announce Type: replace Abstract: We study the existence of bounds on the expected $p$-Wasserstein distance between a random measure and its mean under the assumption that the $p$-th centered moments of the counting statistics are controlled uniformly in space. The average Wasserstein transport cost is shown to be bounded from above and from below by some multiples of the number of points. $D$-dimensional versions of those results are also obtained. As a corollary, we prove that for any value of $p\geq 1$ the Ginibre point process can be seen as a perturbed lattice with identically distributed perturbations with a finite $p$-th moment.

07.
arXiv (CS.AI) 2026-06-12

Mechanical Conscience: A Mathematical Framework for Dependability of Machine Intelligenc

arXiv:2605.03847v2 Announce Type: replace Abstract: Distributed collaborative intelligence (DCI), encompassing edge-to-edge architectures, federated learning, transfer learning, and swarm systems, creates environments in which emergent risk is structurally unavoidable: locally correct decisions by individual agents compose into globally unacceptable behavioral trajectories under uncertainty. Existing approaches such as constrained optimization, safe reinforcement learning, and runtime assurance evaluate acceptability at the level of individual actions rather than across behavioral trajectories, and none addresses the multi-participant, uncertainty-laden nature of DCI deployments. This paper introduces mechanical conscience (MC), a novel concept and simplified mathematical framework that operationalizes trajectory-level normative regulation for both single-agent and distributed intelligent systems. Mechanical conscience is defined as a supervisory filter that minimally corrects a baseline policy's actions to reduce cumulative deviation from a normatively admissible region, while accounting for epistemic uncertainty. We introduce associated constructs, conscience score, mechanical guilt, and resonant dependability, that provide an interpretable vocabulary and computable governance signals for this emerging field. Core theoretical properties are established: admissibility equivalence, existence of optimal regulation, and monotonic deviation reduction. Illustrative results demonstrate that MC-regulated agents maintain trajectory-level normative acceptability where conventional controllers drift outside admissible bounds, and that the framework naturally extends to suppress interaction-induced emergent risk in multi-agent DCI settings.

08.
arXiv (CS.CV) 2026-06-16

PURe: A Plug-and-Play Product-Unit Residual Module for Vision Networks

Modern vision networks are dominated by additive local transformations, whereas explicit multiplicative local interactions remain underexplored. Product units offer a direct approach to modeling such interactions, but their use in deep architectures has been limited by optimization instability. In this work, we propose PURe, a Product-Unit Residual Module for deep vision networks. PURe is built around a 2D Product Unit with a real-valued log-domain formulation that makes multiplicative local aggregation practical within deep residual hierarchies. The resulting module serves as a drop-in replacement for native residual units. We instantiate PURe in residual CNNs for image classification and in 2D residual encoder-decoder networks for slice-based segmentation on volumetric CT data. Across Galaxy10 DECaLS, ImageNet, and CIFAR-10, PURe consistently improves residual CNNs and yields a more favorable accuracy-parameter trade-off, allowing moderately deep models to match or surpass substantially deeper ResNet baselines with much smaller parameter budgets. On the AMOS benchmark, PURe also improves slice-based CT segmentation under 3D case-level evaluation. These results show that explicit multiplicative local interaction is a practical and effective design primitive for deep residual vision networks.

09.
arXiv (CS.CL) 2026-06-16

Understanding LLM Reasoning for Abstractive Summarization

Reasoning has substantially improved Large Language Models (LLMs) on analytical tasks such as mathematics and code generation, but its value for abstractive summarization remains unclear. To address this gap, we adapt general reasoning strategies to the summarization setting and conduct a large-scale comparative study of 8 reasoning strategies and 3 Large Reasoning Models (LRMs) across 8 diverse datasets, evaluating both summary quality and factual faithfulness. Our results show that reasoning is not a universal solution and its effectiveness depends strongly on the strategy and the summarization setting. In particular, we find a trade-off between summary quality and factual faithfulness. Explicit reasoning strategies often improve reference-based quality, but may weaken factual grounding, whereas implicit reasoning in LRMs shows the opposite tendency. We further find that increasing an LRM's internal reasoning budget does not reliably improve summarization and can even reduce factual consistency. These findings suggest that, for summarization, more reasoning is not always better. Effective reasoning should preserve faithful compression rather than induce over-elaboration. Our source code is publicly available.

10.
arXiv (CS.AI) 2026-06-17

LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management

arXiv:2501.00826v3 Announce Type: replace-cross Abstract: Cryptocurrency portfolio management requires the fusion of heterogeneous multi-modal signals, including structured price and on-chain time series, unstructured news text, and technical indicators, under high-volatility and real-time constraints. While deep learning approaches show predictive capability, their opacity limits practical adoption, and single large language model (LLM) agents struggle to process the breadth of modality-specific inputs needed for robust decision-making. We propose a multi-agent system (MAS) framework in which three modality-specialised agents, a Crypto Agent for market dynamics, a News Agent for weekly news sentiment, and a Trading Agent for signal fusion and portfolio execution, decompose the task across three communication architectures: hierarchical, collaborative, and debate. We evaluate four capability configurations: zero-shot, chain-of-thought (CoT), retrieval-augmented generation (RAG), and skill-augmented. In a 52-week backtest over calendar year 2025 across the top 15 L1 blockchain native cryptocurrencies by market capitalisation as of January 2025, the best configuration, Hierarchical (Skill), achieves a cumulative return of 133.52% and a Sharpe ratio of 1.502, outperforming single-agent variants, passive benchmarks, and deep learning baselines. An ablation study identifies the Crypto Agent as the most critical component, with its removal reducing cumulative return by 42.57 percentage points. A cross-model comparison further shows that MAS outperforms the single-agent baseline under GPT-4o, GPT-5, and Claude Sonnet 4.5, suggesting that the benefit of multi-agent coordination is model-agnostic. Unlike black-box deep learning models, every portfolio decision is traceable to explicit agent reasoning, offering an interpretable and effective approach to multi-modal cryptocurrency portfolio management.

11.
medRxiv (Medicine) 2026-06-22

A Parent-Generated Framework of Early Connection: Findings from a CBPR Qualitative Study

Background: Early relational health (ERH) constructs are derived fromresearch observations rather than lived experiences. This study foregrounds diverse parent voices to examine how they describeconnectionwith their young children. Methods: Usingcommunity-based participatory research (CBPR),this study was co-designed withparent leadersfromReach Out and Read. A semi-structured interview guidewas co-designed,and parent leaderssubsequentlyconducted and transcribed 18 interviews with parents from their networks.Researchersanalyzed transcripts using Reflexive Thematic Analysis.Member checking sessions with parent leadersinformedthe analytic framework. Results:Sixorganizing principleswereidentified.(1) Parent-child connection begins with an instinctual sense of responsibility.(2)Connectionebbs and flows as parent and child adapt to one another through dailyactivities.(3) Family circumstances, including family structure, cultural expectations, and intergenerational values, directly shape this connection. (4) Parents' own upbringings and past relationships indirectly shape how they connect with their child. (5) Forconnectionto grow, parents must show up physically and emotionally for their children despite competing demands. (6) Parentsgrow through engaged parenting, and that growth feeds back into the connection, creating a self-sustaining cycle of relational health.Conclusions:Our analysis generated twoconstructs underspecified in ERH frameworks.Parents described their sense of responsibility as immediate and instinctual, preceding an emotional bond.Parentsdemonstratedtheir agency in deciding what to carry forward from their relational histories, a pattern this study termsrelational legacy. Integrating parent-generated language into ERH measurementresearchmay shape a more comprehensive picture of ERHreflectinghow families experience connection.

12.
arXiv (CS.AI) 2026-06-17

Small Initialization Matters for Large Language Models

arXiv:2606.17945v1 Announce Type: new Abstract: Large language models provide a tractable system for asking how intelligence itself emerges, rather than only how LLMs can be engineered. Although progress is usually attributed to scale, data and architecture, we show that parameter initialization is a gene-like determinant of training and, in particular, of model capacity. Reducing the initialization scale consistently improves pretraining, with the largest gains on reasoning-demanding tasks. We identify two widely used empirical settings that restrain the advantage of small initialization, and show how relaxing them restores favorable scaling. We further uncover a critical initialization that balances the reasoning and training. Mechanistically, small initialization drives a distinct developmental trajectory: parameters first condense into low-complexity structures and later expand into richer representations, giving concrete form to the idea that compression is intelligence. Token-level analyses show that the gains concentrate on non-trivial, context-constrained predictions rather than all tokens uniformly. These results motivate a simple $\gamma$-initialization rule: expose initialization rage as an explicit knob and use small initialization by default, an almost cost-free intervention that improves pretraining and strengthens reasoning across model scales.

13.
medRxiv (Medicine) 2026-06-22

Reliable quantification of renal function from frozen blood samples

BACKGROUND: Differences in renal function may affect Alzheimer disease (AD) blood biomarker levels independent of AD pathology. Although renal function was unaccounted for in foundational AD blood biomarker studies, there is potential to address this through quantification of estimated glomerular filtration rate (eGFR) from frozen serum and plasma samples. However, the validity of eGFR evaluation from long-term frozen blood samples is unknown. METHODS: Adults aged 50-85 with at least 2 vascular risk factors were recruited from vascular surgery or cardiology clinics in Tucson, Arizona from 2022-2025. Individuals with creatinine assessments in point-of-care whole blood (POC-WB) and frozen serum and plasma samples using the iSTAT (Abbott) were included. eGFR was calculated using the 2021 CKD-EPI creatinine equation without race. Agreement between POC-WB and frozen blood samples was assessed using Cohen's kappa with linear weights. RESULTS: 134 participants (mean [SD] age: 72.6 [7.5] years, 39.6% female, 23.1% chronic kidney disease) had POC-WB eGFR available. Frozen serum and plasma samples had strong agreement with POC-WB for eGFR (Kw= 0.90-0.95, P

14.
Nature (Science) 2026-06-10

Mitochondria tethered to the nucleus secure its energy supply

Direct interactions between the cell’s powerhouses and nuclear pores might channel energy straight into the nucleus, fuelling cell division and differentiation. Direct interactions between the cell’s powerhouses and nuclear pores might channel energy straight into the nucleus, fuelling cell division and differentiation.

15.
arXiv (CS.LG) 2026-06-17

Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

arXiv:2606.17414v1 Announce Type: new Abstract: Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) provide a control method for nonlinear systems with actuation constraints that construct a forward-invariant safe set. Previous work has shown that learning class-$\mathcal{K}$ functions defining the ICCBF recursion via meta reinforcement learning (meta-RL) yields a robust, non-greedy approach to safety-critical control in RPO. This paper extends that framework further by investigating the performance of three recurrent network architectures (Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU), Selective State Space Model (Mamba)) and two training algorithms (Proximal Policy Optimization (PPO) and Soft Actor Critic (SAC)) to identify the best setup for tuning ICCBF class-K functions via meta-RL. In addition to cooperative test cases, performance is evaluated in the presence of adversarial behavior where the target spacecraft behaves in a way that worsens the safety of the chaser spacecraft. Results indicate that state space models such as Mamba when used with PPO achieve superior task completion, safety, and fuel-savings compared to other architectures, across all cooperative and uncooperative scenarios tested.

16.
medRxiv (Medicine) 2026-06-10

Development of an Open-Access Action Observation Video Library for Upper Limb Motor Rehabilitation

Background: Occupational therapists can improve stroke survivors hand and arm movement and participation in daily activities through action observation (AO). AO involves watching another persons hand or arm complete a movement or task. While research generally supports the use of AO with stroke survivors, there are limited AO videos are available to occupational therapists which makes applying AO challenging. Objective: The purpose of this work is to develop structured and widely accessible tool to support access to AO for stroke survivors, occupational therapists, and researchers. Methods: To develop an AO video library for stroke rehabilitation, functional and non-functional upper limb task deficits were first identified through clinical observations and clinician interviews to establish a prioritized list of daily activities. In collaboration with media production specialists, healthy adult volunteers were recruited and filmed performing these tasks from both first- and third-person perspectives. The recorded videos were then systematically edited, enhanced with instructional title slides, and distributed via a public YouTube channel for clinical application and a categorized digital repository for research purposes. Results: Initial assessments revealed a complete lack of familiarity, awareness, and utilization of AO resources among local occupational therapists, despite high perceived clinical utility. To address this gap, a final library of 150 tasks was established, resulting in the production of 419 finalized, standardized videos featuring six healthy volunteers. For clinical application, these videos were hosted on a free, public YouTube channel organized into 18 functional playlists, while a parallel set was structured into distinct movement categories for research repository storage. Conclusion: By providing a structured and highly accessible tool, this repository enables clinicians, researchers, and caregivers to readily implement evidence-based action observation interventions in both clinical and home settings.

17.
arXiv (quant-ph) 2026-06-15

Geometric mechanisms enabling spin- and enantio-sensitive observables in one photon ionization of chiral molecules

arXiv:2603.02735v3 Announce Type: replace-cross Abstract: We examine spin-resolved photoionization of randomly oriented chiral molecules via circularly polarized light, and revisit earlier predictions of Cherepkov (J. Phys. B: Atom. Mol. Phys. 16, 1543, 1983). We will show that the dynamical origin of spin- and enantio-sensitive observables arise from two intrinsic mechanisms that are quantified by two pseudovectors stemming from the geometric properties of the photoionization dipoles in spin space and in real space, and an extrinsic mechanism which is a directional bias introduced by the well-defined direction of light polarization. These mechanisms arise solely from electric dipole interactions. Consequently, this means that the ten independent parameters that was earlier predicted by Cherepkov to fully describe spin-resolved photoionization of chiral molecules can be reduced as moments of these three pseudovectors. We also find that the molecular pseudoscalars describing the spin- and enantio-sensitive components of the yield can be described by the flux of these pseudovectors through the energy shell, which changes sign upon switching enantiomers. Our results provide compact expressions for these observables which provide an intuitive picture on what determines the strength of these spin- and enantio-sensitive observables. The approach can be readily generalized to photoexcitation, multiphoton processes, and arbitrary field polarizations. Regardless of the specific driving conditions, the resulting spin- and enantio-sensitive observables are still controlled by the same three pseudovectors, underscoring their universal role as the primary generators of chirality-induced spin asymmetries, emphasizing their fundamental geometric origin and the universality of the mechanism identified here.

18.
arXiv (CS.AI) 2026-06-18

Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design

arXiv:2602.23092v2 Announce Type: replace Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, focuses on optimizing fleet operations under vehicle capacity constraints. While extensively studied in operational research, the NP-hard nature of CVRP continues to pose significant computational challenges, particularly for large-scale instances. This study presents AILS-AHD (Adaptive Iterated Local Search with Automatic Heuristic Design), a novel approach that leverages Large Language Models (LLMs) to revolutionize CVRP solving. Our methodology integrates an evolutionary search framework with LLMs to dynamically generate and optimize ruin heuristics within the AILS method. Additionally, we introduce an LLM-based acceleration mechanism to enhance computational efficiency. Comprehensive experimental evaluations against state-of-the-art solvers, including AILS-II and HGS, demonstrate the superior performance of AILS-AHD across both moderate and large-scale instances. Notably, our approach establishes new best-known solutions for 8 out of 10 instances in the CVRPLib large-scale benchmark, underscoring the potential of LLM-driven heuristic design in advancing the field of vehicle routing optimization.

19.
arXiv (CS.LG) 2026-06-17

Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability

arXiv:2606.18080v1 Announce Type: new Abstract: Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/\eta$, where $\eta$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here, motivating the search for a continuous-time model valid at EoS. We propose Edge Flow, a system of three coupled ordinary differential equations that provides a tractable, faithful, and predictive model of gradient descent dynamics at EoS. Edge Flow decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude. The center follows a modified gradient flow on a symmetrized loss; the direction tracks a top eigenvector of the Hessian via Rayleigh quotient dynamics; and the magnitude grows or decays exponentially depending on whether the sharpness exceeds or falls below the threshold $2/\eta$. Crucially, sharpness stabilization emerges from the coupled dynamics via a self-stabilization feedback loop. Discretizing Edge Flow only requires two gradient evaluations and one Hessian–vector product at each iteration. We demonstrate empirically that Edge Flow tracks the dynamics of gradient descent at least as faithfully as previously proposed continuous-time EoS models, while in addition resolving the oscillation of the sharpness at the onset of EoS, and that it provides a principled framework for understanding and mitigating instabilities in this regime.

20.
arXiv (CS.AI) 2026-06-15

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

arXiv:2606.13710v1 Announce Type: new Abstract: Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static parametric deep research capabilities of agent systems. The latter allows agents to autonomously interact with the environment to gain experiences that evolve model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap with open-ended research tasks. To bridge these two critical tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which leverages hybrid-mode reinforcement learning to facilitate the collaborative evolution of a proposer, solver and judge based on web-scale knowledge, moving toward autonomous evolving agents in open-ended tasks and environments. Extensive experiments on three long-form deep research benchmarks demonstrate that the 8B model trained via HOTE surpasses the strongest static open 8-32B models as well as those trained by state-of-the-art deep research training methods with less time overhead, and further verify that the evolution of all three modules in HOTE is indispensable.

21.
arXiv (CS.LG) 2026-06-16

False Sense of Safety in Selective Signal Classification: Auditing Bound Tightness and Exchangeability for Risk Control

arXiv:2606.15153v1 Announce Type: new Abstract: Selective prediction with distribution-free risk control promises that, with confidence 1-delta over the calibration draw, the error rate of accepted inputs stays below a user budget alpha. We audit this promise on signal-domain detectors – machine anomalous-sound detection (ASD) and AI-generated-image forensics – for four calibration rules: uncertified empirical thresholding (NAIVE) and certified Hoeffding, Clopper-Pearson (CP), and betting (WSR) upper confidence bounds. We report three findings. (i) NAIVE thresholding, common in practice, exceeds its declared budget in 49-73% of synthetic trials (n=200 calibration points) and in up to 68% of real-data splits: a false sense of safety rather than a broken theorem, since the rule never had a certificate. (ii) Tightness matters: CP and WSR certify substantial coverage where Hoeffding certifies none, with zero observed budget overruns under exchangeable splits. (iii) Under grouped deployment (unseen machine types or generators), certified rules overrun in 9-30% of trials – far above delta – showing the failure lies in the broken exchangeability premise, not in the bounds; a conservative per-group threshold restores validity at a severe coverage cost.

22.
arXiv (CS.AI) 2026-06-16

Gaming-Resistant Insurance Contracts for Autonomous AI Agents: Strategy-Proof Toll Mechanism Design

arXiv:2606.16326v1 Announce Type: cross Abstract: Paper A defines a time-consistent actuarial runtime that prices each side-effect-bearing action against a contractually fixed safe default and gates execution against a reserve budget. It treats the operator as passive. This paper makes the operator strategic. We characterise a five-attack space for autonomous AI-agent insurance contracts and prove when the actuarial runtime is gaming-resistant. Two attack surfaces – post-toll safe-default selection and within-boundary action splitting – are closed by Paper A's minimal-authority and no-splitting clauses. The remaining three require new contract clauses. First, common-control aggregation prevents cross-boundary re-routing from reducing toll below the boundary potential applied to total exposure. Second, interface failures such as invalid JSON are contract-relevant events, not safety wins: treating them as zero-toll safe defaults can reward unreliable models, while escalation fees reverse the incentive. We validate this interface-compliance theorem on committed cross-model traces from the companion empirical paper. Third, a model-identity menu with a componentwise-minimum penalty schedule makes truthful reporting of the deployed model weakly dominant. We then compose these clauses with Paper A's runtime guarantees to obtain joint incentive compatibility over the five-attack space. Finally, a two-parameter premium family discharges operator individual rationality and weak budget balance at the truthful equilibrium. The result is an incentive-compatibility layer for actuarial control of autonomous-agent side effects.

23.
arXiv (CS.CL) 2026-06-12

InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem

The rapid evolution of Large Language Models has catalyzed a surge in scientific idea production, yet this leap has not been accompanied by a matching advance in idea evaluation. The fundamental nature of scientific evaluation needs knowledgeable grounding, collective deliberation, and multi-criteria decision-making. However, existing idea evaluation methods often suffer from narrow knowledge horizons, flattened evaluation dimensions, and the inherent bias in LLM-as-a-Judge. To address these, we regard idea evaluation as a knowledge-grounded, multi-perspective reasoning problem and introduce InnoEval, a deep innovation evaluation framework designed to emulate human-level idea assessment. We apply a heterogeneous deep knowledge search engine that retrieves and grounds dynamic evidence from diverse online sources. We further achieve review consensus with an innovation review board containing reviewers with distinct academic backgrounds, enabling a multi-dimensional decoupled evaluation across multiple metrics. We construct comprehensive datasets derived from authoritative peer-reviewed submissions to benchmark InnoEval. Experiments demonstrate that InnoEval can consistently outperform baselines in point-wise, pair-wise, and group-wise evaluation tasks, exhibiting judgment patterns and consensus highly aligned with human experts.

24.
arXiv (CS.CV) 2026-06-16

CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos

Generalist Vision-Language-Action models remain constrained by the scarcity of robotic data relative to the abundance of human video demonstrations. Existing Latent Action Models attempt to use video data but often suffer from visual entanglement, encoding noise rather than manipulation skills. To address this limitation, we propose Contrastive Latent Action Pretraining (CLAP), a framework that first uses Act-VAE to learn an executable action-token vocabulary from robot trajectories and then aligns human visual transitions with this vocabulary through contrastive learning. This alignment maps unlabeled human videos into a physically grounded latent action space rather than reconstructing appearance. Building on the aligned tokens, we train CLAP-NTP as an autoregressive VLA using robot demonstrations and pseudo-labeled human videos, preserving instruction following and object generalization. For deployment and target-domain adaptation, we further introduce a post-training strategy that combines CLAP-RF, a Rectified Flow action head for low-latency continuous action chunk prediction, with Knowledge Matching regularization to preserve pretrained semantic knowledge during fine-tuning. Extensive experiments show that CLAP achieves strong performance against competitive baselines while enabling effective skill transfer from human videos to robotic execution.

25.
arXiv (math.PR) 2026-06-11

Construction of ergodic IDLA forests in $\mathbb{Z}^d$

arXiv:2506.10476v2 Announce Type: replace Abstract: We prove the existence of infinite-volume IDLA forests in $\mathbb{Z}^d$ , with $d \geq 2$, based on a multi-source IDLA protocol. Unlike IDLA aggregates, the laws of the IDLA forests studied here depend on the trajectories of particles, and then do not satisfy the famous Abelian property. Their existence is due to a stabilization result (Theorem 1.1, our main result) that we establish using percolation tools. Although the sources are infinitely many, we also prove that each of them play the same role in the building procedure, which results in an ergodicity property for the IDLA forests (Theorem 1.2).