Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-16

DiRecT: Safe Diffusion-Based Planning via Receding-Horizon Denoising

arXiv:2606.15359v1 Announce Type: new Abstract: Diffusion models have emerged as powerful tools for planning and control by learning multimodal distributions over actions and trajectories. Yet reliable inference-time safety enforcement remains a key barrier to their deployment in safety-critical tasks. Existing approaches typically project each denoising iterate onto the feasible set, even though constraints are defined only on the final clean trajectory. Enforcing feasibility on noisy intermediate samples can therefore overconstrain the sampling dynamics, substantially degrading sample quality. To address this limitation, we introduce DiRecT (Diffusion-based planning via Receding-horizon denoising with Terminal constraints), a training-free algorithm for constrained sampling from diffusion models via stochastic optimal control (SOC). DiRecT enforces constraints only on the final clean sample, avoiding unnecessary restrictions on the intermediate denoising dynamics. Inspired by model predictive control, we derive a principled receding-horizon surrogate for the otherwise intractable constrained SOC formulation, yielding an efficient algorithm that cleanly separates stochastic denoising from constraint satisfaction, progressively steering samples toward feasible final trajectories without distorting the learned diffusion dynamics. Furthermore, DiRecT is highly flexible: it can leverage off-the-shelf or domain-specific optimizers, incorporate priors over environment dynamics, and optimize additional soft rewards. Extensive experiments on safe planning benchmarks demonstrate that DiRecT substantially improves deployment safety and task performance over existing diffusion-based planning baselines.

02.
arXiv (CS.AI) 2026-06-19

Charting the Future of Scholarly Knowledge with AI: A Community Perspective

arXiv:2509.02581v2 Announce Type: replace-cross Abstract: Despite the growing availability of tools designed to support scholarly knowledge extraction and organization, many researchers still rely on manual methods, sometimes due to unfamiliarity with existing technologies or limited access to domain-adapted solutions. Meanwhile, the rapid increase in scholarly publications across disciplines has made it increasingly difficult to stay current, further underscoring the need for scalable, AI-enabled approaches to structuring and synthesizing scholarly knowledge. Various research communities have begun addressing this challenge independently, developing tools and frameworks aimed at building reliable, dynamic, and queryable scholarly knowledge bases. However, limited interaction across these communities has hindered the exchange of methods, models, and best practices, slowing progress toward more integrated solutions. This manuscript identifies ways to foster cross-disciplinary dialogue, identify shared challenges, categorize new collaboration and shape future research directions in scholarly knowledge and organization.

03.
arXiv (CS.LG) 2026-06-19

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

arXiv:2606.20547v1 Announce Type: new Abstract: We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ – a bare transformation, with no feature payload and no external action $\rho(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie group elements: their score is the closed-form algebra norm of the relative pose rather than a learned kernel, and it reaches the affine full-frame groups that every irrep- or surjective-exp-based method must exclude. We call it Lie-Algebra Attention. Once tokens are group elements, the rest follows with none of the usual representation-theoretic machinery. The relative geometry of a pair is canonical, $g_i^{-1} g_j$, so the pairwise invariant $w_{ij} = \log(g_i^{-1} g_j)$ is intrinsic rather than designed; equivariance under the diagonal $G$-action is tautological, and the cocycle condition holds automatically. The attention score is the negative squared algebra norm, $s_{ij} = -\|\log(g_i^{-1} g_j)\|_\lambda^2/\tau$: the canonical proximity kernel under a block-weighted Frobenius inner product, with no irreducible representations, spherical harmonics, Clebsch-Gordan products, or learned kernel. The construction applies to any matrix Lie group on a chosen logarithm chart containing the relative poses, including the non-compact non-abelian affine groups with scale and shear that no vector-token attention method reaches: neither the irrep tradition nor surjective-exp methods. Three sequence-completion experiments, on SE(2), SO(3), and Aff(2), bear this out: the closed-form score matches a learned MLP kernel on the same invariant and outperforms it on SE(2), using 50 to 80x fewer score parameters, while a vector-token baseline breaks invariance by five to twelve orders of magnitude.

04.
arXiv (CS.LG) 2026-06-15

Deep Doubly Debiased Longitudinal Effect Estimation with ICE G-Computation

arXiv:2602.12379v2 Announce Type: replace Abstract: Estimating longitudinal treatment effects is essential for sequential decision-making but is challenging due to treatment-confounder feedback. While Iterative Conditional Expectation (ICE) G-computation offers a principled approach, its recursive structure suffers from error propagation, corrupting the learned outcome regression models. We propose D3-Net, a framework that mitigates error propagation in ICE training and then applies a robust final correction. First, to interrupt error propagation during learning, we train the ICE sequence using Sequential Doubly Robust (SDR) pseudo-outcomes, which provide bias-corrected targets for each regression. Second, we employ a multi-task transformer with a covariate simulator head for auxiliary supervision, regularizing representation learning, and a target network to stabilize training dynamics. For the final estimate, we discard the SDR correction and instead use the uncorrected nuisance models to perform Longitudinal Targeted Minimum Loss-Based Estimation (LTMLE) on the original outcomes. This second-stage, targeted debiasing ensures robustness and optimal finite-sample properties. Comprehensive experiments demonstrate that our model, D3-Net, robustly reduces bias and variance across different horizons, counterfactuals, and time-varying confoundings, compared to existing state-of-the-art ICE-based estimators.

05.
arXiv (CS.AI) 2026-06-19

Too long; didn't solve

arXiv:2604.07593v2 Announce Type: replace Abstract: Mathematical benchmarks consisting of a range of mathematics problems are widely used to evaluate the reasoning abilities of large language models, yet little is known about how their structural properties influence model behaviour. In this work, we investigate two structural length variables, prompt length and solution length, and analyse how they relate to model performance on a newly constructed adversarial dataset of expert-authored mathematics problems. We find that both prompt and solution lengths correlate positively with increased model failure across models. We also include a secondary, exploratory analysis of cross-model disagreement. Under a difficulty-adjusted normalised analysis, both variables retain weak negative associations with realised model separation, slightly stronger for prompt length. Overall, our main robust finding is that structural length is linked to empirical difficulty in this dataset.

06.
arXiv (quant-ph) 2026-06-16

Information geometry and entanglement under phase-space deformation through nonsymplectic congruence transformation

arXiv:2505.02269v3 Announce Type: replace Abstract: The Fisher-Rao (FR) information matrix is a central object in multiparameter quantum estimation theory. The geometry of a quantum state can be envisaged through the Riemannian manifold generated by the FR-metric corresponding to the quantum state. Interestingly, any congruence transformation $GL(2n,\mathbb{R})$ in phase space leaves the FR-distance for Gaussian states invariant. In the present paper, we investigate whether this isometry affects the entanglement in the bipartite system. It turns out that the entanglement-generating congruent transformation depends upon the system and background space. To make our study relevant to physical systems, we choose Bopp's shift in phase space as an example of $GL(2n,\mathbb{R})$, so that the results can be interpreted in terms of noncommutative (NC) phase-space deformation. We provide an estimation of the measure of entangled states over separable states for bipartite Gaussian states under a Bopp's shift. Since the dynamics of free oscillators in background NC-space is mathematically equivalent to the dynamics of a charged particle under a homogeneous magnetic field, we provide an outline for a gedankenexperiment through photocurrent measurement in order to determine the effects of congruent transformation on the distinguishibility of Gaussian states.

07.
medRxiv (Medicine) 2026-06-22

Association of Digoxin Use at Norwood Discharge with Fontan Completion: A Study from the Pediatric Heart Network Public Dataset

Background: Digoxin use after the Norwood procedure has been associated with improved interstage survival in hypoplastic left heart syndrome and related conditions. Whether this benefit translates into improved longer-term outcomes through staged palliation remains unknown. We aimed to determine the association of digoxin use at Norwood discharge with transplant-free survival and Fontan completion. Methods: We conducted a retrospective cohort study using the Pediatric Heart Network (PHN) Single Ventricle Reconstruction trial public dataset, including 549 infants enrolled at 15 North American centers between 2005 and 2008. Competing risk analysis was used to evaluate Fontan completion and Cox regression to assess death or transplantation within 6 years after the Norwood procedure. Mixed-effects models compared pre-Fontan hemodynamic and echocardiographic right ventricular indices between patients treated with and without digoxin after accounting for center clustering and adjustment for sex, shunt type, heart failure medications at Norwood discharge, and census block poverty level. Results: The 6-year cumulative incidence of Fontan completion was higher among patients discharged on digoxin than among those not receiving digoxin (82% vs 71%; p = 0.013). Competing-risk analysis accounting for death and transplant demonstrated a greater likelihood of Fontan completion among digoxin users (aHR 1.31; 95%CI 1.09-1.58; p = 0.005), without significant difference in the hazard of death or transplant (aHR 0.78; 95%CI 0.53-1.15; p = 0.208). No significant differences in pre-Fontan hemodynamic or echocardiographic indices were observed between groups. Initiation of digoxin post Stage II procedure was not associated with improved survival or likelihood to complete Fontan. Conclusion: Digoxin use at the time of Norwood discharge was associated with a 30% greater likelihood of Fontan completion by 6 years, without accompanying improvement in transplant-free survival. These findings extend prior observations of improved interstage outcomes associated with digoxin use and suggest that treatment may facilitate progression through staged palliation.

08.
arXiv (CS.CL) 2026-06-16

Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search

This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles which are usually optimized to attract user click feedbacks, ad descriptions have a longer text span and possess the potential of incorporating world knowledge to address user search intents while presenting the fine-grained selling points of the ads. We propose Interactor, a multi-turn iterative creation framework optimized with agentic RL for ad description generation. The generation model acts as a policy that interacts with a customized environment consisting of multiple generative reward models. Given initial generations by the policy, the customized GenRMs evaluate multi-dimensional qualities including knowledge capacity and landing page consistency, providing both binary signals and reasoning feedbacks. The policy then iteratively refines the descriptions based on such feedbacks to ensure continuous improvement. Experiments on industrial datasets show that the Interactor framework significantly outperforms state-of-the-art approaches in generating knowledge-rich and faithful ad descriptions. Since May 2026, it has been deployed online in a leading search ads system, contributing to both ad revenue and user experience.

09.
arXiv (CS.CL) 2026-06-19

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.

10.
medRxiv (Medicine) 2026-06-22

Assessment of adaptive functioning in Angelman syndrome using the Vineland Adaptive Behavior Scales, Third Edition

Purpose: This study examined longitudinal trajectories of adaptive functioning in 331 individuals with Angelman syndrome (AS) using the Vineland Adaptive Behavior Scales, Third Edition (Vineland-3) and examined differences by molecular subtype. Methods: A total of 331 individuals (156 females, 47%) with genetically confirmed AS (ages 6 months to 52 years) were assessed between 2018 and 2025, including 207 with a deletion subtype, 63 with uniparental disomy or imprinting defect, and 61 with a UBE3A point mutation. Growth scale values were analyzed using linear mixed-effects models with log2-transformed age. Results: Individuals with deletion subtypes demonstrated significantly lower adaptive functioning across domains compared to those with non-deletion subtypes. Adaptive skills across all Vineland-3 subdomains increased nonlinearly with age, showing faster growth early in life that slowed over time, with largely parallel trajectories across subtypes. Conclusion: Individuals with AS demonstrate slow but steady growth in adaptive functioning that continues into adulthood, with progress varying by molecular subtype. These findings provide updated natural history benchmarks and demonstrate the utility of the Vineland-3 for clinical trials.

11.
arXiv (CS.AI) 2026-06-24

Integrated Sensing and Communications for Real-time Avatar Control in XR over 5G

arXiv:2606.23771v1 Announce Type: cross Abstract: Extended Reality (XR) presents a challenging use case for 5G and 6G networks, requiring high data-rates and lowlatency communication to deliver a truly immersive experience. Moreover, in order to seamlessly translate physical actions to the virtual world, accurate gesture recognition and pose estimation are required. Current XR interaction solutions based on handheld controllers and cameras cannot easily capture full-body poses, inhibit the free use of hands, and require good visibility and a clear line of sight. In this work, we propose a multimodal sensing architecture for XR that combines 5G MillimeterWave (mmWave) Integrated sensing and communication (ISAC) and surface electromyography (sEMG) signals. 5G mmWave ISAC cannot only be used to deliver content wirelessly to the Head-mounted display (HMD), but also the same communication signals can be used to derive coarse body-level gestures and poses of the user, to support real-time avatar control. For fine-grained finger-level gestures, our architecture leverages lightweight sEMG sensors that capture forearm muscle activity. To illustrate the need of both modalities, we present evaluations of both sensing technologies. At the body level (5G), our architecture relies on power-per-beam-pair (PPBP), which can be computed from standard beam management or beam sweeping procedures of the 5G NR standard. PPBP-based sensing achieves 82.2$\pm$5.9% average accuracy when evaluated on users not seen during training. For fine-grained finger-level interactions, we show that surface electromyography (sEMG) carries strong discriminative information achieving consistent promising performance across different movement settings. Thus, combining the two modalities enables multi-scale gesture recognition, at the body level via existing 5G signals and finger level via lightweight sEMG sensors, forming a complete XR framework.

12.
arXiv (math.PR) 2026-06-24

On the convex hull of a planar Brownian bridge with a random Gaussian endpoint

arXiv:2606.24485v1 Announce Type: new Abstract: We consider a one-parameter family of isotropic planar Gaussian processes \[ X_\sigma(t) =B_t+\sigma t Z,\qquad 0\le t\le 1,\quad 0\le \sigma\le 1, \] where $B$ is a standard ($0$-to-$0$) planar Brownian bridge on $[0,1]$, and $Z\sim \mathrm N(0,I)$ is a standard Gaussian random vector independent of $B$. The family interpolates between standard planar Brownian bridge ($\sigma=0$) and standard planar Brownian motion ($\sigma=1$). As the main result of the paper we compute the expected perimeter and area of the convex hull of the random set $\left\{X_\sigma(t) \colon 0\le t\le 1\right\}$ as closed formulas in terms of $\sigma$, and recover the classical Brownian bridge and Brownian motion values at $\sigma=0$ and $\sigma=1$. We also consider the convex hull spanned by multiple independent processes of this type and the possibilities for closed formulas in special cases. The key observation in our argument is that the isotropy property reduces the expected perimeter and area to one-dimensional quantities through the support function and Cauchy's formulas.

13.
arXiv (math.PR) 2026-06-18

Evolution of Conditional Entropy for Diffusion Dynamics on Graphs

arXiv:2510.19441v2 Announce Type: replace-cross Abstract: The modeling of diffusion processes on graphs is the basis for many network science and machine learning approaches. Entropic measures of network-based diffusion have recently been employed to investigate the reversibility of these processes and the diversity of the modeled systems. While results about their steady state are well-known, very few exact results about their finite-time evolution exist. Here, we introduce the conditional entropy of heat diffusion in graphs, and outline a mathematical framework that contextualizes diffusion and conditional entropy within the theories of continuous-time Markov chains and information theory. In particular, we highlight that this entropic measure satisfies an information-theoretical version of the second law of thermodynamics, thereby providing a parallelism between diffusion dynamics on networks and their physical counterparts. Furthermore, we obtain explicit results for its evolution on complete, path, and circulant graphs, as well as a mean-field approximation for Erdös-Rényi graphs. We also obtain asymptotic results for general networks and provide bounds for the evolution of conditional entropy. Finally, we experimentally demonstrate several properties of conditional entropy for diffusion over random graphs, such as the Watts-Strogatz model.

14.
arXiv (CS.CV) 2026-06-16

CASHEW: Stabilizing Multimodal Reasoning via Iterative Trajectory Aggregation

Vision-language models achieve strong performance across a wide range of multimodal understanding and reasoning tasks, yet their multi-step reasoning remains unstable. Repeated sampling over the same input often produces divergent reasoning trajectories and inconsistent final predictions. To address this, we introduce two complementary approaches inspired by test-time scaling: (1) CASHEW, an inference-time framework that stabilizes reasoning by iteratively aggregating multiple candidate trajectories into higher-quality reasoning traces, with explicit visual verification filtering hallucinated steps and grounding reasoning in visual evidence, and (2) CASHEW-RL, a learned variant that internalizes this aggregation behavior within a single model. CASHEW-RL is trained using Group Sequence Policy Optimization (GSPO) with a composite reward that encourages correct answers grounded in minimal yet sufficient visual evidence, while adaptively allocating reasoning effort based on task difficulty. This training objective enables robust self-aggregation at inference. Extensive experiments on 13 image understanding, video understanding, and video reasoning benchmarks show significant performance improvements, including gains of up to +26.2 percentage points on ScienceQA and +9.1 percentage points on EgoSchema.

15.
arXiv (CS.AI) 2026-06-11

Offline Diffusion Policy for Multi-User Delay-Constrained Scheduling

arXiv:2501.12942v2 Announce Type: replace Abstract: Effective multi-user delay-constrained scheduling is crucial in various real-world applications, including embodied AI, instant messaging, live streaming, and data center management, where efficient resource allocation is required among users with diverse delay sensitivities. In these scenarios, schedulers must make real-time decisions to satisfy both delay and resource constraints without prior knowledge of system dynamics, which are often time-varying and challenging to estimate. {Current learning-based methods typically require online interactions with actual systems during the training stage. Therefore, these approaches are often difficult or impractical, as they can significantly degrade system performance and incur substantial service costs.} To address these challenges, we propose a novel offline reinforcement learning-based algorithm, named \underline{S}cheduling By \underline{O}ffline Learning with \underline{C}ritic Guidance and \underline{D}iffusion Model (SOCD), to learn efficient scheduling policies purely from pre-collected offline data. SOCD innovatively employs a diffusion policy, complemented by a sampling-free critic network for policy guidance. By integrating the Lagrangian multiplier optimization into the offline reinforcement learning, SOCD efficiently trains high-quality constraint-aware policies exclusively from available datasets, eliminating the need for online interactions with the system. Experimental results demonstrate that SOCD is resilient to various system dynamics, including partially observable and large-scale environments, and delivers superior performance compared to existing methods.

16.
arXiv (CS.CV) 2026-06-24

DivRL: Disentangled Self-Similarity Rewards for Diverse Subject-Driven Generation

Subject-driven image generation faces an "Identity-Diversity Paradox", where strong identity preservation often leads to rigid and low-diversity outputs. We propose a post-training framework called DivRL that jointly optimizes identity consistency and structural diversity simultaneously by leveraging disentangled visual features from a robust similarity model. Specifically, we introduce a Negative Self-Similarity Measure (nSSM) to quantify structural diversity, and Visual Semantic Matching (VSM) to evaluate identity consistency. We propose an "Explore-and-Suppress" strategy that treats VSM as a gated constraint: the model freely explores structurally diverse configurations, and only samples that violate the identity threshold are penalized via a quadratic hinge loss. This converts identity preservation from a competing objective into a feasibility constraint, allowing nSSM and VSM to improve jointly. Experiments demonstrate that our method effectively pushes the model to generate both consistent and diverse images and improves structural diversity while maintaining comparable identity consistency through a gated optimization formulation.

17.
medRxiv (Medicine) 2026-06-10

Resolving Diagnostic Discordance in Group 2 Pulmonary Hypertension Through Staged Physiologic Testing: Insights From PVDOMICS

Background World Symposium on Pulmonary Hypertension (WSPH) Group 2 pulmonary hypertension (PH) is a clinically integrated phenotype attributed to left heart disease, whereas pre- versus post-capillary classification is operationalized primarily by pulmonary capillary wedge pressure (PCWP). Although current recommendations emphasize contextual interpretation and provocative testing for intermediate PCWP values, the relationship between PCWP-based classification and underlying phenotype has not been systematically evaluated. We aim to quantify phenotype-hemodynamic discordance across the PCWP spectrum and evaluate a staged physiology-guided framework incorporating inhaled nitric oxide (iNO), ventricular geometry, and provocative testing. Methods We studied 1,032 participants from the NHLBI-sponsored PVDOMICS cohort with multidisciplinary adjudicated phenotypes integrating clinical, imaging, physiologic, and hemodynamic data. Stage-specific PCWP thresholds classified pre- versus post-capillary physiology at rest, during iNO, and during provocation (fluid challenge or invasive cardiopulmonary exercise testing [iCPET]). Echocardiographic right ventricular-to-left ventricular (RV/LV) ratio was evaluated as a marker of ventricular interdependence. Restricted cubic spline and staged concordance analyses defined certainty-based PCWP ranges and incremental diagnostic yield. Results Adjudicated Group 2 phenotype was present in 37.0% of participants. Resting PCWP demonstrated good discrimination (AUC 0.86), but substantial bidirectional phenotype-hemodynamic discordance persisted across intermediate PCWP ranges. At a resting PCWP of 12 mmHg, 25% of participants classified as pre-capillary had adjudicated Group 2 PH, whereas at 18 mmHg, 35% classified as post-capillary remained discordant non-Group 2. Concordance did not approach 90% until PCWP values were 24 mmHg. Dynamic testing incrementally improved concordance within these overlap zones. Nearly half of adjudicated Group 2 PH participants (46.5%) were not identified by resting PCWP alone; incorporation of iNO and provocative testing increased cumulative Group 2 identification by 63.4% and improved sensitivity from 79.9% to 83.7%. Model discrimination improved from an AUC of 0.863 to 0.908 (likelihood-ratio P

18.
arXiv (CS.CL) 2026-06-24

Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization

作者:

End-to-end large language models (LLMs) produce fluent multi-document summaries but remain prone to hallucination, and the attributions they offer are typically coarse (whole documents or passages) and generated post hoc, leaving each summary statement hard to verify. We revisit the modular Extract–Select–Rewrite paradigm and recast its intermediate representation as the unit of attribution. We present CAMS, a Claim-Anchored Multi-document Summarization framework that (i) extracts atomic claims with token-level provenance from every source document, (ii) clusters equivalent claims across documents while flagging inter-source conflicts, (iii) selects a support-aware and salient subset, and (iv) rewrites the selection into a summary in which every sentence is anchored to a support-checked claim that links back to one or more source spans. Because content is localized before it is realized, the pipeline is attribution-oriented by construction and faithfulness-oriented by construction: it structurally preserves fine-grained, multi-source traceability while using support-aware selection, constrained rewriting, and verification to encourage, rather than guarantee, factual faithfulness. We evaluate quality, faithfulness, and localization on MultiNews, analyze conflict handling on DiverseSumm, and test zero-shot transfer on WCEP, using a two-regime protocol that separates reference-free citation quality from gold-aligned localization accuracy, and we add an evaluator-decoupled audit that tests citation precision with a support model never used for selection or verification. CAMS matches strong end-to-end and span-attribution baselines on summary quality while substantially improving faithfulness and citation precision, lifting multi-source attribution accuracy by roughly two-thirds, and exposing a controllable faithfulness–coverage trade-off that end-to-end models leave implicit.

19.
arXiv (CS.LG) 2026-06-24

Adversarial dynamical systems characterize when data-driven learning succeeds or fails

arXiv:2407.06312v2 Announce Type: replace-cross Abstract: Many systems resist analytical modeling, making data-driven inference of dynamics important. Yet data-driven methods can fail to converge or generalize, leaving open a central question: When can system behavior be learned reliably from data, and when is such learning impossible? We answer this question using adversarial dynamical systems to identify the boundary between accessible and inaccessible regimes. In Koopman operator learning, a leading framework for representing nonlinear dynamics through linear spectral objects, we design optimal data-driven spectral algorithms with convergence and certification guarantees under conditions arising broadly in physical systems. This yields a convergence theory for Koopman-operator approximations and resolves a longstanding open problem in Koopman spectral analysis. Conversely, by constructing adversarial systems, we prove matching impossibility results: without these conditions, no single-sequence limiting procedure can guarantee learning, regardless of data quality. These results sharply characterize when data-driven spectral learning can succeed and when it must fail. We validate the framework on oscillators, chaotic fluid flows and Arctic sea ice concentration forecasting. In the latter, we uncover hidden modes of Arctic sea ice decline, deliver long-range forecasts with geographic error bounds, and outperform state-of-the-art dynamical and deep learning models at substantially lower computational cost, enabling real-time deployment on standard CPUs.

20.
arXiv (CS.AI) 2026-06-15

StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

arXiv:2606.07027v2 Announce Type: replace Abstract: Reinforcement Learning (RL) has become a promising approach for improving GUI Agents in long-horizon, stochastic digital environments, but trajectory-level success feedback is too sparse to provide reliable credit assignment for intermediate exploration steps. To mitigate this issue, recent studies introduce Process Reward Models (PRMs), which provide finer-grained training feedback through global milestone verification or local step-level evaluation. However, these methods still suffer from two level-specific limitations: global milestone decomposition is subjective and singular, making it difficult to accommodate the multiple valid execution paths in real GUI tasks, while fixed local judging windows may miss long-range key evidence or dilute the decision signal with irrelevant frames. Inspired by stain-tracing mechanisms in network flow analysis, we propose StainFlow, an entity-stain-flow process reward model for GUI Agents. To reduce the subjectivity of global partitioning, we introduce the Global Entity Stain Tracking module, which extracts visually verifiable task entities and tracks how their stain concentrations and states evolve along the trajectory, allowing task phases to be objectively separated by changes in the entity evidence flow. To improve the accuracy of local verification, we introduce the Local Stain Evidence Linking module. Centered on the triggering entities of each candidate key node, it retrieves relevant steps based on their stain concentrations and state changes, and dynamically constructs high-density evidence windows for verifying true key nodes. Extensive experiments on AndroidWorld and OGRBench show that StainFlow relatively improves online RL success by 3.2% and trajectory completion judgment accuracy by 1.8%.

21.
arXiv (CS.LG) 2026-06-16

Rethinking Structural Anomaly Detection: From Decision Boundaries to Projection Operators

arXiv:2606.15280v1 Announce Type: new Abstract: Most existing anomaly detection methods rely on estimating a probability density or learning an enclosing decision boundary, implicitly assuming that normal data occupies a region of non-zero volume in the ambient space. In contrast, structural anomaly detection considers data that lies near a low-dimensional manifold, creating a mismatch between the inductive bias of existing methods and the structure of the data, often resulting in degraded performance. To address this mismatch, we introduce a geometric perspective. Specifically, we learn a projection operator onto the manifold of normal samples and define a sample as anomalous if it is altered by this projection. This formulation naturally integrates the inductive bias of manifold-supported data and reframes anomaly detection in terms of a projection residual, thereby resolving issues arising from modeling degenerate distributions. Notably, it provides a unifying interpretation of reconstruction-based methods by explaining their success and failure in terms of projection quality. In particular, it explains the strong generalization ability of projection-aligned models as a consequence of contraction behavior toward the manifold. Moreover, by decoupling anomaly detection from probabilistic modeling, it reduces the tendency to misclassify rare but normal samples, a widely recognized limitation of existing approaches. Empirically, we demonstrate that projection-aligned methods achieve strong performance, outperforming boundary-based methods while improving upon existing reconstruction-based approaches.

22.
arXiv (CS.CV) 2026-06-17

Two-Stage Fine-Tuning of ResNet50 for High-Sensitivity Melanoma Detection on Dermoscopic Images

作者:

Melanoma is the most dangerous form of skin cancer with five-year survival rates exceeding 99% when detected early but falling sharply once the disease spreads. This paper proposes and evaluates a two-stage fine-tuning approach for ResNet50 applied to binary melanoma classification on dermoscopic images. The core challenges addressed are class imbalance and suboptimal transfer learning from single-stage fine-tuning. After stratified train/validation/test splitting, random oversampling was applied exclusively to the training set to achieve a 1:1 class balance. Stage 1 trained only the classification head with the ResNet50 base frozen, while Stage 2 fine-tuned all layers jointly at a low learning rate of 1e-5 to prevent catastrophic forgetting of learned visual features. On an independent test set of 3,826 images, the model achieved an AUC-ROC of 0.9559, accuracy of 88.34%, sensitivity of 87.56%, specificity of 89.13%, and F1-score of 88.29%. An ablation study confirms the two-stage protocol significantly outperforms single-stage fine-tuning, with sensitivity gains of over 4%. Grad-CAM visualizations demonstrate correct lesion localization. A fully deployable Streamlit detection application is provided alongside all training code.

23.
Nature (Science) 2026-06-17

The EU needs to back its ambition to end animal testing with cash

作者: 未知作者

The European Union has declared that it wants to stop using animals in chemical safety testing. Its goal will need a timeline and a serious funding commitment. The European Union has declared that it wants to stop using animals in chemical safety testing. Its goal will need a timeline and a serious funding commitment.

24.
arXiv (CS.LG) 2026-06-12

ShapeBench: A Scalable Benchmark and Diagnostic Suite for Standardized Evaluation in Aerodynamic Shape Optimization

arXiv:2605.20763v2 Announce Type: replace Abstract: Rapid progress in aerodynamic shape optimization (ASO) has outpaced currently-available standardized evaluation frameworks. Fair comparison requires a unified benchmark spanning diverse shape classes, objective formulations, and matched-budget state-of-the-art baselines. We introduce ShapeBench, an open-source ASO benchmark with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each ShapeBench task includes a validated surrogate for fast search; when feasible, a high-fidelity Computational Fluid Dynamics (CFD) pipeline for final verification is available, enabling systematic fidelity-gap analysis. ShapeBench provides a reproducible protocol with well-configured baselines to compare fairly using a consistent budget metric, allowing for comparison among both classical and LLM-driven methods, including general-purpose optimizers and a new domain-specialized evolutionary LLM baseline, ShapeEvolve. Results on ShapeBench demonstrate substantial variance in optimizer rankings across shape categories and problem formulations, with mean pairwise Spearman $\rho = 0.013$, so single-task conclusions do not reliably generalize across problem classes. The benchmark is also far from saturation; classical methods are rarely applicable across all shape categories and tasks, further highlighting the need for more general-purpose approaches.

25.
arXiv (CS.AI) 2026-06-17

Kolmogorov Regression for Robust Diffusion Policies

作者:

arXiv:2606.18186v1 Announce Type: cross Abstract: Finite-dimensional (FD) diffusion policies exhibit temporal drift owing to discretization artifacts that degrade long-horizon performance (when deployed on physical systems). We introduce a backward Kolmogorov equation that lifts diffusion policies to a Cameron-Martin space – a subset of the Hilbert space. Essentially, replacing stochastic score matching with a deterministic boundary-value PDE problem. Our core innovation thrives on Gaussian measure theory whereupon the diffusion noise covariance operator is realized from a colored noise distribution which prescribes a notion of regularity on samples from the model at inference time. We train the diffusion model with a derived precision-weighted Cameron- Martin loss and a Kolmogorov residual is introduced as a PDE diagnostic during inference. These substitutions yield (i) convergence guarantees where the bound's constants depend on the effective rank of the kernel rather than action dimension, (ii) improved trajectory regularity via spectral weighting, and (iii) a deterministic failure detector without reward signals. Validation across two application domains demonstrates substantial improvements: on the PushT manipulation benchmark, the Cameron-Martin loss achieves a 17% improvement in maximum episode reward (0.95 vs. 0.78 for MSE) and 67.6% reduction in inter-step drifts during inference via the introduced residual magnitude. Similarly, on a 6-station manufacturing line with constant work-in-process (CONWIP) flow control, we achieve 28.4% lower RMSE than classical LSTM baselines; a high starvation-event recall (1.0 in test cycles), and effective bottleneck identification (Precision@1 = 1.0 in test set, 13x signal-to-noise ratio). We then certify the dispatch policies with Hamilton-Jacobi reachability theory which reduces deadlock events by 96% compared to uncontrolled dispatch over 100 simulated runs (351 events prevented).