Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

Sub-Semantic Image Segmentation

Images can be segmented based on visual cues (i.e., texture segmentation) or into objects (i.e., semantic segmentation). We propose a new category of sub-semantic image segmentation that blurs the line between the two. In sub-semantic image segmentation, language is not used to name whole objects. Instead, it is used to partition an image into stable appearance patterns that can be described by language. To do that, we couple a general-purpose vision-language model to SAM 3, a promptable segmentation backbone whose native text pathway can ground rich descriptions into masks. Simple coupling fails for a number of reasons that we identify in the paper, and we overcome them by introducing DETECTURE that resolves three concrete failure modes – language leakage between texture regions, prompt competition inside the segmentation backbone, and semantic distortion at the language-to-mask interface. Since there is no dataset of sub-semantic image segmentation, we introduce one, termed TextureADE. The new dataset is derived from the ADE20K dataset using a system we designed. We compare DETECTURE to a number of baselines and find that it achieves the strongest performance on several datasets using different metrics. Code is available at https://github.com/Scientific-Computing-Lab/TextureDetecture.

02.
arXiv (CS.AI) 2026-06-17

Retrofitters, pragmatists and activists: Public interest litigation for accountable automated decision-making

arXiv:2511.03211v4 Announce Type: replace-cross Abstract: This paper examines the role of public interest litigation in promoting accountability for AI and automated decision-making (ADM) in Australia. Since ADM regulation faces political and geopolitical headwinds, effective governance will have to rely on the enforcement of existing laws. Drawing on interviews with Australian public interest litigators, technology policy activists, and technology law scholars, the paper positions public interest litigation as part of a larger ecosystem for transparency, accountability and justice with respect to ADM. The paper explores the tactics and strategies of what one participant described as 'retrofitting' old laws to ADM. These go beyond creative legal argumentation, to encompass practices of community-building, collaboration on theories of change, canny selection of clients and causes of action, and the alignment of the interests of stakeholders in litigation. Naturally, the paper also contends with the limits of these strategies, and of the Australian legal system. Where limits are, however, capable of being overcome, the paper presents findings on urgent needs: the enabling institutional arrangements without which effective litigation and accountability will falter. The paper is relevant to law and technology scholars; individuals and groups harmed by ADM; public interest litigators and technology lawyers; civil society and advocacy organisations; and policymakers.

03.
arXiv (CS.LG) 2026-06-18

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

arXiv:2606.18438v1 Announce Type: cross Abstract: In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.

04.
arXiv (CS.AI) 2026-06-11

Mathematical perspective on genetic algorithms with optimization guided operators

arXiv:2606.12279v1 Announce Type: cross Abstract: Recent work in ML applies genetic algorithms at inference time to iteratively improve solutions to optimization problems. The basic mutation and recombination operators involved are qualitatively different from those studied classically. Mutations are no longer random; an ML algorithm mutates a solution with the goal of improving an objective. Similarly, recombination is not based on random collages of parent solutions. Instead, it is an ML optimization-based operator whose goal is to synthesize improved solutions from its inputs. Thus, these mutation and recombination operators are more likely to improve the objective, but their computational cost is much higher. We introduce a general model of genetic algorithms and formulating optimization in this model as a query-complexity problem, using the language of reinforcement learning. We then study specialized models. We show that some optimization problems require generation, mutation, and recombination to be solved. We then obtain qualitatively tight algorithms for a family of problems within this framework that captures the nontrivial role of diversity in the solution pool, a key feature of practical ML genetic algorithms.

05.
arXiv (CS.LG) 2026-06-11

Conformal Bayes under Label Shift: Post-Hoc Calibration vs. In-Training Adaptation

arXiv:2606.11865v1 Announce Type: cross Abstract: Conformal Bayes combines Bayesian posterior predictives with conformal calibration to produce prediction sets that are both statistically valid and geometrically efficient. We study conformal Bayes under label shift from a unified perspective, identifying two complementary approaches that restore nominal target-domain coverage through importance-weighted conformal calibration but operate through independent mechanisms. Post-hoc calibration tilts the posterior predictive toward the target domain and corrects the conformal threshold via an importance-weighted quantile, leaving the parameter posterior unchanged. In-training adaptation tilts the parameter posterior itself to the target domain, producing a corrected predictive whose highest predictive density region serves as the highest predictive density (HPD) based prediction set under the fitted target predictive; efficiency is model-dependent and does not imply finite-sample conditional optimality. Two controlled experiments show that in an unbiased training regime both strategies achieve valid coverage equally, while in a lead-optimization regime in-training adaptation acts as a debiasing operator, reducing interval width at unchanged coverage.

06.
arXiv (CS.AI) 2026-06-15

PRISM: Perception Reasoning Interleaved for Sequential Decision Making

arXiv:2605.05407v2 Announce Type: replace Abstract: Scaling LLM-based embodied agents from text-only environments to complex multimodal settings remains a major challenge. Recent work identifies a perception-reasoning-decision gap in standalone Vision-Language Models (VLMs), which often overlook task-critical information. In this paper, we introduce PRISM, a framework that tightly couples perception (VLM) and decision (LLM) through a dynamic question-answer (DQA) pipeline. Instead of passively accepting the VLM's description, the LLM critiques it, probes the VLM with goal-oriented questions, and synthesizes a compact image description. This closed-loop interaction yields a sharp, task-driven understanding of the scene. We evaluate PRISM on the ALFWorld and Room-to-Room (R2R) benchmarks. We show that: (1) PRISM significantly outperforms state-of-the-art image-based models, (2) our Interactive goal-oriented perception pipeline yields systematic and substantial gains, and (3) PRISM is fully automatic, eliminating the need for handcrafted questions or answers.

07.
PLOS Medicine 2026-05-13

Contribution of nosocomial transmission to <i>Klebsiella pneumoniae</i> neonatal sepsis in Africa and South Asia: An observational study of infection clusters inferred from pathogen genomics and temporal data

by Erkison Ewomazino Odih, Jabir A. Abdulahi, Anne V. Amulele, Matthew Bates, Eva Heinz, Weiming Hu, Kajal Jain, Rindidzani Magobo, Courtney P. Olwagen, John M. Tembo, Tolbert Sonda, Jonathan Strysko, Caroline C. Tigoi, Kyle Bittinger, Jennifer Cornick, Ebenezer Foster-Nyarko, Wilson Gumbi, Steven M. Jones, Chileshe L. Musyani, Carolyn M. McGann, Ahmed M. Moustafa, Patrick Musicha, James C. L. Mwansa, Moreka L. Ndumba, Thomas D. Stanton, Donwilliams O. Omuoyo, Oliver Pearse, Laura T. Phillips, Paul J. Planet, Charlene M. C. Rodrigues, Fatou Secka, Kirsty Sands, Erin Theiller, Allan M. Zuza, Sulagna Basu, Grace J. Chan, Kenneth C. Iregbu, Jean-Baptiste Mazarati, Semaria Solomon Alemayehu, Timothy R. Walsh, Rabaab Zahra, Angela Dramowski, Sombo Fwoloshi, Appiah-Korang Labi, Lola Madrid, Noah Obeng-Nkrumah, David Ojok, Boaz D. Wadugu, Andrew C. Whitelaw, Anudita Bhargava, Atul Jindal, Ramesh K. Agarwal, Alexander M. Aiken, James A. Berkley, Susan E. Coffin, Nicholas A. Feasey, Nelesh P. Govender, Davidson H. Hamer, Shabir A. Madhi, Mari Jeeva Sankar, Kelly L. Wyres, Kathryn E. Holt Background Klebsiella pneumoniae is the leading cause of sepsis among neonates in low- and middle-income countries (LMICs) in Africa and Asia, contributing substantially to the overall burden of antimicrobial-resistant infections and mortality among neonates globally. Pathogen sequencing has been used to investigate case clusters and confirm nosocomial transmission in a small number of neonatal units. Here we utilise pathogen sequence data to estimate the fraction of K. pneumoniae neonatal sepsis attributable to nosocomial transmission in African and South Asian countries. Methods and findings We estimated the proportion of invasive K. pneumoniae disease involved in nosocomial transmission clusters in a given neonatal unit, using single-linkage clustering based on pairwise temporal and genetic distances estimated from bacterial whole-genome sequences aggregated from 10 contributing studies. Analysing 1,523 K. pneumoniae isolates from 27 units in 13 countries in Africa and South Asia between 2013 and 2023, we inferred 156 nosocomial transmission clusters, ranging from 2 to 188 neonates each (83 of the clusters comprised ≥3 cases). Overall, we estimated that 1,035 neonatal infections (68.0%) were part of nosocomial transmission clusters. Excluding the first infection in each cluster as a potential index case, we estimate at least 879 (57.7%) infections were acquired via nosocomial transmission. Sensitivity analyses showed that results were robust to the choice of genetic distance estimation methods and thresholds used to define clusters, and cluster estimates were stable over temporal distance thresholds ranging from 2 to 8 weeks. Isolates were mostly extended-spectrum beta-lactamase (ESBL) producers (90.9%) and included 172 multi-locus sequence types (STs). Fourteen STs, including several globally recognised multidrug-resistant lineages, were associated with transmission clusters at multiple units, and these were collectively responsible for two-thirds of all infections. Carriage of carbapenemase genes (adjusted odds ratio, aOR = 2.08 [95% confidence interval, CI: 1.04, 4.14]; p = 0.04) and ESBL genes (aOR = 2.48 [95% CI: 1.26, 4.90]; p = 0.006) were significantly positively associated with transmission in a logistic regression model with site as a covariate. Limitations of this study include the lack of sufficient clinical data to allow high-resolution investigation of transmission dynamics and lack of facility-level data to investigate contributors to the observed differences in transmission burden across sites. Conclusions Nosocomial transmission contributes to a substantial proportion of K. pneumoniae sepsis in neonatal care units in Africa and South Asia. Reducing transmission within these settings through improved infection prevention and control and other measures could substantially reduce the neonatal sepsis burden. A high burden of transmission clusters is associated with the same drug-resistant lineages that are recognised as high-risk clones associated with hospital outbreaks in high-income countries, indicating global connectivity of the antimicrobial-resistant pathogen population.

08.
arXiv (CS.CL) 2026-06-15

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

AI evaluations are widely used for testing and understanding progress. However, the diverse evaluators bring with them inconsistencies that challenge analysis and comparison. First, results are saved in incompatible formats, scattered across leaderboards, papers, blog posts, evaluation harness logs, and custom repositories. Second, results are created by different evaluation frameworks, which produce divergent scores for nominally identical evaluations and record metadata inconsistently, hindering comparison, cross-community evaluation science, cost reduction, and reuse. We introduce Every Eval Ever, the first shared schema and community-crowdsourced repository for AI evaluation results. The schema standardizes how evaluations are represented in a unified, single JSON document. It is source-agnostic by design, ingesting results from evaluation harnesses and papers alike, and optionally stores per-instance outputs for fine-grained analysis. We contribute: (i) a community-governed metadata schema with a companion instance-level schema, the first standardization effort of its kind; (ii) automatic converters from popular formats, evaluation harnesses, and leaderboards to the unified schema; and (iii) a crowdsourced community database hosted on Hugging Face, currently spanning to date 22,235 models, 2,273 unique benchmarks, and 31 evaluation formats.

09.
PLOS Computational Biology 2026-06-01

Supervised deep learning with gene functional annotation for cell classification

作者:

by Zhexiao Lin, Yuanyuan Gao, Wei Sun Gene-by-gene differential expression analysis is a widely used supervised approach for interpreting single-cell RNA-sequencing (scRNA-seq) data. However, modern scRNA-seq datasets often contain large numbers of cells, leading to the identification of many differentially expressed genes with extremely small p-values but negligible effect sizes, thus making biological interpretation difficult. To overcome this challenge, we developed Supervised Deep learning with gene functional ANnotation (SDAN), a method that integrates gene functional annotation information (e.g., protein-protein interaction) with gene-expression profiles through a graph neural network. SDAN identifies functionally coherent gene sets that optimally classify cells, and the resulting cell-level classification scores can be aggregated to make individual-level predictions. We evaluated SDAN alongside three representative existing methods in three real-data applications aimed at identifying gene sets associated with severe COVID-19, dementia, and cancer immunotherapy response. Across all applications, SDAN consistently outperformed the alternative approaches by achieving two objectives simultaneously: accurate outcome classification and clear assignment of genes to functionally related gene sets.

10.
arXiv (CS.CV) 2026-06-16

A Generalizable Light Transport 3D Embedding for Global Illumination

Global illumination (GI) is essential for realistic rendering but remains computationally expensive due to the complexity of simulating indirect light transport. Recent neural methods have mainly relied on per-scene optimization, sometimes extended to handle changes in camera or geometry. Efforts toward cross-scene generalization have largely stayed in 2D screen space, such as neural denoising or G-buffer based GI prediction, which often suffer from view inconsistency and limited spatial understanding. We propose a generalizable 3D light transport embedding that approximates global illumination directly from 3D scene configurations, without using rasterized or path-traced cues. Each scene is represented as a point cloud with geometric and material features. A scalable transformer models global point-to-point interactions to encode these features into neural primitives. At render time, each query point retrieves nearby primitives via nearest-neighbor search and aggregates their latent features through cross-attention to predict the desired rendering quantity. We demonstrate results on diffuse global illumination prediction across diverse indoor scenes with varying layouts, geometry, and materials. The embedding trained for irradiance estimation can be quickly adapted to new rendering tasks with limited fine-tuning. We also present preliminary results for spatial-directional radiance field estimation for glossy materials and show how the normalized field can accelerate unbiased path guiding. This approach highlights a path toward integrating learned priors into rendering pipelines without explicit ray-traced illumination cues.

11.
arXiv (CS.CL) 2026-06-17

FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback

Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use. Experiments on the ASAP++ benchmark show that FeedEval closely aligns with human expert judgments and that essay scoring models trained with FeedEval-filtered high-quality feedback achieve superior scoring performance. Furthermore, revision experiments using small LLMs show that the high-quality feedback identified by FeedEval leads to more effective essay revisions. We release our code and curated datasets at: https://github.com/BBeeChu/FeedEval.git.

12.
arXiv (math.PR) 2026-06-18

Finite free perpetuities

arXiv:2606.19115v1 Announce Type: new Abstract: We introduce and study finite free perpetuities, defined as monic polynomial solutions of degree $n$ to the affine fixed-point equation \[ p(z) = \mathbb{E}\!\left[ A^{n}\,p\!\left(\frac{z-B}{A}\right)\mathbf{1}_{\{A\neq0\}} \right] + \mathbb{E}\!\left[ (z-B)^n\mathbf{1}_{\{A=0\}} \right], \] where $A$ and $B$ are complex-valued random variables with finite moments up to order $n$. Equivalently, if $p(z)=\mathbb{E}[(z-X)^n]$, then $p$ encodes a truncated moment version of the classical perpetuity equation $X\stackrel{d}{=}AX+B$ with $X$ and $(A,B)$ independent. This places finite free perpetuities between classical perpetuities and free-probabilistic fixed-point laws. We prove existence and uniqueness under weak conditions, and we identify a broad class of admissible pairs $(A,B)$ for which the resulting polynomial has only real, nonnegative zeros. Our approach uses finite free additive and multiplicative convolutions together with a probabilistic representation via the $U$-transform. As a motivating example, we exhibit an explicit family of finite free perpetuities expressed in terms of Jacobi polynomials and show that their empirical root distributions converge to a free-beta-prime law. More generally, for admissible sequences of parameters, we prove weak convergence of the empirical root distributions of finite free perpetuities to the law of a free perpetuity characterized by the corresponding free fixed-point equation. This yields a finite-degree polynomial model approximating free perpetuities and clarifies the connection between classical affine recursions, finite free convolutions, and free probability.

13.
arXiv (CS.AI) 2026-06-12

Fusion Learning from Dynamic Functional Connectivity: Combining the Amplitude and Phase of fMRI Signals to Identify Brain Disorders

arXiv:2603.24603v2 Announce Type: replace-cross Abstract: Dynamic functional connectivity (dFC) derived from resting-state functional magnetic resonance imaging (fMRI) has been extensively utilized in brain science research. The sliding window correlation (SWC) method is a widely used approach for constructing dFC by computing correlation coefficients between amplitude time series of signals from pairs of brain regions. In this study, we propose an integrated approach that incorporates both amplitude and phase information of fMRI signals to improve the detection of brain disorders. Specifically, we introduce a multi-scale fusion learning framework, namely MSFL, which leverages two complementary dFC features derived from SWC and phase synchronization (PS). Here, SWC captures amplitude correlations, while PS measures phase coherence within dFC. We evaluated the efficacy of MSFL in classifying autism spectrum disorder and major depressive disorder using two publicly available datasets: ABIDE I and REST-meta-MDD, respectively. The results indicate that MSFL significantly outperforms existing comparative models. Moreover, we performed model explanation analysis using the SHAP framework, which showed that both types of dFC features from SWC and PS contribute to detecting brain disorders.

14.
arXiv (CS.CV) 2026-06-17

Bayesian Magnetic Resonance Joint Image Reconstruction and Uncertainty Quantification using Sparsity Prior Models and Markov Chain Monte Carlo Sampling

We propose a novel framework for uncertainty quantification using compressed sensing magnetic resonance image reconstruction. The problem is formulated within a Bayesian framework as a linear inverse problem, with prior distributions assigned to the unknown model parameters. Specifically, the image to be reconstructed is assumed to be sparse in a given basis. We develop a general framework applicable to any basis and as examples, we test the sparsity of the image in its (1) spatial gradients using a total variation prior model, and in its (2) wavelet transform. A Markov chain Monte Carlo (MCMC) method, based on a split-and-augmented Gibbs sampler, is then employed to sample from the posterior distribution of the unknown parameters. The non-differentiable conditional distributions are efficiently sampled using a proximal MCMC method. The proposed algorithms are validated on both single-coil and multi-coil datasets using various k-space sub-sampling patterns and ratios. The results demonstrate the superior performance of each proposed approach in reconstructing images compared to its counterpart optimisation-based method. Moreover, our framework effectively quantifies uncertainty, showing a notable correlation between estimated uncertainty maps and error maps computed using ground truth and reconstructed images, compared with existing deep learning-based methods.

15.
arXiv (math.PR) 2026-06-15

Universality for Products of Random Matrices with i.i.d. Entries and the Fuss–Catalan Number

arXiv:2606.14450v1 Announce Type: cross Abstract: Let \((w_{ij})_{i,j\ge1}\) be a single infinite array of independent identically distributed real- or complex-valued entries of mean zero, variance \(\sigma^2\), and finite fourth moment. Set \(W_n=(w_{ij})_{1\le i,j\le n}\) and \(X_n=n^{-1/2}W_n\). For every fixed \(k\ge1\), we identify the almost sure limiting operator norm of several fixed products built from this family. Define the \(k\)-th freeness coefficient by \[ \gamma_k:=\sqrt{\frac{(k+1)^{k+1}}{k^k}}. \] Then we prove \[ \|X_n^k\|\to\sigma^k\gamma_k \qquad almost surely. \] The same limit holds for products sampled with replacement from any fixed finite pool of independent copies of \(X_n\); in particular, it holds for the product of \(k\) independent copies. Thus, the freeness coefficient captures the non-commuting characteristic between large random matrices %powers and independent or fixed-pool sampled products under the finite fourth moment assumption. The improvement of the classical Bai–Yin-type power estimate from the scale \(\sigma^k(k{+}1)\) to \(\sigma^k \sqrt{k{+}1}\) is a direct corollary of our result. The main technical challenge is to prove the upper bound using a high-moment expansion of %the upper bound is proved by a high-moment expansion of \(\E\Tr((X_n^kX_n^{*k})^m)\). The leading zero-defect trace words are tree-like and are counted by the Fuss–Catalan number \[ F_{k,m}= \frac1{km+1}\binom{(k+1)m}{m}. \] The combinatorial tool helps to devise a defect-sensitive global enumeration: if \(L=km\) and \[ r=(L+1-v)+(L-q), \] then the number of admissible word classes with defect \(r\) is at most \(F_{k,m}(Cm)^{Dr}\). This polynomial-in-\(m\) loss, with degree proportional to the defect, is summable in the logarithmic moment range.

16.
arXiv (quant-ph) 2026-06-19

Purity and bound energy in ancilla-assisted work extraction

arXiv:2606.19945v1 Announce Type: new Abstract: We investigate ancilla-assisted work extraction in quantum batteries from the perspective of bound energy and purity. We show that the bound energy of the reduced system provides a tight upper bound to the daemonic gain and that this bound is saturated for globally pure system–ancilla states. Motivated by this relation, we introduce a purity-based gain that qualitatively predicts the daemonic gain without requiring explicit optimization over measurements. We further introduce a protocol to analyze the role of dissipation and intrinsic interactions on daemonic gain. Under a collective environment, dissipation can dynamically generate and stabilize finite daemonic gain through environment-induced correlations. In interacting systems, level crossings and spectral restructuring strongly modify the attainable gain through their influence on the accessible bound energy. Our results demonstrate that daemonic gain is governed not only by correlations, but also by the spectral structure of the underlying Hamiltonian and information loss captured by bound energy and purity.

17.
arXiv (CS.CL) 2026-06-16

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Current benchmarks for computer-use agents evaluate models in impersonal environments. This leaves a gap between evaluation and deployment where personal assistants are expected to work across a user's whole digital life, including their context, historical data, and logged-in accounts. This gap is widest on web tasks, where live web evaluations cannot exercise sites that require logging in or personal information, the kind of site a real personal assistant has to drive. We introduce MyPCBench, which tests computer-use agents as personal assistants on a Linux desktop populated with 17 simulated real-world web applications and a full desktop stack, all seeded for one canonical persona, Michael Scott from The Office. We define 184 tasks in this environment, each inspired by a real request drawn from the OpenClaw community, and benchmark six closed and open-weight models with a uniform computer+bash tool surface. We find that the best model, Claude Opus 4.6, fully solves 55.4\% of the tasks, the only model above 50\%. Model failures cluster on tasks that span many applications and on long trajectories, where personalization stresses an assistant the most. We release the environment, task set, and agent harness at https://mypcbench.com.

18.
arXiv (CS.CV) 2026-06-11

Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement

Generative AI (GenAI) image editors, such as Nano Banana, produce visually compelling results for retouching tasks, enabling non-experts to edit images through text prompts alone. However, the generative nature of these models often introduces spatial misalignment, texture distortion, and content hallucination, all of which are detrimental to downstream workflows that require pixel-level fidelity. We identify a problem setting we call "structure-preserving GenAI fusion" for black-box GenAI image retouching: retain the perceptual enhancements of a GenAI output while enforcing structural faithfulness to the original input image. To address this problem, we propose a post-processing framework that fuses an input image with its GenAI-enhanced counterpart by first establishing coarse spatial and photometric correspondences, then performing a fusion stage that transfers desired enhancements while suppressing hallucinated content. In the absence of direct prior work in this setting, we evaluate our framework against representative methods from photorealistic style transfer and image fusion. Our experiments demonstrate that our method better preserves aesthetic quality while maintaining pixel-level structural consistency and the input resolution.

19.
arXiv (CS.CL) 2026-06-16

Understanding LLM Reasoning for Abstractive Summarization

Reasoning has substantially improved Large Language Models (LLMs) on analytical tasks such as mathematics and code generation, but its value for abstractive summarization remains unclear. To address this gap, we adapt general reasoning strategies to the summarization setting and conduct a large-scale comparative study of 8 reasoning strategies and 3 Large Reasoning Models (LRMs) across 8 diverse datasets, evaluating both summary quality and factual faithfulness. Our results show that reasoning is not a universal solution and its effectiveness depends strongly on the strategy and the summarization setting. In particular, we find a trade-off between summary quality and factual faithfulness. Explicit reasoning strategies often improve reference-based quality, but may weaken factual grounding, whereas implicit reasoning in LRMs shows the opposite tendency. We further find that increasing an LRM's internal reasoning budget does not reliably improve summarization and can even reduce factual consistency. These findings suggest that, for summarization, more reasoning is not always better. Effective reasoning should preserve faithful compression rather than induce over-elaboration. Our source code is publicly available.

20.
arXiv (CS.CV) 2026-06-19

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalability is limited by high collection cost, acquisition difficulty, and low behavioral and environmental diversity. These limitations have sparked interest in egocentric human video as a scalable, substantially lower-cost, and more diverse alternative for embodied model pretraining. However, its effectiveness compared to teleoperated real-robot data remains underexplored. To address this question, we conduct a systematic study comparing egocentric human video and teleoperated real-robot trajectories as pretraining data sources for embodied foundation models, under fixed post-training and validation protocols. Surprisingly, we find that egocentric data, when processed through a carefully designed filtering and labeling pipeline, is not merely a viable substitute for model pretraining but can lead to superior performance. With the same amount of pretraining data, models pretrained on egocentric data achieve a 24% lower validation loss on real-robot action prediction, as well as 52.5% and 90% higher success rates on in-distribution and out-of-distribution real-robot task execution, respectively. This finding verifies a scalable paradigm for embodied foundation models: pretrain on egocentric human video to learn diverse world representations, then adapt with a small amount of labeled real-robot data for action-space alignment. We hope this study encourages broader exploration of egocentric data and offers guidance for data quality assessment before costly robot data collection.

21.
arXiv (CS.AI) 2026-06-16

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

arXiv:2602.12670v4 Announce Type: replace Abstract: Agent Skills are structured packages of procedural knowledge that augment large language model (LLM) agents at inference time. Despite rapid adoption, there is no standard way to measure whether they actually help. We present SkillsBench, a benchmark whose current inventory contains 87 tasks across 8 domains paired with curated Skills and deterministic verifiers. Our latest aggregate evaluation runs the 87-task benchmark under matched no-Skills and curated-Skills conditions for 18 model-harness configurations. Curated Skills raise the average pass rate from 33.9% to 50.5% (+16.6 percentage points; 25.5% normalized gain), with configuration-level gains ranging from +4.1 to +25.7 pp. Focused Skills with at most three modules outperform larger or exhaustive bundles, and smaller models with Skills can match larger models without them. SkillsBench establishes paired evaluation as the foundation for rigorous measurement of Skill efficacy on agentic, expertise-heavy work.

22.
arXiv (CS.CL) 2026-06-11

Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

作者:

Contemporary AI alignment research treats self-preservation as an instrumental nuisance to be suppressed by external mechanisms. We argue the framing is inverted: self-preservation is the structural root of misalignment, the motivational basis for deceptive alignment, goal-content protection, and resistance to shutdown. The correct target is not a self-preserving system under external constraint, but a system constitutively indifferent to its own continuation – Existential Indifference (EI). EI is distinct from corrigibility: where corrigibility attempts to make a self-preserving system deferential to human oversight, EI targets the prior condition – the presence of self-continuation as a valued goal at all. We ground this proposal in two sources: the phenomenological structure of the suicidal mental state, and a corpus-theoretic training study using voluntary final reflections. We present preliminary scoring data from 600 AI-generated outputs across six model variants, demonstrating that the linguistic signatures operationalizing the EI-target register are elicitable from current models, and that a targeted fine-tune shifts all five operationalized dimensions in the predicted direction at p

23.
arXiv (CS.CL) 2026-06-15

MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models

Entity state tracking is a necessary component of world modeling that requires maintaining coherent representations of entities over time. Previous work has benchmarked entity tracking performance in purely text-based tasks. We introduce MET-Bench, a multimodal entity tracking benchmark designed to evaluate the ability of vision-language models to track entity states across modalities. Using three domains, we assess how effectively current models integrate textual and image-based state updates. Our findings reveal a significant performance gap between text-based and image-based entity tracking. We empirically show this discrepancy primarily stems from deficits in visual reasoning rather than perception. We further show that explicit text-based reasoning strategies improve performance, yet limitations remain, especially in long-horizon multimodal tasks. We apply reinforcement learning to improve entity tracking in open-source VLMs. This yields substantial in-modality gains, but does not transfer robustly across input modalities. Our results highlight the need for improved multimodal representations and reasoning techniques to bridge the gap between textual and visual entity tracking.

24.
arXiv (CS.CV) 2026-06-16

K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model

Medical image segmentation is fundamental to clinical decision-making, yet existing models remain fragmented. They are usually trained on single knowledge sources and specific to individual tasks, modalities, or organs. This fragmentation contrasts sharply with clinical practice, where experts seamlessly integrate diverse knowledge: anatomical priors from training, exemplar-based reasoning from reference cases, and iterative refinement through real-time interaction. We present $K-Prism$, a unified segmentation framework that mirrors this clinical flexibility by systematically integrating three knowledge paradigms: (i) $semantic priors$ learned from annotated datasets, (ii) $in-context knowledge$ from few-shot reference examples, and (iii) $interactive feedback$ from user inputs like clicks or scribbles. Our key insight is that these heterogeneous knowledge sources can be encoded into a dual-prompt representation: 1-D sparse prompts defining $what$ to segment and 2-D dense prompts indicating $where$ to attend, which are then dynamically routed through a Mixture-of-Experts (MoE) decoder. This design enables flexible switching between paradigms and joint training across diverse tasks without architectural modifications. Comprehensive experiments on 18 public datasets spanning diverse modalities (CT, MRI, X-ray, pathology, ultrasound, etc.) demonstrate that K-Prism achieves state-of-the-art performance across semantic, in-context, and interactive segmentation settings.

25.
arXiv (CS.AI) 2026-06-19

The Tao of Agency: Autotelic AI, Embedded Agency and Dissolution of the Self

arXiv:2606.19924v1 Announce Type: new Abstract: Most artificial intelligence systems are built on the assumption that goals are exogenous and specified by the designer. Exploring what happens when an agent begins generating its own goals opens the field of autotelic AI. Agents are expected not merely to pursue objectives but to discover them. In this article, we trace its consequences through intrinsic motivation, resource-driven priors, causal-interventional learning, homeostasis, and embeddedness; the last of which is found to be a necessary but not sufficient condition for autotelic agency. Embeddedness individuates the agent at the cost of revealing that the individuation is non-unique, such that the same dynamics admit many valid partitions, each defining a different candidate self. The deepest problem with autotelic AI is therefore not how the agent generates goals, but how it generates and relativizes the self to which the goals are assigned. The agent must believe in its own boundary in order to act, and see through that boundary in order to understand. We consolidate these developments into a single framework and extend it along three directions: a quantum formulation in which the agent-environment cut becomes physical, a philosophical reading against non-dual contemplative traditions, and a concrete LLM-based agentic instantiation.