Academic Intelligence · Curated Daily

Explore the Frontiers of Global Scholarship

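AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.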

01.
arXiv (quant-ph) 2026-06-16

Quantum Energy Teleportation under Equilibrium and Nonequilibrium Environments

arXiv:2511.01518v3 Announce Type: replace Abstract: Quantum energy teleportation (QET), implemented via local operations and classical communication, enables carrier-free energy transfer by exploiting quantum resources. While QET has been extensively studied theoretically and validated experimentally in various quantum platforms, enhancing energy output for mixed initial states, as the system inevitably interacts with environments, remains a significant challenge. In this work, we study QET performance in a two-qubit system coupled to equilibrium or nonequilibrium reservoirs. We derive an analytical expression for the energy output in terms of the system Hamiltonian eigenstates, enabling analysis of energy output for mixed states. Using the Redfield master equation, we systematically examine the effects of qubit detuning, nonequilibrium temperature difference, and nonequilibrium chemical potential difference on the energy output. We find that the energy output for mixed states often follows that of the eigenstate with the highest population, and that nonequilibrium environments can enhance the energy output in certain parameter regimes.

02.
arXiv (CS.CV) 2026-06-16

Sex-based Network-Specific Differences in Connectomes: A Krakencoder-Based Analysis

This study examines how deficiencies in one brain connectome modality propagate to the other, using the Krakencoder as a simulation framework. Structural and functional connectomes from 702 healthy participants in the Human Connectome Project were analyzed, with the impact of each of the Yeo-7 functional networks assessed separately. Seven scenarios were considered, each involving the removal of a single network while the remaining networks were preserved. The resulting perturbations in cross-modal predictions were quantified using three complementary metrics: KL divergence on eigenvalue spectra, Frobenius norm, and Wasserstein distance. In addition, the persistence of sex-specific information within the predicted connectomes was evaluated. Across all metrics and both prediction directions, the Default Mode Network produced the largest perturbations, whereas the Somatomotor network yielded the smallest. Sex differences in network-level perturbation signatures were subtle, with the best result being an accuracy of 66.09% from connectomes predicted under network-removal conditions. In contrast, connectomes predicted from intact inputs achieved substantially higher sex classification accuracy, reaching up to 84.76%. These findings confirm that full predicted connectomes retain considerably more sex-discriminative information than perturbation-derived signatures alone.
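The three perturbation metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the connectomes are assumed to be symmetric matrices, and normalizing absolute eigenvalues into a probability vector before taking the KL divergence is an assumption made here so that the divergence is well defined.

```python
import numpy as np

def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two matrices."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float), ord="fro"))

def spectrum_distribution(m, eps=1e-12):
    """Absolute eigenvalues of a symmetric matrix, normalized to sum to 1.
    (Assumption: this normalization is one possible choice, not the paper's.)"""
    eig = np.abs(np.linalg.eigvalsh(np.asarray(m, float)))
    eig = np.sort(eig)[::-1] + eps  # eps avoids log(0) and division by zero
    return eig / eig.sum()

def spectral_kl(a, b):
    """KL divergence between the normalized eigenvalue spectra of a and b."""
    p = spectrum_distribution(a)
    q = spectrum_distribution(b)
    return float(np.sum(p * np.log(p / q)))

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size empirical samples:
    the mean absolute difference of their sorted values."""
    x = np.sort(np.ravel(np.asarray(x, float)))
    y = np.sort(np.ravel(np.asarray(y, float)))
    if x.size != y.size:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))

# Toy usage: a random symmetric "connectome" and a copy with one block of
# nodes (a stand-in for a removed network) zeroed out.
rng = np.random.default_rng(0)
raw = rng.normal(size=(8, 8))
intact = (raw + raw.T) / 2
perturbed = intact.copy()
perturbed[:2, :] = 0.0
perturbed[:, :2] = 0.0
scores = {
    "frobenius": frobenius_distance(intact, perturbed),
    "spectral_kl": spectral_kl(intact, perturbed),
    "wasserstein": wasserstein_1d(intact, perturbed),
}
```

All three scores are zero when the two matrices are identical and grow as the perturbed matrix departs from the intact one, which is what makes them usable as perturbation magnitudes.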

03.
arXiv (CS.LG) 2026-06-19

Flow Map Denoisers: Traversing the Distortion-Perception Plane for Inverse Problems

arXiv:2606.19802v1 Announce Type: new Abstract: Image restoration faces a fundamental tradeoff: methods that minimize error produce blurry reconstructions, while those that maximize perceptual quality yield sharp but less faithful images. Existing approaches either commit to a single operating point on this distortion perception (DP) frontier or require paired-data supervision, auxiliary models, or hyperparameter tuning of the sampler to access different points. We show that flow map models, a recent extension of flow matching for few-step sampling that learns an average field, implicitly define a one-parameter family of denoisers that continuously spans the DP frontier. The lookahead parameter t acts as a control knob between the MMSE and perceptual regimes. For Gaussian targets, we prove that varying t exactly recovers the optimal DP frontier; for natural images, we observe similar behavior empirically. Within a Plug-and-Play solver, the same mechanism extends to general inverse problems, where it controls a tradeoff between perceptual alignment and data consistency. Despite the lack of exact optimality guarantees in this setting, a single trained flow map spans the DP tradeoff, matching or exceeding specialized baselines at both extremes. Extensive experiments on CelebA ($128\times 128$) and AFHQ ($256\times 256$) across several linear and nonlinear inverse tasks validate our findings.

04.
arXiv (CS.CV) 2026-06-17

Bayesian Magnetic Resonance Joint Image Reconstruction and Uncertainty Quantification using Sparsity Prior Models and Markov Chain Monte Carlo Sampling

We propose a novel framework for uncertainty quantification using compressed sensing magnetic resonance image reconstruction. The problem is formulated within a Bayesian framework as a linear inverse problem, with prior distributions assigned to the unknown model parameters. Specifically, the image to be reconstructed is assumed to be sparse in a given basis. We develop a general framework applicable to any basis and as examples, we test the sparsity of the image in its (1) spatial gradients using a total variation prior model, and in its (2) wavelet transform. A Markov chain Monte Carlo (MCMC) method, based on a split-and-augmented Gibbs sampler, is then employed to sample from the posterior distribution of the unknown parameters. The non-differentiable conditional distributions are efficiently sampled using a proximal MCMC method. The proposed algorithms are validated on both single-coil and multi-coil datasets using various k-space sub-sampling patterns and ratios. The results demonstrate the superior performance of each proposed approach in reconstructing images compared to its counterpart optimisation-based method. Moreover, our framework effectively quantifies uncertainty, showing a notable correlation between estimated uncertainty maps and error maps computed using ground truth and reconstructed images, compared with existing deep learning-based methods.

05.
medRxiv (Medicine) 2026-06-22

The Unsteady Return of Command-Following: Recovery and Instability of Bedside Motor Command-Following After Acute Brain Injury

Background/Objective: Following a verbal command marks the bedside transition from unresponsiveness to overt recovery of consciousness after acute brain injury. Its timing across phenotypes, stability once present, and dependence on sedation are uncharacterized at scale. Methods: Retrospective cohort of adults with acute brain injury, first intensive care unit stay, MIMIC-IV. Command-following was the Glasgow Coma Scale motor response "Obeys Commands." Among patients not following commands at admission, cumulative incidence was estimated with death or hospice and discharge without recovery as competing events. Instability was quantified as transient first recovery and threshold crossings; examinations were tagged for concurrent sedation. Principal findings were externally validated in the multicenter eICU Collaborative Research Database. Results: Of 13,900 brain-injured patients with three or more motor examinations, 5,498 (39.6%) were not following commands at admission. The cumulative incidence of first command-following was 43.5% by 24 hours and 65.0% by 14 days, ranging at 14 days from 36.9% in anoxic injury to 77.2% in ischemic stroke (anoxic versus ischemic stroke at 72 hours, difference 0.41; adjusted P = .002). Among 3,573 patients who recovered, the first recovery was transient in 22.2%, and 62.4% crossed the threshold repeatedly. Non-following was strongly associated with sedation, consistent with an arousal-dependent examination. In eICU, the 14-day incidence was 64.8%, and transient first recovery was 22.7%, closely matching the primary cohort. Conclusions: After acute brain injury, overt bedside command-following returns early but unsteadily, with phenotype-dependent timing, threshold fluctuation, and strong dependence on sedation. A single charted observation is an unreliable index of the underlying state.
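A cumulative incidence estimate with competing events, as described in the Methods, can be sketched with the standard nonparametric (Aalen-Johansen type) estimator. The function name, the event coding (0 = censored, 1 = event of interest, 2 = competing event), and the toy data are assumptions for illustration; the study's actual code is not shown in the abstract.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause under competing risks.

    times  : observed time for each subject
    events : 0 = censored, any other integer = type of event observed
    Returns a list of (time, cumulative incidence) at each distinct time.
    """
    at_risk = len(times)
    event_free = 1.0  # probability of no event of any type before current time
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        here = [e for tt, e in zip(times, events) if tt == t]
        d_cause = sum(1 for e in here if e == cause)
        d_any = sum(1 for e in here if e != 0)
        # Increment: survive event-free up to t, then fail from `cause` at t.
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        at_risk -= len(here)  # events and censored subjects leave the risk set
        curve.append((t, cif))
    return curve

# Hypothetical toy cohort: hours to first command-following (1),
# death or discharge without recovery (2), or censoring (0).
toy_times = [1, 2, 3, 4]
toy_events = [1, 2, 1, 1]
toy_curve = cumulative_incidence(toy_times, toy_events, cause=1)
```

Treating the competing event as censoring and using one minus the Kaplan-Meier estimate would overstate recovery, which is why the competing-risk form is used.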

06.
arXiv (CS.LG) 2026-06-15

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

arXiv:2603.18464v3 Announce Type: replace Abstract: Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models is severely bottlenecked by synchronization barriers and the high cost of environment data acquisition. To overcome these challenges, we propose AcceRL, a distributed asynchronous RL framework that physically isolates environment rollouts, model inference, and gradient updates. By eliminating the cascading long-tail idle bubbles inherent in synchronous systems, AcceRL maximizes hardware utilization and ensures scalable throughput. Furthermore, AcceRL features a modular design that supports the integration of diverse, plug-and-play world models into its distributed pipeline. Extensive experiments demonstrate that the base framework achieves highly competitive performance across all four LIBERO task suites. At the system level, the asynchronous architecture delivers a $2.4\times$ throughput speedup over leading synchronous baselines. Algorithmically, by leveraging a world model pre-trained on 1,000 offline trajectories, AcceRL achieves up to a $200\times$ improvement in online sample efficiency on LIBERO-Spatial, establishing a robust framework that is both sample-efficient and time-efficient for embodied AI. Code is included in the supplementary material and is available at https://github.com/distanceLu/AcceRL.

07.
arXiv (CS.LG) 2026-06-18

Wasserstein Policy Learning for Distributional Outcomes

arXiv:2606.19117v1 Announce Type: cross Abstract: Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N-dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N-dim}(\Pi)$.
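For intuition on the barycenter in the reward, recall that on $\mathbb{R}$ the 2-Wasserstein barycenter has a quantile function equal to the weighted average of the input quantile functions. The sketch below applies this to empirical samples of equal size; the function name, the equal-size restriction, and the choice of the 2-Wasserstein metric are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def barycenter_1d(samples, weights=None):
    """2-Wasserstein barycenter of K one-dimensional empirical distributions.

    samples : array-like of shape (K, n), one row of n draws per distribution
    weights : optional length-K nonnegative weights (default: uniform)
    Returns the n atoms of the barycenter in increasing order.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    k = sorted_samples.shape[0]
    if weights is None:
        w = np.full(k, 1.0 / k)
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    # Sorted rows are empirical quantile functions on a common grid, so the
    # barycenter is their weighted average, order statistic by order statistic.
    return w @ sorted_samples

# Two toy outcome distributions; the barycenter sits halfway between them.
atoms = barycenter_1d([[2.0, 0.0, 1.0], [4.0, 2.0, 3.0]])
utility = float(atoms.mean())  # one possible utility functional: the mean
```

Any scalar functional of `atoms` (a mean, a quantile, a risk measure) could stand in for the utility; the mean is used here only as a placeholder.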

08.
arXiv (CS.LG) 2026-06-11

Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation

arXiv:2606.11891v1 Announce Type: cross Abstract: Multi-objective reinforcement learning for humanoid robots must coordinate locomotion and manipulation within a single policy. A natural design choice is whether to use a single (unified) critic that estimates the combined value of all objectives, or separate (dual) critics with disjoint reward signals. We present a controlled comparison on the Unitree G1 humanoid (23 active DoF) in NVIDIA Isaac Lab, training loco-manipulation policies through a sequential curriculum spanning 13 levels from stationary reaching to walking with variable-orientation targets. In standardized evaluation, dual-critic policies reach targets 3.5$\times$ faster (6.5 vs. 22.6 simulation steps), achieve 2$\times$ higher throughput (14.3 vs. 7.0 validated reaches per 1,000 steps), and attain higher validated reach rates (65.2% vs. 53.8%) compared to the unified-critic policy. Notably, additional anti-gaming reward mechanisms provide no further improvement beyond the architectural change alone (60.9% vs. 65.2%). These results have direct implications for the emerging paradigm of RL fine-tuning of imitation-learned policies: when refining a pre-trained manipulation policy with RL, a unified critic risks suppressing the learned behavior through competing locomotion gradients. These findings demonstrate that critic architecture is a primary - and often overlooked - design choice in multi-objective humanoid RL, with greater impact than reward engineering on reaching efficiency.

09.
bioRxiv (Bioinfo) 2026-06-14

Somatic variant detection in normal tissues from single-cell sequencing data

A crucial advantage of single-cell sequencing (SCS) is its ability to identify somatic variants in individual cells, enabling phylogenetic analysis of cellular populations within bulk tissues. While identifying somatic variants in tumor tissues via SCS has become a common practice, doing so in normal tissues remains challenging due to the rarity of somatic variants in normal cells. To evaluate the feasibility of somatic variant calling from widely available single-nucleus RNA-seq (snRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, we profiled a cell-line mix of six HapMap samples prepared by the SMaHT consortium using 10x Genomics 5' snRNA-seq (12k cells with 36k mean reads per cell) and snATAC-seq (11k cells with 14k median high-quality fragments per cell) for variant calling. PacBio long-read whole genome sequencing (WGS) data (109x) generated from individual cell lines were used as ground truth. Two computational tools, Monopogen and SComatic, were used for somatic variant calling from the SCS data. Monopogen achieved single nucleotide variant (SNV) detection accuracies of 93.30% in the snRNA-seq and 99.64% in the snATAC-seq data, both of which outperformed SComatic (74.35% and 94.29%, respectively). Monopogen also consistently detected somatic SNVs at cellular fractions as low as 0.5% (2.54% in snRNA and 0.81% in snATAC) in individual samples. Notably, snATAC-seq exhibited higher genomic coverage breadth and a larger number of detected variants than snRNA-seq. While the SCS data have lower overall genome coverage than the bulk WGS data, the single-cell level variant resolution allows Monopogen to assign variants to their cells of origin with over 80% accuracy in both RNA and ATAC modalities, thereby facilitating studies of clonal evolution and cell-type-specific mutagenesis. Other methods (DeepVariant, Cellsnp-lite, and Mutect2) were also evaluated for comparison. In conclusion, our study demonstrated the feasibility of performing reliable single-cell somatic mutation calling in a cell-line mixture and discussed the strengths and limitations of current computational methods when applied to normal tissues.

10.
arXiv (CS.AI) 2026-06-11

INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration

arXiv:2606.11440v1 Announce Type: new Abstract: Existing multi-agent LLM orchestration methods, ranging from brute-force ensembles to learned routers, select models and topologies based on task and model features. However, these methods do not consider the runtime state of the serving infrastructure. On shared GPU clusters under concurrent load, this infrastructure blindness causes systematic resource underutilization: preferred models accumulate deep request queues while equally capable alternatives sit idle. In multi-agent pipelines, where each query triggers multiple sequential model calls, these delays then compound across every downstream step. Closing this gap is challenging because the relevant infrastructure signals (queue depths, KV-cache pressure, latencies) are dynamic and noisy, and they must drive three different decisions: planning, per-step routing, and scheduling. We introduce INFRAMIND, a framework that makes the entire multi-agent stack infrastructure-aware. An infra-aware planner conditions topology and role selection on real-time system load and remaining budget, biasing toward simpler graphs under congestion and richer ones at low load. An infra-aware executor then observes per-model queue depths, cache utilization, and response latencies at each agent step to decide which model to call and how deeply to reason; a budget-aware scheduler further reorders each model's queue so that urgent requests are served first. Cast as a hierarchical constrained MDP and solved end-to-end via reinforcement learning, the system learns to balance quality against latency automatically. Across five benchmarks, INFRAMIND delivers up to +7.6 pp accuracy over the prior baseline at low load with up to 7x lower latency, and sustains up to 99.9% SLO compliance under high load where every baseline drops below 50%.

11.
arXiv (CS.AI) 2026-06-17

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

arXiv:2606.17266v1 Announce Type: new Abstract: Production planning increasingly has to treat workforce capability as a decision variable: certifications lapse when skills are not maintained, new products require skills the current workforce does not hold, and reskilling competes for the same worker hours needed for production. Existing operations benchmarks usually treat labor as exogenous, while workforce-planning models with skills and learning are rarely released as reusable testbeds. We introduce SkillChain-Gym, a benchmark specification for reskilling-aware production-inventory control: a single-site environment with stylized worker skill-state dynamics, hard threshold certification, forgetting, and capacity-consuming training actions constrained by the same per-worker time budget as production. The benchmark includes seed-controlled disruption scenarios, three feasibility modes with projection diagnostics, deterministic replay, and metrics covering operations, resilience, capability growth, and training-access distribution. We evaluate production-only, reactive adaptive, water-filling adaptive, and static-insurance policies with budget variants over 60-shift horizons with paired statistical tests. The results are regime-dependent rather than a ranking. Training-capable policies dominate the production-only baseline, and maintenance training is necessary under forgetting even without disruptions. Among training-capable classes, adaptive training helps when bottlenecks are visible in the forecast, while a lean static cross-training plan, a deliberately favorable comparator whose structure encodes relevant skill contingencies, acts as strong insurance under surprise shocks and absenteeism. Capacity slack and the forgetting rate govern the boundary between these regimes. No policy class dominates across regimes, motivating forecast-driven controllers that decide when to buy skill insurance and when to react.

12.
arXiv (quant-ph) 2026-06-11

Diffusive Relaxation of Participation Entropy in U(1)-symmetric Dynamics

arXiv:2606.11561v1 Announce Type: new Abstract: Participation entropy (PE) quantifies the spread of a many-body wavefunction across configuration space. While PE relaxes rapidly in generic chaotic systems, we show that $\mathrm{U}(1)$ conservation laws slow it down by imprinting with the slow hydrodynamic modes. Using a cluster expansion around equilibrium, we show that, after local density inhomogeneities decay, the leading PE deficit is dominated by squared connected density correlations. The long time relaxation is therefore controlled by diffusive correlation spreading, giving $\Delta S(t)\sim t^{-1/2}$ in the hydrodynamic regime and crossing over to $\sim \exp[-O(t/L^2)]$ when $t\geq L^2$. We confirm this entropy correlation relation using exact computation and infinite system tensor network simulations in various quantum $\mathrm{U}(1)$ conserving circuits. Our results establish PE as a sensitive probe of hydrodynamic memory and suggest that slow relaxation is a generic consequence of conservation laws.
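As a reminder of the quantity being tracked, the Shannon participation entropy of a state $|\psi\rangle = \sum_i \psi_i |i\rangle$ in a fixed configuration basis is commonly defined as $S = -\sum_i |\psi_i|^2 \ln |\psi_i|^2$. The sketch below computes it for a state vector; the function name is hypothetical, and the abstract does not say which Rényi index the authors use, so the Shannon form is an assumption.

```python
import numpy as np

def participation_entropy(psi):
    """Shannon participation entropy of a state vector in the given basis."""
    p = np.abs(np.asarray(psi, dtype=complex)) ** 2
    p = p / p.sum()   # normalize in case psi is not unit-norm
    nz = p[p > 0]     # use the convention 0 * log(0) = 0
    return float(-np.sum(nz * np.log(nz)))

# A basis state is fully localized in configuration space (S = 0), while a
# uniform superposition over D configurations is maximally spread (S = ln D).
localized = participation_entropy([1, 0, 0, 0])
spread = participation_entropy([0.5, 0.5j, -0.5, 0.5])
```

The PE deficit $\Delta S(t)$ discussed in the abstract would then be the gap between the equilibrium value and the value computed this way at time $t$.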

13.
arXiv (math.PR) 2026-06-17

Full $\Gamma-$expansion for the level-two large deviation rate functionals of non-reversible one-dimensional diffusions with periodic boundary conditions

arXiv:2606.17859v1 Announce Type: new Abstract: Consider the diffusion process \begin{equation*} dX_{\epsilon}(t) = \mss b(X_{\epsilon}(t)) \, dt + \sqrt{2\, \epsilon\, \mss a(X_\epsilon(t))} \, dW_{t}, \end{equation*} on the one-dimensional torus $\bb T = [0,1)$. Here $\epsilon$ is the temperature, $W_{t}$ a Brownian motion on $\bb T$ and $\mss a$, $\mss b$ functions of class $C^{2}(\bb T)$ satisfying further conditions. Denote by $\mss P(\bb T)$ the set of probability measures on $\bb T$ equipped with the weak topology, and by $\ms I_{\epsilon}\colon \mss P(\bb T)\to [0,+\infty)$ the level two large deviation rate functional of the diffusion $X_{\epsilon}(\cdot)$. We derive a full $\Gamma-$expansion of $\ms I_{\epsilon}$, as $\epsilon \to 0$, expressing it as \begin{equation*} \ms I_{\epsilon} = \frac{1}{\epsilon} \;\ms J^{(-1)} \; +\; \ms J^{(0)} \;+\; \sum_{p=1}^{\widehat{\mf q}}\frac{1}{\theta^{(p)}_{\epsilon}}\;\ms J^{(p)}\,, \end{equation*} where $\ms J^{(-1)}$, $\ms J^{(0)}$, $\ms J^{(p)} \colon \mss P(\bb T)\to [0,+\infty]$ represent rate functionals, independent of $\epsilon$, and $\theta^{(p)}_{\epsilon}$ are the time-scales at which the Markov process $X_{\epsilon}(\cdot)$ exhibits a metastable behaviour.

14.
arXiv (CS.CL) 2026-06-12

Detect, Remask, Repair: Diffusion Editing for Faithful Summarization of Evolving Contexts

Summaries of real-world events can become outdated as contexts evolve and new information arrives. A common response is to generate a new summary from the updated context, but full regeneration discards the previous draft, can obscure what changed, and may be unnecessary when only a few claims are unsupported. We study localized faithfulness repair: updating outdated spans in an existing summary while preserving supported content. We propose DETECT-REMASK-REPAIR, a diffusion-based framework that identifies, remasks, and repairs outdated regions with masked diffusion language models. To evaluate evolving-context summarization, we introduce StreamSum, a benchmark of synthetic event timelines. Experiments on DialogSum and StreamSum show that localized diffusion repair provides a controllable alternative to full rewriting: faithfulness-steered repair improves early drafts, one-step repair reduces repair cost to under half a second, with the framework enabling faithfulness-speed-preservation tradeoffs across datasets. We also find that the framework can provide a post-hoc correction step that improves faithfulness for autoregressive systems.

15.
arXiv (CS.LG) 2026-06-17

Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

arXiv:2606.09049v2 Announce Type: replace-cross Abstract: We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.
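The classical bootstrap that DAB is said to recover can be sketched as a percentile interval built by resampling data indices uniformly with replacement. This shows only that special case, not the DAB method itself; the function name, defaults, and toy data are assumptions for illustration.

```python
import numpy as np

def bootstrap_ci(data, statistic=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Resamples indices uniformly with replacement, recomputes the statistic
    on each resample, and returns the alpha/2 and 1 - alpha/2 quantiles.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    idx = rng.integers(0, n, size=(n_boot, n))  # uniform sampling of indices
    stats = np.array([statistic(data[row]) for row in idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Toy usage: a 95% interval for the mean of 100 evenly spaced values.
toy = np.arange(100.0)
lower, upper = bootstrap_ci(toy)
```

In the DAB framing, the uniform index resampling here plays the role of the approximately invariant transformation; a data augmentation would be swapped in at the line that builds `idx`.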

16.
bioRxiv (Bioinfo) 2026-06-11

Pillbox: A Leakage-Aware Foundation-Model Predictor and Lineage-Ceiling Diagnostic for Cancer Drug Response

We present Pillbox, a predictor whose pipeline is audited against the six Asiaee leakage modes with the one residual pathway shown by per-fold ablation to be non-load-bearing on hard splits. Our model combines CpGPT methylation embeddings, CLAMP drug embeddings, and per-fold-fit gene-expression principal components which are fused by Feature-wise Linear Modulation (FiLM)-conditioned graph attention on the STRING v12 protein-protein interaction graph. Then we alpha-ensemble the model against a histogram-based gradient boosting regressor baseline. On GDSC GSE68379 (987 cell lines, 375 drugs) across seeds 42, 7, and 123, the ensemble reaches test R-Squared of 0.78, 0.77, and 0.76 on random, histology-blind, and site-blind splits respectively, with cell-aware lifts above the drug-mean floor of +0.054, +0.060, and +0.037. As a quantitative diagnostic for feature-stack saturation we propose the cross-architecture residual correlation, calibrated against a same-architecture-different-initialization control. On histology-blind splits the cross-architecture value of 0.939 falls short of the same-architecture ceiling of 0.974 by approximately 0.03 in residual correlation, a gap we interpret as the headroom available to architecture choice on top of the current foundation-model representation and consistent with the long-established observation that tissue lineage dominates cell-line drug response. We integrated curated mutation, methylation, and drug-target-expression channels, but these do not improve prediction once foundation-model embeddings are in place. Cross-screen validation against PRISM matches the GDSC-to-PRISM measurement reproducibility ceiling within 0.01 Spearman.

17.
arXiv (CS.AI) 2026-06-11

FreeBridge: Variational Schrödinger Bridges for Cellular Transition Dynamics

arXiv:2606.11286v1 Announce Type: cross Abstract: High-content imaging assays quantify cellular responses to chemical and genetic perturbations, yet continuous trajectories of individual cells are unobservable because cells are chemically fixed at acquisition. Perturbation modeling therefore reduces to inferring stochastic transport between control and treated populations observed only as separate marginals. While recent generative models achieve strong end-point alignment, boundary consistency does not determine intermediate evolution: multiple stochastic processes may connect identical marginals while traversing regions unsupported by observed single-cell morphologies. We introduce FreeBridge, a Schrödinger Bridge formulation for single-cell transition modeling under endpoint-only supervision. FreeBridge defines atomic states as instance-segmented single-cell representations, establishing a fixed cellular manifold, and learns stochastic transport constrained within this geometry via empirical latent support regularization. Across BBBC021, RxRx1, and JUMP, FreeBridge maintains competitive or improved endpoint fidelity and mechanism-of-action retention under a unified evaluation protocol; on BBBC021, it further reduces intermediate support violations. These findings highlight the importance of geometric grounding for biologically interpretable perturbation dynamics. Project page: https://y-research-sbu.github.io/FreeBridge/.

18.
arXiv (CS.CL) 2026-06-12

MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection

Detecting mental health disorders from Arabic social media text remains challenging due to dialectal variation, informal language, limited high-quality annotated resources, and severe class imbalance. While English mental health natural language processing (NLP) has progressed substantially, Arabic multi-class disorder classification remains insufficiently studied. This study proposes a two-phase framework for Arabic mental health text classification. In phase 1, three Arabic pre-trained language models, AraBERT, CAMeLBERT, and MARBERT, undergo Domain-Adaptive and Task-Adaptive Pretraining (DAPT and TAPT) using a large-scale corpus of unlabeled Arabic mental health tweets. The adapted models are evaluated under a unified protocol to identify the most effective backbone model. In phase 2, the selected model is assessed across four configurations combining single-stage and hierarchical two-stage classification architectures with full fine-tuning and Low-Rank Adaptation (LoRA). To support this study, we constructed a novel annotated Arabic mental health dataset comprising 50,670 tweets across six categories, with strong inter annotator agreement (Krippendorff's Alpha = 0.733, average pairwise agreement = 0.797). Experimental results show that the domain-adapted MARBERT (MentalMARBERT) achieves statistically significant improvements over baseline models in both accuracy and macro-F1. The hierarchical two-stage architecture combined with full fine-tuning achieves the best overall performance, reaching a macro-F1 of 0.861 and an accuracy of 0.877. These findings demonstrate the effectiveness of domain-specific adaptive pretraining and hierarchical classification for Arabic mental health disorder detection.

19.
arXiv (CS.AI) 2026-06-19

The Tao of Agency: Autotelic AI, Embedded Agency and Dissolution of the Self

arXiv:2606.19924v1 Announce Type: new Abstract: Most artificial intelligence systems are built on the assumption that goals are exogenous and specified by the designer. Exploring what happens when an agent begins generating its own goals opens the field of autotelic AI. Agents are expected not merely to pursue objectives but to discover them. In this article, we trace its consequences through intrinsic motivation, resource-driven priors, causal-interventional learning, homeostasis, and embeddedness; the last of which is found to be a necessary but not sufficient condition for autotelic agency. Embeddedness individuates the agent at the cost of revealing that the individuation is non-unique, such that the same dynamics admit many valid partitions, each defining a different candidate self. The deepest problem with autotelic AI is therefore not how the agent generates goals, but how it generates and relativizes the self to which the goals are assigned. The agent must believe in its own boundary in order to act, and see through that boundary in order to understand. We consolidate these developments into a single framework and extend it along three directions: a quantum formulation in which the agent-environment cut becomes physical, a philosophical reading against non-dual contemplative traditions, and a concrete LLM-based agentic instantiation.

20.
bioRxiv (Bioinfo) 2026-06-10

APOSM: Pairwise preference learning improves generative small-molecule design

Small-molecule lead refinement is constrained by the cost of synthesizing and assaying candidates, making the surrogate models that prioritize compounds for experimental testing central to the design process. The reliability of such surrogates is limited by the noise and sparsity of screening measurements. We show that training the surrogate on pairwise comparisons between candidate molecules, rather than on absolute predicted scores, yields a substantially more reliable signal for active candidate selection in this regime. We develop APOSM, an active-learning algorithm that combines a fragment-based generator, a pairwise message-passing graph neural network surrogate, and probabilistic ranking inside a batched acquisition loop. On the Practical Molecular Optimization benchmark and a GPCR ligand rediscovery task, APOSM improves target attainment and sampling efficiency over unguided fragment-based optimization, the Graph-GA genetic algorithm, and a pointwise-regression ablation, with the largest gains on tasks where absolute scores are hardest to calibrate.

21.
arXiv (CS.AI) 2026-06-16

EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation and Diagnostic Analyses of EEG Foundation Models

arXiv:2508.17742v3 Announce Type: replace-cross Abstract: Electroencephalography foundation models (EEG-FMs) have advanced brain signal analysis, but the lack of standardized evaluation benchmarks impedes model comparison and scientific progress. Current evaluations rely on inconsistent protocols that render cross-model comparisons unreliable, while a lack of diagnostic analyses obscures the internal mechanisms driving transfer efficiency and scaling behaviors. To address this, we introduce EEG-FM-Bench, a unified system for the standardized evaluation of EEG-FMs. The benchmark integrates 14 datasets across 10 paradigms and incorporates diverse experimental settings, including multiple fine-tuning strategies, task organizations, and classifier configurations, supported by tools for gradient and representation analysis. Our experiments and analysis reveal several critical insights: (1) multi-task learning often acts as a useful regularizer that mitigates overfitting in data-scarce EEG contexts, although negative transfer can arise under specific task paradigms; (2) pre-training efficiency is currently limited by gradient conflicts between reconstruction objectives and downstream tasks; (3) under released checkpoints and a matched downstream protocol, model or data scale alone does not fully explain transfer performance, while objective alignment, adaptation compatibility, and EEG-specific design appear to be important factors. This benchmark enables fair comparison and reproducible analysis, providing a step toward more interpretable analysis of EEG-FMs. Code is available at https://github.com/xw1216/EEG-FM-Bench.

22.
arXiv (CS.AI) 2026-06-16

DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation

arXiv:2601.05746v2 Announce Type: replace Abstract: Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative decision-making and complex problem-solving. Researchers have further investigated Multi-Agent Debate (MAD) frameworks, which enhance the reasoning and collaboration capabilities of MAS through information exchange and debate among multiple agents. However, existing approaches often rely on unguided initialization, causing agents to adopt identical reasoning paths that lead to the same errors. As a result, effective debate among agents is hindered, and the final outcome frequently degenerates into simple majority voting. To solve the above problem, we introduce Dynamic Multi-Agent Debate (DynaDebate), which enhances the effectiveness of multi-agent debate through three key mechanisms: (1) Dynamic Path Generation and Allocation, which employs a dedicated Path Generation Agent to generate diverse and logical solution paths with adaptive redundancy; (2) Process-Centric Debate, which shifts the focus from surface-level outcome voting to rigorous step-by-step logic critique to ensure process correctness; (3) A Trigger-Based Verification Agent, which is activated upon disagreement and uses external tools to objectively resolve deadlocks. Experiments show that DynaDebate achieves superior or highly competitive performance across the majority of benchmarks. The code is at https://github.com/nwpuLee2021/brianstorm.

23.
arXiv (CS.AI) 2026-06-19

Human Universal Grasping

arXiv:2606.17054v1 Announce Type: cross Abstract: Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hrs) and 6,707 object instances across 41 buildings. Next, to model the distribution of natural human grasps, our novel flow-matching model fuses RGB and depth observations to output a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to various robot hands, enabling zero-shot grasping in everyday scenes. To standardize evaluation, we build a new simulated benchmark, HUG-Bench, of 90 unseen objects from five geometric categories and various sizes, with metric-scale 3D meshes. We evaluate HUG in the real world on the 30-object test set of HUG-Bench across multiple stereo cameras, robot embodiments, and household environments. HUG outperforms the state-of-the-art grasping baselines by +23% and +34% on our challenging object set. Code, data, benchmark, checkpoints, and an interactive demo are released on our website: https://grasping.io/

24.
medRxiv (Medicine) 2026-06-10

Human-centred design approaches to health facility design: Evidence from perinatal care settings in Ethiopia and Bangladesh

While significant progress has been made in perinatal outcomes over recent decades in low- and middle-income countries (LMICs), maternal and newborn quality improvement initiatives often fail to account for the spatial conditions in which they are implemented. Health systems are increasingly deploying evidence-based care models into built environments that are not optimally structured to meet the needs of their patient populations. As the principal users, patients and health care workers can offer pragmatic insights about improving these structural designs. Our objective was to gather insights from patients, providers, and companions about how the physical design of their health facilities influenced their experience receiving or delivering perinatal care. We conducted a prospective observational study using a human-centred design (HCD) approach to analyse perceptions of the quality of perinatal care across two low resource settings: Ethiopia and Bangladesh. Using engagement and assessment tools, we conducted interviews, focus groups, facility walk-throughs, co-design workshops, and infrastructural assessments with patients, companions, providers, and Ministry of Health representatives. Descriptive statistics and thematic analysis were used to identify key learnings and develop recommendations. Across both countries, participants identified the need for facility layouts that better support privacy, mobility during labour, alternative birth positions, companion involvement, cultural and religious practices, sanitation, and provider visibility. Based on these insights, we developed six recommendations to better align health facility infrastructure with maternal and newborn care delivery needs. Our findings suggest that investments in health facility infrastructure may improve care experiences and help enable respectful, safe, and evidence-based maternal and newborn care.
Alongside targeted spatial improvements, government authorities responsible for health facility planning should incorporate participatory design processes to ensure infrastructure reflects the needs of patients, companions, and providers and supports high-quality care delivery.

25.
arXiv (CS.AI) 2026-06-16

Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents

arXiv:2606.16769v1 Announce Type: new Abstract: Agent skills are commonly distributed as SKILL.md files: human-readable procedural documents that describe workflows, tools, resources, and domain conventions. While convenient for inspection and reuse, this design requires the same reusable procedure to be repeatedly injected into the runtime context. We propose Skill-to-LoRA (S2L), a behavior-centric skill representation that replaces runtime skill text with skill-specific LoRA adapters. Rather than compressing the skill document itself, S2L models the behavioral change induced by the skill text: offline, the complete SKILL.md is used to synthesize skill-guided demonstrations; online, the full document is omitted and the corresponding LoRA adapter is dynamically loaded to activate the learned skill behavior. We evaluate S2L with Qwen3.6-27B on a 21-skill subset of SWE-Skills-Bench. Compared with the no-skill and Full Skill Text baselines, S2L improves pass rate by 2.9 and 5.2 percentage points, respectively, while reducing per-step token cost by 6.6% relative to Full Skill Text prompting. S2L matches or improves on Full Skill Text on 18/21 skills and on the no-skill baseline on 15/21 skills. Control experiments further show that the gains depend on skill-specific adapter alignment: Wrong-LoRA and Shared-LoRA both reduce performance. These results suggest that many procedural agent skills can be converted from runtime instructions into trainable, dynamically loadable behavioral modules. Code will be released upon acceptance.