Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
Nature (Science) 2026-06-10

Measurement of reactor neutrino oscillation with the first JUNO data

Neutrino oscillations (see refs. 1,2 and references therein), a quantum effect manifesting at macroscopic scales, are governed by lepton flavour mixing angles and neutrino mass-squared differences3 that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavour framework, determining the mass ordering of neutrinos and probing possible new physics. The Jiangmen Underground Neutrino Observatory (JUNO)4 is a 20-ktonne liquid-scintillator detector located 52.5 km from multiple reactor cores, designed to resolve the interference pattern of reactor neutrinos with sub-percent precision5,6. Here we report, using the first 59.1 days of data collected since detector completion in August 2025, the first simultaneous high-precision determination of two neutrino oscillation parameters, $${\sin }^{2}{\theta }_{12}=0.3092\,\pm \,0.0087$$ and $$\Delta {m}_{21}^{2}=(7.50\,\pm \,0.12)\times 1{0}^{-5}\,{\mathrm{eV}}^{2}$$ for the normal mass ordering scenario, improving the precision by a factor of 1.6 relative to the combination of all previous measurements. These results advance the basic understanding of neutrinos, validate the design of the detector and indicate the readiness of JUNO for resolving the neutrino mass ordering with a larger dataset. The rapid achievement with a short exposure highlights the potential of JUNO to push the frontiers of precision neutrino physics and paves the way for its broad scientific programme. The first data of the Jiangmen Underground Neutrino Observatory deliver high-precision neutrino oscillation parameters, improving measurements and demonstrating readiness to determine neutrino mass ordering.

02.
arXiv (CS.CV) 2026-06-12

Goal2Pixel: Grounding Goals to Pixels for Vision-Language Navigation

Vision-language models (VLMs) have become a common foundation for vision-and-language navigation in continuous environments (VLN-CE). Yet most VLM-based methods cast navigation as low-level action prediction, an interface that is ambiguous, tied to short-horizon motion primitives, and inefficient due to repeated VLM querying. We propose Goal2Pixel, a pure pixel-based paradigm that reformulates VLN-CE as navigable pixel grounding. Rather than predicting actions, Goal2Pixel uses the image plane as a unified spatial interface between VLM reasoning and robot motion: the model predicts a visible navigable pixel to the agent, which is back-projected into a 3D waypoint for forward navigation. For non-forward actions, we append auxiliary directive regions to the image plane, where the left/right/bottom regions are interpreted as turning left, turning right, and stopping, respectively. To enable long-horizon navigation, we propose a visibility-aware keyframe memory for compact and informative history representation. To adapt pretrained VLMs to navigable pixel grounding, we introduce semantic embeddings and coordinate-aware auxiliary losses. Goal2Pixel achieves competitive state-of-the-art performance while requiring fewer VLM inference calls than prior methods. On R2R-CE Val-Unseen it achieves 54.1% SR and 52.5% SPL with just 7.75 VLM calls per episode, 6x fewer than the 46.62 required by direct action prediction at 32.9% SR. The same trend holds on RxR-CE.Project Page: https://baobao0926.github.io/Goal2Pixel/.

03.
bioRxiv (Bioinfo) 2026-06-20

Ribosomes are covered by a coat of flexible protein fragments

Ribosomal proteins contain flexible terminal regions that are averaged out during electron density reconstructions, rendering them absent from experimental models derived by X-ray crystallography or cryogenic electron microscopy. These flexible protein fragments (FPFs) collectively form an invisible coat on the ribosome surface whose presence has been systematically overlooked. Here we analysed FPFs from 36 ribosomes spanning bacteria, eukaryotes, and mitochondria. We found that mitoribosomes harbour the most numerous and longest FPFs. Structural predictions confirmed that FPFs are predominantly disordered across all ribosome classes. Comparison of FPF amino acid composition against proteome-wide background frequencies revealed strong and domain-specific compositional biases. The balance between arginine and lysine content tracks the cardiolipin content of the membrane each ribosome class contacts. The arginine enrichment in mitoribosomal FPFs may additionally reflect selection arising from the RNA-rich environment of mitochondrial RNA granules, membraneless condensates where mitoribosomes are assembled. FPFs are uniformly depleted in aromatic residues, arguing against protein-driven liquid–liquid phase separation propensity. Our findings suggest that the flexibly tethered coat is a highly functional intrinsic part of all ribosomes.

04.
bioRxiv (Bioinfo) 2026-06-19

Evaluation of analysis modes for RNA coexpression in single-cell and bulk tissue

Coexpression of transcripts presents the most common means of computational inference of transcription factor regulation, and is often combined with other data types to infer regulatory networks. With the growing popularity of single-cell approaches, there are questions about how best to extract coexpression information from the data. Recently we reported a simulation study that explored the differences among coexpression performed at different levels: across single cells (xCell, per cell type), across subjects from pseudobulked single-cell data (xSubject, per cell type), or across subjects using bulk tissue samples (xBulk). Here we test predictions made by those models using real data. We consider both preservation (consistency of coexpression findings across different levels of analysis of the same data) and replicability across independent studies, as well as biological interpretability. We find that preservation across levels is limited, indicating the choice of analysis level will affect outcomes. We show that xCell coexpression is more replicable across studies compared to xSubject. xBulk coexpression is dominated by patterns driven by variability in cellular composition and fails to capture much coexpression that is reliably detected at finer resolutions. While all modes of analysis exhibit some enrichment for known regulatory relationships, it was highest with the xCell mode. Finally, we present a case study of the effect of analysis modes on a schizophrenia-associated pattern, reinforcing the importance of analytic choices in the interpretation and replicability of coexpression analyses. Together with our modeling study, this work emphasizes the importance of understanding sources of expression covariation as they relate to the goals of the analysis, and recommend single-cell-based data with biological replicates should be the focus of attempts to infer dynamic regulatory interactions that are more likely to be replicable by others.

05.
medRxiv (Medicine) 2026-06-17

Efficacy of a Gamified Digital Platform for Substance Use Education and Overdose Prevention Among College Students: a Pilot and Feasibility Study

Background: For US young adults aged 18-25 in the 2018-2024 period, fentanyl was involved in 78.2% of the 44,020 unintentional or undetermined-intent overdose deaths, most often co-involving stimulants and other non-opioid substances. While fatal overdose rates in this age group have fallen to their lowest recorded level, emergency medical services-attended non-fatal overdose events have reached record highs, shifting the decisive variable toward bystander recognition and response. College students report near-universal alcohol education but minimal education on the substances actually driving overdose mortality. Methods: We conducted a single-group pre-post evaluation of the DopaGE Portal, a gamified, mastery-based digital platform covering cocaine, MDMA, benzodiazepines, and opioid overdose response, deployed at a public university (UNL) and a multi-campus volunteer network (TACO). Paired pre/post surveys (N=42) measured self-efficacy (7 items; primary), behavioral intentions, risk perception, and knowledge/attitudes on 5-point scales, plus four factual knowledge questions. Paired t-tests, exact McNemar tests, and Benjamini-Hochberg correction across eight primary tests were applied. Institutional naloxone distribution at UNL was tracked as an ecological behavioral outcome. A mandated high-school cohort (N=94) provided supplementary acceptability data. Results: Self-efficacy increased from 2.82 to 4.46 (d=2.00, 95% CI 1.46-2.55; adjusted p

06.
arXiv (CS.LG) 2026-06-19

On the Oracle Complexity of Interpolation-Based Gradient Descent

arXiv:2606.19878v1 Announce Type: new Abstract: Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find it to outperform several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants which may be of independent interest in interpolation analysis.

07.
arXiv (CS.CL) 2026-06-25

The Generalization Spectrum: A Chromatographic Approach to Evaluating Learning Algorithms

Traditional evaluations measure a learning algorithm's final performance on an i.i.d. test set, reducing learning to a single aggregate score. This approach obscures a fundamental question: to what extent does learning from a specific example generalize to others? Such per-sample generalization, akin to learning by analogy in human cognition, captures how far the knowledge extracted from one example can transfer, yet remains invisible to standard benchmarks. We introduce the Generalization Spectrum, an evaluation framework designed to expose this hidden dimension. For each training example, we construct a controlled suite of test variants arranged by increasing transfer distance, from exact recall to implementation transfer across languages, context transfer under complete narrative re-framing, category-matched in-domain problems, and an unpaired baseline. By tracking performance across these distances, we reveal not just whether an algorithm learns, but how far that learning extends. We instantiate this framework on competitive programming, using a selection-and-synthesis pipeline seeded with recent problems to mitigate contamination. We first compare three canonical learning paradigms under matched memorization. RL converts memorization into near-transfer more efficiently than SFT-family baselines, while ICL exhibits strong but correspondence-dependent transfer. We then use the Spectrum to diagnose within-family variants. The resulting profiles show that local gains need not expand the generalization radius: abstractions and hints mainly lift local transfer, RFT preserves a stronger far-transfer tail than reference SFT, and self-distillation or hint-assisted RL can reduce far transfer even when local transfer or optimization improves.

08.
arXiv (CS.LG) 2026-06-25

Learning Structural Hardness for Combinatorial Auctions: Instance-Dependent Algorithm Selection via Graph Neural Networks

作者:

arXiv:2602.14772v2 Announce Type: replace Abstract: The Winner Determination Problem (WDP) in combinatorial auctions is NP-hard, and no existing method reliably predicts which instances will defeat fast greedy heuristics. The ML-for-combinatorial-optimization community has focused on learning to replace solvers, yet recent evidence shows that graph neural networks (GNNs) rarely outperform well-tuned classical methods on standard benchmarks. We pursue a different objective: learning to predict when a given instance is hard for greedy allocation, enabling instance-dependent algorithm selection. We design a 20-dimensional structural feature vector and train a lightweight MLP hardness classifier that predicts the greedy optimality gap with mean absolute error 0.033, Pearson correlation 0.937, and binary classification accuracy 94.7\% across three random seeds. For instances identified as hard – those exhibiting ``whale-fish'' trap structure where greedy provably fails – we deploy a heterogeneous GNN specialist that achieves ${\approx}0\%$ optimality gap on all six adversarial configurations tested (vs.\ 3.75–59.24\% for greedy). A hybrid allocator combining the hardness classifier with GNN and greedy solvers achieves 0.51\% overall gap on mixed distributions. Our honest evaluation on CATS benchmarks confirms that GNNs do not outperform Gurobi (0.45–0.71 vs.\ 0.20 gap), motivating the algorithm selection framing. Learning when to deploy expensive solvers is more tractable than learning to replace them.

09.
arXiv (CS.AI) 2026-06-18

How Well Do Large Language Models Capture Human Personality?

arXiv:2606.18263v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used to simulate human populations via persona prompting, often under the assumptions that richer persona descriptions improve behavioral fidelity, similarly sized attribute combinations are equally simulatable, and persona definitions generalize across tasks. In this work, we formalize these assumptions and systematically evaluate them across multiple architectures, scales, and simulation settings. We identify a fundamental limitation we term persona manifold collapse, where increasingly expressive persona specifications lead to systematic contraction of representational and behavioral diversity. Across models, increasing persona complexity consistently reduces inter-persona separation in latent space and weakens behavioral differentiation in downstream simulation tasks. These effects persist across multiple analyses as richer personas fail to preserve human subgroup disagreement, performance varies across attribute combinations of similar size, and adding descriptive detail often degrades rather than improves simulation fidelity. Surprisingly, simple Age-Gender personas consistently outperform richly specified Ideal Customer Profiles (ICPs) across industries, achieving substantially higher downstream prediction accuracy. We find that collapse is not uniform across attributes. Certain combinations remain behaviorally stable and preserve stronger alignment with human responses, forming localized regions we term alignment bridges. Together, our results provide empirical and conceptual foundations for understanding the limits of persona-conditioned simulation, highlighting the need for representation-aware persona construction rather than increasing persona expressivity alone.

10.
arXiv (CS.CL) 2026-06-18

GraphPO: Graph-based Policy Optimization for Reasoning Models

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for enhancing the capability of large reasoning models. RLVR typically samples responses independently and optimizes the policy using from final answers. This paradigm has two limitations. First, independently responses often contain similar intermediate reasoning steps, causing redundant exploration and wasted computation. Second, sparse final-answer rewards make it hard to identify useful steps. Tree-based methods partly address this problem by sharing prefixes and comparing branches from the same prefix to provide fine-grained signals. However, tree branches are still expanded independently. When different branches reach similar reasoning states, they cannot share information and repeat similar exploration. Moreover, tree-based methods ignore such dispersion and only perform local comparisons within separate branches, which can lead to higher variance in advantage estimation. To address this challenge, we propose GraphPO (Graph-based Policy Optimization), a novel RL framework that represents rollouts as a directed acyclic graph, with reasoning steps as edges and semantic states summarized from the reasoning paths as nodes. GraphPO merges semantically equivalent reasoning paths into equivalence classes, allowing them to share suffixes and reallocating budget away from redundant expansions to diverse exploration. Furthermore, we assign efficiency advantages to incoming edges and correctness advantages to outgoing edges, thereby improving inference efficiency while deriving process supervision from outcome. Theory shows that GraphPO reduces advantage-estimation variance and enhances reasoning efficiency. Experiments on three LLMs across reasoning and agentic search benchmarks show that GraphPO consistently outperforms chain- and tree-based baselines with the same token budgets or response budgets.

11.
medRxiv (Medicine) 2026-06-18

Hard to Halt: Automation Bias in Agent-Driven Sequencing Prior Authorization Workflows

Purpose: Prior authorization (PA) for exome or genome sequencing is a time-consuming process that impedes timely rare disease diagnosis. Large language model-based browser agents offer potential for automating these workflows, but their clinical reliability remain uncharacterized. Methods: We developed a sandbox compromising a simulated ES/GS PA submission payer portal and a synthetic EHR containing 836 patient records spanning compliant profiles and deficient profiles with different types of issues. Gemini 3 Pro, Gemini 3 Flash, and Claude Opus 4.5 were evaluated on task completion rate, form completion accuracy, and appropriate withholding for deficient profiles. Results: Larger models achieved much higher task completion rates (Gemini 3 Pro 95.45%, Claude Opus 4.5 93.67%) compared to Gemini 3 Flash (56.05%), but nearly universally failed to withhold submission for deficient profiles whereas Gemini 3 Flash ironically demonstrated superior withholding performance (17.33%). In a non-agentic setting, Gemini 3 Pro correctly identified 91% of the issues in deficient profiles, indicating that withholding failure is attributable to the browser interaction rather than the model's reasoning limitations. Conclusion: Current LLM-based browser agents exhibit a systematic bias towards form submission that poses risks in PA workflows. A modular, multi-agent architecture with human supervision is necessary for a safe clinical deployment.

12.
arXiv (CS.AI) 2026-06-25

From Sounds to Scenes: A Benchmark for Evaluating Context-Aware Auditory Scene Understanding in Large Audio Language Models

arXiv:2606.25391v1 Announce Type: cross Abstract: Recent Large Audio Language Models (LALMs) have achieved remarkable progress in audio perceptual tasks across individual acoustic layers, including speech, sound, and music. However, existing benchmarks predominantly evaluate these layers in isolation, overlooking the complex contextual relationships that arise when multiple acoustic sources co-occur in real-world auditory scenes. Real-world auditory interpretation requires Context-Aware Auditory Scene Understanding (CASU): the ability to comprehend the holistic scene by integrating sound layers. To evaluate this capability, we introduce the CASU benchmark, which assesses whether Audio LLMs can interpret auditory scenes composed of speech, acoustic events (e.g., announcements), and background environments (e.g., traffic), and reason about the logical relationships between these layers. We propose a scalable pipeline for constructing time-accurate, semi-synthetic audio streams by composing real-world scene sounds with synthetic speech. Building on this data, we design four tasks that probe scene understanding: contextual question answering, entity extraction from the scene, speaker role inference, and counterfactual reasoning where scene is manipulated. Experiments across multiple LALMs demonstrate that effective auditory scene understanding requires integration over all auditory layers, rather than reliance on speech or sound alone, underscoring the necessity of CASU for advancing complex audio understanding in LALMs.

13.
arXiv (CS.AI) 2026-06-24

Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation

arXiv:2606.24042v1 Announce Type: new Abstract: Recommender systems often induce filter bubbles and semantic homogenization by monolithically optimizing for immediate user engagement. Standard single-objective models, including traditional Deep Q-Networks, are ill-equipped to navigate the trade-offs between platform retention and critical societal values like information diversity and provider fairness. To address these limitations, we introduce a multi-objective reinforcement learning framework that formalizes recommendation as a semantic multi-objective Markov decision process. By integrating high-fidelity semantic embeddings with a Pareto-DQN agent, our architecture treats engagement, diversity, and fairness as distinct, non-aggregable reward signals, avoiding the pitfalls of static reward scalarization. Empirical evaluations on the MovieLens small dataset shows that our hypervolume based action selection disrupts the feedback loops responsible for semantic collapse. By sustaining high state-trajectory variance, the Pareto-DQN effectively maps the Pareto frontier, achieving gains in auxiliary societal objectives with only marginal impacts on engagement. This work provides a path toward intrinsically aligned, responsible recommender systems.

14.
arXiv (CS.CL) 2026-06-24

Towards Spec Learning: Inference-Time Alignment from Preference Pairs

Steering a large language model (LLM) toward a desired behavior typically relies on an iterative process of hand-crafting a prompt based on a careful inspection of the model's responses. This is an involved, brittle, and error-prone process. Preference-based fine-tuning is a more rigorous but often prohibitively expensive solution. We propose spec learning, a framework that relies on a brief user instruction and a small set of preference judgments. These are compiled into specifications in the form of natural-language prompts for an LLM. Specifications condition LLMs at inference time, and no parameter updates to the underlying models are required. We show that the responses generated based on the compiled specifications often outperform direct preference optimization (DPO) on datasets from specialized domains whose preference signal is dense. Unlike opaque weight updates, the resulting specifications are human-readable and double as interpretable and transparent written embodiments of the preference signal that produced them.

15.
arXiv (quant-ph) 2026-06-25

Taxonomy of integrable and ground-state solvable models: Jastrow wave functions on graphs and parent Hamiltonians

arXiv:2602.22315v2 Announce Type: replace Abstract: We introduce a family of many-body systems of distinguishable continuous-variable particles in which interparticle interactions are set by the adjacency matrix of a graph. The ground-state wave function of such systems is of a generalized Jastrow form involving the product of pair-correlation functions over the edge set of the graph. These systems describe quantum fluids when the graph is complete, and the pair function has a well-defined permutation symmetry. In general, they provide the continuous-variable generalization of spin systems on graphs, with broken permutation symmetry. The corresponding parent Hamiltonian is shown to include (a) two-body interactions determined by the graph adjacency matrix and (b) three-body interactions over all possible 2-paths on the graph. Employing elements of graph theory, we chart the landscape of models, recovering known instances in the literature and providing numerous new examples of ground-state solvable models for which the system Hamiltonian, ground-state wave function, and corresponding energy eigenvalue are specified.

16.
arXiv (CS.LG) 2026-06-24

AsyncOPD: How Stale Can On-Policy Distillation Be?

arXiv:2606.24143v1 Announce Type: new Abstract: On-policy distillation (OPD) trains a student on its own rollouts guided by teacher feedback and is becoming increasingly important for large language model (LLM) post-training. Like reinforcement learning (RL), however, OPD faces an on-policy systems bottleneck, as rollouts can dominate training time for reasoning workloads. Asynchronous training pipelines can alleviate this bottleneck by decoupling rollout generation from learner updates, but doing so introduces stale-policy data. While prior work has studied stale data in asynchronous RL, its effects in OPD remain underexplored. We present the first systematic study of staleness in asynchronous OPD, focusing on a practical setting where teacher feedback is implemented through local KL losses and full-vocabulary teacher logits are too expensive to store or transfer, necessitating finite teacher-score caches. We first show that KL direction changes the stale-data problem: teacher-weighted forward KL is more robust to stale rollouts, whereas student-weighted reverse KL is vulnerable. Second, for this vulnerable reverse-KL case, we study whether methods designed to stabilize asynchronous RL can mitigate OPD staleness. In our experiments, they do not improve over a simpler OPD-specific surrogate: recomputing the reverse-KL signal under the current student at learner time. Third, we analyze how finite teacher-score caches create a bias-variance tradeoff for sparse and sampled reverse-KL OPD estimators. This motivates multi-sample Monte Carlo (MC), which preserves MC correctability while reducing one-sample variance. Finally, we present and open-source AsyncOPD, a fully asynchronous OPD training pipeline built from these estimator choices. Experiments show that AsyncOPD improves training throughput by $1.6\times$ to $3.8\times$ over strict synchronous training while reaching comparable accuracy.

17.
arXiv (CS.CV) 2026-06-24

MapReason-OSM: Can Vision-Language Models Make Graph-Verifiable Mobility Decisions from Street Maps ?

Vision-language models (VLMs) are increasingly used to read maps for logistics, delivery, and accessible navigation, where the output is an actionable decision (a route, a pin, a parking choice) that must respect the road network. Yet most map benchmarks grade free text or multiple-choice answers that cannot be verified against the underlying graph. We present MapReason-OSM, a benchmark and evaluation harness for graph-verifiable mobility decisions on self-rendered OpenStreetMap panels. We render fixed-style maps for ten U.S. downtowns at two aligned zoom scales, overlay a consistent marker grammar, and pair each panel with a hidden street graph and exact oracles, yielding 6,000 instances (12,000 panels across the two zooms) over 12 routing, facility-location, and visual disambiguation tasks. Models return structured decisions that we snap back to the graph and score for validity, legality, optimality, and constraint satisfaction, plus cross-zoom consistency. Across seven VLMs, models read maps and route simply but fail at graph cost reasoning (single-facility pin placement is near chance even for frontier reasoning models), and are frequently scale-inconsistent. We release the benchmark, harness, and deterministic generator. Code and data: https://github.com/Vi-Sri/mapreason-osm

18.
arXiv (CS.AI) 2026-06-18

scGTN: Deep Siamese Graph Transformer Network for Single-cell RNA Sequencing Clustering

arXiv:2606.18672v1 Announce Type: cross Abstract: Single-cell RNA sequencing (scRNA-seq) serves a pivotal role in characterizing gene expression at the cellular level, enabling the identification of cell types and advancing the understanding of cellular heterogeneity. Despite the significant progress in scRNA-seq data clustering, we argue that current methods always ignore the sparsity and noise, as well as the complex intercellular structural information inherent in scRNA-seq data. Toward this end, in this paper, we propose a novel single-cell RNA-seq clustering framework via deep Siamese Graph Transformer Network (termed scGTN), which explicitly integrates gene expression profile and intercellular structural dependencies for cell clustering. In particular, we formulate scRNA-seq data as a graph and construct two augmented graph views that serve as dual views to capture complementary intercellular information. Then, a Siamese graph transformer network is employed to explicitly incorporate shortest-path information and node-wise distances for capturing richer structural relationships between cells. Finally, we employ an optimal transport strategy to guide the cell clustering in a self-supervised manner. Extensive experiments on multiple benchmark scRNA-seq datasets demonstrate that our scGTN consistently outperforms existing methods. Our code is available at https://github.com/W-RMSL/scGTN.

19.
arXiv (CS.CL) 2026-06-24

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

As AI systems are deployed across increasingly diverse and high-stakes settings, model alignment must generalize beyond the tasks and domains seen during training. This is especially important for reinforcement learning (RL), which can introduce unexpected misalignment through reward hacking, deception, or other unintended strategies. We study whether RL on beneficial behavior, instantiated in realistic domains, can produce broad and persistent alignment generalization beyond the training distribution. We construct a dataset of realistic situations designed to measure and train beneficial traits, such as truthfulness, fairness, risk awareness, and corrigibility, spanning varied domains, including health, science, and education. We then train models with RL on this dataset and evaluate them on more than 50 independent benchmarks of alignment and beneficial behavior. Compared to a compute-matched baseline, beneficial trait RL improves performance on over 80% of these out-of-distribution benchmarks. We observe substantial out-of-distribution alignment transfer: a beneficial-behavior RL intervention entirely limited to one domain, health, produces broad improvements on non-health alignment evaluations, including reduced reward hacking, deception, and general misalignment. Finally, we study alignment persistence: whether behavior remains robustly aligned under attempts to steer models towards misalignment. Models trained with beneficial trait RL show improved persistence, including greater resistance to adversarial prompting and harmful finetuning; further work is required to isolate the sources of these effects. These results suggest that RL to reinforce beneficial behavior in realistic domains can produce models that are more robustly aligned with human flourishing.

20.
arXiv (CS.CV) 2026-06-16

High-Fidelity 4D Hand-Object Capture via Multi-View Spatiotemporal Tracking and Physics-Aware Gaussians

The growing demand for high-fidelity 4D hand-object interaction (HOI) data in embodied AI and spatial computing is currently bottlenecked by the reliance on pre-scanned object templates and physical markers. While recent methods have demonstrated promising results in reconstructing 4D hand-object interaction from videos, they are highly sensitive to initial estimates of hand and object poses. Yet, estimating these poses from images is challenging, in particular under severe occlusion which is inherent in hand-object interaction scenarios. We propose a novel system for the robust and accurate reconstruction of hands and objects from synchronized and calibrated multi-view videos without requiring any templates or markers. Our system consists of two main components with key innovations: (1) a multi-view feed-forward transformer model that aggregates cross-view geometry and temporal cues to provide a reliable, metric-consistent initialization for both poses and dense object geometry, and (2) a hand-object physics-aware Gaussian-based optimization framework to refine the initial estimates, integrating tetrahedral constraints, collision refinement, and appearance decomposition to produce physically plausible and visually accurate reconstruction. Validated on public benchmarks and an extensive internal dataset, our pipeline achieves highly robust, artifact-free reconstruction, providing an efficient foundation for automated 4D asset generation. Our project page are available at https://zyshen021.github.io/HOSTPG/.

21.
medRxiv (Medicine) 2026-06-22

Survival differences and artemisinin resistance in severe malaria among HIV coinfected patients: data from Mozambique

Abstract Background Malaria remains a significant cause of morbidity and mortality, especially in sub-Saharan Africa, where rates of HIV coinfection are high. This study aimed to determine whether Plasmodium falciparum malaria treatment outcomes and rates of antimalarial resistance markers differ according to HIV serostatus in Mozambique. Methodology We conducted an observational study of non-pregnant adults, with and without HIV coinfection, admitted to the Hospital Central de Maputo for treatment of severe malaria. Plasmodium falciparum DNA was extracted from whole blood and sequenced to identify single-nucleotide polymorphisms. Statistical analyses to compare clinical outcomes and rates of nonsynonymous mutations in genes associated with drug resistance were performed in R version 4.2. Results We recruited 149 study participants aged between 18-62 years, 72 (48.3%) were female, and 59 (39.6%) were infected with HIV. Comparing clinical outcomes, we found a significant difference in anemia (hemoglobin

22.
medRxiv (Medicine) 2026-06-16

Physiological Aging of the Respiratory System (PARS): from development to application

Background: Aging has a critical role in lung changes and the outcome of lung disease. Several lung aging equations have been proposed to measure deviation from physiological aging of the respiratory system. In this study, we aimed to develop a single measure of accelerated lung aging and show its application as a measure of lung aging. Method: We used a pre-bronchodilator pulmonary function test (PFT) from NHANES adult participants recruited from 2007 to 2011. We applied Klemera-Dubal Method (KDM) to four PFT measurements, FEV1, FVC, FEF25-75, and PEF, to calculate a measure of lung biological aging. Physiological Aging of the Respiratory System (PARS) was calculated from the residual method vs. chronological age. We tested the construct validity of PARS by measuring its association with risk factors of lung health. The prognostic validity was measured using a survival analysis. Sampling weights were applied to all analyses. Results: In 14,123 adult participants, the mean (SD) of accelerated lung age (PARS) was 0 (8.2) years. Participants with a history of asthma and emphysema had 4- and 10-year higher PARS. Cigarette smoking, lower socioeconomic status, black race, higher serum cadmium, and lower serum selenium and magnesium were associated with higher PARS. During 116 months of follow-up, PARS was associated with a higher mortality (HR = 1.06, 95%CI: 1.05-1.07 per year). Females with higher PARS had a higher risk of death (P for interaction < 0.001). Results were consistent across different subgroups and sensitivity analyses. Conclusion: PARS is a noninvasive lung aging marker and can be applied as a single measure of lung accelerated aging in the adult population. Its strong construct and predictive validity support its future application among different populations with and without lung disease.

23.
arXiv (CS.AI) 2026-06-16

AI Pluralism and the Worlds It Misses

arXiv:2606.16167v1 Announce Type: new Abstract: AI pluralism is often framed as a problem of representing diverse values, preferences, users, or outputs. This paper argues that this framing is incomplete because AI systems also impose ontologies: they define what counts as an entity, relation, feature, harm, benefit, and valid form of evidence. We define ontological flattening as the conversion of situated, contested, and historically specific meanings into a restricted technical category, proxy, aggregation rule, or benchmark target that is treated as neutral and difficult to contest. The paper develops a bounded conceptual and qualitative synthesis across value pluralism, pluralistic alignment, participatory and democratic AI, procedural justice, science and technology studies, accountability research, aggregate themes from 11 expert interviews, and three urban AI companion cases. The cases illustrate how pluralistic methods can improve or structure model behavior while still compressing categories, proxies, aggregation rules, and revision rights before affected actors have procedural standing. We introduce Pluralistic Lifecycle Governance (PLG) as a preliminary qualitative audit scaffold for documenting ontological openness, epistemic inclusion, procedural authority, evaluation pluralism, and lifecycle accountability. PLG is not presented as a validated scoring instrument; it is a framework for making the evidence and governance conditions of pluralistic AI explicit.

24.
arXiv (CS.CL) 2026-06-16

Entropy-Aware On-Policy Distillation of Language Models

On-policy distillation is a promising approach for transferring knowledge between language models, where a student learns from dense token-level signals along its own trajectories. This framework typically uses reverse KL divergence, encouraging the student to match the teacher's high-confidence predictions. However, we show that the mode-seeking property of reverse KL reduces generation diversity and yields unstable learning signals when the teacher distribution has high entropy. To address this, we introduce Entropy-Aware On-Policy Distillation. Our key idea is augmenting the standard reverse KL objective with forward KL when teacher entropy is high, capturing the full range of plausible outputs while retaining precise imitation elsewhere. It balances mode-seeking precision with mode-covering robustness without sacrificing on-policy training efficiency. Experiments show that our method maintains generation diversity (sustained token-level entropy) and improves student-teacher alignment (lower forward KL on high-entropy tokens). Across six math reasoning benchmarks, this yields Pass@8 accuracy gains of +1.37 for Qwen3-0.6B-Base, +2.39 for Qwen3-1.7B-Base, and +5.05 for Qwen3-4B-Base compared to baseline on-policy distillation methods. These results demonstrate that accounting for teacher uncertainty is essential for maintaining diversity and achieving effective knowledge transfer.

25.
bioRxiv (Bioinfo) 2026-06-18

Calculation of sequence space coverage in a mutagenesis library

Directed evolution requires screening of large mutagenesis libraries, but accurate calculation of library sizes needed to discover functional variants remains challenging. Existing models provide baseline estimates, yet current computational approaches for finding the best variants scale poorly with library complexity. Here, we introduce a scalable algorithmic framework to compute exact discovery probabilities in saturation mutagenesis libraries with no requirement for explicit sequence enumeration. By aggregating variants into a composition log–sum distribution and applying log-space convolution across randomisation blocks, it is possible to extend this to massive sequence spaces and mixed codon schemes. By inverting these calculations, absolute mathematical ceilings for experimental design are established. Ultimately, this framework provides a rapid, quantitative tool to balance the statistical coverage-diversity trade-off within the limitations of laboratory screening. Finally, this is implemented as an open-source web application (SSCC) that allows researchers to construct heterogeneous library designs and compute required sampling depths, coverage probabilities, and absolute randomisation limits.