Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-12

When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

arXiv:2606.08098v2 Announce Type: replace Abstract: Majority voting over sampled answers is the dominant unsupervised aggregator for multi-sample LLM inference. In this paper, we show a delegation-based aggregator (Propagational Proxy Voting, PPV; Sakai et al., 2025) yields an unsupervised consensus rule that beats majority on MMLU-Pro by +1.5 pp overall and +2.24 pp on the non-trivial subset (paired McNemar p ~ 1.0e-14, n = 8,099). Majority discards two signals that every sample carries: within-group letter entropy and between-group reasoning geometry. PPV exposes per-voter levers that consume exactly these two signals: When (how much weight a voter keeps on its own pick) and Whom (how it splits the remainder across peers). We drive When with letter entropy and Whom with per-question-centered embedding cosine. Our method needs no gold labels and no auxiliary training: per-question, we partition 128 sampled generations into 16 groups, compute each group's letter-level semantic entropy and reasoning embedding centroid, and feed both into a stochastic delegation matrix whose stationary distribution selects the consensus answer. We walk through an example in which PPV overturns a clear 10-6 majority for the wrong letter: the 10-voter majority cluster is geometrically incoherent (mean within-cluster cosine -0.02) while the 6-voter minority is tight (+0.26), so propagated delegation mass concentrates on the minority's answer even though entropy alone would keep the majority ahead. We further report delegation strategies with negative results that constrain the design space for unsupervised LLM aggregation. No within-question ensemble of confidence modes closes the oracle gap.

02.
arXiv (quant-ph) 2026-06-12

More efficient Clifford+T synthesis for small-angle rotations and application to Trotterization

arXiv:2605.31544v2 Announce Type: replace Abstract: Clifford+T synthesis of rotation gates is an important routine in fault-tolerant quantum compilation. While Clifford+T synthesis is scalable, it has a high overhead of tens of T gates per rotation in practice, translating to high resource estimates for many fault-tolerant algorithms. However, these well-known results, including those using probabilistic mixtures [Quantum 7, 1208 (2023)], are independent of the rotation angle $\theta$, requiring $O(\log 1/\delta)$ T gates. We show that it is possible to do much better for small angles, reducing the T cost to $\tilde{O}(\theta^2/\delta)$, and returning to existing $O(\log1/\delta)$ results in the worst case. This is particularly important since many algorithms, such as Trotterization, are dominated by small-angle rotations. Further, we perform a detailed theoretical and numerical study of quasi-probabilities, which can further reduce the total T cost of large circuits by orders of magnitude with only a small overhead in sample complexity. We also develop a scheme based on quasi-probability mixtures of Clifford+T fallback channels. We derive new $\theta$-dependent formulas that can be used for resource estimation of fault-tolerant quantum algorithms. As an application of our results, we show that the gate cost of Trotterization circuits compiled to a Clifford+T gate set is constant in the small Trotter step size limit, and can be reduced by orders of magnitude even for large step sizes. The cost of fault-tolerant Trotterization for a variety of applications should be re-examined in light of these results. Our work dispels the widely-stated claim that Clifford+T rotation synthesis has a high cost independent of $\theta$, and further develops a scalable quasi-probability method for rotation synthesis. We also expect our results to bring forward useful early fault-tolerant quantum computing by reducing required magic state resources.

03.
PLOS Medicine 2026-06-12

Comparison of count-based and clustering definitions of multimorbidity and their association with prevalence of multimorbidity, health profiles, and mortality: A cohort study of UK Biobank participants

by Gabriella C. Silva, Aurore Fayosse, Louis Jacob, Séverine Sabia, Archana Singh-Manoux, Benjamin Landré Background Multimorbidity, the presence of several chronic conditions, is linked to higher mortality and healthcare use and thus poses a major challenge for aging populations. While most studies rely on simple counts of conditions, clustering approaches have been proposed to describe patterns of co-occurring diseases. We aimed to evaluate the extent to which these methodological choices influence prevalence and association with health profiles and mortality. Methods and findings Using UK Biobank baseline data (n = 474,397), collected between 2006 and 2010, we compared six count-based definitions of multimorbidity based on different condition lists (extended, most prevalent, or body systems) and thresholds (≥2 versus ≥3 conditions). We also applied a clustering analysis to characterize subtypes of multimorbidity among participants with at least two chronic conditions. We compared prevalence and associations with concurrent health outcomes (polypharmacy, self-rated health, frailty, falls, surgery, chronic pain), blood-based measures (C-reactive protein, Cystatin-C, HDL, LDL Cholesterol, IGF-1), and 3- and 10-year mortality risks. Analyses were undertaken separately in men and women using multivariable regression models adjusted for sociodemographic characteristics and body mass index. Multimorbidity prevalence ranged from 1.0% (cluster-based) to 35.3% (count-based). Count-based definitions using lists with more conditions yielded higher prevalence. Higher thresholds identified more severe health profiles on all measured health outcomes, blood-based measures, but not higher mortality risks. Associations with blood-based measures were more pronounced using clustering, with the highest differences from the standard definition distributed across clusters. Odds ratios for 3-year mortality ranged from 1.44 [1.26; 1.64] to 4.60 [3.73; 5.62] for men and 1.35 [1.07; 1.69] to 3.83 [2.78; 5.14] for women. For 10-year mortality, they ranged from 1.42 [1.34; 1.50] to 3.86 [3.46; 4.30] in men and 1.29 [1.21; 1.39] to 3.33 [2.93; 3.77] for women, with clustering identifying groups with low prevalence and high mortality risks. Findings should be interpreted in light of the selected nature of the UK Biobank cohort and the cross-sectional assessment of several health indicators. Conclusion Operational definitions of multimorbidity substantially influence prevalence estimates, while associations with mortality appear more robust across count-based approaches. Clustering analyses provide complementary insights into heterogeneity within multimorbid populations. Future translational studies are warranted to determine how multimorbidity definitions can be optimized to ultimately improve clinical management and health outcomes in practice.

04.
arXiv (CS.CV) 2026-06-18

BindEdit: Taming Attention Leakage for Precise Multi-Object Image Editing

Real image editing enables precise manipulation of visual content, yet existing methods often fail in complex multi-object scenarios, causing semantic blending, object duplication, or incomplete edits. We attribute these failures to attention leakage, where signals across spatial regions and text tokens become entangled during the denoising process. Specifically, we identify two distinct forms of leakage: Edit-Token Leakage, where ambiguous token-region alignment leads to object blending, and Source Dominance Leakage, where tokens of unchanged source objects overwhelm the attention intended for target entities. To resolve these leakages, we propose BindEdit, which enforces attention-level constraints within a single diffusion trajectory. To suppress Edit-Token Leakage, BindEdit jointly regularizes cross- and self-attention so that each target token group is bound to its corresponding spatial region while maintaining instance-level separation. To suppress Source Dominance Leakage, a cross-attention re-balancing mechanism amplifies target token influence and attenuates residual source semantics within editable regions. Moreover, a region fidelity term ensures that each target concept is expressed coherently across the entire editing mask. Additionally, we propose a comprehensive multi-object benchmark encompassing diverse object counts and categories. Extensive experiments demonstrate that BindEdit consistently outperforms existing methods within a single diffusion trajectory, maintaining robust performance across both single- and multi-object editing scenarios.

05.
arXiv (quant-ph) 2026-06-12

Coulomb crystallization of xenon highly charged ions in a laser-cooled Ca+ matrix

arXiv:2512.12266v2 Announce Type: replace-cross Abstract: We report on the sympathetic cooling and Coulomb crystallization of xenon highly charged ions (HCIs) with laser-cooled Ca$^+$ ions. The HCIs are produced in a compact electron beam ion trap, then charge selected, decelerated, and finally injected into a cryogenic linear Paul trap. There, they are captured into $^{40}$Ca$^+$ Coulomb crystals, and co-crystallized within them, causing dark voids in their fluorescence images. Fine control over the number of trapped ions and HCIs allows us to realize mixed-species crystals with arbitrary ordering patterns. By investigating Xe$^{q+}$–Ca$^+$ strings, we confirm the HCI charge states, measure their lifetime and characterize the mixed-species motional modes. Our system effectively combines the established quantum control toolbox for Ca$^+$ with the rich set of atomic properties of Xe highly charged ions, providing a resourceful platform for optical frequency metrology, searches for signatures of new physics, and quantum information science.

06.
arXiv (CS.CL) 2026-06-16

How Far Can Machine Translation Quality Take You? Extrinsic Discourse Evaluation in Goal-Oriented Setups

Existing machine translation (MT) metrics and discourse-focused evaluations primarily assess translation quality intrinsically, without measuring the downstream consequences of translation errors. In this work, we focus on extrinsic discourse evaluation of machine translation under two distinct regimes: static and interactive. Under the static regime, we propose an entity counting task as a probe of referential consistency in discourse. We show that high intrinsic MT quality does not reliably predict downstream discourse success and strong MT systems still produce referential inconsistencies. For the interactive regime, we study the goal-oriented multi-agent Welfare Diplomacy game as a probe of long-horizon communication and coordination. We find that interaction-specific translation failures impact downstream coordination. Our results highlight goal-oriented environments as a viable framework for discourse-sensitive extrinsic MT evaluation.

07.
arXiv (CS.CV) 2026-06-16

Polyp-D2ATL: Deep Domain-Adaptive Transfer Learning for Colorectal Polyp Classification under Label Distribution Shift

Early and highly accurate prediction of colorectal polyps, as an important sign of one of the most dangerous types of cancer, will result in saving more lives. Despite the advancements in colorectal polyp classification, many challenges remain in obtaining an automated polyp prediction system that is able to diagnose the difficult-to-predict polyps accompanied by different features in real scenarios, where the model can handle imbalanced data, label distribution shift, and cross-modality generalization successfully. In this study, we propose Polyp-D2ATL, a novel framework accompanied by a specific training strategy, which mitigates these limitations and effectively predicts the different classes of polyps belonging to the NICE classification. Our extensive experiments on the PICCOLO validation and test sets demonstrate that the proposed Polyp-D2ATL significantly outperforms existing state-of-the-art models across various reliable metrics, achieving an accuracy of 82.38%, a Macro-F1 of 77.49%, and a specificity of 87.47% on the validation set, alongside consistent improvements on the held-out test set which demonstrates the generalization capacity and clinical applicability of the proposed approach.

08.
arXiv (CS.LG) 2026-06-17

Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks

arXiv:2606.17120v1 Announce Type: new Abstract: Deep neural networks (DNNs) exhibit first order phase transitions under variations of the L2 regularization strength, with each transition marking the onset of a new learnable feature. Below a critical regularization strength, all features are in principle learnable, but coexisting metastable states, separated by energy barriers, can trap the network and impede convergence. A strength of DNNs is their ability to generalize. But many open questions remain, among them the origin of so called grokking: the abrupt, delayed onset of generalization after prolonged apparent overfitting. We show for linear DNNs that grokking is consistent with hysteresis in first-order L2 phase transitions: using L2 regularization to engineer deliberate trapping, we demonstrate that a model in a low-accuracy metastable state escapes only when SGD noise drives it across an energy barrier, with escape times following Arrhenius scaling. We reproduce grokking-like delayed convergence across two orders of magnitude in escape time by deliberately trapping models in metastable phases. Using sparse sub-sampling we also reproduce the canonical grokking curve where test error eventually approaches the final training error. Our work suggests that the number of metastable states equals the number of learnable features – one per singular value of the data covariance – the potential for hysteresis grows naturally with task complexity. We provide evidence that the same mechanism likely operates in general nonlinear DNNs. Our results provide routes toward more efficient learning schemes.

09.
arXiv (CS.AI) 2026-06-19

Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning

arXiv:2606.19771v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced Large Language Model (LLM) reasoning; however, it faces a fundamental optimization instability: uniform token updates precipitate entropy collapse, leading to premature convergence to suboptimal strategies, whereas excessive Shannon Entropy maximization can cause entropy explosion, driving blind exploration toward incoherent reasoning chains. To resolve this dichotomy, we introduce the Independent Combinatorial Tokens (ICT) framework, which shifts the optimization focus from scalar uncertainty to the distributional properties of token logits. By leveraging the Jensen-Shannon (JS) divergence between token logits distributions, ICT identifies tokens with distinctive distributional patterns as critical branching points for guiding effective exploration in LLM reasoning. Our theoretical analysis, grounded in both Shannon and second-order Rényi entropy, proves that selectively updating on these tokens regulates policy concentration: it reduces the overall distribution uncertainty measured by Shannon entropy, while controlling probability concentration captured by second-order Rényi entropy. This dual effect prevents over-concentrated token generation from weakening exploration and effectively stabilizes the training landscape. Empirical results demonstrate that updating only the top 10% of unique tokens on Qwen2.5 (0.5B/1.5B/7B) models yields an average pass@4 improvement of 4.58%, with a maximum gain of 14.9%, over GRPO, 20-Entropy, and STAPO baselines across seven benchmarks spanning math, commonsense, and Olympiad-level problems.

10.
arXiv (CS.AI) 2026-06-19

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

arXiv:2509.15927v5 Announce Type: replace-cross Abstract: Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static dataset with feedback. To address this, we propose AIGB-Pearl (Planning with \textbf{EvaluAtor via RL}), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to assess the quality of generated scores and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.

11.
arXiv (CS.CV) 2026-06-17

MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation

While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Furthermore, existing mitigation pipelines often face an intervention paradox: static rules tend to unnecessarily disrupt accurate generations, whereas leaving the multi-modal reasoning completely unguided allows existing mismatches to cascade into severe logical fabrications. To quantify and mitigate these hallucinations, we propose a Multi-Agent system, MODE-RAG, driven by Variational Free Energy (VFE) and internal attention states to dynamically gate interventions. High-risk queries are routed to five stage-specific agents, integrating Monte Carlo Tree Search (MCTS) for rigorous causal derivation and logit perturbations to penalize sycophancy. Dedicated Correction and Overseer agents ensure formatting stability and perform post-hoc factual verification. To objectively evaluate our approach, we introduce ModeVent, a challenging subset derived from the MultiVent dataset. Extensive experiments indicate that our system effectively reduces hallucination rates and logical fabrication, significantly improving the robustness of M-RAG systems.

12.
arXiv (CS.LG) 2026-06-17

From Drift to Coherence: Stabilizing Beliefs in LLMs

arXiv:2606.17832v1 Announce Type: new Abstract: Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.

13.
bioRxiv (Bioinfo) 2026-06-16

Integrative Transfer Network: Deep Transfer Learning Across Populations and Prediction Targets

Authors:

Large-scale clinical and biomedical datasets increasingly contain both diverse subgroup attributes (e.g., demographic or clinical subgroups) and multiple prediction targets. Although various machine learning approaches can address subgroup differences or multi-target prediction, they often consider these aspects independently rather than jointly. To more effectively capture the shared and subgroup-specific information in such complex datasets, we propose the Integrative Transfer Network (ITN), a deep neural network designed to leverage data across subgroups and multiple related outcomes simultaneously. In extensive experiments, including time-to-event and classification tasks where demographic subgroups and multiple disease endpoints are prevalent, ITN demonstrates consistent improvements in subgroup-specific prediction by borrowing strength from other subgroups and outcomes. We envision ITN as a unified framework for learning from heterogeneous datasets where subgroup-specific insights are critical.

14.
arXiv (CS.CV) 2026-06-19

SurgVista: Long-Horizon Surgical World Modeling with Plausible Instrument-Tissue Dynamics

Scaling robot policy learning for autonomous surgery is challenging, as expert demonstrations are expensive and in vivo exploration poses substantial safety risks. Surgical world models address this by generating realistic, action-conditioned future frames from an initial observation, but existing methods exhibit two persistent failure modes: spatial interaction incoherence, where visible instrument contact fails to induce spatially consistent tissue deformation, and temporal fidelity collapse, where prediction errors compound across autoregressive rollouts and progressively corrupt visual quality. We present SurgVista, a surgical world model that mitigates both failures through two training recipes. Deformation Consistency Regularization extracts scene-point trajectories from training videos and enforces cross-frame coherence through latent contrastive learning, strengthening physically consistent instrument-tissue dynamics. Drift Adaptation Training mitigates long-horizon drift by perturbing conditioning frames with online prediction residuals and photometric augmentations calibrated to long-horizon drift statistics, sustaining visual fidelity over extended rollouts. To enable rigorous evaluation, we further introduce SurgWorld-Bench, featuring diverse procedure types, long-range rollouts, and decoupled metrics for instrument-motion accuracy and tissue-response fidelity. Extensive experiments show that SurgVista consistently outperforms state-of-the-art methods across visual quality, temporal consistency, and interaction fidelity, with gains widening as the prediction horizon grows.

15.
arXiv (CS.CV) 2026-06-15

Aligned but Stereotypical? How System Prompts Shape Demographic Bias in LLM-Based Text-to-Image Models

Text-to-image (T2I) systems increasingly rely on Large Language Model (LLM)-based text conditioning to interpret and expand user prompts. While this improves prompt understanding and text-image alignment, we find that it can also introduce implicit demographic assumptions, even when demographic attributes are unspecified. To systematically investigate this behavior across varying levels of prompt ambiguity and complexity, we construct a comprehensive benchmark covering diverse prompt settings. Evaluations on eight recent T2I models show that LLM-based systems consistently exhibit stronger demographic skew than non-LLM-based baselines. We further analyze system prompts, a component unique to LLM-based T2I systems that guides prompt interpretation and expansion. Our analyses show that these instructions strongly influence text embeddings, which subsequently leads to biased image generations. Motivated by these findings, we propose FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions while preserving user intent. Experiments demonstrate that FairPro substantially reduces demographic disparities while maintaining prompt fidelity.

16.
Nature (Science) 2026-06-10

Daily briefing: Ancient ground squirrels ate like ‘zombies of the Pleistocene’

Authors:

Evidence from fossilized poo reveals the diverse diet of ancient ground squirrels. Plus, the science behind the peptide craze and our innate tendency to wander anticlockwise. Evidence from fossilized poo reveals the diverse diet of ancient ground squirrels. Plus, the science behind the peptide craze and our innate tendency to wander anticlockwise.

17.
medRxiv (Medicine) 2026-06-18

MOSAIC: Methylation-Oriented Site Analysis and Information Classifier for Robust Epigenomic Classification of Acute Leukemia in Clinical Cohorts with Variable Tumor Purity

DNA methylation-based classification offers a rapid diagnostic complement to conventional molecular workflows in acute leukemia. Existing classifiers are trained on array-derived reference cohorts whose construction favors specimens with adequate tumor content, leaving clinically relevant low-purity specimens underrepresented and classifier robustness in this regime uncharacterized. On held-out low-purity specimens, existing classifiers were concordant with expert pathology in only 7 of 10 (MARLIN) and 5 of 10 (ALMA) cases, motivating a classifier built to maintain accuracy at low tumor purity. We developed MOSAIC (Methylation-Oriented Site Analysis and Information Classifier), a neural network classifier built to maintain accuracy across the full range of tumor purities encountered in clinical practice. MOSAIC is a neural network trained on publicly available array-based methylation data augmented with native methylation calls from Oxford Nanopore sequencing. MOSAIC was evaluated on low-purity specimens held out entirely from training. On these held-out low-blast leukemia specimens, all below 25% blasts and including a case at 1.4%, MOSAIC was concordant with expert pathology in every case, recovering the correct subtype where diluted disease signal would otherwise be mistaken for normal or unrelated tissue. Gradient-based saliency analysis showed that the network relies on a partially distinct set of discriminative CpG probes when classifying low-blast specimens. MOSAIC demonstrates that augmenting training with clinically representative clinical specimens yields methylation-based leukemia classification that maintains effectiveness under the variable tumor purity of real clinical cohorts.

18.
arXiv (CS.AI) 2026-06-17

Belief-Space Control for Personalized Cancer Treatment via Active Inference

arXiv:2606.10376v2 Announce Type: replace Abstract: Cancer treatment is at the core a sequential decision-making problem with partial observability, latent patient heterogeneity, and explicit constraints on the budget for medical measurements. Unlike standard Reinforcement Learning (RL) approaches that control state trajectories, cancer treatments permanently modify patients' transition dynamics, changing how states evolve over time. We model cancer treatment as a belief-space planning problem using active inference, deriving an expected free-energy objective that unifies goal-directed control and information acquisition under measurement budgets without. We implement this framework using real clinical cancer data from the AACR Project GENIE Biopharma Collaborative dataset. Results on clinical data demonstrate a simultaneous patient categorization and high treatment efficacy, under real measurement and treatment constraints.

19.
arXiv (CS.LG) 2026-06-19

Representing Piecewise-Linear Functions by Functions with Minimal Arity

arXiv:2406.02421v2 Announce Type: replace-cross Abstract: Any continuous piecewise-linear function $F\colon \mathbb{R}^{n}\to \mathbb{R}$ can be represented as a linear combination of $\max$ functions of at most $n+1$ affine-linear functions. In our previous paper [``Representing piecewise linear functions by functions with small arity'', AAECC, 2023], we showed that this upper bound of $n+1$ arguments is tight. In the present paper, we extend this result by establishing a correspondence between the function $F$ and the minimal number of arguments that are needed in any such decomposition. We show that the tessellation of the input space $\mathbb{R}^{n}$ induced by the function $F$ has a direct connection to the number of arguments in the $\max$ functions.

20.
arXiv (CS.AI) 2026-06-11

Offline Diffusion Policy for Multi-User Delay-Constrained Scheduling

arXiv:2501.12942v2 Announce Type: replace Abstract: Effective multi-user delay-constrained scheduling is crucial in various real-world applications, including embodied AI, instant messaging, live streaming, and data center management, where efficient resource allocation is required among users with diverse delay sensitivities. In these scenarios, schedulers must make real-time decisions to satisfy both delay and resource constraints without prior knowledge of system dynamics, which are often time-varying and challenging to estimate. {Current learning-based methods typically require online interactions with actual systems during the training stage. Therefore, these approaches are often difficult or impractical, as they can significantly degrade system performance and incur substantial service costs.} To address these challenges, we propose a novel offline reinforcement learning-based algorithm, named \underline{S}cheduling By \underline{O}ffline Learning with \underline{C}ritic Guidance and \underline{D}iffusion Model (SOCD), to learn efficient scheduling policies purely from pre-collected offline data. SOCD innovatively employs a diffusion policy, complemented by a sampling-free critic network for policy guidance. By integrating the Lagrangian multiplier optimization into the offline reinforcement learning, SOCD efficiently trains high-quality constraint-aware policies exclusively from available datasets, eliminating the need for online interactions with the system. Experimental results demonstrate that SOCD is resilient to various system dynamics, including partially observable and large-scale environments, and delivers superior performance compared to existing methods.

21.
medRxiv (Medicine) 2026-06-16

The Target48 Neurodegeneration Panel: A Novel Tool for Profiling Protein Signatures in Neurodegenerative Disorders

Introduction: Novel tools for absolute quantification of established and emerging fluid neuro-biomarkers are required to advance diagnostic studies and improve biological insights. Methods: We conducted an extensive analytical and clinical validation of the Olink Target 48 Neurodegeneration panel (T48 Neuropanel) in 352 paired CSF and plasma samples from cognitively unimpaired controls (CU), Alzheimer dementia (AD), frontotemporal dementia (FTD), and dementia with Lewy bodies (DLB), n=44 per group. Comparisons with benchmark assays were performed. Results: Good detectability (CSF: 31 out of 42 assays; plasma: 38 out of 42 assays) and technical performance was observed. Benchmark assays showed good correlations, supporting method transformation formulas. Next to emerging biomarkers (MMP10, ITGB2), discriminative performance was excellent in AD: CSF pTau217: AUC=1; FTD: plasma NfL: AUC=0.952; and DLB: CSF DDC: AUC=0.901. Discussion: This analytical and clinical validation of the T48 Neuropanel highlights initial cut-offs and emerging biomarkers to aid clinical studies for the diagnosis, prognosis, and monitoring of neurodegenerative diseases. Highlights: The T48 Neuropanel shows robust analytical performance, with high detectability across both plasma and CSF matrices. The T48 Neuropanel validates established (i.e., pTau217, Abeta42, NfL, and GFAP) and emerging biomarkers (i.e., DDC, MMP10, ITGB2, ITGAM, NPTX2, NPTXR, SMOC1, sTREM1, and sTREM2) in CSF and plasma. CSF NfL, GFAP, ITGB2, and ITGAM and plasma GFAP were dysregulated across AD, FTD, and DLB dementias. -The multiplex design of the T48 Neuropanel enables rich biological interpretation by simultaneously quantifying established and emerging neurodegeneration biomarkers. Importantly, the inclusion of absolute quantification facilitates the establishment of cut-offs, supporting its potential for clinical translation.

22.
arXiv (CS.LG) 2026-06-16

The Algebra of Units: From Buckingham's Pi-grec Theorem to Latent-Variable Learning

arXiv:2606.16737v1 Announce Type: cross Abstract: Engineers often measure many quantities-speed, pressure, temperature, length-expressed in different physical units. The Buckingham Pi-grec theorem states that these variables can always be combined into a smaller set of dimensionless numbers whose values fully determine the system's behaviour. Identifying the appropriate dimensionless groups has traditionally required expert knowledge and physical insight. This paper shows that they can instead be discovered automatically from data, without prior knowledge of the governing physics. The key observation is that, after logarithmic transformation, measurements collected under different scalings of the same system lie on a low-dimensional manifold whose geometry is determined by the underlying dimensionless groups. Singular value decomposition (SVD) identifies this manifold directly from data. A subsequent search over integer-exponent combinations recovers candidate dimensionless quantities, while a repeating-variable filter retains only those constructed from the machine's characteristic scales. This procedure recovers familiar engineering groups, including the flow coefficient, head coefficient, and Mach number, while excluding equivalent but less interpretable alternatives. The method is demonstrated on a synthetic compressor dataset containing 16,000 measurements. Starting from raw dimensional variables and no physics input, it recovers the correct dimensionless groups to numerical precision and reproduces the compressor performance map with an error below 0.01%. More broadly, the work reveals a close connection between classical dimensional analysis and modern data-driven learning. Both rely on the same underlying algebraic structure, suggesting new approaches for building physical models that are simultaneously interpretable, scalable, and data-efficient.

23.
arXiv (CS.CV) 2026-06-12

Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking

Improving vision-language models (VLMs) in the post-training stage typically relies on supervised fine-tuning or reinforcement learning, methods that necessitate costly, human-annotated data. While self-supervised techniques have proven effective for enhancing reasoning capabilities, their application to perceptual domains such as image quality assessment (IQA) remains largely unexplored. In this work, we introduce EvoQuality, a novel framework that enables a VLM to autonomously refine its quality perception capabilities without any ground-truth labels. EvoQuality adapts the principle of self-consistency to the ranking-based nature of IQA. It generates pseudo-labels by performing pairwise majority voting on the VLM's own outputs to establish a consensus on relative quality. These pseudo-rankings are then formulated into a fidelity reward that guides the model's iterative evolution through group relative policy optimization (GRPO). By iteratively leveraging its own predictions, EvoQuality progressively refines the VLM's perceptual capability. Extensive experiments show that EvoQuality boosts the base VLM's zero-shot performance by 31.8% on PLCC across diverse IQA benchmarks. Remarkably, despite being entirely self-supervised, EvoQuality achieves performance that is competitive with, or even surpasses, state-of-the-art supervised VLM-based IQA models, outperforming these models on 5 out of 7 IQA benchmarks. Furthermore, the framework demonstrates significant flexibility, allowing it to be stacked with pre-trained IQA models to bolster generalization on unseen datasets. Codes and checkpoints will be available at https://github.com/bytedance/EvoQuality.

24.
arXiv (CS.AI) 2026-06-16

Mask-Proof: An LLM-based Automated Data Curation Pipeline on Mathematical Proofs

arXiv:2606.15258v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly capable of mathematical problem solving and can even assist with research-level proofs, yet we still lack a scalable and reproducible way to measure step-level reasoning in long proofs across diverse sources. This evaluation gap limits trustworthy AI assistance in proof-certified scientific progress. Existing evaluations often emphasize final answers or rely on costly expert grading, while end-to-end proof generation remains open-ended and hard to verify automatically. We introduce Mask-Proof, a pipeline that turns real proofs into automatically checkable masked-step tasks. It masks key formula steps, provides the necessary surrounding context, and evaluates model reconstructions with an LLM-based equivalence judge using repeated votes for stability. The resulting Mask-ProofBench contains 292 curated problems across diverse research areas. Experiments with 17 models show that reasoning-enhanced models outperform standard models by 12% to 27%. Our evaluator achieves 96.8% agreement with expert annotators, enabling faithful, reproducible, and comparable measurement of step-level mathematical reasoning. Benchmark, annotations, and code are available at https://github.com/weating/Mask-Proof.

25.
arXiv (CS.CV) 2026-06-16

Hierarchical Fine-Grained Aerial Object Detection

Fine-grained aerial object detection, driven by the intrinsic granularity of real-world object categories, is crucial for advanced scene understanding in remote sensing. Existing methods largely inherit the paradigm of coarse-grained object detection, relying solely on single-label supervision and thus struggling to distinguish model-level categories with subtle structural differences. However, for each specific model (e.g., Boeing 787), structured prior knowledge such as attributes and hierarchies offers discriminative semantics across multiple granularities. Motivated by this, we present ExpertDet, a scheme that incorporates expert-informed cues to enhance fine-grained aerial object detection. Specifically, we design Vision-aware Masked Attribute Modeling (VMAM), which aligns attribute semantics with visual structures by reconstructing randomly masked attributes from visual cues, enabling the detector to capture subtle structural distinctions. We further propose Hierarchical Visual Instance Promotion (HierVIP), which builds a visual prototype tree based on hierarchical relations and imposes taxonomy-aware constraints to preserve cross-level semantic continuity while enhancing category discrimination. Moreover, we curate a new fine-grained object detection benchmark for Precise recognition of model-specific Ships and Planes from aerial imagery, PSP, covering 106 ship classes and 30 airplane models, respectively, featuring the most extensive collection of model-specific categories among existing aerial object detection datasets to date. We benchmark state-of-the-art object detection algorithms on the PSP benchmark. Extensive evaluation demonstrates that ExpertDet consistently outperforms other fine-grained competitors across hierarchy levels. The dataset, benchmark, and code are available at https://nnnnerd.github.io/PSP-Benchmark/.