Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-16

Detecting Hate and Inflammatory Content in Bengali Memes: A New Multimodal Dataset and Co-Attention Framework

Internet memes have become a dominant form of expression on social media, including within the Bengali speaking community. While often humorous, memes can also be exploited to spread offensive, harmful, and inflammatory content targeting individuals and groups. Detecting this type of content is exceptionally challenging due to its satirical, subtle, and culturally specific nature. This problem is magnified for low-resource languages like Bengali, as existing research predominantly focuses on high-resource languages. To address this critical research gap, we introduce Bn-HIB (Bangla Hate Inflammatory Benign), a novel dataset containing 3,247 manually annotated Bengali memes categorized as Benign, Hate, or Inflammatory. Significantly, Bn- HIB is the first dataset to distinguish inflammatory content from direct hate speech in Bengali memes. Furthermore, we propose the MCFM (Multi-Modal Co-Attention Fusion Model), a simple yet effective architecture that mutually analyses both the visual and textual elements of a meme. MCFM employs a co-attention mechanism to identify and fuse the most critical features from each modality, leading to a more accurate classification. Our experiments show that MCFM significantly outperforms several state-of-the-art models on the Bn-HIB dataset, demonstrating its effectiveness in this nuanced task. To facilitate reproducibility and future research, the Bn-HIB dataset has been made publicly available through Mendeley Data. Warning: This work contains material that may be disturbing to some audience members. Viewer discretion is advised

02.
arXiv (math.PR) 2026-06-12

Data-driven subsampling rates for diffusion parameter estimation of SDEs

arXiv:2606.13615v1 Announce Type: new Abstract: We study the problem of diffusion parameter estimation for stochastic differential equation (SDE) models in scenarios where data and model are compatible only on specific scales that have yet to be determined. We introduce a simple and efficient method for selecting suitable rates at which given time series data should be subsampled in order to ensure that the statistical structure of the subsampled data is consistent with the behavior of the SDE model on an infinitesimal scale. Our approach is based on analyzing the statistics of the lengths of monotonically increasing or decreasing segments in the subsampled data sequence, which we refer to as monotone runs. As an analytical foundation, we prove for a large class of SDEs with additive noise that the lengths of monotone runs at an infinitesimal scale are approximately geometrically distributed with success probability $1/2$. This universal characterization is employed to derive an automated method for selecting appropriate subsampling rates for given time series data that is directly applicable in real-world scenarios and does not rely on an asymptotic framework of multiscale diffusions. The approach is demonstrated using an application from industrial mathematics concerning surrogate models for fiber lay-down curves in production processes of nonwoven textiles.

03.
arXiv (CS.LG) 2026-06-18

Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction

arXiv:2606.18503v1 Announce Type: new Abstract: Remaining useful life (RUL) estimation is central to predictive maintenance, where an unplanned failure can cost far more than the asset itself. Statistical degradation models miss the strong nonlinearity of real systems, and data-driven models often converge to suboptimal solutions in high-dimensional, non-convex search spaces. We propose a Quantum Annealing enhanced Q-Learning (QAQL) framework that couples the sampling behaviour of quantum annealing with the sequential decision making of Q-learning. Each Q-value update is encoded as a small quadratic unconstrained binary optimization (QUBO) whose ground state is the greedy action; rather than acting as a deterministic optimizer, the annealer returns a distribution over near-optimal actions across many reads, and this stochastic action selection supplies the exploration that curbs premature convergence on nonlinear degradation trajectories. The QUBO is solved on the D-Wave Advantage system using minor embedding, with the annealer woven into the reinforcement-learning loop rather than bolted on after training. We validate QAQL on two public benchmarks: the NASA C-MAPSS turbofan engine datasets and a device-fleet predictive maintenance dataset. Averaged over many independent runs and across six error metrics, QAQL outperforms the classical and quantum baselines considered in this study, with statistically significant improvements. The results indicate that quantum annealing is a usable, not merely theoretical, optimizer inside a reinforcement-learning loop for industrial predictive-maintenance applications.

04.
arXiv (CS.LG) 2026-06-18

Predicting the Neutrino Mass Ordering Using Neural Networks

arXiv:2606.03745v1 Announce Type: cross Abstract: Determining the neutrino mass ordering remains a central open problem in particle physics. While next-generation long-baseline experiments are expected to resolve this question, current data provide limited sensitivity because the spectral differences between normal and inverted ordering are subtle and entangled with parameter degeneracies. We investigate a machine-learning strategy for mass-ordering determination using a feed-forward neural-network classifier trained on synthetic long-baseline datasets generated with three-flavour oscillation probabilities, matter effects, and statistical fluctuations. We evaluate the classifier against standard $\chi^2$ and $\log\mathcal{L}$ approaches using common discrimination metrics, including receiver-operating-characteristic curves, to quantify sensitivity and to illustrate how operating points can be selected to prioritise purity or efficiency. We find that the neural network achieves performance comparable to conventional fits for the scenarios studied, providing a flexible, independent cross-check of established analyses. The framework can be extended to incorporate systematic uncertainties and to explore joint inference of oscillation parameters, and it may also serve as a pedagogical tool for introducing machine-learning methods in neutrino physics.

05.
arXiv (CS.CV) 2026-06-15

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

View-conditioned 3D generators such as SAM 3D, TRELLIS, and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. Stream3D achieves this by maintaining a compact evidential memory, which selectively caches the most informative historical frames based on a proposed evidence score mechanism. As the stream progresses, the memory dynamically updates to retain a fixed number of informative frames, preventing the memory footprint from growing linearly with sequence length. This also prevents degradation over long sequences and keeps the underlying generator completely unchanged without retraining, architectural modifications, or auxiliary losses. Evaluated on both realistic and synthetic streaming benchmarks, Stream3D outperforms latent-transport baselines, including KV-cache reuse and flow-based feature editing, across both photometric and geometric metrics. More details can be found at: https://stream-3d.github.io/stream3d.github.io/.

06.
arXiv (CS.CL) 2026-06-11

The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales

Spoken language, whether produced by humans or large language models (LLM), unfolds over time with varying semantic content. However, we still lack simple, interpretable time-series features that capture how generic versus specific content is distributed over time, and that can be used to compare human and AI-generated speech. We introduce a semantic-timescale analysis pipeline that turns word-level transcripts with timestamps into semantic time-series. For each spoken narrative, we compute (i) semantic specificity using WordNet-based word depth and (ii) contextual similarity using SBERT embeddings and quantify their temporal dependence using autocorrelation-window measures (ACW-0 and related metrics). We then compare original speech to multiple shuffled controls that selectively disrupt lexical identity, temporal order, and word duration. Across human-read autobiographical narratives, TTS readings, and LLM-generated texts rendered with TTS, we find that segments with longer ACW-0 in the semantic time-series tend to contain more generic vocabulary, whereas segments with shorter ACW-0 are enriched in more specific words. These associations are strongly attenuated or abolished when word order and timing are randomized, indicating that ACW-based measures capture non-trivial temporal organization of semantic content beyond static lexical distributions. Our results suggest that ACW-based semantic timescales are a useful family of features for analyzing and comparing the temporal structure of human and AI-generated speech.

07.
arXiv (CS.CV) 2026-06-16

Divide-and-Denoise: A Game-Theoretic Method for Fairly Composing Diffusion Models

The abundance of pre-trained diffusion models provides an opportunity for composition. Combining several models, however, runs the risk of one model dominating or models disagreeing with each other. Here, we propose Divide-and-Denoise, a method for coordinating multiple pre-trained diffusion models during sampling. Much like managing a specialized workforce, our method creates a fair but efficient division of labor across models. Central to our method is the notion of an allocation which defines the responsibility of each model to every region of the noisy sample. At every timestep, we then denoise by (i) updating the allocation by solving a fair division game, where we divide the sample into regions that maximize total utility under fairness constraints, and (ii) aligning the models with this allocation, where we guide each model to denoise within its assigned region. This leads to a new composite denoising process that evolves in tandem with a division process. We evaluate Divide-and-Denoise on conditional image generation. Across several quality metrics, including the GenEval benchmark, our method outperforms baselines and resolves common failures including missing objects and mismatched attributes. Experiments show that Divide-and-Denoise utilizes each model's expertise without neglecting any other model.

08.
arXiv (CS.AI) 2026-06-16

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

arXiv:2606.01561v2 Announce Type: replace Abstract: Aligning Large Language Models (LLMs) with human preferences is often formulated via Direct Preference Optimization (DPO). However, the standard Bradley-Terry instantiation of DPO is limited in modeling common departures from transitivity in human preferences. To address this, recent work has introduced Self-Play Preference Optimization (SPPO), which iteratively refines the policy by training on self-generated win-lose pairs. Our investigation, however, reveals a critical instability in SPPO: the optimization is prone to policy degeneration when the preference oracle assigns overly confident wins to semantically indistinguishable responses. To mitigate this, we propose S-SPPO, a dual-space semantic calibration framework comprising: i) Supervision Calibration via semantic gating, which anneals win rate targets toward the maximum-entropy baseline as semantic overlap increases; and ii) Representation Calibration via latent repulsion to enforce geometric diversity to prevent manifold collapse and maintain latent diversity between chosen and rejected samples. Theoretically, we show that the calibration preserves the constant-sum game structure, facilitating convergence to a Nash Equilibrium. Empirically, S-SPPO avoids the performance degradation seen in prior methods, achieving 52.19% win rate and 47.46% length-controlled win rate on AlpacaEval 2.0 with Llama-3-8B, without using additional human-annotated preferences during training. The code will be available at https://github.com/xiwenc1/s-sppo.

09.
arXiv (CS.CV) 2026-06-16

Variable-Rate Deep Image Compression based on Low-Rank Adaptation by Progressive Learning

In the digital age, image compression is crucial for numerous applications, including web media, streaming services, high-resolution medical imaging, and connected vehicle networks, enabling efficient data storage and transmission. With the increasing demand for high-quality image communication, the need for advanced compression techniques becomes increasingly critical. Numerous Deep Image Compression (DIC) techniques have recently been introduced, showing impressive performance compared to traditional standards. However, variable-rate image compression remains an unresolved issue. Specific DIC methods deploy multiple networks to attain different compression rates, whereas others use a single model, which often results in higher computational complexity and reduced performance. This work proposes a progressive learning approach for variable-rate image compression based on the parameter-efficient fine-tuning method, the Low-Rank Adaptation (LoRA). We introduce an additional LoRA Rate-Adaptive Module (LoRAM) in DIC methods. Due to the re-parameterized merging of LoRA, our proposed method does not introduce additional computational complexity during inference. Compared to methods utilizing multiple models, comprehensive experiments demonstrate that our approach achieves competitive performance, saving 99\% in parameter storage, 90% in datasets, and 97% in training steps.

10.
arXiv (CS.CV) 2026-06-16

Improved Baselines with Representation Autoencoders

Representation Autoencoders (RAE) replace traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using pretrained representation as encoder) replaces representation alignment (REPA), which distills the same representation to intermediate layers instead. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr6, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EPFID@k (epochs to reach unguided gFID < k) as a measure of training efficiency. RAEv2 attains an EPFID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. The code is available at https://raev2.github.io.

11.
medRxiv (Medicine) 2026-06-15

ICD-10 Code Ambiguity Obscures Treatment-Eligible Adults with Spinal Muscular Atrophy: A Single-Center Chart Review and Patient Outreach Study

Background. Three disease-modifying therapies (DMTs) for spinal muscular atrophy (SMA) have been approved since 2016, yet many adults remain untreated. Identifying them depends on ICD-10 codes that capture SMA but do not reliably distinguish it from other related conditions. We examined, in one U.S. health system, both patients' engagement with therapy and the accuracy of the codes used to find them. Methods. We conducted a retrospective chart review of adults in an academic health system identified by SMA-associated ICD-10 codes, with manual adjudication of diagnosis and DMT status. Confirmed SMA-positive, DMT-naive patients were invited to a structured telephone interview on treatment awareness and barriers. Results. Of 60 charts, 22 (36.7%; 95% CI 25.6-49.3%) were appropriately coded for SMA or a related disorder; only 16 (26.7%) had molecularly confirmed SMA. The other 38 (63.3%) were miscoded, spanning spinal and bulbar muscular atrophy, asymptomatic carriers, prenatal screening, and conditions unrelated to SMA. Ten of the 16 confirmed patients (62.5%) were DMT-naive; one was interviewed, one declined, and eight could not be reached. The non-response is itself a finding: the patients least visible to administrative data are the hardest to reach. Conclusions. ICD-10 ambiguity is a barrier to treatment access in adult SMA, as is loss to follow-up. We make two recommendations: continuous documentation-coding alignment that uses natural language processing to verify the genetic precondition, and type-specific SMA codes (subcodes for Types 0-4) anchored on molecular SMN1 confirmation. Together these would support cohort identification, outreach, and evidence generation without adding to clinician burden.

12.
arXiv (CS.AI) 2026-06-17

ARVO: Atlas of Reproducible Vulnerabilities for Open-Source Software

arXiv:2606.17283v1 Announce Type: cross Abstract: Achieving reproducibility, quantity, and diversity in vulnerability datasets has long been viewed as an inherent three-way trade-off, where improving one dimension often comes at the cost of the others. In practice, reproducibility has been the dimension most often neglected. This has limited what can be automatically extracted from historical bug datasets, and has reduced their utility for downstream security research. In this work, we propose a method to produce a new security dataset which ensures reproducibility for diverse vulnerabilities at scale by identifying the key obstacles to large-scale bug reproduction and addressing them with general solutions. Using this method, we introduce full reproducibility to the largest open source software vulnerability dataset (OSS-Fuzz) and construct the ARVO dataset (an Atlas of Reproducible Vulnerabilities in Open-source software). ARVO is a large-scale dataset consisting of over 6,100 real-world vulnerabilities across 311 projects. Focusing on reproducibility, ARVO differs from existing datasets by providing each vulnerability in a form that can be consistently rebuilt, triggered, and analyzed across versions. Reproducibility also enables automatic identification of the corresponding patch for each vulnerability and supports direct interaction with vulnerabilities after code changes, capabilities that existing large-scale datasets do not provide. In our evaluation, ARVO successfully reproduces 81% of vulnerabilities and achieves 89.4% accuracy on the located patches. We also discuss ARVO's influence on both upstream practices and downstream security research.

13.
arXiv (CS.CV) 2026-06-12

SmartFont: Dynamic Condition Allocation for Few-Shot Font Generation

Few-shot font generation simultaneously requires global structural completeness and fine-grained local style fidelity. Existing methods usually either rely on global content-style modeling, which is robust but imperfectly disentangled, or emphasize component/local modeling, which captures fine details but relies heavily on local priors and reference coverage. We argue that the key challenge is not merely to learn purer conditions, but to organize complementary yet biased global and local conditions through multi-level allocation during generation. To this end, we propose SmartFont, a diffusion-based few-shot font generation framework that combines global content-style generation with weakly supervised local corrective experts. The local branch performs semantic-spatial allocation by learning expert-wise local concepts and semantically meaningful spatial maps under weak component supervision, enabling fine-grained correction without requiring explicit component-conditioned inference. On top of this, a denoising-state condition allocation module adaptively weights global content, global style, and local corrective feature across timesteps and injection blocks. Extensive experiments show that SmartFont achieves better global-local balance, improves glyph quality and local detail fidelity.

14.
arXiv (CS.LG) 2026-06-11

Structure-Preserving Neural Surrogates with Tractable Uncertainty Quantification

arXiv:2606.11650v1 Announce Type: new Abstract: Recent advances in scientific machine learning provide a means of near-real-time solution to partial differential equations (PDEs), but lack the theoretical underpinnings of conventional simulators that support contemporary verification and validation. In this work, we construct data-driven reduced-order models that serve as structure-preserving, real-time surrogates. Remarkably, the exterior calculus that imposes physical conservation structure also exposes topological structure that we use to build a Gaussian process (GP) representation of uncertainty in state-flux relationships, ultimately yielding a Dirichlet-to-Neumann map for quantities of interest with closed-form expressions for posterior uncertainty. We specifically propose structure-preserving $H(\mathrm{div})$–$L^2$ subspaces of conventional Raviart–Thomas and $dgP_0$ elements prescribed by a lightweight transformer. Reduced-order dynamics consistent with this subspace are learned by posing a conservation law in which a GP describes the fluxes between volumes. This work hinges on a novel interface between mixed FEM spaces and GP regression; when training is posed as the optimal recovery problem (ORP), the resulting GP regression can be written as an optimization problem with equality constraints that impose a conservation structure, amenable to a fast Schur-complement training strategy. The trained model can then be solved in real time with closed-form estimators for boundary fluxes driven by prescribed Dirichlet data. The paper includes RKHS posterior error bounds for linear functionals to support uncertainty quantification, as well as numerical experiments demonstrating the accuracy of the posterior distribution as a surrogate for error estimation.

15.
arXiv (CS.AI) 2026-06-12

OCOO-T : A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

arXiv:2606.12838v1 Announce Type: cross Abstract: Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

16.
PLOS Medicine 2026-05-20

Brain morphology in Anorexia Nervosa and its subtypes: A multi-cohort study of individual participant data

by Fabio Bernardoni, Dominic Arold, Luis Schoppik, Klaas Bahnsen, Ruiyang Ge, Clara Moreau, Lasse Bang, Federico D’Agata, Giovanni Abbate-Daga, Christian K. Tamnes, Iain Campbell, Owen O’Daly, Ulrike Schmidt, Guido Frank, Stefanie Horndasch, Andreas Hess, Arnd Dörfler, Hans-Christoph Friederich, Joe Simon, Angela Favaro, Luca Lavagnino, Christina E. Wierenga, Amanda Bischoff-Grethe, Amy E. Miles, Allan Kaplan, Aristotle Voineskos, Paul A. M. Smeets, Annemarie A. van Elburg, Unna Danner, Sophia I. Thomopoulos, Laura Berner, Neda Jahanshad, Sophia Frangou, Joseph A. King, Paul Thompson, Stefan Ehrlich Background In a recent coordinated meta-analysis of neuroimaging data, we reported gray matter (GM) alterations in acutely underweight patients with anorexia nervosa (AN). Here, we extend these findings by examining individual variation in brain structure within AN, individual-level differentiation between AN and healthy controls (HC), and differences between AN subtypes, with potential relevance for understanding clinical heterogeneity. Methods and findings We analyzed individual-level data from 11 international sites in the ENIGMA Eating Disorders Working Group, including 570 female participants with AN and 739 HC. We examined cortical thickness, cortical surface area and subcortical volumes in AN versus HC using three complementary approaches: (i) group-level differences in a mega-analysis correcting for age effects, (ii) frequencies of extreme deviations (infra-/supranormal; z  1.96) based on normative reference models by the CentileBrain Initiative, and (iii) individual-level classification performance using machine learning. The same analytic framework was applied to compare AN restricting versus binge-eating/purging subtype, additionally correcting for BMI effects.Mega-analyses reinforced previous meta-analytic findings of pronounced and widespread GM deficits in AN compared to HC. Normative modelling revealed that the frequency of infranormal z-scores (23/68 cortical thickness, 13/14 subcortical volume metrics) and supranormal z-scores (35/68 cortical thickness, 17/68 cortical surface area metrics) was significantly higher in AN than expected based on reference data. Individuals with AN could be reliably differentiated from HC using machine-learning classifiers (ROC–AUC = 0.75–0.81). In contrast, neither group-level differences nor frequency of extreme z-scores differed between AN subtypes, and individuals with different subtypes could not be reliably differentiated from each other. Importantly, the observational design cannot distinguish neurobiological differences related to AN from the effects of starvation or low BMI in the AN versus HC analyses. The lack of differences between subtypes does not exclude brain structural differences between AN subtypes that might be detectable with other modalities or analytic approaches. Conclusion Using a mega-analytic approach, we confirm widespread GM deficits in AN, show that these alterations are (in some patients) extreme, and demonstrate that they enable robust classification with superior performance compared to most MRI-based psychiatric classification studies. The absence of differences between AN subtypes may reflect shared neurobiology, though other imaging modalities may reveal distinctions beyond brain structure.

17.
arXiv (CS.CV) 2026-06-19

HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification

Echocardiography (echo) is a widely used imaging modality for assessing cardiac function, with Left Ventricular Filling Pressure (LVFP) serving as a critical physiological marker for conditions such as heart failure. Standard LVFP classification into normal vs elevated categories relies on the Doppler-derived $E/e'$ ratio, which is operator-dependent and often unavailable in resource-limited settings, motivating methods that infer LVFP directly from B-mode echo. Existing deep learning approaches achieve high performance but remain largely black-box, limiting clinical interpretability. We propose HypOProto, a hyperbolic, ordinal prototype-based framework for interpretable LVFP classification using a frozen, explainable foundation model backbone. HypOProto arranges prototypes along the physiological $E/e'$ scale, placing borderline cases near the hyperboloid root where small angular differences separate similar cases, while normal and elevated cases occupy outward positions reflecting increasing diagnostic certainty. This hyperbolic geometry encodes clinically meaningful ordinal relationships and improves interpretability. We also introduce a novel Hyperbolic Prototype Angular Separation (HyperPAS) loss, enforcing inter-class prototype separation in hyperbolic space. HypOProto achieves SOTA performance while maintaining transparency, and highlights clinically relevant regions in visualizations. This work represents the first prototype-based framework for LVFP classification in echo. Our code can be found at https://github.com/DeepRCL/HypOProto.

18.
arXiv (CS.LG) 2026-06-16

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

arXiv:2507.20424v3 Announce Type: replace Abstract: We study centralized distributed data parallel training of deep neural networks (DNNs), aiming to improve the trade-off between communication efficiency and model performance of the local gradient methods. To this end, we revisit the flat-minima hypothesis, which suggests that models with better generalization tend to lie in flatter regions of the loss landscape. We introduce a simple, yet effective, sharpness measure, Inverse Mean Valley, and demonstrate its strong correlation with the generalization gap of DNNs. We incorporate an efficient relaxation of this measure into the distributed training objective as a lightweight regularizer that encourages workers to collaboratively seek wide minima. The regularizer exerts a pushing force that counteracts the consensus step pulling the workers together, giving rise to the Distributed Pull-Push Force (DPPF) algorithm. Empirically, we show that DPPF outperforms other communication-efficient approaches and achieves better generalization performance than local gradient methods and synchronous gradient averaging, while maintaining communication efficiency. In addition, our loss landscape visualizations confirm the ability of DPPF to locate flatter minima. On the theoretical side, we show that DPPF guides workers to span flat valleys, with the final valley width governed by the interplay between push and pull strengths, and that its pull-push dynamics is self-stabilizing. We further provide generalization guarantees linked to the valley width and prove convergence in the non-convex setting.

19.
arXiv (quant-ph) 2026-06-11

Superspace Concentration and Adversarial Robustness in Quantum Algorithms

arXiv:2606.11580v1 Announce Type: new Abstract: We study superspace concentration as a quantum resource, formalized through the focus measure F(\r{ho}) = {\lambda}_max(\r{ho}_super) - the largest eigenvalue of the reduced superspace state - which quantifies the capacity of a quantum system to concentrate informational weight into a preferred subspace of an extended degree-of-freedom space. We develop a complete resource-theoretic framework around this measure and validate its properties through GPU-accelerated numerical simulation. Analytic decoherence predictions are confirmed to machine precision (1.11 x 10^{-16}) for superspace dimensions dS in {2,4,8,16,32}. Focus monotonicity holds across 10,000 random states with zero violations under four focus-non-generating channels across six system configurations. Focused quantum states resist coherent unitary attacks with significantly greater resilience than standard fidelity predicts, with focus remaining above 0.9 at attack strength {\epsilon} = 0.302 versus {\epsilon} = 0.174 for fidelity. We further demonstrate that the focus measure and the U(dS)-asymmetry measure are operationally distinct: asymmetry remains near zero and provides no robustness signal under coherent and targeted attacks while focus tracks spectral concentration and remains robust until {\epsilon} > 0.3. The connection between Grover's algorithm and superspace concentration is made explicit via the identity F(|{\psi}_k>

20.
arXiv (CS.CL) 2026-06-12

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To do so, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria including identifying moral considerations, weighing trade-offs, and giving actionable recommendations to cover cases on AI advising humans moral decisions as well as making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.

21.
arXiv (CS.LG) 2026-06-11

Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts

arXiv:2602.02229v2 Announce Type: replace Abstract: We study the problem of monitoring model performance in dynamic environments where labeled data are limited. To this end, we propose prediction-powered risk monitoring (PPRM), a semi-supervised risk-monitoring approach based on prediction-powered inference (PPI). PPRM constructs anytime-valid lower bounds on the running risk by combining synthetic labels with a small set of true labels. Harmful shifts are detected via a threshold-based comparison with an upper bound on the nominal risk, satisfying assumption-free finite-sample guarantees on the type-I error. We demonstrate the effectiveness of PPRM through extensive experiments on image classification, large language model (LLM), and telecommunications monitoring tasks.

22.
arXiv (CS.LG) 2026-06-12

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

arXiv:2606.13626v1 Announce Type: cross Abstract: We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and generative adversarial networks. We compare their ability to model polyphonic note sequences, learn useful latent representations, and generate stylistically coherent compositions. Our experiments show that the autoregressive LSTM with attention produces the most musically coherent samples, while vector quantization helps mitigate posterior collapse and yields more structured outputs than conventional recurrent VAEs. The adversarial approach captures local pitch patterns but remains difficult to train and generalizes less reliably to Bach's style. These results highlight the relative strengths and failure modes of autoregressive, latent-variable, and adversarial approaches for symbolic music generation.

23.
arXiv (CS.CL) 2026-06-11

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Training deep search agents requires verifiable questions whose answers remain unavailable until sufficient evidence has been acquired through search. Existing synthesis methods often increase apparent difficulty by enriching graph structures, but structural complexity alone does not guarantee realized search difficulty: the intended search process can collapse through a cheaper identifying route. We formalize this gap with a shortcut-aware difficulty framework and identify four actionable shortcut risks: evidence co-coverage, single-clue selectivity, exposed constants, and prior-knowledge binding. To diagnose their realized effects, we use trajectory signatures including solving cost, answer hit time, and prior-shortcut rate. Guided by this framework, we introduce FORT, a Framework of Shortcut-Resistant Training-Data Synthesis. FORT constructs shortcut-resistant training data by controlling shortcut risks across entity selection, evidence graph construction, question formulation, and adversarial refinement. Experiments show that FORT induces longer pre-answer search and fewer shortcut patterns than existing open-source deep search datasets. Using the resulting trajectories, we train FORT-Searcher with supervised fine-tuning (SFT) only, and it achieves the best overall performance among comparable-size open-source search agents on challenging deep search benchmarks. Relevant resources will be made available at https://github.com/RUCAIBox/FORT-Searcher.

24.
arXiv (CS.LG) 2026-06-12

Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

arXiv:2606.13300v1 Announce Type: new Abstract: We introduce the Trajectory-based Quantization Sensitivity Score (TQS), a metric that reframes post-training quantization (PTQ) through the lens of dynamical-systems stability. By modeling the network's rollout as a discrete-time dynamical system, TQS characterizes how quantization-induced errors propagate and amplify over the rollout horizon. Unlike conventional PTQ methods, where sensitivity analysis is often coupled to the quantization procedure, TQS enables a priori sensitivity estimation decoupled from quantizer selection and bit-width assignment. This separation allows for quantization budget planning even for black-box or compiled networks with fused operators. Building on this, we present TQS-PTQ, a flexible mixed-precision framework that requires no calibration data or costly second-order approximations. Our experiments show that a dynamical-systems perspective provides a robust, high-performing pathway for low-precision deployment in resource-constrained settings.

25.
arXiv (math.PR) 2026-06-11

Mean-field theory via dissociated arrays for particle systems interacting through noisy weights

arXiv:2606.12135v1 Announce Type: new Abstract: We study a mean-field limit for a $N$-particle system in which each particle follows a diffusion and interacts with other particles through a weight on each directed edge. Each weight evolves according to its own nonlinear SDE driven by a Brownian motion, with coefficients involving the states of the two endpoint particles of the edge. The initial vertex and edge variables are assumed to have a dissociated Aldous–Hoover form. We construct the limiting nonlinear SDE by averaging the interaction over an independent neighbor and an edge input, prove its well-posedness, and show that the dissociated vertex-edge structure is propagated by the dynamics. This propagation property is an analogue of propagation of chaos in the case where the weight of each edge may remain correlated with the states of the two endpoint particles. Under either a bounded-observable assumption or a sub-Gaussian edge-input condition, the finite system converges to this limit through quantitative coupling estimates for a typical particle and a typical edge. We also prove the convergence of the empirical measure of particle's state pairs and their interaction weights.