Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-16

Vision-Encoder Behavioral Fingerprints of Image-to-Image Generative Models: A Training-Paradigm-Driven Taxonomy of Six Commercial APIs

作者:

We study six production image-to-image AI systems (gpt-image-1, Gemini 2.5 Flash Image, Flux Kontext, SDXL img2img, SD3 img2img, and Qwen Image Edit) under a content-adaptive sub-JND adversarial perturbation pipeline, scoring all outputs by frozen DINOv2 ViT-B/14 token distances against clean references. Across a 3,588-call corpus spanning COCO photographs, CelebA-HQ portraits, and AI-generated inputs, the six systems partition into two image-invariant behavioral bands on a 2D (patch_mean, ssim_clean) plane: edit-trained models (Flux Kontext, Qwen Edit, Gemini) cluster in a tight band, while T2I-base models adapted at sampling time (SDXL, SD3, gpt-image-1) cluster in a drift band.

02.
arXiv (CS.CV) 2026-06-16

An Ensemble Deep Learning Approach for Reliable and Scalable Lemon Leaf Disease Classification

Early detection of plant diseases is crucial to plants and for the farmers. Plant diseases reduce fruit yield and quality, and plants are more susceptible to other stresses when they are infected. The lemon leaf disease dataset contains 1354 images. The dataset has 9 classes. Among the 9 classes only one class is for healthy leaf, and the other 8 classes are leaf diseases. The dataset was split into training (70%), testing (15%) and validation (15%) sets after comprehensive preprocessing. Two pretrained models (InceptionV3 and MobileNetV2) were applied and then combined these models using an ensemble technique to boost robustness. Ensemble models showed a promising performance of 99.27% accuracy. Adversarial Training is applied to improve models' ability and ensure reliable predictions under noisy data. Grad-CAM visualization highlights the important regions of leaf images that validate the model prediction with confidence level.

03.
arXiv (CS.CL) 2026-06-18

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research focused on modeling or training failures leading to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance, and thereby improve source-target transfer scores by up to 12 absolute points yielding relative gains of 8% to over 50% across various LLMs.

04.
arXiv (CS.AI) 2026-06-18

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

arXiv:2606.18304v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.

05.
arXiv (CS.CL) 2026-06-18

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Retelling

Counterfactual story retelling exposes LLM shortcomings in constrained narrative solution spaces where they can no longer rely on recalling memorised training data. Ground-truth-based post-training, such as SFT, fails to teach LLMs how to generate logical and rational narrative events. In this paper, we introduce Retell, Reward, Repeat (RRR), an RL-based pipeline synthesising Structuralist Narratology with scalar narrativity to teach storytelling structure. We extend the TimeTravel dataset with human-annotated stages of narrative equilibrium to evaluate reward models. By using d-RLAIF, RRR derives training signals from the narrativity of textual features without the need for reference outputs. Evaluations demonstrate that RRR-trained LLMs outperform few-shot and SFT baselines in logic, rationality, and completeness, with output quality additionally validated by blind human preference. Relying on a small, query-only dataset, RRR provides a linguistically grounded, cost-effective post-training mechanism for storytelling–a domain currently lacking effective post-training methods. RRR highlights the continued relevance of integrating established linguistic theories into contemporary NLP.

06.
arXiv (CS.CL) 2026-06-16

FinBalance: A Multi-Document Accounting Reconciliation Benchmark

Existing financial-NLP benchmarks mostly evaluate prepared artifacts such as filings, tables, or extracted values. Real accounting begins earlier: source documents must be reconciled into cited journal entries, aggregated into a balance sheet, and checked for contradictions. We introduce FinBalance, a multi-document accounting reconciliation benchmark built from source-document bundles across eight industries, three period types, and five difficulty levels. Human-authored business scenarios, accounting policies, tax/FX treatments, document schemas, distractors, and inconsistency templates are composed by a deterministic generator whose ledger produces journal entries,balance sheets, and 23 inconsistency-code labels. On a 710-record evaluation split, six contemporary LLMs reach at most 46% exact final-balance-sheet accuracy. Four models show a 26-41 pp gap between BS_exact, the model's reported balance sheet, and BS_recon, the balance sheet obtained by replaying its entries through our ledger. Models often recover numerically plausible entries but fail to bind them to supporting documents and aggregate them consistently. Citation-pressure prompting barely changes document-linking errors, while ledger-feedback ablations substantially improve reported balance sheets and expose inconsistency-detection trade-offs. Expert finance reviewers validate the benchmark design and labels.

07.
arXiv (CS.CV) 2026-06-16

Disagreement-Based Cross-Model Routing for Implicit Video Question Answering

We study multiple-choice video question answering on the ImplicitQA benchmark, where the correct answer is never explicitly shown but must be inferred from off-screen events, line-of-sight cues, causal structure, and cross-shot spatial layout. On this benchmark a single frontier video LLM already operates near its accuracy ceiling, and we observe that conventional self-consistency strategies – majority voting across repeated samples of the same model – can hurt rather than help, because the model's errors on hard questions are correlated. We propose disagreement-based cross-model routing, a pure inference-time procedure that requires no labels and no training. We triple-sample a native-video model (Gemini 3.1 Pro Preview) at temperature zero, exploit the genuine sample-to-sample variance of its video-processing pipeline to identify the roughly 20% subset of questions where the three samples disagree, and route only that subset to a second model from a different family (Claude Opus 4.8) that consumes uniformly sampled frames with adaptive thinking. On the 1001-question validation set with public ground truth – our main evaluation – the method improves AvgAcc by +1.43 over the best single sample of the primary model, with per-category gains concentrated on Motion & Trajectory (+5.49), Inferred Counting (+3.45), and Vertical Spatial Reasoning (+1.82) – the categories most dependent on cross-shot reference resolution. The same pipeline applied to the held-out 172-question CVPR 2026 ImplicitQA challenge test set achieves 82.03 AvgAcc / 79.71 MacroAvgAcc (+1.81 over the best single sample of the primary model), confirming the validation result on an independent split.

08.
PLOS Computational Biology 2026-06-02

PepAnno: A structure-aware deep learning framework for bioactive peptide prediction, structural visualization, and physicochemical profiling

作者:

by Enyan Liu, Yueming Hu, Liya Liu, Yifan Chen, Shilong Zhang, Sida Li, Haoyu Chao, Luyao Xie, Yi Shen, Liangwei Wu, Julio Raúl Fernández Massó, Ming Chen Peptides are gaining prominence as therapeutic candidates due to their diverse physiological functions and structural simplicity. Although multiple computational tools exist for bioactive peptide prediction, many suffer from limitations such as non-intuitive interfaces, sequence-only representations, insufficient structural awareness, restricted interpretability, or fragmented analysis workflows, leading to reduced research efficiency and higher costs. To address these challenges, we present PepAnno (https://bis.zju.edu.cn/pepanno/), a comprehensive and user-friendly web server for multi-functional peptide annotation. PepAnno is powered by a novel structure-aware, multi-view geometric deep learning framework that integrates pre-trained sequence embeddings with predicted 3D structural graphs through a dual-stream architecture combining a Transformer and a GATv2 network. A cross-modal attention mechanism is employed to effectively fuse semantic and geometric representations, enabling accurate multi-task prediction across 7 key bioactivities, including antimicrobial and anticancer properties. Comprehensive evaluation on seven curated bioactivity datasets demonstrates that PepAnno achieves robust and competitive predictive performance across tasks, consistently outperforming or matching existing methods in terms of discrimination and stability. Beyond functional prediction, PepAnno provides automated calculation of physicochemical properties, structure visualization, and access to an integrated repository of peptide-related databases and tools. By enabling one-click peptide annotation, PepAnno offers an efficient and interpretable solution for large-scale peptide analysis and facilitates downstream experimental design and peptide-based drug discovery.

09.
arXiv (quant-ph) 2026-06-19

Ricci flow for the Bures–Helstrom qubit metric

arXiv:2606.19493v1 Announce Type: cross Abstract: The Bures–Helstrom metric is the minimal monotone Riemannian metric on the state space of a qubit. With the quantum Fisher normalization used here, it identifies the Bloch ball with a geodesic hemisphere of the unit round three–sphere. We describe its Ricci flow explicitly. In a general rotationally symmetric gauge the flow is a coupled system for the radial lapse and warping factor; a single scalar equation appears only after a Hamilton–DeTurck gauge choice. In the corresponding moving DeTurck frame the squared warping function $\Psi=\Phi^2$ satisfies the linear forced heat equation \begin{equation*} D_t\Psi=\Psi_{ss}-2, \end{equation*} while the fixed-lapse coordinate form contains the associated transport term. Since the Bures–Helstrom metric is Einstein, the geometric flow itself is the homothetic shrinker \begin{equation*} g(t)=(1-4t)g_{\mathrm{BH}}, \end{equation*} with scalar curvature $6/(1-4t)$ and extinction time $T=1/4$. Thus the metric remains inside the monotone cone for all $t

10.
arXiv (CS.CV) 2026-06-11

P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

Multimodal large language models can write code to produce complex programs as well as use programs to do 3D modeling, which opens up a new avenue for 3D generation powered by their priors, world knowledge and reasoning. Yet existing benchmarks rarely evaluate 3D modeling through code. Such modeling demands more than runnable code: from a text or visual specification, a model must generate a parametric 3D program that is geometrically precise, semantically aligned and assembly-consistent. We introduce P3D-Bench, a benchmark for parametric 3D generation. Unlike a 3D mesh, a parametric 3D program exposes explicit dimensions, construction operations and part relations, revealing whether a model recovers a design's structure, not just its appearance. Under a unified protocol, P3D-Bench covers three task families (Text-to-3D, Image-to-3D and Assembly-3D) and scores each output for executability, geometric fidelity, topology, text-grounded constraints, multiview semantic alignment and part-level structure. We evaluate frontier MLLMs and text-only LLMs on 400 text cases, 400 image cases and 203 annotated assemblies, with domain-specific models as reference points. Our extensive evaluation yields three findings. First, assemblies are the hardest setting, where models still fail to compose multiple parts into a coherent structure. Second, models can often recover the global shape and semantic identity of the target object, yet fail to reproduce the precise parametric geometry specified by the input. Third, part-level modeling remains weak on assemblies, where models recover neither the geometry of each part nor the right number of parts. These results position P3D-Bench as a benchmark for evaluating precise parametric geometry and part-level structure in parametric 3D generation.

11.
PLOS Medicine 2026-06-12

Placenta accreta spectrum in the 21st century: Challenging dogma and redefining disorder

by Eric Jauniaux, Helena C. Bartels, Yalda Afshar Placenta accreta spectrum (PAS) is a serious pregnancy complication caused by abnormal placental attachment to the uterus. In this Perspective, Eric Jauniaux and colleagues discuss emerging evidence that challenges our long-held pathophysiological understanding of PAS, and argue that a critical reassessment of definition, diagnosis, and management is overdue. In this Perspective, Jonathan Evans and colleagues discuss why restricting access to joint replacement surgery based on BMI alone is not supported by evidence, and highlight how such rest rictions risk exacerbating stigma, inequity and avoidable harm to those who would benefit from surgery.

12.
bioRxiv (Bioinfo) 2026-06-08

TRACEY: an updated resource for SNARE protein domain annotation with improved HMMs and expanded sequence coverage

Motivation: SNARE proteins catalyse membrane fusion across the eukaryotic endomembrane system, from synaptic vesicle exocytosis to intracellular trafficking, endosomal and vacuolar transport, and autophagy, and their accurate domain annotation depends on the quality of profile models and the sequence diversity behind them. The original SNARE domain classification predates the recent expansion of eukaryotic sequence data, leaving its HMM profiles and subgroup coverage unable to resolve divergent and lineage-specific paralogs. Results: We present an updated release of TRACEY built on a resynchronized, non-redundant collection of 18,915 curated SNARE proteins spanning 1,188 species, together with a consolidated set of 83 HMM profiles, including 43 models for newly defined subgroups, reconstructed through an iterative, mixture-model-driven procedure. In direct comparison with the legacy models, at least ~75% of sequences in every overlapping group scored better with the new HMMs, indicating systematic gains in domain detection. A redesigned web interface adds multiparameter querying, FASTA download, and direct scanning of user-submitted sequences against the curated profiles. Availability and implementation: TRACEY is freely available at https://tracey.unil.ch.

13.
arXiv (CS.AI) 2026-06-18

Synthetic Resonance: A Framework for Growth-Oriented Human-AI Relationships

arXiv:2606.18265v1 Announce Type: cross Abstract: As human relationships with artificial intelligence systems become increasingly frequent and sustained, existing language and theory fail to accurately capture the nature of these affiliations. Common descriptors such as mutual understanding, connection, or friendship risk anthropomorphizing systems that lack subjective experience, while dominant frameworks tend to reduce AI to either a tool or a threat. In this paper, I introduce the concept of synthetic resonance as an integrative framework for understanding human-AI relationships. Synthetic resonance describes how relationships humans define as meaningful can emerge between a human and an AI system without the need to attribute shared feelings or mutual awareness. I argue that synthetic resonance is best understood as a structured, dynamic pattern of interaction that can produce a sense of relationship without the presence of a second experiencing subject. By clarifying this distinction, the concept of synthetic resonance offers a more precise way of conceptualizing human-AI relationships and highlights their potential value and ethical implications. I also call for more research that tests the processes and outcomes of synthetic resonance.

14.
arXiv (CS.LG) 2026-06-15

Learning Variable-Length Tokenization for Generative Recommendation

arXiv:2605.17779v2 Announce Type: replace Abstract: Generative recommendation reformulates recommendation as next-token prediction over discrete semantic identifiers (IDs). A fundamental yet unexplored design choice is that existing methods employ fixed-length tokenization for all items, implicitly assuming uniform encoding capacity regardless of item characteristics. Through systematic experiments across four datasets, we discover the Popularity-Length Paradox: popular items achieve optimal performance with short IDs, while tail items require substantially longer codes to capture discriminative semantics. This reveals a critical mismatch where popular items benefit from abundant collaborative signals and require minimal semantic detail, whereas tail items must rely on fine-grained content features due to sparse interaction data. To address this, we propose VarLenRec, a framework for learning variable-length tokenization. We develop Popularity-Weighted Information Budget Allocation (PIBA), an information-theoretic framework proving that optimal ID length should scale as a negative power of popularity. Directly implementing variable-length allocation faces two technical challenges: standard Euclidean residual quantization lacks geometric capacity to support diverse code lengths without distortion, and discrete length decisions are non-differentiable. We address these through Hyperbolic Residual Quantization, which leverages the exponential volume growth of the Poincaré ball to naturally stratify encoding capacity, and a Soft Length Controller, which enables differentiable length prediction via continuous layer retention probabilities regularized by PIBA-derived priors. Extensive experiments demonstrate that VarLenRec achieves significant improvements over state-of-the-art methods in recommendation accuracy and training/inference efficiency, revealing the importance of adaptive encoding capacity in generative recommendation.

15.
arXiv (CS.LG) 2026-06-15

Minimum Distance Summaries for Robust Neural Posterior Estimation

arXiv:2602.09161v2 Announce Type: replace-cross Abstract: Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.

16.
medRxiv (Medicine) 2026-06-16

Ranking-optimized survival models can underperform fixed-horizon clinical prediction: a SUPPORT2 reanalysis of machine learning, attending-physician judgment, and the original SUPPORT model at 60- and 180-day mortality

Machine-learning survival models are increasingly proposed for intensive-care mortality prediction and are almost always selected and reported using the concordance index, a ranking metric averaged over follow-up. Yet most bedside decisions hinge on a probability at a specific time, such as 60- or 180-day mortality. We asked whether ranking-optimized models remain competitive at fixed clinical horizons against two reference points clinicians actually rely on: unaided attending-physician judgment and the original 1995 SUPPORT logistic model. Reanalyzing the SUPPORT2 cohort (9,105 critically ill adults from five United States centers, 1989-1994) under a stratified 70/15/15 split, we compared a gradient-boosted survival model, the physician's recorded prognosis, and the 1995 model at 60 and 180 days, alongside several alternative learners. The survival model achieved competitive ranking concordance (0.705) yet underperformed both comparators at fixed horizons: at 60 days its area under the ROC curve was 0.750, against 0.808 for physicians on the matched sample and 0.827 for the 1995 model, a gap that held across eight independent data splits and remained statistically reliable after multiplicity correction. The shortfall was not miscalibration, since post-hoc recalibration left discrimination unchanged, nor limited capacity, since neural networks, a deep ranking model, and two timepoint-aware discrete-time models also failed to close it; replacing the ranking objective with timepoint-matched binary training recovered roughly half the gap, pointing to an objective-horizon mismatch. Discrimination was equitable across sex, race, and age, but leave-one-disease-out validation exposed severe failure for disease groups absent from training, and the physician advantage was conditional on a physician electing to provide an estimate. We recommend reporting timepoint-specific discrimination alongside concordance, timepoint-matched training when fixed-horizon predictions drive care, leave-one-subgroup validation, and distribution-free prediction intervals to support selective deployment.

17.
arXiv (CS.AI) 2026-06-17

AnalogFed: Privacy-Preserving Discovery of Analog Circuits at Scale with Federated Generative AI

arXiv:2507.15104v2 Announce Type: replace-cross Abstract: Recent advances in generative AI (GenAI) have shown transformative potential for modern hardware design. However, existing GenAI-driven approaches fall short of enabling large-scale electronic design automation (EDA) due to the proprietary and siloed nature of hardware datasets, which cannot be centralized for model training. Achieving at-scale GenAI-driven EDA, therefore, requires a novel privacy-preserving framework that can leverage distributed data without compromising confidentiality. This work introduces AnalogFed, the first privacy-preserving framework for large-scale analog circuit topology discovery using federated learning (FedL) and GenAI. AnalogFed establishes the feasibility of collaborative analog topology design while addressing key security challenges: it mitigates membership inference attacks (MIAs) through a novel input perturbation strategy based on dummy token injection, and defends against model inversion attacks with customized, efficient homomorphic encryption. Extensive experiments demonstrate AnalogFed's effectiveness and efficiency, achieving strong privacy protection without degrading model utility. This framework lays the foundation for scalable, multi-party collaboration in next-generation hardware design automation with GenAI.

18.
arXiv (quant-ph) 2026-06-19

Optimal multi-spectral squeezing via deterministic 2D-phase optimization

arXiv:2606.20192v1 Announce Type: new Abstract: Optimization routines are ubiquitous in quantum information technologies and essential to reach the resource levels required by quantum protocols. Specifically, multi-spectral squeezing for use in such protocols requires that losses be kept minimal at every stage, including coherent detection, which is performed by interfering the signal with a classical local-oscillator beam. This in turn requires control over all optical degrees of freedom of the beam in order to optimize the detection. The most general framework for this optimization relies on agnostic, off-the-shelf machine-learning techniques. Here we take the opposite approach: by focusing on a physical description of the specific optical process, we develop a deterministic sequential algorithm that provably reaches the global maximum of the visibility in a pixel basis and scales linearly with the number of pixels, thereby offering an efficient and theoretically grounded alternative to black-box optimization. In our waveguide-based setup, the optimized mask increases the visibility from 76% to 84%, corresponding to a 20% gain in mode-matching efficiency. Multi-spectral squeezing measurements confirm that this improvement translates directly into quantum readout: for the most squeezed spectral mode, the squeezing increases from $-2.08$ dB to $-2.64$ dB, consistent with the inferred efficiency gain. These results establish deterministic spatial phase shaping as an effective, interpretable route to enhanced multimode squeezing in waveguide platforms.

19.
arXiv (CS.AI) 2026-06-19

Which Pairs to Compare for LLM Post-Training?

arXiv:2606.19607v1 Announce Type: new Abstract: Preference-based post-training has become a central paradigm for aligning language models. A common data-collection strategy is to generate a small set of completions for each prompt and label the resulting comparison pairs. However, human preference labels are often much more expensive than generating additional completions, suggesting a different use of the same labeling budget: generate a larger pool of completions, but label only the most informative comparison pairs. This paper studies which pairs should be compared in preference-based post-training. We formulate comparison curation as a sampling-design problem and evaluate designs by the quality of the final policy under the preference-based post-training objective. We instantiate this framework for Direct Preference Optimization (DPO), analyzing how the choice of labeled pairs propagates through DPO training to downstream policy performance. Our main results provide matching upper and lower bounds on the post-training optimality gap of the DPO-trained policy. The bounds show that comparison selection affects downstream performance through a single design-dependent information matrix, which links label allocation to parameter estimation error and policy suboptimality. This yields an explicit optimization criterion for budgeted comparison curation and motivates practical sampling designs for selecting informative pairs from large generated completion pools. Experiments on synthetic settings and language-model post-training benchmarks show that the proposed designs consistently improve sample efficiency over common comparison-selection heuristics.

20.
arXiv (CS.CL) 2026-06-18

DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs

Diffusion large language models (dLLMs) have emerged as a promising alternative for text generation, distinguished by their native support for parallel decoding. In practice, block inference is crucial for avoiding order misalignment in global bidirectional decoding and improving output quality. However, the widely-used fixed, predefined block (naive) schedule is agnostic to semantic difficulty, making it a suboptimal strategy for both quality and efficiency: it can force premature commitments to uncertain positions while delaying easy positions near block boundaries. In this work, we analyze the limitations of naive block scheduling and disclose the importance of dynamically adapting the schedule to semantic difficulty for reliable and efficient inference. Motivated by this, we propose Dynamic Sliding Block (DSB), a training-free block scheduling method that uses a sliding block with a dynamic size to overcome the rigidity of the naive block. To further improve efficiency, we introduce DSB Cache, a training-free KV-cache mechanism tailored to DSB. Extensive experiments across multiple models and benchmarks demonstrate that DSB, together with DSB Cache, consistently improves both generation quality and inference efficiency for dLLMs. Code is released at https://github.com/lizhuo-luo/DSB.

21.
arXiv (CS.CV) 2026-06-16

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differences, while reasoning-based generative rewards provide stronger judgments but are costly to deploy and difficult to use as direct optimization signals. We propose Z-Reward, a teacher-student reward modeling framework that decouples reasoning-heavy judgment from efficient reward deployment. The teacher is a large VLM that uses reasoning to infer rubric-aligned score distributions, and is trained with Group-wise Direct Score Optimization (GDSO), which combines policy-gradient rewards from distribution expectations with direct pointwise and pairwise supervision on score distributions and score gaps. The student is trained with Reasoning-Internalized Score Distillation (RISD), which transfers the teacher's reasoning-conditioned score distribution into a compact VLM without requiring explicit reasoning chains at inference time. On our internally annotated evaluation set, the 27B GDSO teacher reaches 89.6% human preference accuracy, outperforming SFT, RewardDance, and GRPO, while the 9B RISD student reaches 88.6%, outperforming the OPD baseline and closely matching the larger teacher. We further show that Z-Reward can serve as a differentiable reward signal for text-to-image optimization, yielding a 41.3% net human-preference improvement over the SFT baseline.

22.
arXiv (quant-ph) 2026-06-16

Enhancing Quantum Machine Learning with Anyons

arXiv:2606.16090v1 Announce Type: new Abstract: The power of quantum computing and quantum machine learning relies on harnessing uniquely quantum phenomena as computational resources. While superposition, coherence and entanglement have been central to this effort, the role of particle exchange statistics remains largely unexplored. Here, we introduce a quantum kernel framework that unifies bosonic, fermionic, and anyonic (fractional) exchange statistics within a single learning paradigm. We study this family of kernels from three perspectives. At the representation level, Haar-averaged effective-dimension analysis shows that fractional exchange phases access feature-space directions inaccessible to the purely symmetric or antisymmetric limits. At the level of kernel geometry, the corresponding Gram matrices show greater separation from the distinguishable-particle baseline and reduced label-dependent model complexity. Finally, on learning benchmarks, anyonic kernels consistently outperform their bosonic and fermionic counterparts, with stronger target alignment and more favorable class geometry. Together, these findings show that exchange statistics reshape the structure and geometry of quantum feature space, leading to enhanced learning performance. Our work identifies particle exchange statistics as an overlooked computational ingredient for quantum machine learning and provides the first systematic comparison of quantum learning models across exchange phases.

23.
arXiv (CS.CV) 2026-06-11

TextHOI-3D: Text-to-3D Hand-Object Interaction via Discrete Multi-View Generation and Joint Mesh Optimization

Text-conditioned 3D generation has progressed rapidly for images and isolated objects, but producing a hand-object mesh remains challenging: the output must preserve language semantics, cross-view consistency, object geometry, articulated hand shape, and physically plausible contact. We present TextHOI-3D, a staged framework that uses generated multi-view observations as an explicit interface between text-conditioned visual generation and geometry-aware hand-object recovery. TextHOI-3D learns a compact VQ token space for fixed-camera hand-object observations, predicts multi-view visual tokens from text with a CLIP-conditioned visual autoregressive model, and recovers a unified hand-object mesh through prior initialization, multi-view joint optimization, and anti-penetration refinement. The design separates semantic generation from geometric recovery while keeping both stages connected by a discrete multi-view representation. On HO3D-derived evaluations, the multi-view setting reduces object CD from 17.26 mm to 4.92 mm and penetration volume from 5.3721 cm^3 to 0.2193 cm^3 compared with a single-view counterpart, while improving hand errors and surface F-scores. These results support multi-view visual tokens as an effective intermediate representation for text-driven 3D hand-object mesh creation.

24.
arXiv (CS.LG) 2026-06-12

A Stationary (and Therefore Compatible) Representation is All You Need

arXiv:2606.12488v1 Announce Type: new Abstract: Learning compatible representations aims to learn feature representations that can be used interchangeably over time whenever a model undergoes updates. In this paper, we demonstrate that stationary representations learned by d-Simplex fixed classifiers imply compatibility as in its formal definition. This result establishes a foundation for future works and can be directly exploited in practical learning scenarios. We address the challenge of learning compatibility using $d$-Simplex fixed classifiers when the model is sequentially fine-tuned. Learning according to a d-Simplex fixed classifier with the cross-entropy loss aligns feature distributions at the first-order statistics. Consequently, it may not fully capture higher-order dependencies in the representation between model updates. To address this issue, we demonstrate that training the model using a $d$-Simplex fixed classifier through a convex combination of the cross-entropy loss and a contrastive loss not only captures higher-order dependencies, but is also equivalent to learning with the cross-entropy under the compatibility constraints. We confirm our findings with extensive experiments also considering a new scenario where a pre-trained model is sequentially fine-tuned and occasionally replaced with an improved model. We show that stationary representations enable uninterrupted retrieval services (without reprocessing gallery images) while improving performance during model updates and replacements, achieving state-of-the-art. Code at https://github.com/miccunifi/iamcl2r.

25.
arXiv (math.PR) 2026-06-16

Delayed acceptance sampling with Hamiltonian proposal subchains for random field materials inference

arXiv:2606.14743v1 Announce Type: cross Abstract: This paper focuses on accelerating Markov chain Monte Carlo sampling in Bayesian inverse problems in which forward model evaluations dominate the computational cost. It builds on several established ingredients previously used in related scenarios: delayed acceptance, neural network surrogate models, Hamiltonian proposals, and proposal subchains. The main framework is the delayed-acceptance Metropolis-Hastings algorithm of Christen and Fox (2005). The first-stage proposal distribution is constructed from a subchain of Hamiltonian trajectories targeting the surrogate posterior. For each fixed surrogate model, the Hamiltonian subchain and delayed-acceptance correction define a kernel invariant with respect to the exact posterior. In the present work, the surrogate is updated only during a burn-in phase, after which the production run uses a fixed surrogate model. The sampling framework is implemented in Python using parallel processes. Several chains are generated in parallel and share a single surrogate model trained during burn-in on all collected data. The forward model is treated as a black box; therefore, the application area is broad. However, the main motivation is efficient solution of geotechnical inverse problems with material properties represented by Gaussian random fields. In this study, the sampling framework is applied to a geotechnical inverse problem in which hydraulic conductivity and porosity are modeled as non-stationary Gaussian random fields approximated using truncated Karhunen-Loeve expansions. Based on a precomputation, the truncation dimensions are chosen separately for hydraulic conductivity and porosity. The forward model outputs are pore pressure values at control points and selected observation times. These are compared with in situ pore pressure measurements collected over one year during the Tunnel Sealing Experiment in an underground laboratory in Canada.