Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-25

Stable-Shift: Biologically Structured Prediction of Transcriptional Responses to Unseen Gene Perturbations

arXiv:2606.24940v1 Announce Type: cross Abstract: Predicting transcriptional responses to genetic perturbations could reduce the experimental burden of functional genomics, but extrapolation to genes that were never perturbed during training remains difficult. We present Stable-Shift, a structured method for estimating unseen-gene responses. Stable-Shift aggregates single-cell measurements into perturbation-level expression shifts, fits a low-rank response basis using training perturbations only, and predicts an unseen gene's coordinates in that basis from biological context. The context combines STRING interactions, network structure, control-cell expression statistics, and Gene Ontology annotations; the evaluated implementation uses graph convolution to integrate these inputs. On the supplied K562 Perturb-seq benchmark, Stable-Shift obtained 0.592 cosine similarity, compared with 0.569 for GEARS, together with higher Spearman correlation and top-gene precision among the evaluated methods. Its mean cosine similarity over five unseen-gene splits was 0.589 +/- 0.008. The same ordering was observed in the supplied graph-aware, residualized, gene-space, and Norman-dataset comparisons. These results support further study of biologically structured latent-response prediction, while the lower gene-space accuracy and sensitivity to sparse graph neighborhoods limit the scope of the present conclusions.

02.
arXiv (CS.AI) 2026-06-18

RankGraph-2: Lifecycle Co-Design for Billion-Node Graph Learning in Recommendation

arXiv:2606.18379v1 Announce Type: cross Abstract: Graph-based retrieval at billion-node scale requires jointly solving three tightly coupled problems – graph construction, representation learning, and real-time serving – yet existing work addresses each in isolation. We present RankGraph-2, a framework deployed at Meta that co-designs all three lifecycle stages for similarity-based retrieval (U2U2I and U2I2I), where each stage's requirements shape the others. Serving requires a co-learned cluster index to avoid expensive online KNN – this pushes index co-training into the training objective. Training benefits from the observation that similarity-based retrieval tolerates pre-computed neighborhoods, eliminating online graph infrastructure – this requires construction to produce self-contained data. Construction must also support hour-level refresh for item coverage. Acting on these cascading requirements, RankGraph-2 reduces hundreds of trillions of edges to hundreds of billions via subsampling with popularity bias correction, pre-computes multi-hop neighborhoods via personalized PageRank, and co-learns a residual-quantization cluster index that reduces serving computational cost by 83%. This lifecycle co-design enables a simple architecture to achieve 3.8 x higher recall than a GAT + Deep Graph Infomax model on a bipartite graph and 2.1 x higher than PyTorch-BigGraph on item retrieval. RankGraph-2 delivers up to +0.96% CTR and +2.75% CVR, and has powered 20+ retrieval launches across major surfaces.

03.
arXiv (CS.CL) 2026-06-11

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages due to the availability of reliable G2P resources. In contrast, UR-BERT scales to 495 languages by unifying diverse writing systems into a shared Romanization representation. To further enhance phonetic fidelity and text-speech alignment, we introduce a speech token prediction objective during training, which encourages the encoder to learn speech-aware phonetic representations in a data-efficient manner. Experiments show that TTS systems built on UR-BERT consistently outperform recent text encoder baselines across a wide range of languages and resource conditions, and demonstrate strong generalization to unseen languages.

04.
arXiv (CS.CV) 2026-06-19

Geometry-Aware Superpixel Graph Transformer with Metadata for Skin Lesion Classification

Automated skin cancer classification from dermoscopic images remains challenging due to heterogeneous lesion structure, strong intra-class variability, and subtle visual differences between benign and malignant cases. Existing CNN/ViT pipelines typically rely on global or patch-level features and often combine patient metadata via late fusion, which limits spatially grounded multimodal reasoning. We present a novel region-based graph learning framework that explicitly models lesions as graphs of spatially coherent superpixel regions represented as frozen CNN features. To capture fine-grained lesion arrangements, we encode inter-regional geometry as edge attributes and introduce a dedicated metadata context node connected to all regions, providing structured integration of demographic/clinical variables within the same relational space. Node representations are updated using our edge-aware graph transformer followed by attention-driven propagation, and a final graph-level embedding for benign-malignant classification. Experiments on four public benchmarks demonstrate that explicit region-level relational modeling and graph-native multimodal fusion yield consistent gains over the state-of-the-art. Consequently, we establish a new graph-centric perspective in which CNN features are modeled as relational nodes and improved through contextual integration, yielding more expressive and robust classifications.

05.
arXiv (CS.LG) 2026-06-16

Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons

arXiv:2506.20015v2 Announce Type: replace Abstract: Neuromorphic computing offers an energy-efficient alternative to conventional deep learning accelerators, particularly for real-time processing of time-series data. However, many edge applications, such as wireless sensing and audio recognition, generate streaming signals with rich spectral features that are not effectively captured by conventional leaky integrate-and-fire (LIF) spiking neurons. This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons with oscillatory dynamics to process time-domain signals directly, eliminating the need for costly spectral pre-processing. By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity. This temporal sparsity translates into significant savings in both computation and transmission energy. Assuming an OFDM-based analog wireless interface for spike transmission, we present a complete system design and evaluate its performance on audio classification and modulation classification tasks. Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs, while substantially reducing spike rates and total energy consumption during inference and communication.

06.
arXiv (CS.CV) 2026-06-18

VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce \method – a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.

07.
arXiv (CS.CL) 2026-06-17

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

作者:

Glossaries, technical specifications, and system prompts routinely ask language models to use familiar words in unfamiliar ways. When this works, the local rule does not install the new meaning on top of the old one; the pretrained prior keeps operating underneath, and its strength still shows through. We test this with a Stroop-style paradigm: a remapping rule (doctor means forest) pitted against the query word's lexical-prior distractor (hospital), with matched neutral controls. Across 11 open-weight models spanning four families and 1B-9B parameters, lexical-prior strength predicts interference even after item-level controls for answer prior, frequency, tokenization, and prompt wording. Activation patching on five aligned models locates a source-position triplet (definition subject, definition target, query word) that nearly fully recovers the conflict effect (aggregate $R \in [0.92, 1.06]$); a definition-target swap shows the triplet performs binding rather than identity matching. Dissociation experiments isolate target preservation as the binding-specific signature: distractor suppression occurs under matched, swap, and item-mismatched conditions alike, whereas target logit collapse occurs only when the definition-target position is corrupted. Behavior and mechanism converge on the same channel: the prior's strength both predicts which overrides fail and marks where the causal repair lands.

08.
arXiv (CS.LG) 2026-06-12

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

arXiv:2606.13657v1 Announce Type: new Abstract: On-policy distillation (\textsc{OPD}) has recently become a prominent post-training recipe as it combines two desirable ingredients: on-policy student trajectories and dense teacher supervision, yet how this hybrid changes a model's parameters remains unclear. Across several language and vision-language model pairs and use cases, our analysis yields two main findings. On sparsity, \textsc{OPD}-style updates are small and coordinate-sparse. They are distributed across layers and are usually FFN-heavy. This sparse structure is operationally useful: training only the discovered subnetwork recovers nearly the same performance as full \textsc{OPD}. However, the sparsity-inducing SGD optimizer underperforms AdamW in our optimizer ablation, likely because dense teacher supervision preserves heterogeneous coordinate-wise gradient scales where AdamW's adaptive scaling remains useful. On geometry, the updates are numerically full-rank but spectrally concentrated; they lie mostly away from the principal singular subspaces of the source weights and fall disproportionately on coordinates where the source weights are close to zero. These findings suggest that dense teacher supervision does not turn \textsc{OPD} into ordinary dense parameter rewriting; instead, \textsc{OPD} retains important geometric signatures of on-policy post-training.

09.
arXiv (CS.AI) 2026-06-16

Prototyping an AI-powered Tool for Energy Efficiency in New Zealand Homes

arXiv:2509.05364v2 Announce Type: replace-cross Abstract: Residential buildings contribute significantly to energy use, health outcomes, and carbon emissions. In New Zealand, housing quality has historically been poor, with inadequate insulation and inefficient heating contributing to widespread energy hardship. Recent reforms, including the Warmer Kiwi Homes program, Healthy Homes Standards, and H1 Building Code upgrades, have delivered health and comfort improvements, yet challenges persist. Many retrofits remain partial, data on household performance are limited, and decision-making support for homeowners is fragmented. This study presents the design and evaluation of an AI-powered decision-support tool for residential energy efficiency in New Zealand. The prototype, developed using Python and Streamlit, integrates data ingestion, anomaly detection, baseline modeling, and scenario simulation (e.g., LED retrofits, insulation upgrades) into a modular dashboard. Fifteen domain experts, including building scientists, consultants, and policy practitioners, tested the tool through semi-structured interviews. Results show strong usability (M = 4.3), high value of scenario outputs (M = 4.5), and positive perceptions of its potential to complement subsidy programs and regulatory frameworks. The tool demonstrates how AI can translate national policies into personalized, household-level guidance, bridging the gap between funding, standards, and practical decision-making. Its significance lies in offering a replicable framework for reducing energy hardship, improving health outcomes, and supporting climate goals. Future development should focus on carbon metrics, tariff modeling, integration with national datasets, and longitudinal trials to assess real-world adoption.

10.
arXiv (CS.LG) 2026-06-15

ORCA: A Platform for Open-Source Dexterity Research

arXiv:2606.14561v1 Announce Type: cross Abstract: Robotics manipulation research increasingly focuses on two-finger parallel grippers for their effectiveness, affordability, and ease of teleoperation. Grippers are nonetheless limited by their form factor, often requiring bimanual setups even for simple reorientation tasks. Anthropomorphic hands are a more natural platform for dexterous robot learning – closer to the human hand, and capable of learning from human video – yet they remain hard to use in learning research: even where open and accessible hand hardware exists, the software for control, simulation, teleoperation, and retargeting is scattered in one-off code bases, and largely disconnected from the robot-learning ecosystem. In this work, we introduce the \orca~learning stack, an open-source research stack for dexterity as a first-class robot learning domain. Our \orca~stack unifies low-level control, simulation, teleoperation from a range of consumer platforms, and hand retargeting, behind a single interface, and integrates natively with popular robot-learning frameworks such as \lerobot, so dexterous hand researchers can leverage the same data, training, and evaluation pipelines used for non-dexterous robot learning. We demonstrate a complete end-to-end workflow, collecting expert demonstrations of an in-hand reorientation task by teleoperation with a consumer-grade VR headset, training an autonomous policy with \lerobot, and evaluating the learned policy in a fully reproducible and observable setup. We open-source the entire stack as a shared, reproducible foundation for dexterous-manipulation research.

11.
arXiv (CS.LG) 2026-06-19

Evaluating Universal Machine Learning Force Fields Against Experimental Measurements

arXiv:2508.05762v2 Announce Type: replace-cross Abstract: Universal machine learning force fields (UMLFFs) promise to revolutionize materials science by enabling rapid atomistic simulations across the periodic table. However, their evaluation has been limited to computational benchmarks that may not reflect real-world performance. We introduce UniFFBench, a comprehensive evaluation framework featuring the MinX dataset – a diverse collection of 1,500+ mineral systems spanning 85 elements, extreme thermodynamic conditions (0–5000 K, 0–1000 GPa), and structural complexity, including partial occupancy and disorder. This diversity, combined with experimental reference values for validation, enables assessment of UMLFF generalization across chemical space and conditions substantially beyond typical training scenarios. Our systematic evaluation of six state-of-the-art UMLFFs reveals a substantial ``reality gap'': models achieving impressive performance on computational benchmarks often fail when confronted with experimental complexity. Even the best-performing models exhibit higher density prediction error than the threshold required for practical applications. We observe disconnects between simulation stability and mechanical property accuracy, with prediction errors correlating with training data representation rather than the modeling method.

12.
arXiv (CS.CL) 2026-06-25

Do Encoders Suffice? A Systematic Comparison of Encoder and Decoder Safety Judges for LLM Adversarial Evaluation

With the widespread adoption of large language models (LLMs) in chatbots and everyday applications, companies increasingly need guardrails that are effective while remaining low-cost and low-latency. Safety evaluation of LLM outputs has generally relied on LLM-based judges, which can be effective but are often slow and expensive to deploy at scale. In this paper, we evaluate whether fine-tuned modern encoder classifiers from the ModernBERT family, including ModernBERT and Ettin, can reliably identify harmful LLM outputs in user-model conversations without substantial performance loss relative to LLM-based judges. We benchmark these encoder classifiers against rule-based prefix matching, fine-tuned LLM classifiers, and LLM judges using a range of judge-prompting strategies across open-source adversarial datasets. The LLM judges include evaluation methodologies from StrongReject, ShieldGemma, JailbreakBench, AILuminate, SorryBench, and a Claude-as-a-judge setup, as well as fine-tuned safety classifiers such as LlamaGuard 3 and LlamaGuard 4. The encoder classifiers are fine-tuned on judge-labeled data using a majority-voting label strategy and are then evaluated on a gold-standard holdout dataset to assess their performance relative to LLM judges. We report absolute performance using F1 score, false negative rate, and precision-recall metrics. We also break down results by attack technique, including single-turn prompting, decomposition, escalation, and context manipulation, to identify where encoder classifiers align with or diverge from LLM-based judges. Our findings provide guidance on when encoder classifiers can serve as cost- and latency-efficient alternatives to LLM-based safety evaluation.

13.
arXiv (CS.CV) 2026-06-11

FitVTON: Fit-aware Virtual Try-On via Body-Garment Size Control

While diffusion-based virtual try-on has achieved impressive visual realism, most methods treat the task as 2D inpainting, prioritizing texture preservation over physical plausibility. Consequently, they often produce plausible-looking images that fail to reflect authentic garment fit across diverse body shapes. We present FitVTON, a Fit-aware virtual try-on model on different bodies in the wild. FitVTON encodes garment-body size through structured text prompts, and learn from simulated try-on triplets from parameterized garment model. To improve the fitting effects over garment silhouettes, we introduce two auxiliary head to predict the masks for both the garment and the exposed body. We further introduce a texture rectification stage to improve realistic appearance from simulated data. To evaluate the fitting fidelity, we curate a real-world dataset, FittingEffect3K, combining VLM-based scoring protocol. Both subjective and quantitive experiments show that FitVTON demonstrate authentic fitting fidelity, with significant sizing accuracy and shape preservation over state-of-the-art methods while maintaining competitive image quality. Project Page: https://zenoning.github.io/FitVTON/.

14.
arXiv (CS.AI) 2026-06-24

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

arXiv:2606.24636v1 Announce Type: new Abstract: Cinematographic captioning aims to describe how a video is filmed using professional film-language concepts such as camera movement, shot size, depth of field, composition, and shooting angle. This capability is important for fine-grained video understanding and controllable movie-quality video generation, yet remains underexplored in existing multimodal large language models. Unlike question-answering-based evaluation of cinematic understanding, cinematographic captioning requires a unified open-form description over multiple cinematographic dimensions. This task is challenging for two main reasons: the model must infer professional cinematographic concepts from subtle visual evidence, and it must generate captions that are both comprehensive and accurate. Accordingly, we propose CineCap, a framework that combines structured reasoning with spatio-temporal anchors and reinforcement learning with comprehensiveness, accuracy, and gated coverage rewards. The former grounds professional cinematographic descriptions in explicit visual evidence and organizes them into compact atomic reasoning for supervised fine-tuning, while the latter improves the balance between descriptive completeness and factual correctness. In addition, we construct CineCap Bench, a benchmark of 472 manually annotated video-caption pairs for systematic evaluation. Extensive experiments show that CineCap consistently outperforms strong proprietary and open-source baselines, establishing a new state of the art for cinematographic captioning. The code, model checkpoint, and benchmark are publicly available in https://github.com/Hectormxy/CineCap.git.

15.
arXiv (CS.LG) 2026-06-19

Comparative Study on Agility, Efficiency, and Impact Absorption of Bipedal Robots with Active Toes

arXiv:2606.19699v1 Announce Type: cross Abstract: Human legs exhibit high efficiency, agility, and impact absorption, with toes playing a crucial role in these capabilities. While many attempts have been made to implement human-like toes in robots, they have not fully replicated human characteristics nor rigorously validated their benefits. We propose a 14-DOF biped robot emulating human toes' lightweight, high-torque, robust nature. To quantitatively analyze the effectiveness of the active toes in terms of agility, efficiency, and impact absorption, we developed a high-fidelity simulation training environment that reflects actual actuators with coupled transmissions and accurate power consumption. To ensure a fair comparison between configurations with and without active toes, we designed a minimal RL reward function and applied an identical training procedure to both. The simulation results indicate that, at 1.33 m/s walking, the toe-equipped robot reduced CoT by 17.5% and heel-strike GRF by 5.0% compared with the toe-ablation configuration. On the agility test, average and maximum path deviation decreased by 25.0% and 34.0%, respectively.

16.
arXiv (CS.LG) 2026-06-11

A theory of learning data statistics in diffusion models, from easy to hard

arXiv:2603.12901v2 Announce Type: replace-cross Abstract: While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.

17.
arXiv (CS.CV) 2026-06-11

Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots

Physiological awareness is important for service, social, and assistive robots that interact with humans in everyday environments. Remote photoplethysmography (rPPG) enables non-contact heart-rate (HR) estimation from an RGB camera, making it a promising sensing modality for robot-mounted vision systems. However, illumination variation remains a major barrier to robust deployment. This paper presents an end-to-end spatial-temporal transformer framework for remote HR estimation on a new dataset with varied illumination. Our estimator integrates PRNet-based 3D face alignment, clip-level illumination augmentation, the Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision. The training objective combines a Soft-Shifted Pearson waveform loss with a spectral Kullback-Leibler divergence loss, where a tuned weight ($\mathbf{\beta}$) controls the contribution of frequency-domain heart-rate guidance. Experiments on a static all-level mix protocol covering three illumination levels show that $\mathbf{\beta}=5$ provides the strongest result among the tested beta settings, achieving a best-run HR mean absolute error (MAE) of 0.79 bpm and an HR correlation of 0.982. Compared with the PhysFormer baseline evaluated on our dataset, our estimator reduces HR MAE by 93.6 %, while increasing HR correlation from 0.088 to 0.982, making it usable when illumination varies.

18.
arXiv (CS.AI) 2026-06-19

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

arXiv:2606.20356v1 Announce Type: cross Abstract: In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

19.
arXiv (CS.AI) 2026-06-25

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

arXiv:2602.11988v2 Announce Type: replace-cross Abstract: A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files, and a novel collection of issues from repositories containing developer-committed context files. Surprisingly, we find that providing context files does not generally improve task success rates, while increasing inference cost by over 20% on average. This observation holds across different LLMs, coding agents, and for both LLM-generated and developer-committed context files. Specifically, we find that while instructions in the context files are well followed by coding agents, repository overviews, although popular and recommended by model providers, are not helpful. We conclude that while context files are useful for specifying non-standard coding practices, any attempts to improve performance should be rigorously evaluated before deployment.

20.
arXiv (quant-ph) 2026-06-17

Optimizing bias-tailored quantum error correction beyond code-capacity noise

arXiv:2606.17709v1 Announce Type: new Abstract: We find that the substantial advantages predicted for bias-tailored quantum error correction (QEC) under code-capacity noise are strongly reduced once realistic syndrome extraction and circuit-level noise models are considered. We start by comparing XZZX codes to rectangular surface codes with a bias-dependent optimised anisotropy. Although code-capacity simulations predict an advantage of rectangular surface codes in the limit of high noise bias, this actually disappears under circuit-level noise, making the XZZX codes the preferred and simplest choice even for platforms that allow for a flexible variation of the code layout adapted to changes in noise calibration. Our results identify bias degradation during syndrome extraction under circuit-level noise as the central limitation of biased-tailored QEC. To partially mitigate this effect, we introduce a bias-filtering CNOT gadget that temporarily encodes the ancillary target qubit during syndrome extraction in a repetition code and, upon measurement and feed forward, manages to reduce the bias degradation. In a regime of high-bias and low-idle errors, this bias-filtering gadget yields a few-percent relative improvement of the XZZX code error threshold, demonstrating that lightweight bias-filtering strategies can recover part of the lost bias-tailoring advantage for realistic circuit-level noise.

21.
arXiv (CS.LG) 2026-06-12

Enhanced Low-Density Region Exploration in Classifier-Guided Diffusion Models Through Modified Reverse Diffusion Sampling

arXiv:2606.13347v1 Announce Type: new Abstract: Diffusion models have emerged as state-of-the-art generative models for high-fidelity image synthesis, particularly in their classifier-free guided and classifier-guided forms. However, standard classifier guidance concentrates probability mass around high-density class mean, leading to poor coverage of rare samples in the tails of the class-conditional distributions. Recent work on diffusion-based tail sampling mitigates this by training an additional low-density-seeking classifier with a synthetic-vs-real discriminator, at the cost of additional networks and training. In parallel, a number of samplers and distillation techniques accelerate or refine diffusion sampling, but do not explicitly address long-tail coverage. We propose a purely sampling-time, density-aware extension of classifier-guided conditional diffusion model that targets low-density regions without any additional training. We have applied guidance at noisy images not on predicted noise like most diffusion models. Starting from a pretrained conditional diffusion model and classifier on ImageNet, we modify the guided reverse dynamics by steering trajectories toward low-confidence regions via the modified classifier gradient, and at each time step, we also guide the sampling process toward the predicted real image. 1st guidance helps explore low-probability samples, and 2nd guidance helps to generate samples to be close to the real data manifold. The proposed sampler consistently improves ADM model recall at 64x64 resolution while maintaining a comparable FID, and with a 256x256 ADM model, we showed the results visually with different combinations of both guidance. We also showed that standard ADM classifier guidance, combined with predicted real image guidance, helps generate high perceptual quality samples with a 256x256 ADM model on ImageNet.

22.
arXiv (quant-ph) 2026-06-11

Scaling-optimal purification of noisy qubit unitary channels

arXiv:2606.12394v1 Announce Type: new Abstract: We consider the problem of purifying noisy qubit unitary channels. Given the ability to apply an unknown qubit unitary channel followed by depolarizing noise, we aim to construct a superchannel that purifies the noisy unitary back to the original unknown unitary. We first provide numerical evidence that sequential strategies can strictly outperform parallel strategies when the number of channel uses is finite, highlighting the fundamental distinction from state purification. We then provide a concrete $\mathrm{U}(2)$-covariant parallel protocol based on a novel entanglement-assisted quantum error-correcting code that suppresses the first-order noise strength as $O(1/n)$ with $n$ channel uses and show this scaling is asymptotically optimal in the low-noise regime, even when sequential strategies are allowed.

23.
arXiv (quant-ph) 2026-06-17

Split-Head Quantum Generative Adversarial Network for Crystalline Material Discovery

arXiv:2606.17852v1 Announce Type: new Abstract: The discovery of novel crystalline materials is a critical challenge in computational materials science, often limited by the spatial representation limitations and mode collapse typical of classical generative models. Traditionally, developing Quantum GANs for continuous 3D space is hindered by the limited capacity of near-term hardware. To overcome this, we adapt a physics-informed "split-head" architecture right from the quantum trunk to explicitly decouple macroscopic lattice bounds from microscopic atomic coordinates, significantly maximizing resource efficiency. This study disentangles the contributions of quantum circuits from these architectural priors by evaluating a Split-Head Quantum Generative Adversarial Network against an architecture-matched classical ablation model. Evaluated on the highly constrained Mg-Mn-O system, the results reveal a highly nuanced performance dichotomy between the advanced models. The architecture-matched classical ablation model demonstrated superior thermodynamic precision. Conversely, the integration of quantum circuits in the SH-QGAN drove unparalleled structural breadth and latent space exploration, more than doubling the ablation's geometric validity and successfully generating novel, metastable candidates converging on the Mg2MnO4 stoichiometry. These findings clarify that while architectural separation of cell and atom generation drives strict thermodynamic precision, quantum feature mapping independently provides the spatial diversity necessary to overcome mode collapse. Both mechanisms offer distinct, complementary enhancements for the generative discovery of advanced materials.

24.
arXiv (math.PR) 2026-06-15

Hierarchical symmetry selects log-Poisson cascades: classification, uniqueness, and stability

arXiv:2604.01632v2 Announce Type: replace Abstract: Within i.i.d. multiplicative cascades, a single axiom – the hierarchical symmetry, a linear contraction on incremental scaling exponents – is shown to be necessary and sufficient for the cascade multiplier to be log-Poisson. We prove: (1) a characterization theorem determining the log-Poisson law with explicit parameters, within the class of all multipliers with finite lattice moments; (2) a classification theorem locating the log-Poisson class inside the log-infinitely-divisible family and identifying the mechanism by which every rival sub-family fails the symmetry; (3) a stability theorem with sharp constants – $(1+\beta)^{1/2}$ when the limiting increment is known, $\sqrt{2}$ when it is fitted – and (4) an unconditional propagation theorem transferring the bound to the multiplier distribution at the sharp rate $\Theta(\sqrt{\varepsilon})$, with a matching lower bound. Beyond independence, the classification extends exactly at the level of asymptotic statistics (limiting cumulant generating function, large deviations, multifractal spectrum) and provably not at the level of laws: an explicit stationary ergodic Markov multiplier satisfies the symmetry exactly with a non-log-Poisson marginal, while exchangeable multipliers collapse to the i.i.d. log-Poisson cascade and finite-state Markov multipliers cannot satisfy the symmetry at all. In the continuous category of exactly scale-invariant log-infinitely-divisible multifractal random measures, no finite moment window of structure-function exponents identifies the cascade class, whereas at the level of the scale-invariance generator the symmetry selects exactly the Barral-Mandelbrot compound Poisson cascade, with scale-ratio-free stability constants. The proofs reduce to second-moment identities on [0,1] via the change of variables $u = e^{kx}$, boundedness of the multiplier, and multiplicative couplings.

25.
arXiv (math.PR) 2026-06-24

Scheduling jobs with unknown size distribution in a M/G/1 queue: the shifted empirical Gittins

arXiv:2606.24703v1 Announce Type: new Abstract: In this paper we consider a M/G/1 queue for which we want to minimize the expected response time. We show how to compute indices from $n$ samples of the job size distribution such that the corresponding index policy is asymptotically optimal as $n$ grows. This construction is based on a discretization of the bounded support of the job size distribution and a shift of the samples to their nearest discrete point to the right. We show that the Gittins index of the empirical distribution of these shifted samples is close to the Gittins index of the original distribution. This translates to the asymptotic optimality of the corresponding index policy for minimizing the expected response time. Numerical comparison with other approaches further confirm the efficiency of our approach.