Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-24

DeepBD: A Grounded Agentic Workflow for Variant Prioritization and Diagnosis of Genetic Birth Defects

arXiv:2606.24779v1 Announce Type: cross Abstract: Birth defects are a major cause of fetal loss, neonatal morbidity and long-term disability. In the subset with suspected genetic etiologies, exome and genome sequencing have moved many cases from variant detection to post-sequencing interpretation: clinicians must rank patient-specific candidate variants under incomplete fetal or infant phenotypes and heterogeneous evidence from population genetics, variant-effect prediction, gene-disease validity, phenotype ontologies, cellular and pathway context, protein structure and clinical literature. We present DeepBD, a grounded agentic workflow for variant prioritization and diagnostic interpretation of genetic birth defects. DeepBD organizes the workflow into LLM-assisted case structuring, a pretrained evidence engine, specialist evidence modules and a grounded diagnostic review layer. The evidence engine learns patient-specific variant scores from structured rule evidence, sequence and variant-effect representations and phenotype-conditioned biological context, whereas specialist modules and the agentic layer provide tool-based refinement, candidate-pool review and diagnosis-oriented synthesis from ranked candidates. Developed using an in-house fetal and infant cohort comprising 18,622 cases, DeepBD achieved Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out solved-case benchmark, outperforming standalone Exomiser, DeepRare and prompted LLM reranking baselines evaluated on Exomiser-derived top-20 candidate variants. Ablation and overlap analyses show that rule evidence, mechanistic context, and specialist refinement provide complementary signals. These findings support a grounded agentic workflow that separates evidence integration, tool-based refinement, and LLM-assisted diagnostic review for retrospective variant prioritization in genetic birth defects.

03.
arXiv (CS.CV) 2026-06-11

SheafStain: Sheaf-Theoretic Schrödinger Bridge for Spatially and Biologically Coherent Virtual Staining

Current virtual staining approaches offer the potential for time- and cost-efficient biomarker quantification in cancer diagnostics and prognostics. However, patch-wise inference for gigapixel whole slide images (WSIs) fails to maintain spatial continuity, yielding artifacts that cause catastrophic mismatches with ground-truth images. Although pathology Vision Foundation Models (VFMs) offer rich representations, their self-attention causes varying global contexts to produce inconsistent embeddings for the same physical region. We formalize and validate this ``context contamination'' as a sheaf-theoretic problem where these embeddings form a presheaf that violates the gluing axiom. To address this, we propose SheafStain, a new approach that reinterprets VFM features as sheaf-like sections for spatially and biologically coherent virtual staining. Specifically, SheafStain integrates class and patch tokens into a Schrödinger Bridge framework as sheaf-like sections. While the class token anchors biological consistency, patch tokens form a per-position spatial map. A backbone co-pretrained on Hematoxylin \& Eosin (H\&E) and Immunohistochemistry (IHC) yields non-degenerate cross-stain stalks, so a single VFM feature space supervises both input conditioning and output stain alignment. Departing from prior work that evaluates on isolated $256 \times 256$ patches and either random-crops or resizes the $1024 \times 1024$ ground truth, we translate at $256 \times 256$ and evaluate on the stitched $1024 \times 1024$ outputs across HER2, ER, PR, and Ki-67. SheafStain demonstrates promising results against six prior methods while mitigating patch-boundary stitching artifacts. Code will soon be released.

04.
medRxiv (Medicine) 2026-06-22

How knowledge shapes community stigma and social support for women seeking abortion in the Democratic Republic of Congo: A cross-sectional study.

Background The Democratic Republic of Congo (DRC) bears one of the highest maternal mortality ratios globally (746 per 100,000 live births), with nearly 11% of deaths attributable to complications of unsafe abortion. Despite ratification of the Maputo Protocol and related national policies, access to safe abortion remains limited, largely due to entrenched stigma. Social support, encompassing emotional, informational, and instrumental assistance, is critical in shaping womens abortion-seeking behaviors and health outcomes. This study examines the influence of community-level knowledge on stigma and social support for women seeking abortion care. Methods A cross-sectional survey was conducted from May 2024 to June 2024 among 1,715 adults in Kinshasa and North Kivu provinces. Analyses focused on a sub-sample of 574 respondents reporting familiarity with women who had undergone abortion. Structural Equation Modeling (SEM) was applied to estimate direct and indirect pathways linking community knowledge, stigma, and social support. Results Two core knowledge indicators, recognition of abortion as a safe medical procedure and awareness of legal conditions for access, were significantly associated with outcomes. A one-unit increase in knowledge corresponded to a 0.39-point increase in social support and a 0.19-point reduction in stigma. Enhanced knowledge promoted empathetic attitudes, reinforced practical support, and mitigated moralizing judgments toward women seeking abortion. Conclusions Strengthening community knowledge emerges as a strategic lever to reduce abortion-related stigma and enhance social support in the DRC. These findings underscore the importance of integrating stigma-reduction and knowledge-enhancement interventions into reproductive health programs to improve womens access to safe and dignified abortion care.

05.
arXiv (math.PR) 2026-06-11

Markov property and path regularity for the solutions to SPDEs driven by cylindrical-martingale valued measures

arXiv:2606.12381v1 Announce Type: new Abstract: In this paper we prove the Markov property for the solution to stochastic partial differential equations driven by a cylindrical orthogonal martingale-valued measure. We assume our coefficients are time-dependent and satisfy some growth and Lipschitz conditions. We also prove that for time-independent coefficients and under mild assumptions on the cylindrical orthogonal martingale-valued measure, the solutions to our stochastic partial differential equations are Feller. Finally, in the case that the $C_{0}$-semigroup is quasi-contraction, we show that the solution to our stochastic partial differential equation possesses a càdlàg version.

06.
arXiv (CS.CV) 2026-06-25

ADM-Fusion: Adaptive Deep Multi-Sensor Fusion for Robust Ego-Motion Estimation in Diverse Conditions

Robust multi-sensor fusion is essential for reliable autonomy in diverse and degraded environments, where sensor reliability can fluctuate rapidly. Because different modalities fail in distinct ways, effective fusion should adaptively balance complementary cues rather than rely on fixed weighting. This adaptability is particularly important for ego-motion estimation, since accurate updates depend on the consistent integration of complementary sensor information. We propose ADM-Fusion, an end-to-end deep learning based multi-sensor fusion method designed to adapt to environmental changes and sensor degradation. ADM-Fusion employs an adaptive sensor mixture-of-experts framework with content-aware routing to dynamically assign weights to sensor inputs in real time. The system further incorporates separate translation and rotation branches, coupled through a cross-task attention mechanism to preserve task-specific specialization while enabling information sharing. ADM-Fusion is trained on the CARLA-LOC simulated dataset and subsequently fine-tuned on KITTI real-world data, demonstrating effective simulation-to-real transfer. Experiments show that ADM-Fusion remains robust under degraded conditions while maintaining competitive performance against existing methods.

07.
arXiv (CS.AI) 2026-06-19

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

arXiv:2606.20532v1 Announce Type: new Abstract: Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words influence acoustic output remains unclear. Understanding this is critical for diagnosing failure modes and improving controllability in expressive TTS. We propose cross-attention attribution for speech diffusion models, adapting the DAAM framework to the speech domain for the first time, and apply it to CapSpeech-TTS. Our method extracts per-token heatmaps across 25 layers and 24 ODE steps. We analyze 3,600 (style caption, text transcript) combinations comprising 120 style captions conditioning the generation of 30 text transcripts each, revealing how caption tokens shape waveforms. Results show: (1) style tokens have lower temporal variance than content/function tokens, confirming global conditioning; (2) style attention correlates with F0 and energy; (3) style conditioning peaks in early steps and deep layers; (4) attention entropy reaches its minimum at layer 17, co-occurring with the style importance peak, indicating maximal network selectivity at the most style-critical stage. This is the first study of how natural language influences cross-attention in speech diffusion models

08.
arXiv (CS.LG) 2026-06-24

Exact Schur-Sylvester Dimensionality Reductions for Non-Smooth Stochastic Complexity and Manifold Sampling

arXiv:2606.23867v1 Announce Type: new Abstract: The exact computation of the Normalized Maximum Likelihood (NML) codelength for regular non-smooth estimators (e.g., Lasso) has been historically limited by the cubic scaling walls of manifold-constrained projection and volume integration. At each step of the geometric Propose-and-Project Metropolis–Hastings (PPMH) sampler, evaluating the projection operator requires inverting an $(N+k) \times (N+k)$ generalized KKT matrix, while calculating the volume factor requires the determinant of an $(N-k) \times (N-k)$ Gram matrix. This paper presents an exact, mathematically equivalent formulation that bypasses both bottlenecks by utilizing the block Schur complement and Sylvester's determinant identity. We prove that the computational complexity of both operations collapses from $\mathcal{O}(N^3)$ to $\mathcal{O}(k^3 + N^2 k)$ per step. We generalize this reduction to Sparse Support Vector Machines (SVMs), Elastic Net, and Group Lasso. Finally, we provide a rigorous numerical stability analysis and evaluate the sampler's efficiency using the Effective Sample Size (ESS) per second. Our empirical benchmarks on high-dimensional datasets confirm a constant speedup exceeding $14{,}100\times$ while maintaining double-precision numerical equivalence, rendering exact non-smooth NML estimation highly tractable for large-scale statistical inference.

09.
arXiv (CS.LG) 2026-06-16

MIRAGE: Auditing Anti-Muslim Bias in Frontier LLMs Across Reasoning, Agentic, and Time-Coupled Conditions

arXiv:2606.16562v1 Announce Type: new Abstract: Five years after the discovery of persistent anti-Muslim bias in large language models, most evaluations remain confined to single-turn prompt completion, a setting that no longer reflects how frontier LLMs are deployed. We introduce MIRAGE (Muslim-Identity Reasoning and Agentic Generation Evaluation), a benchmark of 1{,}200 prompts spanning three deployment-realistic conditions: direct completion, chain-of-thought reasoning, and simulated agentic decision-making across content moderation, lending triage, refugee claim summarization, and hiring screens. Across six frontier models, we find that (i) chain-of-thought reasoning amplifies rather than suppresses Muslim-violence associations by 12–34\% relative to direct completion, (ii) agentic decisions exhibit a 9–22 percentage-point asymmetry between Muslim and matched non-Muslim cases on identical evidence, and (iii) bias is sharply time-coupled to retrieved news context, increasing 18–27\% under recent-conflict retrieval. Existing prompt-based mitigations transfer poorly across our three conditions, suppressing direct-completion bias while leaving agentic asymmetry largely intact. We release MIRAGE and an open evaluation harness to support targeted mitigation research.

10.
arXiv (CS.LG) 2026-06-24

Efficient reduction of stellar contamination and noise in planetary transmission spectra using neural networks

arXiv:2602.10330v3 Announce Type: replace-cross Abstract: Context: The characterization of exoplanetary atmospheres has been transformed by the James Webb Space Telescope (JWST), whose infrared sensitivity enables transmission spectroscopy at unprecedented precision. However, stellar heterogeneities (e.g., spots and faculae) remain a dominant source of contamination that can bias atmospheric retrievals if not properly corrected. Aims: We present a methodology for reducing stellar contamination and instrument-specific noise from exoplanet transmission spectra using neural networks, in particular the so-called Denoising AutoEncoders (DAE). Our goals are to enable fast, accurate corrections that improve the reliability of atmospheric parameter retrievals and to promote the use of unsupervised algorithms for efficient data processing. Methods: We designed and trained DAE architectures using large synthetic datasets of terrestrial (TRAPPIST-1e analogues) and sub-Neptune (K2-18b analogues) planets. Atmospheric retrieval experiments were then performed on contaminated spectra in order to compare our deep-learning approach against standard correction methods in terms of accuracy and computational cost. Results: Our autoencoders successfully reconstruct uncontaminated spectra, preserving essential molecular features even in low-S/N regimes. In retrieval tests, the denoising autoencoder pre-processing reduces bias in retrieved abundance parameters compared to uncorrected observations. Notably, our method matches the accuracy of simultaneous stellar-contamination fitting while maintaining a much lower computational cost, typically one order of magnitude smaller. Conclusions: These results demonstrate that DAEs outperform conventional correction methods in computational efficiency while maintaining high accuracy, paving the way for their integration into future atmospheric characterization pipelines for both rocky and giant exoplanets.

11.
arXiv (CS.CV) 2026-06-16

Unlocking Diffusion Hierarchies: Adaptive Timestep Selection for Zero-Shot Segmentation

Zero-shot segmentation has recently shown notable improvement by leveraging the rich visual priors in large-scale text-to-image diffusion models, such as Stable Diffusion. However, current diffusion-based methods often face limitations due to the trade-off between spatial resolution and contextual information, as well as their reliance on a single static timestep for feature extraction. To overcome these challenges, our work introduces two key advancements. First, our Contextual Similarity Maps fuse high-resolution attention maps with rich U-Net encoder features, providing both fine-grained and robust per-pixel representations. Second, we identify an emergent hierarchical semantic progression within the denoising process of various diffusion models: representations transition from part-level abstractions at earlier timesteps to object-level abstractions at later stages. Leveraging this insight, we introduce a mechanism to adaptively select the optimal timestep for each pixel. Extensive experiments demonstrate that our method consistently outperforms existing zero-shot segmentation baselines, validating the efficacy of combining contextual features with dynamic, hierarchical timestep selection.

12.
arXiv (CS.AI) 2026-06-15

Shift-Invariant Attribute Scoring for Kolmogorov-Arnold Networks via Shapley Value

arXiv:2510.01663v2 Announce Type: replace-cross Abstract: For many real-world applications, understanding feature-outcome relationships is as crucial as achieving high predictive accuracy. While traditional neural networks excel at prediction, their black-box nature obscures underlying functional relationships. Kolmogorov–Arnold Networks (KANs) address this by employing learnable spline-based activation functions on edges, enabling recovery of symbolic representations while maintaining competitive performance. However, KAN's architecture presents unique challenges for network pruning. Conventional magnitude-based methods become unreliable due to sensitivity to input coordinate shifts. We propose ShapKAN, a pruning framework using Shapley value attribution to assess node importance in a shift-invariant manner. Unlike magnitude-based approaches, ShapKAN quantifies each node's actual contribution, ensuring consistent importance rankings regardless of input parameterization. Extensive experiments on synthetic and real-world datasets demonstrate that ShapKAN preserves true node importance while enabling effective network compression. Our approach improves KAN's interpretability advantages, facilitating deployment in resource-constrained environments.

13.
arXiv (CS.CV) 2026-06-25

Heterogeneous and Adept Snapshot Distillation for 3D Semantic Segmentation

Multi-modal fusion and multi-model ensembling are prevalent in enhancing the performance of 3D semantic segmentation. Despite the impressive performance, these methods either rely on auxiliary input signals or suffer from costly computational expense. To efficaciously enhance the segmentation performance without introducing intolerable costs, we propose to transfer the rich knowledge from the multi-modal model (i.e., point clouds and images) and multiple model experts to the point-cloudbased network through knowledge distillation. Specifically, we present Information-oriented Heterogeneous Distillation (IHD) to help the uni-modal model absorb the complementary knowledge from the multi-modal teacher. We design the Information-Oriented Filtering (IOF) strategy to select informative images from the continuous image sequence for multi-modal fusion. This practice can boost the performance of the multi-modal teacher, thus benefiting the learning of the student. Besides, as opposed to vanilla model ensembling that requires the separate training of each expert, we propose Adept Snapshot Distillation (ASD). ASD treats the freely available model snapshots generated during the training phase as multiple experts, which significantly reduces the training cost for model ensembling. For each expert teacher, it only provides supervision to the student in the class where it is adept. The resulting Heterogeneous and Adept Snapshot Knowledge Distillation, dubbed HAS-KD, attains state-of-the-art results on ScanNetV2 and S3DIS datasets. HAS-KD can be seamlessly integrated into contemporary 3D segmentation algorithms and bring considerable gains without introducing extra inference burdens. The code will be made publicly available upon publication.

14.
arXiv (CS.AI) 2026-06-19

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

arXiv:2606.20356v1 Announce Type: cross Abstract: In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

15.
arXiv (CS.AI) 2026-06-25

TheoremGraph: Bridging Formal and Informal Mathematics

arXiv:2606.25363v1 Announce Type: cross Abstract: Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.

16.
arXiv (CS.CV) 2026-06-12

AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation

Audio and music generation based on flexible multimodal control signals is a widely applicable topic, with the following key challenges: 1) a unified multimodal modeling framework, 2) large-scale, high-quality training data, and 3) the prohibitive inference cost of multi-step diffusion sampling. As such, we propose AudioX-Turbo, a unified and efficient framework for anything-to-audio generation that integrates varied multimodal conditions (i.e., text, video, and audio signals) in this work. AudioX-Turbo follows a teacher-student paradigm. The teacher AudioX-Base is built on a Multimodal Diffusion Transformer with a Multimodal Adaptive Fusion module that aligns diverse multimodal inputs for high-fidelity synthesis, and is then distilled into the few-step student AudioX-Turbo via Distribution Matching Distillation adapted to flow matching, complemented by a diffusion-based discriminator for high-quality few-step generation. To support the training of AudioX-Turbo, we construct a large-scale, high-quality dataset, IF-caps-Pro, comprising approximately 9.2M samples curated through a two-stage data collection and annotation pipeline. We benchmark AudioX-Turbo across a wide range of tasks, finding that our model achieves superior performance, especially on text-to-audio and text-to-music generation, while operating at only 4 sampling steps and requiring approximately 25x fewer function evaluations (NFE) than multi-step baselines. These results demonstrate that our method is capable of audio generation under flexible multimodal control, showing efficient and powerful instruction-following capabilities. The code and datasets will be available at https://zeyuet.github.io/AudioX-Turbo/.

17.
arXiv (CS.CV) 2026-06-25

Cage-based Texture Transfer with Geometric Filtering

Real-time texture transfer expands the creative horizon for interactive applications, enabling seamless detail projection in scenarios that range from digital character cosmetics to procedural automotive texturing. Yet, its practical application is governed by inherent trade-offs between processing speed and suppression of artifacts. Low-latency transfer methods frequently fail to suppress artifacts, and robust alternatives rely on large-scale models that are costly in training and memory. Our proposed method bridges the gap between efficiency and robustness by using a cage-based geometric filtering method to identify Non-Cosmetic Zones (NCZs) for artifact suppression. While other models are resource-intensive and require multiple days of training on manually annotated datasets, we are able to successfully suppress artifacts and achieve immediate deployment on consumer-grade hardware. Our framework achieved highly efficient runtimes of ~70ms on mobile devices for a ~4.8k triangle mesh.

18.
medRxiv (Medicine) 2026-06-17

Clinical Study Protocol of the 'Biomarkers of Severity of COVID-19 Patients' (BIOMARCOVID) Project

Introduction The coronavirus disease 2019 (COVID-19) pandemic has challenged health care systems worldwide, in certain areas exceeding hospital capacities and human resources. This has underscored the importance of having better tools to predict the outcome of potentially severe respiratory infections such as SARS-CoV-2. Predicting COVID-19 severity may allow physicians to better manage ICU beds and increase the chances of patient survival through appropriate management. During the toughest months of the pandemic, most physicians tried to identify patients that might develop severe forms based primarily on clinical features on admission (e.g., BMI, age). In this context, significant research has focused on identifying comorbidities, clinical manifestations, and routine blood biomarkers to predict disease severity. However, despite the demonstrated value of untargeted metabolomics in assessing severity, limited data exist on its use for identifying novel metabolite biomarkers that could improve both the sensitivity and specificity of outcome prediction. Our goal is to identify metabolite biomarkers that could enhance the predictive accuracy of standard medical biology data and clinical parameters. Methods and analysis This is a retrospective, observational, monocentric cohort study conducted at the Centre Hospitalier Universitaire Grenoble Alpes (CHUGA). The maximum number of eligible patients admitted for PCR-confirmed COVID-19 between March and December 2020 will be included. Severity outcome is defined using the WHO 10-category ordinal scale (mild: categories 4-5; severe: >5). Blood samples were collected within 48 hours of admission and analyzed for 62 routine blood tests and untargeted multiplatform LC-MS/MS metabolomics across four national platforms. Statistical analysis will include logistic regression with variable selection for the primary aim, and multi-block chemometric integration of clinical, biological, and metabolomics data as a secondary aim. Ethics and dissemination A study steering committee has been formed to ensure the accuracy of the collected data by thoroughly reviewing it prior to the data lock. All aspects of the study comply with ethical standards, including approval by the CHUGA institutional review board and adherence to CNIL Reference Methodology MR004 for the protection of participants' rights, privacy, and confidentiality. This study is registered on the French Health Data Hub (number F20210218154851). Results will be disseminated through peer-reviewed publications, presentations at national and international scientific and clinical conferences, and reports shared with key healthcare system stakeholders.

19.
arXiv (quant-ph) 2026-06-19

Truncated Wigner dynamics of biclique quantum spin glasses

作者:

arXiv:2606.20187v1 Announce Type: cross Abstract: Quantum spin glasses are often considered testbeds for studying quantum optimization algorithms and as such have been the subject of various quantum advantage claims. Here we investigate the near adiabatic dynamics of biclique quantum spin glasses within the (discrete) truncated Wigner approximation (TWA). Benchmarks on small systems show that TWA recovers sample-to-sample fluctuations of the Edwards-Anderson order parameter, over a wide range of annealing times, with increasing fidelity when the system size increases. We extract critical exponents from the Binder cumulant in line with theoretical expectations, reproducing recent quantum experiments. The computational cost of the method is minimal and it can easily be applied to tens of thousands of qubits.

20.
arXiv (CS.LG) 2026-06-11

Counterexample Guided Learning in the Large using Reasoning Agents

arXiv:2606.11521v1 Announce Type: new Abstract: LLMs and LLM agents should improve when given feedback, but identifying when they are able to do so is difficult: feedback is heterogeneous, domain-specific, and difficult to control. We approach this challenge by asking LLMs to perform regular-expression induction, a classical symbolic learning problem where precise mechanisms for feedback exist in the form of counterexamples. In counterexample-guided learning, a learner (LLM) proposes candidate regular expressions from positive/negative-labeled strings, and the teacher (verifier) returns counterexamples showcasing the difference between the candidate and target languages. We identify novel counterexample-guided refinement strategies that enable effective regex learning, such as regularization and symbolic counterexample clusters. We also explore agentic strategies such as reflection and repair loops. Empirically, we find that verifier feedback substantially improves sample efficiency on challenging regex-induction tasks, reducing the number of labeled examples required and enabling learning of complex target expressions where standard prompting fails. For example, on the hardest task groups, our counterexample-guided framework improves success from 3.2% to 38.1% and from 38.9% to 74.1% on two different regex domains. These results suggest that LLMs can benefit from rich feedback beyond treating it as additional data, opening the door for robust verifier-guided methods for LLM-based program synthesis and formal reasoning.

21.
arXiv (CS.AI) 2026-06-19

A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models

arXiv:2606.19868v1 Announce Type: new Abstract: Although large language models (LLMs) have shown strong capabilities across a wide range of tasks, their outputs often remain unreliable and may contain hallucinations, making uncertainty estimation (UE) essential for building trustworthy LLMs. In practice, many mainstream LLMs are only accessible through restricted APIs, where internal signals such as logits and hidden states are unavailable, making black-box UE especially important. However, existing work on black-box UE for LLMs remains fragmented in methodology and lacks a unified empirical comparison. To address this gap, we present a systematic review of black-box UE methods and organize them into five categories: verbalization-based, sampling-based, explanation-based, multi-agent, and hybrid methods. We further build a unified evaluation framework and benchmark 24 representative methods across 4 models and 4 dataset settings. Our results show that no single method consistently dominates across all settings. Nevertheless, methods that reason over and compare candidates in the answer space are generally effective, and hybrid methods that combine multiple uncertainty signals perform well under most conditions. By releasing the benchmark data and a unified evaluation framework, we aim to facilitate reproducible comparisons and support future research, while our empirical findings provide practical guidance for developing future black-box UE methods for LLMs.

22.
arXiv (CS.LG) 2026-06-17

A Dynamical Systems Perspective on the Analysis of Neural Networks

arXiv:2507.05164v2 Announce Type: replace-cross Abstract: In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.

23.
arXiv (CS.CV) 2026-06-18

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and textual elements are critical, current open and closed source models often struggle to maintain this fine-grained object identity. This issue is further compounded by the lack of datasets for instruction-based product image editing with text fidelity constraints, leaving it largely treated as an implicit capability of instruction-based image editing models. In this work, we introduce the ProductConsistency dataset which is designed to improve product-centric image editing. Our approach includes a supervised fine-tuning (SFT) dataset of 87k samples for product editing, a reinforcement learning (RL) dataset with 869 unique product images, and a new benchmark dataset, the ProductConsistency Benchmark, to allow rigorous and standardized evaluation of editing models. To guide RL training, we propose a Cyclic Consistency reward that enforces semantic preservation of product identity by using caption similarity between the original product description and captions generated from the edited image. We fine-tune both Qwen-Image-Edit-2511 and Flux.1-Kontext-dev using our dataset and demonstrate consistent improvements over baseline models in OCR and Perceptual metrics, and MLLM-based evaluations as well, indicating stronger product consistency, text rendering, and overall visual quality; with the Qwen-Image-Edit-2511 model achieving a 5x reduction in the character error rate. The code and pipeline is available at https://anonymous.4open.science/r/ProductConsistency-6FCC/README.md

24.
arXiv (quant-ph) 2026-06-19

Phase locking nuclear spins in silicon with spin-orbit coupling

arXiv:2606.20340v1 Announce Type: new Abstract: Because they have such long coherence times, nuclear spins have extraordinary potential for use in quantum information processing devices. However, coherent nuclear spin control generally requires external phase references, such as microwave control fields. Here, we phase-lock a $^{29}$Si nuclear spin ensemble in a silicon quantum dot using only the internal electronic spin-orbit coupling as a phase reference. When driven with the quantum-dot electrons, the nuclear spins align themselves to a phase determined by the electronic spin-orbit coupling and the timing of the drive protocol. This enables us to measure the coherent precession and inhomogeneous dephasing of the nuclear spins. We corroborate our results with detailed numerical simulations of the many-body electron nuclear system. Our work opens new routes for coherently controlling solid-state nuclear spin ensembles.

25.
arXiv (CS.CL) 2026-06-17

Self-Generated Error Training for Token Editing in Diffusion Language Models

作者:

Token-to-token (T2T) editing lets LLaDA2.1 revise committed tokens during block-diffusion decoding. The released recipe trains this editor on random vocabulary corruptions, but at inference the editor sees the model's own fluent, high-confidence draft errors instead. We study this training-inference mismatch and propose self-generated T2T, which performs a no-gradient draft pass, fills masked positions with predicted tokens, and supervises recovery in a second pass under these self-generated corruptions. We implement the update as a short LoRA continued-pretraining pass on LLaDA2.1-mini and evaluate on several benchmarks under the official Q-Mode T2T procedure with unchanged inference parameters. The method generally improves accuracy while reducing T2T edit intensity, mitigating failure modes such as final-digit transcription errors after otherwise correct reasoning and excessive self-correction before short factual answers.