Academic Intelligence · Curated Daily

Explore the Frontiers of Global Research

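AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.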

01.
arXiv (quant-ph) 2026-06-16

REGRID-QAOA: A Resource-Efficient Graph-Reduced Hybrid QAOA Framework for Physics-Constrained Power System Islanding

arXiv:2606.15083v1 Announce Type: new Abstract: Quantum computing has rapidly emerged as a powerful paradigm for tackling computationally demanding problems. In particular, quantum optimization shows strong promise for hard combinatorial problems in power systems, where increasing distributed energy penetration heightens the need for intentional islanding to maintain grid reliability and resilience. However, power system islanding is an NP-hard combinatorial optimization problem that becomes computationally prohibitive for classical solvers as network size grows, motivating the use of quantum computing as a promising alternative pipeline. This study develops a resource-efficient hybrid QAOA islanding framework that brings physics-constrained power-system partitioning into the quantum optimization workflow. The framework combines coherency-informed graph reduction, physics-aware constraint modeling, and structured post-processing to efficiently convert shallow-circuit QAOA samples into high-quality feasible islanding decisions without deep circuits or large shot budgets. The proposed framework is validated on the standard IEEE benchmark systems (9-, 14-, 24-, 30-, 39-, and 57-bus), demonstrating that the hybrid workflow achieves Gurobi-optimal solution quality with a clear quantum resource advantage over vanilla QAOA, while the resulting islanding solutions satisfy all physical feasibility requirements after network separation. This study establishes QAOA-based islanding as a viable quantum approach for critical infrastructure, with structured post-processing as the key enabler of quantum resource efficiency.

02.
arXiv (CS.LG) 2026-06-12

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

arXiv:2510.16311v2 Announce Type: replace Abstract: Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs, independent of labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the pivotal directional information that is fundamental and indispensable in real-world networks (e.g., social networks and recommendations). In this paper, we introduce S2-DiGCL, a novel framework that emphasizes spatial insights from complex and real domain perspectives for directed graph (digraph) contrastive learning. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving SOTA performance with 4.41% improvement in node classification and 4.34% in link prediction under both supervised and unsupervised settings.
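
For readers unfamiliar with the magnetic Laplacian mentioned in this abstract, the sketch below builds the standard (unperturbed) form, L = D_s - A_s ⊙ exp(iΘ) with Θ = 2πq(A - Aᵀ), in plain Python. It is a minimal illustration only: the single global charge parameter `q`, the function name, and the toy graph are assumptions for this example, and the personalized perturbations that S2-DiGCL adds are not reproduced here.

```python
import cmath
import math


def magnetic_laplacian(adj, q=0.25):
    """Standard magnetic Laplacian of a directed graph, as nested lists.

    adj[u][v] is the weight of the directed edge u -> v.
    q is a global charge parameter (an assumption of this sketch).
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            sym = 0.5 * (adj[u][v] + adj[v][u])  # symmetrized edge weight
            phase = 2.0 * math.pi * q * (adj[u][v] - adj[v][u])  # encodes direction
            degree += sym
            lap[u][v] = -sym * cmath.exp(1j * phase)
        lap[u][u] += degree  # add the degree of the symmetrized graph
    return lap


# Toy directed 3-cycle 0 -> 1 -> 2 -> 0 (hypothetical example graph).
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
lap = magnetic_laplacian(cycle, q=0.25)
lap_undirected = magnetic_laplacian(cycle, q=0.0)
```

Edge direction appears only in the complex phase, so the matrix stays Hermitian, and setting `q = 0` recovers the ordinary Laplacian of the symmetrized graph.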

03.
PLOS Medicine 2026-05-21

Semaglutide-associated risk of nonarteritic anterior ischemic optic neuropathy in patients with type 2 diabetes: A systematic review and meta-analysis of observational studies

by Jędrzej Chrzanowski, Magdalena Walicka, Jacek Burzyński, Małgorzata Zaraś, Arkadiusz Michalak, Wojciech Fendler. Background: Semaglutide, a glucagon-like peptide-1 receptor agonist, is widely used for the management of type 2 diabetes (T2DM). Recent case reports have raised concerns about a potential association between semaglutide use and the development of nonarteritic anterior ischemic optic neuropathy (NAION), a rare but vision-threatening condition. We aimed to evaluate whether semaglutide use is associated with an increased risk of NAION in patients with T2DM. Methods and findings: We conducted a systematic review and meta-analysis of observational studies comparing patients with T2DM aged ≥12 years treated with semaglutide to those receiving other glucose-lowering therapies. We searched PubMed, Scopus, and Web of Science databases from January 2023 to November 2025. Two reviewers independently extracted data on study design, population characteristics, and outcomes. Risk of bias was assessed using the Newcastle–Ottawa Scale and ROBINS-I v.2. Certainty of the evidence was graded according to the GRADE framework. Pooled hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using fixed-effects models; sensitivity analyses included crude and subgroup HRs, and overlapping study replacement. Leave-one-out analysis was conducted to assess small-study effects and publication bias. Results were contextualized within other meta-analyses, systematic reviews, consensus statements, and regulatory communications on the topic. Five eligible observational studies met the inclusion criteria, and 7 additional studies were included in the sensitivity analysis. Semaglutide use was associated with a significantly increased hazard of NAION compared with nonsemaglutide glucose-lowering regimens (HR 2.17, 95% CI [1.73, 2.74]; p 

04.
arXiv (CS.AI) 2026-06-25

STEB: A Speech-to-Speech Translation Expressiveness Benchmark for Evaluating Beyond Translation Fidelity

arXiv:2606.25529v1 Announce Type: cross Abstract: Speech-to-speech translation (S2ST) should preserve not only lexical meaning, but also expressive attributes: emotion, scenario style (e.g., news reporting vs. dramatic dialogue), and nonverbal vocalizations (NVs). Moreover, collecting cross-lingual target speech that is both translation-faithful and expressively aligned with the source is difficult at scale, making reference-based evaluation impractical. We introduce STEB (Speech-to-Speech Translation Expressiveness Benchmark), a 32.6-hour Chinese–English benchmark that evaluates both standard dimensions (translation fidelity, speaker similarity, duration alignment) and expressiveness dimensions (emotion, scenario style, NV preservation). For expressiveness evaluation, STEB uses a caption-then-summarize framework that converts speech into structured expressive attributes and compares source and hypothesis attributes with an LLM judge. Human validation shows statistically significant correlations with listener judgments across all expressive dimensions. We evaluate six S2ST systems covering cascaded systems, end-to-end models, and speech large language models. Many systems, especially cascaded ones, achieve strong translation fidelity, but they still struggle with emotion preservation (best: 3.82/5) and NV preservation (best: 2.31/5). These results reveal a gap between semantic transfer and expressive transfer, identifying expressiveness preservation as an open challenge for S2ST. Audio samples are available at https://cmots.github.io/steb.github.io/.

05.
arXiv (quant-ph) 2026-06-25

Single-Period Floquet Control of Bosonic Codes with Quantum Lattice Gates

arXiv:2601.08782v2 Announce Type: replace Abstract: Bosonic codes constitute a promising route to fault-tolerant quantum computing. Existing Floquet protocols enable analytical construction of bosonic codes but typically rely on slow adiabatic ramps with thousands of driving periods. In this work, we circumvent this bottleneck by introducing an analytical and deterministic Floquet method that directly synthesizes arbitrary unitaries within a single period. The phase-space unitary ensembles generated by our approach reproduce the Haar-random statistics, enabling practical pseudorandom states in continuous-variable systems. We prepare various prototypical bosonic codes from vacuum and implement single-qubit logical gates with high fidelities using quantum lattice gates. By harnessing the full intrinsic nonlinearity of Josephson junctions, quantum lattice gates decompose quantum circuits into primitive operations for efficient continuous-variable quantum computing.

06.
arXiv (CS.CV) 2026-06-11

Performance Analysis of YOLOv11 and YOLOv8 for Mixed Traffic Object Detection under Adverse Weather Conditions in Developing Countries

In modern vehicular systems, robust performance under harsh conditions has become a critical problem in autonomous driving. Our study delivers a comprehensive evaluation of the YOLOv11 Nano architecture, the newest iteration of the YOLO series, benchmarked against the widely adopted YOLOv8 Nano baseline on a custom fused dataset that combines the Indian Driving Dataset (IDD) [1] and the Berkeley Deep Drive Dataset (BDD100K) [2]. We analyze the trade-offs among detection accuracy, inference speed, and computational efficiency in high-entropy scenarios involving dense mixed traffic, rain, and low-light conditions. Specifically, YOLOv11n achieves a mean Average Precision (mAP@50) of 46.6%, with a notable 3.2% improvement in Precision over the baseline, effectively reducing false positives in cluttered scenes. Furthermore, YOLOv11n exhibits enhanced energy efficiency, requiring 22% fewer FLOPs (6.3G vs. 8.1G) while maintaining a real-time inference speed of 70.9 FPS on a Tesla T4 GPU, offering an optimal trade-off for safety-critical edge deployment.
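
The efficiency figures quoted above can be checked with a few lines of arithmetic. The snippet is only a sanity check on the numbers in the abstract (8.1G vs. 6.3G FLOPs, 70.9 FPS); the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline, new):
    """Fractional saving of `new` relative to `baseline`."""
    return (baseline - new) / baseline


flops_saving = relative_reduction(8.1, 6.3)  # YOLOv8n vs. YOLOv11n, in GFLOPs
frame_time_ms = 1000.0 / 70.9                # per-frame latency implied by 70.9 FPS

print(f"FLOPs reduction: {flops_saving:.1%}")        # FLOPs reduction: 22.2%
print(f"Per-frame latency: {frame_time_ms:.1f} ms")  # Per-frame latency: 14.1 ms
```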

07.
arXiv (CS.CL) 2026-06-16

Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse?

Correct Information Units (CIUs) are central to discourse assessment in aphasia because they quantify communicative informativeness rather than linguistic form alone. However, CIU scoring is time intensive and requires trained raters. This study examined whether instruction-tuned large language models (LLMs) can reliably perform token-level CIU classification from aphasic discourse transcripts. Sixteen picture-description transcripts elicited with the Cat Rescue stimulus were annotated for CIU status according to Nicholas and Brookshire (1993). The sample spanned four severity strata: control, mild, moderate, and severe aphasia. Four publicly available instruction-tuned LLMs were benchmarked under zero-shot and two few-shot prompting conditions across five stratified random seeds. Performance was evaluated against consensus human labels using accuracy, precision, recall, F1, and Cohen's kappa. Zero-shot prompting was insufficient across models. In contrast, few-shot prompting yielded substantial gains and produced competitive performance for three viable models. Mean few-shot F1 scores ranged from 0.776 to 0.817 across Llama-3.1-8B, Qwen2.5-7B, and Mistral-7B, with no significant differences between fixed global and per-chunk local example selection. Phi-3-mini was unstable and did not yield reliable performance. Viable models showed high recall but lower precision, suggesting systematic over-classification of tokens as CIUs. Performance also varied by discourse severity, with the weakest results in more severe aphasia. Few-shot LLM prompting can support automated CIU identification without gradient-based task training, but agreement with human annotation remains insufficient for fully autonomous use. These findings support LLM-based CIU scoring as a promising human-in-the-loop component of discourse assessment systems.
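
Cohen's kappa, one of the agreement metrics named above, corrects raw agreement for the agreement expected by chance. The sketch below is a generic stdlib implementation, not the study's evaluation code; the token labels are invented for illustration and mimic a model that over-labels tokens as CIUs.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels (1 = CIU, 0 = not a CIU); not data from the study.
human = [1, 1, 0, 1, 0, 0, 1, 0]
model = [1, 1, 1, 1, 0, 1, 1, 0]
kappa = cohens_kappa(human, model)
print(kappa)  # 0.5
```

In this toy case raw agreement is 0.75, but half of that is expected by chance, which is why kappa lands at 0.5.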

08.
arXiv (CS.AI) 2026-06-16

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

arXiv:2510.04212v4 Announce Type: replace-cross Abstract: The pursuit of computational efficiency has driven the adoption of low-precision formats for training transformer models. However, this progress is often hindered by notorious training instabilities. This paper provides the first mechanistic explanation for a long-standing and unresolved failure case where training with flash attention in low-precision settings leads to catastrophic loss explosion. Our in-depth analysis reveals that the failure is not a random artifact but caused by two intertwined phenomena: the emergence of similar low-rank representations within the attention mechanism and the compounding effect of biased rounding errors inherent in low-precision arithmetic. We demonstrate how these factors create a vicious cycle of error accumulation that corrupts weight updates, ultimately derailing the training dynamics. To validate our findings, we introduce a minimal modification to the flash attention that mitigates the bias in rounding errors. This simple change stabilizes the training process, confirming our analysis and offering a practical solution to this persistent problem. Code is available at https://github.com/ucker/why-low-precision-training-fails.

09.
arXiv (CS.CV) 2026-06-17

A Benchmark for Omni-Modal Reasoning in Long Videos

Long-form omni-modal video understanding requires integrating vision, speech, and ambient audio with coherent long-context reasoning. Existing video benchmarks often trade off temporal scale, modality coverage, open-ended interaction, and interpretable scoring. To address this gap, we introduce LongShOTBench, a long video understanding benchmark designed around three coupled goals: holistic omni-modal integration, intent-driven open-ended interaction, and rubric-level diagnosis. It builds single- and multi-turn questions from real viewing scenarios, with systematic tasks probing visual, speech, ambient-audio, temporal, and cross-modal reasoning. Each item includes a reference answer and a weighted criterion-level rubric, letting evaluation identify which perceptual facts, temporal links, modality-grounding requirements, and reasoning steps are satisfied or missed. All samples are manually verified to improve grounding, clarity, and rubric reliability. We also introduce LongShOTAgent, a training-free omni-modal evidence-seeking agent coupling full-video preprocessing with targeted retrieval, query-adaptive segment refinement, and explicit claim verification over visual, speech, and non-speech audio evidence. Its iterative search-refine-verify loop exposes intermediate evidence and lets modality-specific specialists re-analyze relevant moments before answering. We evaluate 105 video-capable models spanning open-source omni-modal models, vision-language systems, audio LLMs, agentic pipelines and closed-source APIs. Current MLLMs remain far from saturating LongShOTBench, while our LongShOTAgent is the strongest training-free system, reaching 66.64% overall. By releasing the benchmark, leaderboard, and method, we provide a shared, interpretable testbed for advancing long-form omni-modal video reasoning. Code, data, and the leaderboard are available at https://longshot.cvmbzuai.com/.

10.
arXiv (CS.CL) 2026-06-17

The Critical Role of Model Selection in Causal Inference: A Comparative Analysis of Classification Models within the InferBERT Framework for Pharmacovigilance

Distinguishing causal adverse drug events (ADEs) from spurious correlations remains a central challenge in pharmacovigilance. The InferBERT framework integrates transformer models with Do-calculus, but its success hinges on the underlying classification model. This study evaluates the impact of model choice in InferBERT, assessing whether simpler models suffice, whether domain-specific pre-training helps, whether scaling to LLMs improves causal detection, and what effect post-hoc calibration has. We performed a comparative study on two benchmarks: Analgesics-induced Acute Liver Failure (AILF) and Tramadol-related Mortalities (TRAM). Four models were evaluated: XGBoost (baseline), ALBERT (original InferBERT), BioBERT (biomedical transformer), and Med-LLaMA (medical LLM), using 5-fold cross-validation repeated over 20 runs. We measured accuracy, Expected Calibration Error (ECE) pre- and post-isotonic regression, and Jaccard concordance of causal terms with PRR, ROR, and EBGM; significance was tested with paired t-tests. BioBERT achieved the highest accuracy on both datasets, while Med-LLaMA underperformed despite its size and parameter-efficient fine-tuning. Domain-specific pre-training was decisive. Calibration improved ECE but had mixed effects on accuracy and causal discovery. BioBERT's superiority also yielded the strongest concordance with traditional pharmacovigilance signals. These results show that domain-specific pre-training provides a clear advantage over simpler baselines and larger LLMs. Investing in manageable, domain-aware models is more effective for computational pharmacovigilance than simply scaling model size.
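
Expected Calibration Error, as used above, is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin. The sketch below shows that common equal-width-bin form; the bin count, function name, and toy predictions are assumptions for this example, and the study's exact binning scheme is not stated in the abstract.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical predictions: four at 0.95 confidence (three correct),
# two at 0.55 confidence (one correct).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
```

Here the high-confidence bin is overconfident by 0.20 and the other by 0.05, giving an ECE of about 0.15 after weighting by bin size.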

11.
arXiv (CS.LG) 2026-06-15

Direct Fisher Score Estimation for Likelihood Maximization

arXiv:2506.06542v2 Announce Type: replace-cross Abstract: We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based on a local score matching technique which uses simulations from a localized region around each parameter iterate. By employing a linear parameterization to the surrogate score model, our technique admits a closed-form, least-squares solution. This approach yields a fast, flexible, and efficient approximation to the Fisher score, effectively smoothing the likelihood objective and mitigating the challenges posed by complex likelihood landscapes. We provide theoretical guarantees for our score estimator, including bounds on the bias introduced by the smoothing. Empirical results on a range of synthetic and real-world problems demonstrate the superior performance of our method compared to existing benchmarks.

12.
arXiv (quant-ph) 2026-06-16

Towards Interpretability of Neural Quantum States

arXiv:2508.14152v2 Announce Type: replace Abstract: Neural quantum states (NQS) have emerged as a powerful variational ansatz for representing quantum many-body wave functions. Their internal mechanisms, however, remain poorly understood. We investigate the role of correlations for NQS-like quantum state representation by employing a correlation-based interpretable neural network architecture and then proving our observations using Boolean function theory. The correlator neural network demonstrates that, even for simple product states, up to all system-size correlation orders in the chosen computational basis are required to represent a quantum state faithfully. We explain these observations using Fourier expansion, which reveals the correlator basis as the effective basis of the internal NQS structure, the resulting necessity for high-order correlations that is supported by an entanglement bound that scales with the correlation order, consequences of linear dependencies in constrained Hilbert spaces for correlation requirements, and connections between spin basis rotations and the correlator basis. Furthermore, we analyze how neural networks achieve high correlation orders by increasing the magnitude of the network weights, which can be compensated by increasing the network depth. Lastly, we discuss how activation functions, network architectures, and choice of reference basis influence correlation requirements. Our results provide new insights and a better understanding of the internal structure and requirements of NQS, enabling a more systematic use of NQS in future research.

13.
arXiv (CS.CV) 2026-06-16

Fusing Transferred Priors and Physics-based Decomposition for Underwater Image Enhancement

Underwater images are captured under diverse water-medium conditions, leading to complex degradation, including color bias, low contrast, and blur. Recently, learning-based methods have demonstrated their potential for underwater image enhancement (UIE). However, most previous work focuses on the training strategy or network design to align the enhanced result with the labels in datasets, ignoring that the labels are selected from the enhanced results of previous UIE methods and that these pseudo-labels are noisy. Consequently, the performance of these models is limited. Collecting true labels for underwater images, however, is challenging. In this work, we propose a transfer-learning-based UIE method that does not require underwater images to have paired noisy or true labels for learning. Instead, the UIE task is first divided into global color correction, haze removal, and background noise suppression, following the underwater physics. Then, multiple types of priors from other vision tasks are leveraged as cross-domain supervision in each step. In this way, a novel UIE method is obtained via transfer learning, and the physics-aligned UIE decomposition provides theoretical soundness. Qualitative and quantitative experiments demonstrate that our proposal, based on the fusion of physics and priors, achieves SOTA performance in the UIE task and effectively boosts downstream vision tasks, significantly outperforming benchmark methods. Project repo: https://github.com/Haru2022/P2-UIE.

14.
arXiv (CS.LG) 2026-06-18

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

arXiv:2606.18750v1 Announce Type: cross Abstract: A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
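
As background for the abstract above, the textbook CUPED adjustment subtracts the part of the metric that a pre-experiment covariate explains: Y' = Y - θ(X - mean(X)) with θ = cov(X, Y) / var(X). The sketch below shows only this basic single-covariate form on simulated data; it is not the estimator the paper recommends, and the function name and simulation parameters are assumptions for illustration.

```python
import random
import statistics


def cuped_adjust(metric, covariate):
    """Return (adjusted_metric, theta) for the textbook CUPED adjustment."""
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(metric)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(covariate, metric)) / (len(metric) - 1)
    theta = cov_xy / statistics.variance(covariate)
    # Centering the covariate keeps the sample mean of the metric unchanged.
    adjusted = [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]
    return adjusted, theta


# Simulated users: the pre-experiment value strongly predicts the in-experiment metric.
rng = random.Random(0)
pre = [rng.gauss(10.0, 2.0) for _ in range(2000)]
post = [x + rng.gauss(0.0, 1.0) for x in pre]

adjusted, theta = cuped_adjust(post, pre)
variance_ratio = statistics.variance(adjusted) / statistics.variance(post)
print(f"theta = {theta:.3f}, variance ratio = {variance_ratio:.3f}")
```

The adjusted metric has the same sample mean as the original but a much smaller variance, roughly 1 - ρ² of the original, where ρ is the correlation between the covariate and the metric.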

15.
arXiv (CS.CV) 2026-06-11

What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing

Flow matching based video generative models have been increasingly relying on prepended Vision-Language Models (VLMs) to handle complex, instruction-based video editing. The prevailing assumption underlying this paradigm is that a connector module can seamlessly align the VLM's rich multi-modal reasoning with the original text embedding space of DiTs. However, we hypothesize that this alignment acts as a severe semantic bottleneck, degrading fine-grained structural variables. Verifying this is challenging, as end-to-end evaluations conflate alignment failures with generation errors, and natural datasets lack disentangled annotations. To rigorously investigate this, we propose a controlled data processing pipeline based on video composition that results in TRACE-Edit, a diagnostic dataset focusing on relation-based editing. Leveraging this dataset, we propose a comprehensive diagnostic protocol to analyze two important designs of meta-query and connector in the existing video editing models. Systematic evaluation of four representative model cases reveals that fine-grained structural semantics can be severely degraded during alignment. Our findings overturn the assumption of lossless semantic transfer, identifying the VLM-to-DiT alignment as a major bottleneck and providing a new diagnostic foundation for future multi-modal alignment architectures.

16.
arXiv (CS.CV) 2026-06-11

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

Clothes-changing person re-identification (CC-ReID) aims to recognize individuals despite drastic appearance changes caused by clothing variation. While existing methods rely on adversarial learning to disentangle clothing features, we propose Ortho-ReID, which explicitly models a low-rank clothing subspace from VLM text descriptions and extracts clothing-invariant representations via direct geometric constraints. A critical component is our transformer-based Basis Maker, which refines a shared, low-dimensional clothing prior into an instance-adaptive low-rank subspace through cross-attention with image patches, enabling robust clothing feature extraction even under varying visibility conditions. This instance-adaptive subspace is supervised via alignment with clothing text embeddings, while identity features are extracted via a learnable projection head and geometrically constrained to be strictly orthogonal to it. Extensive experiments demonstrate state-of-the-art performance on PRCC (+5.9% top-1), Celeb-reID-light (+3.5%), and LaST (+5.3%), with competitive results on LTCC.

17.
arXiv (CS.CL) 2026-06-18

Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation

Large language models (LLMs) have become an effective tool for synthetic data generation, including for low-resource languages, where generated data can improve downstream task performance. Current best-performing approaches typically rely on few-shot prompting with target-language examples, which increases inference costs and may reduce diversity through lexical anchoring. In this work, we investigate activation steering as an alternative for low-resource synthetic data generation. We study two steering strategies: Language Steering, which targets the linguistic identity of a language, and Quality Steering, which captures well-formedness by contrasting human-written and backtranslated text representations. We evaluate these methods across four open-source LLMs, multiple layers, and 11 typologically diverse languages by generating sentiment and topic classification data and finetuning smaller classifiers. Steering is applied in both zero-shot and few-shot prompting settings and compared against non-steered counterparts. Our results show that steering on early layers consistently improves the diversity of generated data while often yielding stronger downstream model performance, particularly for low-resource languages.

18.
bioRxiv (Bioinfo) 2026-06-08

HydraMPP: A lightweight library for distributed massive parallel processing in Python - threading at scale.

We now live in an era of massive datasets from genomics and large language models, with all the known knowledge of humanity right at our fingertips. Much of this data is becoming more accessible; however, processing such data remains an ongoing issue across systems, including high performance computing (HPC) infrastructures. Massively parallel computing (MPP) has addressed this with a divide-and-conquer approach that splits workloads across independent nodes (i.e., central processing units (CPUs)), allowing for higher scaling of data. The main engine for this in Python is Ray; however, it has many issues, including a large codebase, security issues, debugging opacity, and memory management issues. Here, we present HydraMPP, a lightweight library that is easy to use, highly auditable, and designed with SLURM ergonomics.

19.
arXiv (CS.LG) 2026-06-25

Structured Approximations of Measures

arXiv:2310.09149v3 Announce Type: replace-cross Abstract: We study the approximation of probability measures in the Wasserstein-$p$ distance by structured classes of approximators, motivated by applications in imaging, machine learning, and physical measurement under sensor constraints. We obtain three sets of results. First, for measures with densities bounded away from zero on a bounded Lipschitz domain $\Omega$, we prove that any approximation scheme for functions in $\mathrm{L}_p(\Omega)$ transfers, with linear rate, to a corresponding approximation scheme for measures in $\mathrm{W}_p(\Omega)$. The argument applies a theorem of Bogovskii on regularity of solutions to the continuity equation in the Benamou-Brenier formulation of optimal transport. We exhibit concrete approximation schemes (polynomials, shift-invariant spaces, cardinal interpolation with radial basis functions, kernel density estimators, and piecewise approximations on nonuniform Voronoi partitions) that fit the framework. As a matter of independent interest, we prove a negative Sobolev lower bound that generalizes existing bounds from $p=2$ to all $p\in(1,\infty)$. We also consider deterministic bounds for discrete approximations to arbitrary measures in terms of the mesh norm of a quasi-uniform set of points. We specialize these bounds to show that compactly supported measures admit a deterministic $N$-term approximation $\mu_N$ such that $\mathrm{W}_p(\mu,\mu_N) = O(N^{-\frac{1}{d}})$ for all $d\geq 1$, which matches the asymptotic optimal quantizer rate. We also extend these results to non-compactly supported measures with appropriate tail decay.
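
As a worked check of the stated $N^{-1/d}$ rate in the simplest case ($d=1$), one can take $\mu$ uniform on $[0,1]$ and place $N$ equal-weight atoms at the cell midpoints. This choice of $\mu$ and $\mu_N$ is an illustrative assumption and not a construction taken from the paper.

```latex
\mu = \mathrm{Unif}[0,1], \qquad
\mu_N = \frac{1}{N}\sum_{k=1}^{N} \delta_{x_k}, \qquad
x_k = \frac{2k-1}{2N}.

% In one dimension the monotone map is optimal; it sends each cell
% [(k-1)/N, k/N] to its midpoint x_k.
\mathrm{W}_p(\mu,\mu_N)^p
  = \sum_{k=1}^{N} \int_{(k-1)/N}^{k/N} |x - x_k|^p \, dx
  = 2N \int_0^{1/(2N)} t^p \, dt
  = \frac{1}{(p+1)\,(2N)^p},

\mathrm{W}_p(\mu,\mu_N) = \frac{(p+1)^{-1/p}}{2N} = O(N^{-1}).
```

For $p=1$ this gives $\mathrm{W}_1(\mu,\mu_N) = 1/(4N)$, consistent with the $O(N^{-1/d})$ rate at $d=1$.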

20.
arXiv (CS.AI) 2026-06-25

Reward-Conditioned Attention: How Reward Design Shapes What Autonomous Driving Agents See

arXiv:2606.25127v1 Announce Type: cross Abstract: We investigate how reward design shapes the internal attention patterns of reinforcement learning agents trained for autonomous driving. Using three Perceiver-based agents that share identical architectures and training data but differ only in their reward configurations (ranging from basic violation penalties to continuous proximity penalties), we analyze cross-attention allocation across 50 real-world scenarios from the Waymo Open Motion Dataset. A central methodological finding is that naïve pooling of timesteps across episodes substantially underestimates the attention–risk relationship; within-episode correlation with Fisher z-transform aggregation is the appropriate statistic and reveals a robustly positive link between collision risk and agent-directed attention. Building on this validated methodology, we demonstrate two reward-conditioned effects: agents trained with navigation rewards allocate up to $2.0\times$ more attention to GPS-path tokens than those trained with additional proximity penalties (and $4.7\times$ more than agents with no navigation incentive), revealing that reward content directly determines which scene elements the encoder prioritizes, and continuous time-to-collision penalties create a "learned vigilance prior": elevated resting agent surveillance maintained throughout collision-free phases. In several scenarios, the complete-reward and minimal-reward models exhibit opposite attention–risk correlation directions, demonstrating that reward design can qualitatively reverse attentional strategy rather than merely modulating its magnitude. These results suggest that attention analysis is a practical diagnostic for verifying that a reward function produces the intended representational behaviour in safety-critical RL systems.
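
The difference between pooled and within-episode correlation can be seen in a small example. The sketch below computes Pearson's r per episode, averages the Fisher z-transformed values (z = atanh r), and maps the mean back with tanh. The unweighted mean, the clipping constant, and the toy episodes are assumptions of this sketch; the abstract does not specify the paper's weighting, and the data are invented, chosen as an extreme case in which pooling even flips the sign.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_mean(correlations, eps=1e-9):
    """Average correlations in Fisher z space, then transform back."""
    clipped = [max(-1.0 + eps, min(1.0 - eps, r)) for r in correlations]  # atanh(±1) is undefined
    mean_z = sum(math.atanh(r) for r in clipped) / len(clipped)
    return math.tanh(mean_z)


# Hypothetical episodes of (risk, attention) per timestep. Each episode has its
# own attention baseline, which is what misleads the pooled statistic.
episodes = [
    ([0, 1, 2, 3], [5.0, 5.2, 5.1, 5.4]),
    ([4, 5, 6, 7], [1.0, 1.3, 1.2, 1.5]),
]

within = fisher_z_mean([pearson_r(risk, attn) for risk, attn in episodes])
pooled = pearson_r([x for risk, _ in episodes for x in risk],
                   [y for _, attn in episodes for y in attn])
print(f"within-episode (Fisher z): {within:.2f}, pooled: {pooled:.2f}")
```

Within each episode attention rises with risk, so the aggregated within-episode correlation is strongly positive, while the pooled correlation is negative because it mixes the between-episode baseline shift into the estimate.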

21.
arXiv (CS.CV) 2026-06-24

Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models

We introduce cyclic denoising – repeated forward and reverse diffusion at controlled noise amplitudes – as an extraction attack for image diffusion models. Inspired by random organization in disordered solids, cyclic denoising exposes regions of the learned distribution that are largely inaccessible to standard sampling. The dynamics drive samples toward attractors with a broad stability spectrum. The deepest attractors are ultrastable: they regenerate after near-total corruption and persist through thousands of noising-denoising cycles. Many of these attractors correspond to memorized training images, including stock photographs, brand watermarks, and web-crawl artifacts. The attack requires only sampler-level control, with no gradients, weight inspection, prompts, captions, or prior knowledge of the training data. Unlike generate-and-filter attacks, which rely on large-scale prompted generation and post-hoc similarity or membership-inference filtering, our main protocol is fully unconditioned. We demonstrate the phenomenon in Stable Diffusion v1.4 and in a pixel-space DDPM, showing consistent behavior across latent- and pixel-space diffusion models. Across noise amplitudes, we observe a yielding-like transition: low-amplitude cycling produces trivial absorbing fixed points or limit cycles, while larger amplitudes induce rearrangements, basin hopping, and long-lived trapping in structured memorized attractor basins. We also observe hierarchical partial absorption, prompt-stabilized basins, and cross-initial-condition universality of the recovered attractor set. Our results therefore show that cyclic denoising is both a physics-inspired probe of generative landscapes and a practical tool for memorization auditing, with implications for privacy, copyright compliance, and model fingerprinting.

22.
arXiv (CS.AI) 2026-06-16

PANDA: An LLM-Enhanced Performance-Driven Analog Design Framework Bridging Design Intent and Layout Generation

arXiv:2606.15052v1 Announce Type: cross Abstract: Traditional design of analog circuits heavily relies on manual interventions across topology, sizing, and layout, with prior automation addressing stages in isolation. In this work, we propose PANDA, an LLM-enhanced framework that bridges high-level design intent to final layout by actively managing cross-stage dependencies through guided topology synthesis, substructure-aware sizing, and constraint-driven layout generation. This shifts automation from algorithm-centric execution to intent-centric co-design, reducing turnaround time from days or weeks to hours while improving design performance.

23.
arXiv (CS.LG) 2026-06-24

Debate2Create: Robot Co-design via Multi-Agent LLM Debate

arXiv:2510.25850v3 Announce Type: replace-cross Abstract: We introduce Debate2Create (D2C), a multi-agent LLM framework that formulates robot co-design as structured, iterative debate grounded in physics-based evaluation. A design agent and control agent engage in a thesis-antithesis-synthesis loop, while criterion-specific LLM judges provide multi-objective feedback to steer exploration. Across five MuJoCo locomotion benchmarks, D2C achieves the highest default-normalized score among the evaluated LLM-based and black-box baselines, with gains up to 3.2x on Ant and nearly 9x on Swimmer. Iterative debate yields 18-35% gains over compute-matched zero-shot generation, and D2C-generated rewards transfer to default morphologies in 4/5 tasks. These results suggest that structured, simulator-grounded multi-agent interaction is a useful mechanism for joint morphology-reward optimization under a fixed-topology, per-candidate-RL protocol. Project page: debate2create.github.io.

24.
arXiv (quant-ph) 2026-06-17

Closest Accessible Symmetry reduction: a tool for Hamiltonian interpolation analysis

arXiv:2606.18161v1 Announce Type: new Abstract: We introduce a framework for analysing the spectrum of Hamiltonian interpolations without heavily relying on discretising the interpolation parameter. The method is based on the concept of accessible symmetries: a problem-class-dependent family of certifiable reflections that induce bipartitions of the Hilbert space. At each step, the interpolation Hamiltonian is projected onto the sectors of the accessible symmetry that is closest to being satisfied, yielding a hierarchy of weakly coupled pseudo-eigenspaces together with explicit residual couplings between them. We show that this representation captures qualitative signatures of quantum phase transitions, provides estimates of their location, and offers insights into their nature. The quality of the approximation is controlled by the compatibility between the accessible symmetry family and the problem instance. Although motivated in spirit by adiabatic quantum computation, our approach applies more broadly to the study of Hamiltonian phase diagrams, providing a new perspective on the spectral reorganisation of many-body quantum systems.
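
The idea of projecting an interpolation Hamiltonian onto the sectors of the closest reflection can be sketched with NumPy. This is a toy illustration of the general idea rather than the paper's algorithm: the two-qubit Hamiltonian, the candidate reflection family, and the use of the Frobenius norm of the off-sector coupling as the "closeness" measure are all assumptions.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

# Hypothetical interpolation H(s) = (1 - s) * H_DRIVER + s * H_PROBLEM.
H_DRIVER = -(np.kron(X, I2) + np.kron(I2, X))
H_PROBLEM = -np.kron(Z, Z) + 0.5 * np.kron(Z, I2)

# Hypothetical family of reflections (Hermitian, squaring to the identity).
REFLECTIONS = {"XX": np.kron(X, X), "ZZ": np.kron(Z, Z)}

def hamiltonian(s):
    return (1.0 - s) * H_DRIVER + s * H_PROBLEM

def split_by_reflection(H, R):
    """Split H into its sector-preserving part and the residual coupling
    between the +1 and -1 eigenspaces of the reflection R."""
    dim = H.shape[0]
    p_plus = (np.eye(dim) + R) / 2.0   # projector onto the +1 sector
    p_minus = (np.eye(dim) - R) / 2.0  # projector onto the -1 sector
    block = p_plus @ H @ p_plus + p_minus @ H @ p_minus
    residual = p_plus @ H @ p_minus + p_minus @ H @ p_plus
    return block, residual

def closest_reflection(s):
    """Pick the candidate whose residual coupling has the smallest norm."""
    H = hamiltonian(s)
    norms = {name: np.linalg.norm(split_by_reflection(H, R)[1])
             for name, R in REFLECTIONS.items()}
    best = min(norms, key=norms.get)
    return best, norms[best]

for s in (0.0, 0.5, 0.9, 1.0):
    print(s, closest_reflection(s))
```

In this toy model the XX reflection is exact at s = 0 and the ZZ reflection is exact at s = 1, so the closest reflection switches partway along the interpolation, and the residual norm quantifies how weakly the two sectors are coupled at each s.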

25.
arXiv (CS.AI) 2026-06-19

Cross-Dataset, Age, and Gender Generalization: A Comprehensive Analysis of Fine-Tuning Strategies for Low-Resource Children's ASR

arXiv:2606.19791v1 Announce Type: cross Abstract: The challenge associated with recognizing dysarthric speech primarily arises from pronounced acoustic variability attributed to impaired articulatory precision. Past research has demonstrated improved recognition through the use of hybrid DNN/HMM sequence discriminative training. This paper presents a comprehensive investigation of various combinations of acoustic features tailored to different Acoustic Models, offering suitable feature selections for each. The incorporation of Pitch features notably improved recognition performance, especially for sentence recognition tasks involving dysarthric speech. Through a systematic examination of the TORGO database, we have demonstrated the potential to enhance the performance of the state-of-the-art Factorized Time Delay Neural Network (F-TDNN) model for recognizing dysarthric speech. Our methods, implemented with the F-TDNN model, resulted in a 4.65% relative improvement in isolated word recognition and a 4.63% relative improvement in sentence recognition for dysarthric speech, compared to previous research. This improvement effectively compensates for speech variability, attributable to our deliberate selection of the number of overlapping frames between consecutive training example chunks.