Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-24

SEAL: Searching Expandable Architectures for Incremental Learning

arXiv:2505.10457v3 Announce Type: replace-cross Abstract: Incremental learning is a machine learning paradigm where a model learns from a sequential stream of tasks. This setting poses a key challenge: balancing plasticity (learning new tasks) and stability (preserving past knowledge). Neural Architecture Search (NAS), a branch of AutoML, automates the design of the architecture of Deep Neural Networks and has shown success in static settings. However, existing NAS-based approaches to incremental learning often rely on expanding the model at every task, making them impractical in resource-constrained environments. In this work, we introduce SEAL, a NAS-based framework tailored for data-incremental learning, a scenario where disjoint data samples arrive sequentially and are not stored for future access. SEAL adapts the model structure dynamically by expanding it only when necessary, based on a capacity estimation metric. Stability is preserved through cross-distillation training after each expansion step. The NAS component jointly searches for both the architecture and the optimal expansion policy. Experiments across multiple benchmarks demonstrate that SEAL effectively reduces forgetting and enhances accuracy while allocating additional capacity only when required. These results highlight the promise of combining NAS and selective expansion for efficient, adaptive learning in incremental scenarios.

02.
arXiv (CS.AI) 2026-06-19

How Transparent is DiffusionGemma?

arXiv:2606.20560v1 Announce Type: cross Abstract: LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less transparent? We study this question by decomposing transparency into two components: variable transparency, whether we understand intermediate snapshots of a model's computational state; and algorithmic transparency, whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs. Naively, DiffusionGemma has poor variable transparency: its opaque serial depth, the amount of serial computation that occurs in between interpretable model states, seems at first 28.6X higher than the corresponding autoregressive Gemma 4 model. However, we show that we can map the information flowing between denoising steps through an interpretable token bottleneck with no decrease in downstream performance. Treating these intermediate states as interpretable reduces the opaque serial depth to just 1.1X that of Gemma 4. Algorithmic transparency is harder for diffusion models than for autoregressive models because all token predictions in the canvas can change at every denoising step, giving the model the power to implement complicated distributed algorithms during the denoising process. To begin bridging this gap, we conduct a suite of interpretability case studies, uncovering initial evidence of novel diffusion-specific phenomena such as non-chronological reasoning, token and sequence smearing, and intermediate-context reasoning. Finally, we test monitorability, a key application of transparency that measures whether model outputs are useful for downstream tasks. We find that DiffusionGemma is similarly monitorable to Gemma 4.

03.
arXiv (CS.AI) 2026-06-12

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

arXiv:2606.12500v1 Announce Type: cross Abstract: Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.

04.
arXiv (CS.CV) 2026-06-16

Chroma-gated, differentiable OKLCH interpolation: Continuous Oklab fallback for color-cast reduction

OKLCH – the cylindrical (lightness, chroma, hue) form of Ottosson's Oklab color space – is the interpolation space recommended by CSS Color 4 for gradients and color-mix(), and it is now broadly deployed. Its polar parameterization, however, casts color near the neutral axis in two ways: (1) an inter-hue detour between two chromatic endpoints that sweeps through an unintended hue (blue to yellow visibly passing through green), and (2) an off-line bow when one endpoint is achromatic. Existing remedies are uniformly two-valued – a threshold switch that fires only at an achromatic endpoint – so they address only (2); on chromatic pairs every one of them reduces to raw OKLCH, leaving the (1) inter-hue cast untreated. We introduce Continuous Oklab fallback (COFb), a one-parameter, differentiable chroma gate $w(C)=C^n/(C^n+\sigma^n)$ that continuously blends the OKLCH path toward the linear Oklab path as chroma falls. A single gate reduces the (1) cast that the two-valued family leaves untreated and unifies the handling of (1) and (2) without any endpoint test. We characterize a cast-hue trade-off frontier, adopt a default ($n=1$, the rational Michaelis-Menten form; $\sigma\approx0.19$ for a typical sRGB palette, from a normalization-independent cast-half criterion), and verify the gate's properties symbolically. At the default, COFb halves the inter-hue path detour (mean lateral deviation -49.5%, chroma-weighted hue excursion -35.5%). We also state the method's limits: on (2) alone the two-valued switch remains better, and like any Cartesian blend COFb does not preserve chroma. In deployment, COFb runs entirely in plain Oklab (a,b) to sRGB, so it serves as a fallback that delivers the same cast-reduced gradients where modern CSS color interpolation (color-mix(in oklch) and the like) is unavailable – older engines, image and video pipelines, or GPU shaders.

05.
arXiv (CS.AI) 2026-06-16

QoS-Aware Token Scheduling and Private Data Valuation for Multi-Modal Agentic Networks

arXiv:2606.15573v1 Announce Type: new Abstract: In agentic systems, human-generated data records anchor the value of AI services. Yet cloud compute pipelines centralize processing on remote servers. Data centralization reduces personal data sovereignty and may potentially degrade the quality of service (QoS). Meanwhile, user contributions are diverse in quantity and quality: decentralized records can be biased, noisy, and heterogeneously distributed. To address the data challenge, we study fair token allocation and private data valuation for decentralized and resource-constrained agentic systems. Our approach embeds multi-modal representations in a shared semantic space and releases differentially private (DP) prototypes to preserve utility while reducing semantic leakage. With the DP guarantee, we design a fair token allocation scheme that rewards effective contributions and remains robust to data heterogeneity and AI resource scarcity. Extensive simulations demonstrate improved contribution-based fairness and QoS compared to standard benchmarks. The improved resistance to image reconstruction attacks indicates enhanced privacy for multi-modal personal data.

06.
medRxiv (Medicine) 2026-06-18

Chest X-Ray as a critical screening tool for Household Contacts of TB: Lessons from Three Years of Programmatic Data in India

Introduction: Household contacts (HHCs) of pulmonary TB patients remain at high risk for TB infection and disease progression, yet many remain asymptomatic and are missed by symptom-screening pathways. While India expanded its TB preventative guidelines to include all HHCs in 2021, chest X-ray (CXR) screening continues to be used selectively, representing a missed opportunity in early case detection. Methods: The analysis uses programmatic data from Project JEET 2.0 (Joint Effort for Elimination of Tuberculosis), implemented by the William J. Clinton Foundation in India, between October 2021 and March 2024. Eligible HHCs (>=5 years) were offered CXR screening as part of TB preventive therapy (TPT) evaluation. Descriptive and multivariable analyses examined predictors of CXR uptake and TB yield. A two-stage logistic regression model estimated potential TB yield under universal CXR coverage. Model performance was evaluated using the area under the curve (AUC), and bootstrap simulations generated counterfactual estimates of missed TB cases. Results: Among 1,034,621 HHCs, 1.02% individuals were found positive for TB, which includes 7,786 HHCs who were on TB treatment already, while an additional 2,812 were identified during pre-TPT evaluation. Among eligible HHCs (n = 1,026,835), 70% were screened with CXR, of which 2.4% had suggestive TB findings. Of these, 79% went for further TB assessment. Symptomatic HHCs were more likely to be CXR screened (84% vs 69%) and assessed for TB, yet two-thirds of all detected TB cases were asymptomatic. It is estimated that universal CXR coverage and TB testing for suggestive cases can increase TB detection by at least 87%. Conclusion: The study provides a scalable approach to expand CXR coverage through public-private partnerships, enabling early TB detection among HHCs, especially among asymptomatic contacts. Future implementations will benefit from integrating AI-enabled reading, along with systematic follow up for those with suggestive findings.

07.
arXiv (CS.AI) 2026-06-18

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

arXiv:2601.21626v2 Announce Type: replace-cross Abstract: Post Training Quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue and reducing its max eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo Q consistently outperforms state of the art methods including GPTQ, AWQ, and SpinQuant not only achieving superior performance under standard W4A8 settings, but also excelling in the highly challenging W3A16 ultra low bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15\% and effectively avoids the logical collapse commonly seen in aggressive quantization.

08.
arXiv (CS.AI) 2026-06-16

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

arXiv:2606.04990v2 Announce Type: replace-cross Abstract: Large language model (LLM)-based agents are evolving from passive text generators into autonomous systems capable of planning, tool use, retrieval, memory access, environmental interaction, and multi-agent collaboration. These capabilities expand agent autonomy, but also make agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an output was produced, which evidence supported each claim, whether tool calls were justified, how memory influenced later decisions, or where failures originated. This survey examines evidence tracing and execution provenance as foundations for process-level accountability in trustworthy LLM agents. We define execution provenance as the typed graph of an agent execution and evidence tracing as its projection onto evidence-support relations. This perspective connects retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery within a unified framework. We introduce a taxonomy covering trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation forms, and trust functions. We then review key methodological directions, including provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, observability, and failure diagnosis. Finally, we discuss benchmarks, datasets, metrics, and open challenges for building provenance-aware, auditable, and recoverable agent systems.

09.
arXiv (CS.LG) 2026-06-16

Probabilistic Signature Inversion: Learning Conditional Distributions from Truncated Signatures

arXiv:2606.15332v1 Announce Type: new Abstract: The signature transform is a principled feature map for continuous-time paths, valued for its uniqueness and universality. Recovering a path from its truncated signature is, however, structurally ill-posed because the truncated signature map is not injective. We therefore reframe truncated signature inversion as a probabilistic problem – learning the conditional distribution of a path given its truncated signature – and adopt a signature-conditioned flow matching model as a practical estimator. This probabilistic formulation elucidates the fundamental difficulty of inversion: Bayes reconstruction error quantifies the irreducible uncertainty remaining after conditioning on a statistic. We derive the Bayes-optimal error under linear statistics, obtaining a closed form for log-GBM and numerically tractable formulas for log-fBM and OU, yielding a concrete theoretical baseline for model validation. This baseline upper-bounds the Bayes error under truncated-signature conditioning, since truncated signatures provide richer information than linear statistics. Experiments show that empirical reconstruction errors under linear-statistics conditioning faithfully align with the theory-derived baseline, while errors decrease when the statistic is replaced with truncated signatures. Moreover, generated paths faithfully recover the conditioning signature while preserving key distributional and temporal structures, indicating that the estimator is well-calibrated to the target conditional distribution. Together, these results establish a well-posed probabilistic framework for truncated-signature inversion, with applicability demonstrated on real financial data beyond the parametric process families covered by theory.

10.
arXiv (CS.LG) 2026-06-16

Smoothness Errors in Dynamics Models and How to Avoid Them

arXiv:2602.05352v3 Announce Type: replace Abstract: Modern neural networks have shown promise for solving partial differential equations over surfaces, often by discretizing the surface as a mesh and learning with a mesh-aware graph neural network. However, graph neural networks suffer from oversmoothing, where a node's features become increasingly similar to those of its neighbors. Unitary graph convolutions, which are mathematically constrained to preserve smoothness, have been proposed to address this issue. Despite this, in many physical systems, such as diffusion processes, smoothness naturally increases and unitarity may be overconstraining. In this paper, we systematically study the smoothing effects of different GNNs for dynamics modeling and prove that unitary convolutions hurt performance for such tasks. We propose relaxed unitary convolutions that balance smoothness preservation with the natural smoothing required for physical systems. We also generalize unitary and relaxed unitary convolutions from graphs to meshes. In experiments on PDEs such as the heat and wave equations over complex meshes and on weather forecasting, we find that our method outperforms several strong baselines, including mesh-aware transformers and equivariant neural networks.

11.
arXiv (CS.AI) 2026-06-16

Model Graph Inductive Learning for Knowledge Graph Completion

arXiv:2606.16509v1 Announce Type: new Abstract: Link prediction in knowledge graphs fundamentally depends on the quality of learned embeddings for entities and relations. However, most existing methods derive these embeddings by aggregating only the local neighborhood of each entity, neglecting the global structure of the knowledge graph. This limited view prevents models from capturing higher-level structural patterns that are essential for accurate and generalizable link prediction. To address these limitations, we introduce Model Graph Inductive Learning (MGIL), a framework that constructs a model graph by clustering entities based on the similarity of their incoming and outgoing relational structures or their entity types. A GNN is then applied to this model graph to produce embeddings that capture the global view of the knowledge graph. These embeddings subsequently serve as high-quality initial features %embeddings for the original knowledge graph, replacing random initialization and leading to more stable and expressive representations. Extensive experiments on standard and recently proposed inductive benchmarks demonstrate that MGIL achieves state-of-the-art or highly competitive performance in inductive link prediction, highlighting its effectiveness across diverse graph settings.

12.
arXiv (CS.AI) 2026-06-15

LEPO: Latent Reasoning Policy Optimization for Large Language Models

arXiv:2604.17892v4 Announce Type: replace-cross Abstract: Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference, failing to discover diverse reasoning paths. To bridge the gap, we inject controllable stochasticity into latent reasoning via Gumbel-Softmax, restoring LLMs' exploratory capacity and enhancing their compatibility with Reinforcement Learning (RL). Building on this, we propose \underline{L}atent R\underline{e}asoning \underline{P}olicy \underline{O}ptimization~(LEPO), a novel framework that applies RL directly to continuous latent representations. Specifically, in rollout stage, LEPO maintains stochasticity to enable diverse trajectory sampling, while in optimization stage, LEPO constructs a unified gradient estimation for both latent representations and discrete tokens. Extensive experiments show that LEPO significantly outperforms existing RL methods for discrete and latent reasoning.

13.
arXiv (CS.AI) 2026-06-18

Practical Anonymous Two-Party Gradient Boosting Decision Tree

arXiv:2605.26903v2 Announce Type: replace-cross Abstract: Structured data is well handled by gradient-boosted decision trees (GBDT), which are usually trained on vertically partitioned features across mutually distrustful parties. High speed and interpretability make GBDTs popular in finance and healthcare, where neural networks may fall short. Enabling secure computation for GBDTs poses unique challenges, requiring secure record alignment for comparison. Relying on private set intersection (PSI) is a de facto approach. Mistaking PSI for a safety measure actually exposes which record identifiers (IDs) are shared between the datasets. Although circuit-PSI could help, it is costly for generic uses. New ideas are needed to efficiently train in a "dark forest". Aiming to hide the IDs, we initiate the study of anonymous GBDT training on split data held by two parties. Dual circuit-PSI in our design lets the parties alternate as receiver to run pick-then-sum over local features. Via oblivious programmable pseudorandom functions, we propagate circuit-PSI outputs as shared state across runs. Avoiding universal alignment, we resolve the neglected dilemma that ID hiding incurs a cost that scales with domain size. Next, we halve the cost of ciphertext packing used to convert single-instruction multiple-data homomorphic encryption from (ring) learning with errors in prior secure GBDT (Usenix Security' 23) and related secure machine-learning computations. Comparative experiments show our protocol remains competitive with leaky approaches in efficiency. Enabling ID-hiding aggregation, our techniques can extend to other vertically partitioned analytics.

14.
arXiv (CS.CL) 2026-06-15

SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model

Large language models (LLMs) are increasingly deployed as planners for autonomous agents in household environments. While existing benchmarks evaluate whether LLM-generated plans execute successfully, they overlook a critical type of failure: latent failures. Unlike immediate failures that trigger instant feedback at execution time and enable timely correction, latent failures do not immediately halt plan execution but silently compromise goal achievement. In severe cases, they cause irreversible harm. To address this gap, we introduce SIMMER, a benchmark for evaluating latent failures in LLM planning through a human-curated symbolic world model grounded in the kitchen domain. SIMMER defines a world model comprising 77 actions, 262 unique objects, and approximately 46,800 possible interactions that are semantically realistic, derived from real-world cooking scripts. It then leverages a state machine executor that validates plans against the world model and detects immediate precondition violations, latent hazards, and irreversible failures. Experiments across six LLMs show that even frontier models achieve at most 17% error-free plans. Moreover, up to 56% of plans contain latent failures, the majority of which lead to irreversible consequences. We further demonstrate that explicit state reasoning via counterfactual foresight simulation can reduce latent failures by up to 72% and irreversible cases by up to 75%, suggesting a promising direction for more robust LLM planners.

15.
arXiv (CS.LG) 2026-06-18

Hierarchical Attention via Domain Decomposition

arXiv:2606.18525v1 Announce Type: new Abstract: We propose a hierarchical attention mechanism based on two-level overlapping Schwarz domain decomposition. The method is motivated by the observation that two-level Schwarz domain decomposition methods combine local subdomain corrections with a coarse level that communicates global, long-range information. We test its usefulness in the context of finite-dimensional operator learning using a simple, one-dimensional diffusion problem with homogeneous Dirichlet boundary conditions. Although elementary, this problem provides a controlled sequence-to-sequence setting in which the exact nonlocal solution operator is known. After discretization, learning the solution operator amounts to approximating the inverse of a symmetric positive definite matrix. As a baseline, we use a global softmax-free low-rank attention operator of the form $QK^T$. The proposed construction replaces this dense global factorization by a two-level additive structure: local low-rank attention blocks on overlapping subdomains are combined with a coarse attention block. The resulting operator has the form $$M_{\theta}^{-1} = \Phi Q_0 K_0^T \Phi^T + \sum_{i=1}^{N} R_i^T D_i^{1/2} Q_i K_i^T D_i^{1/2} R_i.$$ Here $R_i$ restricts to an overlapping subdomain, $D_i$ is a partition-of-unity weight, and $\Phi$ is a coarse interpolation (or prolongation) matrix. Numerical experiments for synthetic Fourier right-hand sides indicate that the domain-decomposition attention operator is able to train faster and can give more accurate approximations than a global low-rank attention baseline while using significantly fewer parameters.

16.
bioRxiv (Bioinfo) 2026-06-12

Deciphering cross-omics complexity of tissues via diagonal integration of unpaired spatial multi-omics data

Recent spatial multi-omics technologies enable the simultaneous in situ profiling of multiple omics modalities on the same tissue section; however, they face challenges in experimental complexity and high costs. This technical limitation can be circumvented by diagonal integration methods, which integrate omics data from different modalities. However, existing single-cell diagonal integration approaches overlook spatial information, causing unreliable anchoring across omics layers. Here, we introduce STAMO, a graph attention neural network model for spatially aware integration of unpaired spatial slices from different omics. Systematic benchmarking on spatial epigenome-transcriptome slices proves that STAMO outperforms the state-of-the-art methods in generating aligned embeddings and identifying consensus spatial domains across omics. We apply STAMO to integrate unpaired data from diverse spatial omics types (transcripts, epigenetics, DNA, and proteins), including slices from spatial RNA and four different epigenomic modalities, spatial ATAC and RNA slices across embryonic stages, spatial protein and RNA slices, and spatial DNA and RNA slices. In addition, the integration capability of STAMO can be further used to achieve cross-omics generation, offering a solution for exploring spatial region-specific gene regulatory mechanisms.

17.
arXiv (CS.AI) 2026-06-18

Rescaling MLM-Head for Neural Sparse Retrieval

arXiv:2606.18811v1 Announce Type: cross Abstract: Learned sparse retrieval (LSR) models such as SPLADE have traditionally used BERT-style masked language models as backbone encoders. A natural expectation is that replacing BERT with stronger pretrained encoders should improve retrieval effectiveness. However, we find that under standard SPLADE training recipes, backbones with large MLM-head L2 norms can suffer performance degradation and even training collapse under standard SPLADE training recipes. We identify this failure as a scale mismatch in the MLM head: SPLADE directly uses MLM-head outputs to construct sparse lexical representations, and query-document relevance is computed by an unnormalized dot product over these representations. As a result, an inflated MLM-head scale can amplify sparse activations, distort matching scores, and destabilize contrastive training under common training settings. To address this issue, we introduce a simple initialization-time correction that rescales the MLM-head projection by a constant factor before SPLADE training. This zero-cost adjustment improves training stability without modifying the model architecture or training objective. Across both in-domain and out-of-domain retrieval benchmarks, this simple correction substantially improves large-norm backbones such as ModernBERT and Ettin, turning unstable training runs into competitive sparse retrievers. In several settings, the corrected models further match or surpass the classic BERT-SPLADE baseline. These findings suggest that the bottleneck in adapting pretrained encoders to LSR is not encoder capacity alone, but the calibration of the MLM-head scale used to construct sparse lexical representations.

18.
arXiv (CS.LG) 2026-06-15

More with LESS – Local Scene Representations for Tactile Imaging

arXiv:2606.14344v1 Announce Type: new Abstract: Tactile imaging seeks to reconstruct the internal structure of soft objects through touch sensing, with applications in medical diagnosis and robotic manipulation. Recent self-supervised learning approaches have shown promising results, but rely on global, unstructured representations and robot-controlled sensing, limiting generalization and practical use. We propose Local Encoder for Spatial Sensing (LESS), an object-centric tactile representation that exploits the local nature of touch. The tactile scene is modeled as a grid of recurrent encoders with local receptive fields, whose states are fused to reconstruct 2D or 3D images of internal structure. This compositional design enables strong generalization: models trained on single-inclusion phantoms accurately image objects with multiple inclusions and varying sizes. The local structure further supports spatial uncertainty estimation. In addition, we enable hand-held tactile imaging via external pose tracking and human-like palpation data, and extend tactile imaging to full 3D reconstruction.

19.
arXiv (quant-ph) 2026-06-24

Active interference suppression in frequency-division-multiplexed quantum gates via off-resonant microwave tones

arXiv:2601.14547v3 Announce Type: replace Abstract: The increasing number of control lines connecting quantum processors to external electronics constitutes a major bottleneck in the realization of large-scale quantum computers. Frequency-division multiplexing is expected to enable control of multiple qubits through a single microwave cable; however, interference from off-resonant microwave tones hinders precise qubit control. Here, we propose an active interference suppression method for frequency-division-multiplexed simultaneous gates on microwave-controlled qubits. We demonstrate that the deliberate incorporation of off-resonant microwave tones improves single-qubit gate fidelity. In particular, the gate infidelity scales inversely with the square of the number of microwave tones when off-resonant orthogonal or quasi-orthogonal tones are incorporated. Furthermore, we show that fast oscillations, neglected under the rotating wave approximation, degrade the gate fidelity, and that this degradation can be mitigated through optimized frequency allocation. The proposed approach is simple and effective for improving the performance of frequency-division-multiplexed quantum gates.

20.
arXiv (CS.CV) 2026-06-19

Abstraction in Style: Beyond Texture and Color

Artistic styles often embed abstraction beyond surface appearance, involving deliberate reinterpretation of structure rather than mere changes in texture or color. Conventional style transfer methods typically preserve the input geometry and therefore struggle to capture this deeper abstraction behavior, especially for illustrative and nonphotorealistic styles. In this work, we introduce Abstraction in Style (AiS), a generative framework that separates structural abstraction from visual stylization. Given a target image and a small set of style exemplars, AiS first derives an intermediate abstraction proxy that reinterprets the target's structure in accordance with the abstraction logic exhibited by the style. The proxy captures semantic structure while relaxing geometric fidelity, enabling subsequent stylization to operate on an abstracted representation rather than the original image. In a second stage, the abstraction proxy is rendered to produce the final stylized output, preserving visual coherence with the reference style. Both stages are implemented using a shared image space analogy, enabling transformations to be learned from visual exemplars without explicit geometric supervision. By decoupling abstraction from appearance and treating abstraction as an explicit, transferable process, AiS supports a wider range of stylistic transformations, improves controllability, and enables more expressive stylization.

21.
arXiv (CS.CL) 2026-06-25

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguistic challenges, such as widespread context-dependent kanji polyphony, have yet to be adequately tackled. Here we introduce Sarashina2.2-TTS (https://github.com/sbintuitions/sarashina2.2-tts), a Japanese-centric LLM-TTS system that tackles these challenges through a dual approach: data strategy and evaluation methodology. First, we scale training to approximately 361k hours of speech, incorporating a balanced mix of Japanese and English data. Furthermore, we design a targeted data augmentation pipeline covering all 2,136 Joyo (regular-use) kanji designated by Japan's Agency for Cultural Affairs to efficiently address kanji polyphony disambiguation. Second, we introduce the Joyo Kanji Yomi Benchmark (https://github.com/sbintuitions/JoyoKanji-Yomi-Benchmark), covering all 2,136 Joyo kanji and their 4,378 readings. Alongside this benchmark, we propose Kana-CER, a metric that compares synthesized speech against reference readings in the kana space, eliminating orthographic variations to directly measure pronunciation correctness. Experiments demonstrate that our targeted data augmentation significantly improves reading accuracy. Overall, Sarashina2.2-TTS achieves state-of-the-art kanji-level reading accuracy and matches top baselines on general sentence-level pronunciation, while delivering the highest speaker similarity in zero-shot Japanese speech synthesis. Furthermore, cross-lingual evaluation reveals that Sarashina2.2-TTS is the only system that maintains stable Japanese pronunciation regardless of the prompt language, confirming that our balanced training approach improves cross-lingual robustness.

22.
arXiv (CS.AI) 2026-06-15

DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

arXiv:2506.14202v4 Announce Type: replace-cross Abstract: End-to-end backpropagation requires storing activations throughout all layers, creating memory bottlenecks that limit model scalability. Existing block-wise training methods offer means to alleviate this problem, but they rely on ad-hoc local objectives and remain largely unexplored beyond classification tasks. We propose $DiffusionBlocks$, a principled framework for transforming transformer-based networks into genuinely independent trainable blocks that maintain competitive performance with end-to-end training. Our key insight leverages the fact that residual connections naturally correspond to updates in a dynamical system. With minimal modifications to this system, we can convert the updates to those of a denoising process, where each block can be learned independently by leveraging the score matching objective. This independence enables training with gradients for only one block at a time, thereby reducing memory requirements in proportion to the number of blocks. Our experiments on a range of transformer architectures (vision, diffusion, autoregressive, recurrent-depth, and masked diffusion) demonstrate that DiffusionBlocks training matches the performance of end-to-end training while enabling scalable block-wise training on practical tasks beyond small-scale classification. DiffusionBlocks provides a theoretically grounded approach that successfully scales to modern generative tasks across diverse architectures. Code is available at https://github.com/SakanaAI/DiffusionBlocks .

23.
medRxiv (Medicine) 2026-06-22

Image-based deep learning for emergency electrocardiogram classification

Automated electrocardiogram analysis has advanced largely through digital waveforms, yet many emergency-care workflows rely on ECGs available only as printed tracings, scanned reports, PDFs or mobile photographs. We developed an image-based deep learning system for emergency ECG classification and evaluated it in InCor-EMG, an expert-adjudicated dataset of 18,519 emergency ECGs spanning 12 ECG categories, with labels from 19 cardiologists. On the held-out test set, the final ConvNeXt ensemble achieved a macro F1-score of 0.807 (95% CI, 0.788-0.825), compared with 0.820 (95% CI, 0.805-0.832) for annotating cardiologists, and higher F1-scores than Mortara Veritas in most evaluated categories. Performance was associated more strongly with inter-reader agreement than with training sample size and remained informative across scanned and photographed ECGs, with supportive performance in model-enriched temporal and heterogeneous public-image evaluations. These findings support ECG image classification when digital waveforms are unavailable.

24.
bioRxiv (Bioinfo) 2026-06-16

RetroMol: Parsing a shared encoding from natural products and their biosynthetic gene clusters

Natural products such as polyketides and nonribosomal peptides (NRPs) are important sources of bioactive compounds, including many antibiotics. Many of them are assembled by modular enzyme complexes and further modified and diversified by tailoring reactions encoded by biosynthetic gene clusters (BGCs). Although natural products and their coding BGCs describe different data modalities of the same biochemical process, a unified language to jointly describe their biochemistry is lacking. Here we introduce a sequence-based representation of the core biosynthesis of modular natural products, which we call primary sequences, that bridges chemical structures and BGCs. We also present RetroMol, an algorithm that parses either natural product structures or their encoding BGCs into their primary sequences of natural product building blocks. RetroMol allows for similarity scoring between natural products and BGCs, enabling the retrieval of compounds, BGCs, and a combination of the two, based on their biosynthetic similarity. This can, for instance, be used to retrieve biosynthetically similar but structurally dissimilar compounds, or link natural products to candidate coding BGCs in large experimental datasets. We demonstrate the latter by rediscovering the nocardichelin B BGC as a proof of principle. We also exemplify the utility of biosynthetic similarity by showing various pairs of biosynthetically similar compounds with low structural similarity. Together, these results establish primary sequences as a shared biosynthetic encoding for natural product comparison and BGC prioritization.

25.
medRxiv (Medicine) 2026-06-18

Hospital-Level Variation in Antenatal Corticosteroids for Late Preterm Births

Objective: To determine whether and to what extent hospitals across the United States vary in their use of late-preterm steroids using a novel data set in which the timing of steroid administration relative to delivery can be observed. Methods: This was a retrospective cohort study of singleton births with known gestational ages identified in the Premier Healthcare Database from 2015 to 2022. The primary variable of interest was hospital-level adoption of antenatal corticosteroids for late-preterm singleton deliveries, calculated as the proportion of late-preterm singleton births (34-36 completed weeks of gestation) with any betamethasone exposure during the same late-preterm period. Hospital adoption was defined as the weighted average rate of ALPS administration among late-preterm infants across the entire post-period. Hospitals were ranked by their late-preterm steroid adoption rates and categorized by quartile based on the empirical distribution. Temporal trends were assessed using annual hospital-level adoption rates and visualized using time-series plots and distributional plots. A logistic regression model was constructed to determine hospital characteristics associated with being a highest-quartile adopting hospital. Results: The analysis cohort included 728 hospitals and 5,452,791 births, of which 361,006 (6.6%) were singleton late preterm births. Hospital steroid exposure rates ranged from 0 to 82% and were categorized into quartiles based on overall exposure rate, with cutoffs at 20.6%, 29.8%, and 40.1%. Median exposure rates increased progressively across quartiles from 14.1% (IQR 9.3-17.4%) in the lowest adopting hospitals (Q1) to 47.6% (IQR 43.7-53.2%) in the highest adopting hospitals (Q4), with substantial within-quartile variation. In the multivariable model, urban location was a strong predictor of high adoption after adjustment (aOR 2.05; 95% CI 1.11-3.83, p=0.02). Compared to Midwest hospitals, Southern hospitals had significantly lower odds of being high adopters (aOR 0.37; 95% CI 0.20-0.69, p