Paper Plaza - AcademicHub

01.

arXiv (CS.CV) 2026-06-18 DOI: arXiv:2606.19195

Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

Authors:

Kangsheng Duan, Ziyang Xu, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-$\lambda$ Mix Interaction ($L\lambda MI$) block. Comprising Local-$\lambda$ and Interactive-$\lambda$ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a $>15\times$ acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.

02.

arXiv (CS.CL) 2026-06-24 DOI: arXiv:2606.22748

AI Fiction in the Wild

Authors:

Neel Gupta, Maria Antoniak, Melanie Walsh

Some professional authors are beginning to use AI tools to help produce their fiction writing. Are readers using AI to generate fiction, too? Drawing on over 500,000 anonymized, English-language ChatGPT-user conversations (arXiv:2405.01470), we find that more than one third of the conversations involve some form of fiction generation – including original stories, roleplay, fanfiction, and erotica. This AI-generated fiction is notably dominated by power users. We identify common fiction generation patterns and profiles among these users, including what we call "infinite story demanders," who repeatedly request and revise variations of the same or similar narratives over extended periods of time. We show that users especially gravitate toward fanfiction and erotica, and that they are broadly drawn to generic forms, repetition, immediacy, and niche combinations of story elements. Our findings motivate two theoretical provocations. First, we argue that AI technologies may lead to a shift in the conventional relationship between the author and reader, potentially producing what we call a "solipsistic reader-writer," who both generates and consumes fiction within a closed conversational loop, interacting with a machine rather than a human other. Second, we note that LLMs enable interactivity, play, and permutation in ways that are seemingly pleasurable for users, raising questions about where AI will fit into contemporary storytelling and entertainment ecosystems. We situate these developments within broader transformations in literature and media, including self-publishing, fanfiction, and pornography, and suggest that AI-generated fiction shares structural affinities with on-demand, personalized, and repetitive cultural forms.

03.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15389

Timestep Rescheduling in Diffusion Inversion

Authors:

Shangquan Sun, Ting Gong, Zhirui Liu, Jiamin Wu, Runkai Zhao, Mianxin Liu, Wenqi Ren, Xiaochun Cao

Diffusion inversion, which maps images back to the Gaussian latent space of a diffusion model, is a critical task for image reconstruction and editing. While DDIM enables fast deterministic inversion, it inherently introduces deviations that accumulate into noticeable inversion errors. Existing methods often address this by solving a fixed-point problem but largely overlook how the selection of the diffusion timestep in the noise scheduler influences inversion fidelity. In this work, we reveal that the deviation scale in diffusion inversion is strongly dependent on the timestep size, and exhibits a parabolic trend, with larger errors concentrated at both small and large timesteps. Based on this finding, we propose a simple yet effective nonuniform timestep scheduler that integrates a global rescaling with a local dynamic programming based rescheduling, enabling a strategic allocation of computational effort that minimizes the overall inversion error and preserves higher inversion accuracy. Our method serves as an off-the-shelf enhancement for existing inversion techniques and requires no extra parameters or computational overhead. Through extensive experiments, we verify that integrating our scheduler consistently boosts the performance of existing inversion methods, achieving superior results in image reconstruction and editing.
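For readers unfamiliar with the setting, the following is a minimal sketch of deterministic DDIM inversion over an arbitrary list of timesteps. It is not the authors' scheduler: the linear beta schedule, the constant `toy_eps` predictor, the scalar "image", and both example timestep lists are assumptions made purely for illustration. It only shows where a nonuniform timestep list would plug into the standard DDIM update.

```python
import math

NUM_TRAIN_STEPS = 1000  # assumed training horizon, not taken from the paper


def alpha_bar(t):
    """Cumulative signal level after t steps of an assumed linear beta schedule."""
    prod = 1.0
    for s in range(t):
        beta = 1e-4 + (2e-2 - 1e-4) * s / (NUM_TRAIN_STEPS - 1)
        prod *= 1.0 - beta
    return prod


def ddim_step(x, t_from, t_to, eps_model):
    """One deterministic DDIM move from t_from to t_to (works in either direction)."""
    a_from, a_to = alpha_bar(t_from), alpha_bar(t_to)
    eps = eps_model(x, t_from)  # noise estimate evaluated at the current point
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def ddim_walk(x, timesteps, eps_model):
    """Apply ddim_step along consecutive pairs of the given timestep list."""
    for t_from, t_to in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_step(x, t_from, t_to, eps_model)
    return x


# Two hypothetical schedules with the same number of steps.
uniform = [0, 250, 500, 750, 999]
nonuniform = [0, 100, 500, 900, 999]  # smaller steps near both ends (illustrative only)


def toy_eps(x, t):
    """Constant stand-in for a trained noise predictor."""
    return 0.3


latent = ddim_walk(1.5, nonuniform, toy_eps)            # inversion: t = 0 -> 999
recon = ddim_walk(latent, nonuniform[::-1], toy_eps)    # sampling: t = 999 -> 0
```

With a constant noise predictor the round trip is exact. With a real network the noise estimate changes between the two endpoints of each step, which is the source of the accumulated deviation the abstract refers to, and the step placement then affects how large that deviation becomes.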

04.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16415

Posterior Twins: Distributional Behavioral Simulation for Enterprise Decisions

Authors:

Ankit Das

Enterprise behavioral simulation requires more than producing a plausible response. Many decisions depend on the shape of a population under a proposed action: which segments accept, defect, hesitate, or move into risk-sensitive states. This paper introduces Posterior Twins, a memory-grounded digital-twin approach that represents likely behavior as an updated distribution under a specific decision context. We evaluate a family of Twinning Labs behavioral-model operating points on a 226-example held-out behavioral-response benchmark and report both modal accuracy and Wasserstein-1 distance. The results show that modal accuracy and distributional fidelity identify different operating regimes. TL-Twin Alpha achieves the lowest observed Wasserstein-1 distance in the reported result set ($W_1 = 1.16$), while TL-Twin Delta and TL-Twin Gamma provide balanced operating points near the modal-accuracy frontier. The paper frames these results as a systems result: governed memory, behavioral model routing, scenario orchestration, distributional aggregation, and auditability are necessary for turning simulated behavior into reusable enterprise decision evidence.
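To make the two metrics concrete, here is a small sketch of how modal accuracy and Wasserstein-1 distance can disagree. It assumes responses fall on an ordered scale with unit spacing between neighbouring categories; the abstract does not state the benchmark's response scale, so the five-point scale and the example distributions below are hypothetical.

```python
from itertools import accumulate


def wasserstein1(p, q):
    """W1 between two distributions on the same ordered, unit-spaced support.

    On a one-dimensional support, W1 is the sum of absolute differences
    between the two cumulative distribution functions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    total_p, total_q = sum(p), sum(q)
    cdf_p = list(accumulate(x / total_p for x in p))
    cdf_q = list(accumulate(x / total_q for x in q))
    # The last CDF entries are both 1, so they contribute nothing.
    return sum(abs(a - b) for a, b in zip(cdf_p[:-1], cdf_q[:-1]))


def mode(p):
    """Index of the most likely category."""
    return max(range(len(p)), key=lambda i: p[i])


# Hypothetical five-point response scale.
observed = [0.10, 0.20, 0.40, 0.20, 0.10]
sim_a = [0.00, 0.00, 1.00, 0.00, 0.00]  # right mode, no spread
sim_b = [0.10, 0.25, 0.20, 0.30, 0.15]  # wrong mode, similar shape

w1_a = wasserstein1(observed, sim_a)
w1_b = wasserstein1(observed, sim_b)
```

In this toy case `sim_a` matches the observed mode but is farther from the observed distribution than `sim_b`, which misses the mode; this is the kind of divergence between the two metrics that the abstract describes.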

05.

arXiv (quant-ph) 2026-06-15 DOI: arXiv:2507.15738

Symplectic coherence: a measure of position-momentum correlations in quantum states

Authors:

Varun Upreti, Ulysse Chabaud

The interdependence of position and momentum, as highlighted by the Heisenberg uncertainty principle, is a cornerstone of quantum physics. Yet, position-momentum correlations have received little systematic attention. Motivated by recent developments in bosonic quantum physics that underscore their relevance in quantum thermodynamics, metrology, and computing, we establish a general framework to study and quantify position-momentum correlations in quantum states. We introduce symplectic coherence, a faithful and easily computable measure defined as the Frobenius norm of the block of the covariance matrix encoding position-momentum correlations, and demonstrate that symplectic coherence is monotone under relevant operations and robust under small perturbations. Furthermore, using a recent mapping by Barthe et al. (Phys. Rev. Lett. 134, 070604) which relates the covariance matrix of a bosonic state to the density matrix of a finite-dimensional system, we show that position-momentum correlations correspond to beyond-classical correlations in a virtual finite-dimensional quantum state, with symplectic coherence mapping naturally to geometric quantum discord. Taking energy constraints into account, we determine the maximal position-momentum correlations achievable at fixed energy, revealing structural insights about the corresponding optimal states. Finally, we illustrate the operational relevance of symplectic coherence through several examples in quantum information tasks and quantum thermodynamics. In the process, we establish new technical results on matrix norms and quantum covariance matrices, and demonstrate the conceptual significance of viewing covariance matrices as density matrices of virtual quantum states.
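The definition quoted in the abstract (Frobenius norm of the position-momentum block of the covariance matrix) can be sketched in a few lines. Two assumptions are made here that the abstract does not confirm: the covariance matrix is ordered as (q1, ..., qn, p1, ..., pn), so the correlations sit in the upper-right n-by-n block, and no extra normalization factor is applied. The example matrix is illustrative and is not checked for physical validity.

```python
import math


def position_momentum_block(cov):
    """Upper-right n x n block of a 2n x 2n matrix in (q..., p...) ordering."""
    size = len(cov)
    if size % 2 or any(len(row) != size for row in cov):
        raise ValueError("expected a square matrix of even dimension")
    n = size // 2
    return [row[n:] for row in cov[:n]]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def symplectic_coherence_sketch(cov):
    """Frobenius norm of the position-momentum block (assumed convention)."""
    return frobenius_norm(position_momentum_block(cov))


# Hypothetical two-mode example in (q1, q2, p1, p2) ordering.
cov = [
    [1.0, 0.0, 0.3, 0.0],
    [0.0, 1.0, 0.0, 0.4],
    [0.3, 0.0, 1.0, 0.0],
    [0.0, 0.4, 0.0, 1.0],
]
value = symplectic_coherence_sketch(cov)
```

A matrix with no position-momentum correlations, such as the identity, gives zero under this sketch.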

06.

arXiv (CS.CV) 2026-06-15 DOI: arXiv:2606.14071

ShearFuse-UNet: Hadamard, DCT, and Shearlet Transform Fusion for Next-Day Wildfire Spread Prediction

Authors:

Ene Meco, Yingyi Luo, Emadeldeen Hamdan, Adam Watts, Ahmet Enis Cetin

We propose ShearFuse-UNet, a lightweight and computationally efficient deep learning model for next-day wildfire spread prediction from multi-modal satellite data. The model integrates three complementary transform-domain branches inside each encoder block of a U-Net backbone: a 2D Fast Walsh-Hadamard Transform (WHT) branch, a 2D Discrete Cosine Transform (DCT) branch, and a cone-adapted digital Shearlet residual branch. The WHT and DCT branches establish orthogonal latent spaces with learnable spectral scaling and fixed soft-thresholding, while the Shearlet branch provides anisotropic, multi-directional feature decomposition that explicitly encodes the elongated edge structures characteristic of fire fronts. A learned SpectralFusion gate adaptively combines the WHT and DCT responses, and the Shearlet reconstruction is added as a residual. This three-branch design bears a loose structural analogy to transformer self-attention: the WHT and DCT branches provide complementary spectral representations that are adaptively fused, while the Shearlet branch contributes directional content through a residual pathway. Unlike self-attention, the proposed design relies on fixed mathematical transforms rather than learned projection operators, reducing parameter count and computational cost. Evaluated on the WildfireSpreadTS dataset, ShearFuse-UNet achieves an F1 score of 0.596 with only 267k parameters, outperforming a ResNet18-based U-Net (14M parameters, F1 = 0.589) and demonstrating a highly favorable accuracy-efficiency trade-off. Results on the Google Next-Day Wildfire Spread dataset further validate these findings across a different benchmark.
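As an illustration of one transform-domain branch of the kind the abstract describes, the sketch below applies a fast Walsh-Hadamard transform, per-coefficient scaling, soft-thresholding, and the inverse transform. It is a one-dimensional, pure-Python simplification: the paper uses 2D transforms inside a U-Net, and the function name `spectral_branch`, the scale values, and the threshold are all hypothetical.

```python
import math


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for j in range(start, start + half):
                a, b = out[j], out[j + half]
                out[j], out[j + half] = a + b, a - b  # butterfly
        half *= 2
    return out


def soft_threshold(value, threshold):
    """Shrink toward zero by `threshold`; values inside the band become zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def spectral_branch(signal, scales, threshold):
    """Transform, scale each coefficient, soft-threshold, and transform back."""
    coeffs = fwht(signal)
    shaped = [soft_threshold(c * s, threshold) for c, s in zip(coeffs, scales)]
    n = len(signal)
    return [v / n for v in fwht(shaped)]  # the WHT is its own inverse up to 1/n


signal = [1.0, 2.0, 3.0, 4.0]
identity_out = spectral_branch(signal, [1.0] * 4, 0.0)
denoised = spectral_branch(signal, [1.0] * 4, 3.0)
```

Because the transform itself has no weights, the only trainable quantities in a branch like this would be the per-coefficient scales, which is consistent with the low parameter count the abstract reports.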

07.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.15734

Retrievable Gradients: Continual Post-Training Without Cumulative Weight Drift

Authors:

Weihang Su, Jiacheng Kang, Jingyan Xu, Qingyao Ai, Jianming Long, Hanwen Zhang, Bangde Du, Xinyuan Cao, Min Zhang, Yiqun Liu

Continual post-training enables models to absorb emerging knowledge after deployment, but repeatedly updating shared parameters can accumulate weight drift, potentially causing catastrophic forgetting and degrading general capabilities. Retrieval-augmented generation avoids such parameter drift, yet often lacks the depth of parametric knowledge integration. In this paper, we propose ReGrad (Retrievable Gradients), a new paradigm that treats gradients as retrievable units of knowledge. ReGrad pre-computes document-specific gradients offline, stores them in an indexed Gradient Bank, and retrieves only query-relevant gradients at inference time for temporary weight adaptation. However, raw language-modeling gradients are optimized for token-level document reconstruction rather than for query-driven knowledge use. We therefore introduce a bi-level meta-learning objective that reshapes document-derived gradients into generalizable adaptation signals for downstream tasks. Experiments across general and domain-specific settings show that ReGrad outperforms CPT and RAG baselines, enabling scalable and reversible parametric knowledge injection without accumulating weight drift.
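The retrieve-then-temporarily-adapt loop in the abstract can be pictured with a toy sketch. Everything here is hypothetical: the bank layout, the cosine-similarity retrieval, the plain gradient-descent update, and the tiny weight vector are stand-ins chosen for readability, and the bi-level meta-learning step that reshapes the gradients is not modelled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(bank, query_vec, k):
    """Return the k bank entries whose keys are most similar to the query."""
    ranked = sorted(bank, key=lambda e: cosine(e["key"], query_vec), reverse=True)
    return ranked[:k]


def adapted_weights(weights, entries, lr):
    """Copy of `weights` after one descent step per retrieved gradient.

    The input list is never modified, so dropping the returned copy
    restores the base model exactly.
    """
    new = list(weights)
    for entry in entries:
        for i, g in enumerate(entry["grad"]):
            new[i] -= lr * g
    return new


# Hypothetical gradient bank: one pre-computed gradient per document.
bank = [
    {"doc": "a", "key": [1.0, 0.0], "grad": [0.5, -0.5]},
    {"doc": "b", "key": [0.0, 1.0], "grad": [1.0, 1.0]},
]
base = [1.0, 2.0]
hits = retrieve(bank, [0.9, 0.1], k=1)
temp = adapted_weights(base, hits, lr=0.1)
```

Because adaptation produces a temporary copy per query, successive queries never compound their updates on the shared weights, which is the sense in which the abstract calls the injection reversible.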

08.

arXiv (CS.CV) 2026-06-25 DOI: arXiv:2606.25478

TACO: Towards Task-Consistent Open-Vocabulary Adaptation in Video Recognition

Authors:

Minghao Zhu, Xiao Lin, Mengxian Hu, Xun Zhou, Liuyi Wang, Xiaoyan Qi, Chengju Liu, Qijun Chen

Adapting CLIP for open-vocabulary video recognition necessitates a delicate balance between newly acquired video knowledge and the pretrained generalization. While existing studies pursue this generalization-specialization trade-off with additional regularizations or constraints, we argue that they overlook the deviation of representations beyond the fine-tuning data distribution, resulting in suboptimal adaptation effects. We believe such deviation is inherited from the inconsistency between the fine-tuning and evaluation objectives, where model optimization is restricted to the known training distribution but evaluated on unseen ones. In this paper, we introduce TACO, a simple yet effective framework to mitigate the potential negative effects induced by this inconsistency. Our key insight is that adaptation should preserve OOD-relevant alignment beyond the training distribution. To this end, we propose Relative Structure Distillation, which regularizes the relative geometry of the representation space and suppresses harmful alignment shift during training. We further decouple the representation space from the optimization space with a lightweight specialization projection, allowing task-specific adaptation without directly overspecializing the representations used at test time. TACO establishes state-of-the-art performance on diverse benchmarks under cross-dataset and base-to-novel settings. Code will be released at https://github.com/ZMHH-H/TACO.

09.

medRxiv (Medicine) 2026-06-24 DOI: HASH:b52131b0c56c7f0d956b2e6d24a846d6

Screen-Free Haptic Breathwork with HRV-Adaptive Control, Pilot Outcomes and System Design

Authors:

Adhia, Raithatha, Ferguson, Pasquier

Vayu is a mobile breathwork system comprising an iOS companion app and Apple Watch application that delivers slow, resonant breathing using screen-free haptic cues, HRV-adaptive pacing, and reflective journaling grounded in Patanjali's five states of mind. The watchOS component provides tactile phase guidance and real-time biometric sensing (heart rate, HRV), while the iOS interface supports analytics and personalized recommendations. In a 4-6-week naturalistic pilot involving 199 adults (ages 22-65) across Canada, the United States, and India, participants engaged in daily 5-10-minute sessions guided by on-wrist haptics. Average adherence was 4.1 ± 2.3 sessions per week, with 71% of active users maintaining at least 3 sessions per week. By week four, perceived stress (PSS-10) decreased by 2.5 points, resting heart rate declined by 7.4 bpm, and HRV increased by a median of 28.6% relative to baseline, accompanied by mood improvements. No adverse events were reported. HRV metrics are derived from Apple Watch PPG-based proxies and interpreted as relative trends. These findings suggest Vayu is effective and well-tolerated, demonstrating strong engagement and early efficacy signals.

10.

medRxiv (Medicine) 2026-06-25 DOI: HASH:e67665d72415e4d8ec58a1697175919d

Peripheral Blood Mononuclear Cell-Derived miR-664a-3p is Associated with Plaque Burden and Necrotic Core Characteristics in Coronary Artery Disease Across Two Independent Populations

Authors:

Duggal, Kashyap, A. K, Kumar, Naga Prasad, S. V

Background: Stability of the atherosclerotic plaque in coronary artery disease (CAD) is determined by features such as total plaque burden and necrotic core volume. Since invasive procedures are required to evaluate plaque stability, we tested whether the peripheral blood mononuclear cell (PBMC) microRNA (miR) signature could correlate with measures of plaque stability and thus serve as a non-invasive biomarker. Method: Patients from two distinct geographical locations in India were recruited to the study (Site 1: CAD=19, non-CAD=5; Site 2: CAD=12, non-CAD=7) and underwent invasive intravascular ultrasound with virtual histology to assess plaque burden and necrotic core volume. RNA from PBMCs of these patients was subjected to unbiased sequencing. Differential miR expression was evaluated with DESeq2 and assessed for correlation with plaque stability. miR target gene prediction was performed using multiple databases, and Enrichr was used for enrichment analysis. Results: Unbiased RNA sequencing identified miR-664a-3p to be significantly downregulated in CAD patients from both sites (Site 1: log2FC=-1.02, p=0.0033; Site 2: log2FC=-1.04, p=0.0007). miR-664a-3p expression was inversely correlated with plaque burden and necrotic core volume. Receiver operating characteristic (ROC) analysis of miR-664a-3p showed significant discriminative performance in the CAD cohort, with AUC values of 0.842 (Site 1) and 0.881 (Site 2). miR-664a-3p target prediction and pathway enrichment analysis revealed selective enrichment of inflammatory signaling pathways, such as IL-17 and TNF, suggesting an association between PBMC pro-inflammatory response and plaque vulnerability. Conclusion: miR-664a-3p is downregulated in CAD patients and inversely correlates with measures of plaque stability, with potential as a biomarker for identifying patients at risk of CAD progression and plaque instability.
Keywords: Coronary artery disease, peripheral blood mononuclear cells (PBMCs), microRNA, atherosclerotic plaque, necrotic core, plaque burden, biomarkers.

11.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15074

TriAdReview: Triangular Adversarial Review Architecture for Multi-Model Technical Document Generation

Authors:

Zhiqiang Zhou, Junliang Dai, Xu Ling

Large language models (LLMs) are increasingly used for technical document generation, yet single-model outputs often suffer from over-engineering, security blind spots, and incomplete coverage. We propose TriAdReview, a triangular adversarial review architecture that employs two independent reviewer models (engineering and boundary perspectives) and a triangular judging mechanism to iteratively improve a generator model's output. We evaluate TriAdReview across five benchmark tasks - architecture design, code generation, proposal review, security audit, and requirements analysis - using three configurations: single model (baseline), dual model (single review), and triple model (full system). Results across 75 experiments (n=5 per cell) show that the triple model configuration achieves a 10.1% overall improvement over the single model baseline (26.2 vs. 23.8 out of 50; p

12.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.10046

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

Authors:

Yuxuan Chen, Haoyuan Yu, Peize He

Flow-matching transformers achieve strong audio separation, yet their attention dynamics are opaque. We adapt established causal-intervention principles into a deterministic, inference-time probing protocol for SAM Audio. Orthogonal probing uncovers a dual-pathway text-conditioning mechanism: additive injections control semantic identity, while cross-attention refines acoustic structure. We observe an asynchronous layerwise convergence: stable layers build temporal scaffolds early, whereas fast layers continue resolving artifacts during sampling. The model also attenuates temporal segmentation cues to maintain continuous-flow stability. Using these insights, we propose Layer-Selective Attention Caching (LSAC), a training-free acceleration method that caches attention in stable layers. Across acoustic complexities, LSAC cuts self-attention computation by about 25% with negligible quality loss and yields up to 6.7x higher quality retention than naive step reduction.

13.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.12073

"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

Authors:

Jason Miklian, John E. Katsos

Generative AI has made fluent prose cheap to produce, breaking the old promise to readers that good writing meant real thinking. How have readers responded, and what can this tell us about changing anti-AI attitudes? We analyzed 25 million comments from Hacker News and Reddit (2023-2026), combining LLM judgment on 7,500 sampled accusations of AI use, sentiment trajectories, speech-act coding of 300 confirmed accusations of AI use, and a matched-control test of accused versus non-accused parent comments. We found that the pejorative-label share of accusations rose more than tenfold on both platforms while a placebo vocabulary of pre-2022 inauthenticity terms (shill, astroturf) did not. This shift reflected a fast-growing trend of branding any suspicious or seemingly inauthentic prose as "AI slop". The slop frame now constitutes 94 percent of pejorative mentions, with the dominant comments shifting in tone from mockery toward gatekeeping and structural protest. The key surprise comes from a matched-control test which found that prose features that statistically distinguish AI from human text do not predict which human text gets accused as AI. The new accusations work as social gatekeeping of perceived authenticity without actually screening for AI. This research extends signaling theory by showing that substitute signals used socially can grow even when inaccurate if the underlying detection problem cannot be solved at the non-expert level. It shows that AI's effects on writing from the reader side are distinct from those on the production (writer) side. Detection technology cannot resolve this dynamic because the social function of accusations is increasingly to perform social gatekeeping and in-group signaling as opposed to identifying AI-generated writing.

14.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2606.11500

FlexiBrain: Resolution-Agnostic Voxel-Level Encoding for Native fMRI

Authors:

Mo Wang, Wenhao Ye, Junfeng Xia, Minghao Xu, Hongkai Wen, Quanying Liu

The success of large-scale deep learning models in neuroscience is fundamentally constrained by severe data heterogeneity. Native fMRI data aggregated from diverse sources exhibit substantial variation in both spatial and temporal resolutions. Consequently, most existing frameworks rely on lengthy, rigid preprocessing pipelines that enforce uniformity across datasets. This practice introduces two critical limitations: (1) potential degradation of subject-specific anatomical information; (2) significant computational overhead, often requiring hours of processing per subject. Here, we propose FlexiBrain, a resolution-agnostic voxel-level encoding framework for native fMRI based on Mamba-JEPA. FlexiBrain defines patch sizes in real-world physical units and employs dynamic patch resizing, thereby bypassing destructive spatial standardization while enabling direct ingestion of data in native space. We instantiate the framework using an efficient Mamba-JEPA backbone to model high-dimensional 4D fMRI signals. Across five diverse downstream neuroscience tasks, FlexiBrain consistently outperforms recent state-of-the-art methods, achieving gains of up to 12 percentage points without external data augmentation. Importantly, FlexiBrain functions as a seamless plug-in module, substantially reducing preprocessing costs and accelerating the development of robust voxel-level fMRI foundation models. Code is available at https://github.com/OneMore1/FlexiBrain.

15.

arXiv (CS.LG) 2026-06-17 DOI: arXiv:2601.22184

Tacit Coordination of Large Language Models

Authors:

Ido Aharon, Emanuele La Malfa, Michael Wooldridge, Sarit Kraus

Large Language Models (LLMs) are increasingly deployed in multi-agent settings that require coordination without communication, from human-AI interaction to safety-critical scenarios. Humans often overcome the absence of communication through focal points: salient solutions that naturally stand out to all participants. We present the first large-scale evaluation of how, when, and why focal points emerge in LLMs, comparing their behaviour with humans across cooperative and competitive games, including realistic search and rescue scenarios, demonstrating when focal points enable effective coordination. Across more than 20 open- and closed-source models, we find that LLMs exhibit a remarkable ability to coordinate without communication, often matching or outperforming humans. However, the same models consistently fail in tasks requiring numerical common sense or culturally nuanced notions of salience. We additionally evaluate simple learning-free strategies that substantially improve coordination both among LLMs and between humans and LLMs. Our results reveal striking coordination capabilities, as well as social limitations in modern LLMs, and offer new insight into the latent notions of salience encoded within them. Our findings caution against assuming that LLMs share humans' cultural and perceptual substrate when deployed in coordination settings.

16.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2412.10139

TACOMORE: Exploring a replicable prompting protocol for LLM-assisted corpus analysis

Authors:

Bingru Li, Han Wang, Nicholas Groom

As corpus linguistics continues to scale, researchers are facing a growing methodological bottleneck: while computational tools can easily count billions of words, the qualitative interpretation of these data remains a slow and labor-intensive human task. Large Language Models (LLMs) offer a promising way to automate this process, yet their integration into the field is often hindered by concerns over black-box unpredictability and a lack of replicability. This study introduces TACOMORE, a structured prompting framework designed to transform ad-hoc AI interactions into a standardized linguistic protocol. Built upon four foundational principles (Task, Context, Model, and Replicability), the framework guides LLMs to move beyond generic probability prediction to anchoring their reasoning in the specific co-occurrence patterns of a target corpus. We applied this framework to three core corpus tasks, i.e., the analysis of keywords, collocates, and concordances, using an open corpus of COVID-19 research abstracts. After testing three LLMs, we found that while structured prompting improves accuracy and replicability, inherent limitations regarding hallucination persist. This research offers a critical lens into the role of LLMs in corpus linguistics, highlighting their potential as complementary tools while emphasizing the irreplaceable role of human validation.

17.

arXiv (CS.CL) 2026-06-19 DOI: arXiv:2606.20097

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

Authors:

Zhentao Tan, Wei Chen, Jingyi Shen, Yao Liu, Xu Shen, Yue Wu, Jieping Ye

The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear Attention (LA) with Full Attention (FA), suggesting that the design space of attention hybridization remains underexplored. To probe this space, we conduct interpretability analysis and observe that layers exhibit block-wise functional similarity, while individual heads within the same layer display distinct functional specialization despite sharing input features. This head-level heterogeneity suggests that the head dimension provides a natural and principled granularity for fusing heterogeneous attention signals. Building on this insight, we introduce HydraHead, a novel architecture that hybridizes FA and LA along the head axis. HydraHead features two key innovations: (1) an interpretability-driven selection strategy that identifies retrieval-critical heads and preserves FA only for them, and (2) a scale-normalized fusion module that reconciles the distributional gap between FA and LA head outputs. By leveraging a three-stage transfer pipeline with parameter reuse and distillation, we achieve high-performance hybrid models with minimal training overhead. Under a unified training setup, HydraHead outperforms other hybrid designs in long-context tasks while maintaining strong general reasoning. With interpretability-driven head selection, it matches a 3:1 layer-wise hybrid's long-context performance at a 7:1 LA-to-FA ratio. Crucially, trained on only 15B tokens, HydraHead achieves over 69% improvement over the baseline at 512K context length, approaching Qwen3.5, a leading model of comparable size with a native context length of 256K. This highlights the significant scaling potential of head-level hybridization.

18.

arXiv (CS.LG) 2026-06-25 DOI: arXiv:2606.24969

Frequency Domain Reservoir Computing

Authors:

Klaus Schertler, Xiomara Runge, Andrea Ceni, David Kappel, Claudio Gallicchio

While the quadratic sequence-length bottleneck of transformers has fueled a resurgence in recurrent models, effectively capturing complex dynamics requires architectures that balance efficient training with highly expressive latent states. Echo State Networks (ESNs) offer a compelling approach by utilizing fixed recurrent weights to circumvent backpropagation through time, enabling a closed-form training solution. However, achieving the expressivity needed for complex tasks demands large reservoirs, exposing an $\mathcal{O}(N^2)$ state-update bottleneck that prevents ESNs from matching the scale of contemporary recurrent models. To address this limitation, we introduce Frequency Domain Reservoir Computing (FRESCO), an ESN architecture operating entirely in the frequency domain while avoiding domain-shift overheads to achieve $\mathcal{O}(N)$ complexity for dense, non-linear recurrent updates. By employing a novel dimensional zero-padding input embedding, a packed \operatorname{FD}h readout, and a natively applied frequency-domain non-linearity, FRESCO drastically reduces computational costs and energy consumption of training and inference. Furthermore, FRESCO matches the state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting, offering a scalable path forward for dense recurrent architectures.

19.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.20094

MakeupMirror: Improving Facial Attribute Preservation in Diffusion Models for Makeup Transfer

Authors:

Nefeli Andreou, Angel Martínez-González, Sabine Sternig, Matthieu Guillaumin, Epameinondas Antonakos, Michael Opitz

Makeup transfer models enable fun augmented reality (AR) experiences as well as virtual try-on (VTO) for online makeup shopping. While recent state-of-the-art diffusion-based solutions such as Stable-Makeup dramatically improve the accuracy and realism of makeup transfer, they still face limitations in identity and skin color preservation, making production-level VTO for makeup shopping unrealistic. In this work, we propose MakeupMirror, a diffusion-based approach to makeup transfer that makes significant progress towards preserving facial features and skin tone. We introduce several technical innovations over Stable-Makeup: (1) integration of facial geometry conditioning with ControlNets to maintain facial fidelity; (2) region-specific makeup transfer control to enable precise makeup application across facial regions such as skin, eyes and lips; (3) skin tone-based makeup transfer modulation that prevents skin tone alteration in cross-subject transfer scenarios; and (4) integration of a Levenberg-Marquardt Langevin sampler to speed up inference while maintaining generation quality. Our experiments on CPM-Real, Makeup Wild, and (herein newly collected, more diverse) MakeupSelfies datasets show that MakeupMirror improves relative facial recognition similarity by +60%, reduces relative skin tone difference by -50% over Stable-Makeup, with a latency of 0.7s, while achieving expert acceptance rate of 94% across core facial identity preservation criteria.

20.

arXiv (CS.CV) 2026-06-25 DOI: arXiv:2606.25160

Toward Low-Latency Vision-Language Models with Doubly-Correct Predictions in Egocentric Visual Understanding

Authors:

Qitong Wang, Fan Du, Pranav Maneriker, Jihui Jin, Christopher Rasmussen

The rapid rise of Vision-Language Models (VLMs) in egocentric visual understanding has made low-latency inference in human-robot collaborative (HRC) tasks increasingly critical. Weight pruning techniques developed for VLMs to shrink model size and computation can be readily applied to satisfy the efficiency demands of on-board processing and real-time interactive robotics. Moreover, safe human-robot interaction demands pruning strategies that preserve doubly-correct predictions; outputs must be both accurate and evidentially grounded to mitigate risks and ensure user trust. In this paper, we present a new study of VLM pruning through the lens of doubly-correct prediction. Our experiments surprisingly show that existing pruning methods often preserve the right evidence localization but undermine correct prediction. To address this, we propose a rationale-informed pruning strategy that better aligns evidence with decisions. Benchmark results on egocentric video datasets demonstrate that our method not only achieves the highest prediction accuracy but also outperforms existing approaches in attaining doubly-correct predictions. We aim to stimulate research on efficient and reliable VLMs, ensuring accuracy-driven advances align with the transparency, auditability, and safety required for responsible human-robot interaction and embodied intelligence.

21.

arXiv (CS.CL) 2026-06-15 DOI: arXiv:2605.16739

EmoMind: Decoding Affective Captions from Human Brain fMRI

Authors:

Bilal A. Mohammed, Lin Gu, Ruogu Fang

Decoding visual experience from brain activity has advanced substantially, but current brain-to-text systems largely recover semantic content while discarding affect. Additionally, language models can generate emotional text when prompted with categorical labels, but such labels collapse rich inter-subject variability into coarse discrete bins. We present EmoMind, the first end-to-end pipeline for decoding affective captions directly from fMRI signals. EmoMind first retrieves a semantically grounded neutral scene description from brain-decoded visual features, then rewrites it using a continuous 34-dimensional emotion vector decoded from the same fMRI recording. To control the balance between content preservation and affective expression, we train the rewriter with classifier-free guidance against an identity-preserving null branch, enabling smooth interpolation between semantic fidelity and affective expressivity. We evaluate affective caption generation with a three-axis validation framework spanning subject-specificity, structural geometry, and causal control. We further augment this framework with a synthetic-brain substitution test that probes robustness to the measurement apparatus, and we benchmark each axis against GPT-4 prompted with brain-decoded top-5 emotion labels as a strong discrete baseline. Across two independent emotion fMRI datasets, EmoMind significantly outperforms label-prompted GPT-4 on all three axes, with the largest gains on metrics that require person-specific affective structure rather than population-level emotion aggregation. These results establish continuous brain-decoded affect as a viable control signal for individualized affective caption generation and open new directions for studying individual affective brain organisation.

22.

medRxiv (Medicine) 2026-06-15 DOI: HASH:f701b58d62595f21b0ac7b46969ff6ab

Iron deficiency testing among people with incident heart failure in primary care

Authors:

Maharajan, Jones, N. R, Bankhead, Erone, Haynes, Kutumba, Li, Maynard, Roy, Shah, Stanworth, …

Background: Given that around 50% of people with heart failure have a degree of iron deficiency, guidelines recommend screening. It is uncertain to what extent this is done in primary care and whether testing is equitable. Aim: To report the proportion of people with incident heart failure who undergo a ferritin test within 12 months. Design and setting: Retrospective primary care cohort study using Clinical Practice Research Datalink Aurum data, between 2016 and 2021. Methods: We report the proportion of adults with an incident diagnosis of heart failure who received a ferritin test within 12 months. Multivariable logistic regression was used to examine the odds of testing based on key demographic covariates and co-morbidities. Results: Among 105,749 individuals with an incident diagnosis of heart failure (mean age 71.6 years, SD 14.3), only 35,688 (33.7%) received a ferritin test within the subsequent year. Increasing age (odds ratio 1.25 per 10-year increase, 95% CI: 1.24-1.27), female sex (male sex OR 0.86, 0.84-0.89) and Asian ethnicity (OR 1.70, 1.59-1.80) were all associated with increased odds of testing, as were diagnoses of coeliac disease (OR 1.86, 1.58-2.21), type 1 diabetes (OR 1.82, 1.51-2.19) and cirrhosis (OR 1.64, 1.43-1.87). There was geographic variation in testing, even in adjusted analyses. Conclusion: In a large primary care dataset, two-thirds of people with incident heart failure did not receive a ferritin test for iron deficiency within a year of diagnosis, demonstrating a gap in current practice and an opportunity for improvements in service delivery.
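As a quick arithmetic check of the headline figures, the snippet below recomputes the tested and untested proportions from the two counts quoted in the Results. It uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Counts quoted in the abstract's Results section.
n_total = 105_749   # adults with incident heart failure
n_tested = 35_688   # received a ferritin test within 12 months

tested_pct = 100 * n_tested / n_total
untested_pct = 100 - tested_pct

print(f"{tested_pct:.1f}% tested, {untested_pct:.1f}% untested")
# prints "33.7% tested, 66.3% untested"
```

The untested share of about 66% is what the Conclusion summarizes as two-thirds.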

23.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2602.09161

Minimum Distance Summaries for Robust Neural Posterior Estimation

Authors:

Sherman Khoo, Dennis Prangle, Song Liu, Mark Beaumont

Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
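To make the random Fourier feature idea concrete, here is a minimal sketch of estimating the squared MMD between two sample sets under a Gaussian kernel. It is a generic illustration of the approximation the abstract mentions, not the authors' summary-adaptation algorithm; the function names, feature count, lengthscale, and toy data are all assumptions.

```python
import numpy as np

def rff_features(x, freqs, phases):
    """Map samples of shape (n, d) to random Fourier features of shape (n, m)."""
    n_features = freqs.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(x @ freqs + phases)

def mmd2_rff(x, y, n_features=512, lengthscale=1.0, seed=0):
    """Approximate squared MMD between samples x and y (Gaussian kernel)."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Frequencies drawn from the Gaussian kernel's spectral density.
    freqs = rng.normal(scale=1.0 / lengthscale, size=(dim, n_features))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # MMD^2 is the squared distance between the two mean feature embeddings.
    diff = (rff_features(x, freqs, phases).mean(axis=0)
            - rff_features(y, freqs, phases).mean(axis=0))
    return float(diff @ diff)

# Toy comparison: matching distributions versus a mean-shifted one.
rng = np.random.default_rng(1)
same = mmd2_rff(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
shifted = mmd2_rff(rng.normal(size=(500, 2)),
                   rng.normal(loc=2.0, size=(500, 2)))
```

Because each sample set collapses to a single mean feature vector, the cost is linear in the number of samples rather than quadratic as with a full kernel matrix, which is what makes this kind of distance cheap enough to evaluate at test time.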

24.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.15161

Beyond Layer Importance in Layer-wise Sparsity: An Inter-Layer Perturbation-Absorption Perspective

Authors:

Tao Jing, Ningxin Wu, Chen Kang, Dong Yu, Changliang Li, Pengyuan Liu

The considerable layer-wise redundancy in large language models (LLMs) has established non-uniform sparsity allocation across layers as the standard pruning approach for efficient compression. Existing layer-wise allocation methods that estimate allocation strategy from local signals such as activation outliers or weight spectra mainly derive from local layer importance, whereas the final post-pruning performance is also influenced by the network's subsequent compensatory capacity. In this paper, we directly characterize this property through controlled perturbation experiments. We make the following empirical findings. First, layers exhibit highly heterogeneous responses to pruning-scale perturbations. In most cases, early layers amplify perturbations, while middle and late layers actively absorb them, with relative L2 drift decreasing monotonically across depth and direction realigning toward the unperturbed hidden-state trajectory. Second, absorption is a large-perturbation phenomenon. Under small perturbations the network exhibits amplification across all layers, and the transition to absorption occurs smoothly as perturbation magnitude grows to pruning scale. This enriches the linearized accumulation theory underlying related works. Building on these findings, we define an absorption coefficient per layer and propose absorption-aware correction, an orthogonal augmentation that improves OWL and AlphaPruning by reducing perplexity by 7.13% and boosting zero-shot accuracy by 1.02% across multiple model families at 70% sparsity.

25.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2606.15071

Quantum learning with a single-atom sensor

Authors:

Yin Mo, Emilio Bagan, Giulio Chiribella

The ability to gather information and to act upon it is at the core of every learning agent. But what is the impact of quantum mechanics on an agent's ability to sense external inputs and to translate them into actions? Here we address the question for a prototype task of learning agency at the quantum scale: rotating a single spin based on information gathered by a single atom. We determine the ultimate performance limit for this task, revealing a fundamental tradeoff between entanglement at the sensing stage and coherence at the action stage: if the single-atom sensor is not entangled with the quantum system serving as the agent's internal memory, then the best learning strategy requires a coherent transfer of quantum information from the sensor to the system that controls the agent's actions. In contrast, if the sensor is initially entangled with the agent's memory, then the transfer of quantum information is no longer necessary. Our results indicate that the quantum properties of the sensor radically affect the optimal way to convert external stimuli into actions, revealing a link between quantum sensing and the behavior of quantum agents.
