Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-19

Optimal Order of Multi-Agent and General Many-Body Systems

作者:

arXiv:2606.20485v1 Announce Type: cross Abstract: This paper develops a general framework for analyzing multi-agent systems with feedback loops between agents actions and collective observations. The framework is built on two fundamental agent-level variables: power, which measures agent influence on collective outcomes, and response functions, which determine how agents react to observations. We derive how macroscopic properties, including total power, useful power, entropy, order, fragility, and mobility, emerge from these two variables of heterogeneous agents. To study the trade off between growth and resilience, we introduce a system-level utility function parameterized by a risk-appetite coefficient and derive an optimal degree of order that balances productivity, stability, and adaptability. The analysis suggests that stronger synchronization can increase collective output but may also increase systemic fragility and reduce mobility. We further argue that order, entropy, information, and useful energy are task-dependent and system-relative concepts whose meanings depend on the objectives of the system. By measuring and designing agent power distributions and response functions, it may be possible to better understand, predict, and optimize collective behavior and identify the conditions under which collective intelligence and optimal order emerge.

02.
arXiv (CS.CL) 2026-06-16

MosaicQuant: Inlier-Outlier Disaggregation for Unified 4-Bit LLM Quantization

4-bit quantization significantly reduces the memory footprint and accelerates the inference of large language models (LLMs). However, its limited bit-width representation struggles to faithfully capture both dense common values (inliers) and rare large-magnitude values (outliers), causing substantial accuracy degradation. Existing mixed-precision methods mitigate this by retaining outliers in high precision, but at the cost of breaking the uniformity of low-bit execution, introducing precision conversion and extra data movement that undermine practical speedup. We propose MosaicQuant, a unified 4-bit LLM quantization paradigm built on a novel principle of inlier–outlier disaggregation. Rather than elevating outlier precision, MosaicQuant quantizes the full weight matrix into a dense 4-bit base component, where inliers are captured faithfully while outlier are inevitably quantized. A sparse 4-bit residual component is then introduced to compensate for these quantization errors, selectively targeting the most error-critical weight blocks where output distortion is shown to be concentrated. However, a unified representation alone is insufficient, as naïvely executing the sparse residual as a separate kernel still breaks the unified low-bit inference pipeline. To bridge this gap, we introduce ZipperEngine, which fuses sparse block computation into the dense 4-bit GEMM kernel via an overlapped pipeline, unifying not only the representation but also the execution into a single coherent low-bit inference pipeline. Extensive experiments on LLaMA3 and Qwen3 demonstrate that MosaicQuant preserves near-FP16 accuracy while achieving up to $1.24\times$ speedup over the W16A16 baseline.

03.
arXiv (CS.AI) 2026-06-16

An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process

arXiv:2603.13584v2 Announce Type: replace-cross Abstract: Deep learning has achieved recognition for its impact within natural sciences, yet the prohibitive financial and technical cost of training models from scratch inhibit adoption. Following software engineering community guidance, natural scientists are reusing pre-trained deep learning models (PTMs) to amortize these costs. While prior works recommend PTM reuse patterns, we present the first empirical study of PTM reuse patterns in the natural sciences, quantifying the utilization and impact of PTM reuse within the scientific process across 17,718 peer reviewed, open access papers. Our results show that "Biochemistry, Genetics and Molecular Biology" has outpaced other natural scientific fields in PTM reuse, "adaptation" reuse is the most prevalent PTM reuse pattern identified across all natural science fields, and the "testing" stage of the scientific process has been most impacted by PTM integration.

04.
arXiv (quant-ph) 2026-06-15

Tantalum as a base material for superconducting integrated circuits

arXiv:2606.13750v1 Announce Type: new Abstract: The performance of superconducting integrated circuits for quantum applications is fundamentally limited by material-related losses. Tantalum, as an emerging material for next-generation quantum circuits, has attracted considerable attention in recent years after demonstrating breakthrough performance in both superconducting microwave resonators and qubits. Concurrently, a growing body of work is devoted to the operation of tantalum-based circuits and related fabrication techniques. This interest is further stimulated by tantalum thin films polymorphism resulting in a variety of its crystalline structure, superconducting properties, coherence, etc. Furthermore, tantalum circuits exhibit distinctive features in cryogenic experiments, which have not been observed in aluminum- or niobium-based ones. In this review, we summarize the recent research of tantalum thin films growth and phase selection mechanisms on various substrates, key aspects of fabrication and performance of superconducting circuit, including a material first-principles theoretical study. In conclusion, we address a number of open issues, including the role of \b{eta}-phase impurities, the effect of hydrofluoric acid solutions on chain characteristics, and the anomalous behavior of {\alpha}-tantalum chains at cryogenic temperatures.

05.
arXiv (CS.CV) 2026-06-11

SpecLoR: Spectral Lookahead Rectification for Motion-Coherent Text-to-Video Generation

Flow Matching has enabled robust text-to-video generation via latent ODE sampling. However, velocity approximation and numerical discretization errors inevitably accumulate, causing sampling trajectories to drift. Consequently, generated videos often suffer from severe spatiotemporal inconsistencies. Nevertheless, directly correcting these drifted, noisy latents is challenging: (i) timestep-dependent noise obscures reliable structural cues; (ii) spatial interventions risk disrupting intricate local geometry while incurring heavy computational costs. To address this, we propose Spectral Lookahead Rectification (SpecLoR), a plug-and-play inference method that bypasses noise via lookahead prediction, and circumvents spatiotemporal entanglement by shifting corrections to the frequency domain, where universal statistical priors of natural videos are readily available. First, during early sampling stages, SpecLoR looks ahead to estimate the clean latent $z_{t,0}$ and computes its 3D spatiotemporal spectrum. Next, SpecLoR rectifies the amplitude spectrum to match the prior, leaving the phase intact. Finally, the corrected state is re-noised to resume ODE integration. Experiments on Wan2.2 demonstrate that SpecLoR significantly reduces physical artifacts and enhances motion coherence across multiple benchmarks with minimal computational overhead (4 additional NFEs).

06.
arXiv (CS.CL) 2026-06-18

Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction

Current conversational AI systems have made significant progress in language generation, personalization, and long-context interaction. However, most existing methods model social behavior through isolated components such as emotion modeling, memory retrieval, or persona conditioning, lacking a unified framework to explain the emergence of stable social relationships and social intelligence in long-term human-AI interaction.To address this, we propose the Human-AI Coevolution Dynamics Framework (HACD-H), a formal model of human-AI interaction as a self-organizing social cognitive system. HACD-H integrates emotional adaptation, relational organization, social memory, and personality consistency into a unified dynamical framework and introduces principles including multi-timescale social cognition, relational attractors, trust basins, developmental phase transitions, and social cognitive energy dynamics.We construct a conversational dataset with approximately 14,700 interaction turns and develop a theory-driven empirical evaluation framework. Results reveal a hierarchy of temporal persistence in social cognition, stable relational attractors, phase-transition-like developmental patterns, and a structured social cognitive energy landscape. Social intelligence shows a significant negative correlation with social cognitive energy (r = -0.391, p < 0.001), and interaction trajectories exhibit progressive energy reduction over time.These findings suggest that social intelligence emerges from long-term social cognitive coevolution rather than isolated conversational capabilities. HACD-H provides a unified theoretical foundation for modeling adaptive human-AI social interaction and developing socially intelligent AI systems.

07.
arXiv (CS.LG) 2026-06-19

Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs

arXiv:2606.20326v1 Announce Type: new Abstract: We develop QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations (PDEs). Built upon Chebyshev-polynomial KAN layers and parameterized quantum circuits, this hybrid framework embeds physical constraints into the training loss to enforce physical consistency. Our theoretical investigations grounded in approximation theory prove that this design accelerates high-frequency error convergence to an exponential rate and effectively mitigates numerical dispersion. We validate the framework across three typical seepage scenarios in porous media, including single-phase flow, component transport and two-phase flow. Compared with existing quantum-classical physics-informed neural networks, QCPIKAN achieves superior performance in global prediction accuracy, local error control, dynamic evolution tracking and displacement front localization. This work provides a robust and efficient alternative for solving complex PDEs.

08.
arXiv (CS.CV) 2026-06-16

Learning Directional Semantic Transitions for Longitudinal Chest X-ray Analysis

Chest X-ray (CXR) interpretation often requires longitudinal comparison to assess disease progression. Existing approaches typically rely on temporal feature fusion or inter-study discrepancy modeling, yet remain limited in capturing subtle progression semantics and overlook the inherently directional nature of disease trajectories. In this paper, we propose ProTrans, a novel vision-language pretraining framework that formulates disease progression as a directional semantic transition between paired CXR studies. ProTrans leverages radiology reports to anchor individual CXR representations within interpretable disease states, and introduces a learnable progression feature map to explicitly encode semantic shifts between states, aligned with report-derived progression descriptions. To enforce direction-aware perception, ProTrans incorporates a reversed temporal modeling process and imposes bidirectional reconstruction consistency across states and transitions, thereby disentangling directional semantics and promoting coherent trajectory modeling. Extensive experiments on longitudinal downstream tasks, including disease progression classification and progression captioning, demonstrate that ProTrans consistently outperforms existing methods, establishing a unified pretraining framework for longitudinal CXR understanding. https://github.com/RPIDIAL/ProTrans

09.
medRxiv (Medicine) 2026-06-15

Population-scale genomics reveals divergent pathogenicity of variant classes across paralogous collagen IV genes

Monoallelic pathogenic or likely pathogenic variants in COL4A3 and COL4A4 occur in approximately 1 in 106 individuals, yet whether these paralogous genes confer equivalent pathogenicity for the same variant classes has not been tested at population scale. Using whole-genome sequencing data from the UK Biobank (UKB; n = 500,000), with replication in the All of Us Research Program (n = 414,000), we performed per-variant association testing, gene-based collapsing analyses and phenome-wide association studies (PheWAS) across haematuria, proteinuria and chronic kidney disease. We identified 64 COL4A3 and 92 COL4A4 rare variants significantly associated with haematuria or proteinuria, generating a quantitative allelic series for clinical variant interpretation. Glycine substitutions within collagenous domains conferred similar risks in both genes. In contrast, truncating and non-collagenous domain (NC1) missense variants were strongly associated with haematuria and proteinuria in COL4A4 carriers but showed substantially attenuated or absent associations in COL4A3 carriers despite comparable carrier frequencies and predicted pathogenicity scores. These findings were independently replicated in All of Us. Genome-wide association analysis identified the COL4A3/COL4A4 locus as the dominant genetic determinant of haematuria, with the signal attributable to the aggregate effects of rare coding variants and no evidence of independent common variant or trans-acting modifier effects. These findings demonstrate substantial gene-specific differences in tolerance to truncating and NC1 variants between COL4A3 and COL4A4, challenging assumptions of equivalent pathogenicity across paralogous collagen IV genes. Gene identity and not variant class alone, should inform risk stratification, variant interpretation and genetic counselling in individuals carrying collagen IV risk genotypes.

10.
arXiv (CS.CV) 2026-06-15

GMN4AD: Graph Matching Network for Alzheimer's Disease Diagnosis with Test-Time Domain Adaptation using Multi-centered Structure Magnetic Resonance Imaging

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that affects millions of older adults, with prevalence expected to rise significantly in the coming years. Early diagnosis, particularly during the mild cognitive impairment (MCI) stage, is critical for timely intervention. Structural Magnetic Resonance Imaging (sMRI) has emerged as a key modality for detecting AD-related brain changes, but traditional graph-based approaches often struggle with modality and inter-site heterogeneity, limiting diagnostic performance. In this paper, we propose Graph Matching Network for Alzheimer's Disease Diagnosis (GMN4AD), designed to model interactions between heterogeneous brain graphs derived from neuroimaging data. Unlike conventional methods that treat each brain graph independently, GMN4AD leverages graph matching to capture cross-graph relationships, enhancing diagnostic precision. Furthermore, we introduce a test-time domain adaptation strategy that combines contrastive learning to mitigate domain shifts during inference. Extensive experiments on three public AD datasets demonstrate that GMN4AD achieves superior performance compared to state-of-the-art methods, offering a robust and generalizable solution for AD diagnosis.

11.
arXiv (CS.AI) 2026-06-17

DeepInsight: A Unified Evaluation Infrastructure Across the Physical AI Stack

arXiv:2606.17574v1 Announce Type: new Abstract: Evaluating a Physical AI stack spans operators that differ by more than three orders of magnitude – from a single foundation-model decoding step to thousands of physics ticks of whole-body control – varying orthogonally in modality, reward semantics, and resource profile. No existing framework spans this range, so the stack is evaluated today by stitching together separate harnesses that share neither runtime nor scoring, preserving each segment's local validity but losing the shared identity needed to diagnose cross-layer regressions. We present DeepInsight, an evaluation infrastructure that serves this full spectrum on a single runtime. Rather than homogenize the regimes, it preserves their heterogeneity behind three narrow abstractions – task, resource, and result – each realized as one invariant shared by every subsystem: one episode driver, one resource-handle protocol implemented by every expensive backend (LLM inference and sandboxed runtimes alike), and one trace identity scheme under which every event is written. Deployed in production across all three layers of an embodied humanoid stack, this single set of invariants onboards new benchmarks largely by configuration. Where mature peer orchestrators exist – at the foundation-model end – it reproduces published references and peer-framework readings within their own spread, runs the same suites faster on a single node, and scales near-linearly across nodes. Its distinctive return is diagnostic: because every layer writes into one shared trace, a regression that begins in one layer and surfaces in another stays localizable on that trace – a cross-layer payoff no federation of per-segment harnesses can reproduce.

12.
arXiv (CS.AI) 2026-06-19

Augmenting Game AI with Deep Reinforcement Learning

arXiv:2606.20210v1 Announce Type: new Abstract: Immersion in video games depends not only on graphics, audio, and game mechanics, but also on the quality of in-game characters. Producing believable characters, or game AI, remains a significant challenge as behavioral complexity is hard to capture with hand-coded systems. Game AI is a source of immersion and engagement; however, the limitations stemming from the challenges of creating game AI often lead to frustration and the breaking of the illusion of realism within the game. The introduction of machine learning models opens the door to creating more believable, authentic, and relatable characters in games. The promise is that they either learn from interacting with the game, or from player data, to develop true human-like behavior. In this paper, we envision more applications of reinforcement learning for game AI in the future. For this to materialize, current research limitations are prohibitive to broad deployment across game genres. Therefore, we propose a framework for training reinforcement learning models with a set of requirements in mind that are suited towards game AI and game development. We present examples of games with reinforcement learning-augmented game AI and describe the practicalities of deploying player-facing machine learning agents in modern games. Furthermore, we identify bottlenecks and hard problems in these areas, which we believe offer promising research directions to accelerate the adoption of machine learning in game AI for the video game industry.

13.
arXiv (CS.AI) 2026-06-19

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

arXiv:2601.00014v2 Announce Type: replace-cross Abstract: Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To research this, the Technion-Leumit Holter ECG (TLHE) dataset, including 69,663 recordings from 47,729 patients, collected over 20 years was used. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80 that outperformed a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had a two-fold chance of hospitalization or death incidents. Explainability analysis showed DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.

14.
arXiv (CS.AI) 2026-06-12

Graph Reduction in Multirelational Networks: A Spreading-Oriented Reduction Benchmark

arXiv:2606.12581v1 Announce Type: cross Abstract: Real-world networks are inherently incomplete, noisy, and dynamically evolving, making it difficult to capture all actors and their relationships. Their scale often renders direct analysis computationally demanding. While influence maximisation (IM) has been widely studied, the role of graph reduction as a preprocessing step, and its impact on IM accuracy, remains underexplored. In this work, we introduce the Spreading-Oriented Reduction Benchmark (SORB), an open-source, standardised framework for systematically evaluating IM models across diverse task settings. SORB provides an extensible pipeline operating on a representative collection of real-world networks, including single- and multilayer structures, and accounts for graph reduction directly into the evaluation process. This design shifts the focus from analysing IM algorithms in isolation to quantifying how graph reduction alters predictive performance. Using SORB, we study the effects of sparsification and coarsening across multiple IM scenarios. Our results show that the impact of reduction is strongly dependent on both the network type (single-layer vs. multirelational) and the downstream task ($Gain@k$ vs. $\mathrm{AUC}_{\mathrm{cutoff}}$): sparsification preserves seed set quality on single-layer networks, whereas flattened multilayer networks exhibit systematic ranking degradation regardless of reduction strategy. These findings highlight the importance of reduction-aware, multi-task evaluation when studying spreading processes in complex networks.

15.
arXiv (CS.LG) 2026-06-15

Shuttling Compiler for Trapped-Ion Quantum Computers Based on Large Language Models

arXiv:2512.18021v3 Announce Type: replace-cross Abstract: We present the first shuttling compiler based on large language models (LLMs) for trapped-ion quantum computers, where qubits are shuttled between segments for gate execution and qubit storage. We fine-tune pre-trained LLMs on examples from linear and branched one-dimensional shuttling architectures. Thus, we obtain a layout-independent compilation strategy that learns the required shuttling operations directly from data. Using benchmark circuits with up to 16 qubits, such fine-tuned LLMs can now generate valid schedules for shuttling architectures. Notably, we also obtain a valid schedule for a previously unseen four-way junction layout. This demonstrates that trained LLMs can generalize to layouts not encountered during training. For various architectures, LLM-based schedules improve upon state-of-the-art baseline compiler results, reducing the shuttling effort by up to 15%.

16.
arXiv (CS.CL) 2026-06-15

Detecting Historical Turning Points in Italian Media: A Complex Systems Approach to a Diachronic News Corpus

The increasing availability of large-scale textual corpora has opened new possibilities for data-driven, quantitative approaches to historical analysis using Natural Language Processing (NLP). However, diachronic corpora with historical relevance from the pre-digital era remain scarce and often incomplete. We present a quantitative approach to historical analysis based on the reconstruction and exploration of a diachronic corpus of around 600,000 articles from the Italian newspaper "La Repubblica", covering all the articles published from the 1st of January 1985 to the 31st of December 2000 - a period of major political, social, and geopolitical change in Italy and globally. Using NLP techniques, we analyze the text at both lexical and semantic levels; we then apply tools from complex systems and statistical physics to trace shifts in media discourse over time. This allows us to detect key transition periods, such as the transition from the First Republic to the Second Republic in Italy, or major international conflicts like the Gulf War or the Kosovo War, without relying on prior labeling. The results show how combining computational linguistics with ideas from complex systems can offer new quantitative insight into historical changes, opening up new paths for studying the dynamics of media and society through large-scale textual data.

17.
arXiv (CS.LG) 2026-06-18

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

arXiv:2606.18354v1 Announce Type: cross Abstract: Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model – originally designed for brain tumor synthesis – to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.

18.
medRxiv (Medicine) 2026-06-10

Epidemiology of Cervical Precancerous Lesions: Prevalence and Predictors from Pap Smear Screening in Hawassa City Hospitals, Sidama Region, Ethiopia. Institutional-Based Cross-sectional Study

Background: Cervical cancer is the fourth most common cancer in women worldwide and remains a major public health challenge. In Ethiopia, it is the second leading cause of cancer deaths, with around 8,000 new cases and 6,000 deaths each year. Region?specific data on the prevalence and predictors of precancerous lesions remain scarce, yet such information is vital for guiding targeted reproductive health strategies. This study therefore examined the prevalence and predictors of cervical precancerous lesions among women aged 21-60 years undergoing Pap smear screening in public hospitals in Hawassa City, Sidama Region. Methods: An institution-based cross-sectional study was conducted among 241 women attending Pap smear screening at public hospitals in Hawassa City from March to August 2025. Sociodemographic and clinical data were collected via interviews and medical records. Lesions were classified based on the standardized international framework for reporting cervical cytology results from Pap smears per the Bethesda system. Multivariable logistic regression identified predictors p

19.
arXiv (CS.LG) 2026-06-17

Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

arXiv:2606.17603v1 Announce Type: new Abstract: In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven to be effective. However, current frameworks typically rely on sliced statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions. This stochasticity injects projection variance into the training gradients, destabilizing optimization, and hindering convergence. In this work, we first show that analytically integrating out these random projections natively yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods. Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback-Leibler (KL) divergence directly on the sphere to enforce a uniform distribution. To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters. Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10. Furthermore, we reveal that the choice of the statistical test shapes the geometry of the learned latent space: MMD and KSD favor locally clustered organization suitable for object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.

20.
arXiv (CS.CV) 2026-06-18

CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator

Conditional image editing aims to modify a source image according to textual prompts and optional reference guidance. Such editing is crucial in scenarios requiring strict structural control (i.e., anomaly insertion in driving scenes and complex human pose transformation). Despite recent advances in large-scale editing models (i.e., Seedream, Nano Banana, etc), most approaches rely on single-step generation. This paradigm often lacks explicit quality control, may introduce excessive deviation from the original image, and frequently produces structural artifacts or environment-inconsistent modifications, typically requiring manual prompt tuning to achieve acceptable results. We propose CAMEO, a structured multi-agent framework that reformulates conditional editing as a quality-aware, feedback-driven process rather than a one-shot generation task. CAMEO decomposes editing into coordinated stages of planning, structured prompting, hypothesis generation, and adaptive reference grounding, where external guidance is invoked only when task complexity requires it. To overcome the lack of intrinsic quality control in existing methods, evaluation is embedded directly within the editing loop. Intermediate results are iteratively refined through structured feedback, forming a closed-loop process that progressively corrects structural and contextual inconsistencies. We evaluate CAMEO on anomaly insertion and human pose switching tasks. Across multiple strong editing backbones and independent evaluation models, CAMEO consistently achieves 20\% more win rate on average compared to multiple state-of-the-art models, demonstrating improved robustness, controllability, and structural reliability in conditional image editing.

21.
arXiv (CS.AI) 2026-06-11

Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

arXiv:2504.09762v4 Announce Type: replace Abstract: Intermediate token generation (ITG), where a model produces output before the solution, has become a standard method to improve the performance of language models on reasoning tasks. These intermediate tokens have been called \say{reasoning traces} or even \say{thinking traces} – implicitly anthropomorphizing the traces, and implying that these traces resemble steps a human might take when solving a challenging problem, and as such can provide an interpretable window into the operation of the model's thinking process to the end user. In this position paper, we present evidence that this anthropomorphization isn't a harmless metaphor, and instead is quite dangerous – it confuses the nature of these models and how to use them effectively, and leads to questionable research. We call on the community to avoid such anthropomorphization of intermediate tokens.

22.
arXiv (CS.CL) 2026-06-19

Telenor Nordics Customer Service self-help corpus

作者:

This paper presents a multilingual customer service self-help corpus comprising 1,122 manually validated documents in Finnish, Danish, Norwegian, and Swedish, totaling 274,599 words and 1,884,833 characters. The documents have been sourced from the public self-help pages of four Nordic telecommunications operators and subsequently filtered for person-identifiable information and relevance through a combined LLM and human annotation pipeline. Domain-specific datasets for Nordic languages remain scarce, particularly in customer service: a domain of growing importance for retrieval-augmented generation, cross-lingual transfer learning, and emerging agent-based service architectures. An analysis of the corpus reveals substantial variation in document length and structure across operators, reflecting distinct editorial strategies, as well as broad topical coverage spanning network hardware, mobile services, TV and streaming, billing, and account management. The dataset is publicly available under a CC-BY-NC-SA-4.0 license at https://zenodo.org/records/20732652, intended to support reproducible research in Nordic NLP and information retrieval.

23.
arXiv (CS.CV) 2026-06-17

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial relations, interactions, and future dynamics. However, existing AD vision-language benchmarks largely focus on single-view, static, ego-centric, or single-source question answering, leaving it unclear whether current Vision-Language Models (VLMs) can truly construct and reason over dynamic driving scenes. We introduce DriveSpatial, a benchmark of 15.6K human-verified QA pairs across 20 tasks from five large-scale AD datasets. DriveSpatial evaluates four abilities: Cognitive Scene Construction, Multi-view Relational Understanding, Temporal Reasoning, and Generalization. Unlike prior benchmarks, DriveSpatial is generated from a dynamic multi-relational scene graph that encodes object states, spatial relations, interactions, camera visibility, and temporal correspondences, enabling QA pairs that enforce genuine cross-view and spatiotemporal reasoning. Evaluating 15 representative VLMs reveals a substantial human-model gap: the strongest model trails humans by 28.4 points, with Cognitive Scene Construction emerging as the key bottleneck. Further diagnostics show that language-only prompting is insufficient, while explicit BEV grounding consistently improves performance. These results suggest that current VLMs lack the scene-construction ability needed for reliable spatiotemporal driving intelligence. DriveSpatial and its construction pipeline will be released to support future research.

24.
medRxiv (Medicine) 2026-06-12

An integrative multi-omics framework identifies epigenetic dysregulation of HAND2 as a potential primary driver of impaired enteric neural crest cell differentiation in Hirschsprung Disease

Hirschsprung disease (HSCR) is a congenital neurodevelopmental disorder characterized by segmental aganglionosis due to impaired developmental processes of enteric neural crest cells (NCCs). Despite being the leading genetic cause of functional intestinal obstruction in early childhood, HSCR represents a paradigmatic challenge in precision medicine: its multifactorial etiology, complex gene-environment interactions and limited resolution of single-modality analyses have long hindered mechanistic understanding and therapeutic translation. Here, we applied an integrative multi-omics approach combining genetic, phenotypic, epigenomic and transcriptomic analyses of matched ganglionic and aganglionic formalin-fixed paraffin-embedded (FFPE) patient tissues, complemented by patient-specific in vitro models. Beyond established genetic contributors, our integrative approach reveals novel regulatory pathways predominantly affecting enteric NCC differentiation, with convergent evidence pointing to epigenetic dysregulation as a primary disease mechanism. Notably, we identified over 1,300 differentially methylated positions between ganglionic and aganglionic FFPE samples, with HAND2 emerging as a key candidate due to multiple hypermethylated sites and consistently reduced expression levels in aganglionic tissues and in vitro models, suggesting a potential role in HSCR pathophysiology. We propose that our multi-omics approach offers a powerful and comprehensive framework for dissecting disease mechanisms. Beyond advancing biological understanding, this strategy holds promise for paving the way for molecularly informed patient stratification and supporting the development of personalized treatment and postoperative management strategies.

25.
arXiv (CS.LG) 2026-06-16

RepNet: Tackling spectral bias in deep neural networks via parameter reparameterization

arXiv:2606.16575v1 Announce Type: new Abstract: Deep neural networks (DNNs) have achieved remarkable success in scientific computing, yet they often suffer from spectral bias in capturing oscillatory and multiscale behaviors. In this study, we investigate this limitation by examining the failure of shallow ReLU neural networks in fitting high-frequency functions. This observation identifies two important factors in resolving rapid oscillations: the initial slope scale and the distribution of partition points induced by the networks. Motivated by this analysis, we propose RepNet, a reparameterized DNN model for ReLU and tanh networks designed for high-frequency and multiscale problems. The key idea is to reparameterize the weights and biases in the first hidden layer, which enables effective control of the initial slope scale and provides an appropriate distribution of the initial partition points. Furthermore, treating the reparameterized weights and biases as trainable parameters allows the DNN to achieve adaptive frequency scaling during training. In addition, we derive quantitative estimates for the output and slope magnitudes of the reparameterized DNN to guide the initialization of the proposed method. Numerical experiments, including multiscale one- and four-dimensional function approximation, forward and inverse PDE problems in combination with physics-informed neural networks (PINNs), and operator learning, demonstrate that RepNet improves the predicted accuracy of vanilla DNNs in capturing highly oscillatory features with slightly additional computational cost. These results indicate that RepNet provides an effective and flexible approach for overcoming spectral bias and applying DNNs to multiscale problems.