Academic Intelligence · Curated Daily

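Explore the Frontiers of Global Academic Research

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.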

01.
arXiv (quant-ph) 2026-06-11

Quantum Correlation Hierarchy and Teleportation in Dephased Hydrogen Hyperfine System

arXiv:2606.11731v1 Announce Type: new Abstract: We study the dynamics of quantum correlations in the hydrogen hyperfine spin system subject to Markovian phase noise. Treating the electron and proton spin degrees of freedom as an open two-qubit system governed by an isotropic hyperfine Hamiltonian and local dephasing, we obtain the exact time-dependent density matrix and derive analytical expressions for the full X-state family. We compute concurrence ($C$), trace-distance measurement-induced nonlocality (Trace MIN, $\mathcal{N}_1$), and average steering coherence (ASC) in closed form and establish their strict ordering $C(t)\leq \mathcal{N}_1(t)\leq \mathrm{ASC}(t)$ at all times. Entanglement is identified as the most fragile resource, undergoing sudden death at a finite time. Trace MIN exhibits dephasing-immune freezing for states with nonzero population imbalance, while ASC is the most robust quantity, persisting longest in every scenario studied. We additionally demonstrate that the dephased thermal hyperfine state serves as a resource for quantum teleportation, deriving a closed-form expression for the average fidelity and establishing that the teleportation advantage window coincides exactly with the entanglement survival interval, $\mathcal{F}_A > 2/3 \Longleftrightarrow C > 0$, for the full X-state family with maximally mixed marginals. We identify four distinct dynamical regimes and map all three correlation measures onto directly measurable Pauli spin correlators, enabling experimental reconstruction of the full hierarchy without full state tomography.
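As a rough illustration of entanglement sudden death in an X state, the sketch below evaluates the standard closed-form concurrence of a two-qubit X state. The toy state (a Werner-like mixture whose single coherence decays exponentially) and the names `dephased_werner_concurrence`, `p`, and `rate` are assumptions made for this example; they are not the hyperfine model or the expressions derived in the paper.

```python
import math

def x_state_concurrence(rho11, rho22, rho33, rho44, rho14, rho23):
    """Concurrence of a two-qubit X state from its populations and two coherences."""
    c1 = abs(rho14) - math.sqrt(rho22 * rho33)
    c2 = abs(rho23) - math.sqrt(rho11 * rho44)
    return 2.0 * max(0.0, c1, c2)

def dephased_werner_concurrence(p, t, rate):
    """Toy model (assumption): Werner-like state with singlet weight p whose
    coherence rho23 decays as exp(-rate * t) while populations stay fixed."""
    outer = (1.0 - p) / 4.0          # rho11 = rho44
    inner = (1.0 + p) / 4.0          # rho22 = rho33
    coherence = -(p / 2.0) * math.exp(-rate * t)
    return x_state_concurrence(outer, inner, inner, outer, 0.0, coherence)

if __name__ == "__main__":
    # In this toy model the concurrence reaches zero at the finite time
    # t = ln(2p / (1 - p)) / rate, rather than decaying asymptotically.
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, dephased_werner_concurrence(0.8, t, 1.0))
```

The concurrence hits zero while the coherence is still nonzero, which is the sudden-death behaviour the abstract refers to.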

02.
PLOS Computational Biology 2026-06-22

Adhesion and polarity-driven morphogenesis: Mechanisms and constraints in tissue formation

by Yoshiyuki T. Nakamura, Chikara Furusawa, Kunihiko Kaneko

Embryonic development in multicellular organisms exhibits diverse morphogenetic patterns, which can generally be categorized into fundamental types such as monolayer and multilayer spheres, as well as cell masses. These basic patterns are thought to be governed by the microscopic properties of intercellular adhesion. However, the specific mechanisms linking the microscopic factors to the emergence of distinct macroscopic morphogenetic patterns remain poorly understood. In this study, we explore how different morphogenetic patterns arise by employing a computational model that incorporates intercellular adhesion and polarity. Our results demonstrate that all fundamental morphogenetic patterns can be generated through the interplay of two key parameters: the polarity strength of the cell and the regulation of polarity via mechanical signals. We also identify two distinct processes for the formation of spherical structures. Furthermore, analytical considerations reveal key mechanisms underlying the formation of these patterns. These findings highlight the critical role of physical constraints in morphogenesis and suggest potential applications to the design of artificial tissues and organoids.

03.
arXiv (CS.CL) 2026-06-19

A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization

In this technical report, we focus on solving the challenge of Vietnamese multi-document abstractive summarization, introduced in the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. We choose to follow the popular hierarchical approach, i.e. condensing each document followed by aggregation and summarization. We propose a novel yet simple strategy to shorten documents that is driven by the golden summary, thus ensuring high correlation between stages of the hierarchical approach. Our method achieves a ROUGE2-F1 score of 0.2468 on the VLSP's public test set, and can produce fluent and concise summaries. Additionally, we utilize external sources for extra data, which greatly enhances the quantity of data for Vietnamese multi-document summarization. The additional data is made available for the community.

04.
arXiv (CS.LG) 2026-06-16

A Bifurcation Theory Framework for Gradient Descent on the Edge of Stability

arXiv:2606.15551v1 Announce Type: new Abstract: The Edge of Stability (EoS) phenomenon, where gradient descent operates with sharpness exceeding the classical convergence threshold yet the loss decreases over long timescales, is ubiquitous in modern deep learning but remains poorly understood in realistic settings. Prior rigorous analyses have been largely confined to scalar or low-dimensional losses with specific structural forms. In this work, we develop a bifurcation theory framework for gradient descent on the edge of stability that applies directly to overparameterized neural networks. By decomposing the training dynamics into components normal and tangent to the manifold of minimizers, we show that stable EoS training arises from a flip bifurcation in the normal direction, governed by the sign of the first Lyapunov coefficient, while the tangent dynamics drift toward regions of decreasing sharpness. Under mild spectral and geometric assumptions on the loss landscape, we prove convergence to the minimizing manifold when training at the EoS threshold. As a corollary, we recover and unify prior results: we show that the product-stability condition of Gan (2026) is an instance of our framework.
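The "classical convergence threshold" mentioned above can be seen on a one-dimensional quadratic, where gradient descent with step size `lr` contracts only if the curvature (sharpness) stays below `2 / lr`. The sketch below is a minimal illustration of that threshold only; the function name and parameter values are assumptions for the example, and it does not reproduce the paper's bifurcation analysis.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return the final x.

    Each step multiplies x by (1 - lr * sharpness), so the iterates shrink
    only when sharpness < 2 / lr.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * sharpness * x
    return x

if __name__ == "__main__":
    lr = 0.1                      # threshold sharpness is 2 / lr = 20
    print(gd_on_quadratic(19.0, lr))   # below threshold: oscillates and shrinks
    print(gd_on_quadratic(21.0, lr))   # above threshold: oscillates and grows
```

Just above the threshold the iterate flips sign every step while growing, the period-two behaviour that a flip bifurcation describes in the nonlinear setting.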

05.
PLOS Computational Biology 2026-06-18

A comparison of contact patterns derived from the population structure in agent-based models and empirical contact survey data

by Janik Suer, Johannes Ponge, Michael Brüggemann, Jan Pablo Burgard, Vitaly Belik, Bernd Hellingrath, Alejandra Rincón Hidalgo, Andrzej K. Jarynowski, Richard Pastor, Huynh Thi Phuong, Steven Schulz, Ashish Thampi, Chao Xu, Marlli Zambrano, Rafael Mikolajczyk, André Karch, Veronika K. Jaeger, on behalf of the OptimAgent Consortium

Agent-based models (ABMs) are powerful tools for simulating disease spread, relying on individual-level interaction rules from which emergent dynamics arise. An important component of ABMs is contact behaviour. To reduce computational complexity, contact behaviour in ABMs is often assumed to be random mixing within structurally defined settings (e.g., workplaces), with setting composition typically based on empirical data such as census information. However, the validity of this approach to represent contacts remains unclear. To address this gap, we compare the contact structure derived through this approach in a large-scale ABM with empirical contact survey data with respect to age contact matrices for households, schools, workplaces, all remaining contact settings, and all contacts combined (based on difference matrices and sum of squared errors (SSE)). Our results demonstrate that random mixing in settings with known age compositions, like households (SSE: 0.7 (95% CI: 0.4–0.9)), schools (SSE: 0.7 (95% CI: 0.3–1.1)) and workplaces (SSE: 0.5 (95% CI: 0.2–0.7)), captures basic interaction patterns but fails to account for age-related variation in contact numbers. The largest differences arise for contacts outside these settings (SSE: 3.8 (95% CI: 1.2–6.5)), as ABMs typically use random regional contacts that do not capture age-structured behaviour observed in contact surveys. Applying contact matrices from both approaches to an age-structured compartmental model leads to noticeable differences in simulated epidemic outcomes regarding reproduction numbers and spreading dynamics between age groups. Our results suggest that naïve approaches to represent contact behaviour in ABMs based on population structure can be valid in settings with defined age structures, while settings with low a priori structure require more advanced methods to represent contact behaviour observed in contact surveys.
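A minimal sketch of the comparison described above, assuming the difference matrix is the element-wise difference between two age contact matrices and the SSE is the sum of its squared entries. The abstract does not state any normalization or how the confidence intervals are obtained, so those are left out; the two 2x2 matrices are made-up numbers for illustration only.

```python
def difference_matrix(model, survey):
    """Element-wise difference between two equally sized contact matrices."""
    return [[m - s for m, s in zip(model_row, survey_row)]
            for model_row, survey_row in zip(model, survey)]

def sum_of_squared_errors(model, survey):
    """Sum of squared entries of the difference matrix."""
    return sum(d * d for row in difference_matrix(model, survey) for d in row)

if __name__ == "__main__":
    # Hypothetical mean contacts per day between two age groups.
    abm_contacts = [[2.0, 1.0], [1.0, 3.0]]
    survey_contacts = [[2.5, 1.0], [0.5, 3.0]]
    print(difference_matrix(abm_contacts, survey_contacts))
    print(sum_of_squared_errors(abm_contacts, survey_contacts))
```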

06.
arXiv (quant-ph) 2026-06-24

Quantum Entanglement Halves the Oblivious Update Bandwidth

arXiv:2605.19248v2 Announce Type: replace Abstract: We consider $(n,k)$ MDS-coded distributed storage over $\mathbb{F}_q$ with per-node storage $\alpha$ symbols. For the oblivious update problem, where a single message symbol changes and neither helpers nor the stale node know which, the classical lower bound is $\alpha k \log_2 q$ bits. We prove that when the $k$ contacted helpers share prior quantum entanglement, the update bandwidth is $\lceil \alpha/2 \rceil \cdot k \log_2 q$ bits-equivalent, a factor approaching 2 reduction. For $\alpha = 2$, a $[[k, k-2]]_q$ CSS code achieves bandwidth $k \log_2 q$ with one qudit per helper. For general $\alpha$, a $[[\lceil \alpha/2 \rceil k, \lceil \alpha/2 \rceil k - \alpha]]_q$ CSS code achieves the bound with $\lceil \alpha/2 \rceil$ qudits per helper. The matching converse uses the superdense coding bound: the stale node holds all transmitted qudits and hence the entangled partners, so each helper's channel supports at most $D^2$ distinguishable signals for dimension $D$. The result holds for all $(n,k)$ pairs with sufficiently large prime $q$.
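The two bandwidth expressions quoted above can be compared numerically. The sketch below simply evaluates $\alpha k \log_2 q$ and $\lceil \alpha/2 \rceil k \log_2 q$; the function names and the sample parameters are assumptions chosen for illustration.

```python
import math

def classical_update_bits(alpha, k, q):
    """Classical lower bound from the abstract: alpha * k * log2(q) bits."""
    return alpha * k * math.log2(q)

def entangled_update_bits(alpha, k, q):
    """Entanglement-assisted bandwidth from the abstract: ceil(alpha/2) * k * log2(q)."""
    return math.ceil(alpha / 2) * k * math.log2(q)

if __name__ == "__main__":
    k, q = 4, 257  # hypothetical parameters; q is a prime
    for alpha in (2, 3, 4, 9):
        ratio = classical_update_bits(alpha, k, q) / entangled_update_bits(alpha, k, q)
        print(alpha, ratio)
```

For even $\alpha$ the ratio is exactly 2; for odd $\alpha$ it is $\alpha / \lceil \alpha/2 \rceil$, which approaches 2 as $\alpha$ grows.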

07.
arXiv (CS.LG) 2026-06-19

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

arXiv:2606.20055v1 Announce Type: new Abstract: Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across multivariate variables. To mitigate the above drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno+, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno+ achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.

08.
arXiv (CS.CL) 2026-06-17

Securing Multi-Agent GIS Systems: Risk Evaluation and Prompt Hardening Optimization

Agentic systems are increasingly integrated with geographic information systems (GIS), where multi-agent coordination enables complex conversational and spatial analysis but introduces security risks. This work presents a security-oriented framework for risk identification, evaluation, and mitigation in a multi-agent GIS system while maintaining adaptability to broader agentic architectures. We test the agentic system of a commercial geospatial partner while developing a modular state-machine-based orchestration framework that abstracts agent behavior into reusable components. We evaluate robustness using a red-teaming framework with an adaptive attacker LLM and a deterministic judge that produces binary outcomes with supporting rationales across multi-turn attacks. We further improve resilience with a prompt optimization framework that treats prompts as structured signatures and injects adversarial demonstrations, enabling systematic security improvements without degrading task performance.

09.
arXiv (CS.CL) 2026-06-25

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

The Hitchhiker's Guide to Agentic AI is a comprehensive practitioner's reference for building autonomous AI systems. The book covers the full stack from first principles to production deployment, organized around a central thesis: building great agentic systems requires understanding every layer of the pipeline, not just one. The book opens with the LLM substrate – transformer architecture, GPU systems, training and fine-tuning (SFT, LoRA, MoE), model compression, and inference optimization – treated as essential foundations rather than the primary focus. It then develops the alignment and reasoning layer: reinforcement learning from human feedback (RLHF), PPO, DPO and its variants, GRPO, reward modeling, and RL for large reasoning models including chain-of-thought and test-time scaling. The second half is devoted to agentic AI proper. Topics include agentic training and trajectory-based RL, retrieval-augmented generation (RAG and Agentic RAG), memory systems (in-context, external, episodic, and semantic), agent harness design and context management, and a taxonomy of agent design patterns. Inter-agent coordination is covered in depth: the Model Context Protocol (MCP), agent skills and tool use, the Agent-to-Agent (A2A) communication protocol, and multi-agent architectures spanning centralized, decentralized, and hierarchical topologies. The book concludes with agent development frameworks, agentic UI design, evaluation methodology for agentic tasks, and production deployment. Each chapter pairs rigorous theoretical foundations with implementation guidance, code examples, and references to the primary literature.

10.
bioRxiv (Bioinfo) 2026-06-11

DyMoTree decodes early cell state transitions and drivers from single-cell transcriptomes using a tree-structured neural network

Inferring early cell fate from single-cell RNA-sequencing data is essential for identifying cellular origins and fate plasticity in development and disease. However, existing methods often fail to exploit tree-structured lineage trajectories, limiting the accuracy and interpretability of fate mapping. Here we present DyMoTree, a computational framework that models cell fate decisions as nonlinear mappings between progenitor and terminal cell states under explicit lineage constraints. By integrating lineage graphs with a tree-structured neural architecture, DyMoTree learns lineage-resolved cell-state transition maps from single-cell transcriptomes, enabling robust inference of early fate bias and identification of fate-specific progenitor substates and driver genes. Across simulations, lineage-tracing experiments, and in vivo systems, DyMoTree outperformed existing methods in resolving early fate biases. Applications to mouse embryogenesis, lung adenocarcinoma progression, and CAR-T immunotherapy revealed regulatory programs underlying developmental and disease-associated transitions. DyMoTree provides a general framework for modeling lineage-resolved cell-state dynamics underlying development and disease progression.

11.
arXiv (CS.CL) 2026-06-12

Trait, Not State: The Durability of Reading Identity in Social Highlighting

Prior work on a social web highlighter located individuality in selection – which documents a person chooses to highlight – but measured it cross-sectionally. We ask the temporal question: is a reader's selection signature a trait or a state? We freeze each reader's first six months of highlighting as a profile and track its own-vs-other advantage on their later selections at growing gaps (to 24+ months), with negatives drawn from the same calendar era – so supply drift cannot masquerade as personal drift – at a coarse global level and at a fine level whose negatives and controls come from the reader's own interest neighborhood; the anchor cell reproduces the prior cross-sectional level (+0.188 vs +0.169), validating the harness. Four results. Within the same users, the fine-layer advantage shows no statistically detectable paired decline at any horizon (6-12 month retention R = 1.00 [0.85, 1.18], n = 212; the farthest bin is compatible with a modest decline; the only contrast whose interval excludes zero is the coarse layer at 12-24 months, about 13%). The signal is not reducible to repeated domains (~90% survives excluding all profile sources). Within-person drift is slow (a recent-half profile beats the old half by +0.042). Prospectively, personal profiles – even one built from a reader's earliest documents, median 20 months before evaluation – rank their next reads at roughly 3x the AP of every simple non-personal prior tested. We use "trait" operationally (a stable signature under continued engagement); the scope is heavy, long-tenured readers of one platform, and exposure is not separable from choice.

12.
arXiv (CS.CV) 2026-06-16

DDTNet: Degradation Disentanglement and Transfer Network for Test-Time All-in-One De-weathering Adaptation

All-in-one adverse weather image restoration aims to remove multiple degradations, such as rain, haze, and snow, using a single unified model. Despite their broad applicability, existing methods typically compromise performance, delivering balanced but suboptimal results for individual degradation types. This issue becomes more pronounced when a domain gap exists between training and testing data. Motivated by the observation that modeling degradation patterns is more feasible than recovering clean content, we propose the Degradation Disentanglement and Transfer Network (DDTNet), which focuses specifically on degradation transfer. By disentangling degradation patterns from target-domain degraded images and transferring them to source domain clean images, DDTNet generates domain-adaptive paired training data. These pairs are then used to fine-tune restoration models, significantly enhancing their adaptability across diverse weather conditions and domains. The core of DDTNet is the Degradation Disentanglement Module (DDM), which comprises Degradation Coupled Attention (DCA) to capture both general and weather-specific features, thereby enabling effective disentanglement and transfer of degradation patterns. Experimental results demonstrate that DDTNet significantly and consistently improves existing all-in-one models across real-world deraining, desnowing, and dehazing datasets.

13.
arXiv (CS.CV) 2026-06-16

LOCUS: Local Visual Cue Search for Enhancing Fine-Grained Perception in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) remain unreliable on fine-grained visual perception, even when high-resolution inputs preserve the necessary local details. We identify this limitation as visual context rot: decisive evidence may exist in the full image, yet fail to be reliably selected and used amid redundant visual context. We propose LOCUS (LOcal visual CUe Search), a training framework that teaches MLLMs to internalize local evidence search through a verifiable proxy task. During training, LOCUS provides a local crop as a visual cue and optimizes the model to recover its spatial support in the full image using an IoU-based reward. The visual cue is used only during training, leaving the standard image-question inference interface unchanged. Experiments across fine-grained perception, hallucination, general understanding, and reasoning benchmarks show that LOCUS improves localization-sensitive visual understanding while preserving broad capabilities. Attention analyses further indicate stronger focus on task-relevant evidence regions, suggesting that training-time visual cue search provides an effective route to internalized fine-grained evidence selection.
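To make the "IoU-based reward" concrete, the sketch below computes intersection-over-union between a predicted box and the true location of the crop. The box format `(x1, y1, x2, y2)` and the choice to use the raw IoU as the reward are assumptions for this example; the abstract does not specify the exact reward shaping.

```python
def box_iou(pred, target):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_target = (target[2] - target[0]) * (target[3] - target[1])
    union = area_pred + area_target - inter
    return inter / union if union > 0 else 0.0

if __name__ == "__main__":
    crop_location = (1.0, 1.0, 3.0, 3.0)      # where the local crop really came from
    model_prediction = (0.0, 0.0, 2.0, 2.0)   # hypothetical model output
    print(box_iou(model_prediction, crop_location))
```

Because the crop's position in the full image is known at training time, a reward of this kind can be checked automatically, which is what makes the proxy task verifiable.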

14.
arXiv (CS.CL) 2026-06-16

IMPACTeen: Intentions, Manipulation, Persuasion, Annotations, and Consequences in Teen Communication Dataset

IMPACTeen is a dataset of textual social influence scenarios spanning interpersonal, media-based, and digital settings in an adolescent context. It contains 1,021 texts, 5,100 individual annotation records, and gold labels for social influence techniques, with each text annotated from five distinct perspectives: teenagers, parents, psychologists, communication experts, and teachers. The resource was constructed through constrained LLM generation, followed by a two-step human editing and validation phase aimed at ensuring youth-context realism. A multi-dimensional annotation covered influence presence, techniques, intentions, consequences, resistance, reactions, and annotation confidence. The dataset supports research on social influence detection, annotator disagreement, cross-lingual modeling, and the training and evaluation of language models. The dataset was created in Polish and is accompanied by a corresponding English version.

15.
arXiv (CS.CL) 2026-06-16

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Efficient and scalable agentic intelligence requires models that can deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this report, we present Ling-2.6 and Ring-2.6, a family of models designed to address this challenge at scale. Ling-2.6 is optimized for instant response generation and high capability per output token, whereas Ring-2.6 is tailored for deeper reasoning and more advanced agentic workflows. Instead of training from scratch, we upgrade the Ling-2.0 base model through architectural migration pre-training and large-scale post-training. This upgrade is guided by a unified co-design of model architecture, optimization objectives, serving systems, and agent training environments, enabling improvements in both model capability and deployment efficiency. At the architectural level, we introduce a hybrid linear attention design that integrates Lightning Attention with MLA, improving the efficiency of long-context training and decoding. To further enhance token efficiency, we optimize capability per output token through Evolutionary Chain-of-Thought, Linguistic Unit Policy Optimization, bidirectional preference alignment, and shortest-correct-response distillation. For agentic capabilities, we propose KPop, a reinforcement learning framework designed to support stable training of Ring-2.6-1T on large-scale environment-grounded data. KPop improves training efficiency through asynchronous scheduling across coding, search, tool use, and workflow execution, enabling scalable learning from complex agent-environment interactions. Together, Ling-2.6 and Ring-2.6 provide a practical pathway toward efficient, scalable, and open agentic systems. We open-source all checkpoints in the 2.6 family to support further research and development in practical agentic intelligence.

16.
arXiv (quant-ph) 2026-06-24

Higher-Order Adiabatic Elimination in Atom-Cavity Systems and Its Impact on Spin-Squeezing Generation

arXiv:2506.22383v4 Announce Type: replace Abstract: Spin-squeezed states are metrologically useful quantum states where entanglement allows for enhanced sensing with respect to the standard quantum limit. Key challenges include the efficient preparation of spin-squeezed states and the scalability of estimation precision with the number $N$ of probes. Recently, in the context of the generation of spin-squeezed states via coupling of three-level atoms to an optical cavity, it was shown that increasing the atom-cavity coupling can be detrimental to spin squeezing generation, an effect that is not captured by the standard second-order adiabatic cavity removal approximation. We describe adiabatic elimination techniques to derive an effective Lindblad master equation up to third order for the atomic degrees of freedom. Numerical simulations show that the spin squeezing scalability loss is correctly reproduced by the reduced open system dynamics, highlighting the role of higher-order contributions. Furthermore, we conjecture an extension beyond leading order of the adiabatic elimination technique to the case of conditional dynamics under quantum non-demolition continuous measurement and fast cavity loss, whose reliability is again confirmed by numerical simulation of the dynamics and the corresponding behavior of spin squeezing as a function of $N$.

17.
PLOS Medicine 2026-05-20

Brain morphology in Anorexia Nervosa and its subtypes: A multi-cohort study of individual participant data

by Fabio Bernardoni, Dominic Arold, Luis Schoppik, Klaas Bahnsen, Ruiyang Ge, Clara Moreau, Lasse Bang, Federico D’Agata, Giovanni Abbate-Daga, Christian K. Tamnes, Iain Campbell, Owen O’Daly, Ulrike Schmidt, Guido Frank, Stefanie Horndasch, Andreas Hess, Arnd Dörfler, Hans-Christoph Friederich, Joe Simon, Angela Favaro, Luca Lavagnino, Christina E. Wierenga, Amanda Bischoff-Grethe, Amy E. Miles, Allan Kaplan, Aristotle Voineskos, Paul A. M. Smeets, Annemarie A. van Elburg, Unna Danner, Sophia I. Thomopoulos, Laura Berner, Neda Jahanshad, Sophia Frangou, Joseph A. King, Paul Thompson, Stefan Ehrlich

Background: In a recent coordinated meta-analysis of neuroimaging data, we reported gray matter (GM) alterations in acutely underweight patients with anorexia nervosa (AN). Here, we extend these findings by examining individual variation in brain structure within AN, individual-level differentiation between AN and healthy controls (HC), and differences between AN subtypes, with potential relevance for understanding clinical heterogeneity.

Methods and findings: We analyzed individual-level data from 11 international sites in the ENIGMA Eating Disorders Working Group, including 570 female participants with AN and 739 HC. We examined cortical thickness, cortical surface area and subcortical volumes in AN versus HC using three complementary approaches: (i) group-level differences in a mega-analysis correcting for age effects, (ii) frequencies of extreme deviations (infra-/supranormal; |z| > 1.96) based on normative reference models by the CentileBrain Initiative, and (iii) individual-level classification performance using machine learning. The same analytic framework was applied to compare AN restricting versus binge-eating/purging subtype, additionally correcting for BMI effects. Mega-analyses reinforced previous meta-analytic findings of pronounced and widespread GM deficits in AN compared to HC. Normative modelling revealed that the frequency of infranormal z-scores (23/68 cortical thickness, 13/14 subcortical volume metrics) and supranormal z-scores (35/68 cortical thickness, 17/68 cortical surface area metrics) was significantly higher in AN than expected based on reference data. Individuals with AN could be reliably differentiated from HC using machine-learning classifiers (ROC–AUC = 0.75–0.81). In contrast, neither group-level differences nor frequency of extreme z-scores differed between AN subtypes, and individuals with different subtypes could not be reliably differentiated from each other. Importantly, the observational design cannot distinguish neurobiological differences related to AN from the effects of starvation or low BMI in the AN versus HC analyses. The lack of differences between subtypes does not exclude brain structural differences between AN subtypes that might be detectable with other modalities or analytic approaches.

Conclusion: Using a mega-analytic approach, we confirm widespread GM deficits in AN, show that these alterations are (in some patients) extreme, and demonstrate that they enable robust classification with superior performance compared to most MRI-based psychiatric classification studies. The absence of differences between AN subtypes may reflect shared neurobiology, though other imaging modalities may reveal distinctions beyond brain structure.

18.
arXiv (CS.LG) 2026-06-18

Signature filtering: a lightweight enhancement for statistical watermark detection in large language models

arXiv:2606.18430v1 Announce Type: new Abstract: Statistical watermarks help organizations attribute large language model (LLM) outputs, yet existing detectors often struggle when watermark signals are weak, texts are repetitive, or watermarks are edited. We propose signature filtering, a detection-time module that enhances watermark detection without modifying watermark embedding and text generation. It learns a small set of "signature" tokens whose presence makes watermark tests unreliable, and removes these tokens before detection. The signatures are obtained by solving a mixed-integer linear program on a small training set, with constraints that maximize the true positive rate. We additionally derive finite-sample and asymptotic bounds under several attacker models (color-blind, color-adaptive, and distributionally correlated). On four well-known watermark families (Kgw, Sweet, Unigram, Exp), four benchmark corpora (C4, MBPP, HumanEval, Code-Search-Net), and six LLMs (Opt-1.3b, Opt-6.7b, Llama2-13b, Llama3.1-8b, Qwen2.5-14b, Phi-3-medium-14b), 2- and 3-gram signatures raise detection rates in weak-signal and low-entropy settings from 8–31% without filtering to 78–99% with filtering, while keeping false positives controllable and often negligible. In stress tests where we scramble sentences and perturb 25–50% of tokens by dilution, deletions, and substitutions, 2-gram filters for Kgw-style watermarks preserve most of the clean-text detection gains, often matching or outperforming the advanced WinMax watermark detector. Signature filtering thus provides a simple, scalable, and model-agnostic add-on to strengthen watermark-based provenance checks for LLM text in information processing workflows.
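The idea of removing signature n-grams before running a watermark test can be sketched as follows. This is a simplified illustration under stated assumptions: the signature set is given by hand rather than learned by a mixed-integer linear program, green-list membership is a fixed per-token predicate (closer to a Unigram-style scheme than to context-dependent Kgw), and the detector is the common green-token z-test. All names and the toy tokens are hypothetical.

```python
import math

def filter_signatures(tokens, signatures, n=2):
    """Drop every token that is part of an n-gram listed in `signatures`."""
    drop = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in signatures:
            drop.update(range(i, i + n))
    return [tok for i, tok in enumerate(tokens) if i not in drop]

def green_z_score(tokens, is_green, gamma=0.5):
    """z-statistic for the count of green tokens, assuming each token is green
    with probability gamma when no watermark is present."""
    total = len(tokens)
    if total == 0:
        return 0.0
    green = sum(1 for tok in tokens if is_green(tok))
    return (green - gamma * total) / math.sqrt(total * gamma * (1.0 - gamma))

if __name__ == "__main__":
    green_list = {"a", "b", "c", "d"}            # hypothetical green tokens
    text = ["a", "b", "x", "y", "c", "d"]
    signatures = {("x", "y")}                    # hypothetical 2-gram signature
    kept = filter_signatures(text, signatures)
    print(green_z_score(text, green_list.__contains__))
    print(green_z_score(kept, green_list.__contains__))
```

In this toy case the filtered text yields a larger z-score because the removed 2-gram carried no watermark signal.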

19.
arXiv (quant-ph) 2026-06-25

Evolving Quantum Error-Correcting Encodings for Molecular Simulation

arXiv:2606.25870v1 Announce Type: new Abstract: Useful quantum algorithms require many coupled discrete design choices. We study LLM-driven evolutionary program synthesis – a language model edits a program, an external verifier scores the result, and high-scoring programs are retained and re-mutated – as a tool for quantum-computing research. As a case study, we apply this loop to the Generalized Superfast Encoding (GSE), a fermion-to-qubit encoding whose prior molecular constructions reach code distance $3$. The search discovered interpretable constructor programs whose codes have exact distance $5$ on the molecular instances tested, and distance $6$ on one $20$-mode instance, under strict stabilizer-coset semantics. To our knowledge these are the first GSE/superfast encodings beyond distance $3$ for dense molecular Hamiltonians. A second search, guided by verifier analysis of the first artifact, found a circulant constructor that reaches a five-qubits-per-mode floor on the tested $12$-, $14$-, $16$-, and $20$-mode instances, with certified dense-rule fallback at the failing $18$-mode case. As secondary resource descriptors, in a code-capacity memory comparison at $p=10^{-3}$ the resulting encodings use $4.2$–$5.0\times$ fewer data qubits than a scoped per-mode Jordan–Wigner $+$ $[[25,1,5]]$ surface route and have $3.4$–$8.2\times$ lower logical-failure rates under finite-weight decoding tables with explicit truncation brackets; we claim no circuit-level fault-tolerance or Trotter-cost advantage. The search trajectory illustrates a general operating lesson: rewarding distance alone selects trivial dense graphs, whereas holding verified distance fixed and rewarding compression selects structured rules.

20.
arXiv (CS.AI) 2026-06-19

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

arXiv:2606.20356v1 Announce Type: cross Abstract: In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

21.
arXiv (CS.CL) 2026-06-16

Emergent retokenization symmetry in large language models: phenomenology and applications

Tokenization introduces representational redundancy: under a fixed token vocabulary, every byte string admits many valid token encodings, or segmentations, that decode to the same surface string. However, given a prompt, most language model tokenizers break this representational symmetry by returning a canonical segmentation. Training only on canonical segmentations should influence inference behavior, and there is little reason to expect models to respect segmentation symmetry on downstream tasks. We find that this symmetry partially emerges during training. Here, we probe this emergent symmetry through experiments testing token compositional understanding, representation diversity, and task-focused benchmark performance. We primarily use retokenization – replacing a prompt's canonical tokenization with an alternative segmentation while preserving its bytes exactly. Relative to other prompt perturbations, retokenization is unusually clean because it isolates segmentation effects without changing syntax, semantics, or surface form. We use retokenization to study sensitivity and robustness to semantically identical input representations across pretraining and post-training. Moreover, this partial retokenization symmetry suggests a distinct inference-time sampling axis. While temperature sampling generates diverse outputs from the model using its next-token probability distribution, retokenization generates diversity from the model's internal computations through semantically equivalent input representations. We find that while this retokenization sampling strategy can hurt performance on easy problems, it can also recover solutions that conventional sampling does not find. Overall, our work presents retokenization as a simple yet powerful probe of large language models, shedding light on compositional understanding and prompt sensitivity, and offering a novel sampling strategy.
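
To make the redundancy concrete, the sketch below enumerates every segmentation of a short string under a toy vocabulary. It is an illustration only: the vocabulary is invented, it works on characters rather than bytes, and picking the shortest segmentation as "canonical" is an assumption (real tokenizers apply their own merge or scoring rules).

```python
def segmentations(text, vocab):
    """Return every way to split `text` into consecutive tokens drawn from `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no tokens
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results

# Hypothetical toy vocabulary, not taken from any real tokenizer.
vocab = {"u", "n", "d", "o", "un", "do", "undo"}
all_segs = segmentations("undo", vocab)

# Assumption: treat the fewest-token segmentation as the canonical one.
canonical = min(all_segs, key=len)
alternatives = [seg for seg in all_segs if seg != canonical]

for seg in all_segs:
    print(seg, "->", "".join(seg))
```

Every entry in `alternatives` decodes to the same string as `canonical`, which is the property retokenization relies on: the input the model sees changes while the surface text does not.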

23.
arXiv (CS.LG) 2026-06-16

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

arXiv:2606.16612v1 Announce Type: cross Abstract: The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions, struggling to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global structure features, and their combinations, we present their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest emerging generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.

24.
medRxiv (Medicine) 2026-06-24

Structural ethnic inequities in maternal mortality between Indigenous and non-Indigenous women in Paraguay, 2014-2023: a national analysis of territorial, institutional, and preventable factors.

Background: Indigenous women in Paraguay continue to experience disproportionately high maternal mortality despite national efforts to improve maternal health. Evidence on the structural factors underlying these disparities remains limited. Objectives: To analyze structural ethnic inequities in maternal mortality between Indigenous and non-Indigenous women in Paraguay, focusing on territorial patterns, institutional access, and potentially preventable causes of death. Design: National population-based study using maternal mortality records registered in Paraguay between 2014 and 2023. Maternal mortality ratios (MMRs), incidence rate ratios (IRRs), and absolute differences were estimated according to Indigenous status. Logistic regression models were used to assess associations with deaths occurring outside healthcare institutions and specific preventable causes of death. Results: A total of 907 maternal deaths were identified, including 112 among Indigenous women (12.3%). Indigenous women were overrepresented by a factor of 4.8 relative to their population share. Maternal mortality remained consistently higher among Indigenous women throughout the study period, with mortality ratios ranging from 317.7 to 773.6 per 100,000 live births, compared with 58.7 to 145.1 among non-Indigenous women. Absolute inequalities remained persistently high over time. Overall, 24.3% of maternal deaths occurred outside healthcare institutions, with a substantially higher proportion among Indigenous women (44.6% versus 21.5%). After adjustment for age and educational level, Indigenous women had more than three times greater odds of dying outside healthcare institutions (aOR = 3.41; 95% CI: 2.20-5.29). Potentially preventable causes accounted for 42.4% of maternal deaths. Obstetric hemorrhage was strongly associated with Indigenous status (aOR = 3.83; 95% CI: 2.31-6.37). 
Conclusion: Indigenous women in Paraguay experience a disproportionate burden of maternal mortality characterized by persistent ethnic disparities, higher occurrence of deaths outside healthcare institutions, and a substantial burden of preventable causes of death. These findings suggest the presence of enduring territorial, institutional, and healthcare access barriers that contribute to structural ethnic inequities in maternal health.
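
The headline figures in the Results can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative, the population share is merely implied by the reported overrepresentation factor, and the crude odds ratio is computed from rounded percentages, so it is not expected to match the adjusted estimate (aOR = 3.41), which controls for age and educational level.

```python
total_deaths = 907
indigenous_deaths = 112

# Share of maternal deaths among Indigenous women (reported as 12.3%).
share_of_deaths = indigenous_deaths / total_deaths

# Population share implied by the reported overrepresentation factor of 4.8.
implied_population_share = share_of_deaths / 4.8

# Proportion of deaths occurring outside healthcare institutions, by group.
p_indigenous = 0.446
p_non_indigenous = 0.215

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

# Crude (unadjusted) odds ratio from the two rounded proportions.
crude_odds_ratio = odds(p_indigenous) / odds(p_non_indigenous)

# Weighted average of the two groups, to compare with the reported overall 24.3%.
overall_outside = (share_of_deaths * p_indigenous
                   + (1 - share_of_deaths) * p_non_indigenous)

print(f"share of deaths:          {share_of_deaths:.1%}")
print(f"implied population share: {implied_population_share:.1%}")
print(f"crude odds ratio:         {crude_odds_ratio:.2f}")
print(f"overall outside share:    {overall_outside:.1%}")
```

The weighted average lands close to the reported 24.3%, so the group-level and overall proportions are mutually consistent up to rounding.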

25.
arXiv (CS.CV) 2026-06-17

MuseVLA: An Adaptive Multimodal Sensing Vision-Language-Action Model for Robotic Manipulation

Humans naturally leverage diverse sensing modalities to interact with the physical world, while most Vision-Language-Action (VLA) models for robotics rely solely on RGB observations. This limits their ability to perceive physical properties that are difficult or impossible to infer from RGB cameras, such as temperature, sound, or radar response. We present MuseVLA, an adaptive multimodal sensing VLA model that integrates novel sensors as on-demand tools for robotic manipulation. Given a task instruction and visual context, MuseVLA first generates a sensor token and target description that select the sensing modality to invoke and what to attend to, analogous to a tool call with arguments. It then converts the selected sensor measurement into a grounded sensor image, a unified intermediate representation that encodes heterogeneous readings for multimodal fusion and action generation. This design decouples sensor-specific processing from the VLA backbone, enabling efficient integration of diverse modalities. To reduce the need for expensive multisensory robot datasets, we further introduce a data synthesis pipeline that augments existing RGB video datasets with grounded sensor images, enabling generalization to unseen sensor-guided tasks. We evaluate MuseVLA on a real-world robot across challenging dexterous hand manipulation tasks that require multimodal sensing inputs, including temperature-guided pick-and-place, audio-driven object search, and radar-assisted hidden object retrieval. MuseVLA achieves an average success rate of 80.6%, significantly outperforming RGB-only and multisensory VLA baselines, and exhibits strong zero-shot capabilities on unseen tasks.