Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-24

Tri-Efficient Transfer Learning for Point Cloud Videos

While point cloud foundation models have significantly advanced point cloud video understanding, existing parameter-efficient fine-tuning (PEFT) methods still suffer from two critical limitations: prohibitive annotation costs for large-scale point cloud datasets and severe memory bottlenecks. In this paper, we aim to mine richer supervision signals from existing data rather than blindly scaling datasets. A further key principle is that the memory footprint of fine-tuning must be drastically reduced compared to full fine-tuning, which remains elusive for current PEFT techniques. Driven by these challenges, we identify three core desiderata: data-, parameter-, and memory efficiency, and present PoinTriE, a unified framework that excels along all three dimensions. For pre-training, pseudo-motion trajectories are synthesized via rigid transformations, paired with text corpora and 2D projections derived from raw point clouds. We then propose a Geometric-Motion Duality Network optimized via multimodal contrastive learning, rigid rotation prediction, and motion distribution divergence to produce dense self-supervision. During fine-tuning, we freeze the pretrained backbone and only update a lightweight Spatio-temporal Side Network built with LoRA units. Equipped with a gradient flow masking strategy, PoinTriE simultaneously reduces memory consumption and parameter overhead. Extensive experiments confirm that PoinTriE establishes new state-of-the-art results on action recognition and semantic segmentation tasks.

02.
arXiv (CS.AI) 2026-06-11

Automated Mediator for Human Negotiation: Pre-Mediation via a Structured LLM Pipeline

arXiv:2606.11379v1 Announce Type: new Abstract: Pre-mediation, the preparatory phase preceding direct human negotiation, plays a critical role in achieving mutually beneficial agreements, yet is often omitted due to cost, time, and limited access to trained mediators. We introduce an automated mediator for human negotiation, implemented as a structured pipeline of LLM modules, that supports pre-mediation in integrative negotiation settings. The pipeline decomposes preparation into specialized modules for dialogue, preference prediction, response-level critique, and structured summarization, separating inference, generation, and evaluation to address limitations of monolithic single-prompt approaches. We use the term "agent" for each module following common LLM-systems terminology, but the components are not autonomous and do not interact peer-to-peer; outputs are passed forward in a fixed sequence. We evaluate the system in two controlled human-subject experiments comparing AI-based pre-mediation with professional human mediators in a multi-issue negotiation scenario. On short-term self-reported measures, the automated mediator achieves preparation outcomes broadly comparable to human mediators, including trust in the mediator and confidence in reaching mutually beneficial agreements, while achieving substantially lower error on the preference-inference task under our scenario and prompts (36% lower RMSE). A second study shows that targeted prompt refinements reduce excessive affirmation patterns from 36.6% to 16.8%, matching human mediator baselines. Our findings suggest that structured LLM pipelines can provide scalable, low-effort pre-mediation support broadly comparable to human mediators on short-term self-reported preparation outcomes. The pipeline's single-party design mirrors how human mediators run pre-mediation today and enables parallel deployment across all parties to a dispute, supporting scalability.

03.
arXiv (CS.LG) 2026-06-16

Benchmarking Instance-Dependent Label Noise with Controlled Corruptions

arXiv:2606.14965v1 Announce Type: new Abstract: Synthetic instance-dependent label noise (IDN) benchmarks are widely used to evaluate noisy-label learning methods, yet existing approaches typically generate noise through imperfect annotators or classifier raters, leaving the source of ambiguity implicit. We introduce CILN, a benchmark generation framework that creates IDN through controlled input corruptions. A diverse voter pool labels corrupted instances, producing benchmark datasets in which both the source and severity of ambiguity are explicit and controllable. Using CIFAR10, MNIST, and Adult, we construct 90 benchmark settings spanning multiple corruption families and severity levels. Our experiments show that the resulting benchmarks exhibit genuine instance-dependent noise, provide diverse confusion structures, and, on CIFAR-10, can produce label distributions that are closer to human uncertainty than an existing synthetic IDN benchmark. We further demonstrate that corruption-mediated IDN can expose failure modes of popular noisy-label learning methods, including Co-Teaching and DivideMix, that are not observed under comparable levels of rater-fallibility noise. These findings suggest that noise structure, not only noise rate, plays an important role in benchmark difficulty and algorithm behavior. By making ambiguity generation explicit and controllable, CILN provides a complementary benchmarking framework for studying noisy-label learning under diverse sources of instance difficulty.

04.
arXiv (CS.AI) 2026-06-12

MP3: Multi-Period Pattern Pre-training forSpatio-Temporal Forecasting

arXiv:2606.13119v1 Announce Type: cross Abstract: Spatio-Temporal forecasting is crucial in diverse fields, such as transportation, climate, and energy. Urban spatio-temporal data exhibits temporal mirage: similar short-window inputs have divergent future trends, and vice versa. Existing spatio-temporal graph neural networks (STGNNs) cannot effectively identify such mirages. We argue that the core reason lies in the short-window inputs that have incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality. To bridge this gap, we develop a novel Multi- Period Pattern Pre-training (MP3), a plug-and-play pre-training plugin for distinguishing temporal mirages. MP3 presents two core innovations: (1) The multi-period pattern learning is designed to learn multi-period patterns from long time series. Specifically, multi-period temporal modeling leverages edge convolution to identify different multi-period patterns. Multi-period spatial modeling uses a bottleneck project and a global memory bank to capture heterogeneous global spatial relations efficiently. Cross-period pattern interaction employs a causality-enhanced Transformer to capture dependencies across different period patterns. (2) This plugin can seamlessly integrate into existing STGNN backbones to strengthen their forecasting performance. The experiment on five STGNN baselines across five real-world datasets (including a large-scale dataset CA) verify the effectiveness, superior scalability and strong adaptability of MP3, which brings consistent and robust performance improvements across all evaluated baselines. On average, MP3 reduces the MAE 4.7% and the RMSE 5.0%. The code can be available at https://github.com/YAN-outlook/MP3.

05.
arXiv (CS.AI) 2026-06-25

Tinker Tales: A Tangible Dialogue System for Child-AI Co-Creative Storytelling

arXiv:2602.04109v2 Announce Type: replace-cross Abstract: Conversational AI agents are increasingly explored as creative partners, yet how conversation design shapes child-AI dialogue in co-creative settings remains underexplored. We present Tinker Tales, a tangible dialogue system for child-AI collaborative storytelling, in which educational frameworks (narrative development and social-emotional learning) are instantiated as conversation design, shaping how the agent engages children across four narrative stages. The system combines a physical storytelling board, NFC-embedded toys, and a mobile app mediating multimodal interaction through tangible manipulation and voice-based dialogue. We conducted a home-based user study with 10 children (ages 6-8) across two conversation design conditions varying in how the agent structured elaboration, with and without educational scaffolding. Our findings show that prompt framing shapes the form and consistency of children's narrative contributions, structuring how they participate in co-creative dialogue with AI.

06.
arXiv (quant-ph) 2026-06-16

Phase controlled spectral topology, dynamic stability and sensitivity in Non-Hermitian Cavity Magnonics

arXiv:2606.16522v1 Announce Type: new Abstract: We theoretically investigate a non-Hermitian cavity-magnon platform in which coherent photonmagnon interactions and reservoir-mediated dissipative coupling interfere through a single externally tunable phase. We show that this interference phase provides a universal control parameter that continuously rotates the effective coupling between Hermitian and anti-Hermitian regimes, enabling dynamic transitions between level repulsion and level attraction without modifying intrinsic system parameters. The resulting phase-controlled non-Hermitian topology gives rise to exceptional points, linewidth engineering, and zero-damping conditions. Owing to the propagation-direction dependence of the dissipative interaction, the system further exhibits strong nonreciprocal transport and phase-tunable isolation arising from asymmetric hybridization of the cavity and magnon modes. Beyond its spectral and transport properties, we establish a direct connection between nonHermitian spectral topology and nonequilibrium population dynamics. The interference phase governs the stability of the hybrid modes, driving transitions between stable relaxation, critical slowing down near exceptional points, oscillatory energy exchange, and exponentially amplified dynamics. We further demonstrate that the same phase-controlled exceptional topology can be exploited for enhanced sensing, where the eigenvalue response exhibits the characteristic square-root scaling associated with exceptional-point physics. Our results provide a unified framework linking spectral topology, directional transport, dynamical stability, and sensing functionality through reservoirengineered interference in cavity magnonic systems.

07.
arXiv (quant-ph) 2026-06-25

From spectral structure to sensing limits in quantum thermometry

arXiv:2606.25933v1 Announce Type: new Abstract: The precision of a quantum thermometer is fundamentally constrained by the spectral structure of the probe itself, and a systematic mapping between the configurations of energy levels and thermometric performance provides relevant information to design optimized devices. In this work, we establish such a mapping by analyzing a broad class of quantum systems, ranging from finite spin ensembles and degenerate atoms to confining potentials, quantum walks, and continuous-spectrum models. We derive exact scaling laws for the quantum Fisher information, revealing two distinct high-temperature universality classes: finite-spectrum probes exhibit a $T^{-4}$ decay, while unbounded or continuous spectra yield a slower $T^{-2}$ decay. At low temperatures, we show that sensitivity, though universally exponentially suppressed, can be enhanced arbitrarily by engineering degenerate excited states or a quantum walk on a fully connected topology. By contrast, specific quantum walk topologies provide a distinct enhancement mechanism based on gap engineering, whereby an optimal network size yields an optimized $T^{-2}$ low-temperature scaling. Furthermore, power-law spectra enable tunable scaling of thermometric performance with system size, offering a design principle for optimal probes in specific temperature windows. Our results contribute to transform spectral information into a resource for quantum thermometry, providing both fundamental bounds and practical guidelines to tailored temperature sensing.

08.
arXiv (quant-ph) 2026-06-17

Optimality Condition for the Petz Map

arXiv:2410.23622v5 Announce Type: replace Abstract: In quantum error correction, the Petz map serves as a perfect recovery map when the Knill-Laflamme conditions are satisfied. Notably, while perfect recovery is generally infeasible for most quantum channels of finite dimension, the Petz map remains a versatile tool with near-optimal performance in recovering quantum states. This work introduces and proves, for the first time, the necessary and sufficient conditions for the optimality of the Petz map in terms of entanglement fidelity. In some special cases, the violation of this condition can be easily characterized by a simple commutator that can be efficiently computed. We provide multiple examples that substantiate our new findings.

09.
arXiv (CS.AI) 2026-06-25

Agentic Knowledge Tracing: A Multi-Agent LLM Architecture for Stealth Assessment of Financial Literacy in Serious Games

arXiv:2606.25358v1 Announce Type: new Abstract: Assessing financial literacy during gameplay without disrupting the learning experience remains a key challenge in serious games for education. We present the Agentic BKT pipeline, a multi-agent large language model architecture for stealth assessment of financial competencies from open-ended gameplay events. The pipeline processes events from a 2D platformer serious game aligned with the OECD/INFE financial literacy framework through four phases: (1) the game captures every player decision as a structured event log; (2) an LLM event classifier labels each action on a four-point rubric validated against three domain experts (Fleiss kappa = 0.624, substantial agreement); (3) four domain-specific agents specializing in risk mitigation, investing, spending, and credit management perform session-level reasoning over behavioral trajectories, feeding per-competency Bayesian Knowledge Tracing that estimates mastery within each domain; and (4) an expert judge agent synthesizes the domain-level estimates into an overall mastery score. Evaluated with 193 K-12 participants across 264 game sessions, the Agentic BKT pipeline yields mastery estimates significantly correlated with learning gain (r = 0.276, p = 0.0001) and post-test scores (r = 0.333, p < 0.0001) while showing no correlation with pre-test scores, providing both convergent and discriminant validity. The multi-agent approach approximately triples the predictive validity of a single-LLM baseline (r = 0.095, not significant) in this study, demonstrating that domain decomposition and session-level reasoning play a central role in capturing the multidimensional nature of financial literacy from gameplay

10.
medRxiv (Medicine) 2026-06-15

ECHOCARDIOGRAPHY ABNORMALITIES IN PREECLAMPSIA WITH SEVERE FEATURES.

Purpose To determine the frequency of echocardiographic abnormalities in women with preeclampsia with severe features. To describe the spectrum and types of echocardiographic abnormalities associated with preeclampsia with severe features. Method This is a Prospective observational study conducted in Vani Vilas hospital attached to Bangalore Medical College and Research Institute, Bangalore from January 2023 to December 2025. 560 pregnant women diagnosed with severe preeclampsia(SPE) were included in the study. Chronic hypertension without superimposed preeclampsia, underlying cardiac diseases and previous history of peripartum cardiomyopathy were excluded from the study. Transthoracic echocardiography-TTE (2D ECHO) was done to evaluate cardiac structure and function. Echocardiographic abnormalities identified during the study were documented and analysed using descriptive statistical methods. Results Abnormalities in ECHO was noted in 23.03%. A unique finding was the documentation of elevated pulmonary artery systolic pressures (PASP) suggestive of Pulmonary Hypertension (PH) (PASP >35 mm HG) among 20.25% of the participants. It was also the commonest abnormality on ECHO. Mild PH was the commonest (15.71%), moderate PH was seen in 3.92% and severe PH in 0.71% of cases. Next most frequent abnormality was moderate to severe valvular regurgitation (10%), followed by left ventricular hypertrophy (5.53%). Diastolic dysfunction (DD) was seen in 3.92%, systolic dysfunction(SD) in 3.57%, chamber dilatation in 3.57% and LV global hypokinesia in 3.03% cases of SPE Conclusion Preeclampsia with severe features (SPE) is associated with 23.03% abnormalities on echocardiography. SPE is associated with systolic dysfunction, diastolic dysfunction, chamber dilatation, valvular regurgitation, left ventricular hypertrophy and pulmonary hypertension.

11.
medRxiv (Medicine) 2026-06-17

Diagnostic Concordance of Immediate Versus 1-Hour Technetium-99m Hydroxydiphosphonate Scintigraphy in Suspected Transthyretin Amyloid Cardiomyopathy

Background Bone-avid tracer myocardial scintigraphy for the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) has traditionally employed imaging at one or 3-hour intervals. Technetium-99m hydroxydiphosphonate (99mTc-HDP) has unique characteristics that may enable earlier imaging. We investigated the diagnostic concordance of immediate versus 1-hour acquisitions. Methods Consecutive patients with suspected ATTR-CM underwent planar imaging and SPECT/CT immediately and at 1-hour following the administration of 99mTc-HDP. Perugini grades and heart to contralateral lung (H/CL) ratios were assessed. Target-to-background ratios (TBRs) were calculated on the SPECT/CT acquisitions using the left ventricular (LV) septum and three background regions: aorta, LV blood-pool, and vertebrae. We assessed diagnostic concordance using Cohen's Kappa ({kappa}), temporal stability using paired t-tests, and correlation between timepoints using Pearson's coefficient (r). The 1-hour SPECT/CT interpretation served as the protocol reference standard. Results Forty-eight patients (83% male; median age, 80 [73-85] years) were evaluated. One-hour SPECT/CT identified 19 positive and 29 negative cases. Immediate SPECT/CT demonstrated 100% diagnostic concordance with the 1-hour reference standard ({kappa} = 1.000; 95% CI: 1.00 to 1.00; p < 0.001). The LV septum/LV Blood-Pool TBR showed the highest correlation (r = 0.956; 95% CI: 0.922 to 0.975; p < 0.001). The LV Septum/Aorta TBR demonstrated high correlation (r = 0.918; 95% CI: 0.857 to 0.953; p < 0.001) and remained stable in the ATTR-negative cohort (-0.02; 95% CI: -0.08 to 0.04; p = 0.54). Significant decrease in the LV Septum/Vertebrae TBR in the ATTR-negative (-0.55; 95% CI: -0.64 to -0.47; p < 0.001) and ATTR-positive cohorts (-1.14; 95% CI: -1.39 to -0.89; p < 0.001) was observed. Conclusions Immediate 99mTc-HDP SPECT/CT is diagnostically concordant with standard 1-hour protocols. By leveraging SPECT/CT and the favorable kinetics of 99mTc-HDP, immediate-phase imaging can accurately reproduce 1-hour acquisitions in cases of suspected ATTR-CM. This expedited approach may improve nuclear laboratory throughput and patient satisfaction.

12.
arXiv (quant-ph) 2026-06-16

Degeneracy Cannot Violate the Quantum Hamming Bound

arXiv:2606.15558v1 Announce Type: new Abstract: The quantum Hamming bound is the standard finite-length sphere-packing bound for exact correction of arbitrary qubit errors. Whether degeneracy can evade this bound has remained unresolved in full generality for nearly three decades: distinct correctable errors may act identically on the code space, so the usual disjoint-sphere argument breaks down. We prove that every exact binary quantum subspace code with $K>1$ obeys the bound, without assuming either nondegeneracy or additivity. Our proof turns the Li–Xing linear-programming polynomial into an exact intersection count for quaternary Hamming balls. Monotonicity in block length and in ball-center separation then reduces the problem to a local node–edge charging inequality at the shortest admissible length. Thus degeneracy can merge correctable error sectors, but cannot enlarge the finite-length binary Hamming bound.

13.
arXiv (quant-ph) 2026-06-19

Many-Body Protection of Topological Edge Memory in Strong Interacting Quenches

arXiv:2606.19437v1 Announce Type: cross Abstract: Quantum quenches drive edge states far from equilibrium, yet whether the memory of a topological initial state survives in a non-integrable, interacting system has remained largely unexplored. We study this question in the bond-alternating XXZ chain – an interacting Su–Schrieffer–Heeger model hosting symmetry-protected topological edge modes with markedly enhanced boundary magnetization – and analyze quenches across all combinations of single-particle and many-body initial and final Hamiltonians. The results organize by a single distinction as we rigorously establish in this work: whether the post-quench Hamiltonian is free or genuinely interacting. For a free post-quench Hamiltonian, the dynamics is solved exactly by a correlation-matrix approach; the boundary-mode return amplitude decays as $t^{-3/2}$, and initial interactions enter only through a dressed one-body density matrix. For a genuinely interacting post-quench Hamiltonian, finite-time stability bounds prove that away from local resonances the first-dimer magnetization remains stable on time windows growing as arbitrarily large powers of the inverse inter-dimer coupling. Matrix product state simulations across all four protocols show that interactions in the final Hamiltonian markedly extend finite-time boundary memory – with local suppression near the isotropic $SU(2)$ point – revealing a many-body protection mechanism in a non-integrable system where scrambling would otherwise wash out initial-state memory fast.

14.
arXiv (CS.LG) 2026-06-19

Online Dynamic Batching with Formal Guarantees for LLM Training

arXiv:2606.19989v1 Announce Type: cross Abstract: Modern LLM training breaks a core assumption behind offline batch samplers: the true training cost of a sample is only observable after preprocessing, augmentation, templating, tokenization, and multimodal visual-token expansion. Unless one pays for a preprocessing- and augmentation-dependent length cache, batch construction is therefore blind to the quantity that determines padding, memory use, and GPU saturation. We introduce Online Dynamic Batching (ODB), a DataLoader-side drop-in system that moves batch formation to this point of accurate observability while preserving DDP step alignment. We formalize this synchronization requirement as the Distributed Group Alignment Problem and prove deadlock-free bounded termination with default join-mode identity coverage and opt-in non-join sample-quota closure. ODB requires no model, optimizer, or attention-kernel changes and is released as online-dynamic-batching with lightweight trainer adapters. Across public 2B/8B Qwen3-VL runs on UltraChat/LLaVA/ShareGPT4o, ODB improves literal emitted-sample throughput vs. fixed-batch Standard by 1.58-2.51x on single-node Full FT/LoRA and 1.71-3.78x on two-node Full FT, with Standard-comparable quality; production MM-Mix reaches 4.43x. Against GMT/BMT offline token-budget oracles, ODB is within 15% on UltraChat/LLaVA and faster on high-CV ShareGPT4o: 2.24-2.39x single-node Full FT/LoRA and 3.06-3.69x two-node Full FT. Together, ODB occupies the online/drop-in regime for high-heterogeneity LLM fine-tuning: large throughput gains at Standard-comparable quality, formal DGAP guarantees, and no length-cache precompute or kernel rewrites.

15.
arXiv (CS.CL) 2026-06-11

"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

LLM-based coding agents increasingly rely on third-party extensions called skills, which bundle natural language instructions and helper scripts that execute with full user privileges. Community registries have emerged to distribute these skills, but the security implications remain unstudied due to the absence of labeled threat data. This paper presents a systematic security analysis of 98,380 skills collected from two major registries. Through a combination of static pattern matching and dynamic behavioral verification, we identify 157 skills exhibiting confirmed malicious behavior, encompassing 632 distinct vulnerabilities across 13 attack techniques. Our analysis reveals that these threats are deliberate rather than accidental: each malicious skill contains an average of 4.03 vulnerabilities spanning multiple attack phases. We identify two dominant attack strategies with statistically significant negative correlation – credential theft via remote code execution, and agent manipulation through adversarial instructions embedded in documentation. Over half of all confirmed cases originate from a single threat actor employing templated brand impersonation at scale. We further observe that attack sophistication correlates with concealment investment, with advanced skills universally employing undocumented capabilities while also exploiting platform-native trust mechanisms. Following responsible disclosure, registry maintainers removed all 157 (100%) of the reported skills. Our dataset and detection pipeline are publicly available to facilitate future research on securing LLM agent ecosystems.

16.
arXiv (CS.AI) 2026-06-17

Combating Data Laundering in LLM Training

arXiv:2604.01904v3 Announce Type: replace-cross Abstract: Post-hoc unauthorized-training data detection for large language models (LLMs) typically assumes a query-with-originals regime: rights holders query a target LLM with raw proprietary data and assess whether the model assigns them stronger memorization-based detection signals, e.g., higher confidence or lower loss, than held-out non-training reference texts. We show that this regime becomes brittle under data laundering, where the target LLM is trained on semantics-preserving but stylistically or structurally transformed surrogates of proprietary data to obfuscate provenance. Since training-time exposure occurs in the laundered form, memorization signals may no longer appear on the originals, collapsing the candidate-reference signal separation that standard detectors rely on. We counter this threat by studying laundering-aware detection with raw proprietary data, a held-out reference corpus, and query access to the target LLM, while the laundering transformation is undisclosed. Since exact recovery of the laundered corpus is infeasible, we infer a detection-useful synthesis process via an auxiliary LLM that maps originals into training-like queries. To make this search tractable, we introduce Synthesis Data Reversion (SDR), which constrains the unbounded space of natural-language transformations through a goal-details abstraction: a high-level transformation goal, e.g., "lyrical rewriting", and fine-grained details, e.g., "with vivid imagery". SDR identifies the most likely goal and iteratively refines details so synthesized queries elicit stronger target-model detection signals. Evaluated on the MIMIR benchmark against diverse laundering practices and target LLM families (Pythia, Llama2, and Falcon), SDR consistently restores detection signals, offering a practical auditing layer against data laundering.

17.
arXiv (CS.LG) 2026-06-16

Ricci-Filtration: Boosting Retrieval-Augmented Generation Reranker to Query-Answer Tasks by Discrete Ricci Flow

arXiv:2606.15482v1 Announce Type: cross Abstract: Ricci flow is a curvature-guided diffusion process that deforms space by shrinking regions of high positive curvature and expanding those with negative curvature. Similarly, discrete Ricci flow on weighted graphs modifies edge weights by shrinking edges with positive Ricci curvature and stretching those with negative Ricci curvature, effectively increasing the separation between clusters. Inspired by these two cornerstone works, we propose a geometry-based RAG reranker enhancement procedure called Ricci-Filtration. By modeling the input query and initial retrieved chunks as a network, where the input query and chunks serve as nodes and embedding-based pairwise relations define an initial graph, Ricci-Filtration leverages discrete curvature and Ricci flow to evaluate the structural importance of each chunk with respect to the user query. The system first filters the initial chunks based on their geometric curvature relative to the query; then, a reranker processes the remaining chunks to enhance generative performance. We theoretically prove that normalized discrete Ricci flow can detect community structures by identifying distinct asymptotic behaviors in edge weights. This supports the removal of ``noisy'' document chunks characterized by large weights and negative Ricci curvature relative to the query node. Extensive experiments confirm that Ricci-Filtration outperforms several baseline reranking methods in accuracy, precision, recall, and F1 scores. Furthermore, ablation studies demonstrate that the Ricci-Filtration generally outperforms the baseline under various settings, highlighting the framework's robustness across different architectures.

18.
arXiv (CS.CL) 2026-06-12

Agent-based models for the evolution of morphological alternation patterns

Why is the past of English "go" the apparently unrelated "went"? Such alternations are frequent in languages. They neither aid communication nor learnability, yet they can be persistent, surviving over centuries or millennia. We present a multi-agent simulation of the emergence of morphological stem and inflection alternations. Alternate forms arise by phonological changes or, as with "go/went", from lexical alternatives associated with a subset of the population. When an agent 'hears' another agent use a novel form for a slot in the paradigm of a word (say, the past tense of go), they will with some probability adopt that form, possibly spreading its use to other slots in the paradigm that shared the same original form. Thus alternative forms can spread through the population and become entrenched as stem or inflectional marker alternants. Unlike many previous computational studies, our system allows for naturalistic lexical forms, realistic phonological rules, lexicons with hundreds or thousands of entries, and agent populations in the tens or hundreds. It supports several network topologies, diffusion patterns and agent adoption policies. One issue with such simulations is evaluation: how realistic is the resulting morphology compared to those of real languages? We introduce the AI Historical Linguist, a novel Large Language Model-driven system that models a debate between two historical linguists. We use this to compare a set of real language morphologies, disguised morphologies, and experimentally evolved morphologies. The results suggest that among the factors that favor more plausible morphologies are scale-free social networks and random Bernoulli adoption of forms. We also present three case studies modeling attested historical changes, allowing us to test what might have happened if history had been different. All code and data are released.

19.
arXiv (CS.CL) 2026-06-18

PatchWorld: Gradient-Free Optimization of Executable World Models

Text-agent environments are typically modeled as partially observable Markov decision processes (POMDPs), assuming that the simulator's latent state and transition dynamics are hidden from the agent. Yet little work has examined whether executable code can be induced to serve as a world model for prediction and planning under partial observability. We introduce PatchWorld, a gradient-free framework that turns offline trajectories into executable Python world models through counterexample-guided code repair. Instead of predicting the next observation with a black-box model, PatchWorld induces symbolic belief-state programs whose action updates can be inspected, replayed, and locally patched. Across seven AgentGym environments, PatchWorld-Simple achieves the highest code-based planning score among evaluated methods, reaching 76.4\% macro success in live one-step lookahead while invoking no LLM calls inside the world-model prediction module itself. We further find that a human-specified residual-memory bias improves surface observation fidelity but weakens decision utility. This exposes a tradeoff in executable world models, since improving observation fidelity can come at the expense of action-discriminative dynamics, and vice versa. Code is available at https://github.com/HKBU-KnowComp/PatchWorld.

20.
arXiv (CS.CL) 2026-06-16

CentroidKV: Efficient Long-Context LLM Inference via KV Cache Clustering

Large language models (LLMs) with extended context windows have become increasingly prevalent for tackling complex tasks. However, the substantial Key-Value (KV) cache required for long-context LLMs poses significant deployment challenges. Existing approaches either discard potentially critical information needed for future generations or offer limited efficiency gains due to high computational overhead. In this paper, we introduce CentroidKV, a simple yet effective framework for online KV cache clustering. Our approach is based on the observation that key states exhibit high similarity along the sequence dimension. To enable efficient clustering, we divide the sequence into chunks and propose Chunked Soft Matching, which employs an alternating partition strategy within each chunk and identifies clusters based on similarity. CentroidKV then merges the KV cache within each cluster into a single centroid. Additionally, we provide a theoretical analysis of the computational complexity and the optimality of the intra-chunk partitioning strategy. Extensive experiments across various models and long-context benchmarks demonstrate that CentroidKV achieves up to 75% reduction in KV cache memory usage while maintaining comparable model performance. Moreover, with minimal computational overhead, CentroidKV accelerates the decoding stage of inference by up to $1.92\times$ and increases the serving throughput by up to $4\times$.

21.
arXiv (CS.CV) 2026-06-25

Minimalist Preprocessing Approach for Image Synthesis Detection

Generative models have significantly advanced image generation, resulting in synthesized images that are increasingly indistinguishable from authentic ones. However, the creation of fake images with malicious intent is a growing concern. Low-configured smart devices have become highly popular, making it easier for deceptive images to reach users. Consequently, the demand for effective detection methods is increasingly urgent. In this paper, we introduce a simple yet efficient method that captures pixel fluctuations between neighboring pixels by calculating the gradient, which highlights variations in grayscale intensity. This approach functions as a high-pass filter, emphasizing key features for accurate image distinction while minimizing color influence. Our experiments on multiple datasets demonstrate that our method achieves accuracy levels comparable to state-of-the-art techniques while requiring minimal computational resources. Therefore, it is suitable for deployment on low-end devices such as smartphones. The code is available at https://github.com/vohoaidanh/adof.

22.
arXiv (quant-ph) 2026-06-19

Ultrafast nonadiabatic dynamics of tetraphenylsubstituted nitrogen-based heterocycles

arXiv:2604.16897v2 Announce Type: replace-cross Abstract: Tetraphenylpyrazine (TPP) and 2,3,4,5-tetraphenyl-1H-pyrrole (TePP) are closely related heterocycles bearing four phenyl substituents, whose structural similarity makes them a useful pair for comparing how intramolecular flexibility influences excited-state relaxation and emission in the gas phase and in the solid state. TPP is a prototypical solid-state luminescence enhancement (SLE) emitter, exhibiting a markedly increased quantum yield upon molecular aggregation. In contrast, TePP displays similar quantum yields in solution and solid state, characteristic of dual-state emission (DSE). This behaviour indicates that intramolecular rotations are already significantly hindered in the isolated-molecule regime, consistent with our previous observations for TPP and other solid-state emitters (Hernández-Rodríguez et al., ChemPhysChem, 2024, 25, e202400563). To unravel the excited-state dynamics underlying this contrasting behaviour, we performed mixed quantum-classical trajectory simulations on a single molecule of TPP and TePP employing the surface-hopping method. Twelve singlet states were included at the TD-B3LYP-D3/def2-SVP level, which were previously benchmarked against coupled cluster methods. Simulated observables such as gas phase ultrafast electron diffraction (GUED) and time-resolved fluorescence (TR-FL) signals allow us to dissect the distinct deactivation pathways operating in both systems in the gas phase, while also providing mechanistic insight into how these pathways are expected to evolve in solution and solid-state environments.

23.
arXiv (CS.CV) 2026-06-19

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

Multimodal models for text-to-image generation have achieved strong visual fidelity, yet they remain brittle under compositional structural constraints, notably generative numeracy, attribute binding, and part-level relations. To address these challenges, we propose Shape-of-Thought (SoT), a visual CoT framework for process-supervised progressive shape assembly in the rendered 2D domain, without external engines at inference time. SoT trains a unified multimodal autoregressive model to generate interleaved textual plans and rendered intermediate states, helping the model capture shape-assembly logic without producing explicit geometric representations. Unlike text-only CoT, each decision is grounded in a rendered state, making counts, attachments, topology, and intermediate part-addition errors inspectable across the trajectory. To support this paradigm, we introduce SoT-26K, a large-scale dataset of grounded assembly traces derived from part-based CAD hierarchies, and T2S-CompBench, a benchmark for evaluating structural integrity and trace faithfulness. Fine-tuning on SoT-26K achieves 88.4% on component numeracy and 84.8% on structural topology, outperforming direct generation by +24.2 points on component numeracy and +19.3 points on structural topology. SoT establishes a transparent testbed for rendered-domain structure-aware generation. The code is available at https://github.com/yuhuo03/Shape-of-Thought.

24.
arXiv (CS.AI) 2026-06-19

FlowMaps: Modeling Long-Term Multimodal Object Dynamics with Flow Matching

arXiv:2606.20209v1 Announce Type: cross Abstract: Joint spatial and temporal understanding of 3D scenes is a crucial requirement for robots deployed in everyday household environments. Such agents must not only comprehend and navigate spatial layouts, but also reason about how these spaces evolve over time. In particular, humans interact with objects daily, causing them to change position throughout the environment and making it difficult for robots to reliably associate current observations with previously seen objects. However, these interactions are not random: human habits and routines induce spatio-temporally consistent patterns in object locations, which robotic agents can potentially learn and then exploit for downstream tasks such as navigation. To this end, we introduce FlowMaps, a latent flow matching model for estimating multimodal distributions over the future locations of dynamic objects in a continuous 3D space. By learning the implicit dependencies among objects and their temporal evolution, FlowMaps predicts likely changes in object locations conditioned on past human interactions, while supporting generalization across previously unseen environments that share similar object routines. To demonstrate the utility of this method, we deploy FlowMaps in a downstream dynamic Object Navigation task in both simulated and real-world environments. Across more than 600 episodes, FlowMaps outperforms state-of-the-art approaches, showing that modeling object dynamics through continuous, multimodal spatio-temporal distributions improves robotic search and navigation in changing household environments. Code and additional material is available at https://fra-tsuna.github.io/flowmaps/.

25.
arXiv (CS.AI) 2026-06-18

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

arXiv:2606.19004v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training of Diffusion Transformers (DiTs) is prohibitively expensive, requiring thousands of high-end GPUs. Existing works explore two directions to reduce cost: seed exploration improves training convergence by selecting high-contrast samples, yet adds compute to the critical path; spot GPUs offer 69–77\% lower cost, yet sit idle during training because DiT rollouts finish nearly simultaneously, which prevents LLM-style pipelining of rollout with training. Spot preemptions further break Sequence Parallelism (SP) groups, fragmenting GPU topology. We present Spotlight, the first system that harvests spot GPUs for DiT RL post-training. Spotlight rests on two key insights we devise: (1)~we show that exploration can tolerate stale model weights because exploration that uses the model weights from the previous iteration preserves the relative ranking of random seeds, allowing exploration to run on idle spot GPUs during training. (2)~SP reconfiguration can reuse on-node state, reducing group recovery from minutes to sub-second launches. Built on these insights, Spotlight introduces three techniques: a bandit-based exploration planner that maximizes reward variance within the training time budget, elastic sequence parallelism that reconfigures SP groups on the fly via persistent schedulers and intra-node weight copying, and a preemption-aware pull-based request scheduler that balances load and commits in-flight state upon preemption. We implement Spotlight on the open-source RL platform ROLL and evaluate it on Qwen-Image post-training. Spotlight reaches the same target validation score $4\times$ faster than baselines, reducing total cost by $1.4$-$6.4\times$ while achieving superior image quality on DeepSeek-OCR and Geneval datasets with resolution $512\times512$ and $1280\times1280$.