Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-19

PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement

arXiv:2606.19867v1 Announce Type: cross Abstract: Computed Tomography (CT) is essential for diagnosing pediatric craniofacial abnormalities, yet poses radiation risks to developing anatomies. Reconstructing 3D CT from sparse bi-planar X-rays offers a low-dose alternative but is severely ill-posed. Existing methods employ geometry-agnostic feature lifting, naively projecting 2D features into 3D without explicit spatial modeling, causing depth ambiguity and degraded osseous boundaries. We present PSCT-Net, a geometry-aware framework with differentiable back-projection. Differentiable back-projection establishes a spatially faithful volumetric prior, alleviating depth ambiguity. An Attention-Guided Projection (AGP-3D) module then learns non-linear voxel-wise correspondences between 2D regions and 3D locations. A Bidirectional Mamba (BiM-3D) module captures long-range volumetric dependencies with linear complexity. We further curate a private institutional pediatric skull CT cohort, PedSkull-CT, comprising normal and pathological cases for internal evaluation, addressing the gap in adult-centric, trunk-focused datasets.

02.
arXiv (CS.AI) 2026-06-18

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

arXiv:2604.00730v2 Announce Type: replace-cross Abstract: Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2008246 Scratch projects evaluated via Dr.Scratch, implementing an ordinal criterion to map clusters to CEFR levels (A1-C2), and introducing enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps-notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic Synchronization, and Data Representation–while providing certainty–based triggers for human intervention.

03.
arXiv (CS.CV) 2026-06-16

Avoiding Exponential Blow-Up in Distributive Lattice Submodular Minimization

Authors:

Submodular function minimization has gained a lot of interest in recent years. They are highly applicable in the area of Computer Vision and Machine Learning. Often such applications require to work with submodular functions defined on distributive lattice. Current best way of dealing with it is using a transformation which extrapolates the submodular function for the respective boolean lattice. It makes optimization system too inefficient due to enlargement of the working space. Quantitatively, the expanded space has additional exponential (in set size) number of elements. We propose a generic framework for dealing with distributive lattice which only works within distributive lattice. Our framework allows one to use already established submodular function minimization algorithms for boolean lattice. In our experiment, we show the huge improvement in terms of running time over tranditional methods for handling distributive lattice.

04.
arXiv (CS.LG) 2026-06-16

Hidden Degradation Costs in Energy-Cost-Only HEMS Optimisation: Study on Battery and PV Sensitivity

arXiv:2606.16051v1 Announce Type: cross Abstract: Residential battery energy storage systems (BESS) are increasingly deployed alongside photovoltaic (PV) generation to reduce household energy costs under volatile time-of-use (TOU) tariffs. Model predictive control (MPC) is a widely adopted optimisation strategy for home energy management systems (HEMS), typically formulated to minimise net energy cost, subject to physical and operational constraints. However, battery degradation is rarely embedded in the optimisation objective, meaning its cost is unquantified and aggressive; high-cycle-count strategies could incur significant losses once deployed to physical systems. This paper presents a receding-horizon mixed-integer linear programming (MILP) baseline for a UK residential HEMS, using demand data from the REFIT dataset. A 3 by 3 sensitivity study is conducted across three battery sizes and three PV array sizes, with post-hoc degradation cost estimated using the Naumann stress model and rainflow cycle counting. Results show that degradation remains constant for each battery size and can exceed energy cost savings by up to 1,060 %. These results demonstrate that energy-cost-only optimisation systematically underestimates the true system cost, motivating a degradation-aware control formulation.

05.
arXiv (CS.AI) 2026-06-18

NeuralMUSIC: A Hybrid Neural-Subspace Framework for Robot Sound Source Localization

arXiv:2606.18664v1 Announce Type: cross Abstract: Reliable sound source localization is fundamental to robot audition, enabling autonomous robots to perceive spatial cues and operate effectively in dynamic environments. Classical methods such as Multiple Signal Classification (MUSIC) offer strong theoretical foundations but degrade under low signal-to-noise ratios. While deep learning-based approaches achieve promising performance, they often struggle with limited generalization across conditions. To address these challenges, we propose NeuralMUSIC, a hybrid neural-subspace framework for robotic sound source localization. Specifically, a neural network first estimates the spatial covariance matrix from multichannel microphone observations. The predicted covariance is then integrated into a classical MUSIC pipeline with eigenvalue decomposition (EVD) and pseudo-spectrum computation, followed by a Frequency Attention Fusion (FAF) module to produce the final DOA estimates. To improve data efficiency, we further introduce a Self-supervised Spatial Correlation Learning (SSCL) strategy that leverages unlabeled acoustic data to capture spatial structure. Extensive experiments across different robotic tasks demonstrate that NeuralMUSIC achieves competitive localization accuracy while exhibiting improved robustness and cross-domain generalization.

06.
arXiv (CS.CL) 2026-06-19

Investigating Human-Model Discrepancies in Speech Quality Assessment via Acoustic and Prosodic Perturbations

Mean opinion score (MOS) prediction models are widely used as proxy metrics in text-to-speech (TTS) research, yet their ability to capture quality differences beyond acoustic fidelity remains unclear. We investigate this via controlled perturbations on speech: acoustic degradation, prosodic errors, and manipulation of speaker-specific characteristics such as pitch and speaking rate. We obtained MOS predictions for these speech samples from both human listeners and the model, and analyzed the differences in their perceptual characteristics. Results show that most models track acoustic degradation well, while all are insensitive to prosodic errors despite large subjective score drops. For speaker characteristics, models exhibit a double dissociation: strong mean fundamental frequency (F0) biases absent in human ratings, yet insensitivity to speaking rate and F0 variability that humans notice. These findings highlight limitations of scalar MOS prediction beyond acoustic fidelity.

07.
arXiv (CS.AI) 2026-06-11

LSTM-Based Detection of Structural Breaks in Property Insurance Loss Reserving: A Climate-Informed Approach

arXiv:2606.11463v1 Announce Type: cross Abstract: Accurate loss reserving is foundational to insurer solvency, yet accelerating climate driven catastrophes systematically violate the stability assumptions on which traditional actuarial methods depend. This white paper presents a research program testing whether Long Short Term Memory (LSTM) neural networks can detect and adapt to these structural breaks faster and more accurately than Chain Ladder, Bornhuetter Ferguson, and Cape Cod methods. Using 15 plus years of regulatory development triangle data from Florida and Louisiana, enriched with NOAA hurricane intensity indices and sea surface temperatures, we hypothesize a targeted improvement of 15, 20% in reserve accuracy for catastrophe exposed years, a threshold grounded both in the prior neural network reserving literature and in the formal convergence results developed here. Beyond empirical validation, we develop a theoretical framework grounding LSTM structural break detection in probabilistic terms, providing formal performance guarantees that compensate for the limited number of catastrophe events in the test period. We document the research design, methodology, expected contributions, and a candid assessment of limitations.

08.
arXiv (quant-ph) 2026-06-12

Coulomb crystallization of xenon highly charged ions in a laser-cooled Ca+ matrix

arXiv:2512.12266v2 Announce Type: replace-cross Abstract: We report on the sympathetic cooling and Coulomb crystallization of xenon highly charged ions (HCIs) with laser-cooled Ca$^+$ ions. The HCIs are produced in a compact electron beam ion trap, then charge selected, decelerated, and finally injected into a cryogenic linear Paul trap. There, they are captured into $^{40}$Ca$^+$ Coulomb crystals, and co-crystallized within them, causing dark voids in their fluorescence images. Fine control over the number of trapped ions and HCIs allows us to realize mixed-species crystals with arbitrary ordering patterns. By investigating Xe$^{q+}$–Ca$^+$ strings, we confirm the HCI charge states, measure their lifetime and characterize the mixed-species motional modes. Our system effectively combines the established quantum control toolbox for Ca$^+$ with the rich set of atomic properties of Xe highly charged ions, providing a resourceful platform for optical frequency metrology, searches for signatures of new physics, and quantum information science.

09.
medRxiv (Medicine) 2026-06-17

A non-invasive liquid biopsy resolves the diagnostic blind spot in chronic kidney disease

Chronic kidney disease is a major global health burden, and its early detection is critical for delaying progression to kidney failure using recently developed targeted therapies. However, current diagnostic screening relies heavily on blood markers that are confounded by muscle mass, and on urine tests that frequently miss structural damage occurring without protein leakage. This creates a critical diagnostic blind spot that hinders timely intervention. Here we show a non-invasive liquid biopsy platform that quantifies a specific protein marker, MUC1, on urinary extracellular vesicles to accurately assess renal parenchymal integrity. By bypassing the systemic metabolic noise of traditional blood tests, our assay provides a remarkably stable, person-specific functional signature. Following extensive validation across diverse cohorts, our longitudinal analysis demonstrated that the discrepancy between this novel urine-based readout and standard blood tests unmasks hidden renal vulnerability, successfully predicting rapid functional decline. By comprehensively evaluating both tubular and glomerular integrity from a single spot urine sample, these findings establish a completely non-invasive, highly scalable prescreening tool that resolves the diagnostic blind spot, enabling broader early detection strategies and ushering in a new era of proactive risk management.

10.
arXiv (CS.AI) 2026-06-16

Evaluation of Alternative-Based Information Systems for Deliberative Polling using an Agentic Simulator

arXiv:2606.11692v1 Announce Type: cross Abstract: Deliberative polling promises to improve collective decision-making by exposing shareholders to a broad range of arguments before they vote. Yet ensuring that every voter encounters a representative sample of the reason space, the coverage problem, remains an open challenge, particularly at scale and in adversarial or strategically motivated electorates. This paper introduces a way of evaluating solutions using the LLM-based Agentic Bipolar Argumentation Simulator, grounded in a framework which formalises a poll as a six-tuple of endorsing and opposing justifications, attack and enhance relations, and shareholder- and relation-weights. ABAS simulates N autonomous shareholder agents, each assigned a latent opinion according to desired distributions in [-1, 1], who sequentially vote, choose or author justifications, and optionally submit argumentation-graph links. The simulator implements recommendations that rank existing justifications by their observable endorsement mass. It evaluates the mechanism's success by coverage, namely the fraction of the corpus reason-tag set represented in the K recommendations presented to each shareholder, as a solution to the NP-hard Subsuming Justification Problem. Reported experiments characterise how creativity rate (pown), recommendation size (K), argumentation density (plinks), and population size (N) affect coverage and corpus diversity. In an authenticated electorate where Sybil attacks are impossible and only the relation graph is gameable, we stress-test the scoring with coordinated strategic voting attacks: a tag-flood attack collapses coverage, while author-count relation weighting through a reversed-PageRank rule resists the flood markedly better than uniform weights.

11.
arXiv (CS.AI) 2026-06-11

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

arXiv:2504.21072v3 Announce Type: replace-cross Abstract: The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat. Across six state-of-the-art erasure methods, including robust ones that explicitly search for alternative representations of the target concept, EEB consistently exposes harmful content: up to 82% success against celebrity-identity unlearning, up to 94% for object erasure, and up to 16 times amplification of explicit-content exposure. While EEB uncovers a blind spot in current erasure methods, it also provides a diagnostic tool for stress-testing future concept erasure techniques.

12.
arXiv (CS.AI) 2026-06-11

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

arXiv:2606.11357v1 Announce Type: cross Abstract: With the growing demand for on-device LLM inference, edge SoCs increasingly integrate NPUs to improve performance and energy efficiency under tight power and thermal budgets. However, practical LLM deployment on current client NPUs remains difficult: widely used quantization formats such as AWQ do not map cleanly onto many existing NPU software stacks, which are often proprietary and expose limited low-level control. In this work, we present TileFuse, a close-to-metal mixed-precision kernel library for AMD XDNA2 NPUs that targets transformer linear layers in quantized LLM inference. TileFuse brings practical low-bit formats such as AWQ-style W4A16 and W8A16 directly onto XDNA2, rather than forcing the model to be reshaped around an NPU-specific quantization scheme. TileFuse co-designs weight layout, metadata placement, mixed-precision microkernels, and array-level dataflow. Specifically, it fuses unpacking, dequantization, and GEMM/GEMV execution into a single kernel flow, introduces an interleaved pre-tiling layout that supports GEMM dimensions up to 32K, and redesigns GEMV dataflow to utilize the full 4x8 AIE array. Across kernel-level evaluations, TileFuse improves performance by up to 121.6% for GEMM and 281% for GEMV over full-precision baselines, while delivering more than 2x performance and energy-efficiency gains over strong iGPU baselines on GEMM. In end-to-end LLM experiments on Ryzen AI laptops, TileFuse achieves up to 2.0x lower prefilling latency with more than 64.6% lower energy consumption. Together, these results show that XDNA2 is a practical target for AWQ-style edge LLM inference and that native NPU support for off-the-shelf quantization can make NPUs substantially more usable in real client deployments.

13.
arXiv (CS.CV) 2026-06-17

Geometric Consistency Protocol for Foundation Model Features in Multi-View Satellite Imagery

Standardized evaluation protocols are indispensable for robust benchmarking in remote sensing, particularly as foundation features are increasingly transferred across diverse sensors and complex imaging geometries. In satellite multi-view reconstruction, conventional evaluations relying on unconstrained 2D global matching are often misleading. The Rational Function Model (RFM) and its Rational Polynomial Coefficients (RPC) dictate a curved, height-dependent epipolar geometry that render flat 2D search spaces physically inconsistent. We propose a geometry-faithful and reproducible protocol tailored for the RPC framework. Our approach integrates an RPC-projected 3D consistency metric with a geometry-constrained dense matching proxy, specifically evaluating whether similarity responses remain localized and unique under physically plausible search manifolds. A pivotal finding of our joint reporting strategy is the decoupling of semantic agreement and geometric localization: high cross-view similarity at a projected 3D point does not guarantee reliable matchability in practical inference. Our benchmark demonstrates that incorporating geometric constraints is fundamental to the problem definition in satellite imagery. Furthermore, we show that state-of-the-art 2D backbones remain remarkably competitive against specialized 3D-aware models when subjected to this RPC-consistent evaluation.

14.
arXiv (CS.AI) 2026-06-19

ROSE: Benchmarking the Perception-to-Action Gap in Multimodal Models

arXiv:2606.19965v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) are increasingly expected to act on visual information, yet the same scene may require different actions under different task contexts. How reliably can a model turn the same visual evidence into the action required by the current context? To answer this question, we introduce \textsc{ROSE} (Reference-conditioned Oddity and Symbolic Execution), a controlled benchmark that holds the visual scene fixed while varying region constraints and required symbolic outputs. Through coupled counting and coordinate-action tasks, \textsc{ROSE} tests whether models can infer an implicit majority reference and act on the resulting fine-grained visual evidence under changing contexts. Across nine recent MLLMs, performance drops by as much as 44.5 percentage points from counting-oriented tasks to region-conditioned action, despite 98.8\% human performance. The gap persists on paired scenes and regions for which the same model returns the correct count, while global-click and matched local controls show that coordinate grounding explains only part of the loss, revealing a distinct, model-dependent bottleneck in turning shared visual evidence into context-specific actions.

15.
arXiv (CS.LG) 2026-06-16

Diffusion Models for Adaptive Sequential Data Generation

arXiv:2606.06007v2 Announce Type: replace Abstract: Generating realistic synthetic sequential data is critical in real-world applications across operations research, finance, healthcare, energy systems, and scientific computing, where time-indexed observations are used for prediction, simulation, risk assessment, and data-driven decision-making. While diffusion models have achieved remarkable success in generating static data, their direct extensions to sequential settings often fail to capture temporal dependence and information structure. Designing diffusion models that can simulate sequential data in an adapted manner, and hence without anticipation of future information, therefore remains an open challenge. In this work, we propose a sequential forward-backward diffusion framework for adapted time series generation. Our approach progressively injects and removes noise along the sequence, conditioning on the previously generated history to ensure adaptiveness. A novel score-matching objective is introduced for efficient parallel training. We derive rigorous statistical guarantees under a generic framework, then establish score approximation, score estimation, and distribution estimation results with ReLU networks serving as a concrete instance. Empirically, we validate our method on synthetic data, including ARMA models and Gaussian processes, and demonstrate its effectiveness in constructing mean-variance optimal portfolios.

16.
arXiv (CS.AI) 2026-06-12

Fusion Learning from Dynamic Functional Connectivity: Combining the Amplitude and Phase of fMRI Signals to Identify Brain Disorders

arXiv:2603.24603v2 Announce Type: replace-cross Abstract: Dynamic functional connectivity (dFC) derived from resting-state functional magnetic resonance imaging (fMRI) has been extensively utilized in brain science research. The sliding window correlation (SWC) method is a widely used approach for constructing dFC by computing correlation coefficients between amplitude time series of signals from pairs of brain regions. In this study, we propose an integrated approach that incorporates both amplitude and phase information of fMRI signals to improve the detection of brain disorders. Specifically, we introduce a multi-scale fusion learning framework, namely MSFL, which leverages two complementary dFC features derived from SWC and phase synchronization (PS). Here, SWC captures amplitude correlations, while PS measures phase coherence within dFC. We evaluated the efficacy of MSFL in classifying autism spectrum disorder and major depressive disorder using two publicly available datasets: ABIDE I and REST-meta-MDD, respectively. The results indicate that MSFL significantly outperforms existing comparative models. Moreover, we performed model explanation analysis using the SHAP framework, which showed that both types of dFC features from SWC and PS contribute to detecting brain disorders.

17.
medRxiv (Medicine) 2026-06-12

Does the method matter? Evaluating the effectiveness, efficiency and ease of hearing-aid gain self-adjustment

In conventional hearing-aid personalisation, clinicians cannot hear what their patients hear, and patients cannot often reliably detect or describe what they hear. Self-adjustment avoids this issue but requires user controls that adjust hearing-aid signal processing parameters to be effective, efficient and easy. In this study, we explored (a) the roles of interface complexity and stimulus type in the self-adjustment of hearing-aid gain, and (b) how well individuals can adjust one sound to match another to assess the same interfaces and stimuli. Adult hearing-aid users with mild to moderate symmetrical sensorineural hearing loss repeatedly adjusted the gain (a) to their preference from individual prescription (n = 41) and (b) to match their previous preferences from a random starting point (n = 32) using three interfaces representing different bass/mid/treble configurations and three stimuli (music, speech and speech-in-noise). The large interindividual variability in self-adjusted gains clustered into three patterns of deviation from initial prescription: increased relative bass, overall gain reduction, and close to initial prescription. There were no substantial effects of interface nor stimulus on self-adjustment reliability (median {sigma} = 2.8 dB), whereas absolute sound-matching error increased with increasing interface complexity and centre frequency. Neither individual matching accuracy nor questionnaire responses predicted either self-adjusted gains or reliability. Overall, these results show that many - but not all - hearing-aid users can adjust gains with reasonable reliability, and while it can be difficult to predict the behaviour from the individual, the individual applies a similar self-adjustment behaviour across different interfaces and stimuli.

18.
arXiv (CS.CL) 2026-06-18

Are LLMs Ready to Assist Physicians? PhysAssistBench for Interactive Doctor-Patient-EHR Assistance

The most plausible near-term role of medical LLMs is to assist rather than replace physicians, yet current evaluations often test isolated capabilities: clinical knowledge, EHR system interaction, or patient communication. Physician assistance instead requires coordinating these capabilities within the same interaction, where physicians issue underspecified requests, patients describe symptoms ambiguously, and EHR systems demand precise tool use. We introduce PhysAssistBench, a benchmark for interactive doctor-patient-EHR assistance. Built from real MIMIC-IV cases, PhysAssistBench uses a scalable pipeline to construct agentic patients: interactive, record-grounded agents that turn static EHR records into multi-turn clinical scenarios while preserving clinical factuality. PhysAssistBench provides a curated bilingual evaluation set of 1,296 manually reviewed and physician-validated turns. Experiments with leading LLMs show that current models remain unreliable in this setting, which exposes a key bottleneck for clinical LLMs: reliable assistance requires coordination across knowledge, communication, and systems, not isolated gains in any of them.

19.
arXiv (CS.LG) 2026-06-19

Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States

arXiv:2606.20053v1 Announce Type: new Abstract: The Doyle-Fuller-Newman (DFN) model resolves internal electrochemical states in lithium-ion batteries with high fidelity. However, the numerical solution of its governing equations is computationally prohibitive for real-time deployment, limiting scalability from individual cells to pack and fleet-scale applications. While machine learning surrogates can substantially reduce inference latency through GPU acceleration, most existing approaches learn solution approximations tied to specific operating conditions rather than learning generalizable state-evolution dynamics. This work presents a systematic comparison of four neural network architectures (MLP, ResNet, U-Net, FNO) formulated as autoregressive state-transition operators that predict full DFN internal states across a wide range of operating conditions. To ensure a controlled architectural comparison, all models are trained under a unified framework using multi-step unrolling and current-conditioning, isolating the impact of spatial inductive bias. Results demonstrate that the U-Net's multi-scale feature hierarchy achieves a mean final-step nRMSE of 3% averaged across all internal state variables after 300-step autoregressive rollouts, while providing a 5.38x speed-up over the numerical solver. These findings highlight spatial inductive bias as a critical determinant of surrogate performance, advancing the development of surrogates for internal state observability for next-generation battery management systems and digital twins.

20.
arXiv (math.PR) 2026-06-11

Additive Noise, Shift Recovery, and Signed Signals in the Cumulative Distribution Transform

arXiv:2606.11432v1 Announce Type: cross Abstract: The cumulative distribution transform (CDT) is a quantile-based transport representation that exactly linearizes one-dimensional translations of positive densities. We study how this structure behaves under additive perturbations and how it can be exploited for shift recovery. Under a local nondegeneracy condition, we derive a first-order expansion showing that additive noise in physical space induces a nonlocal perturbation in CDT space through the primitive of the noise, weighted by the reciprocal density. This yields an explicit description of transform-domain sensitivity and shows, in particular, that perturbations are amplified in low-density regions. When the physical-space perturbation is modeled as a centered Gaussian random field, the induced first-order CDT perturbation is again Gaussian, with an explicit covariance kernel. We then use this structure to study recovery in CDT coordinates. In the known-template setting, the transport shift is obtained by projection onto the constant mode, giving an explicit estimator together with exactness in the noiseless case and a stability bound under perturbations. In the unknown-template setting, multiple observations permit joint recovery of the shifts and a common template up to the natural constant-mode gauge, leading to a simple de-shift–and–average procedure. We also consider a signed-signal analogue based on the signed cumulative distribution transform (SCDT), where shifts are estimated numerically by feature matching and unknown templates are recovered by alternating alignment and averaging. Numerical experiments validate the perturbation analysis and illustrate effective recovery for both density-valued and signed signals.

21.
arXiv (CS.LG) 2026-06-12

COSMOS: Model-Agnostic Personalized Federated Learning with Clustered Server Models and Pseudo-Label-Only Communication

arXiv:2605.11165v2 Announce Type: replace Abstract: Federated learning (FL) in heterogeneous environments remains challenging because client models often differ in both architecture and data distribution. While recent approaches attempt to address this challenge through client clustering and knowledge distillation, simultaneously handling architectural and statistical heterogeneity remains difficult. We introduce COSMOS, a model-agnostic framework that enables server-side personalization using only pseudo-label communication. Clients train local models and predict on the public data; the server clusters clients by prediction similarity, trains a cluster-specific model for each group using its own compute, and distills the resulting models back to clients. We provide the first theoretical analysis showing that distillation from the learned cluster models can yield exponential personalization risk contraction, going beyond the convergence-to-stationarity guarantees typically provided in model-agnostic FL. Experiments across benchmarks demonstrate that COSMOS consistently outperforms all model-agnostic FL baselines while remaining competitive with state-of-the-art personalized FL methods. More broadly, our results highlight personalized server-side learning with pseudo-labels as a promising paradigm for scalable and model-agnostic federated learning in highly heterogeneous environments.

22.
arXiv (CS.CL) 2026-06-16

XAI-Grounded Explanation Generation for Speech Deepfake Detection with Training-Free Multimodal Large Language Models

Speech deepfake detection (SDD) systems require trustworthy explanations for reliable decision-making. Existing explanation ways mainly fall into two categories. Traditional explainable AI (XAI), such as gradient-based attribution, produces low-level attribution signals tightly coupled with model decisions, and harder to be understood by human than natural language explanations. Meanwhile, large language model (LLM)-based explanation generation often produces generic and ungrounded descriptions due to the lack of heuristic evidence and task-specific supervision, stemming from limited grounded explanation datasets for SDD. We therefore propose a training-free explanation framework that integrates XAI evidence with multimodal LLMs to generate grounded and specific explanations. Using the PartialSpoof dataset, we construct a grounded explanation dataset and show that methods with XAI increase inside accuracy by over 45\%, verified through human evaluation and faithfulness checks.

23.
arXiv (CS.AI) 2026-06-12

Competition and Diversity in Generative AI

arXiv:2412.08610v3 Announce Type: replace-cross Abstract: Recent evidence, both in the lab and in the wild, suggests that the use of generative artificial intelligence reduces the diversity of content produced. The use of the same or similar AI models appears to lead to more homogeneous behavior. Our work begins with the observation that there is a force pushing in the opposite direction: competition. When producers compete with one another (e.g., for customers or attention), they are incentivized to create novel or unique content. We explore the impact competition has on both content diversity and overall social welfare. Through a formal game-theoretic model, we show that competitive markets select for diverse AI models, mitigating monoculture. We further show that a generative AI model that performs well in isolation (i.e., according to a benchmark) may fail to provide value in a competitive market. Our results highlight the importance of evaluating generative AI models across the breadth of their output distributions, particularly when they will be deployed in competitive environments. We validate our results empirically by using language models to play Scattergories, a word game in which players are rewarded for answers that are both correct and unique. Overall, our results suggest that homogenization due to generative AI is unlikely to persist in competitive markets, and instead, competition in downstream markets may drive diversification in AI model development.

24.
arXiv (quant-ph) 2026-06-12

Symmetry and Topology of Monitored Quantum Dynamics

arXiv:2412.06133v4 Announce Type: replace-cross Abstract: The interplay between unitary dynamics and quantum measurements induces diverse phenomena in open quantum systems with no counterparts in closed quantum systems at equilibrium. Here, we generally classify Kraus operators and their effective non-Hermitian dynamical generators, thereby establishing the tenfold classification for symmetry and topology of monitored free fermions. Our classification elucidates the role of topology in measurement-induced phase transitions and identifies potential topological terms in the corresponding nonlinear sigma models. Furthermore, we establish the bulk-boundary correspondence in monitored quantum dynamics: nontrivial topology in spacetime manifests itself as topologically nontrivial steady states and gapless boundary states in Lyapunov spectra, such as Lyapunov zero modes and chiral edge modes, leading to the topologically protected slowdown of dynamical purification.

25.
arXiv (CS.CV) 2026-06-16

Stepwise Token Selection for Efficient Multimodal Large Language Models

In multimodal large language models (MLLMs), inference cost is largely dominated by the visual token prefix rather than the language backbone, making token reduction a key factor for improving efficiency. Existing approaches typically assign independent importance scores to visual tokens and retain a fixed number of top-ranked tokens, implicitly assuming token independence and a uniform compression ratio across inputs. In this work, we reformulate visual token pruning as a sequential decision-making process. Specifically, we introduce a pointer-style selection mechanism that iteratively chooses informative tokens, conditioning each decision on previously selected ones, and dynamically determines when to stop via a learned termination action. This enables joint optimization of both the selected subset and its size. To enable end-to-end training under standard language modeling objectives, we design a differentiable relaxation based on a variance-preserving noise interpolation scheme, allowing gradients to propagate through the discrete selection process. Extensive experiments on LLaVA-v1.5-7B and Qwen2.5-VL-7B demonstrate that our approach consistently outperforms fixed-ratio baselines across different compression levels. Under aggressive pruning that removes 88.9% of visual tokens, our method preserves 94.6% of the original accuracy while achieving a 1.88x speed-up in prefill latency.