Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-11

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature stability: for each SAE feature, we estimate the probability that a similar feature reappears in an independently trained SAE. This yields a scalable per-feature signal that separates stable from unstable features. In a large-scale study across seeds, models, layers, dictionary sizes, and SAE variants, we find a pronounced functional asymmetry: stable features carry most of the reconstruction- and prediction-relevant signal, while unstable features have weak marginal impact and are dominated by low-frequency surface-form triggers in both activation statistics and automatic explanations. Geometrically, unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting that seed dependence often reflects basis ambiguity within a shared region of activation space rather than pure noise. A controlled synthetic model makes this mechanism explicit, showing that low-rank ground-truth features can be recovered at the subspace level while remaining non-identifiable as individual SAE latents across seeds. Finally, by pooling unique cross-seed features, we construct more stable SAEs while preserving explained variance in this setting. Together, these results show that unstable features are not merely failed or noisy latents: they have weak individual functional impact, but reflect reproducible low-dimensional structure that standard SAEs resolve differently across seeds.

02.
arXiv (CS.AI) 2026-06-11

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

arXiv:2602.22638v2 Announce Type: replace Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional evaluation protocol centered on outcome validity, complemented by assessments of instruction understanding, planning, tool use, and efficiency. Using MobilityBench, we evaluate multiple LLM-based route-planning agents across diverse real-world mobility scenarios and provide an in-depth analysis of their behaviors and performance. Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility applications. We publicly release the benchmark data, evaluation toolkit, and documentation at https://github.com/AMAP-ML/MobilityBench.

04.
arXiv (CS.CV) 2026-06-18

Bridging Single Distortion Artifacts and Mmultifactorial Clinical Quality: Few-shot Biparametric MRI Quality Assessment via Distortion-trained Prototypical Networks

Clinical prostate multi-parametric MRI relies heavily on high-quality diffusion-weighted imaging (DWI), yet reading DWI is frequently compromised by geometric distortion, often caused by rectal air. Assessing quality via the PI-QUAL scoring system is an emerging clinical standard, but it is subjective, time-consuming and suffers from a class imbalance where low-quality cases are diverse and relatively scarce. Using the PRIME clinical trial as an example, there are $6\%$ images with PI-QUAL scores lower than 4, $87\%$ of DWI issues are due to distortion. Many of the other clinical quality issues are under-represented. To address this common dual-scarcity of annotated clinical data, we propose a few-shot biparametric prototypical network for automated image quality assessment (IQA). Our framework utilizes a dual-branch 3D ResNet to fuse T2-weighted and DWI features, providing anatomical context to distinguish true morphology from distortion. To handle real-world heterogeneity, we introduce feature-wise linear modulation (FiLM) and a gradient reversal layer (GRL) to align feature distributions conditioned on varying b-values while suppressing acquisition-related biases. We demonstrate that a model meta-trained solely on comparatively objective, readily obtainable distortion labels can effectively adapt to predicting complex, multi-factorial clinical quality scores such as PI-QUAL using only five representative samples. Experimental results on two datasets show that our method significantly outperforms few-shot learning baselines for this challenging IQA task, offering a practically feasible and data-efficient solution for standardizing prostate MRI quality control in clinical workflows.

05.
arXiv (CS.LG) 2026-06-12

COSMOS: Model-Agnostic Personalized Federated Learning with Clustered Server Models and Pseudo-Label-Only Communication

arXiv:2605.11165v2 Announce Type: replace Abstract: Federated learning (FL) in heterogeneous environments remains challenging because client models often differ in both architecture and data distribution. While recent approaches attempt to address this challenge through client clustering and knowledge distillation, simultaneously handling architectural and statistical heterogeneity remains difficult. We introduce COSMOS, a model-agnostic framework that enables server-side personalization using only pseudo-label communication. Clients train local models and predict on the public data; the server clusters clients by prediction similarity, trains a cluster-specific model for each group using its own compute, and distills the resulting models back to clients. We provide the first theoretical analysis showing that distillation from the learned cluster models can yield exponential personalization risk contraction, going beyond the convergence-to-stationarity guarantees typically provided in model-agnostic FL. Experiments across benchmarks demonstrate that COSMOS consistently outperforms all model-agnostic FL baselines while remaining competitive with state-of-the-art personalized FL methods. More broadly, our results highlight personalized server-side learning with pseudo-labels as a promising paradigm for scalable and model-agnostic federated learning in highly heterogeneous environments.

06.
arXiv (CS.LG) 2026-06-16

Active Learning with Low-Rank Structure for Data Selection

arXiv:2606.16045v1 Announce Type: new Abstract: In the data selection problem, the objective is to choose a small, representative subset of data that can be used to efficiently train a machine learning model. Sener and Savarese [ICLR 2018] showed that, given an embedding representation of the data and suitable geometric assumptions, heuristics based on $k$-center clustering can be used to perform data selection. This perspective was further explored by Axiotis et. al. [ICML 2024], who proposed a data selection approach based on $k$-means clustering and sensitivity sampling. However, these methods rely on the assumption that the dataset exhibits intrinsic geometric structure that can be effectively captured by clustering, whereas many modern datasets instead possess global algebraic structure that is better exploited by low-rank approximation or principal component analysis. In this paper, we introduce a new data selection framework based on low-rank approximation and residual-based sampling, formulated through the lens of row subset selection and loss-preserving coreset construction. Given an embedding representation of the data satisfying mild regularity conditions, which can be interpreted as algebraic or angular notions of Lipschitz continuity, we show that it is possible to select a weighted subset of $\tilde{O}\left(k + \frac{1}{\varepsilon^2}\right)$ data points whose average loss approximates the average loss over the full dataset within a $(1+\varepsilon)$ relative error, up to an additive $\varepsilon \Phi_k$ term, where $\Phi_k$ denotes the optimal rank-$k$ approximation cost of the embedding matrix. We complement these theoretical guarantees with empirical evaluations, demonstrating that on a range of real-world datasets, our data selection approach achieves improved performance over prior strategies based on uniform sampling or clustering-based sensitivity sampling.

07.
arXiv (CS.CV) 2026-06-15

Connections Between Pairs of Filters Improve the Accuracy of Convolutional Neural Networks

While researchers continue to find new and improved network structures for CNNs, most of the newly invented architectures still rely on the traditional pattern of stacking convolutional blocks and separating them with pointwise activation functions. However, there are drawbacks to a network purely building on pointwise nonlinearities. One alternative is to introduce a pairwise connection between two filters of a network. Typical connection functions use multiplications or the minimum operation to realize logical AND connections. In this paper, we go one step further by demonstrating that CNNs can benefit from more general connections, which include parameters that are learned. With such parameters, the network is able to implement different connections in different network layers and better adapt the connection function to the task at hand.

08.
arXiv (quant-ph) 2026-06-16

Minimum measurements quantum protocol for band structure calculation

arXiv:2511.04389v2 Announce Type: replace Abstract: Protocols for quantum measurement are an essential part of quantum computing. Measurements are no longer confined to the final step of computation but are increasingly embedded within quantum circuits as integral components of noise-resilient algorithms. However, each observable typically requires a distinct measurement basis, often demanding a different circuit configuration. As the number of such configurations typically grows with the number of qubits, measurements constitute a major bottleneck. Focusing on electronic structure calculations in crystalline systems, we propose a measurement protocol that restricts the required measurement configurations to an absolute minimum of just three, independent of the number of qubits. This makes it one of the few known protocols that do not scale with qubit number. In particular, we derive the measurement protocol from the symmetries of tight-binding (TB) Hamiltonians and implement it within the Orthogonal-Ansatz Variational Quantum Eigensolver (OA-VQE) algorithm. We demonstrate its performance on three systems, namely a two-dimensional CuO$_2$ square lattice (3 qubits), bilayer graphene with hexagonal (Honeycomb) lattice (4 qubits) and three-dimensional diamond lattice (10 qubits). Beyond tight-binding systems, the protocol can be extended to enable efficient initial state preparation for many-body Hamiltonians, such as multi-orbital Hubbard models in a momentum space.

09.
medRxiv (Medicine) 2026-06-11

Ferritin across long-term conditions in England: cross-sectional primary care study

Background Iron deficiency (ID) is a readily treatable condition once identified. Ferritin is the primary diagnostic marker, but cut-offs vary and inflammation complicates interpretation in patients with long-term conditions (LTCs). Aim To describe ferritin distribution and the prevalence of threshold-defined low ferritin in adults with and without LTCs in primary care. Design and setting Cross-sectional observational study using routinely collected electronic health records from a national primary care database in England (1st January 2015 to 31st December 2021). Method Adults with >1 ferritin test in Clinical Practice Research Datalink (CPRD) Aurum were included. LTCs were identified using validated primary-care code lists. Outcomes included ferritin distribution and threshold-defined ID prevalence using World Health Organization (WHO) (

10.
arXiv (CS.LG) 2026-06-19

BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining

arXiv:2510.06048v5 Announce Type: replace Abstract: Effective data selection is essential for pretraining large language models (LLMs), enhancing efficiency and improving generalization to downstream tasks. However, existing approaches often require leveraging external pretrained models, making it difficult to disentangle the effects of data selection from those of the external pretrained models. In addition, they often overlook the long-term impact of selected data if the model is trained to convergence, primarily due to the prohibitive cost of full-scale LLM pretraining. In this paper, we introduce BLISS (BileveL Influence Scoring method for data Selection): a lightweight data selection method that operates entirely from scratch, without relying on any external pretrained oracle models, while explicitly accounting for the long-term impact of selected data. BLISS leverages a small proxy model as a surrogate for the LLM and employs a score model to estimate the long-term influence of training samples if the proxy model is trained to convergence. We formulate data selection as a bilevel optimization problem, where the upper-level objective optimizes the score model to assign importance weights to training samples, ensuring that minimizing the lower-level objective (i.e., training the proxy model over the weighted training loss until convergence) leads to best validation performance. Once optimized, the trained score model predicts influence scores for the dataset, enabling efficient selection of high-quality samples for LLM pretraining. We validate BLISS by pretraining 410M/1B/2.8B Pythia and LLaMA-0.5B models on selected subsets of the C4 dataset. Notably, under the 1B model setting, BLISS achieves $1.7\times$ speedup in reaching the same performance as the state-of-the-art method, demonstrating superior performance across multiple downstream tasks.

11.
arXiv (CS.LG) 2026-06-16

Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

arXiv:2606.16899v1 Announce Type: new Abstract: Matrix based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper that addresses this issue. Given a base optimizer such as Adam or Muon, Hyperball sets the Frobenius norms of weight matrices and their corresponding optimizer updates to fixed constants. On Qwen3 style models up to 1.2B parameters, Muon Hyperball achieves 20–30% token equivalent speedup over weight decay baselines. Hyperball also improves learning rate transfer across widths and depths compared to decoupled weight decay. This method is motivated by prior theory showing that training with weight decay leads to an equilibrium weight norm that only depends on the training hyperparameters. Through this mechanism, the weight decay then decides the angular learning rate, i.e. how fast the direction of the weight matrix changes.

12.
PLOS Computational Biology 2026-06-22

Cell-type resolved transcriptional network analysis of <i>in vivo</i> cellular senescence following injury

Authors:

by Alda Sabalic, Victoria Moiseeva, Andres Cisneros, Oleg Deryagin, Eusebio Perdiguero, Pura Muñoz-Cánoves, Jordi Garcia-Ojalvo Identifying the genetic correlates of complex phenotypes is a challenging task. Methods coming from the field of complex networks can help finding such molecular patterns, by revealing statistical associations among groups of genes that correlate with the phenotype. Here we study cellular senescence, a complex cell state whose molecular underpinnings are still under active investigation. We analyze cell type–resolved RNA sequencing data obtained from injured muscle tissue in mice, with a network-based approach that merges eigenvector centrality feature selection and community detection. Our analysis identifies genetic markers that had not been associated with senescence so far, which are validated with existing single-cell RNA sequencing data in a different type of tissue. The identified key genes belong to transcriptional pathways associated with established hallmarks of senescence, and thus can be interpreted as molecular correlates of such hallmarks. The method proposed here could be applied to any complex cellular phenotype even when only bulk RNA sequencing is available, provided the data is resolved by cell type.

13.
arXiv (CS.CL) 2026-06-12

KCSAT-ML: Probing Reasoning Models with Nationwide-Cohort Human Difficulty

Math reasoning benchmarks have proliferated, yet most lack a per-item difficulty signal grounded in actual human performance. We introduce KCSAT-ML, a decade (2014-2025) of Korean College Scholastic Ability Test (KCSAT; Suneung) mathematics: 664 problems with a 339-item core set carrying official per-item error rates from nationwide cohorts of hundreds of thousands of examinees. We pair the benchmark with Difficulty-aligned Reasoning Gain (DRG): a score-orthogonal metric that asks whether a model's mistakes concentrate on the items humans found hard, or on items humans found easy. Together they expose, across a wide range of VLMs (and LLMs via OCR), three patterns: (i) low-budget accuracy collapses on the high-human-error tail at every model size; (ii) test-time scaling (TTS) raises token use roughly linearly with cohort error rate, while accuracy gains follow a non-monotonic curve; (iii) within a single family, TTS flips between anti-scaling on the hardest items and overthinking on easier ones – two faces of the same alignment failure. On DRG, models with near-identical accuracy can sit at near-opposite values: one model gets wrong what humans also find hard, while another solves the hardest items yet fails on items humans find easy – a contrast that aggregate accuracy hides. Our code and dataset builder will be open-sourced at https://github.com/naver-ai/KCSAT-ML.

14.
arXiv (CS.LG) 2026-06-15

Ensembling Sparse Autoencoders

arXiv:2505.16077v2 Announce Type: replace Abstract: Sparse autoencoders (SAEs) are used to decompose neural network activations into human-interpretable features. Typically, features learned by a single SAE are used for downstream applications. However, it has recently been shown that a single SAE captures only a limited subset of features that can be extracted from the activation space. Motivated by this limitation, we introduce and formalize SAE ensembles. Furthermore, we propose to ensemble multiple SAEs through naive bagging and boosting. In naive bagging, SAEs trained with different weight initializations are ensembled, whereas in boosting SAEs sequentially trained to minimize the residual error are ensembled. Theoretically, naive bagging and boosting are justified as approaches to reduce reconstruction error. Empirically, we evaluate our ensemble approaches with three settings of language models and SAE architectures. Our empirical results demonstrate that, compared to an expanded SAE that matches the number of features in the ensemble, ensembling SAEs improves the reconstruction of language model activations along with SAE stability. Additionally, on downstream tasks such as concept detection and spurious correlation removal, SAE ensembles achieve better performance, showing improved practical utility.

15.
arXiv (quant-ph) 2026-06-24

Dimensionality Reduction of QAOA Parameter Space with Kernel PCA for Max-Cut

arXiv:2606.23718v1 Announce Type: new Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is a leading variational algorithm for combinatorial optimization on near term quantum devices. As circuit depth increases, the number of optimization parameters grows, making the search landscape increasingly nonlinear and difficult to optimize. Previous studies have shown that optimal QAOA parameters often lie on a low dimensional manifold that can be approximated using Principal Component Analysis (PCA) at shallow circuit depths. However, the effectiveness of PCA decreases at higher depths because the underlying parameter manifold becomes increasingly nonlinear. In this work, we investigate Kernel Principal Component Analysis (KPCA) with a radial basis function kernel as a nonlinear dimensionality reduction technique for QAOA parameter optimization. The model is trained using 200 graphs from each of 3 graph families, namely Erdos-Renyi, Barabasi-Albert, and Watts-Strogatz, with graph sizes ranging from 7 to 10 nodes. Performance is evaluated on 30 test graphs containing 12 nodes at circuit depths 1, 2, 4, and 8. Experimental results demonstrate that KPCA consistently outperforms PCA at deeper circuit depths across all graph families. At depth 8, KPCA achieves approximation ratios above 0.86, while PCA declines to approximately 0.81 to 0.83. Both methods reduce the number of quantum circuit evaluations by more than 93 percent relative to unrestricted QAOA optimization. These findings suggest that nonlinear kernel methods more effectively capture the structure of the QAOA parameter manifold and provide a practical approach for scaling variational quantum optimization to deeper circuits.

16.
arXiv (CS.CV) 2026-06-12

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

17.
arXiv (CS.CV) 2026-06-16

Structure-Semantic Co-optimized Latent Diffusion Model for Fast Visual Anagram Synthesis

Visual anagram is an intriguing form of art creation wherein a single image presents different conceptual interpretations under transformations such as flipping or rotation. Recent work has achieved visual anagram synthesis by leveraging pretrained text-to-image (T2I) diffusion models, yet still suffers from several key limitations including computational inefficiency, suboptimal aesthetic quality, and weak semantic fidelity and expressiveness. This work focuses on generating visual anagrams with substantially improved visual quality at minimal computational cost, thereby advancing intelligent creation of illusionary digital art. To increase image resolution while reducing time overhead, we adapt the cutting-edge parallel denoising algorithm from pixel-based T2I model to the adversarially distilled latent-based one, and accordingly propose a structure-semantic co-optimization (S2CO) framework to counteract the consequent visual degradation. As the core of our approach, S2CO framework comprises three key innovations: (\romannumeral1) null-text structure alignment optimization; (\romannumeral2) semantic enhancement optimization; (\romannumeral3) attention-guided noise fusion. Building upon these components, our method dubbed S2CO-Anagram is able to generate higher-resolution anagram images with noticeably superior visual harmony and semantic faithfulness than related SOTA approaches, all while achieving substantially faster inference speed. Code will be publicly available.

18.
arXiv (CS.CL) 2026-06-16

Bridging Passive and Active: Enhancing Conversation Starter Recommendation via Active Expression Modeling

Large Language Model (LLM)-driven conversational search is shifting information retrieval from reactive keyword matching to proactive, open-ended dialogues. In this context, Conversation Starters are widely deployed to provide personalized query recommendations that help users initiate dialogues. Conventionally, recommending these starters relies on a closed "exposure-click" loop. Yet, this feedback loop mechanism traps the system in an echo chamber where, compounded by data sparsity, it fails to capture the dynamic nature of conversational search intents shaped by the open world. As a result, the system skews towards popular but generic suggestions. In this work, we uncover an untapped paradigm shift to shatter this harmful feedback loop: harnessing user "free will" through active user expressions. Unlike traditional recommendations, conversational search empowers users to bypass menus entirely through manually typed queries. The open-world intents in active queries hold the key to breaking this loop. However, incorporating them is non-trivial: (1) there exists an inherent distribution shift between active queries and formulated starters. (2) Furthermore, the "non-ID-able" nature of open text renders traditional item-based popularity statistics ineffective for large-scale industrial streaming training. To this end, we propose Passive-Active Bridge (PA-Bridge), a novel framework that employs an adversarial distribution aligner to bridge the distributional gap between passively recommended starters and active expressions. Moreover, we introduce a semantic discretizer to enable the deployment of popularity debiasing algorithms. Online A/B tests on our platform, demonstrate that PA-Bridge significantly boosts the Feature Penetration Rate by 0.54% and User Active Days by 0.04%.

19.
arXiv (CS.CV) 2026-06-12

Transformer-Guided Graph Attention for Direct Cardiac Mesh Reconstruction: A Structural Digital Twin Framework

Building patient-specific cardiac models sits at the heart of precision cardiology, yet getting those models into clinical use keeps running into the same wall: mesh generation is slow, messy, and frustrating. The standard workflow – segmenting the image, running Marching Cubes, and then manually cleaning up the result – is time-consuming, inconsistent across operators, and demands specialist knowledge most clinical teams do not have. We take a fundamentally different approach. Instead of treating segmentation and mesh generation as two separate problems, we train a single end-to-end network that goes directly from a raw 3D medical image to a smooth, simulation-ready cardiac surface mesh. The core is a 3D Swin Transformer encoder-decoder that extracts volumetric features from CT or MRI volumes, paired with a Graph Attention Network (GAT) head that iteratively deforms a template mesh to fit the patient's cardiac boundary. We tested on the MM-WHS 2017 benchmark using both CT and MRI. Segmentation scores were competitive (Dice of 0.84 on CT, 0.83 on MRI), but the primary focus is mesh quality: mean Chamfer distance of 1.8 mm, with 95th-percentile surface distance below 5 mm. Every mesh is produced in a single forward pass – no Marching Cubes, no smoothing filters, no manual cleanup. We argue that for cardiac digital twin pipelines, geometric fidelity and topological correctness matter more than pixel-level Dice scores. By removing the post-processing bottleneck, this approach makes patient-specific cardiac simulation substantially more accessible for clinical use.

20.
arXiv (CS.LG) 2026-06-16

An Integrable Token Mixing Layer from the Generalized Yang Baxter Equation

arXiv:2606.15085v1 Announce Type: new Abstract: The YB Mixer is a sequence token mixing layer derived from free fermion and generalized Yang Baxter structures. It applies a core principle from integrable systems where a local algebraic constraint guarantees global computational stability. By using the Ising exchange algebra the mixer creates a free fermionic structure that acts as an exactly norm preserving orthogonal map. This algebra also produces commuting transfer matrices which allow inference to be order free and adaptable to any variable budget. To ensure the model can generalize to longer sequence lengths it uses a spectral circulant generator. This generator maintains the crucial orthogonal and commuting properties of the system. The result is a highly stable and mathematically grounded architecture for sequence processing.

21.
arXiv (CS.CL) 2026-06-12

Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional pipelines capture frames, generate text, and synthesize speech sequentially for each utterance, and do not request the next generation until speech playback has completed. This strict sequentiality causes long and unnatural silence between utterances. To address this latency bottleneck, our system runs text generation in parallel with speech playback and buffers multiple candidate utterances ahead of time, enabling immediate synthesis at playback boundaries. Experiments on fast-paced game videos show that our parallel design reduces the mean inter-utterance silence from 9.6 seconds to 0.3 seconds compared to sequential baselines. It also improves similarity to professional speaking–silence timing patterns by over 40 %, and a user study with 120 experienced game players confirms significantly improved perceived speaking rhythm. Our demo video is available at: https://youtu.be/pmrRUlvav8M.

22.
arXiv (CS.CV) 2026-06-12

Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video

Learning soft continuum robot (SCR) dynamics from video offers flexibility but existing methods lack interpretability or rely on prior assumptions. Model-based approaches require prior knowledge and manual design. We bridge this gap by introducing: (1) The Attention Broadcast Decoder (ABCD), a plug-and-play module for autoencoder-based latent dynamics learning that generates pixel-accurate attention maps localizing each latent dimension's contribution while filtering static backgrounds, enabling visual interpretability via spatially grounded latents and on-image overlays. (2) Visual Oscillator Networks (VONs), a 2D latent oscillator network coupled to ABCD attention maps for on-image visualization of learned masses, coupling stiffness, and forces, thereby enabling mechanical interpretability. We validate our approach on single- and double-segment SCRs, demonstrating that ABCD-based models significantly improve multi-step prediction accuracy with 5.8x error reduction for Koopman operators and 3.5x for oscillator networks on a two-segment robot. VONs autonomously discover a chain structure of oscillators. This fully data-driven approach yields compact, mechanically interpretable models with potential relevance for future control applications.

23.
arXiv (CS.AI) 2026-06-17

Can LLMs Be CEOs? Benchmarking Strategic Resource Reallocation with Multi-Role Agent Simulation

arXiv:2606.17459v1 Announce Type: new Abstract: Evaluating the decision-making capabilities of large language models (LLMs) is a growing research priority, yet existing benchmarks focus on isolated cognitive tasks such as reasoning, knowledge retrieval, and economic rationality in stylized settings. These evaluations overlook the defining challenge of real executive decision-making: integrating conflicting recommendations from specialized stakeholders under information asymmetry, organizational constraints, and temporal dependencies. We introduce \textsc{CEO-Bench}, a multi-agent benchmark that evaluates LLMs on CEO-level strategic resource reallocation – the process of redirecting capital across business units in a multi-round, constraint-rich organizational environment. In \textsc{CEO-Bench}, LLM agents receive conflicting advice from four role-conditioned C-suite advisors (CFO, CTO, COO, CMO), each with private signals and distinct priorities, and must synthesize these into a concrete allocation plan evaluated along four dimensions: role integration, conditional boldness, history-sensitive judgment, and plan validity. Experiments across five frontier models on 13 scenarios reveal that all models achieve high structural validity but diverge sharply on strategic calibration – the hardest capability layer. We identify systematic failure modes including single-advisor capture, conservative default under ambiguity, and historical amnesia, and uncover a structural integration-boldness tradeoff: models that engage more deeply with conflicting perspectives tend to produce less decisive action. These findings delineate the current capability boundary of LLMs as organizational decision-makers and inform the design of future AI-assisted executive systems.

24.
arXiv (quant-ph) 2026-06-17

Helical Dirac Current with Local Coupling to a Chiral Potential

arXiv:2606.17618v1 Announce Type: new Abstract: We show that exact Dirac eigenstates in cylindrical confinement carry a definite helical conserved-current texture even in the zero orbital angular momentum channel l = 0. For the lowest confined mode, the Dirac current contains a nonvanishing azimuthal component together with longitudinal transport and exhibits opposite handedness in the two spin-resolved sectors. The structure also persists into the evanescent region. We further derive the channel-resolved matrix-element kernel generated by a static chiral scalar potential acting on the confined l = 0 Dirac modes. The resulting spin-selective coupling arises from the Dirac current texture and the scalar chiral potential, and yields a geometric selection rule in which diagonal channels vanish while off-diagonal conversion channels survive. The coupling strength is governed by an internal sampled-current overlap Jchi(k), defined as the integral from 0 to R of f(rho) times jphi_up(rho, k) times rho d rho. This quantity measures the spatial overlap between the chiral radial profile and the spin-up azimuthal Dirac-current density. The mechanism is fully local and texture-based, without external magnetic fields or spin-orbit coupling. Within standard Dirac theory, this work identifies the minimal static Dirac-geometric kernel underlying spin-selective response, establishing a baseline structure from which dynamical-medium, scattering, and transport formalisms can be systematically developed toward a complete description of spin-polarization phenomena such as CISS.

25.
arXiv (CS.AI) 2026-06-19

Reinforcement-aware Knowledge Distillation for LLM Reasoning

arXiv:2602.22495v3 Announce Type: replace-cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning large language models (LLMs), but the high inference cost of such models motivates distillation into smaller students. Most existing knowledge distillation (KD) methods are designed for supervised fine-tuning (SFT), relying on fixed teacher traces or teacher-student Kullback-Leibler (KL) divergence-based regularization. When combined with RL, these approaches often suffer from distribution mismatch and objective interference: teacher supervision may not align with the student's evolving rollout distribution, and the KL regularizer can compete with reward maximization and require careful loss balancing. To address these issues, we propose RL-aware distillation (RLAD), which performs selective imitation during RL – guiding the student toward the teacher only when it improves the current policy update. Our core component, Trust Region Ratio Distillation (TRRD), replaces the teacher-student KL regularizer with a PPO/GRPO-style likelihood-ratio objective anchored to a teacher–old-policy mixture, yielding advantage-aware, trust-region-bounded distillation on student rollouts and naturally balancing exploration, exploitation, and imitation. Across diverse logic reasoning and math benchmarks, RLAD consistently outperforms offline distillation, standard GRPO, and KL-based on-policy teacher-student knowledge distillation.