Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-11

Loss Landscape Diagnosis for Gradient-Based Gray-Scott System Inversion: Disentangling the Roles of PINN Components

作者:

arXiv:2606.11258v1 Announce Type: new Abstract: Gradient-based inversion of reaction-diffusion systems is typically approached via surrogate models or physics-informed neural networks (PINNs), while the most direct route, backpropagation through the PDE's structure itself, has largely been avoided. We pursue this direct route as a diagnostic probe, backpropagating a steady-state loss through unrolled Gray-Scott simulation to recover its parameters, with no surrogate or neural-network augmentation. Optimization fails to converge, and plotting the landscape directly locates the failure in its geometry – flat plateaus with no gradient signal, bounded by sharp cliffs that align with bifurcation boundaries – a structure that recurs across loss functions and is inherited however the gradients are routed to parameters. Reading this minimal setup as an ablation of PINN, we disentangle each component's role: with the neural network fixed, the residual loss is quadratic in the PDE parameters and yields a smooth landscape, so it alone already avoids the pathology, by implicitly encoding the full PDE dynamics across all initial conditions. The neural network, for its part, cannot repair an ill-posed parameter subspace, and so serves only to complete the observed data – a division of labor not previously made explicit. These findings carry concrete design implications for PINN-type methods and a broader heuristic on when added dimensions actually help.

02.
arXiv (CS.CL) 2026-06-17

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling

Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on syllable-like units. However, methods like Sylber and SyllableLM rely on intricate multi-stage training pipelines. We propose ZeroSyl, a simple training-free method to extract syllable boundaries and embeddings directly from a frozen WavLM model. Using L2 norms of features in WavLM's intermediate layers, ZeroSyl achieves competitive syllable segmentation performance. The resulting segments are mean-pooled, discretized using K-means, and used to train a language model. ZeroSyl outperforms prior syllabic tokenizers across lexical, syntactic, and narrative benchmarks. Scaling experiments show that while finer-grained units are beneficial for lexical tasks, our discovered syllabic units exhibit better scaling behavior for syntactic modeling.

03.
arXiv (CS.CL) 2026-06-18

LLMZero: Discovering Adaptive Training Strategies for RL Post-Training via LLM Agents

RL post-training strategies are dataset-dependent and reveal a recurring empirical pattern: capacity parameters accumulate monotonically across stages, while regularization parameters predominantly oscillate in response to shifting training dynamics. This distinction matters because fixed schedules commit all parameters to fixed trajectories and therefore cannot express the non-stationary exploration-exploitation tradeoffs that regularization must track; the principle provides actionable design rules for multi-stage training. We discover this through LLMZero, a system where LLM agents search over training trajectories via tree search, diagnosing pathologies at each checkpoint and proposing coordinated multi-parameter transitions. Across 4 diverse GRPO tasks, LLMZero discovers strategies that improve over the base model by 9% to 140% relative and over grid search by 6% to 15% relative, consistently outperforming random search and the skill-based agent. The structural principle transfers across tasks, providing an explanation for why discovered strategies take qualitatively different forms yet share similar parameter dynamics.

04.
arXiv (quant-ph) 2026-06-15

No classical particle limit for massless quanta

arXiv:2606.14632v1 Announce Type: new Abstract: We investigate whether relativistic massless classical particles may emerge as the classical limit of massless quanta. To address this question independently of any specific dynamics, environment, or pointer basis, we develop an axiomatic and purely kinematical framework for the coarse-graining approach. In this formulation, a candidate classical phase space is taken as the outcome space of a POVM subject only to minimal classicality and covariance under the relevant spacetime symmetry group. Applying this framework to the Poincaré group, we prove a no-go theorem for massless particles: the covariance requirement is incompatible with the operational conditions for classicality. The theorem leaves open field-like limits of massless quanta, for example the emergence of electromagnetic or gravitational fields, while ruling out classical massless particles, such as classical photons or gravitons.

05.
arXiv (CS.CL) 2026-06-15

Detecting Historical Turning Points in Italian Media: A Complex Systems Approach to a Diachronic News Corpus

The increasing availability of large-scale textual corpora has opened new possibilities for data-driven, quantitative approaches to historical analysis using Natural Language Processing (NLP). However, diachronic corpora with historical relevance from the pre-digital era remain scarce and often incomplete. We present a quantitative approach to historical analysis based on the reconstruction and exploration of a diachronic corpus of around 600,000 articles from the Italian newspaper "La Repubblica", covering all the articles published from the 1st of January 1985 to the 31st of December 2000 - a period of major political, social, and geopolitical change in Italy and globally. Using NLP techniques, we analyze the text at both lexical and semantic levels; we then apply tools from complex systems and statistical physics to trace shifts in media discourse over time. This allows us to detect key transition periods, such as the transition from the First Republic to the Second Republic in Italy, or major international conflicts like the Gulf War or the Kosovo War, without relying on prior labeling. The results show how combining computational linguistics with ideas from complex systems can offer new quantitative insight into historical changes, opening up new paths for studying the dynamics of media and society through large-scale textual data.

06.
medRxiv (Medicine) 2026-06-18

Empirical Validation and Predictive Utility of the Perinatal Grief Scale in Men after Perinatal Loss

Background. The Perinatal Grief Scale (PGS) is a widely used instrument for assessing grief following pregnancy loss, yet no study has validated it specifically in men despite documented use in several studies. This gap is critical given fathers' persistent underrepresentation in perinatal bereavement research and the absence of empirically supported screening thresholds for this population. Methods. This cross-sectional validation study used data from the OPALE project (Observatory on PerinatAL hEalth) conducted by the CiaoLapo Foundation in Italy. Among 276 fathers who experienced stillbirth or miscarriage, we examined criterion validity by testing the association between PGS scores and trauma-related symptomatology assessed via three validated instruments: the Revised Impact of Event Scale (RIES, n=103), National Stressful Events Survey Short Scale (NSESSS, n=95), and SCL-90 (n=173). We systematically tested multiple threshold combinations to identify optimal discriminative performance. Results. The PGS demonstrated excellent criterion validity. The optimal threshold (PGS >=92) showed sensitivity 81.0%, specificity 81.8%, and Youden's J index 0.628. Fathers scoring >=92 had 19.12 times the odds of high trauma symptoms (95% CI: 9.35 to 39.14, p

07.
medRxiv (Medicine) 2026-06-22

Symptom-based phenotype discovery in motor neuron disease using natural language processing of electronic health records

Background: Motor neuron disease (MND) is a fatal neurodegenerative condition with significant clinical heterogeneity that is incompletely captured by existing phenotype classifications based on onset site. Electronic health records (EHRs) contain detailed symptom documentation in clinical narratives that may enable data-driven discovery of clinically meaningful patient subgroups. Methods: We developed a natural language processing (NLP) pipeline using MedCAT to extract symptoms from clinical notes of 2,361 people with a confirmed diagnosis of MND at a tertiary neurology center. MND cohort confirmation used three complementary methods: clinic attendance records, text-based diagnosis detection, and NLP extraction with negation detection. Extracted symptoms were filtered to Unified Medical Language System semantic type T184 (Sign or Symptom) with removal of negated concepts. Patients were clustered using latent class analysis on binary symptom profiles. Survival differences were assessed using Kaplan-Meier analysis, log-rank tests, and Cox proportional hazards regression. Results: From the first clinical notes, we identified four clusters of symptoms among 872 patients and 76 symptoms: Motor-Bulbar (n=373), Motor-Tremor (n=154), Sensory-Pain (n=222), and Motor-Respiratory (n=123). When extended to all clinical notes (n=2,065; 184 symptoms), these reorganized into three clusters: Autonomic-Respiratory (n=472), Nocturnal-Respiratory (n=338), and Classic Motor (n=1,255). Survival differences were significant across all clusters in both the first notes and all notes analyses (log-rank p < 0.001). Conclusions: NLP-based symptom extraction from EHRs identifies clinically meaningful MND subgroups that extend beyond traditional onset-site classifications. Autonomic-respiratory symptom burden is associated with poorer survival while a newly identified Sensory-Pain subtype with a better prognosis. These data-driven phenotypes may improve prognostication and inform targeted supportive care.

08.
arXiv (CS.AI) 2026-06-11

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

arXiv:2602.22638v2 Announce Type: replace Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional evaluation protocol centered on outcome validity, complemented by assessments of instruction understanding, planning, tool use, and efficiency. Using MobilityBench, we evaluate multiple LLM-based route-planning agents across diverse real-world mobility scenarios and provide an in-depth analysis of their behaviors and performance. Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility applications. We publicly release the benchmark data, evaluation toolkit, and documentation at https://github.com/AMAP-ML/MobilityBench.

09.
arXiv (CS.CV) 2026-06-16

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

Remote sensing vision-language models have advanced Earth observation understanding, but most existing work remains centered on RGB imagery, leaving the complementary information in infrared data underexplored. Infrared images provide distinctive cues, including thermal intensity structures, object boundaries, and illumination-invariant scene features, which can enrich visual-language learning beyond conventional RGB observations. However, a large-scale RGB-infrared-text dataset for remote sensing vision-language modeling is still absent. To address this gap, we introduce FusionRS, the first large-scale RGB-infrared-text dataset designed for dual-modal vision-language learning in remote sensing. FusionRS is constructed by translating diverse public RGB remote sensing images into infrared-style counterparts, forming aligned RGB-IR image pairs. Each pair is associated with conventional scene captions and IR-aware captions that explicitly describe infrared-specific visual properties while preserving semantic content. Based on FusionRS, we train dual-modal vision-language foundation models for RGB-IR joint understanding. We first train CLIP-style models for RGB-IR-text alignment, and then fine-tune generative VLMs for dual-modal RGB-IR captioning. Experiments show that FusionRS improves RGB-IR alignment, infrared-to-text retrieval, and dual-modal captioning over RGB-only and non-IR-aware training settings. Ablation studies further verify that IR-aware captions are crucial for strengthening infrared-language alignment, highlighting the importance of modality-specific textual supervision for more scalable RGB-infrared remote sensing vision-language representation learning.

10.
arXiv (quant-ph) 2026-06-12

Experiment-compatible measurement–feedback quantum state preparation with reinforcement learning

arXiv:2606.13005v1 Announce Type: new Abstract: Ground-state preparation is a critical task in quantum simulation and quantum computing, as it enables the study of correlated phases and the generation of entangled resource states. While measurement–feedback control has emerged as a promising route to state preparation, existing schemes either rely on handcrafted, task-specific policies or are designed using full quantum-state information that is unavailable in real experiments and becomes impractical for large many-body systems. Here we develop an adaptive measurement–feedback protocol based on reinforcement learning under partial observability. The controller uses only the history of experimentally accessible measurement outcomes to choose both the measurement operator and the feedback action in real time. To make training compatible with experiments, we introduce a stochastic terminal reward built from one-shot measurements of randomly sampled Hamiltonian components, avoiding unphysical full-state reconstruction while remaining an unbiased estimator of the target energy. We demonstrate the method by preparing ground states of the Bose–Hubbard model and by generating GHZ states, establishing a scalable and hardware-compatible route to quantum state preparation.

11.
arXiv (CS.CV) 2026-06-16

Planning with Unified Multimodal Models

With the powerful reasoning capabilities of large language models (LLMs) and vision-language models (VLMs), many recent works have explored using them for decision-making. However, most of these approaches rely solely on language-based reasoning, which limits their ability to reason and make informed decisions. Recently, a promising new direction has emerged with unified multimodal models (UMMs), which support both multimodal inputs and outputs. We believe such models have greater potential for decision-making by enabling reasoning through generated visual content. To this end, we propose Uni-Plan, a planning framework built on UMMs. Within this framework, a single model simultaneously serves as the policy, dynamics model, and value function. In addition, to avoid hallucinations in dynamics predictions, we present a novel approach self-discriminated filtering, where the generative model serves as a self-discriminator to filter out invalid dynamics predictions. Experiments on embodied decision-making tasks show that Uni-Plan substantially improves success rates compared to VLM-based methods, while also showing strong data scalability, requiring no expert demonstrations and achieving better performance under the same training-data size. This work lays a foundation for future research in reasoning and decision-making with UMMs.

12.
medRxiv (Medicine) 2026-06-18

Guiding the development of climate counterfactuals for health impact attribution studies

Climate change detection and attribution (D&A) methods have become vital for quantifying the influence of anthropogenic forcing on the Earth's systems, including human health. Health impact attribution (HIA) studies seek to disentangle climate-driven health effects from natural variability yet are often constrained by the availability of accessible counterfactual climate scenarios. This tutorial paper presents a flexible, reproducible framework for developing counterfactual climates without reliance on computationally intensive global circulation models. We provide practical, R-based methodologies for constructing both trend-based (temperature and non-temperature) and event-based counterfactual, using a variety of techniques including model residual detrending, data-driven decomposition (e.g., Singular Spectrum Analysis and Empirical Mode Decomposition) and stochastic weather generators. The tutorial also explores the incorporation of greenhouse gas concentrations as forcing variables, rather than global mean temperature anomalies. By operationalising these methods through worked examples and an open code repository, this paper aims to build capacity within the HIA community, enhance methodological transparency, and foster interdisciplinary collaboration between climate and health researchers.

13.
arXiv (math.PR) 2026-06-16

Flowing to Normality and the Fate of the Single Ring Theorem

arXiv:2606.15791v1 Announce Type: cross Abstract: Random non-hermitian matrix ensembles with double-sided rotation invariance obey, in the limit of large matrix size, the Single Ring Theorem, which states that the support of the mean eigenvalue distribution in the complex plane is either a disk or an annulus. In contrast, rotational-invariant random normal matrix ensembles can have mean eigenvalue densities supported over any number of concentric annuli in the complex plane. In this paper we introduce and investigate, both analytically and numerically, a non-hermitian matrix model which flows from a generic matrix distribution obeying the Single Ring Theorem to a distribution of normal matrices by tuning a parameter which penalizes non-normality. We observe numerically breakdown of the Single Ring Theorem as the model flows towards normality, and determine the critical value of the parameter at which the transition occurs. We also study in detail the behavior of the singular values of these matrices under the flow. These singular values form a Fermi gas confined to the positive half-line. In particular, we find that at small values of the flow parameter, the interparticle spacings in the gas exhibit Wigner-Dyson repulsion, whereas for asymptotically large values of the flow parameter, at the normal matrix endpoint of the flow, the spacing statistics is Poissonian. The flow interpolates continuously between these two types of statistics. However, this change in statistics is not related directly to breaking of the Single Ring Theorem, which occurs very early-on along the flow, in the regime of Wigner-Dyson statistics. Finally, we introduce a certain ensemble of random permutations associated with the gas, and make a conjecture on how to use it in order to reconstruct approximately the average density of complex eigenvalues from that of the singular values in the large-$N$ limit.

14.
arXiv (CS.LG) 2026-06-17

Price of metric universality in vector quantization is at most 0.11 bit

arXiv:2602.05790v2 Announce Type: replace-cross Abstract: Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is that of using a low-precision approximation $\widehat W$ in place of true $W$ (``weight-only quantization''). Information theory demonstrates that an optimal algorithm for reducing precision of $W$ depends on the (second order) statistics of $X$ and requires a careful alignment of vector quantization codebook with PCA directions of $X$ (a process known as ``waterfilling allocation''). Dependence of the codebook on statistics of $X$, however, is highly impractical. This paper proves that there exist a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms.

15.
arXiv (CS.CV) 2026-06-16

DC-Motion: Decoupling Semantics and Details via Discrete-Continuous Tokens for Human Motion Generation

Text-to-motion generation requires synthesizing physically realistic dynamics that strictly follow complex and long-horizon textual instructions. Existing approaches rely on homogeneous representation spaces that may fail to capture the hierarchical nature of human motion, with diffusion models struggling at compositional semantic reasoning and AR models sacrificing fine-grained physical details due to quantization. To solve it, we introduce DC-Motion, a factorized generative framework designed to explicitly decouple semantics and details via discrete-continuous tokens. A Discrete-Continuous VAE (DC-VAE) first decomposes motion into discrete tokens for semantics and continuous residuals for fine-grained dynamics. Then, a masked AR model predicts the discrete structure from text, and a lightweight residual diffusion model recovers the continuous physical details. Extensive experiments demonstrate that DC-Motion effectively improves the capability to follow complex instructions. By effectively balancing semantic controllability and physical realism, our approach offers a highly adaptable modeling paradigm for human motion generation. On both HumanML3D and KIT-ML datasets, DC-Motion achieves state-of-the-art performance, delivering the best FID for motion realism and R-precision for text alignment.

16.
arXiv (CS.LG) 2026-06-12

Predicting Cognitive Load from Speech and Interaction Dynamics in Dyadic Conversations

arXiv:2606.12971v1 Announce Type: new Abstract: Estimating cognitive load from speech has largely been studied in controlled laboratory settings, with limited understanding of its reliability in natural collaborative conversations. We investigate whether speech and interaction dynamics predict perceived cognitive load during dyadic conversations. We analyze audio from 53 dyads performing nine collaborative tasks and extract static acoustic, dynamic, and interaction features to train a two-head Gated Recurrent Unit encoder to predict cognitive load scores. Results show conversational interaction provides useful signals for predicting cognitive load related to time pressure, mental work, effort, and task performance. Temporal demand is associated with turn-taking dynamics such as overlap and speaker switch, while mental demand is linked to imbalanced participation between speakers. These findings highlight the importance of task structure and conversational interaction for modeling cognitive load in natural collaborative settings.

17.
Nature (Science) 2026-06-19

Daily briefing: Human detritus remakes geology

作者:

What, exactly, is a rock? Plus, a stem-cell success for a severe autoimmune disease and evidence that ‘AI deskilling’ is real. Researchers have tracked the electrical activity of individual brain cells during conversation in real time. Plus, the history of GPS and a cross-species transplant that could reveal clues about the origin of animals.

18.
arXiv (CS.AI) 2026-06-16

E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory

arXiv:2601.21714v5 Announce Type: replace Abstract: The evolution of Large Language Model (LLM) agents towards System~2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigorous logical integrity over extended horizons. However, prevalent memory preprocessing paradigms suffer from destructive de-contextualization. By compressing complex sequential dependencies into pre-defined structures (e.g., embeddings or graphs), these methods sever the contextual integrity essential for deep reasoning. To address this, we propose E-mem, a framework shifting from Memory Preprocessing to Episodic Context Reconstruction. Inspired by biological engrams, E-mem employs a heterogeneous hierarchical architecture where multiple assistant agents maintain uncompressed memory contexts, while a central master agent orchestrates global planning. Unlike passive retrieval, our mechanism empowers assistants to locally reason within activated segments, extracting context-aware evidence before aggregation. Evaluations on the LoCoMo benchmark demonstrate that E-mem achieves over 54\% F1, surpassing the state-of-the-art GAM by 7.75\%, while reducing token cost by over 70\%.

19.
arXiv (CS.AI) 2026-06-11

Mind the Perspective: Let's Reason Recursively for Theory of Mind

arXiv:2606.11724v1 Announce Type: new Abstract: Theory of Mind (ToM) reasoning requires inferring agents' beliefs from partial and asymmetric observations, which remains an open challenge for LLMs. Existing prompting-based approaches improve ToM reasoning through observable-event filtering or temporal belief chains, without explicitly modeling nested beliefs. We introduce RecToM, an inference-time framework for ToM reasoning that models nested beliefs via recursive perspective construction. RecToM constructs each character perspective from the preceding character perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective. We further provide a KD45 analysis showing that RecToM's perspective construction induces a well-formed belief modality beyond simple event filtering. Experiments on ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, across multiple LLM backbones show that RecToM consistently outperforms recent advanced approaches, achieving state-of-the-art performance. Notably, RecToM reaches 100\% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5, a benchmark requiring higher-order ToM reasoning.

20.
medRxiv (Medicine) 2026-06-17

The interaction between chronic hepatitis B (CHB) and Metabolic dysfunction-associated steatotic liver disease (MASLD) in a diverse central London population

Introduction: The overlap between chronic hepatitis B (CHB) and metabolic dysfunction-associated steatotic liver disease (MASLD) is an emerging global health challenge. We investigated the impact of MASLD and metabolic comorbidity in a diverse London viral hepatitis clinic. Methods: This retrospective cross-sectional study (May 2018-Feb 2024) included adults with CHB having controlled attenuation parameter (CAP) measurements. MASLD was defined as CAP >264 dB/m plus [&ge;]1 cardiometabolic factor (CMF). We used univariable and multivariable models to examine MASLD's relationship with liver stiffness and hepatitis B viral load (HBV VL). Results: Among 323 individuals (67% male, median age 36), most were from Black (35%) or non-white British/Irish (29%) backgrounds. Overall, 64% had [&ge;]1 CMF, and 20% had MASLD. The CHB/MASLD group was significantly older (median 43 vs 35 years, p

21.
arXiv (CS.LG) 2026-06-15

On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization

arXiv:2605.04954v2 Announce Type: replace-cross Abstract: Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

22.
arXiv (CS.CV) 2026-06-17

Revisiting LLM Adaptation for 3D CT Report Generation: A Study of Scaling and Diagnostic Priors

Recent advances in multimodal learning, including large language models (LLMs) and vision-language models (VLMs), have demonstrated strong adaptability to natural images. However, extending their use to the medical domain, particularly for volumetric (3D) images, is challenging due to high computational complexity, volumetric dependencies and the semantic gap between visual features and clinical terminology. Naively fine-tuning LLMs on limited medical data often leads to overfitting and clinical hallucination, where linguistic fluency is prioritized over clinical factuality. In this study, we investigate parameter-efficient adaptation strategies for volumetric CT report generation and introduce RAD3D-Prefix, a lightweight diagnostic-prior conditioning framework that minimizes the need for extensive parameter training. This module integrates image embeddings with multi-label diagnostic classification logits, preserving critical clinical details while bridging the semantic gap. By keeping the LLM frozen, our method requires minimal trainable parameters and mitigates the risk of overfitting on small, domain-specific datasets. Through a systematic study spanning LLMs from 96.1M to 1.6B parameters, we find that fine-tuning is most beneficial for smaller LLMs, whereas freezing larger (~1B+ LLMs and training only lightweight projection layers provides a superior trade-off between performance, generalization, and computational efficiency. Across multiple automatic metrics and a clinical reader study, RAD3D-Prefix outperforms comparable parameter-efficient baselines and demonstrates strong out-of-domain generalization while using substantially fewer trainable parameters than fully fine-tuned alternatives.

23.
arXiv (quant-ph) 2026-06-11

TensorKit.jl: A Julia package for large-scale tensor computations, with a hint of category theory

arXiv:2508.10076v2 Announce Type: replace-cross Abstract: TensorKit$.$jl is a Julia-based software package for tensor computations, especially focusing on tensors with internal symmetries. This paper introduces the design philosophy, core functionalities, and distinctive features, including how to handle abelian, non-abelian, and anyonic symmetries through the ``TensorMap'' type. We highlight the software's flexibility, performance, and its capability to extend to new tensor types and symmetries, illustrating its practical applications through select case studies.

24.
arXiv (CS.CV) 2026-06-17

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

The quality of the user experience has become one of the most important aspects in todays world, as it directly influences individuals willingness to continue using or abandon a product or service. In this context, video conferencing applications (VCAs), which experienced widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data. The widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Since ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics on the data path. In this work, we present a simple yet effective QoE prediction framework based on an almost stock convolutional neural network (CNN) architecture that uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The proposed framework is simple, easy to implement, and does not require high-end computational resources, yet it provides superior prediction performance, as shown in our experiments on two custom datasets collected from WhatsApp and Zoom, which achieve substantial improvements over previous models for the QoE prediction task.

25.
arXiv (math.PR) 2026-06-19

Optimal Sparsification of Gaussian Processes

arXiv:2606.19763v1 Announce Type: new Abstract: We prove an optimal dimension-free sparsification theorem for suprema of centered Gaussian processes. Given a bounded set $T\subseteq\mathbb{R}^n$, we show that the supremum of the canonical Gaussian process on $T$ can be $L^2$-approximated by the supremum of a shifted subprocess indexed by only $\exp(O(1/\varepsilon^2))$ points, with error at most $\varepsilon$ times the Gaussian width of $T$. In particular, the size of the approximating process is independent of both the ambient dimension and the cardinality of the original index set. This improves a recent sparsification theorem of De, Nadimpalli, O'Donnell, and Servedio (2026) by an exponential factor, and we show that the dependence on $\varepsilon$ is tight up to constants in the exponent. As consequences, we obtain an exponentially improved junta theorem for norms over Gaussian space and sharpen results on learning, property testing, and polyhedral approximation of convex sets under the Gaussian measure. The proof is based on an interpolation argument that combines Sudakov's minoration with the Brascamp–Lieb inequality.