Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Generalized symmetries, invariant solutions and conservation laws in the Jaynes-Cummings model

arXiv:2606.15538v1 Announce Type: cross Abstract: In this work, we investigate the Jaynes–Cummings model (JCM) using Lie symmetry analysis and conservation-law theory. The dynamics is formulated as a system of partial differential equations by projecting the von Neumann equation onto the atomic degrees of freedom and representing the field mode through its characteristic function. We determine the admitted point and generalized symmetries and construct invariant solutions satisfying the physical conditions imposed by quantum mechanics. The conventional dressed-state dynamics is recovered while a second class of solutions with radial dependence expressed through Heun polynomials is obtained for coupled atom–field configurations. We also apply the generating functions methodology to derive local conservation laws of the JCM differential system. Besides recovering the conservation of the total number of excitations, we obtain additional conserved currents involving atomic populations, coherence, reduced-state purity, and moments of the field characteristic function. In particular, we derive a balance equation for a combination of atomic purity and coherence whose evolution is controlled by the atom–field coupling and is linked to atom–field correlation and entanglement dynamics. The symmetry structure further generates generalized symmetries and an infinite hierarchy of conservation laws.

02.
arXiv (CS.CV) 2026-06-18

Fuzzy-Geometric Branch-Point Modeling for Structure-Aware Augmentation of Handwritten Chinese Characters

Data scarcity and structural distortion significantly limit handwriting recognition in high-security authentication. Existing augmentation methods often cause topological and morphological damage, particularly when processing complex Chinese characters where stroke intersections, ligatures, and sharp turns render traditional branch-point detection unreliable. To address this, this paper proposes a fuzzy geometry-driven structure-aware (FGSA) augmentation framework. We model branch points as fuzzy sets within the skeleton space, constructing a continuous branch-point membership field by integrating topological neighborhood evidence with direction field divergence. This membership field is adaptively optimized via an unsupervised surrogate objective, enabling robust stroke decoupling without manual annotation. Finally, kinematically-aligned samples are synthesized through parameterized cubic Bézier reconstruction and multi-strategy perturbations, ensuring a balance between structural fidelity and sample diversity. Moreover, we establish LZUSig, a large-scale, highly challenging dataset specifically dedicated to fine-grained structural degradation in Chinese handwritten signatures. Extensive experiments on CASIA-HWDB1.1, ChiSig, and LZUSig demonstrate that FGSA significantly reduces the word-level error rate ($\Delta$WER), achieving optimal recognition gains over the compared baselines. More importantly, it strikes a robust trade-off among task gain, structural fidelity, and discriminative feature preservation, offering a highly controllable solution for handwriting augmentation.

03.
arXiv (CS.AI) 2026-06-16

The Missing Knowledge Layer in Cognitive Architectures for AI Agents

arXiv:2604.11364v2 Announce Type: replace Abstract: The two most influential cognitive architecture frameworks for AI agents, CoALA [21] and JEPA [12], both lack an explicit Knowledge layer with its own persistence semantics. This gap produces a category error: systems apply cognitive decay to factual claims, or treat facts and experiences with identical update mechanics. We survey persistence semantics across existing memory systems and identify eight convergence points, from Karpathy's LLM Knowledge Base [10] to the BEAM benchmark's near-zero contradiction-resolution scores [22], all pointing to related architectural gaps. We propose a four-layer decom position (Knowledge, Memory, Wisdom, Intelligence) where each layer has fundamentally different persistence semantics: indefinite supersession, Ebbinghaus decay, evidence-gated revision, and ephemeral inference respectively. Companion implementations in Python and Rust demonstrate the architectural separation is feasible. We borrow terminology from cognitive science as a useful analogy (the Knowledge/Memory distinction echoes Tulving's trichotomy), but our layers are engineering constructs justified by persistence-semantics requirements, not by neural architecture. We argue that these distinctions demand distinct persistence semantics in engineering implementations, and that no current framework or system provides this.

04.
arXiv (CS.AI) 2026-06-12

Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease

arXiv:2601.00921v3 Announce Type: replace-cross Abstract: Chronic obstructive pulmonary disease (COPD) affects hundreds of millions of people worldwide, and skeletal-muscle dysfunction is clinically important. Quantum machine learning is increasingly explored for biomedical prediction, but its value in small biomarker cohorts requires benchmarking against strong classical baselines. We analysed a cigarette-smoke COPD cohort of 213 animals with blood and bronchoalveolar-lavage biomarkers to predict tibialis anterior muscle weight, muscle quality, and force. We developed a kernel-geometric quantum hybrid method in which synthetic symmetric positive definite (SPD) references are mapped through a reproducing kernel Hilbert space, compressed using train-only random projection, normalised, and supplied to low-dimensional quantum regression circuits. We benchmarked this approach against classical ridge/kernel models, SPD relational representations, and quantum-kernel regression (QKR). All methods were evaluated using condition-stratified repeated cross-validation. The largest numerical improvement was observed for muscle weight, where the proposed method had the numerically lowest mean root mean squared error (RMSE), approximately 1.8% below the best classical comparator; paired fold-level testing did not establish statistically significant superiority after Holm adjustment, but the endpoint is biologically meaningful. The method also had the numerically lowest mean RMSE for muscle quality. For force, biomarker-only Ridge performed best, suggesting a more linear endpoint structure.

05.
arXiv (CS.AI) 2026-06-11

DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning

arXiv:2511.02627v3 Announce Type: replace Abstract: We introduce DecompSR, decomposed spatial reasoning, a large benchmark dataset (over 5m datapoints) and generation framework designed to analyse compositional spatial reasoning ability. The generation of DecompSR allows users to independently vary several aspects of compositionality, namely: productivity (reasoning depth), substitutivity (entity and linguistic variability), overgeneralisation (input order, distractors) and systematicity (novel linguistic elements). DecompSR is built procedurally in a manner which makes it is correct by construction, which is independently verified using a symbolic solver to guarantee the correctness of the dataset. DecompSR is comprehensively benchmarked across a host of Large Language Models (LLMs) where we show that LLMs struggle with productive and systematic generalisation in spatial reasoning tasks whereas they are more robust to linguistic variation. DecompSR provides a provably correct and rigorous benchmarking dataset with a novel ability to independently vary the degrees of several key aspects of compositionality, allowing for robust and fine-grained probing of the compositional reasoning abilities of LLMs.

06.
arXiv (CS.AI) 2026-06-25

Lightweight PCGAE-Net: Parallel CrossGate Attention and Bottleneck AutoEncoder for Efficient 5G Channel Prediction

arXiv:2606.25401v1 Announce Type: cross Abstract: Accurate channel state information (CSI) prediction is essential for proactive beamforming and resource management in 5G massive MIMO systems, yet the deployment of high-accuracy transformer-based predictors on base-station hardware remains challenging because the most capable models carry upwards of 30\,M parameters. This paper introduces Lightweight PCGAE-Net, which addresses the efficiency problem not by post-hoc compression but by correcting two architectural flaws in the current state of the art. The first is a sequential attention ordering bias: in CS3T-UNet, group-wise temporal attention (GTA) always operates on features that have already been transformed by cross-shaped spatial attention (CSA), distorting what temporal information GTA can capture. We remove this dependency by routing both attention modules to the same layer-normalized input and combining their independent outputs through a learned per-channel sigmoid CrossGate. The second flaw is an uncompressed bottleneck: applying full self-attention at the deepest encoder stage, where channel depth reaches $4C$, is quadratically expensive and carries redundant features. A Bottleneck AutoEncoder (BAE) with $1\times1$ convolutions halves this depth and uses an auxiliary reconstruction loss to prevent information collapse. Wrapping these components inside a shallower encoder-decoder with frequency-domain dimensionality reduction ($N_f\!=\!32$, $C\!=\!48$) produces a model with just 8.54\,M parameters – 58\% fewer than the CS3T-UNet baseline – that outperforms it by up to 3.26\,dB at 5\,km/h and 6.0\,dB at 9\,km/h in single-step prediction on QuaDriGa dataset.

07.
arXiv (CS.LG) 2026-06-16

Machine Learning-Driven Chemical Reactor Network Modeling of the Sandia-D Flame

arXiv:2606.14729v1 Announce Type: cross Abstract: Turbulent combustion simulations are crucial for many scientific and engineering systems. However, the high cost to fully resolve the complex multiscale and multiphysics behavior makes direct simulation typically infeasible. The equivalent reactor network (ERN) approach attempts to improve computational efficiency by replacing a multidimensional turbulent simulation with a series of much cheaper 0-D and 1-D chemical reactors, providing a surrogate model that retains detailed chemistry at the cost of simplified flow physics. However, their development remains a challenge, often requiring either expert analysis, or automated approaches that sacrifice accuracy. In this work, we develop an automated machine-learning-assisted framework for constructing ERNs of the Sandia-D turbulent methane/air flame. Principal component analysis is first used to reduce high-dimensional thermochemical computational fluid dynamics (CFD) data to a low-dimensional latent space, where k-means clustering identifies physically interpretable flame regions used to initialize a reactor-network graph. This initialization is then refined using finite-difference gradient descent wrapped around non-differentiable Cantera reactor simulations. Across 30 RANS simulations spanning a range of pilot temperatures and inlet methane compositions, the optimized 7-reactor ERN achieves a maximum-temperature $R^2$ score of 0.7945 while preserving a $\sim6000\times$ speedup over the CFD solver. Outlet CO prediction remains more challenging, with a final $R^2$ score of $-0.4183$, but improves substantially from the unoptimized clustering initialization. These results show that unsupervised thermochemical feature extraction can provide effective physics-informed initializations for ERN construction, while gradient-based refinement can significantly improve predictive accuracy without manual reactor-network design.

08.
Nature Medicine 2026-06-08

Effects of SGLT2 inhibition on incident heart failure in carriers of cardiomyopathy-associated genetic variants

Although the beneficial effects of sodium–glucose cotransporter 2 (SGLT2) inhibition in heart failure (HF) have been well established, it is unknown whether SGLT2 inhibition confers benefit in carriers of rare variants in cardiomyopathy-associated genes. Here we evaluated whole-exome sequencing data from the randomized DECLARE-TIMI 58 trial, in which adults with type 2 diabetes and increased cardiovascular risk were randomized to dapagliflozin or placebo treatment. Pathogenic or likely pathogenic variants (P/LP) in high-confidence cardiomyopathy genes were identified, and treatment effects on hospitalization for HF (HHF) were compared between carriers of such variants and noncarriers. Among 12,685 patients for whom sequence data were obtained, 121 carried a cardiomyopathy variant (76 dilated cardiomyopathy, 25 hypertrophic cardiomyopathy and 25 arrhythmogenic cardiomyopathy). Over a median follow-up of 4.2 years, dapagliflozin lowered the risk of HHF more strongly in carriers (hazard ratio 0.18, 95% confidence interval 0.04–0.86) than in noncarriers (hazard ratio 0.70, 95% confidence interval 0.57–0.86; P interaction 0.03). Absolute risk reduction was 13.0% in carriers and 1.0% in noncarriers (P interaction 0.03). Most carriers (82%) had no prior HF, and in carriers without prior HF, treatment with dapagliflozin reduced the absolute risk of HHF by 12.8%, compared with a reduction of 0.6% in noncarriers (P interaction 0.01). The findings from this cohort of older and high-risk patients raise the possibility that SGLT2 inhibitor treatment should be started early to prevent HF in individuals who carry P/LP cardiomyopathy variants. These results need to be confirmed in a prospective, dedicated trial of preventive HF treatments in carriers of P/LP cardiomyopathy-associated variants. In a whole-exome sequencing analysis, the beneficial effects of the SGLT2 inhibitor dapagliflozin in reducing the risk of future heart failure hospitalization in individuals with type 2 diabetes were markedly greater in individuals who carried a cardiomyopathy-associated genetic variant compared with noncarriers, suggesting a personalized preventative therapy based on genetic information.

09.
arXiv (CS.CV) 2026-06-15

Rendering-Aware Sparse Sampling for BRDF Acquisition

Accurate BRDF acquisition is essential for realistic rendering, but dense gonioreflectometer measurements are slow and expensive. We study how to select a small set of BRDF measurements that is most informative for reconstructing material appearance under a learned BRDF prior. Existing sparse-acquisition methods often optimize samples for BRDF-space reconstruction for all materials, while the perceptual importance of a adaptive measurement ultimately depends on its effect on each rendered appearance. We therefore formulate sparse adaptive acquisition as a rendering-aware optimization problem. Our method combines a set encoder for sparse coordinate–value observations, a pretrained hypernetwork-based/PCA-based BRDF reconstructor, and a differentiable renderer. During sampler training, the reconstructor remains fixed, and gradients from a rendered-image loss optimize the measurement locations. This separates acquisition design from prior fitting and encourages the sampler to choose directions that are informative under the learned material distribution. To make the comparison controlled, we evaluate the uniform baseline, meta-learning method, HyperBRDF method, and our learned sampler under matched sample numbers, train/test split, rendering scene, object mask, image mapping, and metrics. Our central claim: rendering-aware sampling improves extremely sparse BRDF acquisition when final rendered appearance is the target. BRDF-space and combined losses are reported only as ablations, together with joint refinement and image-only latent fitting for unseen materials.

10.
arXiv (quant-ph) 2026-06-25

The Cost of Removing Tunability in Quantum Data Re-Uploading

arXiv:2606.25598v1 Announce Type: new Abstract: Fixed encoding data re-uploading quantum circuits provide a striking example of universality emerging from a highly constrained architecture. However, universality alone is insufficient for assessing the theoretical and practical value of fixed and tunable upload circuits. The resource cost of removing tunability remains poorly understood. In this work, we establish quantitative depth-error scaling for approximating tunable upload circuits with fixed upload circuits. We show that a tunable upload circuit can be approximated by a fixed upload circuit using depth \( D = O_\sigma\!\left[(\log(1/\varepsilon))^\sigma\right] \) for every \(\sigma>1\), with a target dependent constant overhead, thereby improving the previously known polynomial dependence on \(1/\varepsilon\) with the same overhead. Our proof is based on an auxiliary extension approximation mechanism that combines Gevrey class construction, Jackson's theorem and generalized quantum signal processing theorem. Thus, the expressive power lost by removing tunability can be recovered using only polylogarithmic growth in circuit depth with a target dependent constant overhead. We further identify a periodic mismatch obstruction intrinsic to fixed upload approximations and use Turán-Nazarov inequalities to prove logarithmic lower bounds \( D = \Omega(\log(1/\varepsilon)) \) for the approximation of mismatch class target tunable upload circuits. Conceptually, our analysis reveals two structural mechanisms underlying approximation in fixed upload architectures: auxiliary extensions and mismatch obstructions. These results provide a quantitative understanding of how expressivity is transferred from tunable frequencies into circuit depth, and suggest a broader framework for studying approximation complexity in quantum signal processing and related quantum learning models.

11.
arXiv (CS.AI) 2026-06-19

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

arXiv:2606.20502v1 Announce Type: cross Abstract: Whether LLMs scoring well on vulnerability benchmarks genuinely reason about security or merely pattern-match on contaminated data remains unresolved. We present CWE-Trace, a framework for LLM vulnerability detection built from 834 manually curated Linux kernel samples spanning 74 CWEs. The framework enforces a strict temporal split (pre-2025 historical set / post-cutoff leakage-free set), preserves context-aware vulnerable–patched pairs, and introduces two diagnostic metrics: the Directional Failure Index (DFI) and Hierarchical Distance and Direction (HDD). We evaluate eight vanilla LLMs and 15 LoRA fine-tuned variants across non-targeted detection, targeted detection, and CWE classification. Our analysis yields two key results. First, data contamination provides no measurable advantage. Function-level analysis shows that 84% of nominally contaminated samples carry no usable memorization signal: vulnerable functions are absent or cross-mapped across datasets, and ~31% of contaminated samples carry CWE misclassification. Second, backbone directional priors dominate fine-tuning. Models exhibit stable, systematic failure modes (DFI ranging from -85.5 to +94.8 pp) that persist from historical to post-cutoff data and resist correction. Fine-tuning shifts the output threshold without changing the decision policy. This is calibration without comprehension: output distributions adapt to training data while the underlying security reasoning remains absent. The weakest backbone at binary detection (DeepSeek-R1) gains the most in coarse CWE classification, revealing that detection and understanding are decoupled capabilities. The best detection score reaches only 52.1% (+2.1 pp above chance); exact CWE ranking remains below 1.3% Top-1 accuracy, confirming that current LLMs lack reliable security reasoning for systems software, regardless of fine-tuning strategy.

12.
arXiv (CS.CV) 2026-06-25

MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation

Synthesizing a novel-view video from a monocular reference video along a target camera trajectory requires both geometric consistency and motion fidelity with respect to the reference video. Existing methods based on explicit 3D representations are limited by the accuracy of off-the-shelf reconstruction modules, which often produce inaccurate geometry for dynamic objects in monocular videos. In contrast, camera-conditioning-only methods can achieve high visual quality but often struggle to preserve geometric and motion consistency. In this work, we introduce MVTrack4Gen (Multi-View point Tracking for Novel-View Generation), a motion-aware training framework that leverages multi-view point tracking as an additional geometric and motion supervision signal for camera-conditioning-only novel-view video diffusion models. Our key finding is that specific attention layers encode strong correspondence cues, where query features attend to key features at geometrically corresponding locations across views and over time, and the misalignment of these correspondences causes motion inconsistency. Based on this observation, we route these features into an auxiliary multi-view tracking head and jointly train the diffusion model with a point-tracking objective. By explicitly strengthening these motion-aware correspondences, MVTrack4Gen improves existing models to better follow the motion in the reference view and maintain cross-view geometric consistency. Across diverse benchmarks, our method achieves state-of-the-art geometric consistency and competitive camera accuracy.

13.
medRxiv (Medicine) 2026-06-19

Cardiometabolic multimorbidity and care experiences in primary healthcare among Brazilian adults aged 50 and over (ELSI-Brazil)

Background: Population aging and the rising burden of non-communicable diseases have increased the prevalence of cardiometabolic multimorbidity (CM-MM) among older adults. Patient-reported experience measures (PREMs) are recognized as essential components of healthcare quality assessment, yet evidence on primary care experiences among individuals with CM-MM remains scarce. Objective: To analyze primary care experiences according to the presence of cardiometabolic multimorbidity among Brazilians aged 50 years and older. Methods: Cross-sectional study using data from the second wave of the Brazilian Longitudinal Study of Aging (ELSI-Brazil, 2019-2021; n = 9,949). CM-MM was defined as the self-reported coexistence of two or more of the following conditions: hypertension, diabetes mellitus, dyslipidemia, acute myocardial infarction, and stroke. Primary care experiences were assessed using a validated 12-item instrument organized into four domains: first-contact access, longitudinality, communication, and care coordination. Associations were estimated using Poisson regression adjusted for sociodemographic, health conditions, and healthcare utilization variables, with stratified analysis by Family Health Strategy (FHS) coverage. Results: CM-MM prevalence was 25.5%, with a progressive increase by age and an inverse gradient by education. Individuals with CM-MM reported significantly more positive experiences in longitudinality (mean index 2.53 vs. 2.34; adjusted PR = 1.22; 95%CI 1.12-1.33; p < 0.001) and, to a lesser extent, in communication (mean index 2.68 vs. 2.58; adjusted PR = 1.10; 95%CI 1.00-1.20; p = 0.041). No statistically significant differences were found in first-contact access or care coordination. After stratified by FHS coverage, the observed differences in longitudinality and communication were no longer statistically significant. Conclusions: CM-MM was associated with more positive primary care experiences in longitudinality and communication. The absence of differentiated experiences in first-contact access and coordination highlights structural gaps in primary care responsiveness to individuals with greater clinical complexity. Keywords: Multimorbidity; Cardiometabolic diseases; Primary Care; Patient-reported experience measures; Older adults; ELSI-Brazil.

14.
arXiv (CS.CV) 2026-06-16

Teacher-Student Structure for Domain Adaptation in Ensemble Audio-Visual Video Deepfake Detection

The rapid advancement of generative AI models is leading to more realistic deepfake media, encompassing the manipulation of audio, video, or both. This raises severe privacy and societal concerns. Numerous studies in this area have yielded promising intra-domain results; however, these models frequently exhibit decreased efficacy when faced with data from dissimilar domains. Consequently, recent deepfake detection approaches focus on enhancing the generalization ability through multiple techniques that incorporate all input modalities, including audio, images, and their interactions. In this regard, we propose the EAV-DFD method, a generalized deep ensemble audio-visual model (EAV-DFD) combined with a domain adaptation mechanism utilizing a teacher-student framework to enhance the model's ability to perform and generalize effectively across unseen domains. To evaluate the model's performance, we used the FakeAVCeleb dataset as the primary domain and the DFDC, Deepfake_TIMIT, and PolyGlotFake datasets as an unseen domain. Our experimental results demonstrate that the proposed framework is efficient in domain adaptation, improving AUC performance of the model by 4.09%, 17.94%, and 0.5% on three unseen datasets, using only a small portion of them to train the student model. This leads to a novel deepfake detection model capable of adapting to new domains and interpreting which modality has been manipulated, highlighting the potential of our approach for real-world applications.

15.
arXiv (math.PR) 2026-06-19

Critical parameters of germ-monotone families of branching random walks

arXiv:2602.21062v2 Announce Type: replace Abstract: We introduce a broad class of families of branching random walks on a countable set $X$, which we refer to as germ-monotone branching random walks (GMBRWs). The processes in each family are parametrized by a positive parameter $\lambda>0$, which controls the overall reproductive speed, and they are monotonically increasing in $\lambda$ with respect to the germ order, a notion that extends classical stochastic domination. This framework encompasses a wide range of models, including classical continuous-time branching random walks, as well as discrete-time counterparts of certain non-Markovian processes such as ageing branching random walks. We define a general notion of critical parameter $\lambda(A)$ associated with each subset $A \subseteq X$, which serves as a threshold separating almost sure extinction in $A$ from positive probability of survival in $A$. This unifies and extends the classical global and local critical parameters $\lambda_w$ and $\lambda_s$, which can be recovered as special cases. We then investigate how modifications of the reproduction laws, either on a finite set or on a more general subset of $X$, affect these critical parameters. Our results extend earlier contributions in the literature.

16.
arXiv (CS.LG) 2026-06-16

Semantic DLM+: Improving Diffusion Language Models through Bias-variance Trade-off in Transition Kernel Design

arXiv:2606.15327v1 Announce Type: new Abstract: Diffusion Language Models (DLMs) have demonstrated strong scaling capacity as alternatives to autoregressive language models. However, their performance is highly sensitive to the choice of transition kernels, and poorly designed kernels can lead to issues like training instability, slow convergence, and biased sampling. In this paper, we study this sensitivity through a principled analysis of generalization error and identify three critical factors: asymptotic bias (difficulty in approximating the posterior distribution), exposure bias (error propagation during sampling), and optimization variance induced by kernel dispersion. We further compare different transition kernels: masking diffusion yields sparse and easier posterior-approximation targets, while uniform diffusion provides stronger sampling-side repair but induces harder approximation. Motivated by this trade-off, we revisit a previously overlooked variant, semantic DLM (SemDLM), where the transition kernel corrupts tokens to neighborhoods that are semantically similar. Our theory suggests that SemDLM can serve as a plausible middle ground by reducing the posterior approximation difficulty of uniform diffusion while retaining repair ability. However, we find that SemDLM suffers from a semantic basin problem, where sampling repeatedly stays within a semantic region and produces low-diversity text. To address this, we propose SemDLM+, which adds a global transition and a semantic-frequency penalty during sampling. Experiments on LM1B and OpenWebText show that SemDLM+ improves training dynamics and achieves competitive language modeling and generation quality with satisfactory diversity.

17.
arXiv (CS.LG) 2026-06-24

Natural Identifiers for Privacy and Data Audits in Large Language Models

arXiv:2606.24408v1 Announce Type: new Abstract: Assessing the privacy of large language models (LLMs) presents significant challenges. In particular, most existing methods for auditing differential privacy require the insertion of specially crafted canary data during training, making them impractical for auditing already-trained models without costly retraining. Additionally, dataset inference, which audits whether a suspect dataset was used to train a model, is infeasible without access to a private non-member held-out dataset. Yet, such held-out datasets are often unavailable or difficult to construct for real-world cases since they have to be from the same distribution (IID) as the suspect data. These limitations severely hinder the ability to conduct scalable, post-hoc audits. To enable such audits, this work introduces natural identifiers (NIDs) as a novel solution to the above-mentioned challenges. NIDs are structured random strings, such as cryptographic hashes and shortened URLs, naturally occurring in common LLM training datasets. Their format enables the generation of unlimited additional random strings from the same distribution, which can act as alternative canaries for audits and as same-distribution held-out data for dataset inference. Our evaluation highlights that indeed, using NIDs, we can facilitate post-hoc differential privacy auditing without any retraining and enable dataset inference for any suspect dataset containing NIDs without the need for a private non-member held-out dataset.

18.
arXiv (CS.AI) 2026-06-24

Data Scale, Not Latency, Shapes Cross-Lingual Encoder Transfer in Streaming ASR

作者:

arXiv:2606.24169v1 Announce Type: new Abstract: Adapting a streaming speech recognition model to a new language requires choosing between two plausible warm starts: a multilingual (ML) encoder or an English-only (EN) encoder. The common intuition is that the multilingual encoder should help most at low data, but it is unclear how long that advantage persists, whether tight streaming latency amplifies it, and whether it survives deployment quantization. We answer these questions with a controlled sweep of a 0.6 B-parameter cache-aware FastConformer transducer across eight European languages, up to five target-language data scales (100 h to 2500 h), three streaming tiers plus offline decoding, and up to four public test sets. The main result is that multilingual initialization is a data-limited advantage, not a latency-limited one. On FLEURS at 160 ms, the mean EN-ML word error rate (WER) gap falls from +4.21 percentage points (pp) at 100 h to +0.20 pp at 2500 h; a power-law fit summarizes this decay, with each doubling of target-language data roughly halving the remaining advantage. Across the three streaming tiers, the across-language mean EN-ML gap is approximately stable at each scale from 100 to 1000 h, and is near zero by 2500 h. Finally, 4-bit weight-only encoder quantization at the matched 560 ms streaming tier reduces the encoder footprint by about 3x, with an average FLEURS WER increase of about 0.5 pp. The resulting guideline is simple: use multilingual initialization in low-data regimes, treat the choice as effectively irrelevant at large data, and make latency and quantization decisions independently.

19.
PLOS Computational Biology 2026-06-15

A multilevel hierarchical framework for quantification of experimental heterogeneity in population snapshot data

by David J. Warne, Xiangrun Zhu, Thomas P. Steele, Stuart T. Johnston, Scott A. Sisson, Matthew Faria, Ryan J. Murphy, Alexander P. Browning Biological systems exhibit substantial heterogeneity: that is, variation in specific characteristics of individuals within a population. As a result, it is of critical importance to appropriately account for biological heterogeneity when calibrating mathematical models to infer cellular processes and predict behaviour. Recent approaches consider ordinary differential equations with random parameters to quantify heterogeneity in dynamical processes of cells. In this setting, statistical inference is performed to characterise the distribution of these random parameters within a cell population. One significant limitation of this approach is the tacit assumption that there are no substantial deviations in these distributions across experimental replicates. In this work, we propose a flexible Bayesian hierarchical differential equation modelling framework that quantifies and distinguishes both inter-experimental heterogeneity (heterogeneity between experimental replicates) and intra-experimental heterogeneity (biological heterogeneity within replicate populations). We consider two recent studies that employ mathematical models to interpret flow cytometry snap-shot data and quantify heterogeneity in nano-particle cell interactions and cell internalisation processes. Using simulation data, we demonstrate that substantial inaccuracy in the inferred dynamics can arise when experimental heterogeneity is not accounted for. By contrast, our hierarchical approach is robust to variability in inter-experimental and intra-experimental heterogeneity and our method simplifies to previous methods when inter-experimental heterogeneity is negligible. Our approach is flexible and widely applicable to applications involving replicate populations and snapshot data. We provide open-source implementations of our methods on GitHub.

20.
arXiv (CS.AI) 2026-06-19

Creativity Reconsidered: Generative AI and the Problem of Intentional Agency

arXiv:2601.15797v2 Announce Type: replace Abstract: Many theorists maintain that conscious intentional agency is a necessary condition of creativity. We argue that this requirement, which we call the Intentional Agency Condition (IAC), should be abandoned. We motivate this by highlighting the problems this criterion encounters in the face of recent advances in generative AI, which is ostensibly creative despite being incapable of intentional agency. We present two corpus analyses to illustrate the rapidly increasing tendency of people to predicate creativity to generative AI. In response to this predicament, theorists of creativity have proposed a range of conflicting solutions, which we critically evaluate. We find that none of these satisfyingly resolves the initial predicament, and we therefore propose a novel approach. Our claim is that ascriptions of creativity are dependent on what we call creative ability. This solution explains why intentional agency is important for judgements of creativity, without being a necessary condition. Our approach thereby accommodates AI creativity without dismissing the intuition that perceived intentions are of key importance for ascriptions of creativity.

21.
arXiv (CS.AI) 2026-06-25

AeroCast: Probabilistic 3D Trajectory Prediction for Non-Cooperative Aerial Obstacles via Transformer-MDN Architecture

arXiv:2606.25122v1 Announce Type: cross Abstract: Autonomous aerial vehicles operating in shared airspace must predict the future positions of non-cooperative obstacles to plan evasive maneuvers before a collision becomes unavoidable. Unlike cooperative systems that share intent, non-cooperative obstacles such as birds, uncontrolled drones, or debris exhibit multi-modal motion that deterministic predictors cannot adequately represent. Existing methods either rely on recurrent encoders that propagate temporal information sequentially, limiting their ability to capture long-range kinematic precursors of maneuver initiation, or produce point forecasts that provide no distributional information to downstream planners. This paper presents AeroCast, a probabilistic trajectory prediction framework that combines a Transformer encoder with a Mixture Density Network output head to predict per-timestep Gaussian mixture distributions over future three-dimensional displacements. A translation-invariant consecutive displacement encoding and a calibration-oriented training objective address the input design and mode-degeneracy challenges specific to mixture-based aerial trajectory prediction. On a hybrid real-and-synthetic quadrotor corpus spanning nine motion categories, AeroCast reduces Average Displacement Error and Final Displacement Error by approximately 50% relative to the baselines over a five-second horizon, and achieves the lowest negative log-likelihood and Continuous Ranked Probability Score among all compared methods. Ablation analysis identifies velocity input and model capacity as the primary contributors to prediction quality, and positional encoding as essential for long-horizon trajectory coherence. AeroCast inference completes in 0.1ms per sample, compatible with real-time onboard deployment at 100Hz.

22.
medRxiv (Medicine) 2026-06-24

Uncovering the fitness of endemically circulating Zika virus strains

Zika virus (ZIKV) is an arbovirus that usually causes few symptoms and has circulated endemically in Asia for decades. However, a large outbreak in South America in 2015 uncovered the serious risk of congenital Zika syndrome in infants born from ZIKV infected mothers. It is unknown whether a lineage with distinct pre-existing fitness advantage emerged from Asia to cause the South American outbreak, and whether there is ongoing evolution that can result in future globally fit strains. Here we used 107 sequences from a single setting (Thailand) collected over an 18 year period (2006-2023). We used novel analytical tools to identify distinct lineages that have circulated in the population and estimated their relative epidemiological fitness. We found there have been six lineages circulating sequentially in the country, with regular emergence and replacement of lineages showing higher fitness than their predecessors. We identified 15 lineage-defining amino acid changes, including four well-documented fitness-enhancing mutations, and two UTR substitutions. The lineage that emerged in South America was evolutionarily linked to the highest-fitness lineage in Thailand, carrying seven of our lineage-defining substitutions acquired during endemic circulation there, and subsequently accumulating four additional changes. After the global pandemic, endemic ZIKV in Thailand continued to evolve, with newly emerged lineages showing novel mutations and increased fitness. Our findings have key implications for the monitoring of ZIKV and can help identify the pathway to increased transmissibility of this globally important pathogen.

23.
arXiv (CS.CV) 2026-06-16

RGFVR: Reference-Guided Face Video Restoration with Flow Matching

Face video restoration from degraded observations is challenging, as it requires simultaneously recovering visual fidelity, temporal consistency, and subject identity. Existing approaches are often either reference-free, which can lead to identity loss when person-specific facial details are lost, or subject-specific, which limits generalization to unseen identities. We propose a subject-agnostic, reference-guided framework for identity-preserving face video restoration. Our method introduces bimodal perceptual-descriptive identity conditioning into a pretrained flow-based text-to-video generator and employs a two-stage training strategy to strengthen identity guidance during restoration. Experiments show that our approach improves restoration fidelity, temporal consistency, and identity preservation, achieving superior performance under challenging video degradations, including downsampling, blur, noise, and compression artifacts. The code is available under: https://github.com/batuhanntosun/RG-FVR.

24.
arXiv (CS.AI) 2026-06-25

Exploring Information Seeking Agent Consolidation

arXiv:2602.00585v2 Announce Type: replace Abstract: Information-seeking agents have emerged as a powerful paradigm for knowledge-intensive tasks, yet today's systems remain specialized for the open web, documents, or local knowledge bases, hindering scalable and cross-domain deployment. We present the first systematic empirical study of consolidating these information-seeking agents into a single foundation agentic model. We compare two paradigms – data-level mixing, which trains a unified model on a mixture of datasets, and parameter-level merging, which merges independently trained experts in parameter space – across 3 training scenarios, evaluating 26 representative parameter-level methods on 10 benchmarks. To compare across heterogeneous benchmarks, we introduce a geometric Composite Score and an Imbalance Score that describe overall performance and task skew. Our analysis shows that (i) well-designed parameter-level merging attains parity with data mixing at a fraction of its training cost and is order-agnostic; (ii) parameter-level merging structurally preserves out-of-domain capabilities that data mixing universally forgets; and (iii) cross-scenario stability is strongly tied to consolidation quality. We distil our observations into a method-selection guide and design principles for next-generation merging operators.

25.
arXiv (CS.CL) 2026-06-17

Beyond Native Success: Auditing Deployment-Interface Exposure of CLIP Backdoors

Contrastive Language-Image Pre-training models are widely reused across downstream interfaces, including feature extraction, retrieval, reranking, and selection. Existing CLIP backdoor, however, usually validate attacks on a small attack-native task, leaving unclear whether the same poisoned checkpoint remains exposed, weakens, or becomes not applicable when reused through other interfaces. We introduce DIFE, a Deployment-Interface Footprint Evaluation framework that audits backdoored CLIP checkpoints across deployment interfaces. DIFE makes various evaluations comparable by specifying each interface's component readout, trigger channel, target event, reference condition, and metric. DIFE also introduces effective-footprint diagnosis to identify the reusable CLIP component or component combination that carries exposure and explains where risk transfers. Auditing reproduced CLIP backdoors with DIFE reveals a structured landscape: native success is not a checkpoint-level risk certificate, exposure follows component footprints, text-side poisoning does not yield textual-encoder control, and some coupled attacks remain mechanism-bound. This audit reveals a import gapin existing CLIP backdoors: a textual encoder that itself becomes a reusable carrier of adversarial behavior. We therefore introduce BadTextTower to fill this gap. BadTextTower produces strong text-conditioned retrieval, reranking, and selection exposure while leaving visual-only reuse nearly clean.