Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-16

Dealing with Annotator Disagreement in Hate Speech Classification

Hate speech detection is a crucial task, especially on social media where harmful content can spread quickly. Collecting social media content (tweets etc.) to train machine learning models is easy, but detecting and categorizing hate speech can be difficult due to the inherently subjective nature. This subjectivity leads to frequent disagreement among annotators, particularly for subtle or borderline content. Traditional approaches either discard non-consensus samples or force a ''gold standard'' through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. We examine the largely overlooked problem of annotator disagreement in hate speech classification and evaluate a range of aggregation methods, including majority voting, ordinal strategies (minimum, maximum, and mean), and analyze their impact across binary, 4-class, and 6-class classification tasks. In addition, we leverage annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches. Among others, we show that filtering non-consensus samples results in over-optimistic results and that the perceived strength provides a complementary signal that enhance classification performance. Finally, we establish new state-of-the-art results for hate speech detection in Turkish tweets, and demonstrate that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems.

02.
medRxiv (Medicine) 2026-06-15

Data-Driven Stochastic Model for Detecting Patientswith Alzheimer's Disease

Alzheimer s disease (AD) is a critical neurological disorder that causes the brain to shrink and leads to the eventual death of brain cells, adversely affecting a person s ability to function. AD is a fast-growing disease in the United States and was the fifth leading cause of death among Americans 65 years of age or older in 2023. In the United States 6.9 million people aged 65 or older were diagnosed with AD, along with a high rate of undiagnosed patients. Thus, the objective of our study is to develop a real data-driven predictive model to identify a patient with AD based on eight risk factors: Age, Gender, ADAS-Cog13, Entorhinal, Fusiform, Intracranial Volume (ICV), Amyloid-Beta, and Tau Protein, with a high degree of accuracy. The quality of the model was evaluated using well-established and sophisticated statistical measures: the area under the receiver operating characteristic curve, calibration plot, Hosmer-Lemeshow goodness-of-fit test, and K-fold cross-validation. If a patient is given information on the above risk factors, our proposed binary logistic regression model can classify the patient as having AD or not with at least 98% accuracy.

03.
arXiv (quant-ph) 2026-06-19

Quantum Kernels are Spectral Tensor Networks

arXiv:2606.20402v1 Announce Type: new Abstract: Quantum kernels admit Fourier representations whose frequencies are determined by the data-encoding gates of the underlying feature map. We show that entangling tensor kernels are matrix product operator factorizations of the corresponding Fourier coefficient tensors, thereby identifying quantum kernels as spectral tensor networks. By grouping gate-level frequency configurations that yield the same feature-wise frequency, we obtain a grouped Fourier form that induces a more compact spectral tensor network representation of the kernel. We further show that kernel target alignment serves as a bridge between the Fourier and tensor network views. On a grid that resolves the accessible Fourier modes, it becomes the Frobenius cosine similarity between Fourier coefficient tensors. Our numerical experiments show that layered quantum kernels admit accurate representations with small bond dimension, revealing a compressibility governed by correlations between Fourier modes. This compressibility provides a diagnostic of classical representability and of whether kernel evaluation is likely to remain classically tractable.

04.
arXiv (CS.CL) 2026-06-11

MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation

Speech-based automatic estimation of depression levels is essential for enabling early detection and timely intervention, particularly in resource-constrained mental health settings. In recent years, deep learning has demonstrated impressive success across various domains, including affective computing and mental health assessment. Most existing approaches rely on RNN-based architectures (such as LSTM and GRU) to model temporal information for depression estimation. However, the extracted features often emphasize only a few adjacent speech segments, limiting their ability to capture long-range dependencies. To overcome this limitation, we introduce a memory-based feature augmentation method that enhances the representational capacity of GRU-extracted features. Rather than indiscriminately incorporating historical data, our memory bank is designed to selectively integrate two types of components in order to reduce redundancy and irrelevance: (1) historical temporal features that closely resemble the current GRU output, offering complementary contextual information; and (2) dynamic memory features identified based on feature variability, which capture behavioral and emotional fluctuations indicative of depressive symptoms. To effectively fuse the memory-augmented features with GRU outputs, we further design a Hierarchical Attention Fusion (HAF) module. Our method is evaluated on the widely used DAIC-WOZ and E-DAIC datasets, achieving state-of-the-art performance.

05.
arXiv (CS.AI) 2026-06-12

Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach

arXiv:2606.13192v1 Announce Type: new Abstract: User experience (UX) centered on usability, perceived consistency, and functional clarity is fundamental to real-world user interfaces (UI). The application of multimodal large language models (MLLMs) in the field of user interfaces is evolving rapidly, such as visual element grounding, graphical user interface (GUI) agents, and design-to-code generation. However, research efforts on evaluating UX based on UI screenshots are still immature. To address this, we propose UXBench, a novel multimodal benchmark consisting of 2,000 VQA data samples designed to assess MLLMs' ability to perform UI-based reasoning. UXBench includes 8 tasks based on real-world UI screenshots that require fine-grained diagnosis of UX issues across layout relationships, visual hierarchy, and content consistency. Our extensive evaluation of mainstream MLLMs shows that they remain fundamentally limited in their capacity for UI-based reasoning. The results underscore the need for further advancements in this area. To bridge this gap, we propose UI-UX, an MLLM based on Qwen3-VL-4B-Thinking foundation model and enhanced via reinforcement learning with two key innovations: a reward routing mechanism that dynamically balances perceptual understanding and logical reasoning during inference, and an asymmetric transition reward that suppresses redundant or insufficient reasoning steps. Experiments demonstrate that UI-UX achieves state-of-the-art (SOTA) performance on UXBench, attaining an accuracy of 0.7963 – surpassing Claude-4.5-Sonnet's 0.6550 – while exhibiting strong generalization across diverse UI tasks and maintaining low inference latency.

06.
arXiv (CS.LG) 2026-06-18

Spatiotemporal downscaling and nowcasting of urban land surface temperatures with deep neural networks

arXiv:2605.13566v2 Announce Type: replace Abstract: Land Surface Temperature (LST) is a key variable for various applications, such as urban climate and ecology studies. Yet, existing satellite-derived LST products provide either high spatial or high temporal resolution, resulting in a fundamental trade-off between the two. To address this trade-off, we combine observations from a geostationary and a polar orbiting satellite and provide LST fields at high spatial and high temporal resolution (1 km at 15-min intervals). We demonstrate their application for intraday forecasting of LSTs. To estimate LST fields at high spatiotemporal resolution, a U-Net model is trained to map LST fields from SEVIRI/MSG (3 km and 15 min resolution) to LST fields from Terra/Aqua MODIS (1 km, 4 overpasses per day) that are collocated in space and time. The presented model has been trained on LSTs across large European cities with a population exceeding 1 million inhabitants, and achieves an RMSE = $1.92${\deg}C and near-zero bias MBE = $0.01${\deg}C on the hold-out test set. As a second step, we present an LST nowcasting model based on ConvLSTM architecture, trained across downscaled LST fields with forecast lead times of 15 to 75 minutes. The nowcasting model outperforms a persistence and a Climatological Rolling Median benchmarks, with RMSEs of $0.57$ to $1.15${\deg}C for the considered lead times and biases ranging from $-0.1$ to $0.14${\deg}C. An additional validation conducted against independent MODIS overpasses confirms robust performance. Our LST forecast model at high spatiotemporal resolution is directly applicable to operational satellite-based LST monitoring.

08.
arXiv (CS.CL) 2026-06-17

LVLMs and Humans Ground Differently in Referential Communication

For generative AI agents to partner effectively with human users, the ability to accurately predict human intent is critical. But this ability to collaborate remains limited by a critical deficit: an inability to model common ground. We present a referential communication experiment with a factorial design involving director-matcher pairs (human-human, human-AI, AI-human, and AI-AI) that interact with multiple turns in repeated rounds to match pictures of objects not associated with any obvious lexicalized labels. We show that LVLMs cannot interactively generate and resolve referring expressions in a way that enables smooth communication, a crucial skill that underlies human language use. We release our corpus of 356 dialogues (89 pairs over 4 rounds each) along with the online pipeline for data collection and the tools for analyzing accuracy, efficiency, and lexical overlap.

09.
arXiv (CS.LG) 2026-06-12

Foundations of Practical Quantum Advantage in Quantum-Informed Machine Learning for Predicting Chaos

arXiv:2606.13422v1 Announce Type: cross Abstract: We develop theoretical foundations for a practical quantum-advantage mechanism in quantum-informed machine learning for chaotic dynamical systems. A family of k-indexed higher-order quantum statistical priors (Q-Priors) hosts the k-point marginal of the invariant measure on n_q = kq qubits, extending the single-site construction of prior work. We prove a two-stage advantage. In the representation stage, superposition and entanglement compactly store non-factorisable spatial correlations of the invariant measure on n_q qubits. In the extraction stage, joint Bell measurements on two copies estimate any post hoc Pauli functional with a copy-pair count independent of n_q, whereas any adaptive single-copy protocol for the corresponding full-Pauli read-out requires Omega(2^(n_q)) copies; this is a provable quantum-classical separation in copy-measurement complexity. The two-copy read-out is realised in simulation and on IQM superconducting processors. Two case studies instantiate the mechanism in workflows of independent scientific value: a turbulent channel-flow study in which the two-copy read-out yields a named non-diagonal correlator of the invariant measure (the velocity-direction coherence), and a medium-range weather forecasting workflow on the European Centre for Medium-Range Weather Forecasts ERA5 reanalysis in which the diagonal k

10.
arXiv (CS.LG) 2026-06-12

Epistemic Uncertainty Is Not the Reducible Kind

Authors:

arXiv:2606.12646v1 Announce Type: cross Abstract: The standard taxonomy of predictive uncertainty defines epistemic uncertainty as the part removable by collecting more data, while the standard measure identifies it with a mutual-information term. We prove the definition and the measure are extensionally inconsistent. On an explicit construction, the measure assigns all uncertainty to the epistemic class, yet no quantity of training data reduces it. Reducibility is instead a property of the pair (uncertainty, acquisition class), and the dichotomy resolves into three parts: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty. An exact identity for the value of an observation shows that in-distribution data never reduces mechanism-irreducible uncertainty and generically increases it. Ensemble disagreement, the deployed epistemic estimate, tracks the training procedure rather than the epistemic term. It collapses to zero beneath a positive truth under consistent training, and equals hyperparameter-scaled initialization noise under interpolation. A finite-sample falsification test and seed-swept experiments confirm the theory.

11.
arXiv (CS.AI) 2026-06-16

Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents

arXiv:2606.16769v1 Announce Type: new Abstract: Agent skills are commonly distributed as SKILL.md files: human-readable procedural documents that describe workflows, tools, resources, and domain conventions. While convenient for inspection and reuse, this design requires the same reusable procedure to be repeatedly injected into the runtime context. We propose Skill-to-LoRA(S2L), a behavior-centric skill representation that replaces runtime skill text with skill-specific LoRA adapters. Rather than compressing the skill document itself, S2L models the behavioral change induced by the skill text: offline, the complete SKILL.md is used to synthesize skill-guided demonstrations; online, the full document is omitted and the corresponding LoRA adapter is dynamically loaded to activate the learned skill behavior. We evaluate S2L with Qwen3.6-27B on a 21-skill subset of SWE-Skills-Bench. Compared with the no-skill and Full Skill Text baselines, S2L improves pass rate by 2.9 and 5.2 percentage points, respectively, while reducing per-step token cost by 6.6% relative to Full Skill Text prompting. S2L matches or improves Full Skill Text on 18/21 skills and the no-skill baseline on 15/21 skills. Control experiments further show that the gains depend on skill-specific adapter alignment: Wrong-LoRA and Shared-LoRA both reduce performance. These results suggest that many procedural agent skills can be converted from runtime instructions into trainable, dynamically loadable behavioral modules. Code will be released upon acceptance.

13.
arXiv (CS.CV) 2026-06-16

MapDream: Task-Driven Map Learning for Vision-Language Navigation

Vision-Language Navigation (VLN) requires agents to follow natural language instructions in partially observed 3D environments, motivating map representations that aggregate spatial context beyond local perception. However, most existing approaches rely on hand-crafted maps constructed independently of the navigation policy. We argue that maps should instead be learned representations shaped directly by navigation objectives rather than exhaustive reconstructions. Based on this insight, we propose MapDream, a map-in-the-loop framework that formulates map construction as autoregressive bird's-eye-view (BEV) image synthesis. The framework jointly learns map generation and action prediction, distilling environmental context into a compact three-channel BEV map that preserves only navigation-critical affordances. Supervised pre-training bootstraps a reliable mapping-to-control interface, while the autoregressive design enables end-to-end joint optimization through reinforcement fine-tuning. Experiments on R2R-CE and RxR-CE achieve state-of-the-art monocular performance, validating task-driven generative map learning.

14.
arXiv (CS.CV) 2026-06-16

DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based State Space Models for Document Image Binarization

Document image binarization aims to separate foreground text from degraded backgrounds while preserving thin, broken, and low-contrast strokes. Although deep learning methods have improved binarization performance, most existing approaches rely on convolutional, transformer-based, or generative architectures, while Mamba-based state space models remain largely unexplored for this task. In this work, we investigate Mamba-based feature propagation and observe that direct state-space propagation may dilute weak foreground cues during long-range modeling, especially faint ink traces, fragmented characters, and boundary-sensitive stroke details. To address this problem, we propose DeepMine-Mamba, a Mamba-based binarization framework equipped with a novel Anti-Dilution Gate that estimates propagation-induced feature changes and selectively restores stroke-sensitive local responses while suppressing unnecessary background enhancement. Experiments on DIBCO/H-DIBCO benchmarks under a strict leave-one-year-out protocol show that DeepMine-Mamba achieves competitive overall performance, with strong average FM and Fps across benchmark years. Ablation results further show that the Anti-Dilution Gate is the key component for mitigating propagation-induced foreground dilution and improving stroke preservation.

15.
arXiv (CS.LG) 2026-06-11

GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators

arXiv:2606.08343v2 Announce Type: replace Abstract: We introduce GENERIC-FNO, the first neural operator to embed the full GENERIC (metriplectic) structure of nonequilibrium thermodynamics – reversible, energy-conserving dynamics and irreversible, entropy-producing dynamics coupled through the degeneracy conditions – directly in function space. Existing structure-preserving neural operators enforce at most a single conservation law or reversible (Hamiltonian) structure, while thermodynamically consistent learning has been confined to finite-dimensional, graph, or particle systems. GENERIC-FNO closes this gap: it learns the energy and entropy functionals as neural operators and parameterizes the Poisson and friction operators as diagonal Fourier multipliers sandwiched between rank-one projections that enforce the degeneracy conditions exactly, by construction, with no penalty term, update projection, or residual. The degeneracy identities hold to machine precision (residuals ~10^-13) for any initialization, dimension, or resolution, so the continuous-time dynamics conserve the learned energy and produce entropy exactly; the explicit time stepping adds only a small O(dt^2) drift (per-step residual ~10^-6). We further note that the (E,S,L,M) decomposition of a given flow is not unique, and introduce a gauge-invariant dissipation diagnostic separating reversible from dissipative dynamics independently of the learned functionals. Across three operator backbones (1D/2D FNOs and DeepONet) and four PDEs spanning reversible, dissipative, and mixed regimes, GENERIC-FNO preserves its exact structural guarantees zero-shot across a 4x super-resolution range (64 to 256), recovers the ground-truth ordering of physical dissipation, and is competitive with strong unconstrained and energy-penalized baselines, outperforming them on several dissipative and mixed problems at comparable or fewer parameters.

16.
arXiv (CS.CL) 2026-06-18

UniECG: Understanding and Generating ECG in One Unified Model

Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step toward interactive ECG education. UniECG supports two complementary learning interactions: given an ECG signal or image, it generates an evidence-based explanation; given a textual learning objective, it generates a corresponding ECG signal example for case-based learning. The model follows a two-stage design. First, it learns grounded ECG explanation from ECG signal–image–text data. Second, it introduces special ECG generation tokens and aligns their hidden representations with a pretrained text-conditioned ECG diffusion model, enabling controllable signal-level ECG generation. We evaluate UniECG through grounded ECG explanation and generation-oriented qualitative analysis, examining its potential to support explanation and case-based learning. UniECG is intended as an educational aid and a research step toward interactive AI-assisted ECG learning, rather than a clinically validated diagnostic system.

17.
arXiv (CS.LG) 2026-06-16

Model Stealing Through the Lens of Model Multiplicity

arXiv:2606.15493v1 Announce Type: new Abstract: Model stealing attacks, where adversaries create high-fidelity surrogate models, are a significant threat to the intellectual property of machine learning services. Conventional wisdom suggests these surrogates could provide adversaries with economic leverage comparable to the original service providers. This paper challenges this assumption by evaluating model stealing attacks beyond mere fidelity to the target model. Because query-based extraction provides only partial supervision of the target's input-output behavior, the surrogate is not uniquely identified: many near-optimal surrogates can achieve comparable fidelity while differing in deployment-relevant properties. Instead of performing a classic learning-based model stealing attack, we compute the Rashomon Set (i.e., the set of almost-equally-accurate models) of surrogate models, and evaluate its diversity using multiplicity metrics (ambiguity, discrepancy, and Rashomon Capacity) and group fairness metrics. Across tabular, medical imaging, and NLP tasks, our experiments on real-world datasets reveal that despite exhibiting similar fidelity to the target model, surrogate models can display significant variances in other critical performance metrics. These findings cast doubt on the presumed equivalence between high-fidelity surrogates and the target model in practical deployment scenarios.

18.
arXiv (CS.LG) 2026-06-16

If These Walls Could Talk: Critical Play with Large Language Models in Museums

arXiv:2606.15565v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly being used in museums to as role playing chatbots which let visitors talk to simulated versions of people and artefacts from the past. While such installations can be playful and engaging, they are also problematic because LLMs cannot be trusted to speak truthfully. I identify a fundamental dilemma for the use of LLMs in museum chatbots: LLMs cannot be trusted to tell the truth, and efforts to make them more reliable may ruin that which is attractive about the bots in the first place - their ability to engage in life-like conversation. In response, I propose designing for critical play with LLM-based bots: Designing for playful interactions with bots that are unreliable but still able to represent the past in an adequate and engaging manner - as fictional characters representing historical narratives, styles of discourse, diverse perspectives, humor and satire.

19.
arXiv (CS.AI) 2026-06-17

Comprehensive pKa Data Augmentation from Limited Real Data through an Engineered Models-Quantum Framework

arXiv:2606.17077v1 Announce Type: cross Abstract: Proton dissociation constants (pKa) are critical for functional molecule discovery and molecular modeling. Building on iBonD, the largest experimental pKa database established, we and other researchers have developed several methods including machine-learning-based empirical prediction and high-accuracy energy calculations. Despite this foundation, the rapid augmentation of high-quality pKa data remains fundamentally constrained. As part of this work, we performed large-scale regression-based pKa prediction on unlabeled molecular datasets using a collection of extensively optimized machine-learning models. The results indicate that, since the feature distributions of unlabeled molecular datasets, the pKa data distribution approximates normality, with extreme scarcity of tail-region samples. Although such augmentation is highly valuable for improving overall data availability and predictive modeling, it remains insufficient for efficiently discovering molecules with broad-spectrum pKa properties. To address this, we explore the targeted generation of molecules with sparse pKa properties from the vast chemical space. Given that traditional continuous latent space VAE-RNN methods for molecular generation suffer from insufficient stability and fail to demonstrate clear advantages in complementing sparse data, we design and implement a quantum-assisted sparse-pKa molecular generation. Feasibility is validated on a simulated quantum annealer, and superior extreme-value sampling is further achieved on physical coherent Ising machines (CIMs). (to be continued)

20.
arXiv (CS.AI) 2026-06-11

OmniBioTwin: A System-of-Twinned-Systems Framework for Health Digital Twins

arXiv:2606.11264v1 Announce Type: cross Abstract: Health digital twins (HDTs) promise patient-specific modeling and decision support but current approaches remain structurally fragmented: monolithic models that address a single organ or task lack cross-scale fidelity, while system-level twins lack generalizable architectural frameworks. We propose OmniBioTwin, a System-of-Twinned-Systems (SoTS) framework that organizes HDTs as modular computational entities coupled through explicit interaction operators within a multi-layer network architecture. The framework comprises seven coordinated layers - spanning data integration, autonomous twin modeling, cross-scale coupling, temporal synchronization, and human-in-the-loop decision support. We demonstrate OmniBioTwin by instantiating a multiscale twin for glucagon-like peptide-1 (GLP-1) signaling pathways in Alzheimer's disease, illustrating how molecular, cellular, and organ-level twins can be composed and coupled within a unified system.

21.
arXiv (CS.CV) 2026-06-16

Trusting Right Predictions for Wrong Reasons: A LIME Based Analysis of Deep Learning Interpretability in Lung Cancer Diagnosis

Lung cancer is the leading cause of cancer-related mortality, with approximately 2.5 million new cases and 1.8 million deaths annually, making reliable diagnosis a clinical priority. Although deep learning models have achieved strong performance in lung cancer classification, evaluation has largely focused on predictive accuracy, leaving their decision-making processes insufficiently examined. This study compares three architecturally distinct models: a Convolutional Neural Network (CNN), a pretrained ResNet50, and a Vision Transformer (ViT), trained on the IQ-OTH/NCCD lung cancer CT dataset. Local Interpretable Model-Agnostic Explanations (LIME) were applied to investigate model reasoning. In addition to standard performance metrics, a dual-correlation framework was introduced to measure both prediction agreement and explanation agreement across model pairs. All three models achieved strong classification performance, with ResNet50 attaining 98.61% accuracy, CNN 97.91%, and ViT 93.75%, while all achieved ROC-AUC scores of 0.99. Prediction correlations exceeded 0.99 across all model pairs, indicating highly consistent outputs. However, LIME explanation correlations remained below 0.26, revealing substantial differences in the image regions used to reach those predictions. Analysis of misclassified samples further identified a consistent spatial pattern: incorrect predictions were associated with attention outside the lung parenchyma, whereas correct predictions focused primarily within lung regions. These findings demonstrate that prediction agreement is a poor proxy for reasoning consistency, and that interpretability evaluation must be treated as an independent validation criterion alongside predictive performance in clinical AI systems.

22.
bioRxiv (Bioinfo) 2026-06-10

ECMME: an atlas of selection pressures on the mammalian extracellular matrix reveals contrasting evolutionary dynamics

The extracellular matrix (ECM) is a fundamental metazoan innovation that provides structural support and regulatory cues essential for multicellular life. While core matrisome components are subject to strong functional constraints, their evolutionary dynamics at the molecular level remain incompletely characterized. Here, we present a comprehensive per-residue analysis of selection pressures across 272 human core matrisome proteins using high-quality orthologous sequences from up to 228 placental mammal species. We developed an automated pipeline integrating ortholog identification, codon-aware alignments, and site-specific selection analyses with the MEME and FUBAR methods from the HyPhy suite. Results reveal pervasive strong purifying selection across the matrisome, consistent with its structural and functional indispensability. This is accompanied by episodic positive selection and rarer pervasive positive selection, with collagens exhibiting significantly elevated episodic positive selection compared to glycoproteins and proteoglycans. To facilitate community access, we developed ECMME (ECM Molecular Evolution) browser, an intuitive open-access web resource that visualizes selection metrics plotted directly onto protein topologies. ECMME allows researchers to seamlessly browse and investigate the data, providing a powerful framework for interpreting functional sites. It is available online and requires no local installation or set-up (https://izzilab-ecmme.share.connect.posit.cloud/).

23.
arXiv (CS.CV) 2026-06-19

Cinematic Compositing Using Character-Environment-Harmonized Video Generation Models

Cinematic compositing aims to integrate green-screen characters into novel environments while maintaining physical and photometric realism. Previous methods often fail to capture the complex bidirectional interactions between characters and their surroundings, which we characterize as Character-to-Environment (C2E) physical interaction and Environment-to-Character (E2C) lighting harmonization. To address this, we propose an end-to-end video diffusion framework that jointly models C2E and E2C interactions, specifically handling the challenges of interactive props. Our approach introduces a tri-mask-guided architecture with RGB-D joint denoising to ensure physically consistent interactions among the character, props, and environment. We further develop an efficient prior-driven data curation pipeline to construct high-quality relighting pairs without expensive rendering. Finally, a reference-conditioned mechanism enables controllable environment synthesis and precise prop replacement. Extensive experiments demonstrate that our framework significantly outperforms existing methods in cinematic-quality dynamic video compositing.

24.
arXiv (quant-ph) 2026-06-12

Entanglement Detection by Approximate Entanglement Witnesses

arXiv:2402.14755v2 Announce Type: replace Abstract: The problem of determining whether a given quantum state is separable is known to be computationally difficult. We develop an approach to this problem based on approximations of convex polytopes in high dimensions. By showing that a convex polytope constructed from a finite number of hyperplanes approximates the Euclidean ball arbitrarily well in high dimensions, we find evidence that a finite set of approximate entanglement witnesses is potentially sufficient to determine the entanglement of a state with high probability.

25.
arXiv (CS.LG) 2026-06-11

Bernstein-Schur Kernels: Random Features by Sketched Modulation and Radial Randomization

Authors:

arXiv:2606.11255v1 Announce Type: new Abstract: Bernstein–Schur kernels are products of a finite-feature kernel (one with an explicit finite-dimensional feature map) and a completely monotone shift-invariant kernel: nonstationary kernels that fall between the shift-invariant and dot-product templates random features usually exploit, so in general neither Bochner sampling nor polynomial sketching applies to the full kernel directly. We give one random-feature construction for the whole class that randomizes both factors: it sketches the finite modulation and randomizes the completely monotone radial factor, sampling the latter's one-dimensional Bernstein–Widder scale and then applying Gaussian random Fourier features (whose frequency is still $d$-dimensional). The feature dimension is then $Dm$, set by the sketch size $m$ and the radial-draw count $D$, free of the $O(d^2)$ size of the exact modulation feature. Keeping the modulation \emph{exact is the analyzable limit ($m\to\infty$): there we prove unbiasedness, an exact variance for the recommended flat estimator, an expected matrix-Bernstein operator-norm bound (with a matching high-probability tail) controlled by the top eigenvalues of the kernel and modulation Gram matrices together with an intrinsic dimension rather than the crude $N\max_{ij}$ entrywise route, and a deterministic relative-spectral kernel-ridge stability result. By conditioning on the sketch, the doubly-randomized estimator inherits the same intrinsic-dimension operator-norm guarantee plus a single additive sketch term, tunable by $m$ independently of $D$. The motivating instance is the biased $yat$-kernel $k_{yat,b}(w,x)=(w^\top x+b)^2/(\|w-x\|^2+\varepsilon)$, $b\ge0$, whose family span contains the inverse-multiquadric kernel by finite differences in $b$; for it the radial mixture is the IMQ spectral sampler, and one frequency per scale is variance-optimal at a fixed radial-feature budget.