Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-16

Generative AI and the future of scientometrics: current topics and future questions

In this paper, we contribute to the debate on generative artificial intelligence (GenAI) in scientometrics. We argue that moving from a trial-and-error approach to an explainable and actionable use requires a principled understanding of strengths and weaknesses of GenAI as compared with other techniques and with human judgment. To this end, we introduce a conceptual framework based on the distinction between the semantic dimensions of texts, i.e. the meanings attributed to words, and their pragmatic dimension, i.e. their embedding within communicative situations. We leverage this framework to interpret the results of applications of GenAI in scientometrics and to provide guidance to users. Specifically, we conclude that key parameters to be considered are the nature of the task, the level of granularity of the analysis and whether the goal was descriptive, inferential or evaluative. These parameters lead to different strategies for using GenAI and human-machine integration. Finally, we suggest that, by generating large amounts of scientific language, GenAI might affect textual characteristics used to measure science, such as authors, words, and references. We argue that careful empirical work and theoretical reflection will be essential to remain capable of interpreting the evolving patterns of knowledge production in the age of AI.

02.
arXiv (CS.CL) 2026-06-17

PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents

作者:

Prompt injection defenses evaluated on synthetic benchmarks do not generalize to real enterprise documents, which are longer, denser, and interleave legitimate authority language with factual content. We demonstrate this gap with a real-document benchmark of 122 tasks across five professional domains (financial, legal, medical, scientific, DevOps) using actual SEC filings, Federal Register rules, PubMed abstracts, arXiv papers, and GitHub postmortems. Paraphrasing, the strongest defense on synthetic benchmarks, shows no statistically significant attack success rate reduction on real documents (p=0.500) while degrading utility from 91.8% to 82.8%. We introduce PARSE (Provenance-Aware Retrieval Sanitization), a domain-aware, fact-preserving sanitization pipeline that classifies each sentence by injection likelihood, extracts structured facts before rewriting, and verifies fact preservation via a consistency-checking loop. A directiveness gate routes 59% of real enterprise documents to a lightweight path, concentrating computational cost on high-risk documents. PARSE achieves 15.6% attack success rate – a 38% reduction versus the 25.4% baseline – at 86.9% utility, the only condition that is both statistically significant (p=0.014, adequately powered) and maintains near-baseline utility. Practitioners should evaluate defenses on domain-matched real documents, not synthetic proxies.

03.
arXiv (CS.CV) 2026-06-15

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

View-conditioned 3D generators such as SAM 3D, TRELLIS, and Hunyuan3D produce high-quality object reconstructions from a single view, but real-world visual observation often arrives as long monocular streams. Naively applying these generators to each streaming frame independently leads to severe temporal inconsistency in the generated results. To address this problem, we propose Stream3D, the first training-free streaming mechanism that turns a frozen view-conditioned 3D generator into a streaming generator with constant cross-chunk memory. Stream3D achieves this by maintaining a compact evidential memory, which selectively caches the most informative historical frames based on a proposed evidence score mechanism. As the stream progresses, the memory dynamically updates to retain a fixed number of informative frames, preventing the memory footprint from growing linearly with sequence length. This also prevents degradation over long sequences and keeps the underlying generator completely unchanged without retraining, architectural modifications, or auxiliary losses. Evaluated on both realistic and synthetic streaming benchmarks, Stream3D outperforms latent-transport baselines, including KV-cache reuse and flow-based feature editing, across both photometric and geometric metrics. More details can be found at: https://stream-3d.github.io/stream3d.github.io/.

04.
arXiv (CS.CV) 2026-06-25

CustomX: Unified Character, Action, and Scene Customization in Video World Models

Recent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construct 3D environments without active agents, and (2) controllable-entity models, which allow a single entity to perform limited actions in an otherwise uncontrollable environment. In this work, we introduce CustomX, leveraging the realism and structural grounding of static world generation while extending controllable-entity models to support user-specified characters capable of performing open-ended actions. Users can provide a 3DGS scene and a character, then use natural language to direct the character to perform diverse behaviors, ranging from basic locomotion to object-centric interactions, while freely exploring the environment. CustomX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character, formulated as a conditional autoregressive video generation problem. Built upon a pre-trained video generator, our training strategy significantly enhances motion dynamics while maintaining generalization across actions and characters. Our evaluation covers a broad range of aspects, including visual quality, character consistency, action controllability, and long-horizon coherence.

05.
arXiv (quant-ph) 2026-06-11

Logical error estimation from syndrome data of surface-code experiments

arXiv:2606.11496v1 Announce Type: new Abstract: Decoders for quantum error correction (QEC) experiments rely on detector error models (DEMs), which encode, for each error, its probability and the detectors and logical observables it flips. Here we show that estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. We evaluate our methods using open-source data from surface-code memory experiments performed on Google's Willow chip, and we carry out analogous surface-code experiments on IBM's \texttt{ibm\_miami} processor. Despite the different physical error scales of the Google and IBM devices, in both cases our estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the $5\%-10\%$ level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.

06.
bioRxiv (Bioinfo) 2026-06-24

Statistical tests for bivariate spatial association across multi-omics data with disjoint coordinates

Spatial biology has entered a new era of multimodal profiling, with multiple, high-dimensional spatial omics types being measured on consecutive tissue slices, or co-assayed on the same slice. Interest then lies in statistical testing for spatial association between the features of the different modalities, to gain insight in biological processes. One major challenge is the multitude of bivariate combinations, leading to high computational demands. Another difficulty is the difference in spatial resolution between technologies, implying no one-to-one matching between the measurement spots of the two modalities, even after alignment. As a result, common statistical measures such as joint distributions and correlations are not defined, and tests need to rely on spatial vicinity only. Moreover, we argue that many existing bivariate association tests address an inappropriate null hypothesis, or make inappropriate assumptions, both implying absence of spatial autocorrelation in any of the features and leading to misleading conclusions. As a remedy, we modify tests for the detection of spatially variable genes (Moran's I, Gaussian processes and generalized additive models (splines)) to derive bivariate tests across modalities with non-overlapping coordinate sets and provide variance estimators that do account for spatial autocorrelation. We develop inference methods for single sections as well as for replicated experiments with multiple sections, and compare their performance in nonparametric and parametric simulations. Finally, we apply the newly developed methods to two co-assayed spatial transcriptomics and metabolomics datasets from mouse and human. The full suite of tests is available from github.com/sthawinke/sbivar as the R-package sbivar.

07.
arXiv (quant-ph) 2026-06-25

Nonlocal Quantum Phase Transitions

arXiv:2606.25061v1 Announce Type: new Abstract: Phase transitions are paradigmatic examples of emergent phenomena, in which symmetries present at the microscopic level can be spontaneously broken in the thermodynamic limit. Two primary physical mechanisms can drive this symmetry breaking: thermal fluctuations in classical phase transitions and quantum fluctuations in quantum critical phenomena. Here, we introduce $nonlocal$ $quantum$ $fluctuations$ as a new fundamental mechanism to drive phase transitions. We show that entanglement shared between environmental modes can induce a correlated symmetry breaking in remote systems, independent of their spatial separation. Using the framework of driven-dissipative phase transitions, we theoretically investigate a system composed of two nonlinear quantum resonators placed at arbitrarily large spatial separations, each coupled to independent local Markovian baths. We consider the regime in which remote environmental modes are prepared in broadband entangled states. We show that near the critical point, where the susceptibility to weak perturbations diverges, quantum correlations in the environments govern the system critical behavior. While these correlations manifest locally only as effective thermal fluctuations, at the global level they give rise to an emergent nonlocal phase transition, marked by the spontaneous symmetry breaking of a collective mode shared by the two remote systems.

08.
arXiv (quant-ph) 2026-06-24

Quantum Correlations of Neutrinos in the Kerr-Newman Space-time

arXiv:2605.10424v2 Announce Type: replace-cross Abstract: Quantum phases provide a connection between gravitation and quantum information, which proposes a novel avenue to explore the properties of space-time. In this paper, we investigate the quantum correlations (QCs) of neutrinos in the Kerr–Newman space-time. Both radial and non-radial propagations are considered under the weak-field approximation. The results show that, for inward propagations, the oscillation probabilities and QCs differ significantly from those obtained in the Schwarzschild metric. In the case of radial outward propagation, the larger angular momentum $a$ increases the oscillation period of the survival probability $P_{ee}$, entanglement, and monogamy of nonlocality, whereas the larger charge $Q$ decreases the corresponding periods. For non-radial propagations, $M$ and $a$ can noticeably modulate the amplitudes of the considered QCs, which is not observed in the case of radial propagations. Furthermore, we find that, despite differences in their variation ranges, entanglement and coherence exhibit highly consistent oscillation behaviors in both radial and non-radial propagation cases. These findings provide a comprehensive understanding for the neutrinos-based relativistic quantum information.

09.
arXiv (quant-ph) 2026-06-17

Robust Spin Splitting and Strain-Controlled Optical Response in Monolayer CrC2N4 for Valleytronic and Optoelectronic Applications

arXiv:2606.17329v1 Announce Type: cross Abstract: Monolayer CrC2N4 recently emerged as a promising two-dimensional semiconductor, yet its spin-orbit-coupled (SOC) physics and strain-tunable optical response remained largely unexplored. Here, we investigated the electronic, valley, charge-transfer, and optical properties of pristine and biaxially strained monolayer CrC2N4 using first-principles calculations. The monolayer exhibited a direct band gap at the K/K' valleys. SOC produced valley contrasting out-of-plane spin polarization, yielding a moderate valence band spin splitting of 51.9 meV and a small conduction band spin splitting of 1.7 meV. Orbital-resolved analysis showed that the edge states were mainly governed by Cr-d and N-p hybridization, while Bader analysis indicated polar-covalent bonding through charge transfer toward N atoms. Biaxial strain in the range of -4% to +4% tuned the band gap from 1.987 to 1.421 eV and drove an indirect-to-direct gap transition near -1% strain. Tensile strain enhanced the Berry curvature and red-shifted the optical response toward the visible-near-infrared region. These results suggested monolayer CrC2N4 as a promising platform for strain-engineered valleytronic and optoelectronic device applications.

10.
arXiv (CS.AI) 2026-06-24

Subjective-Graph LLM Agents for Simulating Uncertainty in Classroom Social Perception

arXiv:2603.20750v2 Announce Type: replace Abstract: Social actors do not observe a common social world: each individual forms judgments from a partial and potentially distorted view of the surrounding network. We study whether graph-local evidence and credibility-weighted communication can generate persistent distortions in perceived academic standing, even when agents repeatedly receive objective performance signals. We introduce a data-constrained multi-agent framework in which LLM agents operate through individualized subjective graphs that determine peer visibility, evidence access, and interaction opportunities. Agents exchange uncertainty-annotated assessments, evaluate message credibility, and maintain explicit Gaussian belief states updated through Bayesian fusion. We evaluate the framework on 12 middle-school classrooms comprising 482 students, using questionnaire-derived social information and six consecutive examinations. On the Social-Observed subset (n=419), collective ranking error increases from 0.066 \pm 0.008 to 0.124 \pm 0.009 across six epochs despite repeated exam-based anchoring. Ablations associate individualized visibility and LLM-based trust gating with more stable long-horizon behavior, while constrained retrieval primarily safeguards against global-information leakage. Compared with evaluated DeGroot configurations, the proposed framework achieves lower final ranking error; those DeGroot configurations exhibit near-zero terminal opinion diversity. These findings establish subjective-graph LLM agents as a mechanism-oriented framework for data-constrained simulated social perception. Code is available at https://anonymous.4open.science/r/Rashomonomon-0126.

11.
arXiv (CS.CV) 2026-06-11

ViT-FREE: Efficient Face Recognition via Early Exiting and Synthetic Adaptation

Vision Transformers (ViTs) have gained significant attention in computer vision and shown strong potential for face recognition (FR). However, their high computational cost makes deployment on resource-constrained devices challenging, motivating the need for methods that balance efficiency and accuracy. In this work, we investigate early exiting in pretrained ViTs as a simple yet effective training-free strategy for efficient FR inference. Leveraging the uniform feature dimensionality across transformer encoder blocks, we introduce ViT-FREE, a multi-exit framework that enables face verification directly from intermediate representations without modifying or retraining the backbone model, and thus, reducing inference cost. Empirically, we show that patch embeddings and attention maps evolve progressively across depth, exhibiting high similarity between consecutive ViT blocks and increasing alignment with the final representation. This indicates gradual feature refinement and attention convergence, suggesting that intermediate layers already provide stable and discriminative representations suitable for early exiting. Through extensive experiments on multiple FR benchmarks, we systematically analyze the accuracy-efficiency trade-off across exit depths. Our results demonstrate that later exits achieve a highly favorable balance, with exiting at layer 10 yielding up to a 20% speedup while incurring only a 1.5 drop in verification performance on benchmarks such as IJB-C. Also, we propose ViT-FREE_FT, a lightweight exit-specific fine-tuning strategy that adapts only the projection layers using a small synthetic dataset while keeping the transformer backbone frozen. This approach improves the performance of shallow exits while preserving the efficiency benefits and leaving deeper exits largely unaffected.

12.
arXiv (quant-ph) 2026-06-24

Improved State Readout in NV Centers using Regression Models and Rabi Driving

arXiv:2606.23454v2 Announce Type: replace Abstract: Readout of state populations in nitrogen-vacancy centers from fluorescence measurements at room-temperature is routinely achieved via contrast-based calibration. The fidelities achieved by this conventional approach are limited by reducing the dynamical fluorescence behaviour of the NV center to a scalar value, and calculating the population of each possible state independently. To address these limitations, we use regression models trained on experimental data to map the fluorescence signals onto ideal simulated populations. Additionally, we enhance the informational content of the fluorescence signals by performing measurements during induced Rabi oscillations. Our results demonstrate that including these dynamical signals significantly reduces state readout errors across multiple tested models. Notably, linear ridge regression performs nearly on par with a non-linear kernel-based model, showing that simple models already capture the relevant mapping between the enhanced fluorescence signals and the underlying state populations. This data-driven approach provides a robust alternative that achieves higher fidelities than conventional calibration in our setting, paving the way for high-fidelity state readout in solid-state quantum registers.

13.
arXiv (CS.AI) 2026-06-24

A Survey on Federated Causal Discovery and Inference

arXiv:2606.23741v1 Announce Type: cross Abstract: Causal reasoning, which encompasses the discovery of causal structures and the inference of causal effects, is fundamental to data-driven decision making. In practice, data for reliable causal analysis are often distributed across institutions and cannot be centralized due to privacy regulations or communication constraints. Federated learning (FL) addresses this by enabling collaborative analysis without raw data sharing, giving rise to the rapidly growing field of federated causal discovery (FCD) and inference (FCI). However, the interdisciplinary nature of this field and the absence of a comprehensive survey present barriers to entry for researchers. This paper bridges that gap by providing a systematic review through multi-dimensional taxonomies. Grounded in the three core design decisions underlying any FCD solution, namely how structures are learned, how data are partitioned, and what structural knowledge each party obtains, we organize FCD along three axes: methodological paradigm, federation topology, and structural scope. We further examine key practical dimensions, including temporal dynamics, data heterogeneity, missing data, and non-identical variable sets. For FCI, we categorize methods by target estimand (average versus individualized/conditional treatment effects) and by estimation strategy, from classical weighting methods to modern deep generative architectures. Unlike prior works that treat FCD and FCI separately, we formalize their connection as complementary stages of a unified federated causal reasoning pipeline, where FCD supplies the structural knowledge required for valid effect estimation in FCI. Finally, we highlight their shared concerns regarding privacy, communication efficiency, theoretical guarantees, and application domains, and conclude by identifying open challenges for future research.

14.
arXiv (quant-ph) 2026-06-12

Quantized time in quantum walks under weak rank-K measurements

arXiv:2606.13552v1 Announce Type: new Abstract: Measurements can be used to monitor the evolution of quantum systems and may lead to a universally quantized time statistics. It is known that the mean return time is quantized for strong and indirect monitoring through the winding number of the return amplitude in a one-dimensional space. Here we discuss that under multi-channel strong or indirect monitoring, where the latter is achieved through ancilla coupling, the mean return time of a quantum walk in the projected subspace is also quantized. This reflects a universal time quantization for a higher dimensional evolution.

15.
bioRxiv (Bioinfo) 2026-06-18

A Two-Stage Interpretable Framework for Predicting Plant-Derived Small RNA Targets on Human 3'UTRs

作者:

Can plant-derived small RNAs target human mRNA 3'UTRs via complementary base pairing and produce experimentally detectable regulatory effects? This question concerns not only the fundamental feasibility of cross-kingdom RNA regulation but also the technological pathway for screening plant-derived active small nucleic acids. Existing miRNA target prediction tools are predominantly designed for endogenous miRNA-mRNA systems, exhibiting notable limitations when applied to cross-species small RNA inputs and small-sample wet-lab experimental adaptation. In this study, we developed a two-layer prediction framework, MetaLulu-AI. The first layer builds upon publicly available human miRNA-mRNA 3'UTR interaction data, utilizing XGBoost to learn foundational binding rules on human 3'UTRs based on 41 interpretable computational features, including seed region pairing types, local context sequence composition, site positioning, and RNA secondary structures. The second layer is tailored to the experimental system of plant-derived small RNAs and human target genes. It introduces 40 experimental samples using significant changes in endogenous protein expression as the regulatory standard (determined by Western blot or ELISA 48 hours post-transfection of small RNAs via Lipo3000). Using 52-dimensional computational features and the optimal transcript scores from the first layer as inputs, this layer employs TabPFN for experimental label adaptation. The first-layer dataset consists of 38,752 training samples, 5,536 validation samples, and 11,073 testing samples (totaling 55,361), with a positive-to-negative sample ratio of approximately 1:5.4. On the randomly split test set, the model achieved an AUC of 0.9686, a recall of 0.8523, a precision of 0.8080, and an accuracy of 0.9452 (at a decision threshold of 0.4797). Group-based splitting revealed that the model maintains high discriminative power for unseen genes (AUC = 0.9541), though its generalization ability for completely unseen miRNAs decreases (AUC = 0.7390). For the 40 experimental samples in the second layer, the TabPFN model achieved an average AUC of 0.7406 {+/-} 0.092 across ten repeated 70/30 random splits, outperforming the baseline of directly using the first-layer scores (0.3563 {+/-} 0.149); the average AUC in a 5-fold cross-validation was 0.770 {+/-} 0.177. SHAP analysis demonstrated a clear divergence in the discriminative basis of the two models: the first layer relies more heavily on the thermodynamics of the small RNA itself and the quality of canonical seed sites, whereas the second layer focuses more on the local UTR environment and statistical site features. Although the current second-layer results are constrained by sample size and gene coverage, this framework serves as a preliminary observation of the adaptation mechanism for cross-kingdom regulation experiments, and motivating future large-scale validation. Under stricter leave-one-gene-out and leave-one-small-RNA-out evaluation, the adapter exceeded the first-layer score baseline but only matched the majority-class baseline, underscoring that entity-level generalization is not yet established.

16.
arXiv (CS.CL) 2026-06-15

The Linguistics Olympiads: Towards a New Corpus for Linguistics Research?

Linguistics olympiad problems (LOPs) are a category of self-sufficient puzzles consisting of a scaled-down corpus representative of certain linguistic phenomena, from which the solver must deduce a primitive set of rules of the language and then translate a new set of elements. The linguistics olympiads (LOs) have become a worldwide phenomenon with 43 different territories taking part in the International Linguistics Olympiad (IOL) 2025. While the typology and solving strategies of LOPs have been analysed, their scientific facet and connections to academic linguistics have yet to be explored. LOPs are directly connected to many linguistic fields, e.g., linguistic typology, linguistic relativity, and linguistics fieldwork. Recently, LOPs have become a research focus as benchmarks for large language models, thus highlighting their usefulness in computational linguistics. Nevertheless, they have not yet been integrated into mainstream linguistics research. This paper attempts to open new directions of including this particular type of puzzle in academic research by offering a structured evaluation of LOPs as linguistic data sources and proposes criteria for their responsible use in academic research. Starting from a set of over 1800 LOPs, this study critically examines the potential of LOPs as a novel corpus for linguistics research by discussing their strengths and limitations as tools, as well as the areas of linguistics into which these problems could fit. This work forms the foundation for a broader initiative aimed at bridging the gap between LOs and academic linguistics, by establishing a robust theoretical framework for LOPs.

17.
arXiv (quant-ph) 2026-06-16

Bi-qutrit entangled edge states of positive partial transposes with largest ranks

arXiv:2606.16265v1 Announce Type: new Abstract: Whenever $E$ is an eight dimensional subspace of the bi-qutrit quantum system whose orthogonal complement is spanned by a vector of Schmidt rank three, we show that there exist PPT entangled edge states with the range space $E$ whose partial transposes are of rank six, which is the largest possible rank. In this way, we exhibit a huge family of bi-qutrit PPT entangled edge states of type $(8,6)$. They make faces of the convex set of all PPT states, and we find bi-qutrit PPT entangled edge states of other types on the boundaries of such faces.

18.
bioRxiv (Bioinfo) 2026-06-18

Deciphering shared and divergent tissue architectures from cross-species spatial transcriptomics

作者:

The integration of spatial transcriptomics (ST) data across species is essential for cross-species and translational studies, but remains challenging due to molecular divergence and anatomical differences between organisms. We present STACAME, a graph attention autoencoder-based framework to decipher shared and divergent tissue architectures from cross-species ST data by explicitly modeling both orthologous and species-specific genes. STACAME aligns ST slices in a spatially aware manner, identifies homologous and species-specific domains, and enables a suite of downstream comparative analyses. We demonstrate its utility by integrating ST datasets from diverse tissues, including hippocampus, isocortex, embryo, breast, liver, and cerebellum, across multiple species such as human, macaque, marmoset, mouse, and zebrafish. STACAME supports cross-species spatial domain alignment, the detection of shared and divergent spatially variable genes, development alignment and comparison, and the 3D integration of tissue architecture. This flexible approach facilitates the translation of findings from model organisms to humans, providing a unified computational platform for cross-species spatial transcriptomics.

19.
arXiv (CS.AI) 2026-06-11

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

arXiv:2606.09426v2 Announce Type: replace Abstract: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often evaluate these interfaces as separable capabilities, leaving long-horizon cross-interface orchestration under-tested. Thus, we introduce WeaveBench, a long-horizon hybrid-interface benchmark with 114 tasks across 8 real-world work domains, grounded in real user requests and publicly verifiable artifacts. Each task requires agents to combine GUI observations/actions with CLI/code operations within a single trajectory. We evaluate these tasks on a real Ubuntu desktop inside deployed CLI-agent runtimes, augmented with a minimal desktop-control plugin. We also propose a companion trajectory-aware judge that inspects deliverables, files, screenshots, logs, and action traces, while detecting shortcut behaviors such as fabricated visual evidence or hard-coded metrics. Across frontier model-runtime pairings, the best PassRate reaches only 41.2%, showing the benchmark remains far from saturated. The trajectory-aware judge further reveals that outcome-only grading substantially overestimates agent performance. Overall, WeaveBench exposes a critical gap in CUA evaluation and provides an effective testbed to measure whether agents can orchestrate GUI, CLI, and code operations across long-horizon real-world tasks.

20.
arXiv (CS.CV) 2026-06-11

Exploring Adaptive Masked Reconstruction for Self-Supervised Skeleton-Based Action Recognition

Recently, masked skeleton reconstruction models have emerged as strong action representation learners, driving significant progress in self-supervised skeleton-based action recognition. However, existing state-of-the-art methods must predict an exceedingly large number of spatiotemporal patches, significantly prolonging training time. Besides, by treating all spatiotemporal regions equally during reconstruction, these models are distracted from learning the critical motion patterns that underlie action semantics. To address these challenges, we propose Adaptive Masked Reconstruction (AMR), a faster and stronger pre-training framework. We first decouple the decoder from the encoder, enabling flexible prediction of larger spatiotemporal patches and dramatically reducing reconstruction complexity. Given that larger patches contain more complex information, which is challenging to predict and consequently degrades performance, we accordingly introduce an adaptive guidance module. This module identifies regions of high motion informativeness, guiding the model to focus on the most discriminative parts of each patch and alleviating reconstruction difficulty. Experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets demonstrate that AMR not only accelerates pre-training substantially but also improves downstream recognition accuracy, surpassing current state-of-the-art approaches.

21.
arXiv (quant-ph) 2026-06-24

Quantum-enhanced estimation of stimulated Raman optical activity

arXiv:2606.23722v1 Announce Type: new Abstract: In recent times there has been growing interest in Raman optical activity (ROA) for its label free detection of absolute configuration, conformation, and stereochemical structure in chiral biosamples and drug molecules. Since ROA signals are generally small, techniques such as stimulation by a probe beam can be used to enhance the signal strength. However, with a classical probe, the measurement precision is still fundamentally limited by its shot noise. To solve this problem we propose the use of two-mode squeezed vacuum and show that it can achieve sub-shot noise limited measurement sensitivity. Using quantum estimation theory, we derived the quantum Fisher information and the quantum Cramér-Rao bound (QCRB) for stimulated ROA measurement to quantify the precision enhancement. This improvement comes from photon-number correlations which suppress the intensity fluctuation common to both modes. We further show that balanced detection of the output intensity difference is a practical measurement scheme that approaches the QCRB and becomes optimal in the small-chirality limit. This opens a promising path toward more sensitive Raman chiroptical spectroscopy of weak and photosensitive samples.

22.
arXiv (CS.LG) 2026-06-15

Nonlocal Bayesian Modeling of Continuous Spatio-Temporal Dynamics

arXiv:2606.14313v1 Announce Type: cross Abstract: Real-world spatio-temporal forecasting must handle irregular time points, spatially sparse observations, and the need for uncertainty quantification. This setting is often further compounded by nonlocal interactions (long-range spatial coupling). Modeling continuous-space, continuous-time nonlocal dynamics naturally leads to infinite-dimensional integro-differential equations (IDEs), making principled Bayesian inference intractable. We propose the NonLocal Bayesian Spatio-Temporal model (NLBST), a hierarchical Bayesian framework for continuous spatio-temporal fields that learns explicit nonlocal coupling while retaining tractable inference. NLBST represents the latent field via a coordinate-based spatial basis expansion and models the coefficient process with a continuous-time ODE whose learnable linear operator corresponds to a Galerkin reduction of a nonlocal IDE; a Neural ODE residual captures additional nonlinear dynamics. A linear-Gaussian observation model enables Kalman-style sequential updates under missing and irregular observations, while the spatial basis representation enables inductive prediction at unmeasured locations without retraining. Global parameters are learned via variational inference, and uncertainty is handled through a Bayesian hierarchy. Experiments on synthetic and real-world datasets demonstrate strong forecasting and spatial generalization with well-calibrated uncertainty, yielding substantial gains over baselines in strongly nonlocal and partially observed regimes.

23.
arXiv (CS.CL) 2026-06-18

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution dataset of 35,368 validated symbolic math problems spanning integration, limits, differential equations, series, and hypergeometrics. Unlike prior benchmarks, ASyMOB systematically perturbs each seed problem using symbolic, numeric, and equivalence-preserving transformations, enabling a fine-grained assessment of generalization. Our evaluation reveals three key findings: (1) most models' performance collapses under minor perturbations, while top systems exhibit an apparent regime shift in robustness; (2) integrated code tools stabilize performance, particularly for weaker models; and (3) we identify examples where Computer Algebra Systems (CAS) fail while LLMs succeed, as well as problems solved only via a hybrid LLM-CAS approach, highlighting a promising integration frontier. ASyMOB serves as a principled diagnostic tool for measuring and accelerating progress toward building verifiable, trustworthy AI for scientific discovery.

24.
arXiv (CS.CV) 2026-06-18

URDF Synthesis from RGB-D Sequences via Differentiable Joint Inference and Energy-Consistent Verification

作者:

Reconstructing simulation-ready digital twins of articulated objects from sensor observations remains constrained by two persistent gaps: (i) part-level geometric reconstruction is decoupled from kinematic-parameter estimation, and (ii) the recovered models often violate basic dynamic invariants such as energy conservation, leading to drift when the URDF is replayed in physics simulators. We present KinemaForge, a constraint-driven pipeline that jointly infers part-level shape, joint topology, and joint parameters from short RGB-D sequences and validates the result against an energy-consistent verifier built on differentiable rigid-body dynamics. The pipeline introduces three components: a kinematic constraint graph that encodes joint-part incidences as soft edges; a differentiable screw-axis solver that backpropagates from rendered observations through Featherstone's articulated-body algorithm to joint parameters; and an energy residual loss that penalises non-physical free responses of the reconstructed model. Across five PartNet-Mobility categories and an internal RGB-D benchmark, KinemaForge reduces the average joint-axis error from 4.52 degrees to 2.83 degrees (-37.4%) over the strongest geometric baseline (PARIS) and from 5.30 degrees to 2.83 degrees (-46.6%) over the interaction-based Ditto baseline, lowers long-horizon simulation drift by 64% (vs. PARIS) over 50 s rollouts, and yields URDFs whose closed-loop manipulation success rate improves by 14.6 percentage points over Ditto in our preliminary evaluation. Code and reconstruction data will be released upon acceptance.

25.
arXiv (CS.LG) 2026-06-19

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training – chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.