Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Trainable Quantum Channels as Computational Primitives for Quantum Learning

arXiv:2606.15808v1 Announce Type: new Abstract: Variational quantum learning is traditionally constrained to unitary dynamics, often treating quantum channels as detrimental noise. In this work, we reformulate the quantum channels as trainable computational primitives and establish a non-unitary quantum machine learning framework grounded in open-system dynamics. We demonstrate that the outputs of channel-enhanced quantum models form a structured superposition of multiple functional components. Each component is governed by an effective observable whose spectrum can be adaptively modulated during training, a significant departure from the spectral invariance in unitary transformations. Moreover, the proposed framework generalizes conventional unitary quantum models by retaining them as a special case while introducing additional non-unitary degrees of freedom. Furthermore, we reveal that trainable quantum channels enrich the optimization geometry through ensemble-averaged gradient and additional optimization directions induced by the Kraus operators. Empirical evaluations on classification tasks using trainable amplitude-damping and phase-damping channels confirm enhanced optimization dynamics and predictive performance. Our work provides a principled approach for leveraging quantum channels as trainable resources and advances the design of high-performance quantum learning architectures.

02.
arXiv (CS.CV) 2026-06-11

DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

While monocular depth estimation has achieved significant progress, achieving generalized metric depth estimation for both narrow field-of-view (FoV) perspectives and $360^\circ$ panoramas remains an unsolved challenge. Existing methods are often tailored to specific camera types and struggle to produce accurate metric depth that generalizes across diverse settings. This limitation stems from two key challenges: the inherent geometric discrepancy between perspective and panoramic cameras, and the scarcity of panoramic training data with metric annotations. In this work, we introduce DepthMaster, a unified metric depth estimation framework. Rather than employing specialized networks to learn spherical distortions, we reformulate the problem by decomposing panoramic images into overlapping perspective patches. Crucially, distinct from prior projection-based methods that rely on ad-hoc architectural modifications to handle boundaries, we introduce a novel Correspondence Consistency Loss (CCL) and inject virtual projection cameras as geometric priors, allowing us to seamlessly stitch the patches while avoiding specialized operators and keeping the backbone largely compatible with standard Transformer designs. This strategy also resolves the geometric differences by unifying all inputs into a canonical perspective representation, and effectively circumvents data scarcity by directly unlocking powerful metric priors from vast perspective datasets. Trained on a mixed dataset that contains only one panorama dataset, DepthMaster achieves state-of-the-art zero-shot performance on 13 diverse datasets, outperforming not only universal methods but also leading specialist models in both perspective and panoramic domains.

03.
arXiv (CS.AI) 2026-06-12

Structured Testbench Generation for LLM-Driven HDL Design and Verification-Oriented Data Curation

arXiv:2606.12983v1 Announce Type: new Abstract: Automated testbench generation has become a critical bottleneck in large language model (LLM)-driven Register Transfer Level (RTL) workflows, where large numbers of candidate designs must be verified rapidly and reliably. Existing prompt-based approaches treat testbench generation as unconstrained code synthesis, yielding stochastic outputs with high token cost, low reproducibility, and insufficient coverage. To address this gap, we present STG, a Structured Testbench Generation framework that exploits the inherent structure of hardware designs to generate deterministic testbenches. As a direct verification tool, STG runs 720x faster than an iterative LLM-based testbench generation flow and higher rate of successful compilation, achieves higher coverage, and reduces false-pass verdicts on incorrect DUTs. STG also helps identify errors in RTL generation benchmarks by exposing faulty benchmark testbenches. As a data curation engine, it is 11x faster than LLM-based filtering on a single CPU core with 127x less energy, and the resulting distilled models provide state-of-the-art performance in our multi-benchmark evaluation. As a test-time scaling oracle, it reduces node count by 14-47\%. Our models are available at https://huggingface.co/collections/AS-SiliconMind/siliconmind-v12.

04.
arXiv (CS.LG) 2026-06-12

Limits of spectral learning under noise

arXiv:2606.13067v1 Announce Type: new Abstract: Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.

05.
arXiv (CS.AI) 2026-06-17

Enhanced Evolutionary Multi-Objective Deep Reinforcement Learning for Reliable and Efficient Wireless Rechargeable Sensor Networks

arXiv:2510.21127v2 Announce Type: replace-cross Abstract: Despite rapid advancements in sensor networks, conventional battery-powered sensor networks suffer from limited operational lifespans and frequent maintenance requirements that severely constrain their deployment in remote and inaccessible environments. As such, wireless rechargeable sensor networks (WRSNs) with mobile charging capabilities offer a promising solution to extend network lifetime. However, WRSNs face critical challenges from the inherent trade-off between maximizing the node survival rates and maximizing charging energy efficiency under dynamic operational conditions. In this paper, we investigate a typical scenario where mobile chargers move and charge the sensor, thereby maintaining the network connectivity while minimizing the energy waste. Specifically, we formulate a multi-objective optimization problem that simultaneously maximizes the network node survival rate and mobile charger energy usage efficiency across multiple time slots, which presents NP-hard computational complexity with long-term temporal dependencies that make traditional optimization approaches ineffective. To address these challenges, we propose an enhanced evolutionary multi-objective deep reinforcement learning algorithm, which integrates a long short-term memory (LSTM)-based policy network for temporal pattern recognition, a multilayer perceptron-based prospective increment model for future state prediction, and a time-varying Pareto policy evaluation method for dynamic preference adaptation. Extensive simulation results demonstrate that the proposed algorithm significantly outperforms existing approaches in balancing node survival rate and energy efficiency while generating diverse Pareto-optimal solutions. Moreover, the LSTM-enhanced policy network converges 25% faster than conventional networks, with the time-varying evaluation method effectively adapting to dynamic conditions.

06.
arXiv (CS.CL) 2026-06-15

Verbatim Chunks Beat Extracted Artifacts: A Controlled Ablation of Memory Representations for Long LLM Conversations

作者:

A growing class of conversational-memory systems compresses dialogue history into structured artifacts – extracted facts, decisions, or events – on the premise that distilled structure retrieves better than raw text. We test this premise with a controlled ablation: within one fixed retrieval-rerank-reasoning pipeline, we swap only the stored representation – LLM-extracted typed artifacts versus verbatim conversation chunks – holding the model, retriever, reranker, and judge constant. Verbatim chunks win by 15.9 points on LoCoMo (43.9% vs. 28.0%) and 22.0 points on LongMemEval-S (67.4% vs. 45.4%); a 1-hop semantic graph does not recover the gap, and five confound controls reproduce the effect. The mechanism is lossy distillation: extraction discards verbatim detail that chunks retain for free, and the extracted-artifact pipeline never beats naive RAG in overall accuracy. Concurrent positive results with near-verbatim, provenance-preserving units fit the same account: retrieval accuracy tracks how far the representation departs from the source. For the extraction designs we test, structured memory should augment verbatim text rather than replace it: a chunks $\cup$ artifacts union store matches chunks on both benchmarks while artifacts alone forfeit the gap. Code and data: https://github.com/tao-hpu/cog-canvas

07.
medRxiv (Medicine) 2026-06-15

A More-Than-Human Approach to Designing for Mental Health: Remixing Prototypes for the Contexts of Complex Healthcare Infrastructures

Digital mental health tools (DMHTs) often fail to be successfully implemented in clinical settings. While user- and human-centred design frameworks are frequently proposed for developing effective tools, they are insufficient to address the sociotechnical complexity of healthcare environments. This paper addresses this limitation by detailing the application of a more-than-human design framework to incorporate wider contextual factors into design decisions. To demonstrate the application of this more-than-human design framework, we present a case study showcasing the design of one specific feature within a DMHT intended to support Health Improvement Practitioners (HIPs) in New Zealand's Integrated Primary Mental Health and Addictions (IPMHA) service. Our process blends usage-context storyboards with interface prototypes, using think-aloud interviews to test the contextual fit of our prototypes. The initial design concept failed due to contextual factors such as inconsistent wait times and the administrative burden on clients and clinic staff. This led to a pivot to a more context-appropriate, practitioner-focused, in-session concept for digital psychometric administration and automated scoring. This case study demonstrates that for DMHTs to be viable within complex healthcare environments, design must focus on more than the needs of a single user, incorporating multiple stakeholders and contextual variables across the wider service-delivery context.

08.
arXiv (CS.CV) 2026-06-19

VisDom: Sparse Novel View Synthesis with Visible Domain Constraint

Sparse novel view synthesis (NVS) remains challenging due to the ambiguity of recovering 3D geometry from few input views. While NeRF- and Gaussian Splatting (GS)-based methods perform well with dense supervision, they often overfit in sparse settings, producing floating artifacts and inconsistent geometry. Silhouette consistency is commonly used as a regularizer, but it remains insufficient, as silhouette-consistent regions can extend beyond the true object geometry. We introduce VisDom, a learning-free geometric constraint that augments classical carving-based visual hull reconstruction by enforcing a minimum multi-view visibility requirement. Specifically, we define a visible domain as the subset of 3D space observed by at least $K$ views and use it as an additional filtering criterion on top of standard silhouette-based reconstruction. This provides a stronger spatial prior in sparse-view settings. We integrate VisDom into both implicit (NeRF) and explicit (GS) pipelines by restricting volumetric sampling and guiding Gaussian placement during optimization. Experiments on three challenging datasets show consistent improvements in sparse-view NVS, enabling high-quality object-centric reconstruction from as few as four input images. Our method is domain-agnostic, requires only silhouettes, and introduces no learned parameters, making it a simple complement to existing approaches. Applying VisDom on top of GaussianObject further improves performance on Omni3D and MipNeRF360, while matching or surpassing it at 22 $\times$ lower training cost.

09.
arXiv (CS.CV) 2026-06-15

Value-order Decomposition for Generalist Anomaly Detection

Industrial anomaly detection suffers from limited data, making cross-domain generalization particularly challenging. Generalist Anomaly Detection (GAD) aims to train a unified model on a source domain that can effectively detect anomalies in unseen target domains. In the initial semantic feature space, strong entanglement between anomalies and object categories or defect types hinders effective generalization across domains. Recent works address this issue by projecting features into a residual space; however, such methods primarily increase cross-domain overlap for normal features, while anomalous features remain specific to object categories, defect types and data domains, leading to poor alignment and generalization. To address this limitation, we propose Value-order Decomposition (VOD), a simple yet effective technique that bridges three types of generalization gaps across object categories, defect types (including real and synthetic defects), and data domains. VOD disentangles and suppresses object-category-, defect-type-, and domain-specific information, promoting alignment within normal and abnormal samples while preserving their separability, thereby enabling robust generalization across the three gaps. Leveraging the strong alignment between real and synthetic defects within the same object, we perform anomaly detection using only normal and synthetic-abnormal reference, and effectively generalize to unseen real defect types. Experiments on diverse industrial and medical benchmarks demonstrate that our method, using a simple cut-and-paste anomaly simulation strategy, achieves strong generalization across the three gaps.

10.
medRxiv (Medicine) 2026-06-17

Womens intentions and motivations towards health behaviour change before pregnancy: a cross-sectional survey of pregnant women in Australia

Introduction: The preconception period (i.e. the weeks and months before pregnancy) is a critical window during which parental health behaviours can influence pregnancy outcomes and the childs long-term health. Modifiable factors such as nutrition, physical activity, substance use, and environmental exposures play a key role, yet womens ability to adopt and sustain healthy behaviours is shaped by complex psychological, social and environmental influences. This study applies the Theory of Planned Behaviour to identify the beliefs underpinning womens preconception behaviours, with the aim of informing support for effective and sustained health behaviour change. Methods: An Australian national retrospective cross-sectional survey of pregnant women (18-49 years), recruited through social media platforms. The 92-item survey captured respondent socio-demographics, pregnancy status and health conditions, health behaviours, and beliefs regarding preconception health behaviours. Respondents level of pregnancy planning was categorised using the London Measure of Unplanned Pregnancy (LMUP). Items regarding preconception beliefs were structured in accordance with the Theory of Planned Behaviour, with a focus on regular exercise, healthy diet, and alcohol avoidance. These beliefs variables were analysed using structured equation modelling to identify paths between latent variables and the items used to estimate each concept. Results: The study was completed by 430 pregnant women of whom 72.7% had a planned pregnancy. Most had a partner, were university educated and in good health. Structural equation modelling showed intention strongly predicted exercise ({beta}=0.65), healthy diet ({beta}=0.54) and alcohol avoidance ({beta}=0.64). Perceived control and partner norms influenced intentions, whereas health professional norms had limited effect. Positive beliefs were associated with folate supplement use and smoking cessation. Conclusion: These findings highlight intention as a key driver of preconception health behaviours, with perceived control and partner influences playing a more significant role than individual beliefs or health professional input. Effective interventions should therefore address structural barriers and actively involve partners, while respecting womens autonomy. Overall, couples-focused, multi-level strategies are likely essential to support meaningful and sustained preconception health behaviour change.

11.
arXiv (CS.CL) 2026-06-16

A Practical Evaluation Method for Long-Form Simultaneous Speech-to-Speech Translation

Simultaneous speech-to-speech translation (SimulS2ST) enables real-time cross-lingual communication, but existing evaluation has focused largely on short or pre-segmented speech rather than long-form, continuous input. Prior approaches are difficult to reproduce and make assumptions that do not hold for end-to-end systems. We present a practical evaluation method for long-form SimulS2ST. Given source speech, pre-segmented source transcripts, and reference translations, we run automatic speech recognition (ASR) and forced alignment on the generated target speech to recover token-level timestamps, then apply a sentence-embedding-based aligner to match the target text to its corresponding source sentences. This enables sentence-level computation of latency and quality metrics, including YAAL and xCOMET, which are then aggregated into final system-level scores. Experiments on representative SimulS2ST systems show that the method is effective in practice and reveal that current systems suffer from substantial latency accumulation on long speech.

12.
Nature (Science) 2026-06-17

Navigating a crowded developing brain leaves neurons with broken DNA

As neurons migrate to their final destinations in the forming brain, their DNA gets damaged. The brain has evolved a fix, but there can be lasting consequences if repair fails. As neurons migrate to their final destinations in the forming brain, their DNA gets damaged. The brain has evolved a fix, but there can be lasting consequences if repair fails.

13.
arXiv (CS.LG) 2026-06-19

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

arXiv:2512.17473v3 Announce Type: replace-cross Abstract: We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.

14.
arXiv (CS.CV) 2026-06-18

Optimizing Incomplete, Large-Scale and Sparse Multi-Graph Matching in Bioimaging

Multi-graph matching is a fundamental problem in computer vision. Our work is motivated by a challenging application in bioimaging, where dozens or even hundreds of 3D microscopy images of worms must be brought into correspondence. Existing datasets do not cover this large-scale regime, and virtually all existing methods are inapplicable because they assume a complete or dense problem setting. To support further research, our first contribution is a new large-scale dataset based on problem instances from bioimaging. Our second contribution is a comprehensive analysis of the two main multi-graph matching paradigms: direct and permutation synchronization-based formulations. We argue, in part by proof, that practical large-scale methods must explicitly address problem sparsity and incompleteness. Since standard permutation synchronization approaches fail in this setting, we further introduce a sparse permutation synchronization paradigm. Our final contribution is GREEDA, a general method for sparse and incomplete problems that can be instantiated across cost orders and paradigms. While our paper focuses on objective functions up to quadratic order, GREEDA is inherently generalizable to arbitrary orders. On larger, sparse instances, GREEDA outperforms competing methods in both objective value and runtime. For example, for moderately-sized problems based on 30 worm images GREEDA produces a high-quality solution within 2 minutes, whereas competitors require at least half an hour and yield far worse results. On smaller dense problems, GREEDA remains on par with leading methods while being an order of magnitude faster.

15.
arXiv (CS.CL) 2026-06-16

Calibrated Triage, Not Autonomy: Confidence Estimation for Medical Vision-Language Models

A vision-language model can answer a question about a medical image fluently and confidently while barely using the image, leaning instead on language priors. In medicine this is the failure that matters most, because the answer looks trustworthy and is not, and the only protection is a confidence score reliable enough to tell the system when to abstain. We ask a deployment question rather than an accuracy one: how much imaging work a model can safely handle alone, and which confidence signal makes that possible. We evaluate seven confidence estimators across five open-weight LVLMs and three medical visual-question-answering datasets spanning broad clinical imaging, radiology, and pathology, with every probe trained only on natural images and applied without adaptation. Recast as bounded selective prediction (automate a case only when confidence clears a threshold, defer the rest), the comparison is cautionary. The standard metrics are poor guides: discrimination barely separates the methods, and the weak calibration of a cheap self-report is cheaply removed by off-domain temperature scaling without changing deployable yield. What distinguishes a usable estimator is the high-confidence region a clinician acts on: the weakest baselines are confidently wrong on 41 to 45 percent of their errors against 1 to 4 percent for the best probe, and no estimator is reliably best across domains or models. Safe handoff is governed at two levels: base-model competence sets a ceiling, so a well-calibrated score recovers roughly a third of radiology cases at a 20 percent error tolerance but almost none of pathology; the confidence layer then decides how much of that ceiling is reachable. The usable role today is calibrated triage, not autonomy: automate the cases a calibrated score marks safe, route the rest to a clinician. We release all outputs, correctness judgments, and confidence scores, with code.

16.
arXiv (CS.CL) 2026-06-16

Interactor: Agentic RL oriented Iterative Creation for Ad Description Generation in Sponsored Search

This paper focuses on automatically generating informative ad descriptions in sponsored search. Unlike ad titles which are usually optimized to attract user click feedbacks, ad descriptions have a longer text span and possess the potential of incorporating world knowledge to address user search intents while presenting the fine-grained selling points of the ads. We propose Interactor, a multi-turn iterative creation framework optimized with agentic RL for ad description generation. The generation model acts as a policy that interacts with a customized environment consisting of multiple generative reward models. Given initial generations by the policy, the customized GenRMs evaluate multi-dimensional qualities including knowledge capacity and landing page consistency, providing both binary signals and reasoning feedbacks. The policy then iteratively refines the descriptions based on such feedbacks to ensure continuous improvement. Experiments on industrial datasets show that the Interactor framework significantly outperforms state-of-the-art approaches in generating knowledge-rich and faithful ad descriptions. Since May 2026, it has been deployed online in a leading search ads system, contributing to both ad revenue and user experience.

17.
arXiv (CS.AI) 2026-06-11

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

arXiv:2605.00545v2 Announce Type: replace-cross Abstract: Inferring cellular trajectories from destructive snapshots is complicated by the challenges of stochasticity and non-conservative mass dynamics such as cell proliferation and apoptosis. Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level. However, this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution, which is essential for understanding lineage branching and fate decisions. We present Unbalanced Schrödinger Bridge (USB), a simulation-free framework for learning underlying dynamics that effectively integrates both stochastic and unbalanced effects which also models the discrete, jump-like birth-death dynamics at single-cell resolution. Theoretically, USB provides a tractable solution to the Branching Schrödinger Bridge (BSB) problem, offering a rigorous microscopic interpretation where individual cells undergo both Brownian motion and discrete birth-death jumps. Technically, the method implements an efficient solver by introducing a simulation-free training objective that effectively scales to high-dimensional omics data. Empirically, we demonstrate on both simulated and real-world datasets that USB not only achieves trajectory reconstruction performance better than or comparable to deterministic baselines but also uniquely enables realistic discrete simulation of birth-death dynamics at single-cell resolution.

18.
bioRxiv (Bioinfo) 2026-06-20

The recount3 Python package for programmatic access to uniformly processed RNA-seq data

The recount3 online resource provides tens of thousands of uniformly processed RNA-seq samples across human and mouse from major sequencing repositories like the Sequence Read Archive. While access to these datasets has traditionally been centered in the R/Bioconductor ecosystem, the growing prominence of Python in bioinformatics and machine learning necessitates native, efficient tooling for Python users. Therefore, we present the recount3 Python package with robust application programming interface (API) and command-line interface (CLI) for discovering, downloading, and materializing recount3 resources. The software orchestrates uniform resource locator (URL) resolution, persistent on-disk caching, and the automatic parsing of data into analysis-ready data structures, including Pandas DataFrames and BiocPy RangedSummarizedExperiment objects. The recount3 Python package drastically lowers the barrier to entry for large-scale utilization of RNA-seq data in Python-based computational pipelines, bridging the gap between massive public transcriptomic data and modern machine learning ecosystems.

19.
arXiv (CS.CL) 2026-06-16

Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations

作者:

Where an LLM sits in an agent memory pipeline – between the recall plane that retrieves stored facts (extensively benchmarked) and the control plane that mutates them via supersede, release, purge (largely untested) – shapes which forgetting failure modes the system recovers. Comparing thirteen system configurations on a 385-case adversarial surface, we observe three placement regimes with partly complementary coverage: deterministic primitives suffice for lexical/temporal categories but fail canonicalization (5% on identifier-obfuscation, 0% on cross-lingual); inscribe-time LLM recovers canonicalization (100%) but cannot help intent-aware deletion (0% on prefix-collision and compound-fact); a mutation-time hook recovers intent-aware deletion (78-85%) and brightens nearly all categories simultaneously (91.7-93.2% overall, $0.17 per 385-case run, 2.3s/case mutation latency vs. 64-191ms/case deterministic, recall path unchanged). We expose the trade-off via ForgetEval, a 1000-case templated suite plus a 385-case adversarial layer (132 hand-crafted + 253 LLM-drafted oracle-validated) scored by deterministic substring match, paired with a six-method Adapter Protocol with honest N/A scoring that lets heterogeneous memory stores enter in 130 lines. Admission is corroborated by 10-annotator IAA (Fleiss' kappa = 0.958) and a 77-case external-authored subset (four blind contributors) that replicates the canonicalization asymmetry and amplifies the joint-placement lift (+27.8 pt). Production failures are predominantly forgetting failures rather than recall failures, yet existing benchmarks measure only recall. ForgetEval and all adapters are released under MIT.

20.
arXiv (CS.CV) 2026-06-17

Bounding Box Label Propagation for Re-Annotation of Document Layout Analysis Datasets

Datasets in practical document processing scenarios typically grow over time, and their class annotations undergo continuous refinement. This creates significant re-annotation efforts, which are time-consuming and costly. A promising remedy is to re-annotate only a small subset of available documents manually and apply semi-supervised learning techniques that leverage both labelled and unlabelled data. Although there are numerous approaches to tackle this problem for classification, there exists no adaptation for the problem of re-classifying object detection instances, e.g. for document layout analysis. To this end, we propose Bounding Box Label Propagation (BBLP), a pseudo-labelling framework for object detection. An object encoder integrates visual, textual, and positional embeddings from object detection samples to come up with a joint embedding that can be used for Label Propagation on partially annotated datasets in a plug-and-play fashion. Evaluation results indicate that the proposed approach produces high-quality class annotations of bounding boxes. In the D4LA layout analysis dataset, it achieves a mAP of 54.0%, corresponding to 81.6% of fully supervised performance, while using only 10% labelled data. Our work demonstrates the potential of Label Propagation for object detection and lays the groundwork for reducing manual annotation efforts in real-world document processing applications.

21.
arXiv (quant-ph) 2026-06-16

Light-induced nonadiabatic dissipative quantum dynamics of the Na2 molecule

arXiv:2606.15292v1 Announce Type: new Abstract: Strong light-matter coupling between molecules and optical or plasmonic cavity modes has emerged as a promising platform for advancing photonics, materials science, and chemistry. However, optical cavities and plasmonic resonators in particular are inherently lossy systems characterized by finite photon lifetimes. Accurate theoretical descriptions of molecular dynamics under strong coupling therefore require a proper treatment of cavity losses. In this work, we compare three theoretical approaches for modeling dissipative molecule-cavity dynamics within a realistic parameter regime: the Lindblad master equation, the stochastic Schrödinger equation, and the non-Hermitian Schrödinger equation. As an example, we consider the two lowest energy state of Na2 molecule coupled to a cavity mode and analyze the time evolution of the excited-state population and the mean photon number. Our results demonstrate that the stochastic Schrödinger equation provides an accurate and computationally efficient alternative to the Lindblad master equation, while the non-Hermitian Schrödinger approach is found to be applicable only within a limited range of conditions. Furthermore, we show that inclusion of molecular rotation leads to rotational-vibrational-photonic coupling and gives rise to pronounced nonadiabatic dynamics through light-induced conical intersections. These findings highlight the importance of both dissipation and rotational degrees of freedom for a realistic description of molecular dynamics in strongly coupled molecule-cavity systems.

23.
arXiv (CS.CV) 2026-06-18

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.

24.
bioRxiv (Bioinfo) 2026-06-15

SMS: Symmetric Mediation Statistics for Powerful High-Dimensional Mediation Analysis

Background: Mediation analysis of high-dimensional features, particularly molecular-level omics features, provides important opportunities to uncover biological mechanisms underlying human health and disease. However, two central statistical challenges remain: testing the composite-null hypothesis and maintaining power when the exposure-mediator and mediator-outcome associations differ substantially in statistical significance. Existing methods typically rely on accurate estimation of the proportions of the three null types or on the maximum of the two association p-values, and may not always control the FDR well and may have limited power under imbalanced significance. Methods: We propose SMS, a new statistical framework based on symmetric mediation statistics. By exploiting symmetry, SMS calibrates the composite null distribution as a whole for FDR control. It also allows flexible combinations of the two association p-values, including the maximum, and then enables construction of an omnibus test. Moreover, it permits direct use of effect-size estimates, bypassing the need to compute p-values. Results: SMS controlled the FDR across a wide range of simulation scenarios while achieving a substantial sensitivity gain, often around 20 percentage points, over existing methods including HDMT, DACT, and DEI-B. Applications to a metabolomics dataset and a DNA methylation dataset further corroborated these findings. Notably, SMS discovered five plausible mediators in the metabolomics dataset that were missed by all existing methods considered.