Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-24

BioMedVR: Confusion-Aware Mixture-of-Prompt Experts for Biomedical Visual Reprogramming

Recent advances in vision-language models (VLMs) such as CLIP have demonstrated strong generalization across natural-image domains. However, adapting these models to biomedical imaging is non-trivial: full-model fine-tuning is computationally expensive, while medical data are often scarce and exhibit subtle, fine-grained inter-class differences, making parameter-efficient adaptation particularly critical. Visual Reprogramming (VR) offers a parameter-efficient alternative by injecting learnable perturbations into the input space, but existing VR approaches for VLMs mainly focus on positive class prompts and overlook confusing negatives, leading to miscalibrated predictions in fine-grained medical scenarios. We present BioMedVR, the first VR-based framework for biomedical imaging, enabling few-shot adaptation of pretrained VLMs through compact learnable VR modules. To mitigate class confusion, we introduce a Confusion Minimization Mechanism that leverages LLM-generated confusion-aware attributes together with a Confusion-Suppression Loss to explicitly reduce false-positive alignment. Moreover, the designed Mixture-of-Prompt Experts combines a positive expert for main-class discrimination and a negative expert for confusion suppression, balanced via adaptive gating. Extensive experiments on 18 datasets, including 11 biomedical datasets and 7 natural image benchmarks, demonstrate that BioMedVR achieves superior accuracy and generalization, effectively bridging VR and VLMs in biomedical domains.

02.
arXiv (CS.CV) 2026-06-11

MedVeriSeg: Teaching LISA-Like Medical Segmentation Models to Verify Query Validity Without Extra Training

Despite recent progress in text-prompt-based medical image segmentation, existing LISA-like MLLM-based methods typically generate masks regardless of whether the target specified in the query is present, leading to hallucinated segmentation. In this work, we propose MedVeriSeg, a training-free query verification framework that enables LISA-like medical segmentation models to reject false segmentation queries. MedVeriSeg first quantifies the response quality between the [SEG] token and image features through a Similarity Response Quality Scoring Module. To further improve robustness, it employs a Lightweight Routed Multi-Agent Verification Module, which fuses quantitative score evidence with qualitative agent evidence to comprehensively verify the validity of the query. To support systematic evaluation, we construct MedVeriSeg-Bench, a benchmark designed for query verification in medical image segmentation. Experimental results demonstrate that MedVeriSeg effectively identifies false segmentation queries and reduces hallucinated segmentation, while maintaining a high acceptance rate for valid queries, thereby largely preserving the segmentation utility of LISA-like medical segmentation models.

03.
arXiv (CS.CV) 2026-06-15

A Unified Theory of Sinusoidal Activation Families for Implicit Neural Representations

Implicit Neural Representations (INRs) model continuous signals with compact neural networks and have become a standard tool in vision, graphics, and signal processing. A central challenge is accurately capturing fine detail without heavy hand-crafted encodings or brittle training heuristics. Across the literature, periodic activations have emerged as a compelling remedy: from SIREN, which uses a single sinusoid with a fixed global frequency, to more recent architectures employing multiple sinusoids and, in some cases, trainable frequencies and phases. We study this family of sinusoidal activations and develop a principled theoretical and practical framework for trainable sinusoidal activations in INRs. Concretely, we instantiate this framework with Sinusoidal Trainable Activation Functions (STAF), a Fourier-like activation whose amplitudes, frequencies, and phases are learned. Our analysis (i) establishes a Kronecker-equivalence construction that expresses trainable sinusoidal activations with standard sine networks and quantifies expressive growth, (ii) characterizes how the Neural Tangent Kernel (NTK) spectrum changes under trainable sinusoidal parameterization, and (iii) provides an initialization that yields standard normal post-activations without asymptotic central limit theorem (CLT) arguments. Empirically, on images, audio, shapes, inverse problems (super-resolution, denoising) and NeRF, STAF is competitive and often stronger on distortion-oriented reconstruction metrics such as PSNR/SSIM across the evaluated INR tasks, with favorable parameter efficiency under layer-wise sharing. While periodic activations can alleviate practical manifestations of spectral bias, our results indicate they do not eliminate it; instead, trainable sinusoids can improve the observed capacity-optimization trade-off in the evaluated settings.

04.
arXiv (CS.AI) 2026-06-19

Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models

arXiv:2606.19932v1 Announce Type: cross Abstract: Mamba demonstrates strong efficiency in modeling long visual sequences. However, when token reduction is applied to structurally enhanced Mamba variants, these models exhibit a severe performance collapse. We attribute this degradation to the spatially agnostic nature of existing reduction methods, which violate the two-dimensional structural premise required by the selective scanning mechanism. In this work, we propose STORM, a spatial-aware token reduction framework designed to maintain structural integrity throughout the compression process. STORM reformulates reduction into a structured operation on spatial units, enforcing localized constraints to maintain both grid topology and neighborhood coherence. As a plug-and-play module, STORM equips existing reduction pipelines with explicit spatial awareness without any training. Empirical results demonstrate that STORM achieves state-of-the-art pruning accuracy across diverse vision Mamba backbones under training-free settings. Notably, STORM delivers a substantial accuracy recovery on VMamba, outperforming prior methods by up to 63.3\% in top-1 accuracy. Meanwhile, STORM incurs only a 1.0\% accuracy drop on PlainMamba, achieving performance comparable to ViT.

05.
Nature (Science) 2026-06-10

Light slows down carbon nanotubes in water

Water-suspended carbon nanotubes move more slowly in green light, suggesting that excited electrons in the tubes couple to the water through ‘quantum friction’. Water-suspended carbon nanotubes move more slowly in green light, suggesting that excited electrons in the tubes couple to the water through ‘quantum friction’.

06.
bioRxiv (Bioinfo) 2026-06-22

EventHorizon: A Foundation Model for Clinical Flow Cytometry

Flow cytometry is an essential tool for diagnosis of hematologic malignancies, but existing clinical workflows are highly dependent on expert manual interpretation. Existing machine learning approaches typically require extensive labeled data and are sensitive to variability in panel design, instrumentation, and laboratory workflows, limiting their generalizability. We present EventHorizon, a self-supervised foundation model for clinical flow cytometry that produces unified specimen-level representations from heterogeneous multi-panel data. EventHorizon employs a two-stage hierarchical transformer architecture with marker-aware tokenization, enabling seamless integration of cells measured across different antibody panels into a single shared latent space. We pre-train the model using a DINO-inspired self-distillation strategy with a variety of flow cytometry-specific augmentations on a dataset of more than 100,000 clinical specimens across 17 distinct panels. We evaluate the resulting embeddings on three clinically relevant classification tasks spanning common and rare panels, demonstrating that simple k-nearest neighbor probing of frozen EventHorizon embeddings achieves performance comparable to a fully supervised baseline model and a prior panel-specific self-supervised model. To ensure EventHorizon is not simply shortcut learning on features such as the markers/panels run for a given specimen, we perform a graph-theoretic analysis of EventHorizon's latent space which argues that specimen embeddings are organized primarily by biological diagnosis. Taken together, these results demonstrate that EventHorizon produces biologically meaningful, panel-agnostic specimen representations from clinical flow cytometry data which, with further development and validation, could provide a potential basis for scalable, reproducible diagnostic support across diverse clinical laboratory settings.

07.
arXiv (CS.LG) 2026-06-17

Finite-Time Queue Peak Laws in Stochastic Networks: Logarithmic Scaling After Geometric Thresholds

arXiv:2606.18218v1 Announce Type: cross Abstract: We study finite-horizon queue peaks in generalized switches, a standard stochastic-network model in which many queues share constrained service resources. Arrivals may be dependent, time-varying, and adapted to the past; the standing load condition is uniform interior slack, meaning the conditional mean arrival vector stays in a fixed contraction of the capacity region. We show that this slack reshapes the finite-time peak law for drift-minimizing scheduling policies such as MaxWeight. The square-root envelope that is sharp without slack persists only up to a geometry-dependent threshold; beyond that threshold, the running maximum grows only logarithmically with the horizon, both with high probability and in expectation. The mechanism is self-normalization: in the current queue direction, the projected fluctuation scale is normalized by the stabilizing drift scale. This removes capacity geometry from the logarithmic coefficient, while geometry remains in the threshold. Matching lower bounds show that both the logarithmic term and a geometric threshold are unavoidable. When finite-time state-space collapse is available, the threshold can be sharpened using local bottleneck geometry. For generalized input-queued switches, we obtain finite-time peak bounds with tight logarithmic coefficients. Simulations illustrate the two-phase envelope, local geometric refinements, and variance-sensitive improvements predicted by the theory.

08.
arXiv (CS.AI) 2026-06-12

Counterfactual Credit Policy Optimization for Multi-Agent Collaboration

arXiv:2603.21563v5 Announce Type: replace Abstract: Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles, but reinforcement learning for such systems is limited by credit assignment: shared terminal rewards obscure individual contributions and can encourage free-riding. We introduce two optimizer-agnostic credit assignment methods for converting joint outcomes into agent-specific learning signals. Counterfactual Credit for Policy Optimization (CCPO) estimates an agent's marginal contribution by comparing the realized joint outcome with a counterfactual outcome where that agent is removed. Self-Evaluated Credit for Policy Optimization (SEPO) uses constrained self- and peer-evaluations as a verifier-anchored credit signal while keeping the external task outcome dominant. Both operate at the reward-construction layer rather than as policy optimizers, producing role-specific rewards or advantages for GRPO, GSPO, or REINFORCE++. We instantiate these credit signals in a sequential Think–Solve setting and evaluate them on mathematical reasoning benchmarks. Results show that explicit credit assignment often improves dual-agent reasoning, especially on MATH500 and several out-of-distribution settings, while gains vary across models and datasets. Our code is available at: https://github.com/bhai114/ccpo.

09.
arXiv (CS.CV) 2026-06-25

Learning Action Priors for Cross-embodiment Robot Manipulation

Most Vision-Language-Action (VLA) models build on a Vision-Language Model (VLM) backbone by attaching an action module and optimizing the full policy jointly. This design inherits strong visual and linguistic priors from the VLM, but leaves the action module to learn physical motion almost from scratch. As a result, the policy lacks an explicit motion prior, forcing early optimization to simultaneously discover temporal action dynamics and cross-modal alignment, a challenge further amplified in cross-embodiment settings. In this work, we propose to pretrain the action module with motion priors before cross-modal VLA alignment. Specifically, we introduce a two-stage training framework that equips the action module with cross-embodiment temporal motion structure before VLA training begins. In Stage~1, a lightweight flow-matching-based encoder-decoder action module efficiently learns temporal motion structure solely from unconditioned action trajectories, without processing visual or language tokens. In Stage~2, this learned prior is transferred to VLA training through decoder reuse and early-stage latent distillation, aligning visual-language features with the action embedding space while still allowing end-to-end policy refinement. In addition, the trained encoder serves as a compact history compressor, summarizing state-action histories into a single temporal context token for history-aware modeling at negligible cost. Extensive experiments across 13 diverse cross-embodiment tasks on both simulated and real-world platforms validate the effectiveness of our approach. Compared with VLA training without action priors, our model achieves faster convergence, higher success rates, and substantially stronger performance on data-scarce real-world tasks. Moreover, scaling up the action data in Stage~1 yields a more generalizable action prior that directly improves downstream VLA performance.

10.
arXiv (CS.CV) 2026-06-25

BioVid: Autoregressive Video Generation with Biological Behavior Semantic Comprehension

Video generation for biological behavior requires more than visually plausible motion: the duration of an action is itself a semantic property. Existing models usually rely on fixed temporal windows, external continuation, or prompt-driven stories, so length is specified externally rather than learned from behavior. To address this gap, we propose BioVid, a data-driven autoregressive framework for adaptive-length biological behavior generation. BioVid uses a 2D-encode/3D-decode tokenizer: a two-dimensional FSQ-R3GAN encoder converts each frame into discrete visual tokens, preserving single-frame information suited for next-token prediction and EOS-based termination, while a temporally inflated and video-finetuned three-dimensional decoder reconstructs generated tokens with temporal context to reduce flickering. A causal Transformer then models the frame-wise token sequence and, conditioned only on the first frame, stops generation when it emits an End-of-Sequence token, allowing duration to emerge from the learned behavior distribution. We evaluate BioVid on the A001 drinking action from NTU RGB+D. On 94 held-out clips, BioVid achieves a Wasserstein-1 distance of 1.24 frames from the real duration distribution. In comparison, fixed-length baselines yield distances of approximately 6-7 frames even when configured to the available length closest to the dataset mean, and approximately 15 frames when using the conventional 16-frame generation length. These results demonstrate the ability of BioVid to learn and reproduce the intrinsic duration distribution of biological behavior.

11.
arXiv (CS.CL) 2026-06-25

Introducing corpora Hlava Cor and Hlava AD: Human Label Variation in Coreference and Discourse Relations

As previous research on annotator disagreement in discourse phenomena has shown, understanding text coherence varies considerably from one individual to another. To explore this phenomenon, we created two corpora with multiple annotations of Czech texts, accompanied by annotators' explanations of their choices. The first corpus consists of 1,024 contexts annotated in parallel by three annotators. It captures differences in the identification of coreference across various text types and grammatical-semantic categories, including pronouns, full noun phrases, and anaphoric adverbials. The second corpus comprises 512 contexts, annotated in parallel by five annotators, and focuses on identifying discourse relations in attributive and non-attributive constructions. Both corpora achieve a comparable inter-annotator agreement of approximately 60-65%. For coreference annotation, agreement tends to be lower in cases where automatic coreference resolution models disagree, suggesting that when the models disagree, the examples tend to be more difficult or ambiguous for human annotators to interpret. The annotators' comments, both for coreference and discourse relations, further reveal differences in interpretation, varying levels of confidence in text understanding, and individual reading strategies.

12.
arXiv (quant-ph) 2026-06-19

Exclusion Statistics as a Thermodynamic Resource in Quantum Heat Engines

arXiv:2606.19310v1 Announce Type: cross Abstract: The maximum power extractable from a quantum thermoelectric heat engine operating with free fermion carriers is bounded by the universal Whitney limit, $P_{fermion}^{\max} \simeq 0.0321\pi^2 k_B^2(T_L-T_R)^2/h$. We demonstrate that this bound is not fundamental to quantum heat engines but is instead an artifact of fermionic statistics. Within the nonlinear Landauer-B\"{u}ttiker framework, a bosonic working medium yields a strictly enhanced universal maximum power, $P_{boson}^{\max} = (\ln 2)^2\, k_B^2(T_L-T_R)^2/h$, exceeding the fermionic limit by a factor of $(\ln 2)^2/(0.0321\pi^2) \approx 1.52$. We propose magnon transport through a ferromagnetic spin chain as an experimentally viable bosonic realization. Incorporating Haldane fractional exclusion statistics with parameter $g$ provides a continuous interpolation between the bosonic ($g = 0$) and fermionic ($g = 1$) limits, revealing a monotonic enhancement of maximum power for $g < 1$ at reduced bias cost. These results establish quantum statistical exclusion as a previously unrecognized and independently tunable thermodynamic resource, opening performance regimes inaccessible to conventional carrier-engineering approaches.

13.
arXiv (CS.CV) 2026-06-24

Resonant Minds: Closed-Loop Social Avatars with Theory of Mind

Creating lifelike digital humans with genuine social intelligence requires unifying cognitive reasoning and multimodal generation within a coherent framework. Current approaches treat these as separate tasks: Large Language Models excel at dialogue but lack embodied expression, while diffusion-based talking head models achieve visual fidelity but ignore social cognition. To bridge this gap, we propose a closed-loop dual-agent framework integrating perception, social reasoning, and expression into a continuous interaction cycle. The perception module analyzes partners' multimodal behaviors from video, while the social reasoning module infers hidden mental states through Theory of Mind and selects responses via an ensemble mechanism. The expression module then generates emotion-controllable videos that jointly synthesize speaker speech and facial expressions with listener reactive behaviors, capturing bidirectional dynamics absent in prior work. We further construct a hierarchical Persona-Scenario dataset with psychologically grounded personas and private social goals to support evaluation under information asymmetry. Experiments on this dataset demonstrate competitive or superior performance on both dialogue quality and video generation metrics. Notably, our method surpasses even the full-information Script mode on key dialogue quality dimensions, suggesting that explicit mental state inference under uncertainty can elicit more thoughtful dialogue than unrestricted information access. Project page: https://resonantminds.github.io/.

14.
arXiv (CS.CV) 2026-06-11

Contactless 3D Human Body Measurement Using Depth Cameras for Smart Health Monitoring

Contactless body measurement technologies are becoming increasingly significant for smart health monitoring, digital health applications, and remote patient assessment. Traditional anthropometric measurements typically necessitate physical contact and trained personnel, which may constrain scalability in remote healthcare settings. In this study, we introduce a depth camera-based framework for estimating human body measurements utilizing 3D point cloud data. An Orbbec Astra 2 depth camera was employed to capture RGB images, depth maps, and 3D point clouds of participants. The captured point cloud was processed using Python-based tools, including Open3D, NumPy, and OpenCV, to segment the human body from the background. Key anthropometric measurements, such as height and arm span, were computed. The measurements were obtained through a combination of spatial filtering and landmark selection on the 3D point cloud, followed by the projection of the computed measurements onto the corresponding RGB image using camera intrinsic parameters. In addition to linear measurements, the approximate body volume and visible surface area were estimated using voxel-based occupancy analysis and mesh-based surface reconstruction methods. The experimental results from a single depth capture demonstrated that accurate body measurements and geometric estimates could be obtained from depth camera data without physical contact. This study provides a foundation for future real-time systems that integrate depth sensing with intelligent health monitoring and generative AI models for smart healthcare applications.

15.
arXiv (CS.AI) 2026-06-24

A Survey on Federated Causal Discovery and Inference

arXiv:2606.23741v1 Announce Type: cross Abstract: Causal reasoning, which encompasses the discovery of causal structures and the inference of causal effects, is fundamental to data-driven decision making. In practice, data for reliable causal analysis are often distributed across institutions and cannot be centralized due to privacy regulations or communication constraints. Federated learning (FL) addresses this by enabling collaborative analysis without raw data sharing, giving rise to the rapidly growing field of federated causal discovery (FCD) and inference (FCI). However, the interdisciplinary nature of this field and the absence of a comprehensive survey present barriers to entry for researchers. This paper bridges that gap by providing a systematic review through multi-dimensional taxonomies. Grounded in the three core design decisions underlying any FCD solution, namely how structures are learned, how data are partitioned, and what structural knowledge each party obtains, we organize FCD along three axes: methodological paradigm, federation topology, and structural scope. We further examine key practical dimensions, including temporal dynamics, data heterogeneity, missing data, and non-identical variable sets. For FCI, we categorize methods by target estimand (average versus individualized/conditional treatment effects) and by estimation strategy, from classical weighting methods to modern deep generative architectures. Unlike prior works that treat FCD and FCI separately, we formalize their connection as complementary stages of a unified federated causal reasoning pipeline, where FCD supplies the structural knowledge required for valid effect estimation in FCI. Finally, we highlight their shared concerns regarding privacy, communication efficiency, theoretical guarantees, and application domains, and conclude by identifying open challenges for future research.

16.
arXiv (quant-ph) 2026-06-19

Observation of alignment tensor effects in metastability-exchange collisions with highly polarized 3He ensembles

arXiv:2606.20330v1 Announce Type: new Abstract: Highly polarized 3He ensembles prepared by metastability-exchange optical pumping (MEOP) have been widely used in precision measurements and fundamental physics. Metastability-exchange (ME) collisions, serving as the basis of MEOP, are traditionally described in terms of atomic orientation, while the significant contributions of metastable alignment tensor at high polarization remain unexplored. In this work, we develop a linearized model under mean-field approximation to investigate alignment tensor effects in highly polarized 3He , which originate from the metastable F = 3/2 manifold and are revealed through ME-induced relaxation and frequency shift. By means of free-induction-decay (FID) measurements, a pronounced dependence on nuclear polarization is experimentally observed in the response of the ground-state-metastable hybrid 3He ensembles to the external magnetic field. Furthermore, after obtaining the characteristics of tensor-induced phenomena, we demonstrate good agreement between the experiment and the theory. This work advances the understanding of nuclear spin dynamics in highly polarized 3He using MEOP. It further provides applications in systematic error correction of high-accuracy magnetometry, as well as in optimal protocol for the generation of nuclear spin-squeezed states.

17.
medRxiv (Medicine) 2026-06-16

Development of a symptom-based severity score anchored to health-related quality of life post-COVID-19 within the population-based EPILOC cohorts

Purpose Because simple symptom counts treat all symptoms as equally important and may not adequately capture the HRQoL impact of heterogeneous post-COVID-19 symptoms, we aimed to develop an HRQoL-anchored symptom severity score providing an interpretable measure of post-COVID-19 disease burden. Methods Baseline data from the population-based EPILOC and EPILOC Omicron surveys (adults aged 18-65 years) were used to develop a symptom-based severity score anchored to physical and mental HRQoL assessed with the SF-12. A two-stage modelling approach was applied to identify HRQoL-relevant symptoms and to derive symptom-specific weights for physical and mental component scores, incorporating 30 ordinal symptom severity variables. Symptom-specific weights were extracted to compute physical, mental, and composite severity scores. Score interpretation was examined using external reference measures, including EPILOC case status, self-reported health recovery, and functional consequences. Results A total of 19,004 participants (mean age 44.3 years, 59.6% female) were included. Sixteen symptoms contributed to the physical and eleven to the mental HRQoL score, with a limited subset accounting for most of the HRQoL loss. Severity scores were heavily right-skewed, with 50.6% of participants showing no measurable HRQoL impairment. Higher scores correlated with lower self-reported recovery, and increased probability of rehabilitation use and health-related changes in working time, supporting convergent and criterion-related validity. Conclusions This study introduces a transparent, HRQoL-anchored symptom severity score that measures graded post-COVID-19 burden beyond simple symptom counts. The score may be particularly suited for longitudinal assessment of recovery trajectories.

18.
bioRxiv (Bioinfo) 2026-06-18

Population-associated molecular variation in histologically normal breast tissue is context-dependent and associated with distinct transcriptional states

Population-associated molecular variation in breast tissue may contribute to differences in tissue biology and disease susceptibility, yet the extent to which such variation is shaped by underlying tissue states remains unclear. Here, we performed RNA-seq and lipidomic profiling of histologically normal breast tissue samples from African American (AA) and Caucasian White (CW) individuals, followed by conceptual integration of the resulting transcriptomic and lipidomic patterns. Unsupervised analysis revealed two distinct baseline transcriptional states (G1 and G2) that defined the primary axis of molecular variation across the cohort and corresponded to epithelial-enriched (G1) and vascular-enriched (G2) tissue contexts as determined by cell-type deconvolution. Global comparisons between AA and CW samples showed minimal transcriptomic differences, with only a single gene reaching significance after multiple testing correction. However, when stratified by baseline tissue state, 191 genes were differentially expressed within G1, with coordinated upregulation of extracellular matrix organization and proliferative/cytoskeletal processes in AA samples. These patterns were consistently supported across multiple enrichment approaches. No comparable population-associated differences were observed within G2. Lipidomic analyses showed partial but non-significant trends consistent with transcriptomic structure, suggesting that lipid variation provides complementary but limited support for baseline molecular differences, likely reflecting constraints of bulk tissue composition. Together, these findings suggest that population-associated molecular differences in normal breast tissue are context-dependent and emerge within specific baseline transcriptional states, where distinct biological programs can coexist and be differentially modulated. These findings highlight the importance of tissue heterogeneity in shaping molecular variation and its potential relevance to disease-associated tissue states.

19.
arXiv (CS.CV) 2026-06-19

TimeProVe: Propose, then Verify for Efficient Long Video Temporal Reasoning in Activities of Daily Living

Long Video Question Answering (LVQA) requires identifying sparse, query-relevant evidence within hours-long untrimmed videos. Existing approaches either process videos densely with large vision-language models (VLMs), incurring prohibitive computational cost, or rely on sparse caption-based reasoning, which often misses temporally localized and motion-centric evidence. We introduce TimeProVe, a cost-efficient hybrid framework for temporally grounded reasoning in long videos. TimeProVe first employs lightweight modules to generate action-grounded answer–evidence hypotheses and subsequently invokes an expensive VLM only for targeted verification. The core of our framework lies in the Action-based Candidate Evidence (ACE) module, which converts temporally localized actions into query-conditioned candidate answers and supporting evidence windows through lightweight LLM reasoning. We further introduce OpenTSUBench (OTB), an open-ended benchmark designed to evaluate temporally grounded reasoning in real-world Activities of Daily Living (ADL) scenarios. Experiments show that TimeProVe outperforms the strongest baseline on OTB by 7.3%, while reducing VLM calls by 75% and inference cost by 93%. Furthermore, without explicit temporal grounding training, TimeProVe achieves competitive performance on Charades-STA, and reaches state-of-the-art results when enhanced with grounding VLMs.

20.
bioRxiv (Bioinfo) 2026-06-17

AMaNITA: an end-to-end workflow for native tRNA nanopore sequencing data analysis

Transfer RNA (tRNA) molecules serve as essential adapters during protein translation. While direct RNA sequencing (DRS) via Oxford Nanopore Technologies has emerged as a powerful platform for systematic tRNAome profiling, we currently lack a simple and robust statistical framework for nanopore tRNA data analyses. Here, we address this gap by developing AMaNITA (Abundance, Modifications, and Nanopore Intensity Toolbox Application), an end-to-end bioinformatic workflow that enables simplified, robust, and scalable analyses of nanopore native tRNA sequencing datasets. AMaNITA streamlines the entire analytical trajectory: from upstream processing (basecalling, mapping, filtering, batch effect correction) to downstream assessment of differential tRNA abundance and modification stoichiometry. The workflow generates an interactive HTML report for data exploration and analysis, allowing the user to download the source data files and resulting plots. AMaNITA can be executed using Singularity from the command line, without requiring installation of dependencies.

21.
arXiv (CS.LG) 2026-06-25

Towards Robust EEG Decoding Based on Riemannian Self-Attention

arXiv:2606.25456v1 Announce Type: new Abstract: Brain-Computer Interface (BCI) based on electroencephalography (EEG) enables direct interaction between the brain and external environments and has significant applications in assistive technologies, medical rehabilitation, and entertainment. Recently, EEG decoding methods based on Symmetric Positive Definite (SPD) learning have demonstrated superior performance. However, these methods typically employ basic network architectures and do not explicitly capture local relationships between EEG signals. This limitation is problematic for EEG signals due to their inherently low Signal-to-Noise Ratio (SNR). Moreover, most existing Riemannian manifold-based methods are restricted to specific metrics. The most widely used is the Affine-Invariant Metric (AIM). However, it has a quadratic dependency on the SPD matrices and cannot handle ill-conditioned SPD matrices, which hinders the effectiveness of networks. In contrast, the Bures-Wasserstein Metric (BWM) exhibits linear dependence on SPD matrices and demonstrates superior performance for ill conditioning. To overcome these challenges, we propose a Riemannian self-attention network based on the BWM. Additionally, the recently introduced power-deformed generalized Bures-Wasserstein metric reveals a nonlinear relationship between SPD matrices and matrix power deformation. This metric provides a more nuanced representation of the geometric structure of the SPD manifold. Consequently, we extend our model to a learnable version. For simplicity, we refer to it as GBWAtt. Experimental results on three EEG benchmarking datasets validate the robustness and effectiveness of our proposed method. The code is available at https://github.com/jissc/GBWAtt.

22.
arXiv (CS.LG) 2026-06-16

Scalable and Interpretable Representation Alignment with Ordinal Similarity

arXiv:2606.16379v1 Announce Type: new Abstract: Evaluating representation similarity is fundamental to representation learning. However, existing metrics suffer from significant limitations: they lack interpretability due to shifting baselines, lack robustness to outliers, and are computationally intractable for large datasets, forcing reliance on heuristic approximations. To address this, we develop an ordinal-similarity framework, instantiated by the Triplet (TSI) and Quadruplet (QSI) Similarity Indices, which measure alignment by quantifying the consistency of ordinal relationships. We theoretically demonstrate this formulation is inherently interpretable, robust to outliers, and computationally efficient. Finally, we establish a formal equivalence between TSI and local neighborhood alignment, measured by Mutual Nearest Neighbors. Empirically, we validate these properties and show that ordinal similarity offers a scalable approach to measuring alignment, enabling practitioners to better understand and design representations.

23.
arXiv (CS.AI) 2026-06-15

Mask, Sample, Revise: A Revisable CTMC Inference Stack for Guided Discrete Flow Matching Text-to-Speech

arXiv:2606.13989v1 Announce Type: cross Abstract: Recent alignment-free non-autoregressive (NAR) text-to-speech (TTS) models formulate synthesis as a conditional infilling task, bypassing explicit duration predictors and external aligners. When speech is represented with neural codec tokens, the infilling problem becomes discrete, making Discrete Flow Matching (DFM), a Continuous-Time Markov Chain (CTMC) framework for discrete generation, a natural fit. However, inference-time control for stable low-step conditional infilling remains underexplored. We propose Mask, Sample, Revise, an inference-time CTMC stack for alignment-free DFM-TTS. The stack combines predictor-free guidance to strengthen text conditioning, prompt-matched conditional coupling to align the probability path with the acoustic prompt, and SC-ReMask, a schedule-constrained remasking mechanism that introduces token-to-mask transitions so early de-masking decisions can be revised. These components require no post-hoc fine-tuning and operate in a single tau-leaping sampler. Controlled ablations show that this stack improves intelligibility and robustness in the low-NFE prompted setting, outperforming unguided and guidance-only samplers with substantially more steps.

25.
arXiv (quant-ph) 2026-06-17

Entanglement dynamics for atoms near a reflecting boundary: Enhancement and suppression by environment-induced interactions

arXiv:2602.23773v2 Announce Type: replace Abstract: We investigate how environment-induced interactions influence the entanglement dynamics of two atoms held at fixed positions near a perfectly reflecting boundary. Within the framework of open quantum systems, we explicitly incorporate the environment-induced energy shifts, including both atom-boundary contributions and an environment-induced atom-atom interaction, which are often neglected in previous studies. We show that, for any initial two-atom state, these energy-shift effects qualitatively and quantitatively modify the entanglement dynamics relative to treatments that omit them. Depending on the geometry and parameter regime, the environment-induced interactions can either enhance entanglement generation – yielding a larger maximum concurrence and a longer entanglement lifetime – or suppress it, reducing both the peak concurrence and the survival time. This behavior contrasts sharply with the free-space case, where the environment-induced atom-atom interaction affects entanglement generation only for a restricted class of initial states and does so in an exclusively assisting manner.