Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-12

OCOO-T : A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

arXiv:2606.12838v1 Announce Type: cross Abstract: Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

02.
arXiv (CS.CV) 2026-06-11

Bridging the Modality Gap in Forensic Image Retrieval

Automated image retrieval plays an increasingly critical role in modern forensic analysis, supporting investigative workflows that rely on efficient comparison of visual evidence. While prior work has focused primarily on developing and optimizing multimodal retrieval systems, limited attention has been paid to evaluating the forensic applicability of these technologies across diverse real-world scenarios. In this study, we present a unified retrieval framework adapted to four key forensic tasks: (1) tattoo image retrieval given a tattoo query image; (2) tattoo retrieval guided by human-expert textual descriptions, modelling the common situation where a witness verbally describes a tattoo; (3) tattoo retrieval from hand-drawn sketches; and (4) face retrieval from forensic face sketches. Our system leverages a multimodal large language model (MLLM) to automatically generate structured textual descriptions for all queries and gallery images, followed by sentence-transformer embedding for text-based comparison. We evaluate retrieval using visual-only embeddings, text-only embeddings and a multimodal fusion strategy that combines text- and image-based similarity scores derived from state-of-the-art visual feature extractors relevant to each task. The fusion of modalities consistently improves retrieval precision and robustness, especially in scenarios where visual information is limited or noisy (e.g., sketches, partial tattoos, or fragmented witness statements). This work highlights the forensic value of a unified multimodal retrieval pipeline and demonstrates how modern MLLMs can operationalize challenging forensic tasks that traditionally rely on manual expert analysis. Our results position multimodal retrieval as a promising tool for supporting investigative workflows involving tattoos, facial composites, and witness descriptions.

03.
medRxiv (Medicine) 2026-06-17

Method comparisons for differentiation of Schizophrenia and Bipolar based on rs-fMRI Intrinsic and Functional Networks

Psychosis as a symptom manifests in schizophenia and bipolar disorder, two highly heterogeneous psychiatric illnesses with overlapping clinical manifestations. Resting-state functional Magnetic Resonance Imaging (rsfMRI), represents a promising tool for identifying objective biomarkers of functional brain alterations to aid differential diagnosis. In this work, we comparatively evaluate multiple rs-fMRI representations for differentiating schizophrenia and bipolar disorder using intrinsic connectivity network (ICN) temporal profiles and several functional network connectivity (FNC) approaches, including static, dynamic, and high-order connectivity analyses. The study was conducted on a cohort of 371 subjects with psychosis, while evaluation was performed using a separate held-out cohort of 315 subjects. We investigated convolutional neural network architectures applied to ICN temporal profiles, spectrograms, and scalograms, alongside classical machine learning models trained on connectivity-derived features. Across the evaluated approaches, ICN temporal profiles provided the most consistent discriminative performance, with a 1D convolutional neural network achieving the strongest overall results under the benchmark protocol. Among connectivity-based methods, static functional connectivity generally outperformed dynamic and high-order representations, suggesting that increased representational complexity did not necessarily translate into improved generalization. Although the obtained classification performance remained modest, the results highlight the challenges of robust psychosis differentiation using rs-fMRI while emphasizing the relative stability of low-order connectivity representations and temporal ICN features. These findings contribute to ongoing efforts toward reproducible and interpretable neuroimaging biomarkers for psychiatric disorders.

04.
arXiv (quant-ph) 2026-06-12

Exploring Exotic Spin-Dependent Interactions Beyond the Standard Model: Theoretical Foundations and Experimental Investigations

arXiv:2606.13318v1 Announce Type: cross Abstract: New interactions mediated by novel particles propose solutions to several important questions in modern physics. Axions serve as examples of such particles; they are lightweight and interact weakly with ordinary matter. This category of particles, including those similar to axions-termed Axion-Like Particles (ALPs)-arises from diverse theoretical frameworks, such as the Peccei-Quinn mechanism addressing the strong CP problem, string theory, and spontaneous supersymmetry breaking. Given their light mass and weak coupling, ALPs are also possible candidates for cold dark matter. Introducing these new interactions mediated by novel particles not only tackles several challenges in modern physics but also raises a crucial question: Are there undiscovered interactions beyond the Standard Model? Many of the interactions predicted by these theories are spin-dependent, which is the primary focus of this review. In this review, we first outline the theoretical foundations for investigating exotic spin-dependent interactions, highlighting their importance in various models beyond the Standard Model. We examine the potential roles of new lightweight particles in mediating these interactions, which may enhance our understanding of dark matter. Relevant formulas derived from theoretical models are included to support experimental investigations. Following this theoretical framework, we conduct a detailed review of recent experimental efforts to detect these exotic interactions. A systematic review of current constraints on these interactions is presented, along with an assessment of various detection approaches.

05.
arXiv (CS.CV) 2026-06-16

OmniOPSD: Rationale-Privileged On-Policy Self-Distillation for Affective Computing

Reinforcement learning for multimodal large language models (MLLMs) is often hindered by severe reward sparsity in complex reasoning tasks. This challenge is particularly pronounced in human-centered scenarios involving states, emotions, intentions, and behaviors, where heterogeneous multimodal signals and subjective human factors make high-quality chain-of-thought (CoT) annotations expensive and difficult to obtain. Although many multimodal datasets provide expert-annotated ground-truth labels, directly using these labels for supervised fine-tuning may encourage shortcut learning in multimodal perception and provides limited transparency for safety-critical human–AI interaction. To address these limitations, we propose OmniOPSD, a Rationale-Privileged On-Policy Self-Distillation framework that uses frontier-generated rationales as teacher-side privileged evidence rather than student imitation targets. OmniOPSD uses frontier-generated evidence-aware rationales only as training-time privileged evidence context for a local teacher. The student samples its own rollout from the original multimodal input, while the rationale-privileged teacher scores the same tokens and provides dense token-level supervision. Thus, the student learns on its own trajectory distribution without directly imitating frontier-model completions, and inference requires no labels, rationales, CoT annotations, or closed-source model access. Experiments on MER-UniBench show that OmniOPSD achieves state-of-the-art performance with an average score of $84.19$, and ablations further support the value of rationale-privileged teacher guidance.

06.
bioRxiv (Bioinfo) 2026-06-15

Inferring Cell Fate Trajectories in Time-Resolved Metabolic RNA Labeling data

Single-cell RNA sequencing provides high-resolution snapshots of cellular states but lacks direct information about transcriptional dynamics. Metabolic RNA labeling addresses this limitation by distinguishing newly synthesized RNA, offering insight into the direction of cell state changes, and providing valuable information when attempting to recover the underlying continuous dynamics from static snapshots of cell distributions. However, existing trajectory inference methods do not fully exploit this additional signal. Here, we propose FLOWSATATE, a framework for single-cell trajectory inference that leverages time-resolved RNA labeling within an Optimal Transport setting. We model cell dynamics as a gradient flow in an inferred potential landscape parameterized by a neural network, integrating both total and labeled RNA across time points. The learned potential enables identification of key genes and transcription factors driving cell fate decisions and supports prediction of future cellular states. We benchmark our approach on its ability to generalize unseen data and recover coherent trajectories. We also apply it to study colorectal cancer response to demethylation treatment as well as neuronal differentiation of embryonic stem cells.

07.
arXiv (CS.LG) 2026-06-15

Online Convex Optimization with Sublinear Noisy Probes

arXiv:2606.14640v1 Announce Type: new Abstract: We study Online Convex Optimization (OCO) over a convex set $K\subseteq \mathbb R^d$, where in each round $t$ the learner selects $x_t\in K$ and then observes a convex loss $f_t:K\to[0,1]$, with the goal of minimizing regret to the best fixed decision in hindsight. We introduce a unified probing model that generalizes two recent lines of work: sublinear best-expert queries in the experts setting, and pairwise (comparison-based) feedback available every round in OCO. In our framework, the learner has a budget of $k\le T$ pairwise probes; on a probed round it may query two points and learn which one has smaller loss. Our main result shows that even a sublinear and noisy probe budget can provably improve worst-case regret in the full feedback OCO regime. With $k$ $\delta$-noisy pairwise probes, we obtain: $ Reg_T \le O\left(\min\left\{\sqrt{dT\ln T},\; \frac{dT\ln T}{k|1-2\delta|}\right\}\right) $, which is tight (up to logarithmic factors in $T$) across $T$, $k$ and $\delta$. Specifically regarding the noise parameter $\delta \in [0,1]$, the regret guarantee smoothly degrades as the oracle response approaches a coin flip, i.e., $\delta$ is close to $\frac{1}{2}$. When applying the same techniques to a finite $K$ for the prediction with $d$ experts setting, the resulting rates are instead completely tight in all parameters, including $d$. Our analysis gives a streamlined treatment of pairwise probing in OCO by quantifying the benefit of probing via a variance reduction effect, combined with a second-order (variance-based) analysis of Continuous Exponential Weights.

08.
arXiv (CS.AI) 2026-06-18

Pareto Q-Learning with Reward Machines

arXiv:2606.19134v1 Announce Type: cross Abstract: We present Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a set of reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains sets of vector-valued Q-estimates to approximate the Pareto front, with enhancements from Q-Learning with Reward Machines (QRM), which exploits the factored automaton structure of the reward signal. This yields a multi-policy algorithm that remains sample-efficient under non-Markovian, RM-encoded rewards. Experimental trials show that PQLRM converges faster than a naive PQL baseline applied to the cross-product MDP and can synthesize Pareto-optimal policies that QRM cannot.

09.
arXiv (CS.LG) 2026-06-18

KANEL\'E: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation

arXiv:2512.12850v3 Announce Type: replace-cross Abstract: Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANEL\'E, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANEL\'E matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.

10.
bioRxiv (Bioinfo) 2026-06-15

Biological meaning in protein embedding space is resolution-dependent

Protein language model embeddings are increasingly used to organise biological sequences, yet how biological meaning is encoded within embedding neighbourhoods remains poorly understood. Using two independent hierarchical enzyme systems, carbohydrate-active enzymes and peptidases, we investigated how biological interpretation changes across embedding organisations aligned to different levels of biological hierarchy. Different embedding organisations give rise to distinct neighbourhood semantics. When aligned to membership-boundary resolution, embeddings robustly separated artefacts and unrelated proteins from members of the target category. However, embeddings aligned to functional-grouping resolution maintained compositional neighbourhood structure for multi-domain proteins spanning more than one functional or catalytic group. Finally, embeddings aligned to local-family resolution recovered compact family-like neighbourhoods, including families withheld from training, while weakening broader membership-boundary and functional-grouping relationships. Moreover, embeddings optimised toward the same level of biological organisation retain different biological relationships depending on optimisation trajectory employed. Together, our results show that proximity in protein embedding space has no fixed biological interpretation. Instead, biological meaning emerges across embedding resolutions through selective preservation of different forms of biological organisation.

11.
arXiv (CS.LG) 2026-06-12

ProtoX-AD: Self-Explainable Time Series Anomaly Detection and Characterization

arXiv:2606.13277v1 Announce Type: cross Abstract: Recent advances in time series anomaly detection (TSAD) have highlighted the effectiveness of self-supervised classification-based approaches. These methods apply transformations to normal training samples, training a classifier to recognize transformation-specific patterns that help identify anomalies through increased classification errors. Despite their strong performance, a significant challenge is their lack of explainability, as they provide limited insight into the characteristics of flagged anomalies. To address this limitation, we propose ProtoX-AD, a prototype-based self-explainable framework for self-supervised TSAD. ProtoX-AD learns transformation-aware latent representations alongside interpretable prototypes, enabling both accurate anomaly detection and the identification of distinct anomalous profiles through prototype-based explanations. Additionally, it allows for systematic analysis of how transformation design impacts detection performance and explainability. Experimental results on synthetic and real-world datasets demonstrate that ProtoX-AD achieves detection performance comparable to its black-box counterparts while offering more consistent and semantically meaningful explanations than existing explainable baselines. Our code is publicly available at https://github.com/Aitorzan3/ProtoX-AD.

13.
arXiv (CS.LG) 2026-06-16

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

arXiv:2605.01961v2 Announce Type: replace Abstract: Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human evaluators, which, under large variations of preferences, can be unfair to minority groups. In this work, we consider fairness in dueling bandits, a standard framework for online learning from preference data. We assume that each user has a (potentially distinct) Condorcet winner, which is an arm preferred to every other arm. Using these user-specific Condorcet winners as reference points, we evaluate and score arms according to their performance relative to the corresponding winner. To promote fairness across heterogeneous users, we adopt the well-established Nash Social Welfare objective, which maximizes the product of user utilities, thereby inherently penalizing inequality and preventing the marginalization of any single user. Within this framework, we construct a hard instance to establish a regret lower bound of $\Omega(T^{2/3}\min(K,D)^\frac{1}{3})$ for a time horizon $T$, $K$ arms, and $D$ users, which, to the best of our knowledge, is the first result quantifying the cost of fairness in dueling bandits with heterogeneous preferences. We then present the Fair-Explore-Then-Commit and Fair-$\epsilon$-Greedy algorithms with a Condorcet winner identification phase. We further derive their regret upper bounds that match the lower-bound dependence on $T$ up to logarithmic factors.

14.
arXiv (CS.CL) 2026-06-17

Variable-Width Transformers

Scaling model size, specifically depth and width, has driven significant progress in transformer-based language models. However, most architectures maintain a constant width across all layers, allocating a fixed parameter and computation budget evenly despite different layers potentially playing distinct computational roles. In this work, we empirically investigate nonuniform capacity allocation across network depth by proposing a $\times$-shaped >

15.
arXiv (CS.CV) 2026-06-19

Addressing Detail Bottlenecks in Latent Diffusion for RGB-to-SWIR Image Translation

Latent diffusion models (LDMs) enable efficient image-to-image translation but discard fine spatial details during compression, degrading downstream perception tasks. We identify two bottlenecks: the autoencoder, which loses spatial information, and the conditioning pathway, which further degrades the source signal through naive downsampling. We propose two lightweight, backbone-agnostic fixes: a Source-Conditioned Autoencoder (SCAE) that injects high-resolution source features into the decoder via skip connections, and a Learnable Guidance Encoder (LGE) that replaces naive downsampling with a learned conditioning signal. Evaluated on RGB-to-SWIR translation for driving scenes with two denoiser backbones (U-Net and DiT), our approach improves detection mAP by up to 2x over the latent diffusion baseline, with up to 3.4x gains on small objects (COCO-small,

16.
arXiv (CS.AI) 2026-06-19

Bistable by Construction: Wall-Clock-Calibrated State Monitors Have No Moment-Detection Regime at Agent Cadence

arXiv:2606.19386v1 Announce Type: cross Abstract: Runtime monitors for autonomous agents commonly threshold an accumulated internal state - a behavioural baseline, a drift statistic, or, in our prior work, a modelled affective state. We previously reported a State Saturation Trap: threshold-on-state triggers over a continuous affect engine become near-constant alarms on SWE-bench debugging agents (Modgil 2026). A post-release audit found the engine received dt=0 between actions, so its exponential decay never operated: the published trap is a pure-accumulator result. We correct the record (erratum, v2) and treat the flaw as an experiment. The key variable it exposes is whether a monitor's dynamics are calibrated in sample time (per observation, as in CUSUM) or wall-clock time (half-lives in seconds, as in affect models and EMA baselines). On fixed-rate streams these coincide; on agent streams, where inter-action time varies by orders of magnitude, they do not. A pre-registered sweep over uniform intervals (dt in {0..600}s) on 20 trajectories shows the wall-clock level trigger has two regimes: at dt=60s silent. Every critical dt lies in (1,30]s. Real agent runs measure latency at median 1.53s (p90 2.33s); real coding cadence sits inside the trap regime, vindicating the empirical finding under a corrected mechanism. The structure is a property of the calibration class, not the engine: a minimal wall-clock accumulator over the raw error stream reproduces the same cliff, while a sample-time CUSUM over the identical stream is exactly dt-invariant (20/20). A rising-edge trigger with hysteresis fires 0-3 times per trajectory in every condition. We conclude that wall-clock-calibrated leaky-integrator monitors admit no regime in which they act as moment detectors on agent streams; transition detection escapes the trap at every cadence, but does not recover human intervention timing.

17.
arXiv (math.PR) 2026-06-15

Lehner's operator norm formulas, semidefinite programming, and spiked matrix models

arXiv:2606.14687v1 Announce Type: new Abstract: Lehner (1999) derived elegant formulas for the operator norm $\|\mathfrak{X}\|$ of operators of the form $\mathfrak{X} = \mathbf{A}_0 \otimes \mathfrak{1} + \sum_{i = 1}^n \mathbf{A}_i \otimes \mathfrak{m}_i$, also easily generalized to the spectral edge $\lambda_{\max}(\mathfrak{X})$, in terms of nonlinear optimization problems over positive definite matrices. Here the $\mathbf{A}_i$ are finite-dimensional Hermitian matrices, the $\mathfrak{m}_i$ are either free semicircular or free Rademacher families of operators, and $\mathfrak{1}$ is the identity operator. We first show that both of Lehner's nonlinear optimizations can be rewritten as linear semidefinite programs (SDPs), even in the Rademacher case where Lehner's optimization is not itself convex. We give the primal and dual forms of these SDPs, derive the complementary slackness relations and consequences thereof, and propose that the SDPs are more stable and accurate than the iterative numerical scheme proposed in Lehner's original work. We then apply the SDPs from the semicircular case to spiked matrix models, studied recently via Lehner's formula by Bandeira, Cipolloni, Schröder, and van Handel (2024). We give a new proof of the Baik–Ben Arous–Péché (BBP) transition they establish in models with isotropic (but possibly correlated) Gaussian noise by constructing feasible variables for the associated primal and dual SDPs. Combining our construction with a sensitivity interpretation of optimal dual variables, we study the fluctuations of leading eigenvectors of such models. We conjecture and give numerical evidence that these fluctuations are Gaussian but anisotropic and non-universal, and that their covariance may be computed in terms of the optimizer of the dual of Lehner's formula, which in turn is approximately the leading eigenmatrix of a completely positive operator associated to the covariance of the noise model.

18.
medRxiv (Medicine) 2026-06-11

Dissecting the functional landscape of rare diseases through genomic variation in a heterogeneous cohort of 11,000 patients

Rare diseases (RDs) remain a major diagnostic challenge. Genetic and phenotypic heterogeneity, incomplete knowledge of disease mechanisms, and limitations in variant clinical interpretation leave many patients without a molecular diagnosis. Meanwhile, the growing volume of genomic data generated in clinical practice offers an opportunity to develop data-driven methodologies for exploring disease mechanisms and improving the reanalysis of unsolved cases. We aggregated real-world genomic data from 11,084 unrelated patients with suspected RD. Patients were clinically classified into 122 diseases. We built a multi-disease genomic variant frequency database (FJD-DB), which enabled the development of variant and gene-disease association scores by means of case-control subcohort comparisons across 32 disease groups. Functional enrichment analyses were then used to highlight disease-associated protein domains, pathways, biological processes, and phenotypes. Finally, the resulting knowledge was integrated into a data-driven framework for the guided reanalysis of unsolved RD patients applied to Inherited Retinal Dystrophies (IRD) patients as first use case. FJD-DB contained more than 45 million unique variants, including ~185,000 potentially pathogenic variants. Disease-specific analyses identified disease-associated pathogenic variants and highlighted both established and candidate disease genes. We detected 179 significantly enriched protein domains across 23 diseases, 124 Human Phenotype Ontology terms across 13 diseases, 79 Reactome pathways across 10 diseases, and 72 Gene Ontology biological processes across 8 diseases, revealing highly disease-specific functional signatures. Integration of disease-specific variant, gene, and functional association signals enabled the development of a data-driven framework for guided reanalysis of unsolved RD cases. Applied to more than 1,100 unsolved IRD cases, the framework generated clinically relevant findings in 26 patients, including four molecular diagnoses, seven candidate diagnoses, and 15 cases upgraded from non-informative findings to variants of uncertain significance. Aggregated real-world genomic data can be leveraged to identify disease-associated molecular signals generating novel biological hypotheses. A unified analytical framework provides a scalable strategy for knowledge discovery and guided reanalysis, facilitating the identification of overlooked and potentially novel genetic causes of RDs.

19.
arXiv (CS.LG) 2026-06-16

Hidden Degradation Costs in Energy-Cost-Only HEMS Optimisation: Study on Battery and PV Sensitivity

arXiv:2606.16051v1 Announce Type: cross Abstract: Residential battery energy storage systems (BESS) are increasingly deployed alongside photovoltaic (PV) generation to reduce household energy costs under volatile time-of-use (TOU) tariffs. Model predictive control (MPC) is a widely adopted optimisation strategy for home energy management systems (HEMS), typically formulated to minimise net energy cost, subject to physical and operational constraints. However, battery degradation is rarely embedded in the optimisation objective, meaning its cost is unquantified and aggressive; high-cycle-count strategies could incur significant losses once deployed to physical systems. This paper presents a receding-horizon mixed-integer linear programming (MILP) baseline for a UK residential HEMS, using demand data from the REFIT dataset. A 3 by 3 sensitivity study is conducted across three battery sizes and three PV array sizes, with post-hoc degradation cost estimated using the Naumann stress model and rainflow cycle counting. Results show that degradation remains constant for each battery size and can exceed energy cost savings by up to 1,060 %. These results demonstrate that energy-cost-only optimisation systematically underestimates the true system cost, motivating a degradation-aware control formulation.

20.
arXiv (math.PR) 2026-06-18

Rigidity of infinite exchangeable sequences with Gaussian marginals

arXiv:2606.18654v1 Announce Type: new Abstract: We study infinite exchangeable sequences with Gaussian one-dimensional marginals. We formulate the conjecture that joint Gaussianity of a single pair of coordinates forces the entire sequence to be a Gaussian process. Although this conjecture remains open, we prove that joint Gaussianity of the first four coordinates is sufficient. We also establish the corresponding two-point criterion under the additional assumption that the directing measure is almost surely infinitely divisible.

21.
arXiv (quant-ph) 2026-06-16

Quantum vortex in a fluid flow: negative effective mass and a novel mechanism for turbulence formation

Authors:

arXiv:2606.15803v1 Announce Type: cross Abstract: We explore the movement of a thin, circular quantum vortex filament within an infinite cylindrical pipe. The fluid surrounding the vortex ring moves through the pipe at a non-zero velocity denoted by $v$. Our study examines the energy spectrum $E = E(p)$, where $p$ represents the total momentum of a vortex ring. We have demonstrated that the function $E(p)$ significantly depends on the velocity $v$. The discovered spectrum $E(p)$ reveals the existence of states with both negative and extremely large effective masses. We also explored the hypothesis regarding the existence of coupled vortex pairs possessing finite summary effective masses. Every pair consists of vortices that possess both positive and negative masses, with the magnitude of these masses being unrestricted. In our model, the criterion for the appearance of these states is based on comparing two numbers. The first is seen as a quantum counterpart to the Reynolds number, while the second represents its critical value for a flow with a single vortex. We also explore how this studied effect might contribute to the emergence of quantum turbulence. This study discusses a method for determining the critical Reynolds number in quantum turbulence, using the proposed model as a framework. Here, we use a new quantization technique for classical closed vortex filaments developed by the author earlier.

22.
arXiv (CS.LG) 2026-06-15

PostDeg: Placement Beats Parameterization in LayerNorm GNNs

arXiv:2606.14022v1 Announce Type: new Abstract: LayerNorm-based GNNs routinely erase the topology signals (degree, centrality, $k$-core) that node-selection policies should depend on, but the literature has not located where in the residual block the erasure happens. We answer that question: a positive per-node scalar inserted before LayerNorm is divided out up to a stabilizer term, while the same scalar inserted after LayerNorm reaches the score head as representation magnitude. The surviving slot is the post-LayerNorm position. We instantiate it with PostDeg, a parameter-free post-LayerNorm inverse-degree scale, and pre-register four falsifiers (graphwise scalars, extra LayerNorm, expressive same-slot capacity, backbone-agnostic source) that would reject the rule. PostDeg gains $+3.5\%/+2.5\%/+5.6\%$ over the LN backbone on influence maximization, network dismantling, and maximum independent set, with $10/10$ paired-seed wins per task; none of the four falsifiers fires. The takeaway is that placement, not parameterization, carries the gain – a small invariance check that generalizes to any positive topology scalar in any normalized residual stack.

23.
arXiv (CS.CV) 2026-06-11

From Nominal Intensity to Equivalent Rainfall: A Path-Based Credibility Evaluation Framework for Simulated Rainfall in Autonomous-Driving Perception Tests

Credible simulated-rainfall conditions are essential for identifying perception-system boundaries and supporting SOTIF-oriented risk assessment in automated driving. However, closed-field tests are often described only by nominal rainfall intensity or single-point measurements, making it difficult to align simulated rain fields with real rainfall and map test results to real-world scenarios. This paper proposes a path-based credibility evaluation method for simulated rainfall in autonomous-driving perception tests. Using the drop size and velocity joint distribution of real rainfall as the reference, each candidate path is represented by path-equivalent rainfall intensity, an uncertainty band, and a path-averaged Realism of Raindrop Distribution (RRD) score. Lidar target point-cloud count and mean reflectivity are further used for perception-consistency correction, quantifying the proxy capability of each simulated-rainfall path for real-rainfall perception effects. Experiments are conducted using about 10,000 real-rainfall raindrop-spectrum samples, 728 RainSense perception samples, and 45 spatial sampling points in a 2.4 m x 7.2 m simulated-rainfall area. Results show that spatial non-uniformity remains under the same nominal condition, confirming the need for path-based evaluation. The method identifies Path IV and Path VI as preferable candidates, with results of 11.54 +/- 0.31 mm/h, RRD = 0.43, and 8.28 +/- 0.34 mm/h, RRD = 0.46, respectively. These paths show more balanced performance in rainfall-intensity stability, raindrop-spectrum realism, and perception consistency. The proposed method supports path selection, condition description, and credible interpretation of autonomous-driving perception tests under rainfall.

24.
arXiv (CS.AI) 2026-06-16

PolyKV: Heterogeneous Retention and Allocation for KV Cache Compression

arXiv:2606.15157v1 Announce Type: cross Abstract: KV cache compression is essential for reducing the memory cost of long-context large language model inference. Existing approaches, however, typically apply a single compression policy and a uniform cache budget across all transformer layers. This uniform design ignores the fact that different layers can play different roles during prefill and decoding, and may therefore require different eviction strategies and cache capacities. We present PolyKV, a layer-wise KV cache optimization framework that considers design space with method selection and budget allocation. PolyKV routes each layer to a suitable KV compression policy based on layer-level signals, while assigning non-uniform budgets under a fixed total budget. This formulation enables heterogeneous compositions of existing KV cache methods. Experiments on LLaMA-3.1-8B and Qwen3-8B show that, under the same 512-token average KV budget, PolyKV recovers 54.5% and 25.7% of the LongBench performance gap between the strongest single-policy baseline and FullKV, respectively. Across 128-1024 budget sweep, PolyKV consistently improves over the strongest baseline by 1.7%-6.4%, corresponding to 40.0%-54.5% recovery of the FullKV gap.

25.
arXiv (CS.CV) 2026-06-18

DREAM: Extending Vision-Language Models with Dual-Objective Encoding for Cross-Modal Retrieval

In today's media-driven world, the exponential growth of video content across domains such as surveillance, education, and entertainment has made retrieving semantically relevant videos via natural language queries increasingly critical. Early video retrieval systems relied on handcrafted features or shallow cross-modal mappings, limiting their ability to capture complex semantics and temporal dynamics. While large-scale vision-language models have improved cross-modal alignment, challenges remain in modeling fine-grained temporal dependencies and nuanced linguistic structures. In this paper, we introduce DREAM: Dual-path Representation Enhancement and Alignment Model, a novel multimodal framework that addresses these limitations through enhanced visual and textual encoding. DREAM incorporates a hybrid language modeling strategy that combines masked and permuted language modeling objectives to capture both local and global linguistic semantics. On the visual side, we design a hierarchical vision encoder with cascaded group attention, which integrates spatial and temporal information through multi-stage token interaction and coarse-to-fine attention refinement. We validate DREAM through comprehensive evaluations on the widely-used MSRVTT, MSVD and LSMDC benchmark datasets, where it achieves new state-of-the-art R1 scores of 49.4%, 49.7% and 27.3%, respectively. Qualitative analyses further show the model's ability to maintain coherent attention across frames and align complex queries with dynamic video content. These findings underscore the effectiveness of hierarchical attention and dual-objective textual modeling in enabling robust, context-aware video retrieval, and pave the way for future research in advancing cross-modal representation learning.