Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (math.PR) 2026-06-18

On the Singular Control of a Diffusion and its Running Infimum or Supremum

arXiv:2501.17577v2 Announce Type: replace-cross Abstract: We study a class of singular stochastic control problems for a one-dimensional diffusion $X$ in which the performance criterion to be optimised depends explicitly on the running infimum $I$ (or supremum $S$) of the controlled process. We introduce two novel integral operators that are consistent with the Hamilton-Jacobi-Bellman equation for the resulting two-dimensional singular control problems. The first operator involves integrals where the integrator is the control process of the two-dimensional process $(X,I)$ or $(X,S)$; the second operator concerns integrals where the integrator is the running infimum or supremum process itself. Using these definitions, we prove a general verification theorem for problems involving two-dimensional state-dependent running costs, costs of controlling the process, costs of increasing the running infimum (or supremum) and exit times. Finally, we apply our results to explicitly solve an optimal dividend problem in which the manager's time-preferences depend on the company's historical worst performance.

02.
arXiv (CS.CV) 2026-06-15

$\mu_0$: A Scalable 3D Interaction-Trace World Model

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $\mu_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $\mu_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. This TraceExtract supervision pretrains $\mu_0$ by combining a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $\mu_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $\mu_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $\pi_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.

03.
arXiv (CS.CV) 2026-06-19

GenTrack2: An Improved Hybrid Approach for Multi-Object Tracking

This paper proposes a visual multi-object tracking method that jointly employs stochastic and deterministic mechanisms to ensure identifier consistency for unknown and time-varying target numbers under nonlinear dynamics. A stochastic particle filter addresses nonlinear dynamics and non-Gaussian noise, with support from particle swarm optimization (PSO) to guide particles toward state distribution modes and mitigate divergence through proposed fitness measures incorporating motion consistency, appearance similarity, and social-interaction cues with neighboring targets. Deterministic association further enforces identifier consistency via a proposed cost matrix incorporating spatial consistency between particles and current detections, detection confidences, and track penalties. Subsequently, a novel scheme is proposed for the smooth updating of target states while preserving their identities, particularly for weak tracks during interactions with other targets and prolonged occlusions. Moreover, velocity regression over past states provides trend-seed velocities, enhancing particle sampling and state updates. The proposed tracker is designed to operate flexibly for both pre-recorded videos and camera live streams, where future frames are unavailable. Experimental results confirm superior performance compared to state-of-the-art trackers. The source-code reference implementations of both the proposed method and compared-trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack2

04.
arXiv (CS.CL) 2026-06-19

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. Administering a battery of personality and risk-preference instruments spanning self-reports and behavioral tasks to 56 instruction-tuned LLMs alongside large human reference samples, we report four findings. First, differences between models are driven not by the traits an instrument targets but by a directional response bias, a tendency to respond toward one end of the scale, or one labeled option, regardless of item content; a variance decomposition attributes 81-90% of between-model variation to this bias, against 9-16% in humans. Second, the bias declines with model capability but is not eliminated by it. Third, because bias rather than trait drives responding, an instrument's apparent reliability is almost entirely predicted by its response orthogonality, a term we coin for the proportion of items for which trait and bias point in opposite directions. Fourth, the profile a model appears to have shifts with the items used and can be manufactured through item selection. These results demonstrate that the apparent psychological profiles of LLMs are artifacts of the instrument used to measure them, not properties of the models themselves. As instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, we call for dedicated assessments centered on response orthogonality.

05.
arXiv (CS.CV) 2026-06-16

EdgeZSAD: Practical Zero-Shot Anomaly Detection on Edge Devices

Industrial inspection needs zero-shot anomaly detection (ZSAD) that remains useful under edge deployment constraints. Recent methods often rely on ViT-L foundation backbones (~300M parameters), which exceed the memory and operator budget of typical embedded hardware. We study this regime through EdgeZSAD, a compact reference system built around a TinyViT-21M-512 backbone, an asymmetric global-local readout (EdgeGLR), and a reproducible source-side training recipe (Real-IAD-DR). We train a single checkpoint in a source-trained, target-unseen protocol and evaluate it across six industrial benchmarks. Across three independent runs, the resulting model reaches an average image AUROC of 91.6 on MVTec-AD and 88.2 on VisA, while remaining directly deployable on Jetson Orin Nano Super (TensorRT FP16) and RB5 Gen2 (QNN GPU FP16). Across the six device-rescored benchmarks, image-AUROC drift stays below 0.2 points, indicating that the exported graph preserves host-side ranking behavior in the evaluated deployment setting.

06.
arXiv (CS.AI) 2026-06-17

Fixed-Point Reasoners: Stable and Adaptive Deep Looped Transformers

arXiv:2606.18206v1 Announce Type: new Abstract: Looped architectures provide an inductive bias toward learning step-by-step procedures for tasks that require compositional reasoning. The number of effective layers reached by looping determines the quality of the solution these models find. Like deep architectures, looped architectures are prone to a signal propagation problem induced by depth as the halting decision is postponed. In this paper, we address this signal propagation issue using pre-norm layers and residual scaling. Building on these architectural modifications, we propose FPRM, a Transformer-based Fixed-Point Reasoning Model that uses fixed-point convergence as an end-to-end halting mechanism in a looped architecture. We show that fixed-point halting allows FPRM to adapt its compute to task difficulty. FPRM is effective on common reasoning benchmarks, namely Sudoku, Maze, state-tracking, and ARC-AGI.
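
The fixed-point halting idea described in this abstract can be illustrated with a toy loop. This is a minimal sketch and not FPRM itself: the function name `iterate_to_fixed_point`, the tolerance `tol`, and the cap `max_loops` are hypothetical, and a scalar map (`math.cos`) stands in for a looped Transformer block.

```python
import math

def iterate_to_fixed_point(step, state, tol=1e-6, max_loops=1000):
    """Apply `step` repeatedly and halt once successive states differ by < tol.

    Returns (final_state, loops_used). All names are illustrative assumptions.
    """
    for loops in range(1, max_loops + 1):
        new_state = step(state)
        if abs(new_state - state) < tol:
            return new_state, loops
        state = new_state
    return state, max_loops

# Toy "layer": x -> cos(x) contracts toward its fixed point near 0.739.
# A start close to the fixed point plays the role of an easy input,
# a start far away plays the role of a hard one.
easy_state, easy_loops = iterate_to_fixed_point(math.cos, 0.739)
hard_state, hard_loops = iterate_to_fixed_point(math.cos, 10.0)
```

In this toy setting the easy start halts after fewer loops than the hard start, which is the sense in which a convergence-based stopping rule adapts compute to difficulty.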

07.
arXiv (CS.LG) 2026-06-15

Ensembling Sparse Autoencoders

arXiv:2505.16077v2 Announce Type: replace Abstract: Sparse autoencoders (SAEs) are used to decompose neural network activations into human-interpretable features. Typically, features learned by a single SAE are used for downstream applications. However, it has recently been shown that a single SAE captures only a limited subset of features that can be extracted from the activation space. Motivated by this limitation, we introduce and formalize SAE ensembles. Furthermore, we propose to ensemble multiple SAEs through naive bagging and boosting. In naive bagging, SAEs trained with different weight initializations are ensembled, whereas in boosting SAEs sequentially trained to minimize the residual error are ensembled. Theoretically, naive bagging and boosting are justified as approaches to reduce reconstruction error. Empirically, we evaluate our ensemble approaches with three settings of language models and SAE architectures. Our empirical results demonstrate that, compared to an expanded SAE that matches the number of features in the ensemble, ensembling SAEs improves the reconstruction of language model activations along with SAE stability. Additionally, on downstream tasks such as concept detection and spurious correlation removal, SAE ensembles achieve better performance, showing improved practical utility.

08.
arXiv (CS.CV) 2026-06-18

Multi-Modal Hyper-Graph Fusion for Low-Light Crowd Counting

Crowd counting is a fundamental task in computer vision. However, crowd counting in low-light environments remains largely underexplored, despite its practical importance in the real world. Existing methods mainly focus on well-lit scenes or rely on single-modality Red-Green-Blue (RGB) representations, which often become unreliable under extreme darkness and complex non-uniform illumination. To handle this problem, we construct three new low-light crowd counting benchmarks, which consist of two synthetic datasets, SHA_Dark and SHB_Dark, and a real-world benchmark LC-Crowd (Low-light Crowd Dataset). Inspired by Retinex-based physical modeling, we introduce depth and Canny edge cues as complementary geometric and structural priors to enhance the intrinsic reflectance representation under low-light conditions. We propose a Multi-Modal Hyper-Graph Fusion module, which formulates RGB appearance, depth geometry, and edge structure cues as nodes in a unified hyper-graph and explicitly captures their high-order complementary relationships via dynamic hyperedge construction and message passing. Furthermore, to adaptively allocate computation in dense prediction, we propose a Deformable Rectangular Sparse Attention (DRSA) module, which concentrates computation on informative regions through anchor-aware estimation and adaptive rectangular window modeling. Based on these designs, we develop a unified Low-Light Counting Network (LCNet) for robust low-light crowd counting. Extensive experiments on three benchmarks demonstrate that the proposed method achieves the best overall performance against existing state-of-the-art (SOTA) methods. The code is in the supplementary material. The datasets will be made public upon acceptance.

09.
arXiv (CS.LG) 2026-06-19

Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

arXiv:2606.20469v1 Announce Type: new Abstract: A widely held intuition in deep learning is that stochastic gradient descent (SGD) implicitly favors flat minima and that flat minima generalize better, but standard Euclidean measures of flatness such as the trace or maximum eigenvalue of the loss Hessian are not invariant under reparametrizations that preserve the network function, which undermines the theoretical foundations of this narrative. In this study we resolve this issue by grounding flatness in the Riemannian geometry of the statistical manifold induced by the Fisher Information Matrix (FIM). We define Riemannian sharpness mathematically and prove that it is invariant under smooth, function-preserving reparametrizations, which directly addresses the critique of Dinh et al. in the paper "Sharp minima can generalize for deep nets". We note that this invariance is a property of the true FIM; the diagonal empirical estimator used in practice (and in all experiments below) inherits invariance only approximately, and exact invariance under arbitrary reparametrizations would require structured estimators such as K-FAC. We formalize the gradient noise of mini-batch SGD as having a covariance structure proportional to the FIM, derive the stationary distribution of the resulting stochastic differential equation, and then show that the probability mass is exponentially concentrated at Riemannian-flat minima. A PAC-Bayes generalization bound controlled explicitly by SR formally links this geometric bias to test performance. Our experiments on MNIST and CIFAR-10 confirm that SR reliably tracks generalization in ways that Euclidean sharpness does not, and that its scaling with $\eta/B$ matches the theoretical predictions. Together these results provide a rigorous, reparametrization-invariant account of why flat minima generalize.

10.
arXiv (CS.CL) 2026-06-11

Litespark Inference For CPUs: Ultra-Fast SIMD Framework for Ternary (1.58-bit) Language Models

Large language models (LLMs) have transformed artificial intelligence, but their computational requirements remain prohibitive for most users. Standard inference demands expensive datacenter GPUs or cloud API access, leaving over one billion personal computers underutilized for AI workloads. Ternary models offer a path forward: their weights are constrained to {-1, 0, +1}, theoretically eliminating the need for floating-point multiplication. However, existing frameworks fail to exploit this structure, treating ternary models as dense floating-point networks. We address this gap with custom SIMD kernels that replace matrix multiplication with simple addition and subtraction operations, targeting the integer dot product instructions available on modern CPUs. Our implementation, Litespark-Inference, is pip-installable and integrates directly with Hugging-Face, achieving 18.15x higher throughput, 7.15x faster time-to-first-token and 6.03x memory reduction compared to standard PyTorch inference on Apple Silicon, with comparable or higher throughput speedups up to 95.81x on Intel and AMD processors.
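
The claim that ternary weights remove the need for multiplication can be seen in a scalar sketch. This is only an illustration of the arithmetic, not the Litespark-Inference kernels or API: the function names `ternary_dot` and `ternary_matvec` are hypothetical, and the real implementation is described as using SIMD integer dot-product instructions rather than a Python loop.

```python
def ternary_dot(weights, activations):
    """Dot product with weights restricted to {-1, 0, +1}.

    Uses only addition and subtraction: +1 adds the activation,
    -1 subtracts it, and 0 skips it.
    """
    total = 0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        elif w != 0:
            raise ValueError("weights must be -1, 0, or +1")
    return total

def ternary_matvec(matrix, activations):
    """Matrix-vector product built from multiplication-free row dot products."""
    return [ternary_dot(row, activations) for row in matrix]

# Toy example: a 2 x 3 ternary weight matrix applied to integer activations.
W = [[1, 0, -1],
     [-1, -1, 1]]
x = [3, 5, 2]
y = ternary_matvec(W, x)  # [3 - 2, -3 - 5 + 2] == [1, -6]
```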

11.
arXiv (CS.LG) 2026-06-19

Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision

arXiv:2606.20291v1 Announce Type: new Abstract: Remote sensing is increasingly relied upon to deliver actionable science for forest and wildfire risk management across large landscapes. Wall-to-wall, annually updated maps are a persistent need for effective forest management. Many planning systems and data collections combine disparate data sources with different purposes, vintages, and prediction quality, which leads to confounding behavior in operational planning systems. We introduce the VibrantForests framework, developed and applied to map forest attributes and provide a coherent foundation for effective forest and wildfire planning. VibrantForests includes a satellite-based forest structure model trained on lidar-derived samples and applied across the contiguous United States to concurrently generate estimates of canopy cover, canopy height, aboveground live tree biomass, basal area, and quadratic mean diameter at 10-meter resolution. We demonstrate predictive capability spanning the full spectrum of forest conditions ranging from sparse-canopy/low-biomass to dense-canopy/high-biomass. Results show that our model extends the range at which saturation is commonly encountered in comparable passive-sensor models, and reduces regression-to-mean behavior that commonly produces overestimation of forest attributes in small/sparse conditions and underestimation in large/dense conditions. The VibrantForests framework addresses a key limitation in large-area forest and wildfire planning by delivering coherent wall-to-wall estimates of management-relevant attributes at annual cadence and 10m resolution.

12.
arXiv (CS.LG) 2026-06-16

ML Inference Scheduling with Predictable Latency

arXiv:2512.18725v3 Announce Type: replace Abstract: Machine learning (ML) inference serving systems can schedule requests to improve GPU utilization and to meet service level objectives (SLOs) or deadlines. However, improving GPU utilization may compromise latency-sensitive scheduling, as concurrent tasks contend for GPU resources and thereby introduce interference. Given that interference effects introduce unpredictability in scheduling, neglecting them may compromise SLO or deadline satisfaction. Nevertheless, existing interference prediction approaches remain limited in several respects, which may restrict their usefulness for scheduling. First, they are often coarse-grained, which ignores runtime co-location dynamics and thus restricts their accuracy in interference prediction. Second, they tend to use a static prediction model, which may not effectively cope with different workload characteristics. In this paper, we evaluate the potential limitations of existing interference prediction approaches, finding that coarse-grained methods can lead to noticeable deviations in prediction accuracy and that static models degrade considerably under changing workloads.

13.
arXiv (CS.CV) 2026-06-15

ADAPT: An Autonomous Forklift for Construction Site Operation

Efficient material logistics play a critical role in controlling costs and schedules in the construction industry. However, manual material handling remains prone to inefficiencies, delays, and safety risks. Autonomous forklifts offer a promising solution to streamline on-site logistics, reducing reliance on human operators and mitigating labor shortages. This paper presents the development and evaluation of ADAPT (Autonomous Dynamic All-terrain Pallet Transporter), a fully autonomous off-road forklift designed for construction environments. Unlike structured warehouse settings, construction sites pose significant challenges, including dynamic obstacles, unstructured terrain, and varying weather conditions. To address these challenges, our system integrates AI-driven perception techniques with traditional approaches for decision making, planning, and control, enabling reliable operation in complex environments. We validate the system through extensive real-world testing, comparing its continuous performance against an experienced human operator across various weather conditions. Our findings demonstrate that autonomous outdoor forklifts can operate near human-level performance, offering a viable path toward safer and more efficient construction logistics.

14.
arXiv (CS.CV) 2026-06-24

Automated Residual Plot Assessment With the R Package autovi and the Shiny Application autovi.web

Visual assessment of residual plots is a common approach for diagnosing linear models, but it relies on manual evaluation, which does not scale well and can lead to inconsistent decisions across analysts. The lineup protocol, which embeds the observed plot among null plots, can reduce subjectivity but requires even more human effort. In today's data-driven world, such tasks are well suited for automation. We present a new R package that uses a computer vision model to automate the evaluation of residual plots. An accompanying Shiny application is provided for ease of use. Given a sample of residuals, the model predicts a visual signal strength (VSS) and offers supporting information to help analysts assess model fit.

15.
arXiv (CS.CL) 2026-06-24

From Task-Guided Conversational Graphs to Goal-Oriented Dialogue Runtimes

Graph and multi-agent orchestration frameworks make production large language model (LLM) workflows practical, but they do not by themselves solve conversational continuity when users maintain several interdependent objectives. This conceptual systems paper focuses on the high-complexity end of that design space, where goals can be suspended, resumed, revised, and invalidated by actions in other goals. We introduce the Goal-Oriented Dialogue Runtime (GODR), a framework-neutral design pattern that treats goals, task frames, lifecycle state, invalidation rules, and resumption contracts as first-class runtime objects while delegating bounded execution to graph runtimes, agents, tools, or application programming interfaces (APIs). GODR is not proposed as a replacement for workflow graphs in simple guided processes; it is intended for complex, multi-domain, interruptible conversations where objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone. The paper formalizes the problem, proposes runtime objects and architecture-selection criteria, and frames evaluation as an agenda for future empirical validation rather than as a measured performance claim.

16.
arXiv (CS.CV) 2026-06-16

Decoupled Motion Representation Learning for Moving Infrared Small Target Detection

Infrared small target detection in dynamic scenes remains challenging due to the highly coupled motions among targets, imaging platforms, and dynamic backgrounds. Existing multi-frame methods usually perform implicit temporal modeling, where coherent background dynamics dominate motion correspondence learning, leading to an inherent trade-off between detection and false alarms. In this work, we observe that background motions exhibit strong global coherence, whereas small targets mainly correspond to sparse local motion anomalies. Moreover, many false-alarm responses maintain high consistency with globally coherent motion patterns, indicating that they mainly originate from coherent background dynamics rather than genuine target motions. Based on these observations, we propose a decoupled motion representation learning framework for moving infrared small target detection. Specifically, an explicit motion branch is introduced to model globally coherent motion dynamics using pretrained optical flow priors, together with a structure-preserving self-supervised adaptation strategy for infrared motion correspondence learning. Meanwhile, an implicit motion branch based on deformable feature alignment is designed to capture target-sensitive local motion anomalies under coherent motion guidance. Furthermore, a coherent-motion-guided local anomaly reasoning module is proposed to identify and suppress coherent-motion-induced false responses during localized motion modeling. Extensive experiments on two challenging infrared small target detection benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art approaches, particularly in dynamic scenes with complex motions, while maintaining favorable inference efficiency.

17.
arXiv (CS.LG) 2026-06-18

Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

arXiv:2606.18283v1 Announce Type: new Abstract: The dense token-to-token interaction pattern of standard dot-product attention remains a central bottleneck in scaling Transformer architectures to long contexts. We introduce Gaussian Mixture Attention (GMA), a probabilistic attention-style sequence mixer that replaces explicit pairwise query–key comparison with routing through $K$ learned Gaussian mixture components. Queries and keys are mapped to posterior responsibility vectors over a shared latent routing space; their overlap defines an implicit responsibility-space affinity, while values are written into and read from a $K$-slot latent memory. By exploiting the associativity of matrix multiplication, GMA avoids materializing the induced $N\times N$ affinity matrix and instead uses two responsibility matrices whose dominant activation storage scales as $\mathcal{O}(NK)$ rather than $\mathcal{O}(N^2)$ for fixed $K$. We formulate bidirectional and causal variants of GMA, provide an end-to-end differentiable parameterization of the Gaussian mixture components, and analyze its responsibility-modulated gradient structure, constrained non-negative low-rank affinity interpretation, and local routing stability. Empirically, GMA exhibits the intended fixed-$K$ linear memory scaling and is competitive with attention-style baselines on long-context classification, while causal GMA improves over tested linear/random-feature attention variants on WikiText-103 but remains behind optimized causal SDPA and Mamba in the current implementation. Analysis of learned responsibilities further shows broad component usage and moderate alignment with surface-form token categories, supporting GMA as a probabilistic, interpretable, fixed-$K$ linear-time attention-style alternative rather than a universal replacement for optimized softmax attention or state-space models.
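
The associativity argument in this abstract can be checked on a toy example. The sketch below is an assumption-laden illustration of the bidirectional case only, not the GMA implementation: the matrices `Rq` and `Rk` hold made-up integer stand-ins for responsibility vectors (real responsibilities would be posterior probabilities), and the helper names are hypothetical. It shows that `(Rq Rk^T) V` equals `Rq (Rk^T V)`, so the $N\times N$ affinity never has to be stored.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]

# N = 4 tokens, K = 2 mixture components, d = 3 value dimensions.
Rq = [[1, 0], [0, 1], [1, 1], [2, 1]]              # query responsibilities, N x K
Rk = [[1, 1], [0, 2], [1, 0], [0, 1]]              # key responsibilities,   N x K
V = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]   # values,                 N x d

# Quadratic route: materialize the N x N affinity, then mix the values.
affinity = matmul(Rq, transpose(Rk))   # N x N
out_quadratic = matmul(affinity, V)    # N x d

# Linear route: write values into a K-slot memory, then read it per query.
memory = matmul(transpose(Rk), V)      # K x d
out_linear = matmul(Rq, memory)        # N x d
```

Both routes give the same output, but the second only stores the two $N\times K$ matrices and a $K\times d$ memory.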

18.
bioRxiv (Bioinfo) 2026-06-18

A unified smoothing framework for protein domain bigram model

Biomolecular sequences can be represented as strings over an alphabet, an analogy that has motivated many applications of computational linguistic techniques to biological problems. However, such methods must be adapted to the characteristic scale and organization of biomolecular data. Here, we consider the problem of bigram smoothing for multidomain protein architectures, where domain bigram frequency data is extremely sparse and differs from textual data in alphabet size, string length distribution, the relationship between bigram and unigram frequencies, tandem repeat lengths, and the distribution of domain adjacencies. Moreover, some domain combinations are unobserved because they are biologically incompatible, others because the data are incomplete. A smoothing method that distinguishes these two cases is required. We propose a unified smoothing framework based on interpolation that can be tuned to accommodate different bigram data characteristics. Within this framework, we design specific model variants suited to protein domain bigram data: these assign low adjusted counts to pairs that are likely incompatible, while making appropriate adjustments for undersampled pairs. We demonstrate empirically that this approach distinguishes the two cases while preserving the characteristic signatures of multidomain data.
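
For readers unfamiliar with interpolation-based smoothing, the sketch below shows the textbook form (linear interpolation of a bigram estimate with a unigram estimate). It is not the unified framework proposed in this abstract, and it does not distinguish incompatible from undersampled pairs; the domain labels `A`, `B`, `C`, the weight `lam`, and the function name are all hypothetical.

```python
from collections import Counter

def interpolated_bigram_prob(architectures, lam=0.7):
    """Return prob(a, b): a smoothed estimate of domain b following domain a.

    Mixes the maximum-likelihood bigram estimate with the unigram frequency
    using weight `lam` (an illustrative value, not a tuned one).
    """
    unigrams = Counter(d for arch in architectures for d in arch)
    bigrams = Counter((a, b) for arch in architectures
                      for a, b in zip(arch, arch[1:]))
    left_totals = Counter()
    for (a, _), n in bigrams.items():
        left_totals[a] += n
    total = sum(unigrams.values())

    def prob(a, b):
        unigram_p = unigrams[b] / total
        if left_totals[a] == 0:
            # No bigram starts with `a`: fall back to the unigram estimate.
            return unigram_p
        return lam * bigrams[(a, b)] / left_totals[a] + (1 - lam) * unigram_p

    return prob

# Toy multidomain architectures written as ordered lists of domain labels.
archs = [["A", "B"], ["A", "B", "C"], ["B", "C"], ["A", "C"]]
prob = interpolated_bigram_prob(archs)
```

The unseen pair `("A", "A")` receives a small nonzero probability from the unigram term, which is exactly the behavior a domain-aware method would need to suppress for biologically incompatible pairs.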

19.
medRxiv (Medicine) 2026-06-17

Trends in Suicide Mortality by Method among US Individuals aged 10-24 Years from 1999 to 2024

Background: Suicide is the second leading cause of death in US adolescents aged 10-24. Method use strongly influences lethality and design of prevention strategies, but recent trends remain unclear. We therefore aimed to investigate trends in suicide mortality rates by method, age group, and sex. Methods: This cross-sectional study used suicide mortality data from the National Center for Health Statistics for a quarter-century period, between 1999 and 2024. All individuals aged 10-24 years at the time of death, with suicide as the underlying cause, were included. We estimated suicide mortality rates (i.e., the number of suicide deaths per 100,000 people) and annual percent change by method (firearm, asphyxiation, poisoning, other), age group (10-14, 15-19, 20-24), and sex. Changing trend time points were determined using Joinpoint regression models. Results: From 1999 to 2024, 159,241 suicide deaths occurred among individuals aged 10-24. While suicide rates declined across all age groups between 2017 and 2024, the male-to-female gap narrowed by 18.9%. Among 10-14-year-olds, declining rates among males masked a consistent increase in female suicide rates since 2011. Although asphyxiation-related suicides decreased across all groups since 2018, firearm suicide rates increased for females in the 10-14 and 20-24 age groups. Although poisoning was less common than firearms or asphyxiation, poisoning suicide rates increased in the 15-19 and 20-24 age groups. Since 1999, suicide rates by other less common methods (e.g., jumping) showed significant increases for both sexes, especially among individuals aged 20-24. Suicide rates were consistently highest in the 20-24 age group across all study years. Conclusion: The decrease in suicide mortality rates among individuals aged 10-24 was largely driven by declines in males and reductions in asphyxiation-related suicides. However, increasing female suicide rates in the 10-14 age group, as well as increasing rates of death by less common means, warrant close attention. While suicide prevention efforts like structural interventions and means restriction have shown effectiveness among male adolescents, priority should now be given to adapting these approaches for female adolescents, particularly those aged 10-14.

20.
arXiv (CS.CL) 2026-06-11

When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval

Retrieval-augmented generation degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power and top-k retrieval increasingly returns semantically similar but contextually incorrect chunks. We refer to this failure mode as vector search dilution. We observed this firsthand, even with hybrid dense+sparse retrieval, in a deployed Wyoming Department of Transportation corpus, where scaling from 54 to 1,128 documents (88,907 chunks) reduced accuracy from 75% to below 40%. To address this dilution, we propose MASDR-RAG (Multi-Agent Scoped Domain Retrieval for RAG) and evaluate it on 200 expert-validated queries across five LLM backbones, six corpora, and two index stacks. Our results indicate that domain scoping using organizational metadata is the key fix, significantly improving P@10 from 0.77 to 0.86 ($p < 0.05$). Furthermore, our investigation of multi-agent orchestration revealed a high degree of configuration dependence, creating what we call the precision-faithfulness paradox. Based on these varied outcomes, our practical recommendation is simple: scope first, then perform a single synthesis call, reserving full multi-agent orchestration for genuinely multi-domain corpora paired with native-tool-call backbones. Code and data will be made public upon acceptance.
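
The effect of scoping before ranking, and the P@k metric used to measure it, can be sketched on toy data. This is a minimal illustration under stated assumptions, not MASDR-RAG: the chunk fields `id`, `domain`, and `sim`, the domain labels, and the similarity scores are all invented for the example.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved ids that are relevant (P@k)."""
    return sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids) / k

def search(chunks, k, domain=None):
    """Rank chunks by a precomputed similarity score.

    If `domain` is given, keep only chunks whose metadata matches it
    before ranking (the scoping step); otherwise rank everything.
    """
    pool = [c for c in chunks if domain is None or c["domain"] == domain]
    ranked = sorted(pool, key=lambda c: c["sim"], reverse=True)
    return [c["id"] for c in ranked[:k]]

# Toy corpus: off-domain chunks score almost as high as on-domain ones.
chunks = [
    {"id": "b1", "domain": "bridges", "sim": 0.90},
    {"id": "p1", "domain": "permits", "sim": 0.88},
    {"id": "b2", "domain": "bridges", "sim": 0.80},
    {"id": "p2", "domain": "permits", "sim": 0.79},
]
relevant = {"b1", "b2"}

p_unscoped = precision_at_k(search(chunks, k=2), relevant, k=2)
p_scoped = precision_at_k(search(chunks, k=2, domain="bridges"), relevant, k=2)
```

Here the unscoped search returns one off-domain chunk in its top two (P@2 of 0.5), while the scoped search returns only on-domain chunks (P@2 of 1.0).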

21.
arXiv (quant-ph) 2026-06-19

Application and quantum properties of superpositions of oppositely squeezed states

arXiv:2511.03204v2 Announce Type: replace Abstract: We show that superpositions of oppositely squeezed states – non-Gaussian Schrödinger-cat-like states – exhibit enhanced nonclassical features and provide an entanglement advantage in the small-squeezing regime. These states possess photon-number structures distinct from conventional coherent-state cat states, and we analyze their Wigner functions and the entanglement generated when they are injected into a 50-50 beam splitter. As a practical application, we demonstrate that they enable a high-quality heralded single-photon source whose second-order intensity correlation function is smaller than that obtained from a pure two-mode squeezed vacuum state. We further propose a linear-optical heralding scheme that approximates these superpositions without requiring strong Kerr nonlinearities. Our results indicate that the superposition of oppositely squeezed states is a promising non-Gaussian resource for quantum information processing, particularly for single-photon generation.

22.
arXiv (math.PR) 2026-06-11

On the Wasserstein distance between a hyperuniform point process and its mean

arXiv:2404.09549v3 Announce Type: replace Abstract: We study the existence of bounds on the expected $p$-Wasserstein distance between a random measure and its mean under the assumption that the $p$-th centered moments of the counting statistics are controlled uniformly in space. The average Wasserstein transport cost is shown to be bounded from above and from below by some multiples of the number of points. $D$-dimensional versions of those results are also obtained. As a corollary, we prove that for any value of $p\geq 1$ the Ginibre point process can be seen as a perturbed lattice with identically distributed perturbations with a finite $p$-th moment.

23.
arXiv (quant-ph) 2026-06-24

Infinite-Level Hierarchy of Solvable Quantum Circuits

arXiv:2606.23803v1 Announce Type: new Abstract: Dual-unitary circuits have emerged as a paradigm of exactly solvable yet non-integrable quantum dynamics. Recently, a generalization of dual unitarity attempting to extend the phenomenology of exactly solvable circuits has been introduced through a hierarchy of conditions, with dual unitarity as the first level. However, beyond the second level the proposed generalized dual-unitary hierarchy ceases to be solvable in the whole spacetime. We present an infinite hierarchy of solvability conditions remedying this problem. These new conditions can be combined with the generalized dual-unitary hierarchy to obtain circuits for which correlation functions and entanglement dynamics can be analyzed exactly in the whole spacetime. We show that this novel hierarchy possesses non-trivial solutions at every level. Our results demonstrate that dual unitarity can be systematically extended while preserving solvability, opening up investigations of exactly solvable non-integrable systems with more general properties.

24.
arXiv (CS.LG) 2026-06-16

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

arXiv:2606.17011v1 Announce Type: cross Abstract: Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand control. Consequently, the collected intervention trajectories are often suboptimal, and methods that rely on human interventions as expert supervision can absorb hesitant, inefficient, or even erroneous behaviors. To address both the system and algorithmic challenges, we propose ROVE, a reinforcement learning framework for humanoid VLA post-training with imperfect human interventions. First, ROVE introduces a human-in-the-loop pipeline capable of collecting deployment and intervention data for humanoid manipulation. Second, it utilizes Optimistic Value Estimation (OVE) to prioritize high-value behaviors from mixed-quality trajectories. To further robustify value estimation, we incorporate cross-embodiment human experience videos to provide rich supervision for long-tailed failure and recovery modes. The resulting critic yields informative advantage signals, steering the VLA actor to focus on high-value behaviors rather than indiscriminately imitating all actions. On challenging real-world contact-rich and fine-grained humanoid manipulation tasks, ROVE outperforms experience-learning baselines and consistently improves across multiple rollout-intervention iterations.

25.
arXiv (CS.AI) 2026-06-15

CSPO: Constraint-Sensitive Policy Optimization for Safe Reinforcement Learning

arXiv:2606.14415v1 Announce Type: new Abstract: Safe reinforcement learning (Safe RL) aims to maximize expected return while satisfying safety constraints, typically modeled as Constrained Markov Decision Processes (CMDPs). While primal-dual methods scale well to deep RL, they often suffer from delayed constraint correction, leading to oscillatory behavior and prolonged safety violations. In this paper, we propose Constraint-Sensitive Policy Optimization (CSPO), a first-order primal-dual method that incorporates local constraint sensitivity into policy updates. CSPO augments the primal objective with a constraint-sensitive correction derived from the shortest signed distance to the safety boundary, enabling smarter recovery steps back to safety, compensating for delayed Lagrange multiplier updates, reducing oscillations near the boundary, and preserving the KKT solutions of the original constrained problem. Experiments on navigation and locomotion benchmarks demonstrate that CSPO achieves faster safety recovery and high reward preservation, resulting in higher constrained returns compared to state-of-the-art primal-dual and penalty-based methods.
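
For readers unfamiliar with the setting, the textbook CMDP problem and its primal-dual treatment can be written as follows. The notation is an assumption, not taken from the paper: $J_r$ is expected return, $J_c$ expected cost, $d$ the cost budget, $\lambda$ the Lagrange multiplier, and $\eta$ a dual step size. CSPO's constraint-sensitive correction term is not reproduced here.

$$\max_{\pi} \; J_r(\pi) \quad \text{s.t.} \quad J_c(\pi) \le d$$

$$L(\pi, \lambda) = J_r(\pi) - \lambda \, \bigl( J_c(\pi) - d \bigr), \qquad \lambda \ge 0$$

$$\lambda_{k+1} = \max\bigl( 0, \; \lambda_k + \eta \, ( J_c(\pi_k) - d ) \bigr)$$

In this generic scheme the multiplier grows only after a violation $J_c(\pi_k) > d$ has been measured, which is one way to read the delayed constraint correction the abstract describes.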