Academic Intelligence · Curated Daily

Explore the frontiers of global academic research

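AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate analytical briefs on cross-disciplinary literature.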

01.
arXiv (CS.AI) 2026-06-18

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

arXiv:2606.18315v1 Announce Type: cross Abstract: Sequential output generation with large-scale Transformer and diffusion decoders pays a memory cost that grows with sequence length, plus iterative per-step computation. Replacing them with small feed-forward decoders restores efficiency but produces unstructured latent representations that limit closed-loop control: phase-conditioned action generation and cross-step latent carry-over both require a latent geometry with stable basins. This article proposes Ghost Attractor Networks, a theoretically derived dynamical decoder whose latent evolves under a learned potential with drift and produces a basin-attractor structure by construction. Three desiderata (multi-modality, decoder-level single-pass switching, and constant memory) motivate the potential-drift form, and mode transitions arise as saddle-node bifurcations with ghost-attractor escape. A hierarchical phase-space decomposition separates first-order basin convergence from second-order proprioceptive refinement. Empirically, a Ghost trained end-to-end with a behavioral-cloning and contrastive objective exhibits the predicted gradient-flow contraction in its potential, with the gradient norm decaying by 67 percent across five integration steps on 1430 held-out samples. Ghost is evaluated as a robotic action decoder. A 2.3-million-parameter Ghost matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer at 462 times fewer parameters and 32 times lower latency, and beats five alternative 2M-parameter decoders (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion) on offline mean squared error by 5.9 to 29 percent. On the LIBERO-10 closed-loop benchmark, phase conditioning on Ghost's basin-structured latent yields a 13.5 percentage-point success-rate gain over a feed-forward MLP baseline, and persistent-latent ensembling reaches a 95.7 percent final success rate.
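The basin-and-bifurcation mechanism in this abstract can be made concrete with a toy one-dimensional sketch. This is not the paper's learned decoder. The latent follows explicit-Euler gradient flow on an assumed tilted double-well potential $V_c(z) = (z^2-1)^2/4 - cz$, and the hypothetical tilt `c` stands in for a conditioning drift. For $|c| < 2/(3\sqrt{3}) \approx 0.385$ there are two basins. Beyond that value, the left basin and the saddle merge in a saddle-node bifurcation, and a latent starting there slowly drifts through the "ghost" of the vanished attractor before reaching the other basin.

```python
def grad_potential(z: float, c: float) -> float:
    """dV/dz for the toy tilted double well V_c(z) = (z**2 - 1)**2 / 4 - c * z."""
    return z**3 - z - c


def flow(z0: float, c: float, step: float, n_steps: int):
    """Explicit-Euler gradient flow z <- z - step * V_c'(z).

    Returns the final latent and the |V_c'(z)| value observed before each step.
    """
    z = z0
    grad_norms = []
    for _ in range(n_steps):
        g = grad_potential(z, c)
        grad_norms.append(abs(g))
        z -= step * g
    return z, grad_norms


if __name__ == "__main__":
    # Contraction inside a basin: the gradient norm shrinks over five steps.
    z, norms = flow(0.8, c=0.0, step=0.5, n_steps=5)
    print(round(z, 4), [round(n, 6) for n in norms])

    # Below the bifurcation (c = 0.2) the left basin survives ...
    print(flow(-1.0, c=0.2, step=0.1, n_steps=400)[0])
    # ... above it (c = 0.5) the latent escapes to the right basin.
    print(flow(-1.0, c=0.5, step=0.1, n_steps=400)[0])
```

The slow stretch of the escaping trajectory near $z = -1/\sqrt{3}$ is the ghost effect: the gradient there is small but never vanishes.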

02.
arXiv (quant-ph) 2026-06-16

Towards Quantum Limited Spatial Resolution of NV-Diamond Magnetometry

arXiv:2508.13438v2 Announce Type: replace Abstract: Optically addressable ensembles of solid-state defects, such as nitrogen-vacancy (NV) centers, are a leading modality for imaging-based magnetometry, thermometry and strain sensing. However, monitoring the fluorescence of individual defects within a sub-diffraction ensemble remains an outstanding challenge that currently limits access to atomic-scale features and dynamics. For compact clusters of NVs, we formulate imaging-based atomic sensing as a low-dimensional multiparameter estimation task in which one seeks to localize each defect and quantify the field strength in its immediate vicinity. In this work, we employ optical spatial mode demultiplexing (SPADE) to enhance localization and brightness estimation accuracy at sub-diffraction scales. Specifically, we develop a two-stage sensing protocol that augments direct imaging by projecting the incoming optical field onto point-spread-function-adapted (PAD) spatial modes and Yuen-Kennedy-Lax (YKL) spatial modes, enabling efficient extraction of emitter positions and brightnesses. The YKL-SPADE measurement employed for brightness estimation is shown to be quantum-optimal in the case of two emitters and establishes a new connection between quantum detection and estimation theories. We numerically evaluate the statistical performance of our protocol for sub-diffraction optically detected magnetic resonance (ODMR) and Rabi sensing experiments. Compared to conventional focal plane intensity measurements, our protocol improves emitter localization accuracy by 6$\times$ and brightness estimation accuracy by 2$\times$ for tightly confined ensembles residing well below the diffraction limit.

03.
arXiv (quant-ph) 2026-06-17

An energy-based uncertainty principle and low-energy state preparation


arXiv:2603.15495v2 Announce Type: replace Abstract: Preparing low-energy states of many-body Hamiltonians is a central challenge in quantum computing, quantum complexity, and condensed matter physics. Existing approaches often get trapped in suboptimal states such as high-energy eigenstates or, more generally, low-variance states that resist further energy reduction. In this work, we explore a different perspective: instead of optimizing with respect to a single Hamiltonian, we leverage the fact that many systems admit families of Hamiltonians that share similar low-energy subspaces but differ at higher energies. We show that this redundancy can be turned into an algorithmic resource by establishing an energy-based uncertainty principle, which implies that these Hamiltonians cannot simultaneously admit low-variance states at higher energies. This suggests a simple strategy of alternating energy-lowering steps across such Hamiltonians, which we investigate numerically on several models. We also introduce a sparse variant where the uncertainty principle yields quadratically larger variance at higher energies, leading to more pronounced energy change. Overall, this work suggests a range of open questions at the interface of random matrix theory, local Hamiltonians and low-energy state preparation, aimed at understanding when such approaches are practical and how they can be analyzed rigorously.

04.
arXiv (CS.CV) 2026-06-16

Transformation-driven generation of comparable projection images from multimodal anatomical scenes

This work addresses the computational problem of generating reproducible projection-space observations from heterogeneous anatomical scenes whose components may undergo independent spatial transformations. We propose a transformation-driven framework for synthetic projection imaging from multimodal anatomical data and demonstrate it on mandibular-motion scenarios. In contrast to conventional Digitally Reconstructed Radiograph (DRR) approaches primarily designed for registration, projection realism, or rendering efficiency, the proposed formulation treats projection imaging as an observation process operating on an explicitly represented anatomical scene. Independently transformable volumetric and surface-based anatomical objects are embedded within a shared scene representation and propagated directly into projection space through explicit transformations. Projection geometry, acquisition modelling, material interpretation, and image presentation remain explicitly separated, enabling controlled exploration of methodological assumptions while preserving reproducibility and direct comparability between generated projections. Particular emphasis is placed on transformation-driven anatomical scenarios relevant to craniofacial analysis, including mandibular motion and therapeutic repositioning. Using a shared anatomical reference scene composed of CT/CBCT volumes, segmented structures, surface models, and auxiliary anatomical or therapeutic objects, the framework enables generation of directly comparable VirtualRTG projections from multiple anatomical configurations while preserving identical imaging assumptions. Rather than aiming at fully physically faithful radiographic simulation, the proposed approach provides a controllable and reproducible methodological environment for studying anatomy–projection relationships, motion observability, and transformation-aware imaging workflows.
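The central idea here is to keep the scene, the per-object transformations, and the projection geometry separate, so that two anatomical configurations can be projected under identical imaging assumptions. The following toy sketch shows this idea in 2-D. It assumes integer translations, a parallel-beam projector, and constant per-object attenuation; all object names and values are hypothetical, and this is not the VirtualRTG implementation.

```python
from collections import defaultdict


def place(scene, transforms):
    """Apply an independent integer translation (dx, dy) to each named object and
    accumulate attenuation on one shared grid (the shared scene representation).

    scene: {name: (set of (x, y) pixels, attenuation)}; objects without an entry
    in `transforms` stay where they are.
    """
    grid = defaultdict(float)
    for name, (pixels, mu) in scene.items():
        dx, dy = transforms.get(name, (0, 0))
        for x, y in pixels:
            grid[(x + dx, y + dy)] += mu
    return grid


def project(grid, width):
    """Parallel-beam projection along y: detector bin x receives the line integral
    (here a plain sum) of attenuation in column x. The geometry never changes, so
    projections of different anatomical configurations are comparable bin by bin."""
    detector = [0.0] * width
    for (x, _y), mu in grid.items():
        if 0 <= x < width:
            detector[x] += mu
    return detector


# Hypothetical toy scene: a fixed "skull" slab and a movable "mandible" bar.
scene = {
    "skull": ({(x, y) for x in range(10) for y in range(2)}, 1.0),
    "mandible": ({(x, 5) for x in range(2, 5)}, 0.5),
}
closed = project(place(scene, {}), width=10)
opened = project(place(scene, {"mandible": (3, 0)}), width=10)
print(closed)
print(opened)
```

Because only the mandible's transform differs between the two calls, any difference between `closed` and `opened` can be attributed to the simulated motion alone.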

05.
arXiv (math.PR) 2026-06-12

Scaling limit of additive functionals for reversible non-gradient exclusion process: critical cases

arXiv:2606.13442v1 Announce Type: new Abstract: For the reversible speed-change exclusion process $(\eta_t)_{t \geq 0}$ in $\mathbb{Z}^d$, we study the scaling limit of additive functionals ${\Gamma_t(f) = \int_0^t f(\eta_s)\, \mathrm{d} s}$. For local centered functions $f$, the previous works of Kipnis and Varadhan [Commun. Math. Phys. 104, 1-19, 1986] and of Gonçalves and Jara [Comm. Pure Appl. Math., 66: 649-677, 2013] covered the cases $d \geq 3$ and $d=1$, respectively. The present paper completes the missing case $d=2$ and also develops the theory for functions of higher degree. The novelty is a quantitative homogenization of the resolvent, which allows us to overcome the obstacle posed by correlation functions in non-gradient models.
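To make the object $\Gamma_t(f)$ concrete, here is a minimal simulation sketch. It makes several simplifying assumptions: a 1-D ring instead of $\mathbb{Z}^d$, constant-rate symmetric exclusion (a gradient model, not the speed-change dynamics studied in the paper), and the example local function $f(\eta) = \eta(0)\eta(1) - \mathbb{E}[\eta(0)\eta(1)]$. The code integrates $f$ exactly along a continuous-time (Gillespie) trajectory.

```python
import random


def additive_functional(L, K, t_max, seed=0):
    """Simulate symmetric simple exclusion on a ring of L sites with K particles and
    return Gamma_t(f) = int_0^t f(eta_s) ds together with the final configuration.

    Each bond (i, i+1) carries a unit-rate exponential clock; when it rings the two
    occupation variables are swapped (stirring). This is the constant-rate, gradient
    special case, not the speed-change model of the paper.
    f(eta) = eta(0) * eta(1) - E[eta(0) * eta(1)] is local and centered under the
    uniform (invariant) measure on configurations with K particles.
    """
    rng = random.Random(seed)
    eta = [1] * K + [0] * (L - K)
    rng.shuffle(eta)  # start from the invariant (uniform) measure
    mean = K * (K - 1) / (L * (L - 1))

    t, gamma = 0.0, 0.0
    while True:
        dt = rng.expovariate(L)  # L bonds, so the total jump rate is L
        f_now = eta[0] * eta[1] - mean
        if t + dt >= t_max:
            # f is constant between jumps, so the integral is exact.
            return gamma + f_now * (t_max - t), eta
        gamma += f_now * dt
        t += dt
        i = rng.randrange(L)
        j = (i + 1) % L
        eta[i], eta[j] = eta[j], eta[i]


if __name__ == "__main__":
    for t in (10.0, 100.0, 1000.0):
        print(t, additive_functional(50, 25, t)[0])
```

The scaling-limit question in the abstract asks how $\Gamma_t(f)$, normalized by a suitable power of $t$, fluctuates as $t \to \infty$. This sketch only produces the raw functional.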

06.
arXiv (quant-ph) 2026-06-12

Experiment-compatible measurement–feedback quantum state preparation with reinforcement learning

arXiv:2606.13005v1 Announce Type: new Abstract: Ground-state preparation is a critical task in quantum simulation and quantum computing, as it enables the study of correlated phases and the generation of entangled resource states. While measurement–feedback control has emerged as a promising route to state preparation, existing schemes either rely on handcrafted, task-specific policies or are designed using full quantum-state information that is unavailable in real experiments and becomes impractical for large many-body systems. Here we develop an adaptive measurement–feedback protocol based on reinforcement learning under partial observability. The controller uses only the history of experimentally accessible measurement outcomes to choose both the measurement operator and the feedback action in real time. To make training compatible with experiments, we introduce a stochastic terminal reward built from one-shot measurements of randomly sampled Hamiltonian components, avoiding unphysical full-state reconstruction while remaining an unbiased estimator of the target energy. We demonstrate the method by preparing ground states of the Bose–Hubbard model and by generating GHZ states, establishing a scalable and hardware-compatible route to quantum state preparation.
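One standard way to build a reward that is both one-shot and unbiased, as the abstract describes, is importance sampling over Hamiltonian terms. Write $H = \sum_k c_k P_k$ with $\pm 1$-valued observables $P_k$, sample a term $k$ with probability $p_k \propto |c_k|$, measure $P_k$ once, and return $(c_k/p_k)\,s$. Taking expectations gives $\sum_k c_k \langle P_k \rangle = \langle H \rangle$. The sketch below assumes this sampling rule; the paper's exact distribution may differ. Expectation values are passed in only to simulate the measurement outcome, since they are unknown in a real experiment.

```python
import random


def one_shot_energy_sample(coeffs, expectations, rng):
    """Single-shot, unbiased estimate of <H> = sum_k c_k <P_k>.

    coeffs: c_k for each Hamiltonian component P_k (assumed to have eigenvalues +/-1).
    expectations: <P_k> in the prepared state, used here ONLY to simulate the
    +/-1 outcome of one projective measurement.
    """
    total = sum(abs(c) for c in coeffs)
    probs = [abs(c) / total for c in coeffs]
    k = rng.choices(range(len(coeffs)), weights=probs)[0]
    # One measurement of P_k: outcome +1 with probability (1 + <P_k>) / 2.
    outcome = 1 if rng.random() < (1 + expectations[k]) / 2 else -1
    # The importance weight c_k / p_k makes the estimator unbiased.
    return coeffs[k] / probs[k] * outcome


def terminal_reward(coeffs, expectations, rng):
    """Reward for the RL controller: lower energy means higher reward."""
    return -one_shot_energy_sample(coeffs, expectations, rng)


if __name__ == "__main__":
    rng = random.Random(1)
    coeffs, expvals = [1.0, -0.5, 0.25], [0.2, -0.6, 1.0]  # exact <H> = 0.75
    n = 100_000
    print(sum(one_shot_energy_sample(coeffs, expvals, rng) for _ in range(n)) / n)
```

Each individual reward is very noisy (its magnitude is always $\sum_k |c_k|$), but averaging over episodes during training recovers the target energy.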

07.
arXiv (quant-ph) 2026-06-16

Optimizing Wigner Negativity in Scattering Processes Using Energetic Cost Functions

arXiv:2606.15101v1 Announce Type: new Abstract: Wigner negativities (WNs) are key signatures of non-Gaussian bosonic states and essential resources for quantum technologies. We study their generation in the scattering of coherent pulses by a two-level atom coupled to a one-dimensional reservoir, a unitary and energy-preserving platform. Optimization in this multimode setting is hindered by the complexity of evaluating Wigner functions. We overcome this challenge by introducing energetic cost functions that identify output modes most likely to host large negativities. First using incoherent energy and then isolating a genuinely non-Gaussian contribution, we demonstrate a strong correlation between these quantities and WNs. This correlation extends beyond short, intense pulses to encompass pulses of finite energy, where photons are scattered while the two-level atom is driven. Focusing on the energy-efficiency of the process, we show that maximally efficient generation takes place for one input photon, on average, spectrally mode-matched with the atom.

08.
arXiv (CS.LG) 2026-06-15

Diffusion Policy Optimization without Drifting Apart

arXiv:2606.13795v1 Announce Type: new Abstract: RL post-training has become increasingly pivotal for improving diffusion policies, but existing diffusion policy-gradient methods are often unstable and cannot achieve reliable policy improvement. We identify the cause as the double-drift phenomenon: optimizing a variational surrogate can let the ELBO separate from the true log-likelihood, which then makes the resulting proxy policy gradient misaligned with the true policy gradient of expected return. We propose DiPOD, a diffusion policy optimization framework that maintains tight-bound behavior throughout training by interleaving self-distillation with policy-improving gradient updates. This leads to a simple and practical algorithm: augmenting each diffusion policy-gradient update with an on-policy ELBO regularizer. Across diffusion language model post-training and continuous-control diffusion policies, DiPOD substantially stabilizes training and reaches higher rewards than previous methods.

09.
arXiv (CS.CV) 2026-06-18

Beyond Nearest Neighbor Interpolation in Data Augmentation

Using nearest neighbor interpolation to avoid undefined categorical labels overlooks the risk of amplifying pixel-level annotation errors in augmented training data. In addition, the inherent low-pass filtering effect of interpolation algorithms risks degrading high-frequency structural details within annotated regions of interest. To avoid these risks, the author modified the data transformation functions used to train convolutional neural networks: a modified geometric transformation function removes the reliance on nearest neighbor interpolation, and a mean-based class filtering mechanism handles the undefined categorical labels that alternative interpolation algorithms produce. The author also implemented an offline data augmentation pipeline that generates interpolation-specific augmented training data, enabling quantitative assessment of interpolation-specific low-pass filtering effects on that data. Experimental evaluation on three medical image segmentation datasets and the XBAT+ datasets demonstrated performance gains across multiple quantitative metrics.
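The "undefined categorical label" problem is easy to reproduce: linearly interpolating a label map that jumps from class 0 to class 2 produces the value 1.0 (a valid but wrong class) or 0.5 (no class at all). The sketch below shows the problem in 1-D. It also shows one common alternative, which interpolates each class's one-hot channel and then takes the argmax. That alternative is a swapped-in technique for illustration only, not the author's mean-based class filtering mechanism.

```python
import math


def linear_sample(row, x):
    """Linear interpolation of a 1-D row at fractional position x (0 <= x <= len-1)."""
    i = min(int(math.floor(x)), len(row) - 2)
    w = x - i
    return (1 - w) * row[i] + w * row[i + 1]


def nearest_sample(row, x):
    """Nearest neighbour; floor(x + 0.5) avoids Python's round-half-to-even."""
    return row[min(int(math.floor(x + 0.5)), len(row) - 1)]


def onehot_linear_sample(row, x, num_classes):
    """Interpolate each class's one-hot channel separately, then take the argmax.
    The result is always a class present in the neighbourhood (ties go to the lower index)."""
    scores = [linear_sample([1.0 if v == c else 0.0 for v in row], x)
              for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: scores[c])


labels = [0, 0, 2, 2]  # a boundary between class 0 and class 2
print(linear_sample(labels, 1.5))          # 1.0: a spurious class-1 pixel
print(linear_sample(labels, 1.25))         # 0.5: not a class at all
print(onehot_linear_sample(labels, 1.25, 3), onehot_linear_sample(labels, 1.75, 3))
```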

10.
arXiv (CS.CL) 2026-06-18

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tasks require simultaneous reasoning from the stances of all involved stakeholders whose decisions are mutually dependent and thus cannot be solved in isolation. We characterize this challenge as stance entanglement, a form of decision complexity distinct from execution complexity. To address it, we propose Multi-Agent Fictitious Play (MAFP), a novel MAS paradigm that represents stakeholder stances as agents and formulates decision-making as an equilibrium-seeking process. Built on the game-theoretic principle of fictitious play, MAFP iteratively updates each agent's decision by best responding to the empirical mixture of other agents' past decisions. This enables agents to expose and address one another's weaknesses, progressively improving decision quality and robustness. We evaluate MAFP on challenging decision-making tasks that test the capability of deciding strategies for competitive scenarios prior to acting. MAFP outperforms both single-round and multi-round baselines on two complementary metrics, tournament strength and robustness, demonstrating its effectiveness in addressing stance entanglement.
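MAFP builds on the classical fictitious-play update, in which each player best-responds to the empirical mixture of the other players' past decisions. In the paper the "players" are LLM agents representing stakeholder stances. In the minimal sketch below they are replaced by two players with hypothetical payoff matrices, so that the update rule itself is visible.

```python
def best_response(payoff, opponent_mix):
    """Index of the pure action maximizing expected payoff against opponent_mix.
    payoff[i][j]: this player's payoff for own action i vs opponent action j."""
    values = [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoff]
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(payoff_a, payoff_b, iterations):
    """Two-player fictitious play.

    payoff_a[i][j]: row player's payoff (own action i, opponent action j).
    payoff_b[j][i]: column player's payoff, indexed from its own perspective.
    Returns both players' empirical mixtures after `iterations` rounds.
    """
    counts_a = [1] + [0] * (len(payoff_a) - 1)  # arbitrary initial plays
    counts_b = [1] + [0] * (len(payoff_b) - 1)
    for _ in range(iterations):
        mix_a = [c / sum(counts_a) for c in counts_a]
        mix_b = [c / sum(counts_b) for c in counts_b]
        # Each player best-responds to the other's past empirical mixture.
        a = best_response(payoff_a, mix_b)
        b = best_response(payoff_b, mix_a)
        counts_a[a] += 1
        counts_b[b] += 1
    return ([c / sum(counts_a) for c in counts_a],
            [c / sum(counts_b) for c in counts_b])


if __name__ == "__main__":
    # Rock-paper-scissors (zero-sum, symmetric): rows/cols = rock, paper, scissors.
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    print(fictitious_play(rps, rps, 3000))
```

For two-player zero-sum games such as rock-paper-scissors, the empirical mixtures of classical fictitious play are known to converge (slowly) to an equilibrium, here the uniform mixture.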

11.
arXiv (CS.AI) 2026-06-19

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

arXiv:2601.00014v2 Announce Type: replace-cross Abstract: Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To test this, we used the Technion-Leumit Holter ECG (TLHE) dataset, comprising 69,663 recordings from 47,729 patients collected over 20 years. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80, outperforming both a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had a two-fold higher risk of hospitalization or death. Explainability analysis showed that DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of using deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
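The reported 0.80 has a direct probabilistic reading. The ROC AUC equals the probability that a randomly chosen patient who develops HF receives a higher risk score than a randomly chosen patient who does not, with ties counting one half (the Mann-Whitney form). The sketch below computes it that way on hypothetical toy scores. Its quadratic loop is fine for illustration but would be replaced by a rank-based computation at the dataset's scale.

```python
def roc_auc(scores, labels):
    """AUC = P(score of random positive > score of random negative), ties count 1/2.

    labels: 1 = developed HF, 0 = did not. O(P * N) pairwise comparison.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```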

12.
arXiv (CS.CV) 2026-06-16

Systematic Evaluation of Novel View Synthesis for Video Place Recognition

The generation of synthetic novel views has the potential to positively impact robot navigation in several ways. In image-based navigation, a novel overhead view generated from a scene taken by a ground robot could be used to guide an aerial robot to that location. In Video Place Recognition (VPR), novel views of ground locations from the air can be added that enable a UAV to identify places seen by the ground robot, and similarly, overhead views can be used to generate novel ground views. This paper presents a systematic evaluation of synthetic novel views in VPR using five public VPR image databases and seven typical image similarity methods. We show that for small synthetic additions, novel views improve VPR recognition statistics. We find that for larger additions, the magnitude of viewpoint change is less important than the number of views added and the type of imagery in the dataset.

13.
arXiv (math.PR) 2026-06-12

Storage and Transport Capacity Design for a Self-Reliable Two-Node Stochastic Resource System

arXiv:2606.12707v1 Announce Type: cross Abstract: We study a two-node stochastic resource system operating over a finite horizon. Each node experiences uncertain supply and demand and is equipped with finite storage. The objective is to ensure that resource levels remain within prescribed limits with high probability. To this end, we formulate a chance-constrained capacity-design problem in which resources can be exchanged through a capacity-limited transport link. We characterize the minimum storage required at each node, derive the optimal transport policy, and quantify the trade-off between storage and transport capacities. Our results show the existence of a critical transport-capacity threshold that enables full risk pooling between the nodes. Moreover, this threshold decreases with the operating horizon, implying that full-pooling performance can be achieved with progressively smaller transport capacity over longer horizons.
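The "full risk pooling" that a sufficiently large transport link enables can be illustrated with a deliberately simplified model, which is not the paper's. It assumes a single period, independent zero-mean Gaussian net imbalances $X_1, X_2 \sim N(0, \sigma^2)$, and a two-sided chance constraint $P(|X| \le S) \ge 1 - \epsilon$. Without transport, each node must cover its own imbalance. With unlimited transport, the pair behaves like one node facing $X_1 + X_2 \sim N(0, 2\sigma^2)$, so the total required storage shrinks by a factor of $\sqrt{2}$.

```python
from statistics import NormalDist


def storage_for_chance_constraint(sigma, eps):
    """Smallest S with P(|X| <= S) >= 1 - eps for X ~ N(0, sigma^2)."""
    return sigma * NormalDist().inv_cdf(1 - eps / 2)


def total_storage(sigma, eps, pooled):
    """Total storage across two i.i.d. Gaussian nodes (simplified one-period model)."""
    if pooled:
        # Full pooling: the nodes act as one node with imbalance X1 + X2.
        return storage_for_chance_constraint(sigma * 2 ** 0.5, eps)
    # No transport: each node covers its own imbalance separately.
    return 2 * storage_for_chance_constraint(sigma, eps)


if __name__ == "__main__":
    sep = total_storage(1.0, 0.05, pooled=False)
    pool = total_storage(1.0, 0.05, pooled=True)
    print(sep, pool, sep / pool)
```

The paper's contribution is to find the transport capacity at which this pooled regime is actually reached over a finite horizon. The toy model simply assumes unlimited transport.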

14.
arXiv (CS.LG) 2026-06-11

Bernstein-Schur Kernels: Random Features by Sketched Modulation and Radial Randomization

arXiv:2606.11255v1 Announce Type: new Abstract: Bernstein–Schur kernels are products of a finite-feature kernel (one with an explicit finite-dimensional feature map) and a completely monotone shift-invariant kernel: nonstationary kernels that fall between the shift-invariant and dot-product templates that random features usually exploit, so in general neither Bochner sampling nor polynomial sketching applies to the full kernel directly. We give one random-feature construction for the whole class that randomizes both factors: it sketches the finite modulation and randomizes the completely monotone radial factor, sampling the latter's one-dimensional Bernstein–Widder scale and then applying Gaussian random Fourier features (whose frequency is still $d$-dimensional). The feature dimension is then $Dm$, set by the sketch size $m$ and the radial-draw count $D$, free of the $O(d^2)$ size of the exact modulation feature. Keeping the modulation \emph{exact} is the analyzable limit ($m\to\infty$): there we prove unbiasedness, an exact variance for the recommended flat estimator, an expected matrix-Bernstein operator-norm bound (with a matching high-probability tail) controlled by the top eigenvalues of the kernel and modulation Gram matrices together with an intrinsic dimension rather than the crude $N\max_{ij}$ entrywise route, and a deterministic relative-spectral kernel-ridge stability result. By conditioning on the sketch, the doubly-randomized estimator inherits the same intrinsic-dimension operator-norm guarantee plus a single additive sketch term, tunable by $m$ independently of $D$. The motivating instance is the biased $yat$-kernel $k_{yat,b}(w,x)=(w^\top x+b)^2/(\|w-x\|^2+\varepsilon)$, $b\ge0$, whose family span contains the inverse-multiquadric kernel by finite differences in $b$; for it the radial mixture is the IMQ spectral sampler, and one frequency per scale is variance-optimal at a fixed radial-feature budget.
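For the yat-kernel, the radial factor has a one-line Bernstein-Widder mixture: $\frac{1}{r^2+\varepsilon} = \frac{1}{\varepsilon}\,\mathbb{E}_{t}\!\left[e^{-t r^2}\right]$ with $t \sim \mathrm{Exp}(\text{rate}=\varepsilon)$, and $e^{-t r^2} = \mathbb{E}_\omega[\cos(\omega^\top(w-x))]$ with $\omega \sim N(0, 2tI)$. The sketch below implements only the exact-modulation limit ($m \to \infty$) described in the abstract, and it omits the modulation sketch. For brevity it evaluates the kernel estimate directly rather than materializing the features. An explicit feature map would stack $\cos/\sin$ pairs per draw and take a Kronecker product with the exact $O(d^2)$ modulation features, and the inner product of such features reduces to the same quantity.

```python
import math
import random


def yat_kernel(w, x, b, eps):
    """Exact biased yat-kernel (w.x + b)^2 / (||w - x||^2 + eps)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    dist2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return (dot + b) ** 2 / (dist2 + eps)


def yat_estimate(w, x, b, eps, n_draws, rng):
    """Random-feature estimate with the modulation kept exact (m -> infinity limit).

    Radial factor: 1/(r^2 + eps) = (1/eps) E_t[exp(-t r^2)], t ~ Exponential(rate=eps).
    Gaussian factor: exp(-t r^2) = E_omega[cos(omega . (w - x))], omega ~ N(0, 2t I),
    using one frequency per radial draw.
    """
    modulation = (sum(wi * xi for wi, xi in zip(w, x)) + b) ** 2  # exact finite factor
    diff = [wi - xi for wi, xi in zip(w, x)]
    acc = 0.0
    for _ in range(n_draws):
        t = rng.expovariate(eps)                        # 1-D radial scale draw
        std = math.sqrt(2.0 * t)
        omega = [rng.gauss(0.0, std) for _ in diff]     # d-dimensional frequency
        acc += math.cos(sum(o * d for o, d in zip(omega, diff)))
    return modulation * acc / (n_draws * eps)


if __name__ == "__main__":
    w, x = [0.5, -0.2, 0.1], [0.3, 0.4, -0.1]
    print(yat_kernel(w, x, 1.0, 1.0), yat_estimate(w, x, 1.0, 1.0, 50_000, random.Random(0)))
```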

15.
arXiv (quant-ph) 2026-06-11

On the Addressability Problem on CSS Codes

arXiv:2502.13889v4 Announce Type: replace Abstract: Recent discoveries in asymptotically good quantum codes have intensified research on their application in quantum computation and fault-tolerant operations. This study focuses on the addressability problem within CSS codes: we ask what circuits might implement logical gates on strict subsets of logical qubits. With some notion of fault-tolerance, we prove several impossibility results: for CSS codes with non-zero rate, one cannot address a logical $H$, $HS$, $SH$, or $\mathsf{CNOT}$ to any non-empty strict subset of logical qubits using a circuit made only from 1-local Clifford gates. Furthermore, we show that one cannot permute the logical qubits in a code purely by permuting the physical qubits, if the rate of the code is (asymptotically) greater than 1/3 and the distance is at least 3. We can show a similar no-go result for $\mathsf{CNOT}$s and $\mathsf{CZ}$s between two such high-rate codes, albeit under a more restrictive assumption on the circuit, which we call "global" (though recent addressable CCZ gates use global circuits). This work pioneers the study of distance-preserving addressability in quantum codes, mainly by considering automorphisms of the code. This perspective offers new insights and potential directions for future research. We argue that studying this trade-off between addressability and efficiency of the codes is essential to better understand how to do efficient quantum computation.

16.
arXiv (CS.LG) 2026-06-16

Simulation-Augmented Multi-Step Split Conformal Prediction for Aggregated Forecasts

arXiv:2606.16356v1 Announce Type: new Abstract: We study uncertainty quantification for aggregated forecasting tasks such as annual totals and year-over-year growth rates. We propose SA-MSCP, a simulation-augmented multi-step split conformal method that generates future paths from cross-validated residuals using a block bootstrap and constructs prediction intervals from empirical quantiles. Experiments show that SA-MSCP improves empirical coverage over a simulated-path baseline for aggregated and growth-rate targets. Our results demonstrate that simulation-enhanced conformal calibration is an effective and general framework for uncertainty quantification in aggregated time-series forecasting.
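The simulation step of SA-MSCP, as described, resamples cross-validated residuals in blocks, adds them to the point forecasts, aggregates each simulated path, and reads off empirical quantiles. A minimal sketch under stated assumptions follows. It uses a moving-block bootstrap with a hypothetical block length of 3, aggregates by a plain sum (an annual total), and omits the conformal calibration that the full method adds on top. All names and parameters are illustrative.

```python
import math
import random


def block_bootstrap_paths(residuals, horizon, block_len, n_paths, rng):
    """Moving-block bootstrap: concatenate randomly chosen contiguous residual blocks
    so that short-range autocorrelation in the residuals is preserved."""
    starts = range(len(residuals) - block_len + 1)
    paths = []
    for _ in range(n_paths):
        path = []
        while len(path) < horizon:
            s = rng.choice(starts)
            path.extend(residuals[s:s + block_len])
        paths.append(path[:horizon])
    return paths


def empirical_quantile(values, q):
    """Order-statistic quantile: the ceil(q * n)-th smallest value."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, math.ceil(q * len(ordered)) - 1))
    return ordered[k]


def aggregated_interval(point_forecast, residuals, block_len=3, n_paths=2000,
                        alpha=0.1, seed=0):
    """(1 - alpha) interval for the SUM of the forecast horizon (e.g. an annual total)."""
    rng = random.Random(seed)
    paths = block_bootstrap_paths(residuals, len(point_forecast), block_len, n_paths, rng)
    totals = [sum(f + e for f, e in zip(point_forecast, path)) for path in paths]
    return empirical_quantile(totals, alpha / 2), empirical_quantile(totals, 1 - alpha / 2)


if __name__ == "__main__":
    forecast = [10.0] * 12                             # hypothetical monthly forecasts
    resid = [((i * 7) % 5) - 2 for i in range(40)]     # hypothetical CV residuals
    print(aggregated_interval(forecast, resid))
```

A growth-rate target would use the same simulated paths with a different aggregation function, for example the ratio of two simulated totals.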

17.
arXiv (CS.CL) 2026-06-18

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

Large language models (LLMs) have shown growing promise in biomedical research, particularly for knowledge-driven interpretation tasks. However, their ability to reliably reason from gene-level knowledge to functional understanding, a core requirement for knowledge-enhanced cell atlas interpretation, remains largely underexplored. To address this gap, we introduce SciHorizon-GENE, a large-scale gene-centric benchmark constructed from authoritative biological databases. The benchmark integrates curated knowledge for over 190K human genes and comprises more than 540K questions covering diverse gene-to-function reasoning scenarios relevant to cell type annotation, functional interpretation, and mechanism-oriented analysis. Motivated by behavioral patterns observed in preliminary examinations, SciHorizon-GENE evaluates LLMs along four biologically critical perspectives: research attention sensitivity, hallucination tendency, answer completeness, and literature influence, explicitly targeting failure modes that limit the safe adoption of LLMs in biological interpretation pipelines. We systematically evaluate a wide range of state-of-the-art general-purpose and biomedical LLMs, revealing substantial heterogeneity in gene-level reasoning capabilities and persistent challenges in generating faithful, complete, and literature-grounded functional interpretations. Our benchmark establishes a systematic foundation for analyzing LLM behavior at the gene scale and offers insights for model selection and development, with direct relevance to knowledge-enhanced biological interpretation.

18.
bioRxiv (Bioinfo) 2026-06-15

oxo-flow: compiled, memory-safe bioinformatics workflow orchestration


Bioinformatics analyses depend on workflow engines to coordinate dozens of computational tools across complex dependency chains. The most widely adopted engines, namely Snakemake, Nextflow, the Common Workflow Language (CWL), and the Workflow Description Language (WDL), run on interpreted or just-in-time (JIT) compiled language runtimes, incurring hundreds of milliseconds of startup latency and providing no compile-time safety guarantees from the host language. We developed oxo-flow, a workflow engine written in Rust that compiles to a single native binary. On an Apple M5 processor, oxo-flow parses, validates, and dry-runs a production-scale workflow in roughly 22 milliseconds, before Snakemake or Nextflow have even finished loading their runtime environments. Peak memory usage is 16 megabytes, a six- to seven-fold reduction relative to Snakemake and Nextflow. Dry-run latency is essentially independent of workflow size: a hundred-fold increase in rule count adds approximately 0.4 milliseconds. oxo-flow integrates 31 command-line tools, a REST interface with 60 endpoints, an embedded web application, and native cluster submission into a single 10-megabyte binary. It provides per-rule environment isolation across seven backends, checkpoint-based fault tolerance with cryptographic output verification, and a formal installation and operational qualification protocol for regulated laboratory environments. Ten curated workflows and three demonstration pipeline repositories are available. oxo-flow is freely available under Apache License 2.0 at https://github.com/Traitome/oxo-flow.
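At its core, a workflow dry run is dependency resolution: link each rule to the rules that produce its inputs, then find an execution order or report a cycle. The sketch below shows that generic step with Kahn's topological sort, written in Python for consistency with the other examples here (oxo-flow itself is written in Rust). The rule dictionary format and the rule names are hypothetical, and the sketch is not oxo-flow's actual data model or algorithm.

```python
from collections import deque


def dry_run(rules):
    """rules: {name: {"input": [...], "output": [...]}} (hypothetical format).
    Returns an order in which every rule runs after the producers of its inputs."""
    producer = {}
    for name, rule in rules.items():
        for path in rule["output"]:
            if path in producer:
                raise ValueError(f"{path} is produced by both {producer[path]} and {name}")
            producer[path] = name

    # Inputs with no producer (e.g. raw data) are treated as pre-existing files.
    deps = {name: {producer[p] for p in rule["input"] if p in producer}
            for name, rule in rules.items()}
    dependents = {name: [] for name in rules}
    for name, ds in deps.items():
        for d in ds:
            dependents[d].append(name)

    # Kahn's algorithm; sorting keeps the order deterministic.
    indegree = {name: len(ds) for name, ds in deps.items()}
    ready = deque(sorted(n for n, k in indegree.items() if k == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in sorted(dependents[name]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(rules):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    print(dry_run({
        "fastqc": {"input": ["raw.fq"], "output": ["qc.html"]},
        "align": {"input": ["raw.fq", "ref.fa"], "output": ["aln.bam"]},
        "call": {"input": ["aln.bam"], "output": ["calls.vcf"]},
        "report": {"input": ["qc.html", "calls.vcf"], "output": ["report.html"]},
    }))
```

The whole pass is linear in the number of rules and files, which is consistent with a dry run whose cost grows only slightly with rule count.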

19.
arXiv (CS.AI) 2026-06-17

FlowRAG: Synergizing Explicit Reasoning via Frequency-Aware Multi-Granularity Graph Flow

arXiv:2606.17856v1 Announce Type: new Abstract: Graph-based retrieval-augmented generation (GraphRAG) is effective for knowledge-intensive and multi-hop query tasks; however, many existing methods primarily seed entity-based graphs and rely on implicit semantic relevance propagation. This often (i) under-retrieves when user queries are abstract and semantically sparse at the entity level, and (ii) suffers from brittle multi-hop reasoning, where noisy activations can derail entity-to-entity transitions and corrupt the inferred relation chain, yielding unreliable conclusions. To address this, we propose FlowRAG, a semantic-aware retrieval framework that improves both semantic recall and explicit reasoning. Specifically, FlowRAG constructs a quad-level heterogeneous graph over passages, summaries, sentences, and entities, where summary nodes serve as a coarse semantic hub. At retrieval time, a dual-granularity activation module combines summary–query alignment with sentence-level matching to robustly activate relevant entities under paraphrase and abstraction. We then introduce a frequency-aware weighted flow module that routes relevance through entity–passage links weighted by within-passage term frequency, pruning noisy connections and extracting high-confidence reasoning paths as an explicit logic skeleton for generation. Extensive experiments show that FlowRAG obtains state-of-the-art performance on complex reasoning benchmarks.
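The frequency-aware flow step can be pictured as follows. Activated entities push relevance to passages through edges whose weight is the entity's within-passage term frequency, and weak edges are pruned before scores are summed. The sketch below is a hedged illustration only. The normalization (count divided by the passage's total entity mentions), the pruning threshold `min_weight`, and all entity and passage names are assumptions, not FlowRAG's published formulas.

```python
def passage_scores(entity_activation, term_freq, min_weight=0.08):
    """Route entity relevance to passages through term-frequency-weighted edges.

    entity_activation: {entity: activation}, e.g. from an upstream activation module.
    term_freq: {passage: {entity: mention count}}.
    Edge weight = count / total entity mentions in the passage (assumed normalization);
    edges with weight below min_weight are pruned as noisy.
    """
    scores = {}
    for passage, counts in term_freq.items():
        total = sum(counts.values())
        score = 0.0
        for entity, count in counts.items():
            weight = count / total if total else 0.0
            if weight < min_weight:
                continue  # prune a weak entity-passage link
            score += entity_activation.get(entity, 0.0) * weight
        scores[passage] = score
    # Highest-scoring passages first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    activation = {"Curie": 1.0, "radium": 0.8, "Paris": 0.3}
    tf = {
        "p1": {"Curie": 5, "radium": 4, "Paris": 1},
        "p2": {"Paris": 19, "Curie": 1},  # Curie is mentioned only in passing here
    }
    print(passage_scores(activation, tf))
```

In the second passage, the single passing mention of "Curie" falls below the threshold and contributes nothing. This is the kind of noisy link that the abstract says frequency weighting suppresses.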

20.
arXiv (quant-ph) 2026-06-11

Random Grover Search

arXiv:2606.11759v1 Announce Type: new Abstract: Grover's algorithm achieves a quadratic speedup for unstructured search given a global oracle for the target set. In many applications, however, the target set is specified as the intersection of multiple constraint sets. Constructing a global oracle for the intersection can be costly, whereas the individual constraint oracles are often much simpler to implement. We study a randomized Grover search algorithm that directly uses these constraint oracles. At each iteration, one of the corresponding Grover operators is selected at random. For the two-operator case with uniform sampling, we prove that the success probability approaches one after $\Theta\left(\frac{\pi}{4}\sqrt{N/r}\right)$ iterations, where $r$ is the size of the intersection. Thus, the algorithm achieves the same asymptotic query complexity as standard Grover search but without requiring a global oracle. We then generalize the analysis to arbitrary sampling distributions and an arbitrary number of Grover operators through an auxiliary operator that approximates the expected Grover evolution, while retaining the same asymptotic complexity. We further show that highly biased sampling distributions can still achieve near-unit success probability, enabling cheaper Grover operators to be used more frequently. Finally, we prove asymptotic optimality and support the theoretical results with numerical simulations.
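The randomized scheme is simple to simulate classically on a small search space. Each constraint oracle flips the sign of its own marked set, the diffusion operator reflects about the uniform state, and each iteration applies the Grover operator of a randomly drawn constraint. Real amplitudes suffice because every operator involved is real. The sketch below is a toy state-vector simulation, not the paper's code, and the constraint sets in the usage example are hypothetical. When all constraint sets equal the intersection, it reduces to standard Grover search.

```python
import math
import random


def grover_step(amps, marked):
    """One Grover iteration D * O on a real state vector.
    O flips the sign of marked basis states; D maps a -> 2 * mean(a) - a,
    i.e. reflects about the uniform superposition."""
    amps = [-a if i in marked else a for i, a in enumerate(amps)]
    mean = sum(amps) / len(amps)
    return [2 * mean - a for a in amps]


def random_grover(n_items, constraint_sets, iterations, weights=None, seed=0):
    """Apply the Grover operator of a randomly sampled constraint oracle at every
    iteration; return the probability of measuring an element of the intersection."""
    rng = random.Random(seed)
    target = set.intersection(*constraint_sets)
    amps = [1 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        marked = rng.choices(constraint_sets, weights=weights)[0]
        amps = grover_step(amps, marked)
    return sum(amps[i] ** 2 for i in target)


if __name__ == "__main__":
    n, r = 64, 1
    k = round(math.pi / 4 * math.sqrt(n / r))
    # Hypothetical constraints whose intersection is {5}.
    a, b = {1, 2, 3, 5}, {5, 7, 9, 11}
    for it in (k, 2 * k, 4 * k):
        print(it, random_grover(n, [a, b], it))
```

With `weights=None` both operators are sampled uniformly. Passing, for example, `weights=[0.9, 0.1]` mimics the biased sampling discussed in the abstract, in which a cheaper oracle is used more often.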

21.
arXiv (CS.CL) 2026-06-16

Rethinking the Role of Efficient Attention in Hybrid Architectures

Modern language models increasingly adopt hybrid architectures that combine full attention with efficient attention modules, such as sliding-window attention (SWA) and recurrent sequence mixers. However, how these efficient modules shape model capabilities remains poorly understood. To address this gap, we conduct a systematic analysis across hybrid architectures from three perspectives: scaling behavior, mechanism analysis, and architecture design. First, from a scaling perspective, we find that efficient-attention design primarily affects how fast long-context capability emerges, while different hybrids eventually converge to comparable long-context performance under sufficient training. Second, mechanistically, we show that long-range retrieval is mainly carried by full attention, whereas efficient attention shapes its optimization trajectory. This explains a counter-intuitive phenomenon we call Large-Window Laziness: larger SWA windows can delay the formation of retrieval heads in full-attention layers. Third, guided by this mechanism, we show that applying NoPE to only the full-attention layers of a small-window SWA hybrid substantially improves long-context performance with negligible impact on short-context performance.
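To fix terminology, the two attention variants differ only in their masks. Full causal attention lets query $i$ see every key $j \le i$, whereas sliding-window attention (SWA) restricts it to the last `window` positions. The sketch below builds both masks and a hypothetical per-layer layout string. Following the abstract's third finding, it marks full-attention layers as NoPE. It assumes the SWA layers keep rotary embeddings (RoPE), which the abstract does not state.

```python
def attention_mask(seq_len, window=None):
    """Boolean mask M[i][j] = True if query i may attend to key j.
    window=None gives full causal attention; otherwise sliding-window attention,
    where each query sees itself and the previous window - 1 positions."""
    mask = []
    for i in range(seq_len):
        lo = 0 if window is None else max(0, i - window + 1)
        mask.append([lo <= j <= i for j in range(seq_len)])
    return mask


def hybrid_layers(seq_len, pattern, window):
    """pattern: one character per layer, 'F' = full attention, 'S' = SWA.
    Returns (mask, use_rope) per layer: full-attention layers use no positional
    encoding (NoPE); SWA layers keep RoPE (an assumption for this sketch)."""
    layers = []
    for kind in pattern:
        if kind == "F":
            layers.append((attention_mask(seq_len), False))
        elif kind == "S":
            layers.append((attention_mask(seq_len, window), True))
        else:
            raise ValueError(f"unknown layer kind {kind!r}")
    return layers


if __name__ == "__main__":
    for row in attention_mask(5, window=2):
        print("".join("x" if v else "." for v in row))
```

A larger `window` lets SWA layers handle more retrieval locally. The abstract's "Large-Window Laziness" finding suggests that this can delay the formation of retrieval heads in the full-attention layers.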

22.
arXiv (CS.CV) 2026-06-11

Contactless 3D Human Body Measurement Using Depth Cameras for Smart Health Monitoring

Contactless body measurement technologies are becoming increasingly significant for smart health monitoring, digital health applications, and remote patient assessment. Traditional anthropometric measurements typically necessitate physical contact and trained personnel, which may constrain scalability in remote healthcare settings. In this study, we introduce a depth camera-based framework for estimating human body measurements utilizing 3D point cloud data. An Orbbec Astra 2 depth camera was employed to capture RGB images, depth maps, and 3D point clouds of participants. The captured point cloud was processed using Python-based tools, including Open3D, NumPy, and OpenCV, to segment the human body from the background. Key anthropometric measurements, such as height and arm span, were computed. The measurements were obtained through a combination of spatial filtering and landmark selection on the 3D point cloud, followed by the projection of the computed measurements onto the corresponding RGB image using camera intrinsic parameters. In addition to linear measurements, the approximate body volume and visible surface area were estimated using voxel-based occupancy analysis and mesh-based surface reconstruction methods. The experimental results from a single depth capture demonstrated that accurate body measurements and geometric estimates could be obtained from depth camera data without physical contact. This study provides a foundation for future real-time systems that integrate depth sensing with intelligent health monitoring and generative AI models for smart healthcare applications.
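Two geometric steps in this pipeline can be written down directly. The first is the pinhole projection used to overlay 3-D measurements on the RGB image, $u = f_x X/Z + c_x$ and $v = f_y Y/Z + c_y$. The second is a height and arm-span estimate taken as the extent of the segmented body cloud along the vertical and horizontal axes. The sketch below uses plain Python rather than Open3D. It assumes a level camera, an upright subject facing it with arms outstretched, and hypothetical intrinsics. A robust version would use percentiles instead of min/max to suppress outlier points.

```python
def project_to_pixel(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z), Z > 0, to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return fx * x / z + cx, fy * y / z + cy


def height_and_span(points):
    """Extent of a segmented body point cloud along Y (height) and X (arm span),
    assuming a level camera and an upright subject with arms outstretched."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(ys) - min(ys), max(xs) - min(xs)


if __name__ == "__main__":
    # Hypothetical landmarks in metres: head top, two fingertips, foot.
    body = [(0.0, 0.9, 2.0), (-0.8, 0.0, 2.0), (0.8, 0.1, 2.0), (0.0, -0.85, 2.1)]
    print(height_and_span(body))
    print(project_to_pixel(body[0], fx=600.0, fy=600.0, cx=320.0, cy=240.0))
```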

23.
arXiv (CS.LG) 2026-06-12

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

arXiv:2606.13657v1 Announce Type: new Abstract: On-policy distillation (OPD) has recently become a prominent post-training recipe because it combines two desirable ingredients: on-policy student trajectories and dense teacher supervision. Yet it remains unclear how this hybrid changes a model's parameters. Across several language and vision-language model pairs and use cases, our analysis yields two main findings. On sparsity, OPD-style updates are small and coordinate-sparse. They are distributed across layers and are usually FFN-heavy. This sparse structure is operationally useful: training only the discovered subnetwork recovers nearly the same performance as full OPD. However, the sparsity-inducing SGD optimizer underperforms AdamW in our optimizer ablation, likely because dense teacher supervision preserves heterogeneous coordinate-wise gradient scales where AdamW's adaptive scaling remains useful. On geometry, the updates are numerically full-rank but spectrally concentrated; they lie mostly away from the principal singular subspaces of the source weights and fall disproportionately on coordinates where the source weights are close to zero. These findings suggest that dense teacher supervision does not turn OPD into ordinary dense parameter rewriting; instead, OPD retains important geometric signatures of on-policy post-training.
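The sketch below makes "coordinate-sparse" and "falls on near-zero source weights" concrete. It compares flattened weights before and after an update, reports the fraction of unchanged coordinates, and contrasts how often changed coordinates sit on near-zero source weights against the base rate. It also shows a masked step that trains only the discovered subnetwork. The tolerances, names, and plain-list representation are assumptions, and the paper's own metrics may be defined differently. A plain gradient step is used only for brevity, since the abstract reports that AdamW outperforms SGD.

```python
def update_stats(w_before, w_after, tol=1e-8, near_zero=1e-3):
    """Sparsity of an update and its overlap with near-zero source weights."""
    deltas = [a - b for b, a in zip(w_before, w_after)]
    changed = [i for i, d in enumerate(deltas) if abs(d) > tol]
    sparsity = 1.0 - len(changed) / len(deltas)  # fraction of untouched coordinates
    near = [i for i, w in enumerate(w_before) if abs(w) < near_zero]
    base_rate = len(near) / len(w_before)  # how common near-zero weights are overall
    hits = sum(1 for i in changed if abs(w_before[i]) < near_zero)
    hit_rate = hits / len(changed) if changed else 0.0  # near-zero share among changed
    return {"sparsity": sparsity, "near_zero_base_rate": base_rate,
            "near_zero_hit_rate": hit_rate}


def subnetwork_mask(w_before, w_after, tol=1e-8):
    """True where the reference update changed a coordinate (the 'discovered subnetwork')."""
    return [abs(a - b) > tol for b, a in zip(w_before, w_after)]


def masked_sgd_step(weights, grads, mask, lr):
    """Gradient step applied only to coordinates inside the mask; others stay frozen."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]
```

A hit rate well above the base rate would correspond to the abstract's observation that updates fall disproportionately on near-zero source coordinates.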

24.
arXiv (CS.LG) 2026-06-11

Spectrally Regularized Latent Flow Matching for Turbulence Generation

arXiv:2606.11691v1 Announce Type: new Abstract: Latent diffusion and flow matching have emerged as leading approaches for synthetic turbulence generation, yet they systematically under-represent dissipation-range amplitudes. We introduce a latent flow matching framework with a spectrally regularized compression stage that directly targets this failure mode. On a 256² DNS dataset at Re_f ≈ 2250, replacing an MSE-trained VAE with a zone-weighted log-spectral objective raises deep-dissipation (DD) retained spectral power from 25% to 94% in reconstruction and from 20% to 79% in unconditional generation. The improved latent representation also yields a substantially better sampling cost-fidelity tradeoff: the MSE-trained latent space imposes a fundamental quality ceiling near a DD bias of -0.70 that no integrator or step count can overcome, while the spectrally regularized latent space reaches a DD bias of -0.117 at just 20 function evaluations. Mechanistically, encoder-decoder swap experiments show that the improvement is driven primarily by encoder-induced latent reorganization rather than decoder capacity, while a support-amplitude decomposition reveals that MSE-trained models behave as conservative suppression models, minimizing pointwise error by attenuating intermittent high-wavenumber structure. Both pipelines recover the second-order structure function and the correct sign of the third-order structure function S_3, indicating the correct cascade direction without explicit supervision. A small residual gap in the magnitude of S_3 suggests that phase-coherent triadic organization remains a complementary axis to amplitude fidelity for future generative turbulence models.
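The abstract does not give the exact form of the zone-weighted log-spectral objective. The sketch below is one plausible reading, assumed for illustration. It computes a power spectrum and averages squared log-power differences, with a separate weight per wavenumber zone. It also measures retained power in a band as the ratio of predicted to reference power. It uses a naive 1D DFT on a toy signal rather than the 2D radially averaged spectrum a 256² turbulence field would need. Zone boundaries, weights, and function names are hypothetical.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum via a naive O(n^2) DFT (bins 0..n//2)."""
    n = len(signal)
    spec = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(coeff) ** 2 / n)
    return spec


def retained_power(pred_spec, true_spec, band):
    """Predicted power in bins [lo, hi) divided by reference power in the same bins."""
    lo, hi = band
    return sum(pred_spec[lo:hi]) / sum(true_spec[lo:hi])


def zone_weighted_log_spectral_loss(pred_spec, true_spec, zones, eps=1e-12):
    """Weighted mean of squared log-power errors; zones = [((lo, hi), weight), ...]."""
    total, weight_sum = 0.0, 0.0
    for (lo, hi), w in zones:
        for k in range(lo, hi):
            diff = math.log(pred_spec[k] + eps) - math.log(true_spec[k] + eps)
            total += w * diff * diff
            weight_sum += w
    return total / weight_sum


# Toy signal: a low-wavenumber mode (k=1) plus a weaker high-wavenumber mode (k=6).
n = 16
true_sig = [math.cos(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 6 * t / n) for t in range(n)]
# "Suppressed" prediction: same low mode, high mode amplitude halved.
pred_sig = [math.cos(2 * math.pi * t / n) + 0.25 * math.cos(2 * math.pi * 6 * t / n) for t in range(n)]
true_spec, pred_spec = power_spectrum(true_sig), power_spectrum(pred_sig)
zones = [((0, 5), 1.0), ((5, 9), 4.0)]  # hypothetical: upweight the high-wavenumber zone
```

Halving the high-wavenumber amplitude leaves about 25% retained power in that band, the kind of attenuation the abstract attributes to MSE-trained models. Working in log space penalizes this regardless of how little total energy the band carries.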

25.
arXiv (CS.CV) 2026-06-15

A Unified Theory of Sinusoidal Activation Families for Implicit Neural Representations

Implicit Neural Representations (INRs) model continuous signals with compact neural networks and have become a standard tool in vision, graphics, and signal processing. A central challenge is accurately capturing fine detail without heavy hand-crafted encodings or brittle training heuristics. Across the literature, periodic activations have emerged as a compelling remedy: from SIREN, which uses a single sinusoid with a fixed global frequency, to more recent architectures employing multiple sinusoids and, in some cases, trainable frequencies and phases. We study this family of sinusoidal activations and develop a principled theoretical and practical framework for trainable sinusoidal activations in INRs. Concretely, we instantiate this framework with Sinusoidal Trainable Activation Functions (STAF), a Fourier-like activation whose amplitudes, frequencies, and phases are learned. Our analysis (i) establishes a Kronecker-equivalence construction that expresses trainable sinusoidal activations with standard sine networks and quantifies expressive growth, (ii) characterizes how the Neural Tangent Kernel (NTK) spectrum changes under trainable sinusoidal parameterization, and (iii) provides an initialization that yields standard normal post-activations without asymptotic central limit theorem (CLT) arguments. Empirically, on images, audio, shapes, inverse problems (super-resolution, denoising), and NeRF, STAF is competitive and often stronger on distortion-oriented reconstruction metrics such as PSNR/SSIM across the evaluated INR tasks, with favorable parameter efficiency under layer-wise sharing. While periodic activations can alleviate practical manifestations of spectral bias, our results indicate they do not eliminate it; instead, trainable sinusoids can improve the observed capacity-optimization trade-off in the evaluated settings.
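As described, a STAF-style activation is a sum of sinusoids with learned amplitudes, frequencies, and phases. The minimal sketch below assumes the scalar form sum_i a_i sin(w_i x + phi_i), applied elementwise. It shows that a single term with a = 1, w = omega0, phi = 0 recovers the SIREN activation sin(omega0 x), where omega0 = 30 is the value commonly used for SIREN. It also gives the partial derivatives a trainer would need. The paper's initialization scheme and layer-wise sharing are not reproduced, and the function names are hypothetical.

```python
import math


def staf(x, amplitudes, frequencies, phases):
    """Sum of sinusoids: sum_i a_i * sin(w_i * x + phi_i)."""
    return sum(a * math.sin(w * x + p) for a, w, p in zip(amplitudes, frequencies, phases))


def siren(x, omega0=30.0):
    """SIREN activation: one sinusoid with a fixed global frequency."""
    return math.sin(omega0 * x)


def staf_grads(x, amplitudes, frequencies, phases):
    """Partial derivatives of staf(x) w.r.t. each a_i, w_i, phi_i."""
    d_a, d_w, d_p = [], [], []
    for a, w, p in zip(amplitudes, frequencies, phases):
        z = w * x + p
        d_a.append(math.sin(z))          # d/da_i = sin(w_i x + phi_i)
        d_w.append(a * x * math.cos(z))  # d/dw_i = a_i x cos(w_i x + phi_i)
        d_p.append(a * math.cos(z))      # d/dphi_i = a_i cos(w_i x + phi_i)
    return d_a, d_w, d_p
```

Because every term's frequency and phase are trainable, the network can shift its spectral content during optimization instead of relying on a single fixed omega0.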