Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

Semantically-Aware Diver Activity Recognition Framework for Effective Underwater Multi-Human-Robot Collaboration

Effective multi-human-robot collaboration is essential for expanding human-led operations in the challenging and high-risk underwater environment. For autonomous underwater vehicles (AUVs) to become true teammates, they must be able to comprehend their surroundings and recognize a diver's activities to offer assistance and ensure safety. Towards this goal, we introduce DAR-Net, a novel transformer-based framework that analyzes complex underwater scenes to classify diver activities. Our contribution lies in a semantically guided learning formulation that couples transformer-based temporal reasoning with pixel-level scene supervision. This multi-loss training strategy explicitly aligns global activity recognition with local human-robot interaction semantics, which is particularly critical in low-visibility underwater conditions. To address the significant challenge of data scarcity in this domain, we present the first-ever Underwater Diver Activity (UDA) dataset, a foundational resource containing over 2,600 annotated images with pixel-level masks. Through rigorous experimental evaluations in a controlled environment, we demonstrate that DAR-Net achieves promising accuracy in recognizing six distinct diver activities, outperforming state-of-the-art models. While this dataset provides a crucial baseline, our work serves as a pioneering step, laying the groundwork for future research and facilitating the development of more intelligent, collaborative underwater robotic systems.

02.
arXiv (CS.AI) 2026-06-17

WallZero: Mastering the Game of WallGo with Strategic Analysis

arXiv:2606.17847v1 Announce Type: new Abstract: WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 x 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs to improve playing performance significantly. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98x more territory per game. Beyond its strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game. Our code is available at https://rlg.iis.sinica.edu.tw/papers/wallzero.

03.
arXiv (CS.AI) 2026-06-12

Boosting Direct Preference Optimization with Penalization

作者:

arXiv:2606.12505v1 Announce Type: cross Abstract: Offline preference optimization has become a practical substitute for reinforcement learning from human feedback, but pairwise objectives such as Direct Preference Optimization (DPO) and its variants use only the chosen and rejected responses stored in a static dataset. This leaves a useful signal unused: the response that the reference model itself would generate for the same prompt. We propose Direct Preference Optimization with Penalization (DPOP), a simple extension of DPO that augments the base preference loss with a gated penalty on reference-greedy responses. DPOP activates this penalty only when the current policy still assigns a lower likelihood to the preferred response than to the rejected response. On AlpacaEval 2.0, DPOP improves length-controlled win rate over DPO, SimPO, and AlphaDPO on both Llama-3-8b-it and Gemma-2-9b-it, achieving relative gains of 5.3\% and 4.4\% over baselines on the two models, respectively. Ablations further show that a SimNPO-style length-normalized penalty is stronger than NPO and token-level unlikelihood in this setting.

04.
arXiv (CS.LG) 2026-06-16

Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals

arXiv:2606.16950v1 Announce Type: cross Abstract: Nanopores are versatile single-molecular sensors, but their utility is fundamentally constrained by stochastic translocation dynamics warping any encoded information. We resolve it by shifting from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained exclusively on simulated signals from a physics-informed model. This encoder maps solid-state nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system. The learned representation is responsive to structural barcode parameters while remaining invariant to acquisition conditions and translocation conformation, allowing data pooling across devices. Molecule identification requires a single pass through the encoder, reducing computational cost by three orders of magnitude relative to alignment-based methods. We experimentally validate through mixture quantification, rare-variant detection, consensus barcode reconstruction, and real-time signal acquisition. This shift from temporal analysis to mapping structural coordinates into a latent space changes the paradigm behind analyzing stochastic sensor signals by linking classification to interpretable encoded molecular information.

05.
arXiv (CS.CV) 2026-06-12

High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. Our method addresses the central bottlenecks of increased task difficulty and limited model capacity in 2-step generation through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as real samples for GAN training, providing a more attainable and informative adversarial target. Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss. Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.

06.
arXiv (quant-ph) 2026-06-15

A Collective-Spin Derivation of the Uniform Magnon Hamiltonian in Cavity Magnonics

arXiv:2606.13830v1 Announce Type: cross Abstract: We present a direct collective-spin derivation of the effective uniform-mode Hamiltonian used in cavity magnonics. Starting from a nearest-neighbor Heisenberg ferromagnet coupled to long-wavelength magnetic fields, we show that the relevant dynamics can be restricted to the fully symmetric spin sector, where the exchange interaction contributes only a constant energy shift and the ferromagnet behaves as a macrospin of length $Ns$. Applying the Holstein–Primakoff transformation directly to this total spin yields the usual uniform magnon mode and its leading nonlinear corrections without first introducing site-resolved bosonic operators. This collective formulation makes explicit the interpretation of the ferromagnet as a synthetic large-spin atom and provides a compact route to the effective Hamiltonians used in driven and Floquet cavity magnonics. As a physical consequence, the leading nonlinear correction produces an occupation-dependent reduction of the effective magnon–photon coupling, providing a simple signature of finite-spin saturation under strong uniform-mode driving.

07.
arXiv (CS.AI) 2026-06-15

The Silent Cost of Artificial Intelligence Assistance: A Theory of Autonomy Surrender, the Recovery Mechanism, and the Restoration of Human Agency

arXiv:2606.13962v1 Announce Type: cross Abstract: The integration of artificial intelligence into human decision-making environments has introduced a previously undertheorized cost: the gradual surrender of human autonomy in exchange for access to information and computational assistance. Building on the Human Identity and Autonomy Gap (HIAG) framework, this paper advances a theoretical model of autonomy surrender as a measurable, cumulative process driven by cognitive bandwidth depletion. The model proposes three interacting mechanisms: the silent cost of AI assistance, in which autonomy is transferred incrementally and without awareness; the surrender threshold, beyond which reclaiming autonomous function becomes cognitively and psychologically difficult; and the recovery mechanism, which establishes the design obligation and the ethical responsibility accompanying deliberate human re-assumption of control. The paper argues that human re-entry into the decision loop is not a passive option but an active cognitive event requiring intentional bandwidth restoration. The design of AI systems must incorporate structured re-entry pathways, here termed recovery mechanisms, that preserve human agency while appropriately distributing responsibility. The model further predicts a terminal state, here termed preference inversion, in which functional dependence on AI assistance is experienced not as a deficit but as a preference, transforming the restoration of autonomy from a design problem into a cultural and political one. Implications are drawn for AI system design, governance frameworks, and human factors research.

08.
arXiv (CS.AI) 2026-06-19

Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation

arXiv:2606.20135v1 Announce Type: cross Abstract: Flow matching has emerged as a standard paradigm for robotic manipulation owing to its strong expressive power for modelling complex, multimodal action distributions, alongside similar approaches like diffusion policy. However, existing methods rely on discretized action chunks, making them brittle to demonstrations collected at heterogeneous control frequencies and prone to temporally inconsistent actions that degrade control stability. In this paper, we propose Frequency-Aware Flow Matching (FAFM), which outputs continuous, temporally consistent actions. To handle heterogeneous frequency input, we transform discrete action sequences into the frequency domain with the discrete cosine transform (DCT), perform flow matching over the resulting coefficients, and reconstruct continuous actions via cosine basis expansion. To generate temporally consistent actions, we regularize the first-order temporal derivative to promote smooth actions. This corresponds to a Sobolev-type constraint that suppresses high-frequency errors and discourages abrupt action changes. Our FAFM is simple, introduces no additional network parameters and applies to standalone flow-matching policies and vision-language action models. Across synthetic toy benchmark, obstacle avoidance, LapGym, and LIBERO, FAFM improves success rates, multimodal expressivity, motion smoothness, convergence speed, robustness to mechanical bias and mixed-frequency input. These gains are consistent when deployed on a real-world Franka robot. Code available at https://anonymous.4open.science/r/FAFM.

09.
arXiv (CS.CV) 2026-06-16

Geometric Action Model for Robot Policy Learning

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but they still operate primarily on 2D image frames or 2D-derived latent spaces, leaving implicit the 3D geometry required for contact-rich manipulation. We propose the Geometric Action Model (GAM), a language-conditioned manipulation policy that directly repurposes a pretrained geometric foundation model (GFM) as a shared substrate for perception, temporal prediction, and action decoding. GAM splits the GFM at an intermediate layer: the shallow layers serve as an observation encoder, and a causal future predictor inserted at the split layer forecasts future latent tokens conditioned on language, proprioception, and action history. The predicted future tokens are then routed through the remaining GFM blocks for feature propagation and decoding, allowing a single backbone to produce both future geometry and actions. This design equips the GFM with language-conditioned temporal world modeling through minimal architectural modification while preserving its rich geometric priors. Across a broad suite of simulation and real-robot manipulation benchmarks, GAM is more accurate, more robust, faster, and lighter than current foundation-model-scale baselines.

10.
arXiv (CS.CL) 2026-06-12

Given, When, Then, Again: Mining Subscenario Refactoring Candidates in Behaviour-Driven Test Suites with ML Classifiers and LLM-Judge Baselines

Context. Behaviour-Driven Development (BDD) test suites accumulate duplicated step subsequences. Three published refactoring patterns are available (within-file Background, within-repo reusable-scenario invocation, cross-organisational shared higher-level step), but no prior work automates which recurring subsequences are worth extracting or which mechanism applies. Objective. Rank recurring step subsequences ("slices") by refactoring suitability (extraction-worthy), pre-map each to one of the three patterns, and quantify prevalence across the public BDD ecosystem. Method. Every contiguous L-step window (L in [2, 18]) in a 339-repository / 276-upstream-owner Gherkin corpus is keyed by paraphrase-robust cluster identifiers and counted under three scopes. SBERT / UMAP / HDBSCAN clustering recovers paraphrase-equivalent slices. Three authors label a stratified 200-slice pool against a written rubric. An XGBoost extraction-worthy classifier trained under 5-fold cross-validation is compared with a tuned rule baseline and two open-weight Large Language Model (LLM) judges. Results. The miner produces 5,382,249 slices collapsing to 692,020 recurring patterns. Three-author Fleiss' kappa = 0.56 (extraction-worthy) and 0.79 (mechanism). The classifier reaches out-of-fold F1 = 0.891 (95% CI [0.852, 0.927]), outperforming both the rule baseline (F1 = 0.836, p = 0.017) and the better LLM judge (F1 = 0.728, p = 1.5e-4). 75.0%, 59.5%, and 11.7% of scenarios carry a within-file Background, within-repo reusable-scenario, and cross-organisational shared-step candidate, respectively; the figures are stable under a sweep of the classifier decision threshold. Conclusion. Paraphrase-robust subscenario discovery yields a corpus-wide census of BDD refactoring candidates; pipeline, classifier predictions, labelled pool, and rubric are released under Apache-2.0.

11.
arXiv (CS.LG) 2026-06-11

Calibrating Decision Robustness via Inverse Conformal Risk Control

arXiv:2510.07750v3 Announce Type: replace-cross Abstract: Robust optimization safeguards decisions against uncertainty by optimizing against worst-case scenarios, yet their effectiveness hinges on a prespecified robustness level that is often chosen ad hoc, leading to either insufficient protection or overly conservative and costly solutions. Recent approaches using conformal prediction construct data-driven uncertainty sets with finite-sample coverage guarantees, but they still fix coverage targets a priori and offer little guidance for selecting robustness levels. We propose a new framework that provides distribution-free, finite-sample guarantees on both miscoverage and regret for any family of robust predict-then-optimize policies. Our method constructs valid estimators that trace out the miscoverage–regret Pareto frontier, enabling decision-makers to reliably evaluate and calibrate robustness levels according to their cost–risk preferences. The framework is simple to implement, broadly applicable across classical optimization formulations, and achieves sharper finite-sample performance. This paper offers a principled data-driven methodology for guiding robustness selection and empowers practitioners to balance robustness and conservativeness in high-stakes decision-making.

12.
medRxiv (Medicine) 2026-06-22

Panel-level multilocus methylation quantification in native cell-free DNA by PCR-compatible sequential enzymatic processing

DNA methylation is informative for liquid biopsy, but low template abundance, distributed methylation signals and workflow complexity limit implementation. Here we present Delta-HLD, a PCR-compatible methylation assay platform that quantifies methylation directly in native DNA through sequential hybridization, ligation and methylation-sensitive digestion. The assay co-reports methylation-dependent signals from multiple loci through a shared amplification architecture, generating a single panel-level PCR readout. We established the chemistry, optimized panel size and composition through model-guided experiments, and implemented the assay as a triplex qPCR workflow with per-sample internal process controls. Plasma proof-of-concept analyses showed discriminatory signal in CRC and proof-of-concept transferability to hepatocellular carcinoma. Additional platelet-retaining experiments identified a strategy to increase recovery of analyzable circulating templates while reducing genomic DNA recognition. Delta-HLD provides a compact PCR-compatible framework for low-input methylation analysis without base conversion.

13.
arXiv (CS.AI) 2026-06-15

VeriGeo: Controllable Geometry Question Generation with Numerical and Analytical Verification

arXiv:2606.14176v1 Announce Type: new Abstract: Geometry problem generation is useful for AI-assisted education and multimodal mathematical reasoning, but reliable synthesis remains difficult because the problem statement, diagram, constraints, and solution should be mutually consistent. Existing methods often trade off controllability and reliability: seed-based rewriting is flexible but weakly verifiable, whereas diagram-first construction improves validity but is less suited to arbitrary user-specified constraints. We introduce VeriGeo, a controllable geometry generation framework grounded in executable reasoning traces. Given user constraints such as target concepts and difficulty, an Author agent generates a problem and diagram, and a Solver agent produces a proof-aligned solution. Both agents use a shared action sequence that connects natural language, diagrams, geometric constraints, and proof steps into a verifiable representation. A three-stage pipeline checks numerical consistency, analytical realizability, and global consistency, using verification-guided reflection to repair recoverable failures and reject unrecoverable ones. Across five LLM backbones, raw generations frequently fail these checks, while VeriGeo repairs a substantial fraction of the invalid attempts. Supervised fine-tuning on 8.7k examples generated by VeriGeo achieves the best reported GeoQA performance among end-to-end multimodal LLM-based solvers, and obtains strong results on PGPS9K and MathVista-GPS, demonstrating the effectiveness of verified synthetic data for improving multimodal geometry reasoning.

14.
arXiv (CS.CL) 2026-06-17

Evaluating Second-Order Bias of LLMs Through Epistemic Entitlement

Evaluations of social bias in LLMs largely focus on whether models generate or imply biased content. However, as LLMs are increasingly used as judges of bias, they may exhibit social biases in subtler ways in how they evaluate biased content, which current methods do not systematically capture. We call this second-order bias: social bias in an LLM's judgment about social bias, which we evaluate through a novel, philosophically grounded reasoning task. Drawing on entitlement epistemology, we conceptualize bias as misplaced foundational knowledge that shapes an agent's rational inquiry, and derive a logical reasoning task for LLMs to judge to whom a biased text is acceptable or non-acceptable. We develop two simple metrics to measure how biased LLM judges are in inferring demographics for acceptability without sufficient support, and how these inferences vary across groups targeted by biased texts. Evaluating open and closed models, we find that our task evades safety guardrails by surfacing bias in model judgment. It varies systematically across target groups, reflects implicit social maps, and shows how models are still triggered by demographic labels. Our work points to the need for LLM bias evaluation in judgment tasks and broadly, for more theoretically grounded approaches to bias evaluation in NLP. We release our code and model responses at https://github.com/uofthcdslab/second-order-bias.

15.
arXiv (CS.CL) 2026-06-16

Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures

作者:

Do different LLM architectures encode high-level concepts in structurally compatible ways? We systematically characterize a geometric-functional universality dissociation: across multiple concept domains and architectural families, moderate geometric convergence coexists with near-perfect functional transfer. Using contrastive-difference CKA (CKA_Delta), a training-free diagnostic that computes kernel alignment on per-sample contrastive differences, we isolate concept-specific convergence from generic similarity – achieving significant discrimination where standard CKA cannot. The dissociation replicates across all six concept domains we test (five with p =70B models. We position CKA_Delta as a practical regime classifier and architectural outlier detector (Gemma: d = 1.08, AUC = 0.79) rather than an absolute transfer-accuracy predictor, providing a training-free diagnostic for cross-architecture concept monitoring.

16.
arXiv (CS.LG) 2026-06-16

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning is the memory overhead of backpropagation, which requires storing activations, gradients, and optimizer states. Zeroth-order (ZO) optimization offers a memory-efficient alternative, but its performance is highly sensitive to the stepsize and smoothing parameter, often requiring costly task-specific tuning. Parameter-free (PF) optimization addresses this issue by adapting algorithmic parameters without prior knowledge of problem-dependent constants. Moreover, large-scale fine-tuning can benefit from geometry-aware updates that account for the heterogeneous structure of parameter blocks, which can be modeled through methods that exploit linear minimization oracle (LMO). In this work, we study PF adaptation for LMO-based ZO optimization and introduce $\texttt{AdaNAGED}$, a method that unifies gradient-free training, adaptive tuning, and non-Euclidean update geometry. We establish convergence guarantees and validate the method on large-scale LLM fine-tuning task with $\texttt{OPT}-1.3\mathrm{B}$ model.

17.
arXiv (CS.CL) 2026-06-16

Risk-Aware LLM Agents for Geospatial Data Retrieval: Design and Preliminary Adversarial Evaluation

We present an LLM-driven framework for retrieving remote sensing data from cloud-based geospatial catalogues using natural language queries. The system converts user intent into structured API calls, enabling efficient access to satellite imagery and environmental datasets. The architecture integrates three agents: Guardrail for safety and policy enforcement, General-QA for intent interpretation, and Recommender-Analyst for schema-aware API call generation. This coordinated design ensures reliable, semantically aligned interaction with external data services. The modular framework is portable across platforms through API schema substitution and supports applications in environmental monitoring, disaster response, and climate analysis. It establishes a scalable interface between user intent and geospatial infrastructure, enabling streamlined and automated Earth observation workflows. Preliminary experiments under adversarial multi-turn settings show that prompt-level safety instructions improve robustness, although rare high-impact failures persist in API manipulation scenarios and highlight the need for adaptive, system-level defenses that balance safety, usability, and cost efficiency, which motivates the use of our intercept-level Guardrail agent.

18.
bioRxiv (Bioinfo) 2026-06-18

MorphoStat: A Statistics-Aware Pipeline for Morphological Profiling Analysis

作者:

High-content imaging produces thousands of morphological measurements per cell. Interpreting these measurements requires normalization to remove plate effects, statistical tests selected on the basis of data distribution, and control over false discoveries across many features tested at once. MorphoStat is an open-source Python pipeline that applies this sequence of steps automatically. Given a CSV file from CellProfiler or a compatible imaging platform, it removes low-quality wells, normalizes each plate against DMSO controls using a MAD-scaled z-score, routes each feature to a parametric or nonparametric test based on a distributional check, applies Benjamini Hochberg correction, and writes out results and publication-ready figures. On the BBBC021 benchmark (MCF-7 breast-cancer cells, 632 wells, 473 features), MorphoStat recovered 12 of 13 known mechanism-of-action classes in principal component space, confirming that the normalization and statistical routing work as intended. The tool is available at https://github.com/Almunthir334/morphostat (DOI: 10.5281/zenodo.20354069) under the MIT license.

19.
arXiv (CS.CV) 2026-06-18

CABLE: Cloud-Assisted Bandwidth-efficient LMM-based Encoding for V2X Systems

Cloud-hosted large multimodal models (LMMs) can provide strong open-vocabulary perception for Vehicle-to-Everything systems, but naively transmitting full-resolution frames from edge to cloud causes severe communication overhead and high cloud-side prefill latency. We present CABLE, a cloud-assisted bandwidth-efficient LMM-based encoding framework for edge-cloud perception. CABLE propagates the previous cloud segmentation mask on the edge using ego-motion compensation, refines it with residual-motion cues, and consolidates disconnected regions via a corridor envelope to form a robust region of interest (ROI). Only ROI-masked images are uploaded, while the cloud segmentation output is fed back as the prior for the next frame, forming a mask-to-ROI-to-LMM feedback loop. Experiments on five datasets (nuScenes, WOD-ZB, Waymo, KITTI, and CADC) show consistent communication savings while largely preserving perception, achieving $73$–$87\%$ ROI pixel-coverage reduction with $5$–$8\times$ estimated LMM prefill speedup at a modest detection-quality trade-off relative to full-frame inference.

20.
medRxiv (Medicine) 2026-06-10

Epidemiology of Cervical Precancerous Lesions: Prevalence and Predictors from Pap Smear Screening in Hawassa City Hospitals, Sidama Region, Ethiopia. Institutional-Based Cross-sectional Study

Background: Cervical cancer is the fourth most common cancer in women worldwide and remains a major public health challenge. In Ethiopia, it is the second leading cause of cancer deaths, with around 8,000 new cases and 6,000 deaths each year. Region?specific data on the prevalence and predictors of precancerous lesions remain scarce, yet such information is vital for guiding targeted reproductive health strategies. This study therefore examined the prevalence and predictors of cervical precancerous lesions among women aged 21-60 years undergoing Pap smear screening in public hospitals in Hawassa City, Sidama Region. Methods: An institution-based cross-sectional study was conducted among 241 women attending Pap smear screening at public hospitals in Hawassa City from March to August 2025. Sociodemographic and clinical data were collected via interviews and medical records. Lesions were classified based on the standardized international framework for reporting cervical cytology results from Pap smears per the Bethesda system. Multivariable logistic regression identified predictors p

21.
arXiv (CS.AI) 2026-06-18

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

arXiv:2605.10840v3 Announce Type: replace-cross Abstract: We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data – to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning – remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum – predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization – addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).

22.
arXiv (quant-ph) 2026-06-16

A New Definition of Quantum Superposition

arXiv:2606.15607v1 Announce Type: new Abstract: The usual description of the superposition of two (pure quantum) states is ambiguous, since the binary operation of summation in a Hilbert space does not pass down to the quotient projective space. Even though Dirac noted this as early as 1930, it is often asserted that the superposition is a binary operation acting on two states with a value that is a unique state. The goal for this note is to motivate a rigorous, geometrical definition of the superposition of states in the setting of complex projective space, which has been argued elsewhere to be the natural geometric phase space for quantum theory. The upshot is that the new definition of the superposition of two pure states, viewed as two distinct points in the projective space, is the unique (complex) line on which those two points lie. Finally, a comparison is given between superposition and expansion in an orthonormal basis.

23.
arXiv (CS.LG) 2026-06-19

Spectral Retrieval-Augmented Time-Series Forecasting

arXiv:2606.19412v1 Announce Type: new Abstract: Time series forecasting leverages historical patterns to predict future values, but traditional methods face challenges when dealing with complex, non-stationary patterns that are difficult to memorize during training. Retrieval-augmented approaches have emerged as promising solutions by retrieving similar historical patterns to enhance predictions. However, existing retrieval methods suffer from two fundamental limitations: spectral blindness, which overlooks critical frequency-domain characteristics that capture underlying periodic structures, and temporal recency, which treats all historical data equally without emphasizing recent, more relevant patterns. In this paper, we propose SpecReTF, a novel retrieval method that addresses these issues by converting time series into windowed frequency representations, measuring similarity with a combined metric that captures both amplitude and phase information. To balance recency and historical context, we apply an exponential moving average weighting scheme that emphasizes recent windows. Extensive experiments on benchmark datasets demonstrate that SpecReTF outperforms time-domain retrieval methods, achieving superior forecasting accuracy across diverse, non-stationary time series.

24.
arXiv (CS.CL) 2026-06-11

MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation

Speech-based automatic estimation of depression levels is essential for enabling early detection and timely intervention, particularly in resource-constrained mental health settings. In recent years, deep learning has demonstrated impressive success across various domains, including affective computing and mental health assessment. Most existing approaches rely on RNN-based architectures (such as LSTM and GRU) to model temporal information for depression estimation. However, the extracted features often emphasize only a few adjacent speech segments, limiting their ability to capture long-range dependencies. To overcome this limitation, we introduce a memory-based feature augmentation method that enhances the representational capacity of GRU-extracted features. Rather than indiscriminately incorporating historical data, our memory bank is designed to selectively integrate two types of components in order to reduce redundancy and irrelevance: (1) historical temporal features that closely resemble the current GRU output, offering complementary contextual information; and (2) dynamic memory features identified based on feature variability, which capture behavioral and emotional fluctuations indicative of depressive symptoms. To effectively fuse the memory-augmented features with GRU outputs, we further design a Hierarchical Attention Fusion (HAF) module. Our method is evaluated on the widely used DAIC-WOZ and E-DAIC datasets, achieving state-of-the-art performance.

25.
medRxiv (Medicine) 2026-06-15

Sociodemographic Disparities in Tafamidis Initiation and Clinical Outcomes in ATTR-CM Across the United States

BACKGROUND Transthyretin amyloid cardiomyopathy (ATTR-CM) is a progressive, life-threatening disease. Sociodemographic factors may influence time to treatment initiation and resulting clinical outcomes, yet these relationships are poorly characterized. OBJECTIVE Assess the effects of sex and race on tafamidis initiation and subsequent outcomes and their interaction with factors such as ATTR-CM type and social deprivation measures. METHODS A retrospective cohort analysis was conducted using the US Komodo Healthcare Map (01/2016-06/2024) among patients with amyloidosis, identified by ICD-10-CM diagnosis codes. Cumulative incidence of treatment initiation and survival probabilities for cardiovascular-related hospitalization (CVH) or death were estimated by Kaplan-Meier, stratified by sex and race. Cox proportional hazards models were fitted for both endpoints to estimate hazard ratios, adjusting for demographics and clinical characteristics. RESULTS Of 11,311 patients identified, White and Black patients (n=9,223) were included in subsequent analyses. Within 12 months of diagnosis, White women had the lowest cumulative incidence of tafamidis initiation (11.4%), followed by Black women (22.0%), Black men (26.7%), and White men (31.0%). Event-free survival at 12 months was lowest in Black women (42.9%), followed by Black men (46.8%), White women (48.6%), and White men (54.4%). Median (95% CI) time to CVH or death was shortest for Black women (8.0 months [6.8-10.0]) followed by Black men (9.9 months [8.8-12.0]), White women (11.0 months [9.6-13.0]), and White men (15.0 months [14.0-16.0]). CONCLUSIONS In this large, real-world cohort of US patients with ATTR-CM, sex and race contributed to disparities in tafamidis initiation and survival, underscoring compounded disparities in both access and outcomes.