Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-11

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature stability: for each SAE feature, we estimate the probability that a similar feature reappears in an independently trained SAE. This yields a scalable per-feature signal that separates stable from unstable features. In a large-scale study across seeds, models, layers, dictionary sizes, and SAE variants, we find a pronounced functional asymmetry: stable features carry most of the reconstruction- and prediction-relevant signal, while unstable features have weak marginal impact and are dominated by low-frequency surface-form triggers in both activation statistics and automatic explanations. Geometrically, unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting that seed dependence often reflects basis ambiguity within a shared region of activation space rather than pure noise. A controlled synthetic model makes this mechanism explicit, showing that low-rank ground-truth features can be recovered at the subspace level while remaining non-identifiable as individual SAE latents across seeds. Finally, by pooling unique cross-seed features, we construct more stable SAEs while preserving explained variance in this setting. Together, these results show that unstable features are not merely failed or noisy latents: they have weak individual functional impact, but reflect reproducible low-dimensional structure that standard SAEs resolve differently across seeds.

02.
arXiv (CS.LG) 2026-06-16

Filtered ANN as a Phase Transition: When Selectivity-Estimation Error Causes Plan Regret

arXiv:2606.16341v1 Announce Type: new Abstract: A filtered approximate-nearest-neighbor (ANN) query returns the k nearest vectors among those satisfying an attribute predicate P of selectivity s. The best execution strategy – pre-filter, post-filter, or in-filter – changes with s, so a system must estimate s and choose. We model this as an argmax over a landscape with phases (regions where each strategy wins) separated by boundaries, and show that selectivity-estimation error produces plan regret – recall lost versus the oracle strategy – only in the critical regions around those boundaries. The regret is a wedge of log-width equal to the multiplicative estimation error epsilon and height equal to the local cliff |V'(s*)| epsilon; the flip-margin 1/|V'(s*)| is the condition number of a sibling cardinality-estimation study reappearing as the local boundary theory. The two phase boundaries follow from independent mathematics: order statistics place the post-filter cliff at s ~ k/K, and site percolation places the in-filter cliff at s_c ~ 0.83/M for graph degree M (corpus-size independent). Criticality exists only under a constrained budget B < sqrt(k n). Under pre-registered decision rules we confirm, on synthetic sweeps and real SIFT1M, that regret concentrates ~290x at the boundary and that the regret curves obey a finite-size scaling collapse onto one universal wedge across two decades of corpus size. A real approximate index does not mis-locate the boundary, but a biased cost model opens a persistent miscalibration band that estimation-error robustness cannot fix. The contribution is a characterization, not a new index. Code and the full pre-registration are public.

04.
arXiv (quant-ph) 2026-06-12

Towards Geostrategic Critical Minerals and Materials Resilience: Secure Supply-Chain and Criticality Analyses for Quantum Technologies in Arctic and Space Environments

arXiv:2605.02926v2 Announce Type: replace-cross Abstract: This manuscript maps secure-supply and criticality risks for quantum technologies deployed in extreme environments, linking upstream critical minerals and materials (CMMs) to downstream system performance, continuity of security, and mission assurance. It adopts a reproducible "Critical Level I" screening method to identify materials whose supply concentration, essentiality, and limited mitigatability can create bottlenecks for quantum deployment. The analysis is structured around two use cases: (i) niobium as a key input for superconducting quantum computing and related manufacturing and toolchain dependencies; and (ii) space-qualified superconducting nanowire single-photon detectors (SNSPDs), alongside adjacent single-photon detector platforms such as SPADs, where radiation, thermal cycling, vibration, and electromagnetic interference can degrade device metrics and, in communications settings, threaten continuity of security. The manuscript further situates these dependencies within U.S.-China strategic competition over critical materials, refining capacity, export controls, and overseas mineral acquisitions, while also connecting them to standards-first governance, post-quantum cryptography migration, and the emerging security logic of quantum networking. It argues that static national critical-minerals lists are insufficient for mission-relevant quantum technology and proposes a dedicated Quantum Criticality and Critical Minerals (QCCM) dashboard as a living decision-support tool for tracking concentration, substitutability, qualification bottlenecks, stockpiling gaps, and geopolitical stress signals across quantum platforms. The paper concludes with implications for substitution, diversification, stockpiling, shielding, qualification-by-design, and standards-aligned governance to support secure, sustained, and mission-relevant quantum deployment.

05.
arXiv (CS.LG) 2026-06-11

APEX: A Network-Native Time-Series Foundation Model for Forecasting and Anomaly Detection for Wireless Edge Operations

arXiv:2606.11553v1 Announce Type: new Abstract: Generic time-series foundation models transfer poorly to wireless network telemetry whose signals are bursty, zero-inflated, and coupled across protocol layers. We present APEX, a network-native, decoder-only transformer for forecasting enterprise AP telemetry, and evaluate it on DHCP degradation as a representative network task. APEX is pre-trained on 10-channel multivariate telemetry from ~4,500 production wireless networks (~100K AP time series, 34 metrics per AP), and is available as APEX-Large (269M, cloud) and APEX-Edge (10.5M, edge). On a 192-step (4-day) DHCP degradation benchmark, APEX-Large reduces MAE by 18% over the strongest foundation-model baseline (Toto) and 38% over SARIMA, with anomaly-detection F1 = 0.93, while APEX-Edge enables sub-second, privacy-preserving inference on AP-class edge hardware. These results suggest network-native pre-training is a practical foundation for proactive wireless operations.

06.
arXiv (CS.AI) 2026-06-16

Beyond Classification: A Cough Regression Benchmark for Respiratory Acoustic Foundation Models

arXiv:2606.15436v1 Announce Type: cross Abstract: Respiratory acoustic foundation models (FMs) excel at cough classification, yet their ability to predict continuous health quantities from cough audio remains largely unexplored, despite the clinical value of passive age, BMI, and disease probability estimation in settings where physical measurements are unavailable. We introduce the multi-model, multi-target cough regression benchmark evaluating five FMs (OPERA-CT, OPERA-CE, OPERA-GT, HeAR, M2D+Resp) across six targets on three datasets under subject-disjoint protocols, comparing linear, MLP-small, and full MLP regression heads. MLP-small beats the mean-predictor baseline on all tasks and linear probing in 23 of 30 model x task cases, with full MLP overfitting on small clinical data but recovering on larger sets, revealing a dataset size x head-capacity trade-off. HeAR leads within-dataset age regression on Coswara (9.12 yr MAE); its CIDRZ result is excluded from headline claims owing to possible HeAR-CIDRZ pretraining overlap. OPERA-GT is favored over OPERA-CT on age in all three datasets, with the CIDRZ margin within seed variance, extending a generative-pretraining advantage from breath to cough. HeAR and M2D+Resp reach near-full performance at N = 50 samples while OPERA models require N = 400. Cross-dataset transfer is strongly asymmetric as large diverse data generalises to small clinical populations (CoughVID to CIDRZ: -0.17 yr) but not vice versa (CIDRZ to Coswara: +2.43 yr, +26.6%).

07.
arXiv (CS.LG) 2026-06-17

BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart

Authors:

arXiv:2511.19162v3 Announce Type: replace-cross Abstract: Bioart brings living material into artistic practice, where a single work can be at once an aesthetic object, a scientific instrument, and an ethical provocation. Traditional categories sort such works along one axis at a time, which flattens the very hybridity that defines the field and leaves curators no way to compare works across many dimensions together. I introduce BioArtlas, a computational atlas that represents each bioartwork along many curated dimensions at once and organizes the field by conceptual similarity rather than by medium or chronology. My method embeds the keywords of all 81 works on each of thirteen interpretive axes, groups related concepts into a shared codebook that tames inconsistent terminology, and then searches systematically for a clustering that is both statistically clean and interpretable. Among the methods that place every work on the map, agglomerative clustering separates the field far more cleanly than the usual k-means baseline (silhouette 0.664 versus 0.483), whereas density-based methods reach higher scores only by discarding most of the corpus as noise. By separating rigorous analysis from public storytelling, BioArtlas turns the tangled complexity of bioart into a navigable landscape, openly available as an interactive interface (https://www.bioartlas.com) and dataset (https://github.com/joonhyungbae/BioArtlas).

08.
arXiv (CS.AI) 2026-06-24

Prob-BBDM: a Probabilistic Brownian Bridge Diffusion Model for MRI sequence image-to-image translation

arXiv:2606.24313v1 Announce Type: new Abstract: AI-driven image-to-image synthesis is rapidly advancing, with growing applications in medical imaging. Multi-modal image analysis plays a crucial role in optimizing examination quality, yet acquiring multiple imaging modalities in clinical settings remains resource-intensive and time-consuming, especially for 3D imaging. To address this challenge, we propose a novel image-to-image translation model based on Brownian Bridge Diffusion Models (BBDM), which synthesizes magnetic resonance imaging (MRI) sequences from 2D axial slices. Our approach integrates a variational encoder-guided diffusion mechanism, leveraging probabilistic image distributions to enhance synthesis quality. Evaluated on the BraTS 2021 dataset, our Probabilistic-BBDM (Prob-BBDM) achieves superior performance across multiple translation tasks, reaching up to 88.46% SSIM and 26.09 dB PSNR, with consistent improvements over baselines. Notably, our diffusion process requires only 4 steps, making it computationally efficient while maintaining high-quality synthesis. To further validate generalizability, we test Prob-BBDM on an external third-party dataset, demonstrating consistent performance across domains. Additionally, we assess the clinical utility of the synthesized slices by using them as input to a pre-trained segmentation model. Tumor segmentation yields a Dice score of 88.71% and an HD95 of 3.49 mm, confirming that the synthesized slices preserve critical diagnostic information. These results highlight the potential of Prob-BBDM for high-quality, efficient, and generalizable MRI synthesis, offering a promising step toward improved medical image translation.

09.
arXiv (CS.AI) 2026-06-24

Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs

arXiv:2511.20892v4 Announce Type: replace Abstract: Large language models (LLMs) often produce incorrect or outdated content after being employed. Efficient and accurate knowledge updates without costly retraining are a major challenge. This problem is particularly challenging in lifelong settings, where complex, unstructured knowledge must coexist without interference. We introduce RILKE (Representation Intervention for Lifelong KnowledgE Control), a robust and scalable method that treats knowledge control as interventions within the model's representation space. Leveraging representation-space expressiveness, we identify two key properties enabling RILKE to achieve fine-grained control over complex, unstructured knowledge while maintaining general utility with frozen base weights. During training, RILKE learns paraphrase-robust and edit-localized modules that limit each update to a low-dimensional subspace to minimize cross-edit interference. At inference, a query-adaptive router selects the appropriate module to guide the model's generation. Across LLaMA and Qwen models, RILKE scales effectively to large-scale benchmarks, demonstrating high edit success and strong paraphrase generalization while preserving general utility with modest memory overhead. These results show RILKE is an effective and scalable solution for lifelong knowledge control in LLMs.

10.
arXiv (math.PR) 2026-06-16

A Low-Regularity Semigroup Sewing Lemma via Quotient Structures

arXiv:2606.16164v1 Announce Type: new Abstract: We develop a low-regularity Sewing theory for the semigroup coboundary $\hat\delta=\delta-a$ associated with a strongly continuous semigroup $S$. Unlike the ordinary low-regularity Sewing problem, the semigroup setting has an intrinsic algebraic non-uniqueness below the threshold $1$, in the sense that solutions are canonical only modulo semigroup cocycles. Accordingly, the natural target is a quotient space rather than an increment space. We identify this quotient structure and construct the corresponding semigroup Sewing map. The construction uses a frozen terminal-time transform, which rewrites semigroup defects, for each terminal time, as ordinary low-regularity Sewing problems on a frozen simplex. This reduction, however, does not by itself produce a genuine semigroup increment; the main additional step is to prove that the frozen solution classes are compatible as the terminal time varies and hence assemble into a canonical quotient class for $\hat\delta$. This yields canonical classes for $0

11.
arXiv (CS.CL) 2026-06-19

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

The field of vision-and-language (VL) understanding has made unprecedented progress with end-to-end large pre-trained VL models (VLMs). However, they still fall short in zero-shot reasoning tasks that require multi-step inferencing. To achieve this goal, previous works resort to a divide-and-conquer pipeline. In this paper, we argue that previous efforts have several inherent shortcomings: 1) They rely on domain-specific sub-question decomposing models. 2) They force models to predict the final answer even if the sub-questions or sub-answers provide insufficient information. We address these limitations via IdealGPT, a framework that iteratively decomposes VL reasoning using large language models (LLMs). Specifically, IdealGPT utilizes an LLM to generate sub-questions, a VLM to provide corresponding sub-answers, and another LLM to reason to achieve the final answer. These three modules perform the divide-and-conquer procedure iteratively until the model is confident about the final answer to the main question. We evaluate IdealGPT on multiple challenging VL reasoning tasks under a zero-shot setting. In particular, our IdealGPT outperforms the best existing GPT-4-like models by an absolute 10% on VCR and 15% on SNLI-VE. Code is available at https://github.com/Hxyou/IdealGPT

12.
bioRxiv (Bioinfo) 2026-06-22

CellTosg2Sequence: A Unified Text-Omics-Signaling-Graph Large Language Model for Single-Cell Analysis

bioRxivLaTeXUnicodeabstract — In single-cell (sc)-based scientific discovery, text-formatted biomedical prior knowledge and signaling graphs are essential for annotating and interpreting numeric sc-omics data and for generating novel testable hypotheses. A major limitation of existing single-cell large language models (scLLMs) is that they rely on numeric expression data with gene names as the only textual signal, while comprehensive biomedical priors – cellular localization, gene function, disease associations, and signaling interaction patterns – remain absent from the model input. We introduce CellTosg2Sequence, a textual-prior- and signaling-graph-augmented cell-omics-sentence language model. A lightweight heterogeneous graph encoder maps a curated 62,507-node biomedical knowledge graph (KG) into compact virtual tokens that are prepended to each cell sentence, allowing the language model to condition on biological structure with minimal sequence-length overhead. We train CellTosg2Sequence with a three-stage objective: Stage I anchors the KG channel under autoregressive language-model pretraining, leveraging Qwen2.5-32B's own language reasoning for rapid KG alignment; Stage II aligns labels via supervised fine-tuning with KG-anchored InfoNCE; Stage III applies Group Relative Policy Optimization (GRPO) with an ontology-hierarchy reward, enabling free-generation cell-type prediction that generalizes beyond the closed training vocabulary. Across multiple benchmarks and ablation experiments, CellTosg2Sequence outperforms strong baselines. All results are achieved with lightweight LoRA training and a single unified checkpoint.

13.
Nature (Science) 2026-06-09

Scientists have a bad case of AI FOMO, <i>Nature</i> poll reveals

Authors:

Almost half of the scientists who responded said that they feel broadly negative towards artificial intelligence, but they think that some tools are better than others. Almost half of the scientists who responded said that they feel broadly negative towards artificial intelligence, but they think that some tools are better than others.

14.
arXiv (CS.AI) 2026-06-24

What Does ODRL Mean? A Cross-Level Ontological Grounding of Permissions, Prohibitions, and Duties in UFO-L

arXiv:2606.24344v1 Announce Type: cross Abstract: ODRL policy evaluators produce verdicts, but say nothing about the normative positions a policy brings into existence, the authority structures those positions presuppose, or who holds the power to declare a norm violated. We formulate the Cross-Level Design Principle: any normative language with violable, consequential norms requires both conduct-level positions (Permission, Duty, Right, No right) and competence-level positions (Power, Subjection, Immunity, Disability). Applying this to ODRL, we establish that prohibition is sanctioned (violation possible and consequential), that permission is underspecified across its behaviour parameter (open vs. closed world), and that the formal semantics covers achievement obligations only. We ground ODRL in UFO-L, mapping each activated rule to a simple legal relator and extending coverage from two to eight legal positions; violation-declaration authority, implicit in every existing evaluator, becomes an explicit Power-Subjection pair. All axioms are mechanically verified in Isabelle/HOL and across a 39-problem benchmark under Vampire, E, and Z3.

15.
arXiv (CS.AI) 2026-06-11

Blind Dexterous Grasping via Real2Sim2Real Tactile Policy Learning

arXiv:2606.11767v1 Announce Type: cross Abstract: Blind grasping with a dexterous hand is a crucial manipulation capability. Nevertheless, learning such tactile-only policies for real robots remains challenging due to the tactile sim-to-real gap and the limited expressiveness of sparse tactile signals. To bridge this gap, we propose a framework for tactile-only blind grasping that is deployable on a physical multi-fingered robotic hand. Our approach combines three key components. First, we introduce a Real2Sim tactile calibration pipeline that constructs a contact-calibrated digital-twin simulator capable of reproducing real tactile signals. Second, we improve the expressiveness of sparse tactile observations using a layout-aware tactile encoder, which incorporates sensor-geometry priors through self-supervised pretraining. Third, to improve generalization to unseen objects, we train object-specific reinforcement-learning experts in the calibrated simulator and aggregate their successful grasp trajectories into a tactile-conditioned Diffusion Policy. We evaluate our method on a physical LEAP Hand equipped with distributed tactile sensing across 10 seen and 10 unseen objects. The deployed policy achieves a 27\% real-world grasp success rate across all 20 objects, without real-world grasping demonstrations or visual input. Simulation ablations show that layout-aware tactile pretraining improves grasping performance, while sensing-level evaluations confirm that Real2Sim calibration increases the consistency of tactile contact events between simulation and hardware. Together, these results suggest that contact-event calibration, geometry-aware tactile representation learning, and diffusion-based policy aggregation provide an effective path toward tactile-only blind grasping on real dexterous robotic hands. Project page:Dex-Blind-Grasp.github.io.

16.
arXiv (CS.LG) 2026-06-12

Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers

arXiv:2606.13287v1 Announce Type: new Abstract: In modern machine learning, parallelization of training is an important strategy for increasing scale. Asynchronous stochastic gradient descent (ASGD), which maximizes the utilization of available hardware by avoiding waiting for slow workers. However, with constant step sizes, the convergence of ASGD is nonetheless affected negatively by slow workers due to large delays in updates. At the same time, it has been empirically observed in asynchronous training of deep learning models that gradient clipping "stabilizes" training. In this work, we provide a theoretical justification for this behavior, as we show that clipping removes the dependence of the maximum delay in the oracle complexity. We employ a sub-Weibull model of gradient noise which generalizes sub-Gaussian and sub-exponential distributions to more heavy-tailed distributions, motivated by empirical observations in deep learning. We show convergence in expectation, and the first time in asynchronous optimization, convergence with high probability.

17.
arXiv (CS.CV) 2026-06-24

Dual-Branch Cross-Projection Debiasing through Diffusion-based Disentanglement

Foundation models trained on biased datasets often rely on spurious correlations between target labels and non-causal attributes, resulting in poor generalization on minority groups. Bias mitigation remains challenging due to two fundamental issues. First, when group labels are unavailable, existing group-unsupervised methods typically infer spurious attributes implicitly from model behavior, making it difficult to identify spurious factors that are semantically aligned with real-world biases. Second, even with pseudo spurious supervision, most existing debiasing methods follow a single-branch design that operates within a single shared feature space, where target and spurious attributes are intrinsically entangled. To address the first challenge, we introduce Confidence-guided Bias Concept Mining (CBCM), which leverages diffusion-disentangled, semantically grounded concept representations to identify reliable spurious attributes without attribute annotations. To address the second challenge, we propose Dual-branch Cross-projection Debiasing (DCD), a prompt-tuning framework that separates target and spurious representations into two branches and explicitly removes spurious information through cross null-space projection while preserving target-relevant semantics. Extensive experiments on four benchmark datasets show that our method achieves state-of-the-art worst group accuracy among group-unsupervised approaches, while tuning at most 0.22% of the model parameters. The source code is available in the supplementary materials.

18.
arXiv (CS.AI) 2026-06-16

Skill-to-LoRA: From Using Skills to Learning Behaviors for Token-Efficient LLM Agents

arXiv:2606.16769v1 Announce Type: new Abstract: Agent skills are commonly distributed as SKILL.md files: human-readable procedural documents that describe workflows, tools, resources, and domain conventions. While convenient for inspection and reuse, this design requires the same reusable procedure to be repeatedly injected into the runtime context. We propose Skill-to-LoRA(S2L), a behavior-centric skill representation that replaces runtime skill text with skill-specific LoRA adapters. Rather than compressing the skill document itself, S2L models the behavioral change induced by the skill text: offline, the complete SKILL.md is used to synthesize skill-guided demonstrations; online, the full document is omitted and the corresponding LoRA adapter is dynamically loaded to activate the learned skill behavior. We evaluate S2L with Qwen3.6-27B on a 21-skill subset of SWE-Skills-Bench. Compared with the no-skill and Full Skill Text baselines, S2L improves pass rate by 2.9 and 5.2 percentage points, respectively, while reducing per-step token cost by 6.6% relative to Full Skill Text prompting. S2L matches or improves Full Skill Text on 18/21 skills and the no-skill baseline on 15/21 skills. Control experiments further show that the gains depend on skill-specific adapter alignment: Wrong-LoRA and Shared-LoRA both reduce performance. These results suggest that many procedural agent skills can be converted from runtime instructions into trainable, dynamically loadable behavioral modules. Code will be released upon acceptance.

19.
arXiv (CS.LG) 2026-06-16

libhmm: A Modern C++20 Library for Hidden Markov Models with Correct MLE Emission M-Steps

Authors:

arXiv:2605.29208v2 Announce Type: replace-cross Abstract: We describe libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection. libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library suitable for embedding in production systems, and the widespread use of method-of-moments (MOM) approximations in the emission distribution M-step of the Baum-Welch algorithm. The library implements correct maximum likelihood estimators for sixteen scalar emission distributions, including an ECME algorithm for the location-scale Student-t distribution, Newton-Raphson maximization for Gamma, Beta, Weibull, and Negative Binomial distributions, and the von Mises distribution for circular data. All forward-backward and Viterbi calculations operate in full log-space. SIMD acceleration is provided for AVX-512, AVX2, SSE2, and ARM NEON via compile-time dispatch with scalar fallback. Version 4 adds multivariate observation support via the BasicHmm template, with three multivariate emission families (diagonal Gaussian, full-covariance Gaussian, and independent components) each with correct weighted MLE M-steps. Python bindings are available via the companion package pylibhmm. We compare libhmm against established C and C++ HMM libraries and against published R reference packages on seven real-data benchmarks, and discuss the architectural tradeoffs made in the design.

20.
arXiv (CS.LG) 2026-06-19

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

arXiv:2606.20055v1 Announce Type: new Abstract: Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across multivariate variables. To mitigate the above drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.

21.
arXiv (CS.LG) 2026-06-12

Mirror Descent on Riemannian Manifolds

arXiv:2603.17527v2 Announce Type: replace-cross Abstract: Mirror Descent (MD) is a scalable first-order method widely used in large-scale optimization, with applications in image processing, policy optimization, and neural network training. This paper generalizes MD to optimization on Riemannian manifolds. In particular, we develop a Riemannian Mirror Descent (RMD) framework via reparameterization and further propose a stochastic variant of RMD. We also establish non-asymptotic convergence guarantees for both RMD and stochastic RMD. As an application to the Stiefel manifold, our RMD framework reduces to the Curvilinear Gradient Descent (CGD) method proposed in [26]. Moreover, when specializing the stochastic RMD framework to the Stiefel setting, we obtain a stochastic extension of CGD, which effectively addresses large-scale manifold optimization problems.

22.
arXiv (CS.CV) 2026-06-15

C-MambaPose: A Physics-Informed Complex Mamba Framework for Cross-Environment WiFi Human Pose Estimation

Authors:

Human pose estimation (HPE) utilizing wireless WiFi signals has emerged as a promising technology owing to its device-free nature, privacy preservation, and robustness against occlusion and poor lighting. However, existing methods often overlook the physical complex phase information of WiFi signals and fail to generalize across diverse environments due to severe domain shifts. In this paper, we present C-MambaPose, a physics-informed complex-valued Mamba-GraFormer hybrid framework for robust cross-environment WiFi-based 3D HPE. Our framework first sanitizes raw WiFi Channel State Information (CSI) phase errors and constructs a phase-preserving complex-valued representation. We then employ a Spatiotemporal Complex Mamba encoder with a dynamic selective receptive field to capture fine-grained phase dynamics. A cross-attention joint-query mapper maps the unstructured sequence tokens to human joints, which are decoded by a Graph Convolutional Network (GCN) to predict anatomically coherent 3D coordinates. Extensive evaluations on the MM-Fi dataset show that C-MambaPose achieves competitive or superior performance to state-of-the-art baselines across all settings, setting a new state-of-the-art specifically on the challenging cross-environment split, requiring only 3.78 M parameters-an 83.1\% reduction compared to GraphPose-Fi[chen2026graph] and an 85.7\% reduction compared to MetaFi++[zhou2023metafi++], while maintaining a comparable size to DT-Pose[chen2025towards] (which is only 18\% smaller) but achieving significantly superior performance without requiring any pretraining. Our code is publicly available at https://github.com/phucngvinuni/cmampose.git.

23.
arXiv (CS.CV) 2026-06-24

MedP-CLIP: Medical CLIP with Region-Aware Prompt Integration

Contrastive Language-Image Pre-training (CLIP) has demonstrated outstanding performance in global image understanding and zero-shot transfer through large-scale text-image alignment. However, the core of medical image analysis often lies in the fine-grained understanding of specific anatomical structures or lesion regions. Therefore, precisely comprehending region-of-interest (RoI) information provided by medical professionals or perception models becomes crucial. To address this need, we propose MedP-CLIP, a region-aware medical vision-language model (VLM). MedP-CLIP innovatively integrates medical prior knowledge and designs a feature-level region prompt integration mechanism, enabling it to flexibly respond to various prompt forms (e.g., points, bounding boxes, masks) while maintaining global contextual awareness when focusing on local regions. We pre-train the model on a meticulously constructed large-scale dataset (containing over 6.4 million medical images and 97.3 million region-level annotations), equipping it with cross-disease and cross-modality fine-grained spatial semantic understanding capabilities. Experiments demonstrate that MedP-CLIP significantly outperforms baseline methods in various medical tasks, including zero-shot recognition, interactive segmentation, and empowering multimodal large language models. This model provides a scalable, plug-and-play visual backbone for medical AI, combining holistic image understanding with precise regional analysis.

24.
arXiv (math.PR) 2026-06-19

A Cycle Walk for Sampling Measures on Spanning Forests for Redistricting

arXiv:2509.08629v2 Announce Type: replace-cross Abstract: We introduce the Cycle Walk, a new Markov chain Monte Carlo method for sampling distributions on balanced graph partitions, motivated by applications in political redistricting. The method operates on spanning forests and combines two types of updates: local "cycle" moves within districts and global moves that exchange population between adjacent districts while preserving balance constraints. This construction enables efficient Metropolis–Hastings correction while allowing proposals at multiple spatial scales. We show that the Cycle Walk naturally interpolates between existing approaches based on local updates and a class of global update methods derived from recombination (RECOM). Through a range of numerical experiments on synthetic graphs and real-world precinct data, we demonstrate that the Cycle Walk exhibits improved empirical convergence diagnostics for distributions that place weaker weight on spanning-tree counts, a regime that is challenging for existing methods. In particular, the algorithm remains effective when incorporating alternative compactness measures that more closely reflect policy-relevant criteria. These results suggest that the Cycle Walk provides a flexible and computationally efficient framework for sampling from a broader class of redistricting distributions than previously accessible with MCMC techniques.

25.
arXiv (CS.AI) 2026-06-19

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

arXiv:2606.20356v1 Announce Type: cross Abstract: In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.