Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

02.
arXiv (math.PR) 2026-06-17

Poisson approximation by coupling

arXiv:2605.01894v2 Announce Type: replace Abstract: It is well known that a binomial $(n,p)$ can be approximated by a Poisson distribution with parameter $np$. The typical approach in undergraduate probability texts is to show a convergence result for the distribution of the binomial as $n$ goes to infinity and $np$ converges to some $\lambda$. In this note we use instead the coupling technique to show a much more general result. Moreover, we only use elementary results from probability.

03.
arXiv (CS.AI) 2026-06-19

Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models

arXiv:2606.19932v1 Announce Type: cross Abstract: Mamba demonstrates strong efficiency in modeling long visual sequences. However, when token reduction is applied to structurally enhanced Mamba variants, these models exhibit a severe performance collapse. We attribute this degradation to the spatially agnostic nature of existing reduction methods, which violate the two-dimensional structural premise required by the selective scanning mechanism. In this work, we propose STORM, a spatial-aware token reduction framework designed to maintain structural integrity throughout the compression process. STORM reformulates reduction into a structured operation on spatial units, enforcing localized constraints to maintain both grid topology and neighborhood coherence. As a plug-and-play module, STORM equips existing reduction pipelines with explicit spatial awareness without any training. Empirical results demonstrate that STORM achieves state-of-the-art pruning accuracy across diverse vision Mamba backbones under training-free settings. Notably, STORM delivers a substantial accuracy recovery on VMamba, outperforming prior methods by up to 63.3\% in top-1 accuracy. Meanwhile, STORM incurs only a 1.0\% accuracy drop on PlainMamba, achieving performance comparable to ViT.

04.
arXiv (CS.CV) 2026-06-17

Bayesian Magnetic Resonance Joint Image Reconstruction and Uncertainty Quantification using Sparsity Prior Models and Markov Chain Monte Carlo Sampling

We propose a novel framework for uncertainty quantification using compressed sensing magnetic resonance image reconstruction. The problem is formulated within a Bayesian framework as a linear inverse problem, with prior distributions assigned to the unknown model parameters. Specifically, the image to be reconstructed is assumed to be sparse in a given basis. We develop a general framework applicable to any basis and as examples, we test the sparsity of the image in its (1) spatial gradients using a total variation prior model, and in its (2) wavelet transform. A Markov chain Monte Carlo (MCMC) method, based on a split-and-augmented Gibbs sampler, is then employed to sample from the posterior distribution of the unknown parameters. The non-differentiable conditional distributions are efficiently sampled using a proximal MCMC method. The proposed algorithms are validated on both single-coil and multi-coil datasets using various k-space sub-sampling patterns and ratios. The results demonstrate the superior performance of each proposed approach in reconstructing images compared to its counterpart optimisation-based method. Moreover, our framework effectively quantifies uncertainty, showing a notable correlation between estimated uncertainty maps and error maps computed using ground truth and reconstructed images, compared with existing deep learning-based methods.

05.
bioRxiv (Bioinfo) 2026-06-20

Seed variation impacts clustering stability in Single-Cell RNA-Seq and can be mitigated by StAbility-BasEd-Reassignment (SABER)

Single-cell RNA-seq clustering is commonly treated as reproducible once a random seed is fixed, yet the choice of seed itself may alter cell assignments and downstream interpretation. We systematically quantified seed-induced clustering variability by running Louvain and Leiden clustering across 100 seeds in Seurat and Scanpy on 28 single-cell RNA-seq datasets from the Human Cell Atlas and IMMUcan. Using Element-Centric Consistency, we found that seed choice affected a substantial fraction of cells, with Scanpy showing more unstable assignments than Seurat on average, 40.46% versus 26.78% unstable cells, respectively. This increased stability came at a marked computational cost: Seurat required approximately 19-fold higher median memory than Scanpy. Seed-dependent clustering variability also propagated to cell-type annotation, particularly among transcriptionally related populations including macrophage/monocyte, endothelial/epithelial and T/NK cell states. To mitigate this instability, we developed StAbility-BasEd Reassignment (SABER), a Scanpy-based framework that identifies seed-sensitive cells across repeated clusterings and reassigns them to stable cluster cores using cosine similarity. SABER improved clustering quality while preserving annotation concordance and reduced median memory usage 3.5-fold compared with Seurat-Louvain. Our results identify seed choice as an underappreciated source of variability in single-cell analysis and provide a scalable strategy to improve clustering robustness.

06.
arXiv (CS.LG) 2026-06-17

Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

arXiv:2606.18166v1 Announce Type: cross Abstract: Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could not resolve the complex language and multi-step attack patterns found in unstructured CTI reports. LLMs addressed previous limitations by using contextual reasoning to understand unstructured text. However, current evaluations rely on simplified, single-technique sentences that ignore the complexity of real-world CTI reports, which often leads to inflated performance results. Consequently, the baseline performance of open-source LLMs on complex unstructured CTI reports remains unevaluated. To address this gap, we constructed a ground-truth dataset of 2,076 human-annotated sentences (1,281 technique-positive, 795 negative) from 83 complex unstructured CTI reports. These sentences were mapped to 114 unique ATT&CK techniques using a six-phase annotation process, achieving \k{appa} = 0.68 inter-annotator agreement. Using this dataset, we evaluated seven open-source LLMs ranging from 8B to 236B parameters across prompt strategy and temperature configurations. The highest-performing LLM achieved a micro-averaged F1 score of 0.22, establishing the empirical baseline for multi-label ATT&CK classification on complex unstructured CTI. Parameter size showed a statistically significant positive correlation with F1 score. Prompt strategy and temperature produced no statistically significant gains across model configurations. These results indicate that current open-source LLMs are insufficient for production-grade ATT&CK classification. The dataset, benchmark, and findings provide a reproducible foundation for future CTI research.

07.
arXiv (CS.CV) 2026-06-17

OpenTie: Open-vocabulary Sequential Rebar Tying System

Robotic practices on the construction site emerge as an attention-attracting manner owing to their capability of tackling complex challenges, especially in the rebar-involved scenarios. Most of existing products and research are mainly focused on the collection of large amounts of data with model training demands. To fulfill this gap, we propose OpenTie, a 3D training-free rebar tying framework utilizing a RGB-to-point-cloud generation and an open-vocabulary rebar detection on the real-world test. We implement the OpenTie via a robotic arm with a binocular camera and guarantee a high accuracy by applying the prompt-based object detection method on the image filtered by our proposed post-processing procedure for the image-to-point-cloud generation framework. Our pipeline requires no training efforts and outperforms the training-based object detection, i.e., YOLO-based method, with the verification on the real-world sequential rebar tying test. The system is flexible for horizontal and vertical rebar tying tasks and holds the potential application to the real construction site with possibility of commercialization.

08.
arXiv (quant-ph) 2026-06-11

Emergent Bell Phase in an Electro-Nanomechanical Quantum Simulator

arXiv:2511.02613v2 Announce Type: replace Abstract: Suspended carbon nanotubes hosting electrostatically defined quantum dots allow for exceptionally strong and tunable electromechanical coupling as well as mechanical modes that can reach the quantum ground state of motion simply by cryogenic cooling. This makes them a unique platform for quantum simulation of electron-phonon coupling. Here, we propose an experimentally realisable setup with two such carbon nanotubes in parallel, each hosting four quantum dots. Our system not only exhibits phonon-mediated electron-electron attraction, but also supports a robust, maximally entangled Bell phase at mesoscopic scales shared across the subsystems. These features highlight its potential as a simulator of strongly correlated quantum systems.

09.
arXiv (CS.AI) 2026-06-16

Guiding Federated Graph Recommendation with LLM-encoded knowledge

arXiv:2606.15277v1 Announce Type: cross Abstract: Graph-based recommender systems are highly effective at extracting collaborative signals from user–item interactions, and federated learning (FL) allows these models to be trained while preserving user privacy. However, aggregating graph representations across distributed, non-IID clients remains a challenge; structural embeddings learned locally often misalign, and naive averaging fails to capture meaningful cross-client relationships. Most existing federated graph methods rely exclusively on structural aggregation, neglecting the rich, global semantic context available in large language models (LLMs). In this paper, we propose a novel framework that uses LLM-encoded knowledge to guide federated graph recommendation. Specifically, clients learn structural representations from local graphs while simultaneously summarizing their typical interaction patterns into compact semantic vectors via a frozen LLM. The central server then uses these LLM-encoded semantic signals to discover related preference patterns across clients, guiding the selective aggregation of their structural representations. This enables semantically informed cross-client collaboration without exposing raw data. Extensive experiments on standard benchmarks show that guiding structural alignment with LLM-encoded knowledge consistently improves recommendation accuracy over existing federated graph baselines.

10.
arXiv (CS.AI) 2026-06-16

AI Engram: In Search of Memory Traces in Artificial Intelligence

arXiv:2606.14997v1 Announce Type: new Abstract: Memory formation is fundamental to intelligence, yet whether deep neural networks preserve identifiable memory traces analogous to biological memory units remains an open question. This work introduces a geometric framework to identify such "AI engrams" by formalizing the neuroscientific criteria of specificity, reactivation, sufficiency, and necessity into a constrained inverse problem. We derive a closed-form estimator that isolates individual memory traces from globally entangled parameters, and show that this biologically-derived solution corresponds to a natural gradient update on the parameter manifold. AI engrams enable surgical manipulation of learned knowledge: any subset of memories can be composed or erased through linear arithmetic, without iterative optimization. Experiments ranging from simple MLPs to LLMs demonstrate the causal validity and substantial scalability of AI engrams. Together, these results bridge theories of biological memory and artificial representation learning and offer geometric insight into how deep networks simultaneously support functional specificity within distributed storage.

11.
arXiv (CS.AI) 2026-06-12

MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling

arXiv:2606.12935v1 Announce Type: new Abstract: Parallel test-time scaling samples many reasoning traces and majority-votes their answers, improving LLM accuracy but requiring traces to run to completion, incurring substantial computational overhead. We observe that probing partial traces at intermediate checkpoints can extract current answers without disrupting generation, revealing an evolving aggregate vote. Based on this observation, we introduce MARS, a margin-adversarial stopping rule that estimates which active traces are likely to change their answers and stops once the leader remains safe under a conservative bound on future vote movement. The rule separates two sources of uncertainty. It learns the trace-level switch probabilities that determine how much of the current margin is likely to be retained, while handling the harder question of where switching traces land through an adversarial bound calibrated from warmup traces. With true switch probabilities, MARS guarantees with high probability that the early-stopped answer matches the full-budget vote. In practice, a five-feature logistic model closely matches oracle switching behavior. Across three reasoning models and three competition-math benchmarks, MARS saves 25-47% of self-consistency tokens and 14-29% on top of DeepConf Online, a strong confidence-weighted baseline that already filters and truncates weak traces, while matching the accuracy of the corresponding full-budget baselines.

13.
arXiv (CS.LG) 2026-06-17

Geometrical fairness in graph neural networks

arXiv:2606.17684v1 Announce Type: cross Abstract: Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.

14.
arXiv (math.PR) 2026-06-12

Pathwise integration beyond Young via Faber–Schauder energy spaces

作者:

arXiv:2606.13331v1 Announce Type: cross Abstract: We develop a pathwise integration theory based on Faber–Schauder energy spaces. The approach replaces the classical Hölder–Young and finite-variation Young conditions by dyadic summability conditions expressed in terms of Faber–Schauder coefficients. On the normalized interval $[0,1]$, these conditions define Banach spaces $\mathcal{E}^p$, which we call Faber–Schauder energy spaces. For $p,q>1$ satisfying $1/p+1/q\ge1$, we prove that every pair $f\in\mathcal{E}^p$ and $g\in\mathcal {E}^q$ admits a continuous pathwise integral $I_{f,g}$, constructed from dyadic left Riemann sums. We call $I_{f,g}$ the Faber–Schauder integral, and show that it depends boundedly and bilinearly on $(f,g)$ in the corresponding energy norms. The integral satisfies additivity, integration by parts, and a dyadic Young–Loève estimate. It is also the uniform limit of classical Riemann–Stieltjes integrals of finite Faber–Schauder approximations. The Faber–Schauder integral agrees with the classical Young integral whenever the latter is available, but also applies to deterministic and Gaussian examples for which neither the Hölder–Young condition nor the finite-variation Young condition can be verified. In this sense, it provides a Faber–Schauder coefficient-based extension of Young's framework.

15.
arXiv (CS.AI) 2026-06-12

Teach-and-Repeat: Accurately Extracting Operational Knowledge from Mobile Screen Demonstrations to Empower GUI Agents

arXiv:2606.12817v1 Announce Type: new Abstract: Understanding the digital world on mobile devices is shifting from static UI perception to dynamic action comprehension. This capability enables models to convert visual state transitions into operational knowledge, defined as short natural-language sentences that describe action types, target UI elements, textual arguments, and execution orders. However, due to the highly diverse and heterogeneous UI designs across applications, existing vision-language models (VLMs) struggle to accurately infer these underlying operations. To bridge this gap, we introduce Teach VLM, a core model designed to translate mobile screen trajectories into step-wise operational knowledge by extracting and analyzing operation-related keyframes from demonstration videos. To address the scarcity of aligned training data, we develop a systematic data flywheel for scalable data acquisition. We further introduce a novel Chinese Mobile Screen Teach Benchmark for fine-grained evaluation. Building upon Teach VLM, we propose the Teach-and-Repeat paradigm, where the generated operational knowledge serves as an interpretable procedural reference to guide downstream screen-based execution agents. Extensive evaluations demonstrate that Teach VLM significantly outperforms strong VLM baselines, achieving state-of-the-art performance in operation semantics prediction. Furthermore, experiments in Android World show that our paradigm yields consistent Task Success Rate improvements for downstream agents. Together, Teach VLM and the Teach-and-Repeat paradigm offer a practical pathway from raw demonstrations to reusable task automation.

16.
arXiv (CS.CL) 2026-06-12

A Unifying Lens on Reward Uncertainty in RLHF

Reinforcement learning from human feedback (RLHF) is bottlenecked by reward hacking, where the policy exploits errors in a proxy reward model (RM) and produces high RM scores without genuine quality gains. A natural mitigation is pessimism: lowering rewards in regions where the RM is uncertain. However, standard scalar RMs provide no principled notion of uncertainty. We argue that the right object is a distributional reward model $p(r\mid x,y)$. Under either a Bayesian inference or a KL-distributionally robust optimization (KL-DRO) lens, the KL-regularized RLHF objective admits a closed-form effective reward $\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$. The pessimistic branch unifies the prior heuristics for RM ensemble aggregation: mean aggregation, worst-case optimization (WCO), and uncertainty-weighted optimization (UWO) all emerge as limits or truncations of this single expression. This also clarifies the implicit assumptions of each existing rule.

17.
bioRxiv (Bioinfo) 2026-06-08

DDI_single: Single-Sequence-Based Protein Domain Assembly

作者:

Domains are the basic units of protein structure and function. Appropriate inter-domain organization is critical to enable cooperative execution of multiple related functions. It is thus a crucial step to determine the full-length structure of multi-domain proteins for the purpose of elucidating their functions and designing new drugs to regulate these functions. Existing structure prediction algorithms are generally better at solving the internal conformation of domains, rather than modeling the relative positions between domains. To address the challenge of accurately determining multi-domain protein conformations, we develop a single-sequence-based domain assembly algorithm called DDI_single. DDI_single directly extracts features from the amino acid sequence using the protein language model ESM-1b, and accurately predicts the interactions between residue pairs of structural domains through a novel gated cross-attention module, thus achieving the correct assembly of structural domains. With the knowledge of domain definition, DDI_single achieves more than 20% higher accuracy in the task of predicting the relative distances of residue pairs between domains than that of the single-sequence-based structure prediction algorithm trRosettaX_single. When assembling domains with known spatial conformations, DDI_single correctly assembles 74.4% of the samples in the test set (TM-score>0.5). When assembling domains with unknown spatial conformations, in cases where the internal spatial conformations of domains are correctly modeled, DDI_single correctly assembles 73.9% of the samples.

18.
arXiv (CS.CV) 2026-06-11

Semantic search for 100M+ galaxy images using AI-generated captions

Finding scientifically interesting phenomena through slow manual labeling campaigns severely limits our ability to explore the billions of galaxy images produced by telescopes. In this work, we develop a pipeline to create a semantic search engine from completely unlabeled image data. Our method leverages Vision-Language Models (VLMs) to generate descriptions for galaxy images, then contrastively aligns a pre-trained astronomy foundation model with these embedded descriptions to produce searchable embeddings at scale. We find that current VLMs provide descriptions that are sufficiently informative to train a semantic search model that outperforms direct image similarity search. Our model, AION-Search, achieves state-of-the-art zero-shot performance on finding rare phenomena despite training on randomly selected images with no deliberate curation for rare cases. Furthermore, we introduce a VLM-based re-ranking method that nearly doubles the recall for our most challenging targets in the top-100 results. For the first time, AION-Search enables flexible semantic search for over 100 million galaxy images, enabling discovery from previously infeasible searches, including the identification of 36 new extragalactic stellar stream candidates. More broadly, our work provides an approach for making large, unlabeled scientific image archives semantically searchable, expanding data exploration capabilities in fields from Earth observation to microscopy. The code, data, and app are publicly available at https://github.com/NolanKoblischke/AION-Search

19.
arXiv (CS.AI) 2026-06-12

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

arXiv:2606.13602v1 Announce Type: new Abstract: We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable answers. The benchmark includes 106 evaluations across CUT\&Tag/CUT\&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. Across 5,088 valid trajectories from 16 model-harness pairs, no system passed a majority of attempts: GPT-5.5 / Pi led at 45.0\% (143/318 attempts; 95\% confidence interval (CI), 36.3–53.7), followed by GPT-5.5 / OpenAI Codex at 39.9\% (127/318 attempts; 95\% CI, 31.6–48.3). Claude Opus 4.8 Max / Pi and GPT-5.4 / Pi each passed 39.0\% (124/318 attempts; 95\% CI, 30.2–47.8 and 31.0–47.0, respectively). Performance varies across assay types, and many failed runs still contain parts of the correct answer. Agents often found the right files and computed useful intermediate results, but failed when the task required deeper, assay-specific scientific judgment.

20.
arXiv (CS.AI) 2026-06-12

Automated reproducibility assessments in the social and behavioral sciences using large language models

arXiv:2606.13670v1 Announce Type: new Abstract: Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are resource-intensive and difficult to scale. Here, we show that large language models (LLMs) can automate reproducibility assessments. Using N=76 published studies with predefined claims from the behavioral and social sciences, we compare LLM-generated analysis with the original findings and human reanalysis. For 7 studies, the LLM could not produce a viable effect size estimate. For the remaining studies, our LLM pipeline recovered the original effect sizes in 41% of studies using a +/-0.05 tolerance in Cohen's d. Further, our LLM pipeline reached the same qualitative conclusion as the original study in 96% of cases, where conclusions indicate whether the reanalysis supports the original claim. For comparison, human reanalysts recovered the original effect sizes in 34% of studies and reached the same qualitative conclusion in 74% of cases. Together, these results show that LLMs can serve as a scalable tool for automated reproducibility assessment and provide a foundation for systematic auditing of empirical results in the social and behavioral sciences.

21.
arXiv (math.PR) 2026-06-15

Sectional Curvature for Kantorovich-Wasserstein and Hellinger-Kantorovich Geometries

arXiv:2606.14318v1 Announce Type: cross Abstract: We derive an explicit formula for the sectional curvature of the space ${\cal M}(M)$ of finite measures on a Riemannian manifold M. The space ${\cal M}(M)$ is equipped with the Hellinger-Kantorovich metric $HK$. Even in the case M=R^n, the curvature is comprised of two parts: the `lifted part' is negative, and the `twisted part' is positive. It will be analyzed in detail for the multidimensional torus. Our general approach to sectional curvature in geodesic spaces also leads to new insights into the curvature of the space $P_2(M)$ of probability measures on M equipped with the Kantorovich-Wasserstein metric $W_2$.

22.
arXiv (CS.CV) 2026-06-11

MLT-Dedup: Efficient Large-Scale Online Video Deduplication via Multi-Level Representations and Spatial-Temporal Matching

The explosive growth of user-generated video content on online platforms is accompanied by the emergence of numerous near-duplicate videos–videos that are identical or highly similar but differ by partial edits. These duplicates degrade user experience and increase storage and bandwidth costs, making large-scale video deduplication a critical task. Existing video deduplication frameworks face a fundamental challenge in retrieving sufficient high-quality candidates under a limited index budget, as well as trade-offs between efficiency and precision. To address these issues, we propose MLT-Dedup, an efficient large-scale online video deduplication framework with Multi-Level representations and spatial-Temporal matching. Our approach employs a Multi-Level Video Encoder (ML-VE) to extract both fine-grained frame-level and sparse clip-level embeddings: sparse embeddings support efficient candidate retrieval, while fine-grained embeddings are loaded for precise pairwise matching. During matching, we introduce DiF-SiM, a Differential Feature-enhanced Similarity Module capable of locating duplicated temporal segments and providing reliable similarity evidence to support policy-driven deduplication decisions. Extensive experiments on a real-world large-scale platform demonstrate that MLT-Dedup reduces online repetition rates by 91% at 90% precision. Furthermore, our sparse retrieval design achieves a 5x increase in indexing capacity, enabling broader candidate coverage in real-world deployment.

23.
arXiv (CS.CL) 2026-06-18

Improve Large Language Model Systems with User Logs

Scaling training data and model parameters has long driven progress in large language models (LLMs), but this paradigm is increasingly constrained by the scarcity of high-quality data and diminishing returns from rising computational costs. As a result, recent work is increasing the focus on continual learning from real-world deployment, where user interaction logs provide a rich source of authentic human feedback and procedural knowledge. However, learning from user logs is challenging due to their unstructured and noisy nature. Vanilla LLM systems often struggle to distinguish useful feedback signals from noisy user behavior, and the disparity between user log collection and model optimization (e.g., the off-policy optimization problem) further strengthens the problem. To this end, we propose UNO (User log-driveN Optimization), a unified framework for improving LLM systems (LLMsys) with user logs. UNO first distills logs into semi-structured rules and preference pairs, then employs query-and-feedback-driven clustering to manage data heterogeneity, and finally quantifies the cognitive gap between the model's prior knowledge and the log data. This assessment guides the LLMsys to adaptively filter out noisy feedback and construct different modules for primary and reflective experiences extracted from user logs, thereby improving future responses. Extensive experiments show that UNO achieves state-of-the-art effectiveness and efficiency, significantly outperforming Retrieval Augmented Generation (RAG) and memory-based baselines. We have open-sourced our code at https://github.com/bebr2/UNO .

24.
arXiv (CS.LG) 2026-06-19

Utility-Aware DRL-Based TXOP Adaptation for NR-U and Wi-Fi Coexistence Networks

arXiv:2605.00457v4 Announce Type: replace-cross Abstract: The coexistence of NR-U and Wi-Fi in the unlicensed spectrum introduces a challenging resource management problem, where heterogeneous channel access mechanisms can lead to unbalanced spectrum utilization and severe Wi-Fi performance degradation. To address this issue, this paper proposes a utility-aware deep reinforcement learning (DRL) framework for adaptive transmission opportunity (TXOP) control in NR-U/Wi-Fi coexistence networks. The coexistence process is formulated as a Markov decision process (MDP), in which the NR-U TXOP duration is treated as a controllable variable for regulating post-access channel occupancy. A deep Q-network (DQN) is then employed to learn adaptive TXOP control policies through online interaction with the coexistence environment. A key feature of the proposed framework is the integration of a configurable reward and criterion design, which enables explicit control of the fairness-efficiency-utility tradeoff. Three operating policies are developed, namely absolute fairness, moderate fairness, and utility-oriented moderate fairness, to characterize different coexistence operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared with the absolute fairness policy, the moderate fairness policy improves aggregate throughput by 68.22%, while the utility-oriented policy achieves a 177.6% improvement under the adopted utility evaluation metric. These results demonstrate that the proposed utility-aware DRL framework provides an effective and flexible solution for adaptive TXOP control and tradeoff management in heterogeneous unlicensed coexistence networks.

25.
arXiv (CS.AI) 2026-06-11

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

arXiv:2606.11770v1 Announce Type: new Abstract: Spatial reasoning remains a challenge for Multimodal Large Language Models (MLLMs), as it requires reliable multi-hop inference over both intermediate states and state transitions. Current studies often leave intermediate states unverified and treat state transitions as implicit processes, which limits reliability in multi-hop spatial reasoning. To address this, we propose State-aware Visualization-of-Thought (SVoT), a reinforcement learning framework that generates interleaved, verifiable intermediate states and visualizations. SVoT integrates transition reasoning chains into the generation processes, enabling the model to verify action preconditions and effects through interleaved textual and visual reasoning. We train SVoT via Group Relative Policy Optimization (GRPO), instantiating verification through reward design and evaluating the efficacy of different fine-grained rewards. As existing benchmarks reduce state transitions to single-variable updates, substantially simplifying the problems, we establish five domains by extending classical environments and introducing two novel domains, Pacman and Gather, that require multi-object interactions and numerical reasoning. These domains support systematic evaluation of multi-hop spatial reasoning with quantitative verification of generated intermediate states and transition reasoning. SVoT with transition-aware supervision achieves state-of-the-art performance across the introduced domains, yielding up to a 65% absolute accuracy gain on out-of-distribution test sets.