Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (math.PR) 2026-06-18

Second-Order Approximation of Limit Order Books in a Single-Scale Regime

arXiv:2308.00805v3 Announce Type: replace-cross Abstract: We establish a first- and second-order approximation for an infinite dimensional limit order book model in a single (critical) scaling regime where market and limit orders arrive at a common time scale. With our choice of scaling we obtain non-degenerate first- and second-order approximations for the price and volume dynamics. While the first-order approximation is given by a coupled ODE-PDE system, the second-order approximation is described in terms of an infinite-dimensional stochastic evolution equation driven by a cylindrical Brownian motion. The driving noise processes exhibit a non-trivial correlation in terms of the model parameters. We prove that the evolution equation has a unique solution and that the sequence of standardized limit order book models converges weakly to the solution of the evolution equation. The proof uses a non-standard martingale problem. We calibrate a linearized model to market data and explain how our model can be used for deriving confidence intervals of portfolio liquidation values.

02.
Nature Medicine 2026-06-08

Apitegromab for lean mass preservation during tirzepatide-induced weight loss: a randomized, double-blind, placebo-controlled phase 2 trial

Loss of lean mass in proportion to total weight loss is observed with incretin mimetic therapies such as tirzepatide and has the potential to adversely affect health and function. Apitegromab is an investigational, fully human monoclonal antibody that selectively inhibits myostatin activation and is, thereby, capable of increasing muscle mass. In the randomized, double-blind, placebo-controlled phase 2 EMBRAZE study, adults with overweight or obesity (n = 102) were randomized 1:1 to receive tirzepatide plus apitegromab (10 mg kg−1) or tirzepatide plus placebo. At week 24, apitegromab resulted in a least square mean (80% confidence interval (CI)) of 1.9 (1.2−2.7) kg less lean mass loss than placebo (P = 0.001), despite similar total body weight loss between groups, representing a 54.9% retention of lean mass relative to placebo. In participants receiving apitegromab, trough concentrations of apitegromab and total latent myostatin, a pharmacodynamic marker, both increased over time and reached a plateau after approximately 16 weeks. Incidence of adverse events (AEs) (% (95% CI)) was generally similar across apitegromab-treated participants and placebo-treated participants, with 39 of 51 (76% (63−86%)) and 36 of 51 (71% (57−81%)) participants experiencing an AE, respectively. Serious adverse events (SAEs) were balanced and experienced by one of 51 (2% (0−10%)) participants in each arm. In summary, this proof-of-concept study demonstrated that selective targeting of myostatin by apitegromab was well tolerated and effective in preserving lean mass when combined with tirzepatide. ClinicalTrials.gov identifier: NCT06445075 . In the phase 2 EMBRAZE study, participants receiving tirzepatide and apitegromab lost less lean mass compared to participants receiving tirzepatide and placebo.

03.
arXiv (CS.AI) 2026-06-12

WISE: A Long-Horizon Agent in Minecraft with Why-Which Reasoning

arXiv:2606.12852v1 Announce Type: new Abstract: Rapid advances have been made in developing general-purpose embodied agent in environments like Minecraft through the adoption of LLM-augmented hierarchical approaches. Despite their promise, low-level controllers often become performance bottlenecks due to repeated execution failures. We argue that a key limitation is not only the lack of episodic memory, but also the decoupling of what-where-when memory from which-why reasoning. To address this, we propose WISE (Which-Why Informed Semantic Explorer), a long-horizon agent framework with an enhanced low-level controller equipped with a Causal Event Graph that augments episodic memory with explicit causal structure linking observations to task relevance. Unlike prior work such as MrSteve, which relies on feature similarity for retrieval, WISE enables robust recall under viewpoint changes and supports opportunistic task reordering through causal reasoning. Building on this memory, we propose an Opportunistic Task Scheduler that dynamically re-prioritizes subtasks when causally relevant opportunities are detected. We further equip WISE with a multi-scale progressive exploration strategy to provide spatially comprehensive observations for downstream reasoning. Experiments show that WISE largely improves task success and efficiency on long-horizon sparse tasks, particularly in settings requiring adaptive decision-making.

04.
arXiv (CS.AI) 2026-06-17

LLM-as-Judge in Education: A Curriculum-Grounded Marking Pipeline

arXiv:2606.17507v1 Announce Type: new Abstract: Generative AI and large language models (LLMs) are increasingly applied to question generation and automated assessment. However, deploying LLMs in preparation for high-stakes exams requires more than prompt engineering; it demands software pipelines that systematically ground model outputs in authorised curriculum artefacts and marking guidelines issued by education authorities. This paper presents a curriculum-grounded, configurable LLM-as-Judge pipeline for question-level marking, co-developed with an industrial partner, to support exam preparation for university admission. The pipeline identifies the relevant topics, subtopics, and cognitive demand of a question, and assembles verifiable and authorised context to support LLM judgement. Curriculum intent is operationalised through concrete syllabus artefacts, including prescribed verbs and outcomes, performance band descriptors, glossary definitions, and marking-guideline principles. A staged LLM workflow is employed to first generate question-specific rubrics, capturing structured expectations of performance, and then derive and evaluate marking criteria used to allocate marks to student responses. This design improves consistency, transparency, and alignment with official marking practices. Preliminary evaluation shows that the proposed LLM-as-Judge pipeline delivers marking outcomes comparable to human tutors, while yielding justifications that are more traceable to authorised curriculum artefacts and marking standards. The pipeline has also been integrated into an online study platform, where early deployment data provide initial insights into operational usage and manual overrides.

05.
arXiv (CS.CV) 2026-06-16

Explainable Flood Segmentation on Sentinel-1 SAR Imagery: A Comparative Study of CNN and Transformer Architectures

Rapid and accurate flood prediction is essential for disaster response and mitigation planning. Synthetic Aperture Radar (SAR) sensors in satellites are well-suited for this purpose because they operate independently of weather and daylight conditions. Although SAR-based data enable all-weather flood monitoring, distinguishing flooded land from permanent water remains a significant challenge, particularly when flooding is defined strictly as inundated land. This study provides a comprehensive comparison of convolutional neural network (CNN) and vision transformer architectures for multi-class flood segmentation using Sentinel-1 SAR imagery, specifically trained to separate flooded land from permanent water bodies and land. Three state-of-the-art (SOTA)CNN-based models, U-Net, U-Net++, and DeepLabV3 with ResNet-34 backbone, and three SegFormer variants (b0,b1,b2) were evaluated in two benchmark datasets, the ETCI NASA dataset and SenFloods11, using scene-based data splits to ensure a realistic assessment of spatial generalization. The results demonstrate that SegFormer-b2 significantly outperforms the U-Net baseline on the ETCI dataset (higher flood IoU across all 7 test scenes in the Wilcoxon signed-rank test), while after fine-tuning on Sen1Floods11, the advantage narrows to within the range of scene variability and is concentrated in spatially fragmented flood events. The study includes both qualitative and quantitative explainability techniques to visually comprehend model decisions and systematically assess prediction reliability. Qualitative analysis reveals that SegFormer-b2 produces more spatially coherent Grad-CAM activations focused on flood-relevant features, while U-Net generates more informative uncertainty estimates along flood boundaries.

06.
arXiv (CS.CV) 2026-06-11

Image Quality Assessment of Identity Cards Using Measures from Open Face Image Quality

This paper addresses the challenge of assessing image quality in ID cards in remote verification systems by applying capture-related quality measures from the Open Face Image Quality (OFIQ) standard to ID card images. Our preprocessing pipeline includes corner detection, perspective normalization, and comprehensive foreground masking to ensure accurate and unbiased quality measure computation. We evaluate the effectiveness of these measures by analyzing their correlation with the performance of three presentation attack detection (PAD) algorithms across four diverse ID card datasets, where two datasets contain bona fide, i.e. pristine, images and two contain printed mock ID cards. Our results suggest that quality assessment based on some OFIQ measures can significantly improve PAD performance.

07.
arXiv (CS.AI) 2026-06-25

Agentic evolution of physically constrained foundation models

arXiv:2606.25532v1 Announce Type: new Abstract: Artificial intelligence increasingly drives automated scientific discovery, yet contemporary generalist agents lack physical grounding, frequently hallucinating hardware-incompatible designs. Here, we present a physically grounded, multi-agent discovery engine that autonomously architects hardware-compliant computing systems. Anchored by an Evolutionary Knowledge Graph structuring past scientific innovations, the framework extracts an "algorithmic Chain-of-Thought" to transform blind stochastic search into directed structural evolution. Applied to the extreme testbed of foundation model deployment, the engine evolved two hardware-aware compression methodologies surpassing human-engineered heuristics: Q-Enhance mitigates long-context accuracy loss in dense models, and MoE-Salient-AQ outperforms state-of-the-art manual sparse Mixture-of-Experts designs by 3.7% at sub-3-bit regimes. Utilizing a bandwidth-efficient Sensitivity Profile, we successfully deployed a massive 235-billion-parameter model onto a constrained dual-A100 server, reducing memory requirements by 75% with a marginal 0.64% accuracy degradation. By transforming unconstrained combinatorial search into knowledge-driven autonomy, this establishes a scalable hardware-software co-design paradigm for machine-driven discovery within strict physical boundaries.

08.
arXiv (CS.AI) 2026-06-17

Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion

arXiv:2603.22372v2 Announce Type: replace-cross Abstract: Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods. Code is available at: https://github.com/seunghan96/cfa.

09.
arXiv (CS.AI) 2026-06-24

Invariant Graph Representations for Continuous-Time Dynamic Graphs Under Distribution Shifts

arXiv:2405.19062v2 Announce Type: replace-cross Abstract: Continuous-Time Dynamic Graphs (CTDGs) enable fine-grained modeling of evolving relational systems. However, most existing CTDG representation learning methods are tailored to in-distribution settings and exhibit limited robustness under out-of-distribution (OOD) shifts. Although recent causal approaches learn invariant representations via interventions, they are primarily designed for static or discrete-time graphs and become computationally prohibitive for CTDGs due to the combinatorial explosion of structural and temporal variations. To address these challenges, we propose CIR, a framework grounded in a novel structural causal model termed the ICCM. To avoid exhaustive interventions, we leverage the Normalized Weighted Geometric Mean (NWGM) to efficiently approximate interventional predictions. We further instantiate ICCM within a practical deep learning architecture that jointly captures invariant structural and temporal patterns through dedicated subgraph extractors, and maintains an environment memory bank to model distributional shifts across evolving contexts. Extensive experiments demonstrate that CIR consistently outperforms existing methods under diverse OOD scenarios.

10.
arXiv (quant-ph) 2026-06-12

Non-Hermitian skin effect induced by spatial noncommutativity

arXiv:2606.12961v1 Announce Type: new Abstract: In all known schemes for the non-Hermitian skin effect, the non-Hermitian ingredient that drives the skin localization, whether asymmetric hopping or gain and loss, is invariably introduced by hand as an independent model parameter along the skin direction. Here we show that when two spatial coordinates do not commute, the skin effect can break free of this paradigm: a gain-loss potential applied along one coordinate automatically generates non-reciprocity along the other through the coordinate noncommutativity, driving all eigenstates to pile up exponentially at a boundary. We term this phenomenon the noncommutative skin effect. The inverse skin length is proportional to the noncommutativity parameter and is given by an analytic formula, exact in the thermodynamic limit and verified by exact diagonalization of lattice models; the reflection symmetry of the imaginary potential furnishes an exact criterion for the presence or absence of the effect, valid rigorously for finite-size systems. For a sinusoidal imaginary potential, the skin direction of all eigenstates flips collectively at parameter points fixed purely by geometry. Because the flip point is independent of the potential strength, the reversal constitutes a zero-crossing measurement scheme intrinsically robust against systematic errors, from which the noncommutativity parameter can be extracted directly. The qualitative transition of the eigenstates from uniform to exponentially localized renders the effect a nonperturbative probe of spatial noncommutativity, and the Peierls-phase structure of its lattice model is in principle accessible to cold-atom synthetic dimensions, photonic resonators, and topolectrical circuits.

11.
arXiv (CS.AI) 2026-06-11

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

arXiv:2606.04145v2 Announce Type: replace-cross Abstract: Cloud LLM fine-tuning platforms increasingly serve RLHF workloads, where a learned reward model is optimized as a proxy for human quality. As Gao et al. (2023) showed, this proxy diverges from world feedback (downstream eval metrics) under sustained optimization pressure, a phenomenon known as reward overoptimization. Existing platform schedulers ignore this divergence: non-clairvoyant schedulers optimize JCT without any quality signal, SLAQ-style quality-aware schedulers use training loss (a weaker proxy that drops monotonically through hacking), and classical per-job early stopping requires human monitoring and does not free shared GPUs. We propose EvalStop, a composable scheduling primitive that terminates jobs on k consecutive eval-score declines, releases GPUs, preserves the best checkpoint, and delegates to any base scheduler. We frame scheduler-level early stopping as a detection problem and evaluate it in a discrete-event simulator whose RLHF workload mixes reward-hacking and structurally healthy runs, with ground-truth labels hidden from schedulers. On RLHF-heavy workloads (80% RLHF, 64 GPUs), EvalStop achieves precision 98% / recall 99% / FPR 1.5% while improving JCT by 9% and cutting wasted compute by 22% over SRTF-Est (p

12.
medRxiv (Medicine) 2026-06-16

Doctors, Wellness Influencers, and Probiotic Gummies: A Cross-Sectional Analysis of Gut Health Claims and Financial Conflicts on TikTok

TikTok has emerged as a major source of health information, yet concerns persist regarding the accuracy of content and influence of financial conflicts. Gut health content is particularly vulnerable to misinformation. This study examined the relationship between creator profession ("medical" versus "non-medical") and the quality of gut health claims and the presence of financial conflicts on TikTok. We conducted a cross-sectional study of 412 TikTok creator accounts identified using the search terms "guthealth," "gutcleansing," and "digestion." One video per creator was analyzed. Creator profession was categorized as medical or non-medical. Health claim quality was coded as high, moderate, or poor. Financial conflicts (Showcase, Subscription, external links) were assessed. Modified Poisson regression was used to estimate prevalence ratios (PRs) of health claim quality (high versus poor- or moderate-quality) and financial conflicts between medical and non-medical creators, and negative binomial regression was used to evaluate associations between claim quality and number of video likes. Non-medical creators were more likely than medical creators to present poor- or moderate-quality health claims (adjusted PR: 2.33; 95% CI: 1.50-3.62). Most creators (92%) exhibited at least one financial conflict, and Showcase use was greater among non-medical creators (adjusted PR: 1.57; 95% CI: 1.02-2.42). Videos containing moderate- and poor-quality health claims received three times as many likes as videos containing high-quality claims. Non-medical creators disproportionately produced lower-quality gut health content on TikTok, and misleading claims received greater engagement. These findings highlight a misalignment between information quality and visibility, emphasizing the need for interventions promoting evidence-based health communication.

13.
arXiv (CS.CV) 2026-06-12

Trajectory-Level Redirection Attacks on Vision-Language-Action Models

Vision-language-action (VLA) policies bring natural language into closed-loop robot control, enabling robots to execute manipulation tasks directly from text instructions. The same interface gives text a recurring role in control because the prompt is reused at every replanning step, and each prompt-conditioned action changes the future observations on which the policy acts. Existing VLA attacks study adversarial prompts that elicit targeted low-level actions or make such actions persist across changing images. We identify a stronger trajectory-level failure mode: a prompt that still $appears$ to specify the intended task but redirects the final physical outcome. We mathematically formalize this setting as $command-preserving trajectory redirection$, a prompt-only threat model in which the attacker chooses one prompt before the episode, all policy and environment components remain fixed, and the prompt must stay close to the benign instruction while omitting target words and correction language. To find such prompts, we introduce an on-policy prompt search method that uses rollouts to discover perturbations whose closed-loop behavior tracks a target task while satisfying the command-preserving constraints. Experiments in simulation and on hardware show that near-benign prompt perturbations can redirect VLA rollouts to attacker-specified targets. These results expose a trajectory-level vulnerability in VLA instruction grounding: text that appears to preserve the intended command can still give an adversary control over the robot's final physical outcome. Project website: https://vla-redirection-attack.github.io/

14.
arXiv (CS.AI) 2026-06-19

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

arXiv:2509.25148v2 Announce Type: replace Abstract: Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77\% on \texttt{Qwen3-0.6B} and 3.75\% on \texttt{Qwen3-4B}. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at \url{https://github.com/IsFaqq/AAPA}.

15.
arXiv (CS.CV) 2026-06-16

SPARK: Spatial Policy-driven Adaptive Reinforcement learning for Knowledge distillation

Low-bit quantization enables deployment of image restoration (IR) networks on resource-constrained devices, but introduces rounding noise that disproportionately degrades high-frequency regions such as edges and fine textures. Existing knowledge distillation (KD) methods apply distillation signals uniformly across all spatial locations, overlooking the varying reconstruction difficulty across image regions. To address this, we propose SPARK (Spatial Policy-driven Adaptive Reinforcement Learning for Knowledge Distillation), a framework that adaptively allocates distillation effort using a lightweight reinforcement learning (RL) policy network. At each training step, a difficulty feature extractor computes four signals, namely Laplacian variance, pixel variance, student reconstruction error, and teacher-student knowledge gap, which are fed into a compact policy CNN that produces a stochastic spatial weight map to modulate the KD loss during quantization-aware training (QAT). SPARK is IR task-agnostic, adds no inference cost, and integrates into any existing QAT pipeline without architectural changes. Experiments on benchmark datasets demonstrate that SPARK consistently outperforms PTQ, QAT, and state-of-the-art (SOTA) KD approaches across multiple student architectures, achieving reconstruction quality closest to the full-precision teacher under significant computational constraints.

16.
arXiv (quant-ph) 2026-06-12

Toward Entanglement Bootstrap for Conformal Field Theory in Any Dimension

arXiv:2606.12540v1 Announce Type: cross Abstract: Given a quantum critical wavefunction in any dimension, we propose a reconstructed Hamiltonian, analogous to the ones previously found for 1+1d CFT and for 2+1d bosonic liquid topologically-ordered states. We test numerically that, for known regularized approximate CFT groundstates (on the icosahedron and the fuzzy sphere), (1) they are close to the groundstate of their reconstructed Hamiltonian, and (2) the spectrum of their reconstructed Hamiltonian on the unit sphere has CFT properties (integer spacing of descendants) and matches known low-lying energies. We show that this provides an automated method to improve the finite-size effects in a fixed Hilbert space.

17.
medRxiv (Medicine) 2026-06-24

Screen-Free Haptic Breathwork with HRV-Adaptive Control, Pilot Outcomes and System Design

Vayu is a mobile breathwork system comprising an iOS companion app and Apple Watch application that delivers slow, resonant breathing using screen-free haptic cues, HRV-adaptive pacing, and reflective journaling grounded in Patanjali's five states of mind. The watchOS component provides tactile phase guidance and real-time biometric sensing (heart rate, HRV), while the iOS interface supports analytics and personalized recommendations. In a 4-6-week naturalistic pilot involving 199 adults (ages 22-65) across Canada, the United States, and India, participants engaged in daily 5-10-minute sessions guided by on-wrist haptics. Average adherence was 4.1 +/- 2.3 sessions per week, with 71% of active users maintaining at least 3 sessions per week. By week four, perceived stress (PSS-10) decreased by 2.5 points, resting heart rate declined by 7.4 bpm, and HRV increased by a median of 28.6% relative to baseline, accompanied by mood improvements. No adverse events were reported. HRV metrics are derived from Apple Watch PPG-based proxies and interpreted as relative trends. These findings suggest Vayu is effective and well-tolerated, demonstrating strong engagement and early efficacy signals.

18.
arXiv (CS.AI) 2026-06-12

The Hidden Power of Scaling Factor in LoRA Optimization

arXiv:2606.12883v1 Announce Type: new Abstract: In Low-Rank Adaptation (LoRA), the scaling factor $\alpha$ is often treated as a mere complement to the learning rate, yet its role in optimization remains poorly understood. In this paper, we reveal that the scaling factor $\alpha$ and the learning rate function differently, with $\alpha$ emerging as the dominant driver of effective optimization, delivering gains that cannot be replicated by learning rate scaling alone. Through the synergy of extensive empirical analysis and a theoretical Signal-Drift framework, we uncover three findings into LoRA's scaling mechanism: First, LoRA's spectral suppression smooths the optimization landscape, rendering standard hyperparameters overly conservative and creating an optimization gap. Second, when leveraging this smoothness to accelerate convergence, $\alpha$ outperforms the learning rate by amplifying the task signal without increasing the drift ratio. Third, the optimal scaling factor follows a sublinear relationship with the rank, well characterized by a square-root law with an unexpectedly large coefficient, revealing the insufficient scaling of existing rank-tied heuristics. Based on these insights, we propose LoRA-$\alpha$, a minimalist framework that restores $\alpha$ to its principled regime, making LoRA compatible with standard small learning rates. Extensive evaluations across diverse tasks demonstrate that LoRA-$\alpha$ consistently improves performance while streamlining hyperparameter search, unleashing the learning potential of LoRA.

19.
arXiv (math.PR) 2026-06-25

On the jump of the cover time in random geometric graphs

arXiv:2501.02433v4 Announce Type: replace Abstract: In this paper we study the cover time of the simple random walk on the giant component of supercritical $d$-dimensional random geometric graphs on $\mathrm{Poi}(n)$ vertices. We show that the cover time undergoes a jump at the connectivity threshold radius $r_c$: with $r_g$ denoting the threshold for having a giant component, we show that if the radius $r$ satisfies $(1+\varepsilon)r_g \le r \le (1-\varepsilon)r_c$ for $\varepsilon > 0$ arbitrarily small, the cover time of the giant component is asymptotically almost surely $\Theta(n \log^2 n$). On the other hand, we show that for $r \ge (1+\varepsilon)r_c$, the cover time of the graph is asymptotically almost surely $\Theta(n \log n)$ (which was known for $d=2$ only for a radius larger by a constant factor). Our proofs also shed some light onto the behavior around $r_c$.

20.
arXiv (quant-ph) 2026-06-25

Anomalous topological superradiant phases

arXiv:2606.25635v1 Announce Type: new Abstract: We present a novel set of light-matter topology realized by implementing a finite-component quantum Rabi array with a photonic analog of the Su-Schrieffer-Heeger (SSH) configuration. We demonstrate how complex light-matter couplings with species-dependent phases lead to the closure of superradiance-induced band gap in a manner that differs from that in the SSH model. We uncover an topological superradiant phase transition from a normal phase to a topological superradiant electromagnet phase, which is characterized both by a local order parameter and a global topological invariant. Novel superradiance-enhanced edge states emerge with significantly amplified excitations superior to those in topological normal phase. Strikingly, tuning light-atom coupling induces novel topological superradiant electric and magnetic phases, exhibiting chiral edge-mode excitation at opposite boundaries. Our proposed setup offers a tunable platform for topological quantum optics, advancing applications in topological superradiant lasers.

21.
arXiv (CS.LG) 2026-06-17

VISTA: Scale-Aware Visual Navigation via Action History Conditioning

arXiv:2606.17294v1 Announce Type: cross Abstract: Vision Navigation Foundation Models (VNMs) promise end-to-end learned navigation policies capable of zero-shot deployment across diverse embodiments and environments. To maintain generality, many vision-based navigation models predict normalized actions. However, this normalization introduces a critical deployment vulnerability: applying different scaling factors to the same normalized trajectory alters its physical geometry, which degrades navigation performance and increases collision risks. We address this vulnerability by conditioning the model on normalized action histories alongside image observations, providing explicit context on the relationship between the model's predictions and the robot's actual physical displacement. Furthermore, current VNMs often struggle in visually repetitive environments that lack distinct features. To resolve this issue, we integrate a DINOv3 encoder, whose richer representations enable our model to capture both spatial and geometric dimensions between observations. VISTA generalizes robustly to out-of-distribution environments, achieving 100% goal prediction accuracy in zero-shot, real-world deployment in Outdoor, Forest and Office settings, and an average of 95% checkpoints crossed, demonstrating consistent path following in unseen environments.

22.
arXiv (CS.AI) 2026-06-19

Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models

arXiv:2606.19932v1 Announce Type: cross Abstract: Mamba demonstrates strong efficiency in modeling long visual sequences. However, when token reduction is applied to structurally enhanced Mamba variants, these models exhibit a severe performance collapse. We attribute this degradation to the spatially agnostic nature of existing reduction methods, which violate the two-dimensional structural premise required by the selective scanning mechanism. In this work, we propose STORM, a spatial-aware token reduction framework designed to maintain structural integrity throughout the compression process. STORM reformulates reduction into a structured operation on spatial units, enforcing localized constraints to maintain both grid topology and neighborhood coherence. As a plug-and-play module, STORM equips existing reduction pipelines with explicit spatial awareness without any training. Empirical results demonstrate that STORM achieves state-of-the-art pruning accuracy across diverse vision Mamba backbones under training-free settings. Notably, STORM delivers a substantial accuracy recovery on VMamba, outperforming prior methods by up to 63.3\% in top-1 accuracy. Meanwhile, STORM incurs only a 1.0\% accuracy drop on PlainMamba, achieving performance comparable to ViT.

23.
arXiv (CS.AI) 2026-06-15

FAConformer: Frequency-Aware Convolutional Transformer for Auditory Attention Decoding

arXiv:2606.14120v1 Announce Type: cross Abstract: Auditory attention decoding (AAD) aims to infer the attended speaker from neural responses in multi-speaker acoustic environments and is a key problem for neuro-steered hearing systems. Although recent studies have achieved encouraging progress, existing AAD models still do not fully exploit frequency domain electroencephalography (EEG) information. In particular, most approaches introduce multi-band information through handcrafted feature extraction or direct cross-band feature concatenation, which mainly exploit frequency information at a shallow level and may overlook band-specific patterns and cross-band interactions. To address these limitations, this paper proposes FAConformer, a frequency-aware CNN-Transformer framework for AAD that explicitly integrates band-specific encoding and adaptive cross-band interaction. Specifically, FAConformer first decomposes EEG signals into multiple frequency bands and assigns each band to an independent CNN-Transformer encoder for band-specific modeling. The resulting band-wise features are then adaptively fused by a carefully designed frequency-aware attention (FAA) module that models cross-band dependencies by treating band-wise features as tokens. Further, band-wise auxiliary supervision (BAS) is introduced to prevent weakly contributing branches from being under-optimized during joint training. In this way, FAConformer performs frequency-aware modeling that more effectively exploits frequency domain information. Extensive experiments on two public AAD datasets with three decision-window lengths demonstrated that FAConformer consistently outperformed 12 competitive baselines, surpassing the current state-of-the-art model by 4.9%. Further analyses of band importance, ablation, and parameter sensitivity verify the effectiveness, robustness, and interpretability of the proposed framework. Code is available at https://github.com/wzwvv/FAConformer.

24.
arXiv (CS.LG) 2026-06-12

Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory

arXiv:2508.12681v3 Announce Type: replace-cross Abstract: Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of up to 44,000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3\% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s2.

25.
arXiv (CS.CV) 2026-06-19

3D-PLOT-LLM: Part-Level Object Tokens for 3D Large Language Models

3D multimodal large language models (3D MLLMs) describe a 3D object as a whole but cannot address, name, or reason about its parts. Prior part-aware attempts add segmentation decoders, heavier 3D encoders, or bounding-box grammars at substantial parameter cost. We take a fundamentally different path: we reorganize the input token stream so that parts become directly addressable through the LLM's own vocabulary. Our model, 3D-PLOT-LLM, partitions the frozen point encoder's patches into K locally coherent regions and inserts, before each region's patch tokens, a learnable per-region marker and a reserved vocabulary token ; a Marker-Space Refinement (MSR) module then conditions each marker on its region's spatial statistics and adjacency neighbors. The model thus cites parts in its output and follows prompts that refer to parts by token, a capability absent from prior object-level 3D MLLMs. To probe this interface, we construct PartVerse-QA, a vocabulary-level part-QA benchmark adapted from PartVerse mesh annotations (77K training pairs and 588 held-out queries on disjoint object splits), on which 3D-PLOT-LLM reaches caption-to-slots Jaccard 0.459 and Exact-match 13.78%, with a slot-to-caption GPT-4o judge of 44.68. On the 3DCoMPaT-GrIn part-aware grounded description benchmark, 3D-PLOT-LLM outperforms PointLLM, Kestrel, PARIS3D, and SegPoint on every text-output metric, and ShapeLLM on 3 of 4, with up to +3.03 GPT-4o judge over PointLLM. On Objaverse whole-object captioning, adding PartVerse-QA at Stage 2 yields +0.65 SBERT and +1.85 GPT-4o over PointLLM, and tops PointLLM-PiSA on 4 of 5 traditional metrics (SBERT, SimCSE, BLEU-1, METEOR) despite targeting a different (part-grounded) objective. All with under 1M new trainable parameters on a frozen point encoder, an order of magnitude below prior part-aware 3D MLLMs, and no segmentation decoder or bounding-box head.