Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-11

A Data-Centric Framework for Detecting and Correcting Corrupted Labels

arXiv:2606.11699v1 Announce Type: new Abstract: The performance of machine learning and deep learning models largely depends on the quality of the training data. However, the quality of the real-world datasets is often compromised by noisy labels, which can substantially degrade model accuracy and reliability. To address this challenge, we propose Relabeler, an end-to-end data-centric framework for detecting and correcting corrupted labels. For corrupted label detection, Relabeler jointly leverages both local and global relationships among data instances to identify potentially noisy samples. After detecting suspicious instances, Relabeler further performs label correction by estimating the most probable clean label for each instance based on both its input features and observed noisy label. Extensive experiments across multiple datasets, noise types, and noise rates demonstrate that Relabeler consistently outperforms state-of-the-art baselines, achieving up to 58% improvement in label correction precision and 6% improvement in downstream task performance.

02.
arXiv (CS.LG) 2026-06-15

Efficient On-Device Diffusion LLM Inference with Mobile NPU

arXiv:2606.13740v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation on smartphones. Mobile neural processing units (NPUs) offer high-throughput dense matrix computation, but efficiently exploiting them remains challenging: token commitment shrinks per-block effective workloads, token revision complicates KV cache reuse, and limited NPU-visible address space incurs costly remapping and data transfer overheads. In this paper, we propose llada.cpp, the first NPU-aware inference framework for accelerating dLLMs on smartphones. llada.cpp aligns block-wise dLLM inference with the execution characteristics of mobile NPUs through three techniques. (1) Multi-Block Speculative Decoding fills the shrinking workload in late-stage current-block decoding with speculative future-block tokens. (2) Dual-Path Progressive Revision keeps committed tokens revisable until stable and refreshes unstable tokens through a CPU-side path without stalling dense NPU execution. (3) Swap-Optimized Memory Runtime compacts NPU-visible address layouts and overlaps data staging with NPU computation to reduce remapping and transfer overheads. We implement llada.cpp as an end-to-end framework and evaluate it across diverse hardware platforms and dLLM workloads. llada.cpp reduces LLaDA-8B generation latency by 17x-42x over the CPU baseline with prefix KV cache reuse, while preserving generation quality.

03.
arXiv (CS.LG) 2026-06-18

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

arXiv:2606.18467v1 Announce Type: cross Abstract: Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.

04.
arXiv (CS.AI) 2026-06-17

FinAcumen: Financial Multimodal Reasoning via Self-Evolving Experience Memory Harness

arXiv:2606.17642v1 Announce Type: new Abstract: Financial multimodal reasoning requires agents to coordinate numerical computation, retrieval, visual interpretation, and temporal grounding across heterogeneous evidence sources. Existing tool-augmented agents improve execution fidelity, yet remain largely stateless across episodes, repeatedly rediscovering reasoning strategies and failure patterns. In high-stakes financial settings, this leads to unreliable tool routing, noisy retrieval, and hallucination-prone reasoning. We present FinAcumen, a financial reasoning agent framework centered on selective experience memory for tool-augmented multimodal reasoning. FinAcumen accumulates financially grounded reasoning experience from prior trajectories, distilling successful strategies and failure-derived cautionary rules into a persistent memory bank. During inference, retrieved experiences condition reasoning only when semantic relevance exceeds a calibrated threshold, while irrelevant memory is explicitly suppressed through a fallback mechanism. A deterministic financial tool environment further grounds numerical computation, retrieval, visual decoding, and answer verification.Across four financial multimodal reasoning benchmarks, FinAcumen consistently improves a frozen 8B vision-language model over finance-specialized models and approaches leading proprietary general-purpose models. Further analysis shows that selective experience activation improves reasoning reliability under retrieval uncertainty. Our code is anonymously available at https://anonymous.4open.science/r/FinAcumen

05.
bioRxiv (Bioinfo) 2026-06-20

MIRATS framework: Normative multiscale characterization of brain regulatory systems across sex and age using multimodal MRI

作者:

Deep brain systems involved in arousal, autonomic regulation, sensory integration, and homeostatic control remain underrepresented in conventional whole-brain neuroimaging frameworks. In particular, diencephalic and brainstem nuclei are often insufficiently represented in cortex-centered analyses, limiting the normative references needed to interpret systems-level variation in health and disease. To address this gap, we developed a unified multiscale framework with explicit representation of deep nuclei. By integrating cerebral, cerebellar, diencephalic, and brainstem atlases in standard space, we constructed a 220-region whole-brain parcellation and extracted complementary features at three analytical scales: nodal properties, edge-wise connectivity, and persistent-homology-based topological descriptors. We applied this framework to healthy adults from the Human Connectome Project-Aging cohort to characterize normative multiscale organization and test sex- and age-related variation. Applied to this cohort, our framework revealed pronounced heterogeneity across anatomical systems. Brainstem and diencephalic nuclei showed multiscale feature profiles distinct from those of cerebral and cerebellar regions across nodal, edge-wise, and higher-order topological scales. Sex comparisons identified selective differences across different scales, whereas age modeling revealed widespread but feature- and system-dependent variation across adulthood. Together, these findings show that normative whole-brain organization in this deep-system-aware space is structured by system-specific rather than globally uniform patterns. These findings establish a normative multiscale framework for characterizing brainstem-diencephalic-cerebellar-cerebral organization in healthy adults and provide a quantitative reference for future translational studies of disease-related abnormalities in deep regulatory systems.

06.
arXiv (CS.CV) 2026-06-16

AURA: Active-Response Attribution under Treatment Ambiguity in Bacterial Cytological Profiling

When a bacterial sample is exposed to several antibiotics, not every applied drug necessarily acts: if the organism is resistant to one of them, that drug leaves no morphological trace. The clinically meaningful quantity is therefore not which antibiotics were applied, but which ones were active. We show that these two are sharply decoupled in real E. coli microscopy - naively assuming the applied combination equals the active one is correct only about 37% of the time - yet existing computational tools are ill-suited to recovering the active set. Forward perturbation models such as scGen, CPA, and IMPA are designed to predict appearance from treatment, not the reverse, and inverting them degrades sharply; discriminative image classifiers tend to memorise strain- and batch-specific texture and fail to transfer across experimental replicates. We introduce AURA, which reframes the task as constrained, energy-based inverse attribution. Its central inductive bias is that the active set must be a subset of the applied set; this collapses the candidate space and lets AURA infer the active subset of applied antibiotics by decomposing residual morphology into antibiotic response atoms and selecting the subset with the lowest reconstruction energy, using no strain label at test time. AURA-E adds evidence-aware abstention, withholding a prediction when candidate explanations remain near-equally plausible. On cross-replicate transfer in an E. coli cytological profiling dataset, AURA recovers the active antibiotic combination with 95.47% exact-match accuracy.

07.
arXiv (CS.AI) 2026-06-12

A Mathematical Theory of Value: a synthesis on goal-directed agency under resource constraints

作者:

arXiv:2606.12502v1 Announce Type: cross Abstract: We propose that value – the quantity goal-directed agents create, destroy, and exchange – is a lawful structural quantity in the same category as information. Following Shannon's method, we make one ruthless abstraction: value is the rate at which an agent converts a resource into goal-progress, relative to a frame fixed by its goal. A scale-invariance axiom forces a logarithmic measure, $V=\sum_i k_i \ln e_i$; compounding of a reinvested resource forces the same form via the ergodicity argument of Peters (2019). The two routes are kin rather than independent; their agreement is a consistency check, not an over-determination. We derive a coding theorem of value: $\Delta G \le I(X;Y)$, achieved by Bayes-proportional allocation; realized value decomposes as $G=D(q\|r)-D(q\|p)$, identifying misalignment with measurable waste. For populations, value is frame-relative while price is frame-independent; a fleet that pools its resource and fuses its perception inherits the ceiling $G_{\mathrm{fleet}} \le I(X;Y_{1:m}) \le H(X)$ (a corollary; an earlier sum-form claim was wrong and is corrected in v5). A dynamical layer yields an is/ought asymmetry from which alignment emerges as a control-stability condition with a closed-form residual. We test the single-frame laws on live language models in a pre-registered scale-up: perception mutual information tracks realized capability rather than parameter count (Spearman $\rho = 0.977$ pooled over 30 model$\times$domain points), out-of-sample $\Delta G$ tracks $I(X;Y)$, and over-confidence is measurable dissipation; a further pre-registered test shows the bridge is shape-invariant across four task shapes ($n=42$, slope 0.953). None of the mechanisms is individually new – generalized Kelly, Armstrong & Mindermann (2018), classical control; the contribution is their unification and the governance mapping (incentive design over oversight) that follows.

08.
arXiv (CS.AI) 2026-06-16

Mojo: A Promising Tool for Scalable Financial AI Efficiency

作者:

arXiv:2606.16059v1 Announce Type: cross Abstract: For thirty years, quantitative finance has paid a costly two-language tax: models researched in Python are rewritten in C++ for production, often introducing numerical discrepancies. GPU-accelerated deep learning exacerbates this problem, as nondeterministic floating-point reductions can produce drift in long backtests, challenging regulatory reproducibility and auditability expectations. This article surveys Mojo, Modular's 2026 Python-like systems language, as a structural response for capital markets engineering. While closing the Python-to-C++ performance gap, Mojo uniquely combines native interoperability with the low-level systems control required to construct bit-exact deterministic kernels. Its MLIR compilation infrastructure further allows a single codebase to target scalar, SIMD, multicore, and GPU execution, reducing the translation bottleneck between research and production. We benchmark four core financial AI workloads: Monte Carlo option pricing, LLM sentiment inference, multi-asset backtesting, and portfolio Value at Risk. On Apple Silicon, Mojo demonstrates 20x to 180x speedups over pure Python on directly measured kernels; larger-scale GPU workload results are projections calibrated from published benchmarks. Alongside transparent performance data, we introduce mojo-deterministic, an open-source library of reproducible reduction kernels, and provide a candid assessment of the problems Mojo does and does not yet solve.

09.
arXiv (CS.LG) 2026-06-15

Multidimensional Bayesian Active Machine Learning of Working Memory Task Performance

arXiv:2510.00375v2 Announce Type: replace Abstract: While adaptive experimental design has outgrown one-dimensional, staircase-based adaptations, most cognitive experiments still control a single factor and summarize performance with a scalar. We show a validation of a Bayesian, two-axis, active-classification approach, carried out in an immersive virtual testing environment for a 5-by-5 working-memory reconstruction task. Two variables are controlled: spatial load L (number of occupied tiles) and feature-binding load K (number of distinct colors) of items. Stimulus acquisition is guided by posterior uncertainty of a nonparametric Gaussian Process (GP) probabilistic classifier, which outputs a surface over (L, K) rather than a single threshold or max span value. In a young adult population, we compare GP-driven Adaptive Mode (AM) with a traditional adaptive staircase Classic Mode (CM), which varies L only at K = 3. Parity between the methods is achieved for this cohort, with an intraclass coefficient of 0.755 at K = 3. Additionally, AM reveals individual differences in interactions between spatial load and feature binding. AM estimates converge more quickly than other sampling strategies, demonstrating that only about 30 samples are required for accurate fitting of the full model.

10.
arXiv (CS.CV) 2026-06-16

GRACE: Boosting Video MLLMs with Grounded Action-Centric Evidence for Viewer Sentiment Prediction

Viewer sentiment prediction in video advertisements aims to infer the latent affective response evoked in the audience. To bridge the gap between what is shown and what is felt, models must deduce hidden viewer emotions from explicit visual narratives, concrete character-object interactions, and visible textual cues. However, standard Multimodal Large Language Models (MLLMs) typically rely on holistic frame representations, which leave these fine-grained, affect-relevant events implicit and complicate precise emotional reasoning. To address this, we propose a grounded action-centric evidence augmentation framework that enhances video MLLMs' clue extraction and comprehension by introducing explicit event structure and localized visual evidence. Our method extracts temporally ordered subject-verb-object (SVO) triplets and auxiliary visible textual cues from action-centric video descriptions, grounds subject and object entities as visual entity crops, and then enables the MLLM to perform clue-enhanced emotional reasoning based on these extracted structured clues. In this way, action triplets specify "what happens", while grounded visual entity crops anchor "who or what participates in each event" to concrete visual evidence. Experiments on the Pitts dataset show consistent improvements over Qwen2.5-VL and Qwen3-VL baselines. Ablation studies, cross-dataset evaluation on AdsQA, and transfer experiments on an emotion-focused TVQA subset further support the effectiveness and generalization of our approach.

11.
arXiv (CS.LG) 2026-06-19

The Representational Limit of Scalar Interactions: An Interventional Decomposition

arXiv:2606.19410v1 Announce Type: cross Abstract: Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence. Across tabular SCMs, Stochastic Hi-Fi recovers structure missed by scalar baselines (up to 411x larger interaction-magnitude recovery ratios). It also separates redundant and synergistic heads in the GPT-2 IOI circuit. On NIH ChestX-ray14, Stochastic Hi-Fi matches GradCAM on Pointing Game and improves substantially on Deletion AUC.

12.
arXiv (CS.LG) 2026-06-11

Modelling magnetic material properties with uncertainty-aware neural networks

arXiv:2606.11870v1 Announce Type: cross Abstract: Machine learning is increasingly applied to accelerate the discovery of novel materials by exploring large compositional and structural design spaces. Yet, the scarcity of high-quality data and the frequent need for out-of-distribution prediction introduce substantial uncertainty, making the assessment of model reliability essential. In this work, we investigate uncertainty quantification as a means to evaluate model confidence in the context of permanent magnet research. In a first study, we benchmark classical and modern machine learning models for predicting intrinsic magnetic properties, focusing on the quality of their uncertainty estimates. We apply Gaussian negative log-likelihood loss and dropout-based Bayesian approximation as practical strategies for estimating predictive uncertainty. In a second study, we transfer these architectural features for uncertainty estimation to a more complex task: predicting coercivity from microstructural information using a graph neural network. Together, these studies demonstrate that uncertainty quantification not only enhances the trustworthiness of predictions but is also transferable across different modeling tasks.

13.
arXiv (CS.AI) 2026-06-17

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

arXiv:2606.17266v1 Announce Type: new Abstract: Production planning increasingly has to treat workforce capability as a decision variable: certifications lapse when skills are not maintained, new products require skills the current workforce does not hold, and reskilling competes for the same worker hours needed for production. Existing operations benchmarks usually treat labor as exogenous, while workforce-planning models with skills and learning are rarely released as reusable testbeds. We introduce SkillChain-Gym, a benchmark specification for reskilling-aware production-inventory control: a single-site environment with stylized worker skill-state dynamics, hard threshold certification, forgetting, and capacity-consuming training actions constrained by the same per-worker time budget as production. The benchmark includes seed-controlled disruption scenarios, three feasibility modes with projection diagnostics, deterministic replay, and metrics covering operations, resilience, capability growth, and training-access distribution. We evaluate production-only, reactive adaptive, water-filling adaptive, and static-insurance policies with budget variants over 60-shift horizons with paired statistical tests. The results are regime-dependent rather than a ranking. Training-capable policies dominate the production-only baseline, and maintenance training is necessary under forgetting even without disruptions. Among training-capable classes, adaptive training helps when bottlenecks are visible in the forecast, while a lean static cross-training plan, a deliberately favorable comparator whose structure encodes relevant skill contingencies, acts as strong insurance under surprise shocks and absenteeism. Capacity slack and the forgetting rate govern the boundary between these regimes. No policy class dominates across regimes, motivating forecast-driven controllers that decide when to buy skill insurance and when to react.

14.
arXiv (CS.LG) 2026-06-16

Elastic ODYN: Differentiable Optimization for Infeasible Control and Learning in Robotics

arXiv:2606.16564v1 Announce Type: cross Abstract: Robotic systems routinely encounter conflicting objectives, modeling errors, and degenerate contact conditions that render quadratic programs (QPs) infeasible. Yet most optimization solvers and differentiable QP layers assume feasibility, leading to numerical failures, unstable gradients, or solver breakdown when constraints cannot be simultaneously satisfied. We present Elastic ODYN, a primal–dual non-interior-point QP solver that handles infeasibility through smooth squared-$\ell_2$ elastic relaxations. The resulting formulation remains well posed under ill-conditioning and degeneracy, supports warm starting, and converges to closest-to-feasible solutions when no feasible point exists. A lightweight refinement stage recovers physically meaningful dual variables from the elastic solution. Building on this framework, we develop Elastic OdynLayer, a differentiable QP layer with stable gradients under infeasibility, and Elastic OdynSQP, an infeasibility-aware SQP method that resolves inconsistent subproblems and intrinsically infeasible optimal control tasks through selective constraint relaxation. We evaluate the framework on benchmark QPs, singular contact mechanics, differentiable parameter identification, and quadrupedal and humanoid trajectory optimization. Across all settings, Elastic ODYN consistently outperforms state-of-the-art elastic QP solvers in robustness, warm-start performance, and convergence reliability, enabling optimization, simulation, control, and learning beyond the feasibility assumptions of existing methods.

16.
arXiv (CS.CV) 2026-06-15

Rendering-Aware Sparse Sampling for BRDF Acquisition

Accurate BRDF acquisition is essential for realistic rendering, but dense gonioreflectometer measurements are slow and expensive. We study how to select a small set of BRDF measurements that is most informative for reconstructing material appearance under a learned BRDF prior. Existing sparse-acquisition methods often optimize samples for BRDF-space reconstruction for all materials, while the perceptual importance of a adaptive measurement ultimately depends on its effect on each rendered appearance. We therefore formulate sparse adaptive acquisition as a rendering-aware optimization problem. Our method combines a set encoder for sparse coordinate–value observations, a pretrained hypernetwork-based/PCA-based BRDF reconstructor, and a differentiable renderer. During sampler training, the reconstructor remains fixed, and gradients from a rendered-image loss optimize the measurement locations. This separates acquisition design from prior fitting and encourages the sampler to choose directions that are informative under the learned material distribution. To make the comparison controlled, we evaluate the uniform baseline, meta-learning method, HyperBRDF method, and our learned sampler under matched sample numbers, train/test split, rendering scene, object mask, image mapping, and metrics. Our central claim: rendering-aware sampling improves extremely sparse BRDF acquisition when final rendered appearance is the target. BRDF-space and combined losses are reported only as ablations, together with joint refinement and image-only latent fitting for unseen materials.

17.
arXiv (math.PR) 2026-06-18

Second-Order Approximation of Limit Order Books in a Single-Scale Regime

arXiv:2308.00805v3 Announce Type: replace-cross Abstract: We establish a first- and second-order approximation for an infinite dimensional limit order book model in a single (critical) scaling regime where market and limit orders arrive at a common time scale. With our choice of scaling we obtain non-degenerate first- and second-order approximations for the price and volume dynamics. While the first-order approximation is given by a coupled ODE-PDE system, the second-order approximation is described in terms of an infinite-dimensional stochastic evolution equation driven by a cylindrical Brownian motion. The driving noise processes exhibit a non-trivial correlation in terms of the model parameters. We prove that the evolution equation has a unique solution and that the sequence of standardized limit order book models converges weakly to the solution of the evolution equation. The proof uses a non-standard martingale problem. We calibrate a linearized model to market data and explain how our model can be used for deriving confidence intervals of portfolio liquidation values.

18.
arXiv (CS.CL) 2026-06-16

MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

Islamic inheritance law is challenging for large language models because solving inheritance cases requires complex, structured, multi-step reasoning and the correct application of juristic rules to compute heirs' shares. We introduce MAWARITH, a large-scale annotated dataset of 12,500 Arabic inheritance cases for training and evaluating models on the full reasoning chain: (i) identifying eligible heirs, (ii) applying blocking (\d{hajb}) and allocation rules, and (iii) computing exact inheritance shares. To the best of our knowledge, MAWARITH is the first Arabic corpus and benchmark designed for end-to-end Islamic inheritance reasoning. Unlike prior datasets that restrict inheritance case solving to multiple-choice questions, MAWARITH supports the full reasoning chain and provides step-by-step solutions with justifications grounded in classical juristic sources and established inheritance rules, as well as exact share calculations. This enables models to learn how to generate detailed, step-by-step responses to user queries that reflect real-world Islamic inheritance cases. To evaluate models beyond final-answer accuracy, we propose MIR-E (Mawarith Inheritance Reasoning Evaluation), a weighted multi-stage metric that scores key reasoning stages and captures error propagation across the pipeline. We evaluate six large language models in a zero-shot setting. A commercial model achieves about 90\%, whereas all evaluated open-source models remain below 50\%. Our error analysis identifies recurring failure patterns, including scenario misinterpretation, errors in heir identification, errors in share allocation, and missing or incorrect application of key inheritance rules such as \textquotesingle awl and radd. The MAWARITH dataset is publicly available at https://gitlab.com/nlpresearcher/mawarith.

19.
arXiv (CS.LG) 2026-06-17

Towards Fast GNN Surrogates for CO2 Migration in Complex Geological Formations

arXiv:2606.17180v1 Announce Type: new Abstract: This chapter discusses how a data-driven machine learning approach can reproduce key aspects of the physical behavior of multiphase flows in complex geological formations. We propose an end-to-end graph neural surrogate tailored to CO$_2$ plume migration forecasting in geological storage. The method is evaluated on the SPE11A benchmark, a well-known industry test case designed to assess CO$_2$ storage scenarios and characterized by sharp gas-water interfaces, strong advective transport, and rapid convective mixing with fingering development. The benchmark is reformulated as a graph in which nodes represent computational cells and edges encode transmissibility-based interactions enriched with geometric attributes. Directional transport arising from grid geometry, permeability contrasts, and geological heterogeneity is captured through an anisotropic message-passing mechanism, where interaction weights are computed via geometry-conditioned edge embeddings, biasing message aggregation toward physically relevant transport directions. Temporal evolution is modeled in latent space using an autoregressive residual formulation trained with multi-step supervision. The proposed model produces competitive forecasts of gas saturation and liquid-phase density, which are key indicators for CO$_2$ storage monitoring, with cumulative errors that remain moderate over extended forecasting horizons.

20.
arXiv (CS.CV) 2026-06-16

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Pixel-space diffusion models are trained on full-bandwidth noisy images, yet the useful signal available to the denoiser is strongly frequency dependent. Under rectified-flow diffusion and natural-image power-law spectra, the per-band data-to-noise contour $k^{*}(t) = (1-t)^{-2/\alpha}$ separates a signal-bearing low-frequency region from a noise-dominated high-frequency region at each time $t$. We show that this implicit coarse-to-fine structure is not merely descriptive: it induces a capacity-allocation problem. A standard pixel-space denoiser must discover the moving bandwidth boundary internally and can spend computation on frequency-time regions where the optimal prediction collapses to deterministic baselines rather than data-distribution modeling. To make this boundary explicit, we introduce Spectral Forcing, a parameter-free, time-conditional 2D-DCT low-pass operator applied to the noisy input before the patch embedder. Its cutoff expands monotonically with the diffusion time and becomes the identity at the data endpoint. Through controlled synthetic experiments, we identify the regime in which the operator is beneficial: coarse patch tokenization and data whose high-frequency content is predominantly noise rather than essential signal. On ImageNet-256 with JiT-700M/32, Spectral Forcing consistently improves both FID and Inception Score across different training epochs, demonstrating robust gains throughout training; at finer tokenization, the spectral forcing is still competitive. We further insert the unchanged operator into SenseNova-U1, a unified text-to-image model, where it improves DPG-Bench and GenEval, showing that the input-side spectral prior transfers beyond class-conditional generation. These results suggest a route to capacity-efficient pixel-space diffusion by showing the signal and hiding the noise.

21.
arXiv (CS.LG) 2026-06-17

Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

arXiv:2606.17414v1 Announce Type: new Abstract: Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) provide a control method for nonlinear systems with actuation constraints that construct a forward-invariant safe set. Previous work has shown that learning class-$\mathcal{K}$ functions defining the ICCBF recursion via meta reinforcement learning (meta-RL) yields a robust, non-greedy approach to safety-critical control in RPO. This paper extends that framework further by investigating the performance of three recurrent network architectures (Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU), Selective State Space Model (Mamba)) and two training algorithms (Proximal Policy Optimization (PPO) and Soft Actor Critic (SAC)) to identify the best setup for tuning ICCBF class-K functions via meta-RL. In addition to cooperative test cases, performance is evaluated in the presence of adversarial behavior where the target spacecraft behaves in a way that worsens the safety of the chaser spacecraft. Results indicate that state space models such as Mamba when used with PPO achieve superior task completion, safety, and fuel-savings compared to other architectures, across all cooperative and uncooperative scenarios tested.

22.
arXiv (CS.CV) 2026-06-15

MUSE: Agentic 3D Scene Authoring via Memory-Grounded Incremental Requirement Satisfaction

Text-driven 3D scene generation is a promising technique for digital content creation, embodied AI simulation, and interactive design, yet practical workflows often require refining, extending, or correcting existing scenes while preserving non-target content. Existing methods can produce realistic and structurally plausible scenes, but they generally lack editability with requirement-level state tracking, so part-level failures often lead to full-scene regeneration or manual intervention. To tackle this challenge, we formulate controllable 3D scene authoring as incremental requirement satisfaction, unifying construction and editing. In this paper, we present MUSE, a memory-grounded multi-agent framework in which an Architect compiles instructions into structured requirements, a Sculptor executes local scene operations, and an Inspector verifies each step while updating Working, Scene, and Skill Memory. To evaluate requirement-level controllability and preservation-aware editing, we introduce AuthorBench, offering 145 constrained construction cases and a 1,584-case preservation-aware editing pool paired with external structured checks. On full construction cases, MUSE improves All-Goal success from 37.9 to 80.7 and surface-constraint fulfillment from 35.0 to 92.6 over the strongest baseline. On a stratified 240-case editing test split, MUSE achieves 49.6 All-Goal success, 99.9 preservation rate, and only 0.6 unintended change rate. Beyond automated metrics, human evaluations on compared local-editing baselines support stronger alignment with user intent, and downstream navigation-proxy tests indicate stronger spatial stability. Combined with ablations validating our memory designs, these results establish MUSE as an effective framework for controllable 3D scene authoring.

23.
arXiv (CS.LG) 2026-06-16

Semi-Supervised Speech Confidence Detection using Pseudo-Labelling and Whisper Embeddings

arXiv:2606.16505v1 Announce Type: cross Abstract: Understanding speaker confidence is crucial in educational settings, as it can enhance personalised feedback and improve learning outcomes. This study introduces a novel framework for detecting speaker confidence by integrating human-engineered features with embeddings from the Whisper encoder. To address data limitations, a pseudo-labelling technique is employed to expand the labelled dataset, allowing the model to learn from both human-annotated and model-generated labels. The framework combines traditional speech features including pitch, volume, rate of speech, and the presence of disfluencies and stress, with Whisper embeddings, and uses a co-attention mechanism to fuse these representations and achieve an overall accuracy of 75%. This study contributes to advancing speech analysis, enabling applications that support personalised learning and speaking skill development.

24.
arXiv (CS.AI) 2026-06-19

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

arXiv:2606.20031v1 Announce Type: cross Abstract: Dynamic environmental changes, confined workspaces, and stringent real-time constraints make pathfinding in Robotic Mobile Fulfillment Systems (RMFS) a challenging problem for conventional search- and rule-based methods, which typically suffer from high computational complexity and long decision latency. While reinforcement learning (RL) has emerged as a powerful alternative, deploying learned policies with extreme energy efficiency on resource-constrained hardware remains an open challenge. We present SDQN-RMFS, an end-to-end framework that achieves high-fidelity deployment of an RL-trained policy from a full-precision artificial neural network (ANN) through to a neuromorphic chip. By computing only when triggered by sparse events, this framework unlocks ultra-low-power RMFS pathfinding. Our full-stack pipeline operates as follows: an ANN policy is first efficiently trained via a collision-allowing strategy to densify informative trajectories, and then converted into a spiking neural network (SNN) via a hard-label knowledge distillation approach. This effectively addresses the output distribution mismatch, preserving policy capability across the ANN-to-SNN pipeline while substantially reducing inference latency. Hardware experiments demonstrate up to 11,281$\times$ energy savings and a nearly two-fold reduction in latency compared to a high-performance GPU baseline, while maintaining decision quality on par with the original trained policy. These results establish physical neuromorphic inference as a practical and energy-sustainable pathway for large-scale RMFS operations.

25.
arXiv (CS.AI) 2026-06-24

Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs

arXiv:2605.03940v4 Announce Type: cross Abstract: We describe a dynamical system in which a symbolic field is coupled to a geometric field via a bipartite Hilbert-Schmidt kernel. The system is fully described by a retarded functional differential equation (RFDE) on the history space, subject to Lipschitz and small gain conditions. We show that the RFDE is well-posed under constant input and that it admits a compact global attractor. The principal subsystem $(H_L, X_R, P)$, which is comprised of the two primary fields as well as an executive field, is shown to be globally stable independent of delay, provided that the interfield coupling satisfies $C_{\mathcal{K}}^2