Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-11

When Do Data-Driven Systems Exhibit the Capability to Infer?

arXiv:2606.11769v1 Announce Type: new Abstract: The European AI Act is the first comprehensive regulation of artificial intelligence (AI), setting out extensive obligations, particularly for so-called high-risk and general-purpose AI systems. A key distinguishing feature of AI systems under the AI Act is the capability to infer. Since the AI Act does not clearly define what inference is, there is a gray area for certain data-driven systems. A specific example is credit scoring systems, which are listed by Annex III of the AI Act. At the same time, however, these are often implemented using statistical models for which it is unclear whether they have the capability to infer and thus fall under the AI definition of the AI Act at all. Motivated by statistical learning theory, this work develops a framework for grading different levels of the capability to infer. Based on the AI Act and the Commission Guidelines on the definition of an artificial intelligence system, we analyze which levels constitute sufficient capability to infer within the meaning of the AI Act and where further regulatory clarity is needed. We illustrate the framework by creating two realistic credit scoring workflows and show whether and where inference occurs in them. Our analysis illustrates that not only individual models but the entire data processing workflow must be considered. It also shows that the involvement of human experts during development can have significant influence on the capability to infer. Code can be found at https://github.com/fraunhofer-iais/inference-framework-creditscorecards.

02.
arXiv (CS.CL) 2026-06-19

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome supervision provides weak learning signals and leaves latent trajectories prone to semantic drift. In this work, we analyze Latent CoT from an information-theoretic perspective and identify this failure as a dual collapse: gradient attenuation along the optimization path and representational drift in the latent space. We further decompose process supervision into two complementary dimensions: Trajectory Supervision, which injects dense stepwise reasoning signals, and Space Supervision, which preserves the semantic structure of the latent manifold. Our analysis shows that rigid geometric compression can collapse the reasoning space, whereas generative reconstruction provides a more flexible semantic anchor that better preserves information capacity. To measure these effects, we introduce the Unified Latent Probe (ULP), which quantifies the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal a clear Information-Performance Binding: reasoning accuracy depends on the information fidelity preserved in the latent chain. These findings provide a principled framework for latent reasoning supervision and suggest shifting from geometric imitation toward mutual information maximization. Our code is available at \href{https://github.com/EIT-NLP/Supervision-in-Latent-CoT}{this repository}.

03.
arXiv (CS.AI) 2026-06-15

An integrated interpretable control effectiveness learning and nonlinear control allocation methodology for overactuated aircrafts

arXiv:2606.13794v1 Announce Type: cross Abstract: Nonlinear dynamics and the strong couplings that arise between multiple effectors undermine the assumptions behind conventional, linear control allocation techniques. When flight enters regimes where nonlinear effects dominate, linear allocators exhibit reduced accuracy due to increased model mismatch, which subsequently degrades performance and robustness of the flight control system. High fidelity onboard models and black box data driven approaches can recover accuracy across the flight envelope, but respectively impose computational burdens prohibitive for real time allocation and sacrifice the interpretability required for verification and fault diagnosis. This paper addresses these limitations by learning an explicit, physics constrained analytical model of the control effectiveness mapping from representative flight data using Sparse Identification of Nonlinear Dynamics. The resulting mapping is compact, interpretable, and admits analytical derivatives, enabling efficient computation within nonlinear solvers that additionally incorporate actuator dynamics, without requiring an onboard model. An online adaptation mechanism monitors prediction residuals and refreshes the model when significant plant changes are detected, providing graceful reconfiguration under actuator failures and varying operating conditions. The methodology is evaluated on a high fidelity nonlinear benchmark aircraft across a range of aggressive maneuvers, achieving accuracy comparable to a full nonlinear onboard model while substantially reducing computational cost relative to established baselines.

04.
arXiv (math.PR) 2026-06-11

Exact Fourier dimensions of dyadic Mandelbrot cascades on curves of nonvanishing curvature under minimal integrability

arXiv:2606.11758v1 Announce Type: new Abstract: We prove an exact Fourier-dimension formula for scalar dyadic Mandelbrot cascades pushed forward to fixed C^2 Jordan curves with nonvanishing curvature. Let W be in the minimal Kahane-Peyriere regime, let the scalar dyadic cascade live on T = R/Z, and let gamma map T to R^2 be a fixed C^2 Jordan curve with nonvanishing curvature, parametrized at constant speed. For the push-forward measure mu_gamma, we prove that, almost surely on non-extinction, its Fourier dimension is A_loc(W), the usual local exponent obtained by optimizing over q>1 from the moment expression involving E[W^q]. The upper bound follows from the scalar circle local-dimension theorem, bi-Lipschitz transfer to the fixed curve, and a deterministic curved-support obstruction for Fourier dimension. The lower bound follows from a fixed-curve finite-r annular theorem, which gives summable annular Fourier decay under a single finite moment witness. The main analytic input is a deterministic phase-geometry package for fixed nondegenerate C^2 curves: stationary tubes, derivative bands, and phase-bin coefficient estimates replacing the explicit trigonometric structure available on the unit circle.

05.
arXiv (CS.LG) 2026-06-19

The Hidden Cost of Approximation in Online Mirror Descent

arXiv:2511.22283v2 Announce Type: replace Abstract: Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which, oftentimes, can be solved only approximately, leading to an inexact version of the algorithm. Nonetheless, existing OMD analyses typically assume an idealized error free setting, thereby limiting our understanding of performance guarantees that should be expected in practice. In this work we initiate a systematic study into inexact OMD, and uncover an intricate relation between regularizer smoothness and robustness to approximation errors. When the regularizer is uniformly smooth, we establish a tight bound on the excess regret due to errors. Then, for barrier regularizers over the simplex and its subsets, we identify a sharp separation: negative entropy requires exponentially small errors to avoid linear regret, whereas log-barrier and Tsallis regularizers remain robust even when the errors are only polynomial. Finally, we show that when the losses are stochastic and the domain is the simplex, negative entropy regains robustness-but this property does not extend to all subsets, where exponentially small errors are again necessary to avoid suboptimal regret.

06.
arXiv (CS.CV) 2026-06-19

Gaussian Process Prior Variational Autoencoder for Endoscopic Videos

Endoscopic video analysis is essential for gastrointestinal diagnosis and computer-assisted interventions, but video sequences are routinely degraded by specular reflections, motion artifacts, and missing frames. These transient corruptions can distract clinicians, reduce image interpretability, and disrupt downstream tasks such as 3D reconstruction and navigation. Effective restoration therefore requires methods that exploit temporal continuity rather than treating frames in isolation. We introduce a Gaussian Process Prior Variational Autoencoder (GPVAE) framework for endoscopic video restoration that replaces the standard factorized latent prior with a temporal Gaussian process prior, enabling interpolation of missing frames with uncertainty-aware reconstruction. The framework combines endoscopy-specific encoders, including a convolutional EndoVAE backbone and pretrained Vision Transformer encoders from GastroNet-5M, with two scalable GP approximations: Hierarchical Prior Approximation (HPA) and Sparse Precision Approximation (SPA). Specular reflections are handled using a DUCKNet-based masking pipeline that excludes corrupted pixels from the reconstruction objective. On the C3VDv2 colonoscopy dataset, the best GPVAE variants reduced image reconstruction RMSE by 21.9\% on average, and by up to 26.1\%, relative to matched VAE baselines. Downstream trajectory RMSE was reduced by 12.7\% on average across classical visual odometry and a pretrained PoseNet, at an average increase of 27.3\% in training time per epoch. Finally, the GP posterior provides per-frame uncertainty estimates that reflect temporal support and offer a confidence signal for restored frames.

07.
arXiv (CS.CL) 2026-06-16

ArFake: A Robust Framework for Multi-Dialect Arabic Speech Spoofing Detection Benchmark

With the rise of generative text-to-speech models, distinguishing between real and synthetic speech has become challenging, especially for Arabic that have received limited research attention. Most spoof detection efforts have focused on English, leaving a significant gap for Arabic and its many dialects. In this work, we introduce the first multi-dialect Arabic spoofed speech dataset. To evaluate the difficulty of the synthesized audio from each model and determine which produces the most challenging samples, we aimed to guide the construction of our final dataset either by merging audios from multiple models or by selecting the best-performing model, we conducted an evaluation pipeline that included training classifiers using two approaches: modern embedding-based methods combined with classifier heads; classical machine learning algorithms applied to MFCC features; and the RawNet2 architecture. The pipeline further incorporated the calculation of Mean Opinion Score based on human ratings, as well as processing both original and synthesized datasets through an Automatic Speech Recognition model to measure the Word Error Rate. Our results demonstrate that FishSpeech outperforms other TTS models in Arabic voice cloning on the Casablanca corpus, producing more realistic and challenging synthetic speech samples. However, relying on a single TTS for dataset creation may limit generalizability.

08.
arXiv (CS.CV) 2026-06-16

X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining

Modern Vision-Language-Action (VLA) models must bridge pretrained vision-language reasoning and precise continuous robot control. Existing action tokenizers discretize actions primarily for reconstruction, producing codes that preserve motion geometry but provide only weak semantic supervision to the backbone. We therefore formulate action tokenization not as mere compression, but as semantic interface learning between multimodal reasoning and executable control. To this end, we introduce X-Tokenizer, a lightweight encoder-Semantic Residual Quantization (SRQ)-decoder architecture that provides a shared action interface across diverse robotic arm embodiments. Its key component, SRQ, imposes an asymmetric structure on residual vector quantization: the first level is trained with Masked Action Modeling (MAM) to form a discrete action language that captures coarse motion intent, while deeper levels remain reconstruction-oriented residuals that preserve fine-grained details. To further align action tokens with multimodal semantics, X-Tokenizer is pretrained with contrastive alignment to the representation space of a pretrained foundation model and with next-frame vision-language feature prediction. Pretrained on 2.4M trajectories (2.0B action frames), a single frozen X-Tokenizer plugs into a mixed discrete-continuous VLA as a representation-shaping supervision signal. X-Tokenizer achieves top real-world aggregate and strong RoboTwin 2.0 simulation results. Outperforming FAST in multimodal grounding (+13.5%) and long-horizon tasks (+8.25), it shows that action tokenizers serve as semantic interfaces for VLA pretraining beyond mere action compression.

09.
arXiv (CS.AI) 2026-06-12

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

arXiv:2606.12629v1 Announce Type: cross Abstract: We show that the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs and confidence via their magnitudes, functioning as independent binary registers. We validate this Bag of Dims framework across three model families (Qwen 3.5-4B, Gemma 3-4B, Mistral 7B) through four progressive experiments. Sign patterns alone carry predictive content: replacing all magnitudes with unity achieves 72-93% top-5 next-token accuracy through the LM head, and pure Hamming scoring without any decoder reaches 80-90% top-4096. These sign patterns organize into semantic features: using a single-token type cache (one forward pass per vocabulary token, no context), we discover 175 categories via per-dimension sign consistency (mean AUC 0.80) from 50 anchors with zero training. A trained probe adds only +0.018 AUC and converges to axis-aligned weights, confirming negligible cross-dimension structure. This structure extends to attention: all 175 categories remain discoverable in K and V projections. On the write side, static FFN weight inspection links 20% of features to individual writer neurons (>0.70 agreement; random controls: 0%), with top-200 neuron coalitions achieving >0.70 agreement on 99.9% of prototypes via majority vote. Fully unsupervised discovery (random seeds, no labels) scales to 1500 features at 100% yield and 99% sparsity across all three models, with pairwise MI of 0.0014 bits confirming low inter-dimension coupling. These results establish that the standard basis already suffices for feature reading throughout the transformer compute pathway, requiring no training, no optimization, and no GPU-days beyond a single forward pass per vocabulary token.

10.
arXiv (CS.LG) 2026-06-19

Evaluating Universal Machine Learning Force Fields Against Experimental Measurements

arXiv:2508.05762v2 Announce Type: replace-cross Abstract: Universal machine learning force fields (UMLFFs) promise to revolutionize materials science by enabling rapid atomistic simulations across the periodic table. However, their evaluation has been limited to computational benchmarks that may not reflect real-world performance. We introduce UniFFBench, a comprehensive evaluation framework featuring the MinX dataset – a diverse collection of 1,500+ mineral systems spanning 85 elements, extreme thermodynamic conditions (0–5000 K, 0–1000 GPa), and structural complexity, including partial occupancy and disorder. This diversity, combined with experimental reference values for validation, enables assessment of UMLFF generalization across chemical space and conditions substantially beyond typical training scenarios. Our systematic evaluation of six state-of-the-art UMLFFs reveals a substantial ``reality gap'': models achieving impressive performance on computational benchmarks often fail when confronted with experimental complexity. Even the best-performing models exhibit higher density prediction error than the threshold required for practical applications. We observe disconnects between simulation stability and mechanical property accuracy, with prediction errors correlating with training data representation rather than the modeling method.

11.
arXiv (quant-ph) 2026-06-19

Arrival times of an atomic Bose-Einstein condensate

arXiv:2606.20281v1 Announce Type: cross Abstract: The times of flight of an atomic Bose-Einstein condensate are theoretically investigated in the experimentally unexplored regime corresponding to detection close to the trap of the condensate. In this regime, there is no consensus on how to calculate the distribution of times of arrival onto the detector. For non-interacting particles, distinct theoretical predictions have been made in the past. This work analyses how these predictions are modified for an interacting Bose-Einstein condensate. For this purpose, a time-dependent Gross-Pitaevskii equation is solved analytically and numerically.

12.
arXiv (CS.CL) 2026-06-11

Neuron-based Personality Trait Induction in Large Language Models

Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, in this paper, we present a neuron-based approach for personality trait induction in LLMs, with three major technical contributions. First, we construct PersonalityBench, a large-scale dataset for identifying and evaluating personality traits in LLMs. This dataset is grounded in the Big Five personality traits from psychology and is designed to assess the generative capabilities of LLMs towards specific personality traits. Second, by leveraging PersonalityBench, we propose an efficient method for identifying personality-related neurons within LLMs by examining the opposite aspects of a given trait. Third, we develop a simple yet effective induction method that manipulates the values of these identified personality-related neurons. This method enables fine-grained control over the traits exhibited by LLMs without training and modifying model parameters. Extensive experiments validate the efficacy of our neuron identification and trait induction methods. Notably, our approach achieves comparable performance as fine-tuned models, offering a more efficient and flexible solution for personality trait induction in LLMs. We provide access to all the mentioned resources at https://github.com/RUCAIBox/NPTI.

13.
arXiv (quant-ph) 2026-06-15

Physics-Informed Variational Quantum Classifier for Phase Detection in Strongly Correlated Matter

arXiv:2606.14489v1 Announce Type: new Abstract: The characterisation of quantum phases in strongly correlated systems is a crucial milestone for the deployment of quantum sensors. In this work, we present a Physics-Informed Variational Quantum Classifier (VQC) designed to detect the topological phase transition between the Fermi polaron quasiparticle and the molecular bound state. Unlike conventional Machine Learning approaches, our quantum architecture is constructed via the Trotterised time-evolution of an effective Hamiltonian, ensuring that the learnable parameters correspond to interpretable physical quantities. We show that the VQC efficiently discovers the optimal interferometric protocol, specifically the evolution time and effective bath interactions required to maximise the visibility of Ramsey fringes, thereby clearly distinguishing the Bose-Einstein Condensate (BEC) and Bardeen-Cooper-Schrieffer (BCS) regimes. Furthermore, we report the validation of this classifier on the QRed superconducting quantum processor (BSC-CNS). Despite the intrinsic hardware noise and decoherence, the VQC preserves the relative ordering of the topological phases. We demonstrate that the physics-informed architecture achieves a linear gate complexity $\mathcal{O}(N)$, bypassing the exponential memory wall of classical simulation and ensuring scalability to many-body regimes.

14.
arXiv (CS.CL) 2026-06-11

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing

Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking checkpoints, these capabilities are complementary but misaligned. The Instruct model is concise and tool-disciplined, whereas the Thinking model offers stronger planning and recovery behavior but often over-deliberates and degrades agent performance. We present CRANE (Constrained Reasoning Injection for Code Agents via Nullspace Editing), a training-free parameter-editing method that treats the Thinking-Instruct delta as a directional pool of candidate reasoning edits for the Instruct backbone. CRANE combines magnitude thresholding to denoise the delta, a Conservative Taylor Gate to retain edits that are jointly beneficial for reasoning transfer and tool-use preservation, and Graduated Sigmoidal Projection to suppress format-critical update directions. By merging paired Instruct and Thinking checkpoints, CRANE delivers strong gains over either individual model while preserving Instruct-level efficiency: on Roo-Eval it achieves pass1 of 66.2% (+19.5%) for Qwen3-30B-A3B and 81.5% (+8.7%) for Qwen3-Next-80B-A3B; on SWE-bench-Verified it resolves up to 14 additional instances at both scales (122/500 and 180/500); and on Terminal-Bench v2 it improves pass1/pass5 by up to 2.3%/7.8%, reaching 7.6%/17.9% and 14.8%/30.3%, respectively, consistently outperforming alternative merging strategies across all three benchmarks.

15.
arXiv (CS.CV) 2026-06-15

TSA: Temporal Slot Activation for Persistent Object-Centric Video Representation

Unsupervised video object-centric learning aims to decompose dynamic scenes into temporally persistent entity representations. Existing recurrent video slot-attention methods propagate a fixed set of slots across frames, but typically assume unconditional slot propagation: every slot is updated and decoded at every frame, regardless of whether its corresponding object is visible. We show that this design violates a basic lifecycle requirement for persistent slots: when an object is absent or fully occluded, its slot should preserve its previous state and avoid explaining unrelated visible content. Instead, unconditional propagation creates two failure pathways: update-induced state drift, where current-frame evidence overwrites the absent object's representation, and decoder-induced reconstruction interference, where the inactive slot remains coupled to reconstruction through decoder attention. We propose Temporal Slot Activation (TSA), a mechanism that learns a per-slot, per-frame activation score $\alpha_{k,t} \in (0, 1)$ without visibility supervision. TSA uses this activation as a shared latent control variable for slot lifecycle modeling. When a slot is inactive, TSA anchors its state to the previous slot via activation-gated updating and suppresses its decoder participation through an activation-dependent additive bias on attention logits before softmax normalization. This jointly reduces state drift and reconstruction-driven interference. To improve decisions under partial occlusion and gradual reappearance, TSA further conditions activation prediction on a per-slot temporal memory produced by a Temporal Context Encoder. We evaluate TSA on MOVi-C/E, YT-VIS, and OVIS benchmarks using both standard and tracking-based metrics (FG-ARI, mBO, IDF1, HOTA). TSA consistently improves object decomposition and temporal identity preservation, with large gains on long, heavily occluded videos.

16.
arXiv (CS.LG) 2026-06-16

Evolutionary Bilevel Reward Shaping for Generalization in Reinforcement Learning

arXiv:2606.16236v1 Announce Type: new Abstract: Reinforcement learning (RL) often suffers from performance degradation when deployed in environments that differ from those encountered during training. Existing techniques such as domain randomization (DR) mitigate this, but require access to diverse training environments and full trajectory observability, assumptions that fail in privacy-preserving or restricted scenarios where only scalar performance metrics are available. We propose Generalization via Evolutionary Reward Shaping (GERS), a bilevel optimization approach to improve generalization on unseen test environments using only scalar feedback from validation environments. At the lower level, an RL agent guided via a reward function shaped by the upper level learns a policy on a limited set of training environments with accessible trajectory data; at the upper level, CMA-ES optimizes the reward shaping parameters to maximize the cumulative unshaped reward on separate validation environments for which trajectory access is unavailable. Results on continuous control tasks indicate that GERS outperforms the standard RL baseline on unseen test environments. GERS performance is comparable to DR, despite DR treating the combined set of training and validation environments of GERS as a single training set that requires trajectory access, whereas GERS cannot access validation trajectories. These results confirm that GERS effectively enhances generalization under restricted data access constraints.

17.
bioRxiv (Bioinfo) 2026-06-11

ANCHOR: haplotype-aware allelic and isoform inference from single-cell long-read RNA sequencing with de novo variant calling

Long-read RNA sequencing enables haplotype- and isoform-resolved allelic analysis of transcriptomes, yet extending this capability to single cells and distinct cell types remains computationally challenging due to sparse coverage, sequencing errors, incomplete variant information, and reference-biased transcript assignment. Here we present ANCHOR, a haplotype-aware framework for single-cell long-read RNA sequencing that performs de novo expressed-variant discovery, molecule-level haplotype assignment and isoform-resolved allelic quantification. ANCHOR combines a signed-graph variant caller, pair hidden Markov modelling and beta-binomial UMI aggregation to infer parental allele counts for genes and splice-resolved isoforms, without requiring a pre-existing phased genotype or deep learning. In human single-cell long-read RNA benchmarks, ANCHOR improved variant-calling performance over tested long-read RNA callers at single-cell and low-to-moderate coverage, and its beta-binomial model reduced depth-driven false positives in allele-specific expression testing. Applied to newly generated single-cell long-read RNA-seq data from reciprocal mouse crosses during gastrulation, ANCHOR resolved cell-type- and isoform-specific parent-of-origin imprinting and identified an antagonistic maternally biased Sgce isoform. ANCHOR provides a general framework for allele- and isoform-resolved analysis of diploid single-cell long-read transcriptomes.

18.
arXiv (CS.CV) 2026-06-12

MagPlus: Bridging Micro-to-Regular Facial Expressions through Learnable Magnification

Facial micro-expressions are subtle and short-lived facial movements that provide important cues about genuine human emotions. However, modeling and generating them remains difficult because annotated micro-expression data is limited and the underlying facial motions are extremely weak. Existing micro-expression generation methods therefore often suffer from limited quality, weak robustness, and poor generalization. We propose MagPlus, a transferable micro-expression processing pipeline that connects micro-expression analysis with standard facial animation models. Instead of training a dedicated generator from scratch, MagPlus learns to magnify subtle facial motions into the range of regular facial expressions, transforming micro-expressions into signals that are compatible with existing facial expression processing models. The magnified sequence is then used by a standard facial expression model for tasks such as transfer and synthesis. A complementary DeMagPlus module then restores the generated motion back to realistic micro-expression intensity levels while preserving the synthesized dynamics. We evaluate the framework using four facial animation models: FOMM, FSRT, MetaPortrait, and EmoPortraits. None of these models are trained on micro-expression data. Experiments show that MagPlus-DeMagPlus enables pretrained macro-expression models to generate more realistic micro-expression motion without retraining the backbones.

19.
arXiv (math.PR) 2026-06-18

Delayed blow-up by transport noise for the 3D Navier-Stokes equation with Navier-slip boundary conditions

作者:

arXiv:2606.19060v1 Announce Type: cross Abstract: We study the vorticity formulation of the 3D Navier-Stokes equation driven by transport noise in a periodic channel with Navier-slip boundary conditions. We consider both non-degenerate transport noise and degenerate tangential transport noise. For any prescribed $T>0$ and $\epsilon>0$, we prove that, by choosing the noise intensity sufficiently large and concentrating the noise on sufficiently high modes, the solution exists up to $T$ with probability at least $1-\epsilon$. A main contribution of this work is to identify and analyze the interaction between enhanced dissipation induced by transport noise and physical boundary effects. The no-flux condition breaks the isotropy of the noise and changes the scaling limit of the Itô-Stratonovich corrector. In the non-degenerate case, a boundary feedback term appears in the limiting effective operator; in the degenerate case, the limiting operator is a nonlocal anisotropic tangential dissipation. The proof is based on a combination of a boundary correction operator, a Meyers-type estimate, a scaling-limit analysis of the Itô-Stratonovich corrector, and resolvent estimates for the deterministic limiting equations.

20.
arXiv (CS.CV) 2026-06-16

Improved Knowledge Distillation for Land-Use Image Classification

In the present article, an improved Knowledge Distillation (KD) framework has been proposed for efficient compression of deep convolutional neural networks for land-use image classification task. Motivated by the need to achieve competitive classification accuracy while reducing computational complexity, a teacher-student learning paradigm is adopted in which a VGG16 network transfers knowledge to a lightweight MobileNetV2 model. The proposed framework integrates hard supervision from ground truth labels with a soft supervision strategy that combines Kullback-Leibler divergence and Cosine Similarity losses. Experiments conducted on three land-use datasets show that the proposed KD-based method yields improved performance, and achieves an accuracy of 99.04%, outperforming both baseline student training and single-loss distillation approaches, while retaining substantial model compression.

21.
arXiv (CS.CV) 2026-06-16

Enabling Real-Time Point-of-Care Ultrasound Segmentation: A GPU-Free Deployment in Resource-Limited Settings

作者:

Ultrasound imaging is the most widely adopted medical modality globally due to its low cost and portability, yet artificial intelligence (AI) deployment remains constrained by reliance on GPU-accelerated models, creating a structural paradox where the cost of "intelligence" exceeds that of the imaging device itself. Here, we present the systematic adaptation and extensive evaluation of UltraSeg, an ultra-lightweight architecture originally developed for colonoscopic polyp segmentation, now engineered for point-of-care ultrasound (POCUS) across ten public datasets spanning six anatomical sites (breast, thyroid, kidney, carotid, fetal, and small-animal tumor). We systematically validate both variants in ultrasound domains: UltraSeg-130K (0.13M parameters) achieves 89.7 FPS on single-core CPUs and 34.8 FPS on a refurbished mobile device, while UltraSeg-500K (0.5M parameters) delivers 44.6 FPS on CPU and 16.1 FPS on mobile device. UltraSeg-500K matches or exceeds the Dice performance of the 31M-parameter UNet and approaches 105M-parameter TransUNet in average performance, with superior zero-shot cross-dataset generalization on external validation sets (UDIAT, DDTI). By enabling clinical-grade segmentation without GPU dependency, this work brings AI costs in line with ultrasound accessibility, making advanced diagnostics available in resource-limited settings.

22.
arXiv (CS.CL) 2026-06-16

Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models

Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs become diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. Toward this, we first investigate the unique decoding dynamics of MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from other models. Guided by this observation, we propose $TIE$ ($T$rajectory-based $I$terative $E$nsembling), a knowledge fusion framework in which MDLMs iteratively identify reliable decoding trajectories and relay them across models. TIE tracks confidence dynamics over answer-relevant positions to determine which model currently follows a more reliable trajectory and selectively transfers partially denoised sequences across models. As the model on the more promising trajectory often changes across denoising steps, TIE allows different models to contribute complementary strengths at different stages of generation. Strong performance across diverse reasoning tasks, along with our analyses, suggests that TIE offers a practical approach to the underexplored problem of MDLM ensembling.

23.
arXiv (CS.LG) 2026-06-15

Decompose Sparsely Where You Should, Absorb Densely Where You Should No

arXiv:2606.14040v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) are typically trained to reconstruct the entire residual stream through a sparse dictionary, implicitly assuming that all activation content is amenable to sparse, monosemantic decomposition. We question this assumption and hypothesize that activations contain a low-rank, dense component that is computationally important to the model yet inherently unsuitable for sparse representation, which serves as a major source of the persistent dense latents widely observed in trained SAEs. To test this, we add a small rank-$r$ linear bottleneck in parallel with standard SAEs (BatchTopK and Matryoshka), allowing dense structure to be absorbed before sparse reconstruction. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latent count by up to 84\% while improving sparse probing and targeted probe perturbation on both architectures at matched sparsity. The absorbed component is (i) structurally identifiable as the top principal components and outlier dimensions; (ii) causally necessary, with removing it raising next-token cross-entropy by 7.5$\times$, far exceeding the 2.8$\times$ from removing the geometrically near-identical top-24 PCA directions; and (iii) redundantly encoded by sparse dictionaries, with ablating 787 maximally aligned sparse features raising cross-entropy by only 2.9$\times$ and ablating 2,048 topic-aligned features leaving MMLU topic classification virtually unchanged, whereas removing the scaffold drops it from 98.7\% to chance. Together, our findings identify a compact, semantically informative and causally important component of residual stream activations (which we term a computational scaffold) that standard sparse dictionaries represent inefficiently, suggesting that the scope of sparsity-based interpretability methods warrants careful re-examination.

24.
arXiv (CS.AI) 2026-06-16

FlowMPC: Improving Flow Matching policies with World Models

arXiv:2606.16286v1 Announce Type: cross Abstract: Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.

25.
arXiv (CS.AI) 2026-06-19

Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms

arXiv:2606.19369v1 Announce Type: cross Abstract: Estimation-of-distribution algorithms (EDAs) are a powerful class of evolutionary methods for black-box optimization, especially when little is known about the structure of the objective. Whereas classical evolutionary algorithms rely on hand-designed mutation and crossover operators, hard to devise for unknown problem structures, and a source of bias, EDAs sidestep operator design entirely: they fit a probability distribution to the best individuals and sample the next generation from it. EDAs are well established on continuous parameter spaces, but they have not previously been generalized to sparse ones, in which most coefficients of a good solution are exactly zero. Existing sparse black-box optimizers therefore reintroduce exactly what EDAs were designed to avoid: hand-crafted sparsity operators, bi-level schemes alternating between support set and active values, zeroing thresholds, and other baked-in assumptions. We close this gap by proposing multivariate zero-inflated Gaussian (ZIG) distributions as EDA sampling laws. A latent Gaussian model with separate indicator and value dimensions represents sparsity patterns, correlations among active parameters, and the interactions between the two, so sparsity patterns and active values are optimized jointly, hierarchy-free. We show that the latent parameters of this model are identifiable from observed samples, unlike in the missing-data settings where related constructions originate, and introduce practical amortized inversion-based estimators for them. The estimators accurately recover latent correlation structures, and on the Lunar Lander benchmark the resulting ZIG-EDA converges faster and reaches higher final returns than a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse EDA, while finding controllers with only a small fraction of parameters active.