Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-17

Edit3DGS: Unified Framework for Dynamic Head Editing via 2D Instruction-Guided Diffusion and 3D Gaussian Splatting

We present Edit3DGS, a unified framework for dynamic 3D head editing that integrates 2D instruction-guided diffusion with 3D Gaussian splatting. Unlike prior approaches that separately address frame-based edits or static 3D reconstruction, our method couples semantic controllability in the image domain with photorealistic, temporally consistent 3D representations. Given an input video, editable facial regions are masked and modified using a text-conditioned diffusion model to support fine-grained operations such as expression transformation, attribute modification, and appearance refinement. The edited frames are then aggregated through 3D Gaussian splatting to produce a coherent, high-fidelity avatar that preserves both identity and motion dynamics. To enforce consistency, Edit3DGS incorporates multi-view batch editing and lightweight inpainting strategies that recover lost expressions across timesteps. Experimental results demonstrate that our framework enables controllable, artifact-free head editing with smooth temporal transitions, offering practical applications in virtual avatars, immersive communication, film production, and interactive media.

02.
arXiv (quant-ph) 2026-06-17

Quantum Information Processing: A brief overview on Quantum Teleportation

作者:

arXiv:1604.00852v3 Announce Type: replace Abstract: Quantum Information Processing (QIP) exploits the principles of quantum mechanics to perform information storage, communication, and computation in ways that are fundamentally impossible within classical frameworks. This article presents a pedagogical overview of the mathematical foundations of quantum information theory, including qubits, Hilbert spaces, linear operators, quantum measurements, tensor products, density operators, and quantum entanglement. Building upon these concepts, we provide a detailed introduction to quantum teleportation, one of the most remarkable protocols in quantum communication. The discussion covers the no cloning theorem, the original teleportation protocol by Bennett et al., experimental realisations of quantum teleportation, and extensions involving probabilistic and multiqubit teleportation schemes. Particular emphasis is placed on the role of entanglement as a communication resource, together with the study of teleportation channels based on bipartite and multipartite quantum states. Various quantitative measures of entanglement, including concurrence, negativity, entanglement of formation, and relative entropy of entanglement, are reviewed alongside teleportation fidelity as a performance metric. Furthermore, the interplay between Bell nonlocality, mixed state entanglement, and teleportation efficiency is examined, followed by a survey of advanced developments such as controlled teleportation, bidirectional teleportation, cluster state teleportation, and recent advances in the Quantum 2.0 era. This review aims to provide students, researchers, and engineers with a coherent introduction to the theoretical foundations and practical significance of quantum teleportation in emerging quantum technologies.

04.
arXiv (math.PR) 2026-06-12

Stochastic dominations for FK percolation and sharp thinning thresholds for the Ising energy field

arXiv:2606.13648v1 Announce Type: new Abstract: At first glance, one would imagine that the energy field of the Ising model, the set of edges whose endpoints share the same spin, is stochastically monotone as a function of the coupling constants. However, this is not generally the case. In this paper, we introduce two weaker notions of stochastic domination that make this result true: $p$–weak and $p$–weak$^\dagger$ domination. Both of these notions depend on a parameter $p$ and we find the optimal values $p$ and $p^\dagger$ so that these dominations hold. One of the key ingredient to obtain some of the results is a new stochastic domination relating FK percolations with different parameters $q,\tilde{q}\geq 1$ that is of independent interest.

05.
arXiv (CS.CV) 2026-06-11

CoCoSI: Collaborative Cognitive Map Construction for Spatial Intelligence

Spatial intelligence is a key frontier for multimodal large language models (MLLMs), enabling them to reason about the physical world from visual experience. Inspired by human spatial cognition, recent approaches construct grid-based cognitive maps from multi-frame visual inputs to maintain coherent spatial representations over time. However, limited context lengths still challenge spatial understanding, while existing methods, such as long-context modeling and external memory, often require architectural changes, memory modules, or finetuning, limiting their applicability to off-the-shelf pretrained MLLMs. This motivates a lightweight, model-agnostic method for preserving spatial information beyond the native context window. To this end, we propose a plug-and-play multi-agent framework that collaboratively constructs cognitive maps as structured spatial memory, enhancing the spatial understanding of arbitrary pretrained MLLMs without architectural modification or additional training. Our framework features local-global agent coordination, cognitive map construction with atomic commits, and cross-agent verification. Extensive experiments demonstrate that our method achieves superior performance on spatial understanding tasks while remaining fully training-free. Code will be released.

06.
medRxiv (Medicine) 2026-06-17

Short-term relaxation after cervical rotatory manipulation is more closely associated with somatosensory input than cracking sound: a randomized controlled EEG study

Background Cervical rotatory manipulation is commonly used for neck-related symptoms and is often accompanied by a cracking sound. This sound is frequently regarded as a sign of successful manipulation, but whether it contributes substantially to the immediate relaxation response remains unclear. Objective This study examined whether short-term relaxation after cervical rotatory manipulation is more closely related to manipulation-associated sensory input than to the cracking sound cue alone. Methods In this single-session, three-arm, parallel randomized controlled study, 54 healthy volunteers were allocated to cervical rotatory manipulation, sham manipulation, or sham manipulation plus simulated cracking sound. Subjective outcomes were assessed before and after intervention, including positive affect, negative affect, comfort, and satisfaction. Eyes-closed resting-state electroencephalography was recorded before and after intervention. Prespecified neural outcomes included frontal alpha power, frontal alpha/beta ratio, occipital individual alpha frequency, and alpha-band fronto-parietal and fronto-temporal functional connectivity. Results Cervical rotatory manipulation produced greater improvements in positive affect, comfort, and satisfaction than sham manipulation or sham manipulation plus simulated cracking sound, whereas negative affect remained generally stable across groups. These subjective responses were accompanied by short-term electroencephalography changes, particularly in frontal alpha/beta and alpha-band fronto-parietal and fronto-temporal functional connectivity. Changes in frontal alpha/beta ratio were positively associated with changes in positive affect. In contrast, simulated cracking sound alone did not reproduce the full subjective or electroencephalography response observed after real manipulation. Conclusions The immediate relaxation response after cervical rotatory manipulation appears to be more closely related to manipulation-associated sensory input than to the cracking sound cue alone. These findings provide preliminary neurophysiological evidence for distinguishing real manipulation effects from sound-related contextual cues.

07.
arXiv (CS.CV) 2026-06-16

Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics

Foundation-model pipelines for individual-level livestock monitoring – combining open-vocabulary detection, promptable video segmentation, and self-supervised visual embeddings – have raised the accuracy ceiling of precision livestock farming (PLF), but their GPU memory budgets exceed the envelope of commodity edge accelerators. To close this gap, the 446M-parameter Perception Encoder (PE-ViT-L+) backbone of SAM 3 is distilled into a 40.66M-parameter multi-scale student through three mechanisms: a Feature Pyramid Network student encoder built on TinyViT-21M-512, a four-term direction-then-scale distillation loss, and backbone-substitution inference with sliding-window session pruning that bounds streaming GPU memory growth. The DINOv3 family includes a pre-distilled ViT-S/16 variant (21.6M parameters) released alongside a 6716M-parameter ViT-7B teacher; the ViT-S (21M) variant is adopted as the per-individual embedder. On the Edinburgh Pig dataset, the compressed pipeline reaches 92.29% MOTA and 96.15% IDF1 against the SAM 3 teacher (1.68- and 0.84-percentage-point losses), achieves a 7.77-fold reduction in system-level parameters and a 3.01-fold reduction in peak VRAM (19.52GB -> 6.49GB), and reaches 97.34% top-1 accuracy with 91.67% macro-F1 on nine-class pig behaviour classification. The pipeline fits inside an NVIDIA Jetson Orin NX 16GB envelope with 4.9GB of headroom, supporting a proposed – but not yet empirically validated – on-device embedding-pool re-identification mechanism whose per-individual footprint of approximately 94MB per animal per year produces a longitudinal visual record amenable to retrospective association with disease, lameness, reproductive, and growth outcome labels.

08.
medRxiv (Medicine) 2026-06-12

Room-Specialized Mixture-of-Experts for In-Home ADL Recognition with Ambient Sensors

Monitoring activities of daily living (ADLs) in the home is a promising approach for tracking dementia progression in older adults. While ambient sensor-based ADL systems are well-studied, most existing ADL recognition systems rely on globally trained models that ignore the spatial organization of in-home activities. In real deployments, where training data are sparse and highly home-specific, global transformer models may fail to capture room-dependent behavioral structure. We propose a deterministic Mixture of Experts (MoE) architecture for in-home ADL recognition, in which each expert is a compact transformer specialized to one room of the home (bedroom, kitchen, bathroom, living area). Input segments are routed using a deterministic gating strategy based on room-level motion activity and time-of-day priors for sleep-related behaviors. Unlike learned routing networks, the proposed gate encodes domain knowledge about where ADLs are likely to occur, reducing model complexity under limited per-home training data. By decomposing ADL recognition into room-specific activity spaces, the proposed architecture reduces competition between dominant and low-frequency activities under highly imbalanced residential data. We evaluated the system on data collected via low-cost ambient sensors (motion, light, temperature, humidity) and Raspberry Pi edge devices across five homes, with ground-truth ADL labels provided by participants and caregivers. Across the five homes, the proposed MoE consistently outperformed global transformer, 1D CNN, and Random Forest baselines, achieving macro-F1 scores ranging from 0.60 to 0.88, highlighting the importance of home-specific modeling in real-world deployments. These findings suggest that room-aware expert specialization may provide a practical and interpretable strategy for low-data ADL recognition in real-world residential environments.

09.
bioRxiv (Bioinfo) 2026-06-16

Phylogenetic tree inference using generative models

Accurate inference of phylogenetic trees is fundamental to evolutionary biology, yet existing methods rely on complex pipelines involving multiple sequence alignment, explicit evolutionary models, and computationally intensive tree search procedures. Here, we present BetaInfer, a generative framework that reformulates phylogenetic tree inference as a sequence transduction problem. BetaInfer leverages hybrid transformer-based architectures to directly map sets of unaligned sequences to phylogenetic trees represented in Newick format. Trained on large-scale simulated evolutionary data with known ground truth, BetaInfer learns to capture complex evolutionary signals directly from sequence data. Ensemble-based generation of multiple candidate trees further improves robustness, reducing reconstruction error by over 30% relative to single predictions. Across extensive evaluations on both simulated and empirical datasets, BetaInfer achieves competitive performance relative to state-of-the-art phylogenetic pipelines, matching, and in some cases exceeding, the accuracy of established likelihood-based and distance-based methods under a wide range of conditions. Interpretability analyses reveal that BetaInfer leverages internal pairwise-distance computations to synthesize evolutionary relationships into an integrated, global representation that supports direct tree generation. Together, these results demonstrate that generative models can serve as a viable and scalable alternative to standard phylogenetic pipelines.

10.
arXiv (CS.CV) 2026-06-16

Self-Questioning Vision-Language Models: Reinforcement Learning for Compositional Visual Reasoning

Vision-Language Models (VLMs) are AI systems that process both images and text, yet they often struggle with compositional visual reasoning questions that require chaining multiple steps together, such as identifying objects, counting them, and comparing the results. Existing approaches improve this reasoning by training models on human-written step-by-step explanations, but creating these annotations is expensive and difficult to scale. We propose a self-questioning framework that trains a VLM to break visual questions into smaller sub-questions and answer each one before producing a final response, using a reinforcement learning algorithm called Group Relative Policy Optimization (GRPO). The model is never shown examples of how to decompose questions, it discovers this behavior on its own, guided by a reward signal that scores whether the output contains sub-questions and whether the final answer is correct. We apply this framework to a 3-billion-parameter model, training on both synthetic scenes of geometric shapes (CLEVR) and real-world photographs (A-OKVQA). On A-OKVQA, both self-questioning and standard reinforcement learning substantially improve accuracy over the untrained model (52.2% and 51.6% vs. 46.8%). We introduce the first self-questioning VLM by rewarding not only the final answer like standard RL but additionally for generating intermediate sub-questions, enabling it to discover compositional decomposition strategies. These results suggest that teaching AI systems to ask themselves intermediate questions is a promising strategy for complex visual reasoning, particularly when the difficulty of a question warrants explicit step-by-step decomposition.

11.
arXiv (CS.LG) 2026-06-11

Accurate and Resource-Efficient Federated Continual Learning

arXiv:2606.11480v1 Announce Type: new Abstract: Federated continual learning (FCL) must learn from distributed task streams under limited resources, such as communication, computation, memory, and label availability. Existing FCL methods often rely on repeated local optimization, replay, and full supervision. Analytic alternatives avoid iterative training and replay, but using high-dimensional random features to improve accuracy requires a second-order feature statistic, the Gram matrix, which has a quadratic communication cost in the random feature size $M$. We propose FedRAN, a resource-aware analytic FCL framework that replaces gradient-based updates with compact random feature statistics. Each client transmits a truncated-SVD summary of its Gram matrix, reducing the dominant second-order upload from quadratic to linear in $M$ for fixed rank. The server performs a two-level QR-SVD subspace merge, spatially across clients and temporally across tasks, and solves a ridge classifier in closed form. FedRAN further supports label scarcity through prototype-based pseudo-labeling. Across CIFAR-100, ImageNet-R, and VTAB datasets, FedRAN improves average accuracy by up to 4.8 percentage points over the strongest baseline, uses 30.6-121.8$\times$ less per-client communication than optimization-based FCL, and is 190.3$\times$ faster on average than gradient-based baselines; with only 20% labels, pseudo-labeling improves average accuracy by up to 6.61 points. These results show that FedRAN enables accurate and resource-efficient FCL under communication, computation, and label constraints. The source code is available at https://github.com/JebacyrilArockiaraj/Fed-RAN-SSL.

12.
arXiv (CS.CV) 2026-06-11

OSCS-SupCon: Orthogonal Sigmoid-based Common and Style Supervised Contrastive Learning for Robust Feature Disentanglement

Supervised Contrastive Learning (SupCon) has achieved strong performance by explicitly modeling pairwise relationships among samples. However, existing SupCon-based methods suffer from two key limitations: negative-sample dilution induced by the standard InfoNCE loss, and feature-space entanglement caused by the lack of explicit constraints separating category-relevant (common) and category-irrelevant (style) features. These limitations reduce feature discriminability and generalization ability. To address these issues, we propose OSCS-SupCon (Orthogonal Sigmoid-based Common and Style Supervised Contrastive Learning), a unified framework that combines a sigmoid-based pairwise contrastive objective with explicit orthogonality constraints. Specifically, we introduce a sigmoid-based contrastive loss with two learnable parameters, temperature and bias, which adaptively modulate pairwise decision boundaries and alleviate negative-sample dilution. Furthermore, we enforce orthogonality between common and style feature subspaces via a linear projection with ReLU nonlinearity, thereby reducing feature overlap and improving disentanglement of style-irrelevant representations. Extensive experiments on six benchmark datasets demonstrate that OSCS-SupCon consistently outperforms state-of-the-art supervised contrastive learning methods across multiple backbone architectures. In particular, on the fine-grained CUB200-2011 dataset with a ResNet-18 backbone, the proposed method achieves a 3.4% improvement in classification accuracy over CS-SupCon, highlighting its robustness and generalization capability. Ablation studies further confirm the effectiveness of each component.

13.
arXiv (CS.LG) 2026-06-15

Gradient boosting for extremes: sampling theory and application to insurance

arXiv:2606.14268v1 Announce Type: cross Abstract: We develop a statistical learning theory for gradient boosting applied to the estimation of covariate-dependent Generalized Pareto (GP) distributions in the context of Peaks-over-Threshold modeling. After an orthogonal reparametrization of the GP likelihood that diagonalizes its Fisher information matrix, we cast the estimation problem within the Empirical Risk Minimization (ERM) framework and derive non-asymptotic error bounds for the boosting estimator. Our analysis accounts for three distinct sources of error in the process: statistical fluctuations, the approximation bias inherent to the asymptotic nature of the GP model-controlled under second-order regular variation-and the approximation error associated with the finite number of boosting iterates, making explicit the resulting bias-variance trade-off. We illustrate the practical benefits of the reparametrization through simulations, showing that it significantly reduces gradient correlation during training and improves convergence stability. The methodology is applied to a medical malpractice insurance dataset from the Texas Department of Insurance, comprising over 18 000 closed claims. The gradient boosting approach yields a good fit for the tail of settlement cost distributions and reveals that the number of days to settlement is the dominant predictor of tail heaviness, consistent with earlier findings in the reserving literature.

14.
arXiv (CS.CL) 2026-06-19

PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors

Existing mean opinion score (MOS) prediction models typically predict utterance-level naturalness MOS and can be insensitive to localized pitch-accent errors. We propose Pitch-Accent-focused Speech Quality Assessment (PASQA), which explicitly targets pitch-accent correctness. To train our model, we construct a controlled Japanese accent-error dataset by changing accent patterns using an accent-controllable text-to-speech system, and compute a pseudo accent-quality score from the accent-error rate. PASQA builds on self-supervised representations and employs mora-conditioned fusion, ranking loss, an auxiliary accent-error localization task, and speaker-invariant training. Experiments show that conventional models fail to preserve the ordering by accent-error severity, whereas PASQA achieves high ordering accuracy on both seen and unseen speakers. Further, PASQA shows stronger agreement with human accent-correctness judgments. The code is available at https://github.com/lycorp-jp/PASQA.

15.
medRxiv (Medicine) 2026-06-12

Estimating the effectiveness of syndromic screening at airports for Bundibugyo ebolavirus disease

We used a stochastic simulation model to estimate the effectiveness of combined exit and entry airport screening for Bundibugyo ebolavirus disease (BVD), using natural-history parameters from a Bayesian re-analysis of the 2012 Isiro outbreak. For a 12-hour international flight from DRC or Uganda at 86% screening sensitivity, we estimate 65% of infected travellers would arrive undetected (95% CrI: 38 - 76%). The main driver of this outcome is the relative duration of the the incubation period (approximately 7.7 days) and the onset-to-severe-disease interval (approximately 4 days): most infected travellers board before symptom onset and are undetectable by any syndromic screen, whilst those who are symptomatic progress rapidly to illness severe enough to preclude travel. This is compounded during active epidemic growth, when recently exposed (and therefore pre-symptomatic) cases are overrepresented among travellers. Syndromic airport screening offers limited protection against BVD spread via air travel, and should be complemented by outbreak control at source and strengthened clinical surveillance in receiving countries with high travel connectivity to affected areas.

16.
arXiv (CS.AI) 2026-06-11

LSTM-Based Detection of Structural Breaks in Property Insurance Loss Reserving: A Climate-Informed Approach

arXiv:2606.11463v1 Announce Type: cross Abstract: Accurate loss reserving is foundational to insurer solvency, yet accelerating climate driven catastrophes systematically violate the stability assumptions on which traditional actuarial methods depend. This white paper presents a research program testing whether Long Short Term Memory (LSTM) neural networks can detect and adapt to these structural breaks faster and more accurately than Chain Ladder, Bornhuetter Ferguson, and Cape Cod methods. Using 15 plus years of regulatory development triangle data from Florida and Louisiana, enriched with NOAA hurricane intensity indices and sea surface temperatures, we hypothesize a targeted improvement of 15, 20% in reserve accuracy for catastrophe exposed years, a threshold grounded both in the prior neural network reserving literature and in the formal convergence results developed here. Beyond empirical validation, we develop a theoretical framework grounding LSTM structural break detection in probabilistic terms, providing formal performance guarantees that compensate for the limited number of catastrophe events in the test period. We document the research design, methodology, expected contributions, and a candid assessment of limitations.

17.
arXiv (CS.CV) 2026-06-15

SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

Robust cross-view geo-localization (CVGL) remains challenging despite the surge in recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, where models are optimized under a fixed FoV but collapse when tested on unseen FoVs and unknown orientations. This limitation necessitates deploying multiple models to cover diverse variations. Although studies have explored dynamic FoV training by simply randomizing FoVs, they failed to achieve robustness across diverse conditions – implicitly assuming all FoVs are equally difficult. To address this gap, we present SinGeo, a simple yet powerful framework that enables a single model to realize robust cross-view geo-localization without additional modules or explicit transformations. SinGeo employs a dual discriminative learning architecture that enhances intra-view discriminability within both ground and satellite branches, and is the first to introduce a curriculum learning strategy to achieve robust CVGL. Extensive evaluations on four benchmark datasets reveal that SinGeo sets state-of-the-art (SOTA) results under diverse conditions, and notably outperforms methods specifically trained for extreme FoVs. Beyond superior performance, SinGeo also exhibits cross-architecture transferability. Furthermore, we propose a consistency evaluation method to quantitatively assess model stability under varying views, providing an explainable perspective for understanding and advancing robustness in future CVGL research. Codes will be available upon acceptance.

18.
arXiv (quant-ph) 2026-06-11

An Introduction to the Foundations and Interpretations of Quantum Mechanics

arXiv:2603.09818v2 Announce Type: replace Abstract: This article surveys a selection of key conceptual and interpretational developments in quantum mechanics, tracing the theory from its foundational postulates to contemporary discussions of measurement, nonlocality, and the emergence of classicality. Beginning with the structure of Hilbert space and the postulates governing state evolution and measurement, the epistemic stance of the Copenhagen interpretation and its modern reformulations are examined. The Einstein-Podolsky-Rosen argument, Bell's theorem, and Hardy's paradox are then discussed as probes of locality and realism, alongside the deterministic but explicitly nonlocal de Broglie-Bohm theory. The measurement problem and the implications of contextuality are analyzed in relation to objective collapse models, which introduce new physical dynamics to account for definite outcomes. Finally, the role of decoherence in the suppression of interference and the emergence of classical behavior is explored, together with the interpretational frameworks of many-worlds and consistent histories. This material aims to provide a coherent introductory overview of how several of the most prominent interpretations address the central concern of what quantum mechanics tells us about the nature of physical reality.

19.
arXiv (CS.AI) 2026-06-17

Optimism Stabilizes Thompson Sampling for Adaptive Inference

arXiv:2602.06014v2 Announce Type: replace-cross Abstract: Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study adaptive inference for Thompson sampling with Gaussian randomized indices in $K$-armed stochastic bandits with independent sub-Gaussian reward noises, and identify optimism as a key mechanism for restoring stability, meaning that each arm's pull count concentrates around a deterministic scale. This stability yields asymptotically valid Wald inference despite adaptive sampling. First, we prove that variance-inflated TS is stable for any $K \ge 2$, including the challenging regime where multiple arms are optimal, with asymptotically uniform allocation over optimal arms and sharp logarithmic pull-count asymptotics for suboptimal arms. This resolves the $K$-armed extension question raised by \citet{halder2025stable}, using new winner-map and Lyapunov-drift techniques to control allocation among multiple optimal arms. Second, we analyze an alternative optimistic modification that keeps the Gaussian index variance unchanged but adds an explicit mean bonus to the index center, and establish a similar stability conclusion. In summary, suitably implemented optimism stabilizes Thompson sampling and enables asymptotically valid Wald inference in multi-armed bandits, while incurring only a mild additional regret cost.

20.
arXiv (CS.AI) 2026-06-19

PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement

arXiv:2606.19867v1 Announce Type: cross Abstract: Computed Tomography (CT) is essential for diagnosing pediatric craniofacial abnormalities, yet poses radiation risks to developing anatomies. Reconstructing 3D CT from sparse bi-planar X-rays offers a low-dose alternative but is severely ill-posed. Existing methods employ geometry-agnostic feature lifting, naively projecting 2D features into 3D without explicit spatial modeling, causing depth ambiguity and degraded osseous boundaries. We present PSCT-Net, a geometry-aware framework with differentiable back-projection. Differentiable back-projection establishes a spatially faithful volumetric prior, alleviating depth ambiguity. An Attention-Guided Projection (AGP-3D) module then learns non-linear voxel-wise correspondences between 2D regions and 3D locations. A Bidirectional Mamba (BiM-3D) module captures long-range volumetric dependencies with linear complexity. We further curate a private institutional pediatric skull CT cohort, PedSkull-CT, comprising normal and pathological cases for internal evaluation, addressing the gap in adult-centric, trunk-focused datasets.

21.
arXiv (CS.CL) 2026-06-15

Sub-Token Routing for KV Cache Compression

Transformer inference often requires a large KV cache, especially for long-context language modeling and multimodal generation. Existing compression methods usually reduce cache cost by selecting, evicting, quantizing, or compressing cached tokens, or by reducing the visual-token sequence before language-model inference. We introduce sub-token routing, a KV-compression method that adds a finer control axis inside retained tokens. It splits each retained value vector into groups and keeps only selected groups, while leaving query and key states unchanged. The method is designed to work after token-level reduction. First, a token-reduction method determines which tokens are retained. Then, sub-token routing compresses the value states inside those retained tokens. Experiments under matched KV budgets show that adding sub-token routing improves token-level reduction performance in both LLM and VLM settings, including Quest on LLaMA-2-7B and Qwen2.5-7B, and FastV/VisionZip across LLaVA and Qwen-VL models. The gains are larger at smaller KV budgets, suggesting that value-group routing is especially useful when further token removal becomes costly. Overall, token-level reduction and sub-token routing provide complementary ways to reduce KV cost.

22.
arXiv (CS.CV) 2026-06-16

Beer-Lambert Guided Representation Learning for Unsupervised Anomaly Detection in Sub-THz Food Inspection Images

Food manufacturing requires reliable inspection systems to detect foreign material contamination and maintain product safety. Sub-THz transmission imaging provides material-dependent attenuation characteristics that are useful for detecting low-density contaminants in food products. However, existing unsupervised anomaly detection methods mainly rely on RGB-pretrained visual representations, which may not adequately capture the transmission behavior of Sub-THz images. This paper proposes a Beer-Lambert guided representation learning framework for unsupervised anomaly detection in Sub-THz food inspection images. The proposed method introduces an attenuation decomposition module as an auxiliary regularization module that constrains student representations through attenuation reconstruction during training. In addition to the conventional one-class setting, we introduce a Leave-One-Food-Out protocol to evaluate generalization capability under unseen food categories. Experimental results on the Inline-Food-Inspection-THz dataset show that the proposed method improves overall anomaly detection performance over the baseline method.

23.
arXiv (quant-ph) 2026-06-17

On the entanglement induced by the deformation of phase-space

arXiv:2606.17587v1 Announce Type: new Abstract: Most quantum gravity theories propose that the fundamental concept of space-time is mostly compatible with quantum theory in noncommutative (NC) space. In the present paper, we revisit the notion of entanglement induced by NC deformations of phase space. The positive partial transpose (PPT) criterion for separability of bipartite Gaussian states is extended to a general class of Bopp's shift. In particular, we have considered both the position-position and momentum-momentum noncommutativity, with deformation parameters $\theta$ and $\eta$, respectively. It turns out that $\theta$ and $\eta$ induce the entanglement. We have directly applied the formalism for an anisotropic two-dimensional harmonic oscillator. Peres-Horodecki separability condition leads to a constraint equation for the parameter values of the oscillator in NC space. It turns out that the bipartite Gaussian state is almost always entangled in deformed space. To implement the theoretical idea, we provide an outline for a gedankenexperiment to identify the signature of phase-space noncommutativity, i.e., quantum gravity. In particular, the gedankenexperiment is devised to test the separability of supposedly separable Gaussian states in the usual commutative space, through the covariance matrix, which is constructed via measured output photocurrents after interaction of input Gaussian states and reference states. If the experiment shows that the supposedly separable states are actually entangled, then the entanglement is created through the intermediate background noncommutative space, which is a signature of the quantum nature of gravity.

24.
arXiv (CS.CV) 2026-06-16

A Comprehensive Survey of Medical Image Segmentation: Challenges, Benchmarks, and Beyond

Medical image segmentation plays a critical role in clinical diagnostics, treatment planning, disease monitoring, and neurological disorder identification. This article presents a comprehensive review of its systematic development, covering widely used public datasets, representative methods built on the U-Net, Transformer, and SAM architectures, and key evaluation metrics with their differences, followed by an analysis of major challenges from multiple perspectives. Unlike surveys that focus on a single model family or a specific clinical application, this review organizes U-Net-, Transformer-, and SAM-based methods within a unified analytical framework, with a particular focus on their effectiveness in improving segmentation accuracy and efficiency. This work aims to guide future research and support clinical translation of medical image segmentation, with all related resources publicly available in our GitHub repository: https://github.com/andrew-pengyu/Awsome_MedSeg/tree/main.

25.
arXiv (CS.CV) 2026-06-11

How Seemingly Inconsequential Design Choices Dictate Performance of LLMs in Pathology

General-purpose large language models (LLMs) are routinely used as baselines when evaluating specialized pathology models on whole-slide images (WSIs). Because WSIs exceed contemporary model context limits, LLM baselines routinely use small, high-magnification patches processed independently via majority voting, without systematic evaluation of seemingly inconsequential design choices such as patch size, patch count, and magnification. Generalist LLMs have consistently underperformed specialized systems, reinforcing the perception that domain-specific training or architectural adaptation is necessary for pathology tasks involving WSIs. Here, we conduct a systematic factorial analysis of four input design factors: inference mode, patch size, magnification, and patch count. We demonstrate that prior studies have overstated the gap between specialized models and general-purpose LLMs by choosing non-optimized input configurations. On the MultiPathQA benchmark, switching to a single balanced configuration (large patches at lower magnification, processed jointly) raises GPT-5 from 15.1% to 39.5% on cancer-type classification (TCGA) and from 38.1% to 62.9% on organ classification (GTEx). Per-task optimization yields further gains up to 43.9% (TCGA) and 71.6% (GTEx). The same configuration generalizes to two other models and to a fully held-out CPTAC cohort, where it improves Gemini 3 Flash by 23.4 percentage points without any task-specific tuning.