Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-12

PiDA: Phonetically-Informed Data Augmentation for Robust Vietnamese Speech Translation

Cascaded speech translation (ST) systems suffer from error propagation when Automatic Speech Recognition (ASR) outputs incorrect transcripts. We present the first systematic categorization of ASR errors for Vietnamese ST, classifying substitution errors by phonetic cause and quantifying their impact on downstream Neural Machine Translation (NMT) performance using Linear Mixed-Effects Modelling. We confirm that most ASR substitution errors arise from phonetic confusions rather than random noise, and that these phonetic errors significantly degrade ST quality. Motivated by this finding, we propose Phonetically-Informed Data Augmentation (PiDA), which generates ASR-like corruptions by substituting words with phonetically similar alternatives using phonetic word embeddings. Fine-tuning on a PiDA-augmented version of FLEURS Vietnamese-English improves translation of erroneous ASR outputs (up to +2.04 BLEU over standard fine-tuning) while also slightly improving clean-text performance.

02.
arXiv (CS.CL) 2026-06-15

Sub-Token Routing for KV Cache Compression

Transformer inference often requires a large KV cache, especially for long-context language modeling and multimodal generation. Existing compression methods usually reduce cache cost by selecting, evicting, quantizing, or compressing cached tokens, or by reducing the visual-token sequence before language-model inference. We introduce sub-token routing, a KV-compression method that adds a finer control axis inside retained tokens. It splits each retained value vector into groups and keeps only selected groups, while leaving query and key states unchanged. The method is designed to work after token-level reduction. First, a token-reduction method determines which tokens are retained. Then, sub-token routing compresses the value states inside those retained tokens. Experiments under matched KV budgets show that adding sub-token routing improves token-level reduction performance in both LLM and VLM settings, including Quest on LLaMA-2-7B and Qwen2.5-7B, and FastV/VisionZip across LLaVA and Qwen-VL models. The gains are larger at smaller KV budgets, suggesting that value-group routing is especially useful when further token removal becomes costly. Overall, token-level reduction and sub-token routing provide complementary ways to reduce KV cost.

03.
arXiv (quant-ph) 2026-06-15

Quantum sensing through bosonic-fermionic Bell-state transitions in two-photon interference

arXiv:2606.14408v1 Announce Type: new Abstract: Hong-Ou-Mandel (HOM) interference has become a central resource for quantum sensing and metrology owing to its sensitivity to temporal delay and photon indistinguishability. However, existing HOM-based sensing schemes generally rely on inserting a sample into one arm of the interferometer, making the measurement vulnerable to optical loss, alignment instability, and bandwidth-dependent distortion of the interference profile. Here, we demonstrate a symmetry-controlled quantum sensing scheme based on continuous transitions between symmetric (bosonic-like) and antisymmetric (fermionic-like) Bell states in two-photon interference. By imprinting a geometric phase onto the classical pump beam and transferring it to polarization-entangled photons generated via spontaneous parametric down-conversion, we coherently tune the exchange symmetry of the entangled state without altering the temporal or spectral indistinguishability of the photons. The HOM response evolves continuously from bunching to antibunching with a sine square phase dependence, producing a coincidence modulation of approximately 10 * 10^4 counts s^-1 counts/s. In contrast to conventional HOM sensing, the phase-modulation linewidth remains fixed at pi/2, independent of photon bandwidth. Using a birefringent crystal placed directly in the pump beam, we measure thermo-dispersive birefringence with a resolution of the order of 10^{-6} over a broad temperature range. Our results establish exchange symmetry as a controllable resource for robust quantum sensing and symmetry-engineered photonic quantum information processing.

04.
arXiv (CS.LG) 2026-06-15

Neural Slack Variables for Shape Constraints

arXiv:2606.13803v1 Announce Type: new Abstract: Enforcing functional inequality constraints such as monotonicity and convexity in neural networks is a fundamental challenge in many industrial and scientific applications. Classical one-sided penalty methods, along with primal-dual methods gated by complementary slackness, provide constraint gradients only at violated locations, resulting in fragile satisfaction. Architectures that guarantee feasibility by construction, on the other hand, remain largely limited to elementary cases and impose additional inductive biases. We introduce neural slack variables, a deep learning native primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network. The auxiliary network serves as a valid target for the primary network's constraint quantities, inducing feasibility and regularity. Neural slack variables achieve zero measured violations on dense-grid monotonicity and convexity test cases, where penalty and primal-dual baselines leave residual violations, and enable arbitrage-free learning of volatility surfaces, an open industrial challenge in quantitative finance.

05.
arXiv (quant-ph) 2026-06-16

Quantum Field-Theoretic Predictions of {\Psi}-Epistemic Models of Quantum Mechanics

arXiv:2605.12546v2 Announce Type: replace Abstract: {\Psi}-epistemic models of quantum mechanics imply that the quantum state does not correspond to physical reality, but instead reflects the observer's knowledge of the underlying quantum system. The epistemic view of the quantum state has the potential to shed light on several foundational problems of quantum theory and has attracted considerable attention in the literature. On the other hand, the Pusey-Barrett-Rudolph theorem demonstrated that broad classes of {\psi}-epistemic models must lead to predictions that deviate from those of quantum mechanics. Although the original theorem involved entangled joint measurements on composite systems, alternative no-go theorems involving measurements on single quantum systems were developed shortly thereafter. Experimental investigations of the deviations predicted by {\psi}-epistemic models from quantum mechanics are still ongoing. So far, such tests have been performed within the framework of non-relativistic quantum mechanics and predominantly rely on quantum information based measurement procedures. In this work, we show that {\psi}-epistemic models can give rise to deviations from standard quantum field-theoretic predictions through modifications of polarized scattering cross sections and decay widths. Our results do not require a relativistic formulation of ontological models or of the Harrigan-Spekkens criterion; the essential assumption is merely that measurements implemented through relativistic processes can still be represented within the ontological framework by well-defined response functions and probabilities. The present work constitutes a proof-of-principle study demonstrating that particle physics tests of the ontological status of the quantum state are possible and that {\psi}-epistemic models may exhibit experimentally distinguishable signatures in particle phenomenology.

06.
arXiv (CS.AI) 2026-06-16

LLMs on Tabular Data with Limited Semantics: Evidence from Industrial Car Retrofit Prediction

arXiv:2606.15314v1 Announce Type: cross Abstract: Industrial retrofit planning depends on structured operational data rather than free text: planners must estimate whether a newly registered prototype will require a retrofit, which retrofit package it will need, and how long the work will take. We study an industrial dataset linking a prototype-registration system (284,271 vehicles) with a retrofit-management system (48,716 cleaned visits), and compare strong tabular machine learning baselines with three LLM-based strategies on row-serialized inputs: embedding features (Amazon Titan), direct prompted classification (Claude Sonnet 4), and an ML+LLM stacking approach. Across binary occurrence prediction, 15-way retrofit-type classification, per-visit duration regression, and an aggregated monthly benchmark, classical tree ensembles remain the strongest standalone models. However, the LLM results reveal a consistent pattern: embeddings remain useful on tables (binary AUC = 0.982), direct prompting collapses once semantic signal is stripped by hashing (binary AUC = 0.500; multiclass weighted F1 = 0.018), and hybrid stacking yields the best manually built multiclass model (weighted F1 = 0.626). On the monthly benchmark, lag-based machine learning outperforms time-series foundation models, though Chronos-small remains competitive in zero-shot forecasting. The results suggest that on privacy-constrained industrial tables, LLMs are more effective as complementary components than as replacements for strong tabular baselines.

07.
arXiv (math.PR) 2026-06-19

Hermite trace polynomials and chaos decompositions for the Hermitian Brownian motion

arXiv:2207.13180v4 Announce Type: replace Abstract: For a non-zero parameter $q$, we define Hermite trace polynomials, which are multivariate polynomials indexed by permutations. We prove several combinatorial properties for them, such as expansions and product formulas. The linear functional determined by these trace polynomials is a state for $q = \frac{1}{N}$ for $N$ a non-zero integer. For such $q$, Hermite trace polynomials of different degrees are orthogonal. The product formulas extend to the closure with respect to the state. The state can be identified with the expectation induced by the $N \times N$ Hermitian Brownian motion. Hermite trace polynomials are martingales for this Brownian motion, while the elements in the closure can be interpreted as stochastic integrals with respect to it. Using the grading on the algebra, we prove several chaos decompositions for such integrals, as well as analyze corresponding creation and annihilation operators. In the univariate, pure trace polynomial case, trace Hermite polynomials can be identified with the Hermite polynomials of matrix argument.

08.
arXiv (CS.CV) 2026-06-17

Blended Chart Surfaces: A Seamless Explicit Representation for Smooth Surface Fitting

A surface representation suitable for geometry processing should be compact and explicit, provide global smoothness guarantees, support a wide range of surface topologies, and offer reliable access to differential quantities such as normals and surface energies, while remaining compatible with modern differentiable optimization. Existing neural representations typically sacrifice one or more of these properties: implicit fields typically require iso-surfacing for downstream use, while explicit neural maps are constrained by canonical-domain parametrizations or exhibit seam artifacts between local charts. We introduce Blended Chart Surfaces, a compact, network-free, explicit representation that is smooth by construction and anchored to user-provided topology. Given a coarse proxy mesh encoding the intended surface topology and approximate geometry, Blended Chart Surfaces jointly optimize for a polynomial map at each proxy vertex using an off-the-shelf optimizer to fit to an implicit target shape, avoiding the need for an input parametrization. Neighboring maps are fused using a smooth 'one-ring coordinate' blending scheme, decoupling topology and coarse geometry (carried by the proxy) from geometric details (carried by the local patches). The surface is globally smooth, fully differentiable, and enables stable evaluation of derivatives, making differential quantities and surface energies directly accessible. Additionally, our construction is equivariant to rigid motions and scaling of the proxy mesh. We evaluate Blended Chart Surfaces on various topologies and geometric complexity, and compare against explicit alternatives including interpolating-function baselines and mesh-displacement MLPs. Across these, Blended Chart Surfaces achieve a favorable trade-off among compactness, simplicity, access to differential quantities, and expressivity while remaining smooth across patch boundaries.

09.
arXiv (CS.CL) 2026-06-16

SAG: SQL-Retrieval Augmented Generation with Query-Time Dynamic Hyperedges

Retrieval-Augmented Generation (RAG) offers an effective approach for large language models to access external knowledge. However, existing methods rely on dense similarity retrieval and face inherent limitations in handling structured constraints and multi-hop reasoning. Incorporating knowledge graphs partially alleviates these issues, but at the cost of semantic fragmentation, high maintenance overhead, and difficult incremental updates. This paper introduces SAG (SQLRetrieval Augmented Generation), a structured architecture for retrieval and agent systems. Instead of pre-building a global static graph, SAG converts each chunk into one semantically complete event and a set of indexing entities, then uses SQL join queries to dynamically link events that share entities into local hyperedges,constructing, at query time, a dynamically instantiated local index structure. This design avoids the need for global graph rebuilding and ongoing maintenance; the system naturally supports incremental writes, concurrent processing, and continuous scaling through its reliance on standard database infrastructure. Across HotpotQA, 2WikiMultiHop, and MuSiQue, three standard multi-hop benchmarks,SAG achieves the best results on 8 out of 9 Recall@K metrics, reaching 80.0% Recall@5 on MuSiQue, the benchmark with the highest multi-hop reasoning demands.SAG has also been deployed at a production scale of hundreds of millions of data items, with online retrieval latency kept within seconds. Project site and code are available at https://github.com/Zleap-AI/SAG-Benchmark.

10.
arXiv (CS.CV) 2026-06-16

Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG, the effective design of prompts capable of disentangling invariant features across diverse domains remains a critical challenge. In this paper, we propose addressing this challenge by leveraging the controllable and flexible language prompt of the VFM. Noting that the text modality of VFMs is naturally easier to disentangle, we introduce a novel framework for text feature-guided visual prompt tuning. This framework first automatically disentangles the text prompt using a large language model (LLM) and then learns domain-invariant visual representation guided by the disentangled text feature. However, relying solely on language to guide visual feature disentanglement has limitations, as visual features can sometimes be too complex or nuanced to be fully captured by descriptive text. To address this, we introduce Worst Explicit Representation Alignment (WERA), which extends text-guided visual prompts by incorporating an additional set of abstract prompts. These prompts enhance source domain diversity through stylized image augmentations, while alignment constraints ensure that visual representations remain consistent across both the original and augmented distributions. Experiments conducted on major DG datasets, including PACS, VLCS, OfficeHome, DomainNet, and TerraInc, demonstrate that our proposed method outperforms state-of-the-art DG methods.

11.
arXiv (CS.LG) 2026-06-19

Stabilizing Bandits using Regularization: Precise Regret and A Quantitative Central Limit Theorem

arXiv:2603.10184v2 Announce Type: replace-cross Abstract: Statistical inference with bandit data presents fundamental challenges owing to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability~\citep{laiwei82} as a sufficient condition for valid inference under adaptivity. This paper first provides a refined stability condition, stated in terms of the iterates of an online algorithm, and shows that a large class of regularized stochastic-mirror-descent-style algorithms satisfy it. This refined condition allows us to strengthen the asymptotic results of~\citet{laiwei82} in several ways. First, we derive a non-asymptotic Berry–Esseen bound for the empirical reward estimates under adaptive sampling. Second, we derive matching non-asymptotic upper and lower bounds on the regret of the proposed algorithm, yielding a precise characterization of its regret. Third, we show that these regularized algorithms preserve asymptotic normality and valid inference under a prescribed level of adversarial corruption. Finally, we show that regularization is necessary rather than incidental: Lai–Wei stability is incompatible with the optimal $O(\sqrt{T})$ regret rate – the rate attained by unregularized algorithms such as EXP3 – so that a controlled, polylogarithmic inflation in regret is the price of valid inference.

12.
arXiv (quant-ph) 2026-06-19

Single-Step Phase-Engineered Pulse for Active Readout Cavity Reset in Superconducting Circuits

arXiv:2512.08393v2 Announce Type: replace Abstract: In a circuit QED architecture, we experimentally demonstrate a hardware-efficient and qubit-state-dependent Single-Step Phase-Engineered (SSPE) pulse scheme for actively depopulating a readout cavity. The protocol appends a reset segment with tailored amplitude and phase to a standard square readout pulse. Within the linear-response regime, the optimal reset amplitude scales proportionally with the readout amplitude, while the optimal reset phase remains invariant, significantly simplifying the experimental calibration procedure. Time-resolved measurements of the cavity photon number dynamics demonstrate that the SSPE scheme significantly outperforms the CLEAR protocol in terms of reset speed. Crucially, this approach enables arbitrarily fast, overshoot-free depletion of the cavity photon population, with the ultimate reset rate constrained by the finite analog bandwidth of the measurement chain. Furthermore, a comprehensive evaluation of the QND nature demonstrates that the SSPE scheme introduces no additional non-QND measurement errors. It exhibits non-QNDness comparable to both the free-decay and CLEAR protocols, with residual errors predominantly governed by state switching induced by qubit relaxation during the readout process. Thses results establish the SSPE scheme as a practical and scalable approach for achieving rapid and smooth cavity reset in superconducting quantum circuits.

13.
arXiv (CS.AI) 2026-06-19

Improving Code-Switching ASR with Code-Mixing Guided Synthetic Speech

arXiv:2606.19381v1 Announce Type: cross Abstract: Code-switch (CS) Automatic Speech Recognition (ASR) remains challenging due to limited availability of high quality CS text-speech pairs for training. Although synthetic data augmentation via Text-to-speech (TTS) has been explored, existing CS TTS approaches primarily optimise reconstruction fidelity and do not explicitly enforce language-boundary consistency, thereby limiting their effectiveness for CS ASR augmentation. This paper proposes a code-mixing guided preference-learning framework that steers synthetic speech generation toward improved code-switching fidelity using the Code Mixing Index (CMI). Experiments on the SEAME Mandarin-English conversational corpus demonstrate that the proposed method enhances the utility of synthetic data for ASR fine-tuning. Specifically, when fine-tuning Whisper Large, the proposed approach reduces Mixed Error Rate (MER) from 12.1%/17.8% to 8.9%/14.2% on the DevMAN and DevSGE sets, respectively.

14.
arXiv (CS.CV) 2026-06-11

MARIC: Multi-Agent Reasoning for Image Classification

Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine tuning to achieve competitive performance. While recent vision language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi Agent based Reasoning for Image Classification (MARIC), a multi agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on 4 diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.

15.
arXiv (math.PR) 2026-06-17

Convergence Analysis of the Random Bisection Method

arXiv:2603.20483v2 Announce Type: replace-cross Abstract: We propose a generalized version of the bisection method where the cutting point between the two subintervals is chosen at random following an arbitrary distribution. We compute expected convergence rates with respect to any arbitrary a priori distribution for the position of the root in the initial interval and proved that it depends only on the the expectation $\mathbb{E}[c(1-c)]$ of the cut $c$. We also provide a generalization of the method for $K$ random cuts and study its convergence properties. Most probabilistic derivations are kept fairly simple for the ease of understanding of a larger audience. Our theoretical results are then validated numerically using statistical simulation.

16.
arXiv (CS.AI) 2026-06-11

Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

arXiv:2604.24662v2 Announce Type: replace-cross Abstract: Identifying the dynamical state variables of a system from high-dimensional observations is a central problem across physical sciences. The challenge is that the state variables are not directly observable and must be inferred from raw high-dimensional data without supervision. Here we introduce DySIB (Dynamical Symmetric Information Bottleneck) as a method to learn low-dimensional representations of time-series data by maximizing predictive mutual information between past and future observation windows while penalizing representation complexity. This objective operates entirely in latent space and avoids reconstruction of the observations. We apply DySIB to an experimental video dataset of a physical pendulum, where the underlying state space is known. The method, with hyperparameters of the learning architecture set self-consistently by the data, recovers a two-dimensional representation that matches the dimensionality, topology, and geometry of the pendulum phase space, with the learned coordinates aligning smoothly with the canonical angle and angular velocity. These results demonstrate, on a well-characterized experimental system, that predictive information in latent space can be used to recover interpretable dynamical coordinates directly from high-dimensional data.

17.
medRxiv (Medicine) 2026-06-12

An integrative multi-omics framework identifies epigenetic dysregulation of HAND2 as a potential primary driver of impaired enteric neural crest cell differentiation in Hirschsprung Disease

Hirschsprung disease (HSCR) is a congenital neurodevelopmental disorder characterized by segmental aganglionosis due to impaired developmental processes of enteric neural crest cells (NCCs). Despite being the leading genetic cause of functional intestinal obstruction in early childhood, HSCR represents a paradigmatic challenge in precision medicine: its multifactorial etiology, complex gene-environment interactions and limited resolution of single-modality analyses have long hindered mechanistic understanding and therapeutic translation. Here, we applied an integrative multi-omics approach combining genetic, phenotypic, epigenomic and transcriptomic analyses of matched ganglionic and aganglionic formalin-fixed paraffin-embedded (FFPE) patient tissues, complemented by patient-specific in vitro models. Beyond established genetic contributors, our integrative approach reveals novel regulatory pathways predominantly affecting enteric NCC differentiation, with convergent evidence pointing to epigenetic dysregulation as a primary disease mechanism. Notably, we identified over 1,300 differentially methylated positions between ganglionic and aganglionic FFPE samples, with HAND2 emerging as a key candidate due to multiple hypermethylated sites and consistently reduced expression levels in aganglionic tissues and in vitro models, suggesting a potential role in HSCR pathophysiology. We propose that our multi-omics approach offers a powerful and comprehensive framework for dissecting disease mechanisms. Beyond advancing biological understanding, this strategy holds promise for paving the way for molecularly informed patient stratification and supporting the development of personalized treatment and postoperative management strategies.

18.
arXiv (CS.CL) 2026-06-18

UniECG: Understanding and Generating ECG in One Unified Model

Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step toward interactive ECG education. UniECG supports two complementary learning interactions: given an ECG signal or image, it generates an evidence-based explanation; given a textual learning objective, it generates a corresponding ECG signal example for case-based learning. The model follows a two-stage design. First, it learns grounded ECG explanation from ECG signal–image–text data. Second, it introduces special ECG generation tokens and aligns their hidden representations with a pretrained text-conditioned ECG diffusion model, enabling controllable signal-level ECG generation. We evaluate UniECG through grounded ECG explanation and generation-oriented qualitative analysis, examining its potential to support explanation and case-based learning. UniECG is intended as an educational aid and a research step toward interactive AI-assisted ECG learning, rather than a clinically validated diagnostic system.

19.
arXiv (CS.CV) 2026-06-16

Clinically Aware Synthetic Image Generation for Concept Coverage in Chest X-ray Models

Deep learning models for chest X-ray diagnosis are constrained by limited coverage of clinically meaningful concept combinations in publicly available training datasets. While synthetic image generation has been explored to increase data diversity, existing methods rarely enforce clinical or anatomical constraints, limiting utility for improving model reliability. We propose CARPA, a clinically aware and anatomically grounded framework for synthetic chest X-ray generation that applies targeted perturbations to clinical concept vectors while preserving anatomical structure. By producing anatomically faithful synthetic images with controlled concept insertions and deletions, CARPA expands clinically relevant concept coverage. We evaluate CARPA across seven backbone architectures by fine-tuning models on synthetic subsets and testing on a held-out MIMIC-CXR benchmark. Compared to prior concept perturbation approaches, fine-tuning on CARPA-generated images consistently improves precision-recall performance, reduces predictive uncertainty, and improves model calibration. Structural and semantic analyses demonstrate high anatomical fidelity, strong concept alignment, and low semantic uncertainty. Evaluation by two expert radiologists further confirms realism and clinical agreement. Together, these results show that anatomically grounded concept perturbations enable more effective use of synthetic data, improving both performance and reliability of chest X-ray classification models and supporting safer clinical deployment.

20.
arXiv (CS.AI) 2026-06-12

Rarity-Gated Context Conditioning for Offline Imitation Learning-Based Maritime Anomaly Detection

arXiv:2606.13311v1 Announce Type: cross Abstract: Contextual anomaly detection aims to identify abnormal behavior conditional on context variables, but practical deployments often face highly imbalanced context distributions where rare regimes can be critical information. Under such frequency bias, context-conditioned models can produce unstable decisions and excessive false alarms in rare contexts. We propose Rarity-Gated Feature-wise Linear Modulation (RGFiLM), a rarity-aware conditioning module that combines feature-wise modulation (i.e., context-conditioned scaling and shifting of hidden features) with a gate controlled by a data-driven rarity score. The rarity score is estimated from the empirical distribution of context variables and regulates how strongly context modulates intermediate representations: the gate becomes more decisive under rare contexts while remaining conservative under frequent contexts. We evaluate RGFiLM on maritime trajectory anomaly detection using AIS motion sequences with ERA5 environmental context in an environment-sensitive detour scenario. When instantiated in a sequential anomaly scoring pipeline, RGFiLM achieves the best mean F1–False Positive Rate (FPR) trade-off among the compared context-agnostic and context-conditioned methods. These results suggest that explicitly accounting for context rarity is an effective approach for reducing false alarms in context-sensitive anomaly detection.

21.
arXiv (CS.CV) 2026-06-15

SA4Depth: Consistent Pose-Depth Scale Alignment for Self-Supervised Monocular Depth Estimation

Self-supervised depth estimation from monocular sequences relies on the joint learning of a depth and a pose network. Despite abundant research done to improve the depth network, efforts on the pose remain limited. In this context, even when depth is estimated up to scale, we highlight the importance of the alignment between the scene scales estimated by the pose and depth nets. Then, we introduce SA4Depth, an approach to improve this alignment and boost the depth predictions while keeping the inference time unchanged. Our proposed method uses the depth estimated during training to reproject learnable visual features across consecutive frames and refine the pose estimates by reducing feature alignment residuals. With our method, the estimated scene scales by the separate depth and pose networks are aligned, and the prediction scale consistency is improved across different sequences. Our differentiable refinement integrates seamlessly into existing self-supervised pipelines and substantially improves their depth estimates. We demonstrate this with extensive experiments both outdoors and indoors on KITTI, Cityscapes, and NYUv2. Additionally, results on KITTI Odometry confirm the effectiveness of our pose refinement. Our code is available at https://github.com/Runningchauncey/SA4Depth .

22.
arXiv (CS.LG) 2026-06-16

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

arXiv:2606.07622v2 Announce Type: replace Abstract: Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

23.
arXiv (CS.CV) 2026-06-17

Complex Layout Classification in the Wild: A Low-Resource Approach with Layout-Preserving Augmentations

Many digitized corpora suffer from low resources because annotations may be scarce, page scans are noisy and of poor resolution, or layouts are structurally complex in ways that negatively affect the quality of automatic transcription. Developing robust classification models for low-resource languages is inhibited by the lack of large-scale annotated data and by the frequent semantic complexity of page layouts. To this end, we have curated a complex-layout dataset, manually classified into eight distinct layout types based on their separator regions. To overcome data scarcity, we propose a novel training strategy in the form of a CNN-based classifier that employs strong, domain-aware augmentations to improve generalization. We utilize narrow anisotropic Gaussian masking to suppress incidental textual details while preserving essential separations, compelling the model to learn global geometric arrangements. Additionally, we implement reflection-induced label transformations to enrich the training distribution while maintaining label consistency across asymmetric categories. The results demonstrate that layout-specific augmentations can substantially improve page-level layout classification under severe annotation scarcity.

24.
arXiv (math.PR) 2026-06-11

Sure-almost-sure and Sure-limit-sure Window Mean Payoff in Markov Decision Processes

arXiv:2605.12191v2 Announce Type: replace-cross Abstract: Given rationals $\alpha$ and $\beta$, the sure-almost-sure problem for a threshold Boolean objective $\varphi$ in a Markov decision process (MDP) asks if one can simultaneously ensure that all outcomes of the MDP have $\varphi$-value at least $\alpha$ (i.e. sure $\alpha$ satisfaction) and with probability $1$ the outcome has $\varphi$-value at least $\beta$ (i.e. almost-sure $\beta$ satisfaction). The sure-limit-sure problem asks if for all $\varepsilon > 0$ one can simultaneously ensure that all outcomes have $\varphi$-value at least $\alpha$ and with probability at least $1 - \varepsilon$ the outcome has $\varphi$-value at least $\beta$. Moreover, if simultaneous satisfaction of objectives is possible, then one would also like to construct a strategy (for sure-almost-sure) or a family of strategies (for sure-limit-sure) that achieves this. In this paper, we solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. The window mean-payoff objective strengthens the standard mean-payoff objective by requiring that eventually, from every point in the infinite run, the average payoff becomes greater than a given threshold within a finite window length. We study two variants of window mean payoff: in the fixed variant, the window length $\ell$ is given, while in the bounded variant, the length is not given but is required to be bounded throughout the run. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if $\ell$ is given in unary) and are both in NP $\cap$ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.

25.
arXiv (math.PR) 2026-06-16

Structure preserving properties of higher order moment closures for TASEP

arXiv:2604.15925v2 Announce Type: replace-cross Abstract: The totally asymmetric simple exclusion process (TASEP) is a stochastic model for the unidirectional flow of interacting particles on a 1D-lattice that is much used in systems biology and statistical physics. Its master equation describes the evolution of the probability distribution on the configuration space. The size of the master equation grows exponentially with the length of the lattice. It is known that the complexity of the system may be reduced using mean-field approximations. We provide a rigorous definition of a family of such models using moments of any order and an extension to the pair approximation for obtaining closures for the system. The dimension of these models grows linearly with the lattice size and exponentially in the order of the approximation. Moreover, we show that the states of these models still have a probabilistic interpretation and that basic structural properties of the master equation are preserved. This extends known results on the Ribosome Flow Model which can be viewed as the first order approximation for TASEP.