Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (math.PR) 2026-06-11

The Statistical Compass

arXiv:2606.11282v1 Announce Type: cross Abstract: This monograph develops probability and stochastic-process ideas as a translation language for statistics: from designed observations and data objects to targets, stability statements, inference, and use. The chapters move from motivating examples and randomization through probability measures, kernels, likelihoods, data objects, weak convergence, empirical fields, functional data, M- and Z-estimation, testing, local approximations, event-time processes, and prediction. Historical and biomedical examples are used to keep abstract objects tied to records, mechanisms, and decisions. The aim is to give readers a common grammar for classical probability, modern data structures, and statistical practice.

02.
arXiv (CS.LG) 2026-06-11

SpAArSIST: Sparsified AASIST for Efficient and Reliable Anti-Spoofing

arXiv:2606.11674v1 Announce Type: cross Abstract: We present SpAArSIST, a deployment-oriented refinement of the widely used AASIST graph pooling backend for self-supervised learning (SSL) based anti-spoofing. Motivated by redundant operations in public implementations, we replace learned pooling and stack-node attention with explicit, lightweight choices: separate train and inference graph pooling ratios $(k_{\mathrm{tr}},k_{\mathrm{inf}})$, magnitude-based node scoring, and mean aggregation of graph nodes. The best overall configuration (rank 1) cuts backend compute by 20.7% (195.045M $\rightarrow$ 154.706M MACs) and model size by 4.1% (611.8k $\rightarrow$ 586.4k params), while improving out-of-domain robustness on In-the-Wild to 2.82% EER and 0.078 minDCF (from 4.64% and 0.133) and remaining competitive on ASVspoof5. We further provide a composite selection score that summarizes accuracy, calibration, and compute to support balanced deployment-oriented model choice.

03.
arXiv (CS.AI) 2026-06-12

Democracy in the Era of Artificial Intelligence

arXiv:2606.13026v1 Announce Type: cross Abstract: Interfacing Artificial Intelligence (AI) with democracy is one of the most profound challenges of our times. On the one hand, AI comes with opportunities to overcome long-standing challenges in democracy, such as low participation in deliberative and voting processes with poor representation of people. On the other hand, new risks arise from AI algorithms that are privacy-intrusive, biased, manipulative, spread misinformation and influence election results. Moving beyond the over-simplistic question of whether AI is good or bad for democracy, the Handbook on Democracy in the Era of Artificial Intelligence asks instead: how to upgrade democracies and the principles they are built on, using AI? How to engage with AI and on what terms? Which new values and design principles are required to build democratic resilience? In 34 chapters by 59 authors across the world from different disciplines, we explore how AI can empower collective intelligence for democracy (Part 1) and what is the future of deliberative democracy using large language models and social media (Part 2). We also illustrate the role of AI for building resilient self-governance systems (Part 3) and the challenges of transforming democracy in the age of AI (Part 4). We conclude with broader perspectives (Part 5) that re-imagine the interplay of democracy and AI.

04.
arXiv (quant-ph) 2026-06-19

Spatial Localization of Relativistic Quantum Systems: The Commutativity Requirement and the Locality Principle. Part II: A Model from Local QFT

arXiv:2604.04173v3 Announce Type: replace-cross Abstract: This paper is the second and final part of a two-part study. We construct positive-energy relativistic spatial localization observables in Minkowski spacetime within standard quantum field theory, using the stress–energy–momentum tensor smeared with suitable test functions. For each fixed timelike direction, the construction gives positive operator-valued measures (POVMs) on spacelike hypersurfaces, well defined on every $n$-particle sector and satisfying a relativistic causality condition excluding superluminal propagation of detection probabilities. The observables are built from local or quasi-local field-theoretic quantities, thus providing a rigorous version of earlier heuristic proposals. In the one-particle sector, the construction reduces to the observable previously introduced by the author, and its first moment gives the Newton–Wigner position operator under appropriate normalization and centering assumptions. Because the Reeh–Schlieder theorem prevents the normally ordered stress–energy–momentum tensor from being positive on the full Fock space, we use quantum energy inequalities to obtain lower bounds controlling deviations from positivity. This leads to regularized operator families, bounded from below, which approximate the localization effects. Finally, we define conditional localization observables for finite laboratories through modified local energy operators. By Haag duality, the corresponding conditional POVMs belong to local von Neumann algebras and commute for causally separated regions, in accordance with the Araki–Haag–Kastler framework. The results show how commutativity of localization observables is recovered for conditional measurements in finite spacetime regions.

05.
arXiv (CS.CL) 2026-06-16

LLM-based Visual Code Completion for Aerospace Geometric Design

Recent advances in both Large Language Models (LLMs) and Vision Language Models (VLMs) have seen a step change in their ability to perform visual code completion, but the aerospace industry, which prioritizes safety and explainabilty over rapid LLM adoption, currently has no publicly announced LLM-based geometric design copilot systems in commercial use by aerospace Original Equipment Manufacturers (OEMs). This paper presents a LLM-based visual programming copilot application for aerospace engineering design tasks, using a visual programming variant of the ReAct methodology and GPT 5.4. In addition to the copilot, we describe Wingbuilder, a new Grasshopper plugin library with custom components for aerospace-specific geometry abstraction, and an associated Aerospace Visual Programming Dataset (AVPD) with 18 aerospace expert designed tasks at different levels of difficulty alongside ground truth solutions. We evaluate our copilot application with a user trial involving two experienced aerospace engineers from a large aircraft manufacturing company. We find our copilot visual programming ReAct methodology was successful in generating suggestions that participants found helpful, but slow ReAct inference times limit its usefulness to more complex time-consuming tasks where waiting for good copilot solution suggestion was worthwhile. Participants reported they liked the tool and would be willing to use it in the future.

06.
arXiv (CS.LG) 2026-06-12

Mixing Makes Markovian Contexts Cheap for Linear Bandits

arXiv:2603.12530v2 Announce Type: replace Abstract: Recent work shows that when contexts are drawn i.i.d., linear contextual bandits can be reduced to single-context linear bandits. This ``contexts are cheap'' perspective is highly advantageous, as it allows for sharper finite-time analyses and leverages mature techniques from the linear bandit literature, such as those for misspecification and adversarial corruption. However, this reduction crucially relies on the independence of contexts and does not extend to settings with temporally correlated (e.g., Markovian) contexts, which arise frequently in practice. Motivated by applications with temporally correlated availability, we extend this perspective to linear bandits with Markovian context processes, where the action set evolves via an exogenous Markov chain. Our main contribution is a reduction that applies under uniform geometric ergodicity. We construct a stationary surrogate action set to solve the problem using a standard linear bandit oracle, employing a delayed-update scheme to control the bias induced by the nonstationary conditional context distributions. We further provide a phased algorithm for unknown stationary distributions that learns the surrogate mapping online. In both settings, we obtain a high-probability worst-case regret bound matching that of the underlying linear bandit oracle in sufficiently fast mixing regimes. We then validate our results on a real-world instance, where we show practical gains over a LinUCB baseline.

07.
bioRxiv (Bioinfo) 2026-06-13

Virus-human protein-protein interactions predict viral phenotypes

Viral phenotypes such as host and tissue tropism are critical determinants of viral infection and transmission. Inferring viral phenotypes presents unique challenges compared to cellular organisms, as viruses rely entirely on host machinery for replication and survival. Current methods for predicting viral phenotypes mainly rely on viral genomic data, often overlooking host-related information. Here, we evaluated the utility of predicted virus-human protein-protein interactions (PPIs) in inferring diverse viral phenotypes using machine-learning algorithms. For predicting human infectivity, a PPI-based machine learning model outperformed both virus genomic and protein sequence-based models that used large language model embeddings. It also surpassed previous methods that incorporated both viral and host genomic data. The human proteins identified by the model were significantly enriched in functions related to viral infection and immune response. In predicting various phenotypes of human RNA viruses, PPI-based models performed better than virus sequence-based models in forecasting virulence, human transmissibility and transmission routes, while showing comparable performance to genomic sequence-based models in predicting tissue tropism. Finally, we demonstrated that a PPI-based model could distinguish high-risk HPV genotypes from low-risk ones. Proteins associated with high-risk HPV were involved in apoptosis and immune regulation, whereas those linked to low-risk HPV were enriched in telomere maintenance and DNA repair. Collectively, this study is the first to demonstrate the value of predicted virus-human PPIs in inferring viral phenotypes, thereby enhancing our understanding of the molecular mechanisms underlying these phenotypes. It also provides effective tools for risk assessment of emerging viruses, contributing to improved pandemic preparedness.

08.
arXiv (CS.AI) 2026-06-11

Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL

arXiv:2307.01472v2 Announce Type: replace Abstract: We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion model. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-reweighting scheme in training. These key ingredients significantly improve algorithm robustness against environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in all multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better to shifted environments {(in $28$ out of $30$ settings evaluated)} thanks to its high expressiveness and diversity. Moreover, DOM2 is ultra data efficient and requires no more than $5\%$ data for achieving the same performance compared to existing algorithms (a $20\times$ improvement in data efficiency).

09.
arXiv (CS.AI) 2026-06-18

DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models

arXiv:2606.18557v1 Announce Type: new Abstract: A rule-based logic solver resolves every instance in our benchmark in under 50 microseconds with 100% accuracy; the best frontier language model reaches 65% at best and drops to 23.5% under rendering-robust evaluation (worst case over four surface renderings). We introduce DeFAb (Defeasible Abduction Benchmark), a dataset and generation pipeline that converts four decades of publicly funded knowledge bases into formally grounded instances for defeasible abduction: constructing hypotheses that explain anomalies by overriding defaults while preserving unrelated expectations. Because every hypothesis must pass polynomial-time checks for valid derivation, conservativity, and minimality, DeFAb makes logical rigor the instrument for measuring creativity and theoretical reasoning, scoring the disciplined construction of theory revisions rather than fluent but theory-destroying prose. The pipeline pairs taxonomic hierarchies (OpenCyc, YAGO, Wikidata) with behavioral property graphs (ConceptNet, UMLS) to produce 372,648+ instances across 33.75M materialized rules from 18 sources, in three levels with polynomial-time verifiable gold standards. Four frontier models do not reliably internalize defeasible reasoning: rendering-robust Level 2 accuracy is 7.8-23.5%; chain-of-thought variance (~36 pp) exceeds any inter-model gap; and a matched contamination control isolates a +19.4 pp Level 3 gap. We further release DeFAb-Hard (a 235-instance Level 3 difficulty variant; best model 53.3% vs 100% symbolic) and CONJURE (a kernel-verified transformative-creativity variant of 560 Lean 4/Mathlib instances whose gold answers are definitions the proof kernel did not previously contain, judge-free verifier; a pilot finds zero novel concepts). The same verifier doubles as an exact reward for preference optimization (DPO, RLVR/GRPO). Released under MIT at https://huggingface.co/datasets/PatrickAllenCooper/DeFAb.

10.
arXiv (quant-ph) 2026-06-16

Quantum Energy Teleportation under Equilibrium and Nonequilibrium Environments

arXiv:2511.01518v3 Announce Type: replace Abstract: Quantum energy teleportation (QET), implemented via local operations and classical communication, enables carrier-free energy transfer by exploiting quantum resources. While QET has been extensively studied theoretically and validated experimentally in various quantum platforms, enhancing energy output for mixed initial states, as the system inevitably interacts with environments, remains a significant challenge. In this work, we study QET performance in a two-qubit system coupled to equilibrium or nonequilibrium reservoirs. We derive an analytical expression for the energy output in terms of the system Hamiltonian eigenstates, enabling analysis of energy output for mixed states. Using the Redfield master equation, we systematically examine the effects of qubit detuning, nonequilibrium temperature difference, and nonequilibrium chemical potential difference on the energy output. We find that the energy output for mixed states often follows that of the eigenstate with the highest population, and that nonequilibrium environments can enhance the energy output in certain parameter regimes.

11.
arXiv (CS.CL) 2026-06-16

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

Large language models (LLMs) have shown promising first-order logic (FOL) reasoning capabilities with applications in various areas. However, their effectiveness in complex mathematical reasoning involving multi-step FOL deductions is still under-researched. While LLMs perform competitively on established mathematical reasoning benchmarks, they struggle with multi-step FOL tasks, as demonstrated by Deepseek-Prover-V2-7B's low accuracy (4.2%) on our proposed theorem proving dataset. This issue arises from the limited exploration of diverse proof strategies and the potential for early reasoning mistakes to undermine entire proofs. To address these issues, we propose DREAM, a self-adaptive solution that enhances the Diversity and REAsonability of LLMs' generation strategies. DREAM incorporates an Axiom-Driven Strategy Diversification mechanism to promote varied strategic outcomes and a Sub-Proposition Error Feedback to help LLMs reflect on and correct their proofs. Our contributions include pioneering advancements in LLMs' mathematical reasoning through FOL theorem proving, introducing a novel inference stage solution that improves performance by 0.6% to 6.4%, and providing a curated dataset of 447 mathematical theorems in Lean 4 format for evaluation.

12.
bioRxiv (Bioinfo) 2026-06-18

Accounting for allelic diversity and multicopy gene detection improves the accuracy of antibiotic resistance genotypic determination

Background Genomic prediction of antimicrobial resistance (AMR) relies on the accurate detection of resistance genes or allelic variants of core genes from raw or assembled genomes sequences. For several bacterial species and antibiotics, AMR genotype-phenotype discrepancies are common, indicating that important sources of error remain unresolved. For Enterococcus faecium, we focused on identifying the sources of discrepancies for tetracycline resistance, for which genotypic detection had shown particularly low accuracy. We investigated the effect of structural variation in antibiotic resistance genes (ARGs), including gene duplications, truncations, interruptions, and mixed configurations of complete and partial gene copies, as a source of genotype-phenotype discrepancies from short-read data. We conduct further extended investigations to other antibiotic families and into another bacterial species: Escherichia coli. Methods We analyzed collections of E. faecium and E. coli genomes, integrating high-quality complete assemblies, simulated Illumina short reads, and matched AMR phenotypic data. The integrity, copy number, and allelic diversity of ARGs were examined for multiple antibiotic classes, and their impact on ARG detection and accuracy of AMR determination was assessed using several commonly used bioinformatic tools (SRST2, ARIBA and AMRFinderPlus). Results For E. faecium, after ruling out the effect of specific tet allelic variants on tetracycline susceptibility, we found that the integrity and copy number of tet(M) had a major effect on detection accuracy. Duplicated and incomplete ARGs are also common in E. faecium genomes, particularly for macrolides (erm(B)) and aminoglycosides (ant(6)-Ia and aph(3')-IIIa). In E. coli, similar patterns were observed for tet(A), erm(B) and aminoglycoside-associated genes (aph(3')-IIIa and ant(6)-Ia). Across ARGs in both species, short-read mapping methods wrongly reported interrupted genes as complete in some instances, while assembly-based methods often failed to resolve complete copies of duplicated genes. Detection accuracy improved when tools were adapted to account for gene integrity and when extended AMR databases incorporating species-specific alleles were included. Conclusions Our findings reveal that bioinformatic limitations in dealing with ARG copy number and completeness, and in accounting for allelic variation, underly a substantial source of genotype-phenotype errors, highlighting the need for improved AMR databases and bioinformatic tools that consider these factors to achieve reliable genomic prediction of AMR.

13.
arXiv (CS.LG) 2026-06-16

A polarity-aware multi-relational model for the signed interaction prediction in biological networks

arXiv:2407.07357v3 Announce Type: replace Abstract: Predicting signed interactions in biological networks is crucial for understanding drug mechanisms and facilitating drug repurposing. While deep graph models have demonstrated success in modeling complex biological systems, existing approaches often fail to distinguish between positive and negative interactions, limiting their utility for precise pharmacological predictions. In this study, we propose a novel deep graph model, PAMR (polarity-aware multi-relational model), designed to predict both polar (e.g., activation, inhibition) and non-polar (e.g., binding, affect) chemical-gene interactions. Our model integrates graph convolutional networks with tensor decomposition to enhance feature representation and incorporates a conflict-aware sampling strategy to resolve polarity ambiguities. We introduce new evaluation metrics, polarity discrimination score (PDS) and CP@100, to assess the model's ability to differentiate interaction types. Experimental results demonstrate that PAMR outperforms baseline models, achieving superior classification accuracy and improved discrimination of polar edges. Specifically, PAMR-CL attains a Macro AUROC of 0.9072 and CP@100 of 0.974, surpassing RGCN, GraphSAGE, TransE, and BioNet baselines. A case study on nicotine further identifies two novel chemical-gene suppression links, S100A6 and SPP1, that are corroborated by independent experimental literature. Furthermore, we analyze the impact of subgraph components on predictive performance, revealing that additional network structures do not always enhance accuracy. These findings highlight the importance of polarity-aware modeling in drug discovery and network pharmacology, providing a scalable computational framework for polarity-aware chemical-gene interaction prediction and network pharmacology analysis.

14.
arXiv (CS.AI) 2026-06-15

When Errors Become Narratives: A Longitudinal Taxonomy of Silent Failures in a Production LLM Agent Runtime

作者:

arXiv:2606.14589v1 Announce Type: cross Abstract: LLM agent systems increasingly run as long-lived autonomous runtimes: scheduling jobs, calling tools, maintaining memory, and pushing results to humans. We present a longitudinal study of silent failures in one such system: a personal-assistant agent runtime in continuous production since March 2026, with roughly 40 scheduled jobs, 8 LLM providers, a tool-governance proxy, and a knowledge-base memory plane, defended by 4,286 unit tests and 827 governance checks. Over eight weeks we documented 22 incidents with full root-cause postmortems, in which one meta-pattern – a failure whose error signal never reaches a human in actionable form – manifested at least 28 times. We derive a five-class, mechanism-oriented taxonomy: (A) environment and platform quirks, (B) design-assumption mismatches, (C) error swallowing and dilution, (D) chained hallucination and fabrication, (E) operational omission and forensic blind spots. Class D is unique to LLM systems and the most dangerous: the system does not merely fail to report an error – the LLM transforms it into fluent, plausible narrative delivered to the user. We term this fail-plausible: gray failure's differential observability escalated – the observer is not just blind, it is convincingly lied to by the failure itself. Three findings: about 70% of silent failures were caught by human user-view observation, not tests or audits; a retrospective audit of 15 incidents found 0% ex-ante prevention but 87% regression blocking – audits are regression engines, not prediction engines; incident latency (13 hours to 60 days) tracks failure mechanism, not code complexity – the longest-lived failures lived in the seams between components, where no test runs. We describe the resulting defense framework and distill design principles for agent systems whose failures are loud, attributable, and boring. All postmortems and artifacts are public.

15.
arXiv (math.PR) 2026-06-17

Convergence rate of Euler–Maruyama scheme to the invariant probability measure under total variation distance for the SDEs

arXiv:2505.04218v3 Announce Type: replace Abstract: This article shows the geometric decay rate of Euler-Maruyama scheme for one-dimensional stochastic differential equation towards its invariant probability measure under total variation distance. Firstly, the existence and uniqueness of invariant probability measure and the uniform geometric ergodicity of the chain are studied through introduction of non-atomic Markov chains. Secondly, the equivalent conditions for uniform geometric ergodicity of the chain are discovered, by constructing a split Markov chain based on the original Euler-Maruyama scheme.

16.
arXiv (CS.AI) 2026-06-16

MimicIK: Real-Time Generative Inverse Kinematics from Teleoperation with FK Consistency

arXiv:2606.15148v1 Announce Type: cross Abstract: Inverse kinematics (IK) remains a critical bottleneck for real-time robot manipulation. Classical numerical solvers achieve high geometric precision but often suffer from discontinuous branch switching and unstable behavior near kinematic singularities during closed-loop deployment. Meanwhile, learned IK approaches frequently struggle to balance spatial accuracy, motion smoothness, and real-time efficiency, particularly when trained on noisy human teleoperation data. We present MimicIK, a real-time generative inverse kinematics framework that learns smooth and robust joint-space motion priors from teleoperation demonstrations through conditional flow matching. Given the current joint configuration and a target end-effector pose, MimicIK predicts continuous delta-joint commands using an efficient two-step iterative refinement process based on a Minimal Iterative Policy (MIP) backbone. To enforce physical consistency, we further introduce an FK consistency loss, a differentiable forward-kinematics regularization that penalizes task-space deviations from the target pose during training. We evaluate MimicIK on a real-world 6-DOF robot dataset containing 8,848 teleoperation demonstrations. MimicIK achieves a mean position error of 4.65 mm, a 10 mm success rate of 92.01\%, and a trajectory spike rate of only 7.99\%. Compared with a UNet diffusion baseline, our method improves both spatial accuracy and motion smoothness while reducing inference latency from 21.66 ms to 6.74 ms. Furthermore, unlike deterministic MLP baselines that catastrophically diverge under out-of-distribution deployment, MimicIK remains stable near singular configurations and enables robust 20 Hz real-time control on deployment hardware.

17.
arXiv (quant-ph) 2026-06-16

Connecting entanglement growth with local integrals of motion in the disordered Fermi-Hubbard model

arXiv:2606.15481v1 Announce Type: new Abstract: Generically a quantum system initialized in an unentangled state will, under unitary dynamics, rapidly become entangled, a process closely related to information transport and to thermalization. Disorder can suppress the growth of entanglement and result in memory of initial conditions. In non-interacting systems this arises from localization of single-particle states, the occupancy of which is fixed by the initial condition. In interacting systems similar localized conserved quantities persist, but with the added feature that they are coupled, resulting in entanglement growth which is distinct from both non-interacting localized systems and from generic ergodic systems. The Fermi-Hubbard model has two degrees of freedom per site – charge and spin – and disorder may be present in both of these. We study the growth of entanglement in two scenarios – disorder in charge equal and unequal to that in spin, and determine the distinct contributions of charge and spin degrees of freedom by expanding the Hamiltonian in terms of a set of optimally localized conserved quantities with separate charge and spin character. We find that coupling between charge and spin is significantly weaker than charge-charge and spin-spin coupling. While this decoupling is present in all our results, it is only apparent when the strength of the disorder in the two sectors is different such that there is a separation between the characteristic timescales of the contributions to entanglement made by charge and by spin.

18.
arXiv (CS.CV) 2026-06-16

LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination

Vision-Language-Action (VLA) models achieve strong performance on standard manipulation benchmarks, but most evaluations assume that task-relevant objects are fully visible. This assumption often fails in realistic settings, where occlusion makes manipulation partially observable. In this paper, we study scene-induced occlusion as a fundamental challenge for VLA models and introduce LIBERO-Occ, an occlusion-oriented extension of LIBERO. Experiments show that state-of-the-art VLAs suffer substantial performance degradation under occlusion. To address this issue, we propose Viewpoint Imagination (VIM), which generates a complementary view from an occluded primary observation and conditions action prediction on both observed and imagined evidence. VIM improves robustness across task suites, occlusion types, and severity levels without requiring additional cameras at deployment time, suggesting that viewpoint imagination is an promising mechanism for perception completion in partially observable manipulation. Our benchmark and corresponding code are available at: \href{https://github.com/litsh/Libero-Occ}{https://github.com/litsh/Libero-Occ}.

19.
arXiv (CS.AI) 2026-06-12

Two-Layer Linear Auto-Regressive Models Estimate Latent States

arXiv:2606.12691v1 Announce Type: cross Abstract: Auto-regressive models have emerged as powerful tools for sequential data, from language to video. Understanding how and why these models learn latent representations remains an open theoretical question. In this work, we demonstrate that when trained by empirical risk minimization on data from partially observed linear dynamical systems, two-layer linear auto-regressive models naturally learn to approximate Kalman filtering. In particular, we show that the learned hidden representation coincides, up to a similarity transformation, with the state estimates produced by the optimal (Kalman) filter, even though the model has no explicit knowledge of the underlying dynamics or state. The result follows from three main insights. First, we establish that the Kalman filter is well approximated by an auto-regressive model with bounded truncation error. Second, we show that despite non-convexity, the two-layer optimization landscape is benign, i.e., all stationary points are either strict saddles or global minima. Finally, as our main contributions, we provide finite-sample guarantees on prediction error, parameter estimation error, and latent state recovery. Numerical simulations support the theoretical results and demonstrate that the latent representations of auto-regressive models recover state estimates.

20.
arXiv (CS.AI) 2026-06-19

ParaScale: Scale-Calibrated Camera-Motion Transfer via a Gauge-Invariant Parallax Number

作者:

arXiv:2606.19805v1 Announce Type: cross Abstract: Transferring the camera motion of a reference video to a freshly generated one lets creators reuse cinematic moves. Yet reference and target often live at incompatible scales – a sweep across a galaxy versus a nudge across a desk – and naively reusing the recovered trajectory yields either imperceptible or violently exaggerated motion. We trace this to a geometric fact: translation-induced image motion scales as ||T||/Z, so a monocular trajectory is meaningful only up to a depth-scale gauge. We distill this into the Parallax Number Pi = ||Delta T|| / Zbar, a dimensionless, gauge-invariant descriptor of how strongly a camera move is felt, and prove that it – not the raw trajectory – is the quantity that scale-faithful transfer must preserve. ParaScale is a plug-and-play module that reads Pi off any reference video and re-realizes it against the target scene's own depth, per frame, leaving rotation untouched. Sitting between pose extraction and pose injection, it requires no retraining and drops into any pose-conditioned generator. We further introduce the Parallax Consistency Error (PCE), a scale-symmetric metric that – unlike the similarity-aligned TransErr – exposes scene-scale mismatch. Across scale regimes spanning four orders of magnitude and multiple backbones, ParaScale keeps the realized parallax on the identity line and cuts PCE by more than 3x over uncalibrated transfer with no loss of visual fidelity.

21.
arXiv (math.PR) 2026-06-19

The t-Split Two-Periodic Aztec Diamond Model

arXiv:2606.19507v1 Announce Type: new Abstract: In this work we consider an Aztec diamond model split into two unequal regions which are asymptotically fixed in size. Each region is weighted with a distinct two-periodic weighting. We refer to this model as the t-split two-periodic Aztec diamond, to signify its difference from the previous work title Split Two-Periodic Aztec Diamond, where the model was split into two equal regions. We derive an integral expression for the correlation kernel of the model and give a partial description of the scaling limit behavior, along with a conjecture for the remainder. We refer to the larger and smaller sides of the model as the dominant and non-dominant sides, and to the location of the weight change as the interface. The dominant side exhibits a limit shape that depends only on its own weighting and is identical to that of the two-periodic Aztec diamond, while the non-dominant side appears to have a novel limit shape that depends on both weightings and the location of the interface. Lastly, we consider the complete limit shape in the case where the dominant side two-periodic parameter goes to 0.

22.
arXiv (CS.LG) 2026-06-16

Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data

arXiv:2502.19544v3 Announce Type: replace Abstract: Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our method achieves nearly twice the aggregate score of learning-from-scratch baselines across 72 visuomotor tasks spanning 6 embodiments. On challenging tasks such as locomotion and robotic manipulation, it outperforms prior methods that utilize offline data by a decent margin.

23.
arXiv (CS.AI) 2026-06-11

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

arXiv:2507.21164v2 Announce Type: replace-cross Abstract: Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that couples representation learning with an analytically solvable One-Class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a \deleted{new} benchmark based on MNIST-C, and a challenging brain MRI \deleted{subtle} lesion detection task. Unlike most methods that focus on large, hyperintense lesions at the image level, our approach succeeds to target small, non-hyperintense lesions, while we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and texture or population age variations in MRI. Results demonstrate performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning.

24.
arXiv (CS.CV) 2026-06-12

Improving Pre-trained Adult Glioma Segmentation Models Using only Post-processing Techniques

Gliomas are the most common malignant brain tumors in adults and are among the most lethal. Despite aggressive treatment, the median survival rate is less than 15 months. Accurate multiparametric MRI (mpMRI) tumor segmentation is critical for surgical planning, radiotherapy, and disease monitoring. While deep learning models have improved the accuracy of automated segmentation, large-scale pre-trained models generalize poorly and often underperform, producing systematic errors such as false positives, label swaps, and slice discontinuities in slices. These limitations are further compounded by unequal access to GPU resources and the growing environmental cost of large-scale model training. In this work, we propose adaptive post-processing techniques to refine the quality of glioma segmentations produced by large-scale pretrained models developed for various types of tumors. We demonstrated the techniques in multiple BraTS 2025 segmentation challenge tasks, with the ranking metric improving by 14.9 % for the sub-Saharan Africa challenge and 0.9% for the adult glioma challenge. This approach promotes a shift in brain tumor segmentation research from increasingly complex model architectures to efficient, clinically aligned post-processing strategies that are precise, computationally fair, and sustainable.

25.
arXiv (CS.CV) 2026-06-15

RATS! Patches Talk Through Registers: Emergent Parts in Register Attention Transformers

When humans see a bird, they recognize far more than just "bird" – they see a head, wings, and talons, a structured assembly of reusable parts that can be identified across every bird they have ever seen. We ask whether a self-supervised visual model can discover the same compositional structure on its own. To this end, we propose RATS (Register Attention Transformers), which decomposes the classification token into N learnable register tokens that route patch information through an L->N->N->L bottleneck via a three-step compress-communicate-broadcast attention. The N registers are partitioned across the H attention heads, so that registers assigned to different heads do not interact with each other. Without auxiliary losses or part annotations, each register spontaneously specializes into a proto-semantic region whose emerging structure resembles object parts. RATS surpasses all baselines by +12 mIoU on average across five segmentation benchmarks, with consistent gains on ADE20K (+1.11 mIoU) and COCO (+0.2 AP^m). Its register dictionary further exhibits part-level consistency and semantic proximity across related categories. Our results suggest that RATS may provide a useful architectural prior for structured and interpretable visual representation learning.