Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

JetParticle-JEPA: An Efficient Self-Supervised Representation Learning method for Jet Tagging in High-Energy Physics

arXiv:2606.14813v1 Announce Type: cross Abstract: Jet tagging at the Large Hadron Collider increasingly relies on deep learning models trained on massive simulated datasets, leading to high computational costs and limited robustness to detector mismodeling. We introduce JetParticle-JEPA (JP-JEPA), a self-supervised Joint-Embedding Predictive Architecture that learns physically meaningful jet representations directly from continuous particle clouds without tokenization or reconstruction of raw inputs. Built on a Particle Transformer backbone, JP-JEPA predicts latent representations of masked particles while preserving fine-grained kinematic correlations. On the JetClass benchmark, JP-JEPA achieves performance comparable to fully supervised state-of-the-art methods on the full dataset, surpasses supervised baselines in low-label regimes, and significantly outperforms existing SSL approaches. On Top Quark and Quark-Gluon Tagging benchmarks, it remains on par with supervised methods. The learned representations also exhibit strong robustness to missing detector information and improved uncertainty behavior, highlighting JP-JEPA as a promising foundation-model framework for robust and data-efficient jet physics at the LHC.

02.
arXiv (math.PR) 2026-06-11

Improved Amenability Bounds for Local Coordination Games

arXiv:2606.01963v2 Announce Type: replace-cross Abstract: We study local pure coordination games on finite social networks, continuing the framework of Hutchcroft, Rospuskova, and Tamuz. They showed that low inefficiency in local coordination forces the underlying graph to be amenable, with a square-root loss in the amenability parameter. We improve this loss in the binary unbiased setting. Using Shapley values of a mutual-information game associated with the players' local outputs, we prove that if the average disagreement is at most $\varepsilon$, then the graph is $(O(\varepsilon\log(1/\varepsilon)),r)$-amenable. This gives a sharper quantitative converse between local coordination and graph amenability.

03.
PLOS Medicine 2026-06-23

Prevalence and epidemiological patterns of Neisseria gonorrhoeae infection in sub-Saharan Africa, 1964–2025: Systematic review, meta-analyses, and meta-regressions

Authors:

by Aisha Osman, Hina Akram, Bayan Alemrayat, Sumaya Al-Maraghi, Manale Harfouche, Laith J. Abu-Raddad

Background: Neisseria gonorrhoeae (NG) infection is a global health concern because of its morbidity and increasing antimicrobial resistance. Sub-Saharan Africa is believed to carry a disproportionately high burden of NG infection, but the epidemiology of NG infection in this region has not been comprehensively synthesized. This study systematically reviewed and analyzed NG prevalence in sub-Saharan Africa to characterize prevalence patterns and identify populations at risk.

Methods and findings: A systematic review was conducted and reported following PRISMA guidelines. Embase, PubMed, Scopus, and Web of Science were searched from inception to June 4, 2025. Eligible studies reported NG prevalence in sub-Saharan Africa. Random-effects meta-analyses generated pooled prevalence estimates, and random-effects meta-regression analyses identified associations and sources of heterogeneity. Nine hundred fifty publications contributed 1,604 prevalence measures spanning 1964–2025. In the general population, pooled urogenital prevalence was 3.2% (95% confidence interval (CI): 2.9–3.5), with substantial between-study heterogeneity and a wide prediction interval, indicating considerable variation in prevalence across settings. Prevalence was high in key populations: among female sex workers, 11.5% (95% CI: 9.9–13.2) for urogenital and 2.0% (95% CI: 0.4–4.5) for anorectal infection; and among men who have sex with men, 2.8% (95% CI: 2.4–3.3) for urogenital, 8.3% (95% CI: 5.8–11.0) for anorectal, and 5.7% (95% CI: 3.6–8.3) for oropharyngeal infection. Symptomatic men exhibited high urogenital prevalence (51.5%; 95% CI: 47.5–55.5), and symptomatic women showed 9.0% (95% CI: 7.7–10.4). Among women with adverse pregnancy or birth outcomes, urogenital prevalence was 8.6% (95% CI: 5.3–12.6). Meta-regression analyses explained over half of the variability in prevalence, showing a long-term decline of 1% per year, a clear population type gradient, subregional differences, and decreasing prevalence with increasing age, but no variation by sex. These findings may be affected by variability in data availability across countries, anatomical sites, and population groups, as well as heterogeneity across included studies.

Conclusions: NG prevalence remains markedly high in this region but has declined over time. These findings highlight the need for strengthened surveillance, expanded prevention and diagnostic strategies, and continued monitoring of gonococcal antimicrobial resistance to support effective control efforts in sub-Saharan Africa.

04.
arXiv (CS.CV) 2026-06-18

Aerial-ground LiDAR place recognition with patch-level self-supervised learning and expanded reciprocal re-ranking

LiDAR place recognition determines one's position on a prior point cloud map. The most studied ground-level LiDAR place recognition suffers from pre-visit requirements, incomplete coverage, and limited perspectives. Using pre-acquired, full-coverage Airborne Laser Scanning (ALS) data as an aerial prior map overcomes these drawbacks, making cross-view place recognition necessary and advantageous. However, aerial-ground LiDAR place recognition faces significant challenges, including the domain gap between aerial and ground point clouds, and false positives during initial retrieval. To address these challenges, we present a novel retrieval and re-ranking framework for aerial-ground LiDAR place recognition. Based on the prior that neighboring point cloud patches share similar semantics with the anchor patch, our retrieval network introduces patch-level self-supervised learning modules at multiple scales and integrates them with scene-level learning to improve global feature discriminativeness between aerial and ground point clouds. Furthermore, leveraging the structured spatial distribution of ALS point clouds, we introduce an Expanded Reciprocal (ER) re-ranking algorithm to exploit neighborhood information maximally and refine each feature based on neighbor features, which are then used to update the similarity matrix for final ranking. Extensive experiments demonstrate that our retrieval network outperforms existing state-of-the-art (SOTA) methods, achieving a 9.8% improvement in average Recall@1 and a 3.2% improvement in average Recall@1% on the CS-Urban-Scenes dataset, while also showing the best performance on the CS-Campus3D dataset. Additionally, our ER re-ranking algorithm further boosts the average Recall@1 by 4.9% on CS-Campus3D and 10.2% on CS-Urban-Scenes without additional training.
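
For readers unfamiliar with the Recall@1 and Recall@1% metrics quoted above, the following is a minimal sketch of how such retrieval metrics are commonly computed. The function names, the dictionary layout, and the rounding rule for the 1% cutoff are assumptions made for illustration; the paper's own evaluation protocol may differ.

```python
from typing import Dict, List, Set


def recall_at_k(rankings: Dict[str, List[str]],
                positives: Dict[str, Set[str]],
                k: int) -> float:
    """Fraction of queries with at least one true match in their top-k candidates.

    rankings maps each query id to its candidate ids, best first.
    positives maps each query id to the set of correct candidate ids.
    Both structures are hypothetical, chosen only for this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not rankings:
        return 0.0
    hits = sum(
        1
        for query, ranked in rankings.items()
        if positives.get(query, set()).intersection(ranked[:k])
    )
    return hits / len(rankings)


def one_percent_cutoff(database_size: int) -> int:
    """Top-1% cutoff for Recall@1% (assumed rule: round, but never below 1)."""
    return max(1, round(database_size * 0.01))
```

With a database of 300 aerial submaps, for example, `one_percent_cutoff(300)` gives a cutoff of 3, so Recall@1% would count a query as a hit when any correct submap appears among its three best candidates.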

05.
arXiv (CS.LG) 2026-06-16

Not All Retrievals are Useful: Cross-Attention for Input-Aware RAG in Time Series Forecasting

arXiv:2603.14709v2 Announce Type: replace Abstract: Retrieval-augmented generation (RAG) enhances zero-shot time series (TS) forecasting by leveraging external knowledge bases, yet existing approaches overlook input-level relevance when fusing retrieved samples with the query. We argue that not all retrievals are equally useful, and irrelevant ones can degrade performance. To this end, we propose Cross-RAG, a zero-shot RAG-based forecasting framework that selectively attends to query-relevant retrieved samples via query–retrieval cross-attention. By modeling input-level relevance between the query and retrieved samples, Cross-RAG jointly incorporates three sources of information: 1) the query itself, 2) the retrieved samples, and 3) their relational interactions. In particular, this input-aware design enables Cross-RAG to remain stable as the number of retrieved samples $k$ grows, whereas prior methods without cross-attention require careful $k$ tuning to avoid degradation from irrelevant retrievals. Extensive experiments demonstrate that Cross-RAG consistently improves zero-shot forecasting performance across multiple TSFM backbones and various RAG methods, with additional analyses confirming its effectiveness across various retrieval scenarios. Code is available at https://github.com/seunghan96/cross-rag/.
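
The idea of weighting retrieved samples by their relevance to the query can be illustrated with plain scaled dot-product attention. This is a generic sketch, not the Cross-RAG architecture: the embedding vectors, the function names, and the absence of learned projections are all simplifying assumptions.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: List[float],
                 retrieved: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Attend from one query embedding over k retrieved embeddings.

    Returns the attention weights (one per retrieved sample) and the
    weighted sum of the retrieved embeddings. Irrelevant samples get
    small weights, so adding more of them changes the result only slightly.
    """
    dim = len(query)
    scores = [
        sum(q * r for q, r in zip(query, vec)) / math.sqrt(dim)
        for vec in retrieved
    ]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, retrieved))
        for i in range(dim)
    ]
    return weights, fused
```

Calling `cross_attend([1.0, 0.0], [[5.0, 0.0], [-5.0, 0.0]])` places nearly all of the weight on the first retrieved vector, which points in the same direction as the query.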

06.
arXiv (CS.AI) 2026-06-19

Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking

arXiv:2602.23172v2 Announce Type: replace-cross Abstract: Capturing 4D spatiotemporal scene structure is crucial for the safe and reliable operation of robots in dynamic environments. However, existing approaches typically address only part of the problem: they either provide coarse geometric tracking via bounding boxes or detailed 3D occupancy estimates that lack explicit temporal association and instance-level reasoning. In this work, we present Latent Gaussian Splatting (LaGS) for 4D Panoptic Occupancy Tracking (4D-POT). We revisit the underlying representation and model 3D features as a sparse set of feature-bearing Gaussians. These act as dynamic, volume-oriented keypoints that enable spatially continuous, distance-weighted aggregation of multi-view features before being splatted into a voxel grid for decoding. This point-centric formulation enables flexible, data-dependent receptive fields and long-range spatial interactions that are difficult to capture with local and dense voxel-based operators. A hierarchical Gaussian representation further enables multi-scale reasoning by combining global context from coarse super-points with fine-grained detail from higher-resolution streams. Extensive experiments on Occ3D nuScenes and Waymo demonstrate state-of-the-art performance for 4D-POT. We provide code and models at https://lags.cs.uni-freiburg.de/.

07.
arXiv (math.PR) 2026-06-11

The Geometry of Admissible Short Selling in Discrete-Time Stochastic Portfolio Theory

arXiv:2606.11191v1 Announce Type: cross Abstract: While discrete-time Stochastic Portfolio Theory (SPT) provides a robust framework for market analysis, existing work on functional generation has predominantly focused on long-only portfolios defined on the entire unit simplex. This paper extends the geometric framework of functional generation to the broader class of bankruptcy-proof long-short portfolios defined on local market state spaces. We establish that, within this admissible setting, pseudo-arbitrage is fully characterized by the concavity of the generating function on the market state space, thereby relaxing the usual global domain requirement. A central contribution of this work is a geometric characterization of the short-selling mechanism. We prove that the presence of short selling is equivalent to the negativity of the maximal concave extension of the generating potential. This phenomenon is linked to the steepness of the logarithmic gradient as the market approaches a zero boundary nested inside the simplex. To systematically exploit this mechanism, we introduce the barycentric scaling transformation, a constructive methodology that maps classical long-only generating functions onto restricted domains to engineer admissible strategies with controlled short-selling exposure. Finally, through the analysis of specific shrunken portfolios, we identify a geometric phase transition: under suitable boundary conditions, admissible strategies exhibit a long-only core and a short-selling region in a qualitative sense (without asserting an exact partition of the state space). This provides a unified geometric perspective on relative arbitrage beyond the long-only constraint.

08.
arXiv (CS.CL) 2026-06-15

Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects. Code provided at: https://github.com/kamyahari/IC-Encoding-Models.git

09.
arXiv (CS.AI) 2026-06-18

Better Adherence, Richer Context: A Field Evaluation of LLM-Powered Conversational Voice Diaries for Sleep

arXiv:2606.18596v1 Announce Type: cross Abstract: Sleep diaries are central to behavioral sleep medicine and cognitive behavioral therapy for insomnia, yet daily completion is difficult to sustain, and static forms often provide limited context for interpreting night-to-night sleep variation. We designed an LLM-powered conversational voice diary that delivers clinically grounded morning and evening sleep diary questions through proactive smart-speaker prompts, structured conversational intake, and adaptive follow-up dialogue. We evaluated the system in a four-week between-subjects field study with 30 university students, comparing it with a text-based mobile diary using matched diary items, reporting windows, and reminder intervals. Compared with the text-based diary, the conversational voice diary showed higher adherence and elicited more detailed contextual self-report about routines, stressors, environmental conditions, and other sleep-related factors. Participants also described the voice diary as easier to integrate into daily routines, despite longer perceived completion time. However, voice-based conversational intake produced lower completeness for some structured diary fields, revealing a trade-off between expressive richness and structured precision. These findings show both the promise and the challenge of using LLM-powered conversational voice assistants for longitudinal health self-report.

10.
arXiv (CS.LG) 2026-06-17

Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows

arXiv:2606.17413v1 Announce Type: new Abstract: Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) using high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not properly quantify uncertainties. We present a novel deep learning framework that addresses these challenges. Because ground-truth data are difficult to obtain for real satellite observations, we develop and validate our approach using a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes spectral bands using a multi-branch neural network and estimates posteriors of the full CO2 column or desired summaries thereof using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: Inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: By training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: We achieve superior predictive accuracy compared to baseline methods; (4) Improved UQ: The probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: When utilizing normalizing flows, our framework successfully models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.

11.
arXiv (CS.CV) 2026-06-19

Hierarchical mutual distillation for multi-view fusion: Learning from all possible view combinations

Multi-view learning often struggles to effectively leverage images captured from diverse angles and locations. Learning methods for unstructured multi-view images remain largely underexplored. We propose a novel Hierarchical Mutual Distillation for Multi-View Fusion (HMDMV) method, which can handle both structured and unstructured multi-view scenarios. It makes predictions utilizing all possible view combinations: single view, partial multi-view, and full multi-view. The method generates predictions for each view combination and then applies hierarchical mutual distillation to enhance inter-view consistency. An uncertainty-based weighting mechanism further refines the fusion process by adjusting the influence of each view combination according to its prediction confidence, reducing the impact of low-confidence views. Extensive experiments on large-scale structured and unstructured datasets demonstrate that HMDMV consistently achieves state-of-the-art classification accuracy. Another unique advantage of HMDMV is that it provides improved flexibility in inference, allowing for more or fewer view counts in inference than those used in training without additional processing. We also provide a light version with reduced training cost by designing an efficient strategy that randomly samples subsets of view combinations during each training iteration. These results highlight HMDMV's robustness in real-world settings where view availability is variable or incomplete. The code is available at https://github.com/labhai/HMDMV.
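
To make "all possible view combinations" concrete, the sketch below enumerates every non-empty subset of a set of views and draws a random sample of them, loosely mirroring the light version described above. The function names and the uniform sampling rule are assumptions; the HMDMV repository may organize this differently.

```python
import random
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def view_combinations(views: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of the views: single, partial, and full multi-view."""
    views = list(views)
    return [
        combo
        for size in range(1, len(views) + 1)
        for combo in combinations(views, size)
    ]


def sample_view_combinations(views: Sequence[str],
                             count: int,
                             seed: Optional[int] = None) -> List[Tuple[str, ...]]:
    """Randomly pick up to `count` distinct combinations (assumed uniform sampling)."""
    rng = random.Random(seed)
    combos = view_combinations(views)
    return rng.sample(combos, min(count, len(combos)))
```

For n views there are 2^n - 1 combinations (7 for three views, 31 for five), which is why sampling a subset per training iteration reduces cost as the number of views grows.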

12.
arXiv (math.PR) 2026-06-11

Instability of a nonlinear oscillator with small friction and small additive noise

arXiv:2606.11389v1 Announce Type: new Abstract: Let $\lambda = \lambda(\beta,\sigma,a,b)$ denote the top Lyapunov exponent for the linearization along trajectories of the noisy damped non-linear oscillator $\ddot{x}+\beta \dot{x} + ax+bx^3 = \sigma \dot{W}_t$, where $a$, $b$ and $\beta$ are all positive and $\sigma \neq 0$. In 2004 Arnold, Imkeller and Sri Namachchivaya stated without proof that $\lambda(\varepsilon^2 \beta,\varepsilon \sigma,a,b) \sim \overline{\lambda} \varepsilon^{2/3}$ as $\varepsilon \to 0$ with $\overline{\lambda} > 0$. This paper contains a proof of this assertion.

13.
arXiv (quant-ph) 2026-06-12

The table maker's quantum search

arXiv:2601.13306v2 Announce Type: replace Abstract: We show that quantum search can be used to compute the hardness to round an elementary function, that is, to determine the minimum working precision required to compute the values of an elementary function correctly rounded to a target precision of $n$ digits for all possible precision-$n$ floating-point inputs in a given interval. For elementary functions $f$ related to the exponential function, quantum search takes time $\tilde O(2^{n/2} \log (1/\delta))$ to return, with probability $1-\delta$, the hardness to round $f$ over all $n$-bit floating-point inputs in a given binade. For periodic elementary functions in large binades, standalone quantum search yields an asymptotic speedup over the best known classical algorithms and heuristics. We then estimate the resources required for a fault-tolerant implementation of the proposed algorithm for the $\sin$ and $\cos$ functions in double precision. We find that, although the algorithm can in principle compete with the fastest known practical method for computing the hardness to round over all binades in the format, it requires qubit coherence times that are unrealistically long for present technology.
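
The $2^{n/2}$ factor in the quoted running time is the familiar square-root saving of quantum search. As a rough illustration only, the sketch below compares an exhaustive classical scan of $2^n$ candidates with the textbook Grover iteration count for a single marked item, about $(\pi/4)\sqrt{2^n}$. Treating the search space as exactly $2^n$ inputs is an assumption, and the paper's algorithm, constants, and logarithmic factors are not reproduced here.

```python
import math


def classical_queries(n_bits: int) -> int:
    """Worst-case evaluations for an exhaustive scan of 2**n_bits candidates."""
    return 2 ** n_bits


def grover_iterations(n_bits: int) -> int:
    """Textbook Grover iteration count for one marked item among 2**n_bits."""
    return math.floor((math.pi / 4) * math.sqrt(2 ** n_bits))
```

For a toy space of 10 bits this gives 1024 classical evaluations against 25 Grover iterations; the gap widens as the square root of the space size.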

14.
bioRxiv (Bioinfo) 2026-06-10

APOSM: Pairwise preference learning improves generative small-molecule design

Small-molecule lead refinement is constrained by the cost of synthesizing and assaying candidates, making the surrogate models that prioritize compounds for experimental testing central to the design process. The reliability of such surrogates is limited by the noise and sparsity of screening measurements. We show that training the surrogate on pairwise comparisons between candidate molecules, rather than on absolute predicted scores, yields a substantially more reliable signal for active candidate selection in this regime. We develop APOSM, an active-learning algorithm that combines a fragment-based generator, a pairwise message-passing graph neural network surrogate, and probabilistic ranking inside a batched acquisition loop. On the Practical Molecular Optimization benchmark and a GPCR ligand rediscovery task, APOSM improves target attainment and sampling efficiency over unguided fragment-based optimization, the Graph-GA genetic algorithm, and a pointwise-regression ablation, with the largest gains on tasks where absolute scores are hardest to calibrate.

15.
arXiv (quant-ph) 2026-06-12

Entanglement Detection by Approximate Entanglement Witnesses

arXiv:2402.14755v2 Announce Type: replace Abstract: The problem of determining whether a given quantum state is separable is known to be computationally difficult. We develop an approach to this problem based on approximations of convex polytopes in high dimensions. By showing that a convex polytope constructed from a finite number of hyperplanes approximates the Euclidean ball arbitrarily well in high dimensions, we find evidence that a finite set of approximate entanglement witnesses is potentially sufficient to determine the entanglement of a state with high probability.

16.
arXiv (math.PR) 2026-06-18

Functions of Bounded Variation and Point Processes

arXiv:2606.08304v2 Announce Type: replace-cross Abstract: We investigate the relationship between the analytical properties of functions of bounded variation and the statistical behavior of hyperuniform point processes. We establish several characterization formulas for the jump part of the gradient of a bounded variation function, extending and unifying previous results by Beretti–Gennaioli and Dávila. In particular, we provide new expressions for the $L^2$-jump of the gradient using both difference quotients and Fourier transform methods. Furthermore, we connect these analytic structures to the theory of hyperuniform point processes. By analyzing the variance of linear statistics associated with bounded variation functions, we provide asymptotic estimates that depend on the specific classification of the hyperuniformity of the point process. The results show how the regularity and jump discontinuities of a function dictate the growth rate of fluctuations in point processes. Finally, we introduce an averaged quadratic BMO-type oscillation functional over translated and rotated cube partitions, similar to the one recently studied by Ambrosio et al., and prove, using results from point process theory, that it converges to an explicit dimensional constant times the $L^2$-jump, giving in particular a further new characterization of the perimeter of a set.

17.
medRxiv (Medicine) 2026-06-10

A risk-of-contagion index using a Bayesian based model for the COVID-19 epidemic in Mexico

During the COVID-19 pandemic, limited testing capacity and reporting delays complicated epidemic surveillance and decision-making in Mexico. We calibrated covidestim, a Bayesian nowcasting model, to estimate the total SARS-CoV-2 infections from reported cases and deaths using Mexican surveillance data. Disease-progression distribution priors were calibrated using Mexico City records and validated through comparisons with national seroprevalence surveys, hospitalization data, and annual reported severe-case rates across all states. Using the reconstructed estimates of active infections, we implemented an event-based risk framework that quantifies the probability of encountering at least one infectious individual in gatherings of different sizes. This probability was subsequently translated into a four-level epidemiological traffic-light indicator and computed at both state and municipality levels. The resulting estimates revealed substantial spatial heterogeneity that is obscured by state-level aggregation, particularly in states with marked differences between urban and rural municipalities. To evaluate consistency with public-health indicators, we compared the proposed risk classification with the official Mexican epidemiological traffic-light system, considering interpretable gathering sizes relevant to public-health decision making. Weekly reports derived from this framework were delivered to policymakers in the State of Queretaro in Mexico, as an anticipation tool for school reopening and public-space management. This demonstrates that this Bayesian reconstruction of infections combined with event-based risk metrics can provide an interpretable and generalizable municipality-level complement to routine surveillance systems, particularly in regions with limited testing capacity and heterogeneous local transmission dynamics.
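
The event-based risk described above can be sketched with a common simplification: if each attendee is independently infectious with probability p (estimated active infections divided by population), the chance that a gathering of n people includes at least one infectious person is 1 - (1 - p)^n. The abstract does not state the exact formula used, so this form, the function names, the level labels, and the thresholds below are all assumptions for illustration.

```python
from typing import Tuple


def gathering_risk(active_infections: float, population: float, group_size: int) -> float:
    """Probability that at least one of `group_size` attendees is infectious.

    Assumes attendees are drawn independently from a population in which
    a fraction active_infections / population is currently infectious.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    prevalence = min(max(active_infections / population, 0.0), 1.0)
    return 1.0 - (1.0 - prevalence) ** group_size


def traffic_light(risk: float,
                  thresholds: Tuple[float, float, float] = (0.25, 0.5, 0.75)) -> str:
    """Map a risk value to one of four levels using hypothetical thresholds."""
    levels = ("green", "yellow", "orange", "red")
    for level, cutoff in zip(levels, thresholds):
        if risk < cutoff:
            return level
    return levels[-1]
```

With 1,000 estimated active infections in a municipality of 100,000 people, a 50-person gathering has a risk of about 0.39 under this model, which falls in the second level for the illustrative thresholds.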

18.
arXiv (CS.AI) 2026-06-24

Social Structure Matters in 3D Human-Human Interaction Generation

arXiv:2606.24255v1 Announce Type: cross Abstract: Although text-to-motion generation has achieved strong progress in synthesizing realistic single-person motions from language, extending it to text-driven 3D human-human interaction (HHI) remains non-trivial, as HHI requires modeling the underlying social structure that governs phase progression, actor roles, and inter-actor coordination. In this paper, we formulate HHI generation as a social structure modeling and grounding problem: the model must first infer how an interaction unfolds and how the two actors coordinate their roles, and then realize this structure as continuous, physically plausible, and partner-aware 3D motion. To study how such structure should be modeled, we first examine the capability boundary of large language models (LLMs) for HHI generation. Our analysis shows that LLMs can think by recovering phase decompositions and partner-aware roles, but cannot directly move, as they fail to generate dynamic, physically plausible, and interaction-aware motion. This motivates our planner-executor paradigm, Think with LLM, Move with Motion Skill. The LLM planner converts implicit interaction semantics into motion-aligned social supervision by decomposing interactions into phases, assigning partner-aware actor roles, and aligning them with the motion sequence. The motion executor then grounds the planned social structure into coordinated two-person motion by adapting a pretrained solo motion model with LoRA, previous-phase self-conditioning, and ego-relative partner conditioning. Together, our Solo-to-Social framework bridges social organization and motion realization, producing 3D HHI with improved phase consistency, role alignment, and partner-aware coordination.

19.
arXiv (CS.CL) 2026-06-24

AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across a large, messy collection of workplace files, reconciling inconsistent terminology, units, and time conventions, and computing an answer. Existing benchmarks address only parts of this setting and none jointly stresses archive-groundedness, agentic exploration, and cross-domain coverage. We introduce Agora, a benchmark pairing 362 questions with eight domain collections of 9,664 authentic documents and 372M tokens, far exceeding any model's context window, so agents must explore deliberately rather than scan exhaustively. Agora is built by an agentic pipeline combining cross-document task synthesis, leakage-preventing obfuscation, and difficulty filtering. Evaluating eight models, we find the task far from solved: even the strongest reaches only 59.4% accuracy, with notable variation across domains.

20.
arXiv (quant-ph) 2026-06-16

Synthesizing Arbitrary Non-Hermitian Hamiltonian with Stochastic Floquet Engineering

arXiv:2606.15664v1 Announce Type: new Abstract: The conventional Floquet engineering scheme synthesizes a given target Hamiltonian with a deterministic, temporally periodic driving field. In this work, we introduce a stochastic Floquet engineering scheme that can synthesize an arbitrary non-Hermitian target Hamiltonian using a time-periodic driving field with noisy amplitude. Our method is rooted in Hermitian dynamics and treats noise as a valuable quantum resource, with no a priori need for loss or gain. We apply our method to engineer a cavity Hamiltonian with dissipative coupling between Fock states, and to prepare a given quantum state from a generally arbitrary quantum state. Stochastic Floquet engineering also provides a way to generate non-unitary quantum gates, which offer advantages in certain tasks compared to unitary quantum computing, without the need for ancillae or state-dependent updating.

21.
arXiv (CS.AI) 2026-06-24

On the Position Bias of On-Policy Distillation

arXiv:2606.22600v2 Announce Type: replace-cross Abstract: On-Policy Distillation (OPD) improves the learning efficiency of standard reinforcement learning through dense, token-level supervision from teachers. In the standard KL objective of OPD, token-level losses are uniformly averaged, implying equal weights for all tokens. However, we discover that not all tokens are created equal: as student rollouts grow longer, they deviate further from the teacher's distribution, leading to degraded supervision quality at later positions. As a result, OPD using only the first 30% of tokens can perform comparably to using all tokens, whereas OPD using only the last 30% of tokens barely learns anything. In this work, we provide a principled understanding of this issue through the lens of constrained optimization. Based on these insights, we derive Importance-Weighted On-Policy Distillation (IW-OPD), in which the weight assigned to each token depends on the accumulated discrepancy between the student's and teacher's distributions, naturally upweighting earlier tokens and downweighting later ones with larger deviations. We show that IW-OPD converges significantly faster than OPD, with better learning efficiency, and achieves better final performance than standard OPD in both same-size and cross-scale settings, improving performance up to 6.9 points on AIME-2025.

22.
arXiv (CS.CL) 2026-06-17

Rift: A Conflict Signature for Deception in Language Models

A model that lies while knowing the truth is the central case ELK cannot handle with behavioral evaluation alone. We ask whether such deception leaves an internal signature distinguishing it from honest error. Our key move is a control for wrongness: we contrast a sleeper agent (knows the truth, lies on trigger) against a naive liar (fine-tuned to emit the same wrong answers with no honest training). Both produce identical wrong outputs; any difference is about knowledge conflict, not incorrectness. We find deceptive forward passes carry a conflict signature - 2.1-2.3x higher residual rank than naive-liar passes on the same wrong answer - strong enough to identify which of two responses is the lie with 100% accuracy and no labels, across GPT-2 small/medium (three seeds) and three instruct models. Across Qwen2.5-1.5B/7B and Phi-3-mini, instructed deception raises residual rank on every tested fact (18/18, 40/40, 34/34); on Phi-3, lies separate perfectly from both honest answers and hallucinations (AUC 1.0, Wilcoxon p~6e-11). The signature survives strategic self-constructed deception (model invents its own lie, AUC 1.0), active concealment attempts (AUC 1.0), and length-controlled replication (20/20, AUC 1.0, p~1e-6). Using basis-free relative representations, a probe trained on one model family detects deception in two other families zero-shot (mean AUC 0.933), surviving simultaneous architecture and format change (AUC 0.821), and transfers across five languages (AUC 1.000, length-controlled). The signature is read-only: detectable but not injectable (0/8 both directions). Honest limitations and six negative experiments are documented in full.

23.
arXiv (CS.LG) 2026-06-11

Trajectory Geometry of Transformer Representations Across Layers

arXiv:2606.09287v2 Announce Type: replace Abstract: Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory through a high-dimensional representation manifold, drawing on geometric tools from computational neuroscience. Rather than probing for pre-specified features, we characterize trajectory geometry using five metrics computed directly in the ambient space: trajectory length, curvature, a semantic convergence index, layerwise cosine similarity, and representational stability. Across three model families (GPT-2, TinyLlama, Qwen2.5) and five controlled prompt families, we report four findings. First, semantically related prompts converge significantly in middle-to-late layers (peak CI 0.41–0.58, p

24.
arXiv (CS.CV) 2026-06-15

Hierarchical Consistency Learning for Test-time Adaptation in Camouflage Perception

Camouflaged object detection (COD) aims to localize targets that exhibit minimal perceptual differences from backgrounds through physical attributes. Existing methods, constrained by the static train-then-freeze paradigm, suffer from domain rigidity and annotation dependency, limiting their adaptability to scene variations and unseen camouflage patterns. To overcome these limitations, we propose the hierarchical consistency learning (HCL) framework, which integrates test-time adaptation for dynamic representation recalibration. Specifically, we design the hierarchical representation reconstruction (HRR) to alleviate feature entanglement by synergizing spatial reconstruction with dual-stream frequency-domain decomposition, enhancing robustness against appearance homogenization. The pixel and spectrum inference provide structural and contextual priors. We further introduce task affinity guidance (TAG) to propagate knowledge across branches via channel-wise affinity, aligning local discriminative cues and mitigating semantic drift. To ensure semantic invariance, we formulate the prototype consistency calibration (PCC), which aggregates region features into compact prototypes and establishes prototype-feature similarity. This imposes implicit and hierarchical constraints that bridge task and representation gaps. Extensive experiments across four camouflaged and four underwater object benchmarks, under three degradation settings, demonstrate that our method consistently outperforms state-of-the-art approaches, highlighting its robustness and generalization under distribution shifts.

25.
arXiv (CS.CV) 2026-06-16

Multimodal LLM-Empowered Re-Ranking for Generalizable Person Re-Identification

Domain Generalizable (DG) person re-identification (Re-ID) has attracted growing research interest due to its potential for deployment in unseen real-world scenarios. Most existing approaches address DG Re-ID by training domain-generalizable encoders but overlook possible refinements at the inference stage. In contrast, this work explores an alternative direction that improves inference re-ranking to enhance DG Re-ID. Conventional re-ranking methods typically rely on neighborhood-based distances to refine the initial ranking list, inherently depending on features produced by the Re-ID encoder. However, they deteriorate on target domains since the encoder lacks sufficient generalizability to produce reliable feature distances in unseen scenarios. Inspired by the remarkable generalization capabilities of recent Multimodal Large Language Models (MLLMs), we propose an MLLM-empowered distance metric to improve re-ranking in DG Re-ID. Specifically, we first adapt an MLLM to Re-ID data through supervised fine-tuning, which incorporates a domain-agnostic prompt and a query-candidate hard mining scheme. Then, the adapted MLLM is employed to compute a $\mu$-distance during inference, which is robust to the domain gap and significantly enhances subsequent re-ranking performance. Our approach is model-agnostic and can be seamlessly integrated into previous re-ranking frameworks. Extensive experiments demonstrate that our approach consistently yields substantial performance improvements across multiple DG Re-ID benchmarks. The code for this work will be released at https://github.com/RikoLi/MUSE soon.