Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-15

PCR-CA: Parallel Codebook Representations with Contrastive Alignment for Multiple-Category App Recommendation

arXiv:2508.18166v5 Announce Type: replace-cross Abstract: Modern app store recommender systems struggle with multiple-category apps, as traditional taxonomies fail to capture overlapping semantics, leading to suboptimal personalization. We propose PCR-CA (Parallel Codebook Representations with Contrastive Alignment), an end-to-end framework for improved CTR prediction. PCR-CA first extracts compact multimodal embeddings from app text, then introduces a Parallel Codebook VQ-AE module that learns discrete semantic representations across multiple codebooks in parallel – unlike hierarchical residual quantization (RQ-VAE). This design enables independent encoding of diverse aspects (e.g., gameplay, art style), better modeling multiple-category semantics. To bridge semantic and collaborative signals, we employ a contrastive alignment loss at both the user and item levels, enhancing representation learning for long-tail items. Additionally, a dual-attention fusion mechanism combines ID-based and semantic features to capture user interests, especially for long-tail apps. Experiments on a large-scale dataset show PCR-CA achieves a +0.76% AUC improvement over strong baselines, with +2.15% AUC gains for long-tail apps. Online A/B testing further validates our approach, showing a +10.52% lift in CTR and a +16.30% improvement in CVR, demonstrating PCR-CA's effectiveness in real-world deployment. The new framework has now been fully deployed on the Microsoft Store.

02.
PLOS Computational Biology 2026-06-10

Interpreting higher-order dependence in multimorbidity using cohort data: A partial information decomposition approach

by Cillian Hourican, Geeske Peeters, René J. F. Melis, Almar Kok, Natasja M. van Schoor, Sandra Wezeman, Mike Lees, Marcel G. M. Olde Rikkert, Rick Quax In the context of multimorbidity, clinical features seldom act in isolation: symptoms, signs and behaviours form interdependent systems in which joint effects on function can be demonstrated only when features are considered together. We introduce an open, reusable workflow that detects and interprets these “together-only” interactions using bivariate Partial Information Decomposition (PID; two sources to one target), linking synergy-based dependence to the broader network of clinical variables rather than to a single target. The workflow estimates synergy with small-sample bias correction and summarises each pair in a Breadth–Uniformity–Synergy–Total (BUST) map: breadth of synergy across target variables (broad “generalist” vs narrow “specialist” patterns), cross-stratum uniformity across age, sex and multimorbidity (uniform vs subgroup-specific), synergy strength, and total shared information. Simple diagnostics contrast observed targets with additive expectations, revealing the specific joint configurations through which non-additive effects arise. Applied to data from the Longitudinal Ageing Study Amsterdam, we treated all health-related variables—covering symptoms, clinical signs, behaviours, lifestyle factors, and self-rated health indicators—as both sources and targets in the PID framework. This symmetric design permits synergy to be quantified for every pair of variables with respect to every other variable. The workflow identifies synergistic constellations that additive models miss. Multidomain cliques involving subjective health, pain, cognition and grip strength showed multiple non-additive configurations, whereas pairs such as alcohol use with grip strength exhibited focused, narrow but uniform synergy. Notably, the pairs with the strongest synergistic contributions were largely distinct from those with the highest total mutual information, indicating that synergy captures dependency structure overlooked by conventional association measures. Rather than a new measure, this work provides a bias-aware workflow that makes higher-order dependence visible and transferable. Our results support synergy-aware mapping as a practical complement to conventional multimorbidity analyses: it highlights specific combinations of routinely assessed features whose joint states may be especially informative across multiple health targets and therefore candidates for prioritised joint assessment and future multi-domain intervention studies.

03.
arXiv (CS.AI) 2026-06-24

Reentrant value fields as delayed coupled reaction-diffusion systems on finite graphs

arXiv:2605.03940v4 Announce Type: cross Abstract: We describe a dynamical system in which a symbolic field is coupled to a geometric field via a bipartite Hilbert-Schmidt kernel. The system is fully described by a retarded functional differential equation (RFDE) on the history space, subject to Lipschitz and small gain conditions. We show that the RFDE is well-posed under constant input and that it admits a compact global attractor. The principal subsystem $(H_L, X_R, P)$, which is comprised of the two primary fields as well as an executive field, is shown to be globally stable independent of delay, provided that the interfield coupling satisfies $C_{\mathcal{K}}^2

04.
arXiv (math.PR) 2026-06-12

Quenched and Annealed CLTs for the one-periodic Aztec diamond in random environment

arXiv:2510.11846v2 Announce Type: replace Abstract: We study the asymptotic behavior of random dimer coverings of the one-periodic Aztec diamond in random environment. We investigate quenched limit theorems for the height function and we extend annealed limit theorems that were recently studied in [arXiv:2507.08560]. We consider more general choices of random edge weights (independence is not assumed) and we distinguish two cases where the random edge weights satisfy the Central Limit Theorem (CLT) under different scalings. For both cases, we prove convergence to the Gaussian Free Field for the quenched fluctuations. For the annealed version, it had been shown in [arXiv:2507.08560], that Gaussian Free Field fluctuations can be dominated by the much larger fluctuations of the random environment. To access quenched fluctuations we analyze the Schur process with random parameters in a way that allows to prove the annealed CLT for the height function for non i.i.d. weights. We consider specific examples where we determine the asymptotic fluctuations.

05.
arXiv (CS.CL) 2026-06-16

Contaminated Collaboration: Measuring Gender Bias Transfer in LLM-Assisted Student Writing

Gender bias in LLMs has been studied extensively in model outputs, with biased prompts shown to amplify stereotyped generations. Whether such bias propagates into text produced by humans who use these systems, however, remains underexplored. We investigate whether gender bias in an LLM writing assistant transfers into career plan essays written by students. We first verify that a gender-biased prompt induces gender-differentiated language in LLM-generated essays, while a neutral prompt does not. We then recruited participants (N = 123) in a controlled environment to write career plan essays for paired biographical profiles differing only in gender under three conditions: no AI assistance, neutral LLM assistance, or gender-biased LLM assistance. Students in the biased condition produced essays with a significantly larger agentic gap and more gender-stereotypic occupation suggestions than those in the control and neutral conditions. Our results also reveal that this bias transfer is asymmetric: agency is suppressed in female-target essays while male-target writing remains largely unaffected. Our findings highlight the risk of bias propagation in AI-assisted writing, calling for fairness-aware design in educational AI tools.

06.
arXiv (CS.CL) 2026-06-18

Probing Semantic Alignment, Lexical Invariance, and Syntactic Influence in LLM Metaphor Processing

Large language models (LLMs) achieve strong performance on metaphor detection and interpretation tasks, yet it remains unclear what such behavioral success reveals about metaphor processing. We present a diagnostic analysis that examines the limits of behavioral evidence by probing three complementary dimensions: semantic attribute alignment, lexical invariance, and syntactic sensitivity. Using geometric probing, we assess whether model-generated interpretations align with reference semantic attributes; through context-varying substitution, we analyze the stability of lexical associations between metaphorical and literal expressions; and via controlled syntactic perturbations, we examine sensitivity in metaphor detection. Our analysis reveals that LLM-generated interpretations can exhibit semantic drift relative to reference attributes; stable lexical anchors persist across contextual conditions, potentially supporting conventional metaphors while biasing novel metaphors requiring contextual integration; and detection performance is sensitive to syntactic irregularities. These findings suggest that strong behavioral performance may reflect heterogeneous underlying signals, highlighting the need for caution when interpreting metaphor benchmarks as evidence of robust, integrated semantic understanding.

07.
arXiv (math.PR) 2026-06-25

Mean-field games with rough common noise: the linear-quadratic case

arXiv:2602.19210v3 Announce Type: replace Abstract: Motivated by mean-field games (MFG) with common noise on the one hand and pathwise stochastic control theory on the other, we formulate here a linear-quadratic (LQ) MFG with rough common noise, along with a satisfactory well-posedness theory for the linear-quadratic case. A novel Volterra-type (or mild) formulation allows to keep technical (rough-stochastic) consideration to a minimum. We derive a characterization of the optimal state and optimal control through a rough forward-backward SDE (rough FBSDE), and provide an existence and uniqueness result under the usual assumptions. Our theory is accompanied by stability estimates with respect to initial data and common noise while we also establish continuity of what we call the Itô-Lions-Lyons map for rough mean-field games. Finally, we discuss randomization of the rough common noise under appropriate conditions on the coefficients. When the latter is given by the Stratonovich lift of a Brownian motion independent of the idiosyncratic noise, we show that solutions of the rough LQ MFG coincide with those obtained by conditioning on the common noise.

08.
arXiv (CS.LG) 2026-06-12

Net-Ev$^2$: A Generative Simulator for Network Event Evolution

arXiv:2606.12494v1 Announce Type: new Abstract: Reducing real-world trial and error has long been a central goal of decision making, and generative simulators advance this goal by modeling the evolution of future states. An even more challenging yet meaningful task is simulating how disturbance events (e.g., accidents) propagate their impacts across real-world networks. The existing approaches fall short of modeling both structured attributes and unstructured semantics of events, and capturing topological structures in simulating network event evolution. Therefore, we are motivated to propose Net-Ev$^2$ ($\underline{Net}$work $\underline{Ev}$ent $\underline{Ev}$olution), a novel generative simulator that jointly leverages event cues while preserving network topology in simulations. Specifically, the framework consists of two stages, namely structure-guided masked pre-training and topology-aware diffusion process, which is achieved by U-Net-like graph downsampling and upsampling during denoising. At inference time, Net-Ev$^2$ can generate simulations using natural-language event input only, with greater flexibility for practical usage. Furthermore, we introduce Net-Ev$^2$-6.5M, a multimodal benchmark of aligned event and network traffic data across four large-scale road networks, as well as a new topology-aware metric, namely JL-MMD, to evaluate topological fidelity in generated network dynamics. Extensive experiments demonstrate the state-of-the-art performance and strong generalization ability of Net-Ev$^2$. Code is made available at https://github.com/Guangyu4/Net-Ev-2.

09.
arXiv (CS.AI) 2026-06-12

Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

arXiv:2606.12702v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly integrated into clinical systems, making it essential to evaluate the real-world utility of these systems. However, static benchmarks tend to measure correctness rather than user acceptance, aggregate performance across queries, and require densely annotated datasets – leading to major blind spots for evaluating clinical systems. In this work, we perform a deployment-centered evaluation of an LLM system embedded within electronic health records at an academic medical center, where user feedback is sparse but closely reflects the deployment conditions. Specifically, we train a pre-response classifier that estimates the risk that a future interaction will result in the user rejecting the LLM response, based on query content and deployment-specific context available before generation. We conduct a prospective analysis of our model over 4.5 months of user feedback, finding that our prediction model achieves an AUROC of 0.719. Further, we estimate the benefit of such predictions in two downstream use cases (guardrail triggering and abstention). Our key conceptual insight is that making use of deployment-specific context (i.e., the provider type, department name, language model used for response), as opposed to only query content, improves the ability to predict whether the user will reject the system output. Altogether, our empirical case study demonstrates the feasibility of predicting user rejection using deployment-specific context, opening the door to targeted guardrails.

10.
arXiv (CS.CL) 2026-06-11

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.

11.
arXiv (CS.CV) 2026-06-25

Multilingual Hematology Visual Question Answering Dataset

Vision Language Models (VLMs) have shown promising capabilities in medical image analysis by jointly understanding visual and textual information for tasks such as Visual Question Answering. However, existing hematology vision-language resources remain predominantly English centric, limiting their applicability in multilingual healthcare environments. This challenge is releveant generally to South Asia and specifically to Pakistan, where Urdu is widely used despite healthcare information and digital medical systems being largely dependent on English. To investigate this gap, we conducted a survey among healthcare professionals, which revealed substantial language mismatches between clinical documentation and patient communication, emphasizing the need for multilingual healthcare technologies. To address this limitation, we introduce WBCMor VQA, a clinically validated bilingual English, Urdu morphology aware VQA benchmark for leukemia and normal white blood cell analysis. The benchmark is constructed using morphology-aware annotations from LeukemiaAttri and WBCAtt datasets and supported by a domain specific Urdu hematology dictionary to ensure linguistic consistency and clinical correctness. The final benchmark contains 110K bilingual question answer pairs serving as VQA annotations for 20K leukemic and normal single-cell images. Furthermore, we establish baseline performance by evaluating multiple open-source VLMs on the proposed benchmark. The proposed resource aims to facilitate the development of accessible and clinically relevant AI systems for multilingual healthcare environments.

12.
arXiv (CS.CV) 2026-06-12

EquiDexFlow: Contact-Grounded SE(3)-Equivariant Dexterous Grasp Generative Flows

Most learned dexterous grasp generators relegate contact forces to a downstream verification step, so a kinematically-plausible pose can still violate the conditions for a stable physical grasp. We address this with EquiDexFlow, an SE(3)-equivariant flow-matching model that jointly predicts wrist pose, joint angles, fingertip contacts, surface normals, and contact forces from an object point cloud. Our architecture projects contacts onto the object surface and forces into the Coulomb friction cone by construction, so placement and friction compliance hold without loss penalties. We prove end-to-end SE(3) equivariance and verify it empirically over 200 rotations, with wrist residuals below $0.04^\circ$ and exactly zero joint deviation. Trained on 8,100 force-closure grasps across 81 objects for the 16-DoF Allegro Hand, our model achieves zero friction violations, the best composite score, and the lowest wrench residual among all ablation variants. We retarget decoded fingertip contacts to a 16-DoF LEAP Hand via per-finger inverse kinematics, and our hardware-feasible refinement places every joint at least 5% inside its actuator envelope while preserving wrench balance. On the physical robot, retargeted EquiDexFlow-decoded grasps complete open-loop pick-and-hold trials on all six test objects, with every asymmetric object succeeding at both the canonical pose and a $120^\circ$ co-rotation. Videos, code, and checkpoints are available at https://equidexflow.github.io.

13.
bioRxiv (Bioinfo) 2026-06-24

Pharmacological Stratification of Public Bioactivity Databases: A Reusable, OECD-Anchored Curation and Benchmarking Framework Demonstrated for Opioid Receptors

Public bioactivity databases are heterogeneous not only in measurement type, where binding affinities and functional potencies are reported on different scales, but in pharmacology: the same compound and target can carry agonist, antagonist, or inhibitor records measured through binding displacement, cAMP, {beta}-arrestin, or [35S]GTP{gamma}S readouts that quantify different biological events. Pooling these records produces models whose output is detached from any coherent pharmacological claim. Prior work has standardized bioactivity at scale and quantified the noise from mixing measurement types, but pharmacological mechanism and assay-readout class have not been treated as a primary axis of large-scale curation. This study presents an auditable, OECD-anchored framework that stratifies public records by action type and assay readout before modeling, converting heterogeneous data into externally validated, interpretable QSAR tasks that compose with existing standardization resources rather than replacing them. The framework is demonstrated on the four opioid receptors (MOR, DOR, KOR, and nociceptin/orphanin FQ, NOP). Four public sources were reconciled into 72,148 merged records and 50,977 curated measurements spanning 19,585 compounds, each carrying auditable attributes for source agreement, endpoint meaning, pharmacology class, assay readout, and trust tier. Receptor-level binding tasks formed a compact benchmark with strong locked external performance, including KOR pK (R2 = 0.79, n = 798) and DOR pK (R2 = 0.77, n = 736). Pharmacology- and readout-resolved functional endpoints yielded externally validated strata that pooled labels would obscure, including a MOR antagonist functional-inhibition endpoint (R2 = 0.86, n = 110) and agonist potency endpoints for DOR, KOR, and MOR (R2 up to 0.81). Comparison against a fully pooled baseline shows that pooled models either match stratified models on coherent endpoints or reach a deceptively high R2 on functional-IC endpoints by training predominantly on binding-displacement records, so the pooled number predicts affinity rather than functional activity. SHAP attribution indicates that binding and functional potency encode partially distinct structure-activity signals. The dataset contract, not model performance alone, defines the validity and scope of a QSAR claim, and stratification is a precondition for a functional model to support a defensible claim. Curation logic, derived tables, frozen data, and reproducibility artifacts are released.

14.
arXiv (CS.AI) 2026-06-19

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

arXiv:2606.05833v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a novel framework that learns geometric representations using purely 2D video sequences. This approach effectively restructures the semantic latent space within MLLMs to unlock spatial intelligence. Rather than employing superficial feature mixing, GeoVR reshapes the internal representations of the MLLM by distilling geometry knowledge from pre-trained 3D foundation models. This is accomplished through a multi-objective learning strategy driven by four complementary geometric targets: (1) estimating inter-frame camera poses to embed varying viewpoint dynamics, (2) regressing dense depth maps to anchor physical distances, (3) predicting a metric scale factor for real-world calibration, and (4) distilling multi-scale 3D features to align the intermediate feature space. Guided by these explicit physical and geometric constraints, the model's internal representations naturally develop strong 3D awareness. Extensive experiments on spatial reasoning benchmarks demonstrate that GeoVR achieves state-of-the-art performance, establishing a new paradigm for endowing foundation models with spatial intelligence.

15.
arXiv (CS.AI) 2026-06-19

SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs

arXiv:2606.20244v1 Announce Type: cross Abstract: Vision-language models (VLMs) often underperform on evidence intensive tasks because decisive visual evidence are small, localized, and easy to overlook, leading to failures in evidence readout even when high-level reasoning is intact. Prior inference-time visual interventions can improve grounding without retraining, but they are largely open-loop and lack a mechanism to verify whether highlighted evidence is actually used. We study answer-span prediction entropy as a model-internal feedback signal and show that naive entropy minimization is ambiguous, since low entropy may arise from evidence-grounded confidence or shortcut collapse. To resolve this ambiguity, we introduce low-entropy anchors and an entropy-shaping objective that reduces answer uncertainty while preserving baseline high-confidence tokens. We instantiate this principle in SPOT-E, a plug-and-play test-time method that produces question-conditioned spotlights, optimized per instance via light-weight tuning based on Group Relative Policy Optimization (GRPO). Across all benchmarks and different VLM families, SPOT-E yields consistent gains and improved robustness under visual corruptions. Code is publicly available at: \url{https://github.com/YinBo0927/SPOT-E}

16.
arXiv (CS.LG) 2026-06-17

The Morse Transform for Discrete Shape Analysis

arXiv:2503.04507v2 Announce Type: replace-cross Abstract: The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.

17.
arXiv (CS.LG) 2026-06-25

Structured Approximations of Measures

arXiv:2310.09149v3 Announce Type: replace-cross Abstract: We study the approximation of probability measures in the Wasserstein-$p$ distance by structured classes of approximators, motivated by applications in imaging, machine learning, and physical measurement under sensor constraints. We obtain three sets of results. First, for measures with densities bounded away from zero on a bounded Lipschitz domain $\Omega$, we prove that any approximation scheme for functions in $\mathrm{L}_p(\Omega)$ transfers, with linear rate, to a corresponding approximation scheme for measures in $\mathrm{W}_p(\Omega)$. The argument applies a theorem of Bogovskii on regularity of solutions to the continuity equation in the Benamou-Brenier formulation of optimal transport. We exhibit concrete approximation schemes (polynomials, shift-invariant spaces, cardinal interpolation with radial basis functions, kernel density estimators, and piecewise approximations on nonuniform Voronoi partitions) that fit the framework. As a matter of independent interest, we prove a negative Sobolev lower bound that generalizes existing bounds from $p=2$ to all $p\in(1,\infty)$. We also consider deterministic bounds for discrete approximations to arbitrary measures in terms of the mesh norm of a quasi-uniform set of points. We specialize these bounds to show that compactly supported measures admit a deterministic $N$-term approximation $\mu_N$ such that $\mathrm{W}_p(\mu,\mu_N) = O(N^{-\frac{1}{d}})$ for all $d\geq 1$, which matches the asymptotic optimal quantizer rate. We also extend these results to non-compactly supported measures with appropriate tail decay.

18.
arXiv (quant-ph) 2026-06-25

Preparing two-mode magnonic Schrödinger cat states in a cavity-magnon-qubit system

arXiv:2606.25511v1 Announce Type: new Abstract: The cavity-magnon-qubit system has recently been demonstrated as a new platform for preparing macroscopic quantum states in magnonic systems. Here, we propose to prepare a two-mode magnonic cat state, which is also a non-Gaussian entangled state, based on this practical system involving two yttrium-iron-garnet (YIG) spheres and a superconducting qubit coupled to a common microwave cavity. By adiabatically eliminating the cavity and resonantly driving the qubit, an effective magnon-qubit conditional-displacement interaction is achieved. Further working in the magnon-magnon strong-coupling regime and considering two identical magnon frequencies and coupling strengths to the cavity, two hybridized magnon modes are formed, of which the bright mode is prepared in a cat state after a projective measurement on the qubit, while the dark mode remains in its initial vacuum state. Such a state corresponds to a two-mode cat state of two original magnon modes, which share strong non-Gaussian entanglement. We also discuss practical dissipation and dephasing effects on the cat state. The results indicate that strong nonclassicality and non-Gaussian entanglement are present in the two-mode cat state using fully feasible parameters.

19.
arXiv (CS.CV) 2026-06-15

Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding

Manga is a culturally distinctive multimodal medium and one of the most influential forms of Japanese popular culture. As AI systems increasingly target manga understanding, OCR, and translation, Manga109 has become a foundational dataset for manga-related AI research. However, the current Manga109 dataset contains inaccurate transcriptions and coarse annotations, which do not align well with modern OCR and multimodal manga understanding tasks. In this work, we revisit the dialogue text annotations of Manga109 and identify five categories of annotation issues, including inaccurate transcriptions, missing text regions, overlapping dialogue and onomatopoeia, and under-segmented speech balloons. To address these issues, we combine OCR-based issue detection and manual revision to construct Manga109-v2026, revising approximately 29,000 dialogue annotations. Our revisions better align Manga109 with modern OCR and multimodal manga understanding systems while preserving expressive structures characteristic of manga.

20.
arXiv (CS.LG) 2026-06-15

Learning Variable-Length Tokenization for Generative Recommendation

arXiv:2605.17779v2 Announce Type: replace Abstract: Generative recommendation reformulates recommendation as next-token prediction over discrete semantic identifiers (IDs). A fundamental yet unexplored design choice is that existing methods employ fixed-length tokenization for all items, implicitly assuming uniform encoding capacity regardless of item characteristics. Through systematic experiments across four datasets, we discover the Popularity-Length Paradox: popular items achieve optimal performance with short IDs, while tail items require substantially longer codes to capture discriminative semantics. This reveals a critical mismatch where popular items benefit from abundant collaborative signals and require minimal semantic detail, whereas tail items must rely on fine-grained content features due to sparse interaction data. To address this, we propose VarLenRec, a framework for learning variable-length tokenization. We develop Popularity-Weighted Information Budget Allocation (PIBA), an information-theoretic framework proving that optimal ID length should scale as a negative power of popularity. Directly implementing variable-length allocation faces two technical challenges: standard Euclidean residual quantization lacks geometric capacity to support diverse code lengths without distortion, and discrete length decisions are non-differentiable. We address these through Hyperbolic Residual Quantization, which leverages the exponential volume growth of the Poincaré ball to naturally stratify encoding capacity, and a Soft Length Controller, which enables differentiable length prediction via continuous layer retention probabilities regularized by PIBA-derived priors. Extensive experiments demonstrate that VarLenRec achieves significant improvements over state-of-the-art methods in recommendation accuracy and training/inference efficiency, revealing the importance of adaptive encoding capacity in generative recommendation.

21.
arXiv (CS.CL) 2026-06-15

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

AI evaluations are widely used for testing and understanding progress. However, the diverse evaluators bring with them inconsistencies that challenge analysis and comparison. First, results are saved in incompatible formats, scattered across leaderboards, papers, blog posts, evaluation harness logs, and custom repositories. Second, results are created by different evaluation frameworks, which produce divergent scores for nominally identical evaluations and record metadata inconsistently, hindering comparison, cross-community evaluation science, cost reduction, and reuse. We introduce Every Eval Ever, the first shared schema and community-crowdsourced repository for AI evaluation results. The schema standardizes how evaluations are represented in a unified, single JSON document. It is source-agnostic by design, ingesting results from evaluation harnesses and papers alike, and optionally stores per-instance outputs for fine-grained analysis. We contribute: (i) a community-governed metadata schema with a companion instance-level schema, the first standardization effort of its kind; (ii) automatic converters from popular formats, evaluation harnesses, and leaderboards to the unified schema; and (iii) a crowdsourced community database hosted on Hugging Face, currently spanning to date 22,235 models, 2,273 unique benchmarks, and 31 evaluation formats.

22.
arXiv (CS.LG) 2026-06-18

Smoothness-Based Derandomization of PAC-Bayes Bounds

arXiv:2606.19105v1 Announce Type: new Abstract: We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the predictor class. We show that passing from the Gibbs predictor to the deterministic predictor at the posterior mean has a precise cost, given by the generalization gap of the Jensen gap class. We control this class through its Rademacher complexity, leading to bounds for deterministic predictors that involve flatness quantities expressed in terms of parameter Jacobians and Hessians of the score map. The framework applies to both bounded and unbounded smooth loss functions, and we specialize the results to linear predictors and smooth neural networks. Finally, the Jacobian and Hessian quantities appearing in the theory motivate a practical regularizer. For BatchNorm networks, we compute this regularizer with respect to effective BatchNorm weights obtained by folding the BatchNorm transformation into the adjacent affine weights. Experiments on CIFAR-10 illustrate the behavior of this regularizer under different batch sizes.

23.
arXiv (CS.AI) 2026-06-18

Correcting Sensor-Induced Distribution Drift with Wasserstein Adversarial Learning

arXiv:2606.18561v1 Announce Type: cross Abstract: The quality of recorded data depends on the stability of the sensor system that acquires it. Sensor motion and aging can degrade the performance and stability of downstream data-driven methods. We present a Wasserstein-GAN-inspired approach for unsupervised inference of physically interpretable transformation parameters that map a changed detector response distribution back to a nominal reference distribution. In contrast to standard generative modeling, the generator is used as a learnable calibration transformation whose trainable weights represent the sought parameters, while the critic provides a distributional distance signal via the Wasserstein objective. We validate the approach on a tracking-detector toy model with controlled layer shifts and demonstrate its application on high-granularity Geant4-simulated calorimeter data with cell-wise aging effects. The method recovers aging coefficients for individual cells with correlation to ground truth and improves agreement between calibrated and reference energy-sum distributions, while exhibiting the expected degradation at increasing channel-to-channel noise levels. These results indicate that adversarial distribution matching can serve as a data-driven component of calibration strategies in settings where direct labels for degradation parameters are unavailable.

24.
arXiv (quant-ph) 2026-06-12

Coulomb crystallization of xenon highly charged ions in a laser-cooled Ca+ matrix

arXiv:2512.12266v2 Announce Type: replace-cross Abstract: We report on the sympathetic cooling and Coulomb crystallization of xenon highly charged ions (HCIs) with laser-cooled Ca$^+$ ions. The HCIs are produced in a compact electron beam ion trap, then charge selected, decelerated, and finally injected into a cryogenic linear Paul trap. There, they are captured into $^{40}$Ca$^+$ Coulomb crystals, and co-crystallized within them, causing dark voids in their fluorescence images. Fine control over the number of trapped ions and HCIs allows us to realize mixed-species crystals with arbitrary ordering patterns. By investigating Xe$^{q+}$–Ca$^+$ strings, we confirm the HCI charge states, measure their lifetime and characterize the mixed-species motional modes. Our system effectively combines the established quantum control toolbox for Ca$^+$ with the rich set of atomic properties of Xe highly charged ions, providing a resourceful platform for optical frequency metrology, searches for signatures of new physics, and quantum information science.

25.
arXiv (CS.CV) 2026-06-11

ReMoT: Reinforcement Learning with Motion Contrast Triplets

We present ReMoT, a unified training paradigm to systematically address the fundamental shortcomings of VLMs in spatio-temporal consistency – a critical failure point in navigation, robotics, and autonomous driving. ReMoT integrates two core components: (1) A rule-based automatic framework that generates ReMoT-16K, a large-scale (16.5K triplets) motion-contrast dataset derived from video meta-annotations, surpassing costly manual or model-based generation. (2) Group Relative Policy Optimization, which we empirically validate yields optimal performance and data efficiency for learning this contrastive reasoning, far exceeding standard Supervised Fine-Tuning. We also construct the first benchmark for fine-grained motion contrast triplets to measure a VLM's discrimination of subtle motion attributes (e.g., opposing directions). The resulting model achieves state-of-the-art performance on our new benchmark and multiple standard VLM benchmarks, culminating in a remarkable 25.1% performance leap on spatio-temporal reasoning tasks.