Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-24

MapReason-OSM: Can Vision-Language Models Make Graph-Verifiable Mobility Decisions from Street Maps ?

Vision-language models (VLMs) are increasingly used to read maps for logistics, delivery, and accessible navigation, where the output is an actionable decision (a route, a pin, a parking choice) that must respect the road network. Yet most map benchmarks grade free text or multiple-choice answers that cannot be verified against the underlying graph. We present MapReason-OSM, a benchmark and evaluation harness for graph-verifiable mobility decisions on self-rendered OpenStreetMap panels. We render fixed-style maps for ten U.S. downtowns at two aligned zoom scales, overlay a consistent marker grammar, and pair each panel with a hidden street graph and exact oracles, yielding 6,000 instances (12,000 panels across the two zooms) over 12 routing, facility-location, and visual disambiguation tasks. Models return structured decisions that we snap back to the graph and score for validity, legality, optimality, and constraint satisfaction, plus cross-zoom consistency. Across seven VLMs, models read maps and route simply but fail at graph cost reasoning (single-facility pin placement is near chance even for frontier reasoning models), and are frequently scale-inconsistent. We release the benchmark, harness, and deterministic generator. Code and data: https://github.com/Vi-Sri/mapreason-osm

02.
arXiv (CS.AI) 2026-06-24

On the Smallness of the Large Language Models Scaling Exponents

arXiv:2606.24504v1 Announce Type: new Abstract: We discuss reasons why the scaling exponents of current Large Language Models (LLMs) applications are indicating an unsustainable regime in terms of energy resources. We further show that attributing the smallness of such exponents to a numerical bias due to the neglect of a non-zero value of the loss function in the limit of infinite data (``pedestal effect") does not remove the unsustainability issue. Finally, the effects of the smoothness (roughness) of the data on the scaling exponents is commented upon based on an analogy with phenomenological models of fluid turbulence.

03.
arXiv (CS.CV) 2026-06-15

Rotation-Invariant Spherical Watermarking via Third-Order SO(3) Representation Coupling

Reliable watermarking of panoramic imagery is fundamentally challenged by arbitrary 3D rotations. As panoramas are defined on the sphere, they naturally transform under the action of $SO(3)$, rendering conventional planar representations and augmentation-based robustness strategies inadequate and devoid of theoretical guarantees. To address this, we formulate panoramas as spherical signals and leverage $SO(3)$ representation theory to derive provably rotation-invariant descriptors. While spherical harmonic coefficients transform equivariantly under rotations, the natural invariant constructions are typically limited to zeroth-order statistics which eliminate directional information and severely constrain embedding capacity. In this work, we introduce a principled third-order invariant construction by coupling higher-order $SO(3)$ irreducible representations via tensor products and projecting onto the trivial representation. This yields a spherical invariant bispectrum that preserves phase information while remaining strictly rotation-invariant. Leveraging this property, we embed watermarks into higher-order spherical harmonic coefficients and recover them from invariant bispectral scalars, enabling reliable extraction under arbitrary 3D rotations. We provide a theoretical proof of $SO(3)$ invariance for it and demonstrate experimentally its near-perfect robustness to continuous rotations while maintaining high visual fidelity.

04.
arXiv (quant-ph) 2026-06-11

Strong-field control of the $Z$-boson resonance in $e^+e^-$ collisions

arXiv:2606.09394v2 Announce Type: replace-cross Abstract: Resonant $Z$-boson production is a cornerstone of precision electroweak physics, with its vacuum line shape set by the $Z$ mass, width, and collision kinematics. We show that a strong laser field can significantly alter this picture. By treating the field nonperturbatively, we find that laser dressing of the incoming fermions alters the effective collision kinematics and opens laser-photon exchange channels, including multiphoton processes, in $e^{+}e^{-}$ collisions. As a result, the $Z$-resonance profile develops distinct intensity-dependent regimes, evolving from the vacuum limit to saturation at intermediate field strengths and to an approximately quadratic enhancement at higher intensities. Additionally, the polarization composition of the produced $Z$ bosons is redistributed. In particular, at high intensities the laser-induced contribution can compensate the intrinsic chiral asymmetry of the electroweak interaction, leading to nearly parity-balanced $Z$-boson production. Our results identify that strong classical fields can dynamically control electroweak resonance phenomena, opening a bridge between strong-field QED and high-energy collider physics.

05.
arXiv (CS.CL) 2026-06-11

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires an additional student model training. However, the training efficiency of SFM distillation remains underexplored. In this work, we explore training acceleration of SFM distillation to speed up model deployment. We examine the potential of stacking, in which the model depth is progressively increased through training until the target model depth is reached. While existing stacking methods improve training speed, they suffer from performance degradation. To handle this limitation, we propose interleaved stacking, a novel stacking method that consistently preserves layer position throughout the stacking process. This property is particularly critical in SFMs, in which each layer encodes distinct layer-specific knowledge. We validate the effectiveness of the proposed method on SUPERB.

06.
arXiv (quant-ph) 2026-06-16

REGRID-QAOA: A Resource-Efficient Graph-Reduced Hybrid QAOA Framework for Physics-Constrained Power System Islanding

arXiv:2606.15083v1 Announce Type: new Abstract: Quantum computing has rapidly emerged as a powerful paradigm for tackling computationally demanding problems. In particular, quantum optimization shows strong promise for hard combinatorial problems in power systems, where increasing distributed energy penetration heightens the need for intentional islanding to maintain grid reliability and resilience. However, power system islanding is an NP-hard combinatorial optimization problem that becomes computationally prohibitive for classical solvers as network size grows, motivating the use of quantum computing as a promising alternative pipeline. This study develops a resource-efficient hybrid QAOA islanding framework that brings physics-constrained power-system partitioning into the quantum optimization workflow. The framework combines coherency-informed graph reduction, physics-aware constraint modeling, and structured post-processing to efficiently convert shallow-circuit QAOA samples into high-quality feasible islanding decisions without deep circuits or large shot budgets. The proposed framework is validated on the standard IEEE benchmark systems (9-, 14-, 24-, 30-, 39-, and 57-bus), demonstrating that the hybrid workflow achieves Gurobi-optimal solution quality with a clear quantum resource advantage over vanilla QAOA, while the resulting islanding solutions satisfy all physical feasibility requirements after network separation. This study establishes QAOA-based islanding as a viable quantum approach for critical infrastructure, with structured post-processing as the key enabler of quantum resource efficiency.

07.
arXiv (CS.CV) 2026-06-24

HANCLIP: A Family of Hyperbolic Angular Negation Vision Language Models

Vision-Language Models (VLMs) are typically pre-trained on large-scale image-text datasets to capture semantic correspondences between visual content and natural language. However, they remain surprisingly brittle to negation: models often rely on shallow word co-occurrence and are easily distracted by misleading or irrelevant textual cues, even when their overall retrieval or classification performance is strong. Moreover, directly finetuning on negation data can interfere with previously acquired knowledge, causing noticeable degradation on standard vision-language benchmarks. To tackle these issues, this work introduces HANCLIP (Hyperbolic + Angular + Negation), a family of VLMs that explicitly restructures the embedding space to encode "what an image is not" alongside "what it is." HANCLIP is trained on a compact set of 20,000 image-text quadruplets and combines a hyperbolic formulation, which models hierarchical semantic relations and asymmetries, with an angular triplet objective that drives systematic separation between negated descriptions and their corresponding positives. This geometry-aware design strengthens negation sensitivity while preserving the global structure of pretrained representations, rather than overwriting them. Extensive experiments across multiple vision-language tasks show that HANCLIP delivers consistent gains on the negation-focused NegBench benchmark, while maintaining competitive or improved performance on standard classification and image-text retrieval benchmarks. The framework is model-agnostic and can be plugged into CLIP, LongCLIP, SmartCLIP, and HiMo-CLIP without large-scale retraining, demonstrating that a carefully designed geometric objective can substantially extend the reasoning capabilities of existing VLMs using only modest additional data.

08.
arXiv (CS.LG) 2026-06-19

Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems

arXiv:2606.20329v1 Announce Type: new Abstract: Soil microorganisms control organic matter cycling and largely determine how soil systems can cope with and mitigate climate change and environmental threats. Representing microbial dynamics in process-based soil models is therefore critical to predict carbon cycling in soils, albeit highly challenging to inform from data. One promising approach to improve their parametrisation is the integration of genomic data, yet modelling the complex and unknown relationship between genomes and the processes the microbes are driving is an unsolved problem. In this work, we present the first hybrid modeling framework for deriving biokinetic parameter values of a process-based soil organic matter turnover model from metagenome-inferred functional traits based on DNA sequencing data. Our model predicts biokinetic parameters of the process-based model from genomic trait data with a neural network and integrates constraints from ecological theory and literature to ensure realistic behavior, even of non-observed state variables. We evaluate our method on synthetic genomic trait datasets of varying complexity and on real data, showing that our approach improves performance over multiple baselines and learns the dynamics of unmeasurable components of the process-based model effectively, even for small training datasets.

09.
medRxiv (Medicine) 2026-06-23

Novel loci and multi-omics risk models for rheumatoid arthritis through a million-participant genome-wide association meta-analysis

Rheumatoid arthritis (RA) remains incompletely understood, limiting targeted prevention. In this work, genome-wide association study meta-analyses were performed for RA and seropositive RA, comprising approximately one million participants of European ancestry. Eight and six novel genomic risk loci were defined for RA and seropositive RA, and candidate causal genes were identified, highlighting relevant biological pathways, including established immune pathways and estrogen metabolism. Novel disease-specific polygenic risk scores (PRSs) were constructed, enhancing predictive performance over clinical risk factors (incremental C-statistics of 2.7 and 5.1 for RA and seropositive RA, respectively). In parallel, integrating metabolomic data into high-dimensional models enhanced risk stratification over models based on clinical risk factors and genomics, particularly for seropositive RA, where the hazard ratio of the highest decile increased from 4.869 to 5.697. These findings expand the understanding of genetic factors underlying RA and support the value of including PRSs in risk assessment, while suggesting metabolomic integration may further enhance risk stratification, particularly for seropositive RA.

10.
arXiv (quant-ph) 2026-06-19

Resolving problems with the continuum limit in coherent-state path integrals

arXiv:2602.02466v2 Announce Type: replace Abstract: The paper solves the problem of continuum limit in bosonic thermal coherent-state path integrals. For this purpose, exact discrete versions of the path integral are constructed for three different orderings of the Hamiltonian: normal, anti-normal and symmetric (Weyl order). Subsequently, their different continuum versions are checked on the harmonic oscillator, to choose the symmetric ordering as a possibly correct choice for all polynomial Hamiltonians. Spotted mathematical subtleties in the simple case serve as a clue to the general solution. Finally, a general justification for the symmetric order is provided by deriving the continuum path integral starting from the exact discrete case using a renormalization procedure in the imaginary time frequency domain. While the role of Weyl order has already been found, the paper provides the missing proof of its suitability for every polynomial Hamiltonian and simplifies the previously established construction by referring only to creation and annihilation operators (without position and momentum operators).

11.
medRxiv (Medicine) 2026-06-18

A Brain-Aging Transcriptomic Signature Reclassifies WHO Glioma Grade and Predicts Survival Independently of IDH Status: A Multi-Cohort Study

Background Despite WHO grade and IDH status, significant survival differences remain in diffuse gliomas. We hypothesized that a brain-aging transcriptomic signature, reflecting neuroinflammation, myeloid infiltration, and synaptic loss, would independently predict survival and allow for molecular reclassification. Methods A neurodegeneration score was derived via PCA of brain MRI volumes from 1,057 OASIS-3 subjects and projected onto 888 TCGA-LGG/GBM (discovery) and 693 CGGA gliomas (validation). A 14-gene signature of glial/myeloid (GFAP, AQP4, TYROBP, TREM2, C1QA, CD68, ITGAM) and neuronal (SYP, DLG4, GRIN1, GRIA1, SNAP25, SYN1, RBFOX3) genes were computed. Elastic-net Cox regression identified a 3-gene panel (C1QA, CD68, GRIA1). Kaplan-Meier, multivariate Cox, decision curve, and single-cell RNA-seq analyses were performed. Results High brain-aging scores predicted poorer overall survival (p < 0.0001) and remained an independent prognostic factor after adjusting for WHO grade and IDH status (z = 4.72, p < 0.001); chronological age was non-significant (p = 0.231). In IDH-mutant gliomas, significance was confirmed in both cohorts (TCGA p = 0.027; CGGA p < 0.0001). Bidirectional reclassification showed high-risk Grade 2 tumors with Grade 3-like survival (p = 0.00089), and indolent Grade 3 tumors resembling Grade 2 by Ki-67. Single-cell RNA-seq confirmed macrophage localization of signature genes; DCA demonstrated net benefit over grade alone at 5-30% probability thresholds. Conclusions A brain-aging transcriptomic signature independently predicts glioma survival beyond WHO grade and IDH status, validated in an independent Chinese cohort, with clinical utility for identifying high-risk Grade 2 and sparing over-treatment of indolent Grade 3 tumors.

12.
arXiv (CS.CL) 2026-06-25

How Pragmatics Shape Articulation: A Computational Case Study in STEM ASL Discourse

Most state-of-the-art sign language models are trained on interpreter or isolated vocabulary data, which overlooks the variability that characterizes natural dialogue. However, human communication dynamically adapts to contexts and interlocutors through spatiotemporal changes and articulation style. This specifically manifests itself in educational settings, where novel vocabularies are used by teachers, and students. To address this gap, we collect a motion capture dataset of American Sign Language (ASL) STEM (Science, Technology, Engineering, and Mathematics) dialogue that enables quantitative comparison between dyadic interactive signing, solo signed lecture, and interpreted articles. Using continuous kinematic features, we disentangle dialogue-specific entrainment from individual effort reduction and show spatiotemporal changes across repeated mentions of STEM terms. On average, dialogue signs are 24.6%-44.6% shorter in duration than the isolated signs, and show significant reductions absent in monologue contexts. Finally, we evaluate sign embedding models on their ability to recognize STEM signs and approximate how entrained the participants become over time. Our study bridges linguistic analysis and computational modeling to understand how pragmatics shape sign articulation and its representation in sign language technologies.

13.
arXiv (quant-ph) 2026-06-12

Testing the problem of time with cold atoms

arXiv:2509.07745v3 Announce Type: replace-cross Abstract: We realize a cold-atom system to quantitatively test relational constructions of time. A well-isolated atomic Bose-Einstein condensate evolves in a conservative trap that is partitioned by a thin optical barrier into an observed and unobserved sector, with negligible dissipation on the experimental timescale. Motivated by relational-time approaches discussed in the Wheeler-DeWitt framework, we ask whether the dynamics of the observed sector can be ordered using only internal degrees of freedom. To this end, we construct an entropic time from an experimentally defined coarse-grained entropy, and demonstrate that it can robustly order the events in the observed sector across repeated cycles of expansion and recollapse. We finally derive an effective Schroedinger equation parameterized by this internal time and show that it is able to reproduce the measured evolution. These results establish a controlled experimental setting in which relational-time constructions can be quantitatively tested.

14.
arXiv (quant-ph) 2026-06-24

Universal Extraction of Quantum Critical Exponents and Phase Transitions via Tailored Hilbert Space

作者:

arXiv:2606.24312v1 Announce Type: cross Abstract: Finite-size scaling and the renormalization group form the central toolkit for analyzing quantum phase transitions (QPTs). In this Letter, we introduce a novel Hilbert-space tailoring scheme to probe quantum critical phenomena. Applied to the second-order QPT of the one-dimensional (1D) XY model, our method yields precise critical points and exponents on lattices containing merely 50 unit cells. We further establish the universal applicability of this framework via investigations of the Berezinskii-Kosterlitz-Thouless transition in the 1D XXZ chain: critical parameters are recovered with as few as 12 lattice sites. This technique may open an alternative, efficient route to universally characterize QPT across many-body lattice systems.

15.
PLOS Computational Biology 2026-06-22

CoDaLoMic: An R package for modeling microbiome compositional and longitudinal data

by Irene Creus-Martí, Andrés Moya, Francisco J. Santonja In this paper we present CoDaLoMic, an R package for analyzing longitudinal and compositional microbiome datasets. The CoDaLoMic package implements three models specifically designed for the analysis of microbiome data that are both compositional and longitudinal. Unlike many existing methods that focus solely on pairwise interactions, CoDaLoMic also captures interactions among groups of bacteria, providing a more robust methodological framework for studying microbial relationships at the community level. In addition, the package facilitates the analysis of microbiome variability in relation to host health status and allows for the identification of groups of taxa that exhibit similar temporal dynamics. Working with time series data makes it possible to understand not only the current state of a microbial community but also its dynamics over time, which is essential for identifying patterns of ecological succession, detecting events of dysbiosis or recovery, and inferring potential causal relationships between taxa. On the other hand, focusing on interactions among groups of bacteria, rather than analyzing only pairwise relationships, enables a more integrated and functionally meaningful view of the microbiome. Many key ecological functions are the result of the collective behavior of functionally related groups of taxa. Two datasets have been considered in CoDaLoMic, one real and one simulated. The real dataset contains the information of the genera present in the microbiome of the Blatella germanica cockroach at 105 time points. The simulated dataset is defined taking Lotka-Volterra structure into account. CoDaLoMic is available at CRAN.

16.
bioRxiv (Bioinfo) 2026-06-15

Biological meaning in protein embedding space is resolution-dependent

Protein language model embeddings are increasingly used to organise biological sequences, yet how biological meaning is encoded within embedding neighbourhoods remains poorly understood. Using two independent hierarchical enzyme systems, carbohydrate-active enzymes and peptidases, we investigated how biological interpretation changes across embedding organisations aligned to different levels of biological hierarchy. Different embedding organisations give rise to distinct neighbourhood semantics. When aligned to membership-boundary resolution, embeddings robustly separated artefacts and unrelated proteins from members of the target category. However, embeddings aligned to functional-grouping resolution maintained compositional neighbourhood structure for multi-domain proteins spanning more than one functional or catalytic group. Finally, embeddings aligned to local-family resolution recovered compact family-like neighbourhoods, including families withheld from training, while weakening broader membership-boundary and functional-grouping relationships. Moreover, embeddings optimised toward the same level of biological organisation retain different biological relationships depending on optimisation trajectory employed. Together, our results show that proximity in protein embedding space has no fixed biological interpretation. Instead, biological meaning emerges across embedding resolutions through selective preservation of different forms of biological organisation.

17.
arXiv (CS.CV) 2026-06-16

DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based State Space Models for Document Image Binarization

Document image binarization aims to separate foreground text from degraded backgrounds while preserving thin, broken, and low-contrast strokes. Although deep learning methods have improved binarization performance, most existing approaches rely on convolutional, transformer-based, or generative architectures, while Mamba-based state space models remain largely unexplored for this task. In this work, we investigate Mamba-based feature propagation and observe that direct state-space propagation may dilute weak foreground cues during long-range modeling, especially faint ink traces, fragmented characters, and boundary-sensitive stroke details. To address this problem, we propose DeepMine-Mamba, a Mamba-based binarization framework equipped with a novel Anti-Dilution Gate that estimates propagation-induced feature changes and selectively restores stroke-sensitive local responses while suppressing unnecessary background enhancement. Experiments on DIBCO/H-DIBCO benchmarks under a strict leave-one-year-out protocol show that DeepMine-Mamba achieves competitive overall performance, with strong average FM and Fps across benchmark years. Ablation results further show that the Anti-Dilution Gate is the key component for mitigating propagation-induced foreground dilution and improving stroke preservation.

18.
arXiv (quant-ph) 2026-06-16

Non-perturbative CPMG scaling and qutrit-driven breakdown under compiled superconducting-qubit control: a single-qubit study

作者:

arXiv:2603.29525v3 Announce Type: replace Abstract: Decoherence in superconducting qubits arises from both multilevel dynamics and structured environmental noise, yet perturbative models cannot capture all resulting signatures. Here, EmuPlat couples instruction-set-architecture-level waveform generation to the hierarchical equations of motion HEOM under $1/f$ non-Markovian pure dephasing. In the resulting non-perturbative regime – where filter-function predictions become quantitatively uninformative – CPMG scaling of a three-level superconducting transmon yields one calibration result, two physical findings, and one structural null. Y-CPMG exhibits axis-dependent scaling-law breakdown – non-monotonic decoherence, partial coherence revival, and pronounced X–Y population asymmetry ($0.204$ vs ${

19.
arXiv (CS.CV) 2026-06-16

A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation

In the context of novel view synthesis, 3D Gaussian Splatting (3DGS) has recently emerged as an efficient and competitive counterpart to Neural Radiance Field (NeRF), enabling high-fidelity photorealistic rendering in real time. Beyond novel view synthesis, the explicit and compact nature of 3DGS enables a wide range of downstream applications that require geometric and semantic understanding. This survey provides a comprehensive overview of recent progress in 3DGS applications. It first reviews the reconstruction preliminaries of 3DGS, followed by the problem formulation, 2D foundation models, and related NeRF-based research areas that inform downstream 3DGS applications. We then categorize 3DGS applications into three foundational tasks: segmentation, editing, and generation, alongside additional functional applications built upon or tightly coupled with these foundational capabilities. For each, we summarize representative methods, supervision strategies, and learning paradigms, highlighting shared design principles and emerging trends. Commonly used datasets and evaluation protocols are also summarized, along with comparative analyses of recent methods across public benchmarks. To support ongoing research and development, a continually updated repository of papers, code, and resources is maintained at https://github.com/heshuting555/Awesome-3DGS-Applications.

20.
arXiv (CS.LG) 2026-06-25

KIGNet: Physics-Motivated Multi-Graph Representation Learning for Explainable Jet Tagging

arXiv:2512.07420v3 Announce Type: replace-cross Abstract: Jet identification plays a central role in analyzing data from high-energy collider experiments. While deep learning has improved jet classification, it often lacks interpretability. We introduce the Kinematic Interaction Graph Network (KIGNet), a graph neural network that integrates kinematic variables into jet classification by constructing four graph representations per jet, each weighted by a distinct variable: angular separation ($\Delta$), relative transverse momentum ($k_T$), momentum fraction ($z$), and invariant mass squared ($m^2$). Three of these ($\Delta$, $k_T$, $z$) are motivated by the Lund jet plane, grounded in perturbative QCD factorization; the fourth ($m^2$) adds complementary mass-scale sensitivity for heavy-flavor identification. Using Gradient-weighted Class Activation Mapping (Grad-CAM), we determine which variables dominate classification. Angular separation and relative transverse momentum account for about 76% of the total Grad-CAM attribution (40.72% and 35.67%), with momentum fraction and invariant mass contributing the remaining 24%. This hierarchy is consistent with the soft-collinear structure of QCD radiation in the training data, showing that the network learns physically interpretable representations rather than spurious correlations. On the JetClass dataset, KIGNet achieves a macro-accuracy of 95.07%, macro-AUC of 96.61%, and macro-AUPR of 81.52%, relative improvements of 2.45%, 3.40%, and 19.11% over the state-of-the-art baseline. On the Aspen Open Jets dataset of real CMS collision data, KIGNet produces substantially more structured latent representations than the baseline, reducing the Davies-Bouldin Index by 52.15% ($0.8395 \rightarrow 0.4017$) and increasing the Dunn Index by 42.33% ($0.0189 \rightarrow 0.0269$), confirming that physics-informed kinematic encoding generalizes beyond idealized simulation to experimental detector conditions.

21.
arXiv (math.PR) 2026-06-16

Excursion Fluctuations and Spectral Universality in Gaussian Fields

arXiv:2606.15630v1 Announce Type: new Abstract: We study the large-scale spatial fluctuations of excursion volumes for a class of smooth stationary Gaussian fields. In the case of Berry's random wave model in dimension $d \geq 2$, we show that the spatial fluctuations for fixed $u>0$ converge to the fractional Gaussian field $(-\Delta)^{-1/4}W$ in the space of tempered distributions $\mathcal S'(\mathbb{R}^d)$, where $W$ is the $d$-dimensional Gaussian white noise. This explains the long-range correlations in the apparent filament structure of the Random Plane Wave model. For a class of smooth planar Gaussian fields whose spectral density has a power-law singularity at the origin, we prove convergence to fractional Gaussian fields with an index determined by the singularity exponent. More generally, the results illustrate that, for stationary random measures, large-scale spatial fluctuations are determined by the behaviour of the spectral measure density exponent near zero.

22.
arXiv (CS.CV) 2026-06-16

Multi-Task Tennis Stroke Biomechanics Analysis Using MediaPipe Pose

We built a multi-task pipeline for tennis stroke biomechanics from plain RGB video. On top of pose-based stroke recognition, it adds two new tasks, predicting shot direction and grading posture quality, plus a rule-based feedback layer that suggests coaching tips. Strokes are found automatically using a weighted joint velocity score, s(t) = 0.5 v_wrist + 0.3 m_elbow + 0.2 m_shoulder, removing the need for manual annotation. Pose comes from MediaPipe Pose Landmarker (33 landmarks, metric world coordinates), with each stroke turned into a 30-frame by 39-feature sequence for TennisTransformerGPU, a compact 564,103-parameter transformer (4 layers, 4 heads, d=128) with three parallel output heads. Trained on 1,281 labeled strokes from 7 pros and 1 amateur across 11 videos, it hits 83.7% stroke-type accuracy, 61.9% on direction, and 62.6% on posture under a random 80/20 split. The interesting test is cross-player: train on pros, evaluate on the amateur. Stroke type barely budges, 82.9%, a 0.8% drop. Direction prediction does not transfer; it just falls back to the majority class. An ablation shows why world coordinates matter so much here: switching to image-space landmarks tanks cross-player stroke-type accuracy from 83% to 47% and direction from 68% to 21%. Everything runs on Kaggle's free T4 GPU tier and is fully reproducible.

23.
arXiv (CS.LG) 2026-06-11

MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications

arXiv:2512.08211v2 Announce Type: replace Abstract: Large language models (LLMs) are moving from cloud-centric services toward on-device embedded AI, where models interact with private, longitudinal signals sensed from users and their physical environments. Mobile phones are a natural platform for such applications because they are continuously carried by users, connected to wearable sensors, and deeply integrated with daily mobile applications. However, practical LLM fine-tuning on commodity phones remains difficult. Existing fine-tuning frameworks are largely Python-based and server-oriented, making them hard to deploy inside mobile applications. We present MobileFineTuner, a mobile-native open-source framework for end-to-end LLM fine-tuning on commodity mobile phones. MobileFineTuner is implemented in C++ and provides a reusable training stack. To make fine-tuning feasible under mobile resource constraints, MobileFineTuner integrates a resource-aware training runtime with memory-efficient attention, activation checkpointing, gradient accumulation, parameter sharding, and energy-aware scheduling. We evaluate MobileFineTuner on real mobile phones using GPT-2, Gemma 3, and Qwen2.5 models across multiple fine-tuning tasks. The results show that MobileFineTuner reproduces standard Full-FT and LoRA fine-tuning behavior, substantially reduces memory pressure and improves executability on memory-constrained phones. We further demonstrate MobileFineTuner through a private campus health-agent application, where a local LLM is fine-tuned on user-specific wearable-sensing records to provide more personalized responses while keeping raw records on the phone. These results establish MobileFineTuner as a practical toolkit for studying and building on-device LLM fine-tuning applications in embedded AI and sensing systems.

24.
arXiv (CS.CL) 2026-06-11

WorldReasoner: Evaluating Whether Language Model Agents Forecast Events with Valid Reasoning

Forecasting real-world events requires language-model agents to reason under uncertainty from incomplete, time-bounded information. Yet evaluating whether agents genuinely forecast requires more than final-answer accuracy: a model may be correct by recalling memorized training facts, citing fabricated evidence, or producing an unsupported causal story. We present WorldReasoner, an evaluation framework for temporally valid event forecasting. Each task gives an agent a resolved forecasting question, a simulated forecast date, and access only to evidence available before that date; after resolution, the framework scores the submitted probability, cited evidence, and optional causal event graph. WorldReasoner reports three complementary axes: outcome quality against resolved answers, evidence quality over cited sources, and reasoning quality against post-resolution hindsight graphs. The benchmark is built by an agentic construction pipeline that generates forecasting questions, collects time-stamped evidence, and builds hindsight reference graphs at scale, yielding 345 resolved tasks derived from 14,141 articles with graphs covering 8,087 extracted events. Across six controlled agent settings, temporally valid retrieval is the strongest driver of outcome accuracy; causal graph construction improves key-event recovery; and correct graph-enabled forecasts are more strongly grounded in key events and relevant sources, yet agents still struggle to convert grounded evidence into calibrated probabilities.

25.
arXiv (CS.CL) 2026-06-15

MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models

Entity state tracking is a necessary component of world modeling that requires maintaining coherent representations of entities over time. Previous work has benchmarked entity tracking performance in purely text-based tasks. We introduce MET-Bench, a multimodal entity tracking benchmark designed to evaluate the ability of vision-language models to track entity states across modalities. Using three domains, we assess how effectively current models integrate textual and image-based state updates. Our findings reveal a significant performance gap between text-based and image-based entity tracking. We empirically show this discrepancy primarily stems from deficits in visual reasoning rather than perception. We further show that explicit text-based reasoning strategies improve performance, yet limitations remain, especially in long-horizon multimodal tasks. We apply reinforcement learning to improve entity tracking in open-source VLMs. This yields substantial in-modality gains, but does not transfer robustly across input modalities. Our results highlight the need for improved multimodal representations and reasoning techniques to bridge the gap between textual and visual entity tracking.