Academic Intelligence · Curated Daily

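Explore the Frontiers of Global Research

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate analysis briefings on cross-disciplinary literature.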

01.
medRxiv (Medicine) 2026-06-22

A Randomized, Controlled, Double Blind Clinical Study to Evaluate Use of Hydron Alkaline Ionised Water (HAIW) in Healthy Participants

Background and Objectives: Alkaline Ionized Water (AIW) is considered among the highest quality healthy drinking water worldwide and is widely discussed for its various health benefits. Hydron Alkaline Ionized Water (HAIW) is produced through electrolysis, resulting in a stable pH of approximately 9.5 with a negative Oxidation Reduction Potential (ORP), making it an antioxidant beverage. The objective of this study was to evaluate the safety of HAIW and its effects on digestion, sleep, energy, and overall quality of life in healthy participants compared to Packaged Drinking Water (PDW). Materials and Methods: A randomized, controlled, double-blind, prospective clinical study was conducted in which a total of 24 healthy participants aged 21 to 40 years were randomized in a 1:1 ratio to either the HAIW group or the PDW group, with equal gender distribution. Participants were hospitalized for 7 days and asked to consume at least 3 litres of the assigned water daily. Primary outcomes were safety-related laboratory parameters and adverse event monitoring. Secondary outcomes included assessment of digestion (appetite, digestion, bowel habits), urine parameters, sleep quality, freshness after waking, fatigue, energy/stamina/strength, quality of life, and global assessment. Results: All 24 participants completed the study with no dropouts. Baseline demographics were comparable between the two groups. Assessment of primary safety-related laboratory parameters, including complete blood count, liver function tests, renal function tests, blood sugar, electrocardiogram, and serum electrolytes, showed non-significant change from baseline to 7 days and remained within normal limits in both groups, with non-significant difference between groups (p>0.05). HAIW showed significantly better improvement in appetite, digestion, and bowel habits from Day 2 onwards compared to PDW. Sleep quality and freshness after waking up showed significant improvement from Day 3 and Day 2, respectively, in the HAIW and PDW groups, with significantly better improvement in the HAIW group. Fatigue scores showed significant reduction at Day 6 and 7 in both groups with non-significant difference between groups. A total of 5 adverse events were reported (3 in HAIW, 2 in PDW); all were mild in nature and unrelated to the study products. Global assessment showed excellent to good overall safety and tolerability in both groups. Conclusion: HAIW was well tolerated by all participants without any adverse effects. All laboratory safety parameters remained within normal range. HAIW demonstrated significant improvements in digestive function (appetite, digestion, bowel habits), sleep quality, and freshness after waking as compared to PDW. The study concludes that HAIW can be safely consumed. HAIW improves digestive and sleep-related functions.

02.
arXiv (CS.CV) 2026-06-16

Rotational Symmetry based Object Pose Estimation from Point Clouds in the Absence of Known 3D Models

Object pose estimation is crucial to many industrial applications, with one example being automated spray painting using a robot. However, confidentiality concerns often limit access to high-quality 3D models, posing a significant challenge for point-cloud-based pose estimation. In such scenarios, rotational symmetry, a readily accessible characteristic of many industrial objects, can provide valuable prior information to facilitate pose estimation. In this paper, we propose a method that leverages the rotational symmetry commonly found in industrial objects to address the challenge caused by the absence of 3D models. The object pose is jointly estimated with point cloud refinement through an iterative optimization process. This optimization relies on a rotational symmetry constraint loss. To construct this loss, each 3D point is rotated according to the currently estimated pose, and multiple correspondences are identified using nearest-neighbor search by exploiting the rotational symmetry property. These correspondences are then used to compute the rotational symmetry constraint loss, which iteratively refines both the pose and the point cloud. By explicitly incorporating rotational symmetry into the optimization process, the proposed method achieves robust pose estimation and generalizes well across diverse object types. The proposed method is evaluated on a dataset specifically created for point clouds without known 3D models, consisting of four categories of synthetic objects and one real wheel hub collected from a production line. Experimental results demonstrate that the proposed method achieves performance comparable to methods that rely on known 3D models.
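
The loss described in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes the symmetry axis is parallel to z, treats only the axis position (`center`) as the unknown pose, uses a brute-force nearest-neighbor search, and evaluates the loss without the joint point cloud refinement. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_about_z(p: Point, center: Tuple[float, float], angle: float) -> Point:
    """Rotate p by `angle` around a z-parallel axis through `center` (assumed axis direction)."""
    x, y = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + center[0], s * x + c * y + center[1], p[2])


def nearest_sq_dist(q: Point, cloud: List[Point]) -> float:
    """Squared distance from q to its nearest neighbor in the cloud (brute force)."""
    return min((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 + (q[2] - p[2]) ** 2 for p in cloud)


def symmetry_loss(cloud: List[Point], center: Tuple[float, float], n_fold: int) -> float:
    """Mean squared nearest-neighbor distance over all n_fold - 1 symmetric copies of each point."""
    total = 0.0
    for p in cloud:
        for k in range(1, n_fold):
            q = rotate_about_z(p, center, 2.0 * math.pi * k / n_fold)
            total += nearest_sq_dist(q, cloud)
    return total / (len(cloud) * (n_fold - 1))


# Toy cloud: six points with 6-fold symmetry around the axis through (2, 3).
true_center = (2.0, 3.0)
cloud = [
    (2.0 + math.cos(2.0 * math.pi * i / 6), 3.0 + math.sin(2.0 * math.pi * i / 6), 0.5)
    for i in range(6)
]
loss_true = symmetry_loss(cloud, true_center, 6)   # numerically zero
loss_off = symmetry_loss(cloud, (2.3, 3.0), 6)     # clearly positive
print(loss_true, loss_off)
```

The loss vanishes (up to rounding) only when the candidate axis matches the true one, which is what makes it usable as an optimization target when no 3D model is available.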

03.
arXiv (quant-ph) 2026-06-11

Mathematical Basis for Analyzing Superconducting Phase Transitions Using Catastrophe Theory

arXiv:2606.11810v1 Announce Type: cross Abstract: We establish a rigorous mathematical bridge from quantum many-body path integrals to the cusp catastrophe model by Lyapunov-Schmidt reduction, which provides a theoretical foundation for analyzing superconducting phase transitions using catastrophe theory. First, it is proved that, near the critical point, the infinite-dimensional effective action is diffeomorphic to a finite-dimensional catastrophe. Secondly, starting from the Ginzburg-Landau free energy functional, the Euler-Lagrange partial differential equation can be reduced to the cusp catastrophe model. Thirdly, the reduction from the fermionic imaginary-time path integral to the cusp catastrophe is derived through the Hubbard-Stratonovich transformation, Matsubara frequency expansion, and Grassmann algebra. Furthermore, we connect this framework with the adsorption potential theory we proposed, elucidating the catastrophic topological nature of the electron pairing mechanism in high-temperature superconductivity. The precise microscopic derivation of the adsorption potential from first-principles electronic structure calculations would strengthen the predictive power of the theory.
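
For orientation, the standard normal form of the cusp catastrophe is recalled below. The state variable $x$ and control parameters $a$, $b$ are generic textbook symbols; how the paper maps the superconducting order parameter and its physical controls onto them is not stated in the abstract.

```latex
\[
V(x; a, b) = \tfrac{1}{4}x^{4} + \tfrac{1}{2}a\,x^{2} + b\,x,
\qquad
\frac{\partial V}{\partial x} = x^{3} + a\,x + b = 0,
\qquad
4a^{3} + 27b^{2} = 0 .
\]
```

The middle equation gives the equilibrium surface, and the last one is the bifurcation set on which the number of equilibria changes from one to three.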

04.
arXiv (CS.LG) 2026-06-17

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

arXiv:2606.18089v1 Announce Type: new Abstract: Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined success is driven by compositional generalization, which we formalize through a hierarchical latent selection model. In this framework, reasoning traces are generated by a cascade of discrete latent selection variables corresponding to reusable atomic modules, including both skills (local operations) and routing mechanisms (how intermediate information is selected, reused, and composed). Within this model, we theoretically show that SFT and RL play asymmetric, complementary roles: SFT supplies the raw module materials in compositional traces, and RL decomposes those traces to identify the latent atomic modules and enable compositional generalization. We design controlled experiments to validate this theory. Our results demonstrate that RL can extract atomic modules from compound traces supplied by SFT and recombine them to solve new configurations. Moreover, we find that training on compound traces yields stronger generalization than training on isolated atomic modules. Finally, we investigate the relationship between SFT and RL data and identify an effective protocol in which SFT ensures coverage of all atomic modules through compositional traces, while RL focuses on novel compositions outside the SFT support to drive exploration.

05.
arXiv (math.PR) 2026-06-15

Mixing Times for the Facilitated Exclusion Process

arXiv:2402.18999v2 Announce Type: replace Abstract: The facilitated simple exclusion process (FEP) is a one-dimensional exclusion process with a dynamical constraint. We establish bounds on the mixing time of the FEP on the segment, with closed boundaries, and the circle. The FEP on these spaces exhibits transient states that, if the macroscopic density of particles is at least $1/2$, the process will eventually exit to reach an ergodic component. If the macroscopic density is less than $1/2$ the process will hit an absorbing state. We show that the symmetric FEP (SFEP) on the segment $\{1,\ldots,N\}$, with $k>N/2$ particles, has mixing time of order $N^{2}\log(N-k)$ and exhibits the pre-cutoff phenomenon. For the asymmetric FEP (AFEP) on the segment, we show that there exist initial conditions for which the hitting time of the ergodic component is exponentially slow in the number of holes $N-k$. In particular, when $N-k$ is large enough, the hitting time of the ergodic component determines the mixing time. For the SFEP on the circle of size $N$, and macroscopic particle density $\rho \in(1/2,1)$, we establish bounds on the mixing time of order $N^{2}\log N$ for the process restricted to its ergodic component. We also give an upper bound on the hitting time of the ergodic component of order $N^{2}\log N$ for a large class of initial conditions. The proofs rely on couplings with exclusion processes (both open and closed boundaries) via a novel lattice path (height function) construction of the FEP.

07.
arXiv (CS.CL) 2026-06-17

Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation

While large language models (LLMs) have greatly advanced the functional correctness of automated code translation systems, the runtime efficiency of translated programs has received comparatively little attention. With the waning of Moore's law, runtime efficiency has become increasingly important for program quality, alongside functional correctness. Our preliminary study reveals that LLM-translated programs often run slower than human-written ones, and this issue cannot be remedied through prompt engineering alone. Therefore, our work proposes SwiftTrans, a code translation framework comprising two key stages: (1) Multi-Perspective Exploration, where MpTranslator leverages parallel in-context learning (ICL) to generate diverse translation candidates; and (2) Difference-Aware Selection, where DiffSelector identifies the optimal candidate by explicitly comparing differences between translations. We further introduce Hierarchical Guidance for MpTranslator and Ordinal Guidance for DiffSelector, enabling LLMs to better adapt to these two core components. To support the evaluation of runtime efficiency in translated programs, we extend existing benchmarks, CodeNet and F2SBench, and introduce a new benchmark, SwiftBench. Experimental results across all three benchmarks show that SwiftTrans achieves consistent improvements in both correctness and runtime efficiency.

08.
arXiv (math.PR) 2026-06-15

Trivariate Hypergeometric Series Formulas for Pure Partition Functions of Multiple $3$-SLE$_\kappa$

arXiv:2606.14038v1 Announce Type: new Abstract: Pure partition functions of multiple SLE are characterized by null-state partial differential equations, Möbius covariance, and boundary asymptotics. After quotienting by Möbius covariance, the case of three curves is the first genuinely multivariable one: the moduli space has three independent variables, naturally represented by the three unoriented cross-ratios of the three pairs of links. We solve this Möbius-normalized three-variable problem for the two basic link-pattern types of multiple \(3\)-SLE\(_\kappa\), namely the rainbow and neighbor patterns. Writing \(\beta=4/\kappa\), we construct explicit trivariate hypergeometric-series normal forms and identify them with the corresponding pure partition functions for all \(\beta>1/2\) in the rainbow case and all \(\beta\ge2/3\) in the neighbor case. Equivalently, these ranges are \(\kappa\in(0,8)\) and \(\kappa\in(0,6]\), respectively. The proof is analytic. The null-state PDEs and Möbius covariance yield recursion relations for the trivariate coefficient arrays. In the rainbow case, coefficient estimates give convergence and boundary regularity on the closed cube. In the neighbor case, Pfaff systems continue the local power series to a neighborhood of \([0,1)^3\), while side-face equations, regular normal estimates, and corner propagation give continuity on \([0,1]^3\) for \(\beta\ge2/3\). The endpoint \(\beta=2/3\), corresponding to \(\kappa=6\), requires a logarithmic normal term. The two-dimensional boundary degenerations are classical Appell \(F_1\) and Horn \(G_2\) functions. The probabilistic identification uses SLE martingale arguments and Itô calculus, together with positivity and boundary regularity. We also discuss boundary degenerations, including heuristic connections with boundary Green's functions.

09.
arXiv (CS.AI) 2026-06-12

Structured vs. Unstructured Pruning: An Exponential Gap

arXiv:2603.02234v3 Announce Type: replace-cross Abstract: The Strong Lottery Ticket Hypothesis (SLTH) states that large, randomly initialized neural networks contain sparse subnetworks capable of approximating a target function at initialization without training, suggesting that pruning alone is sufficient. Pruning methods are typically classified as unstructured, where individual weights can be removed from the network, and structured, where parameters are removed according to specific patterns, as in neuron pruning. Existing theoretical results supporting the SLTH rely almost exclusively on unstructured pruning, showing that logarithmic overparameterization suffices to approximate simple target networks. In contrast, neuron pruning has received limited theoretical attention, despite its practical appeal for direct hardware speedups. In this work, we consider the problem of approximating a single bias-free ReLU neuron by pruning hidden units of a randomly initialized two-layer ReLU network, effectively isolating the intrinsic limitations of neuron pruning. We show that achieving an $\varepsilon$-approximation requires a starting network size of $\Omega(1/\varepsilon)$ for neuron pruning, whereas weight pruning succeeds with only $O(\log(1/\varepsilon))$ hidden units, revealing an exponential separation between the two approaches.
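
To make the word "exponential" concrete, the two bounds from the abstract can be written side by side. The symbols $N_{\text{neuron}}$ and $N_{\text{weight}}$ are introduced here only for illustration and denote the number of hidden units the starting network needs under each pruning scheme.

```latex
\[
N_{\text{neuron}}(\varepsilon) = \Omega\!\left(\frac{1}{\varepsilon}\right),
\qquad
N_{\text{weight}}(\varepsilon) = O\!\left(\log\frac{1}{\varepsilon}\right),
\qquad
\frac{1}{\varepsilon} = \exp\!\left(\log\frac{1}{\varepsilon}\right).
\]
```

The neuron-pruning lower bound is therefore exponential in the quantity that already suffices for weight pruning. Ignoring constants, $\varepsilon = 10^{-3}$ gives $1/\varepsilon = 1000$ against $\log(1/\varepsilon) \approx 6.9$.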

10.
arXiv (CS.CL) 2026-06-12

PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

Empathetic spoken dialogue systems require not only semantically appropriate responses but also emotionally aligned prosodic expression. However, cascade pipelines often discard acoustic cues during speech-to-text conversion, while end-to-end speech models lack interpretable control over emotion and knowledge integration. To address these challenges, we propose PRISM, a multi-agent framework for empathetic spoken dialogue that decouples speech perception, response generation, and speech synthesis into coordinated components. PRISM introduces a prosody-to-language translation mechanism to stabilize large language model reasoning and enables on-demand invocation of external knowledge tools for empathetic dialogue generation. Experimental results demonstrate that PRISM achieves consistent improvements in empathy, prosodic appropriateness, and text response generation quality across objective and subjective metrics. Our code is available at: https://github.com/Bxzfrm/PRISM.

11.
arXiv (quant-ph) 2026-06-16

Fast and high-fidelity transfer of edge states via dynamical control of topological phases and effects of dissipation

arXiv:2505.16606v2 Announce Type: replace-cross Abstract: Topological edge states are robust against symmetry-preserving perturbations and noise, making them promising for quantum information and computation, particularly in topological quantum computation through the braiding operations of Majorana quasiparticles. Realizing these applications requires fast and high-fidelity dynamic control of edge states. In this work, we theoretically propose a high-fidelity protocol for transferring topological edge states by dynamically moving a domain wall between two regions with different topological numbers in one dimension. This protocol fundamentally relies on Lorentz invariance and relativistic effects, because moving the domain wall at a constant speed is described by a mass term with uniform linear motion in the Dirac equation. We demonstrate the effectiveness of our protocol in transferring edge states with high fidelity using a one-dimensional quantum walk with two internal states, which is feasible with current experimental technology. We also investigate how bit-flip and dephasing dissipation to the environment affect transfer efficiency. Remarkably, bit-flip (dephasing) dissipation does not affect the fidelity at the slow (fast) transfer limit, which can be explained by the relativistic effects on the edge states.

12.
arXiv (CS.LG) 2026-06-15

FreshRetailNet-LT: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

arXiv:2505.16319v4 Announce Type: replace Abstract: Accurate demand estimation is critical for the retail business in guiding the inventory and pricing policies of perishable products. However, it faces fundamental challenges from censored sales data during stockouts, where unobserved demand creates systemic policy biases. Existing datasets lack the temporal resolution and annotations needed to address this censoring effect. To fill this gap, we present FreshRetailNet-50K, the first large-scale benchmark for censored demand estimation. It comprises 50,000 store-product time series of detailed hourly sales data from 898 stores in 18 major cities, encompassing 863 perishable SKUs meticulously annotated for stockout events. The hourly stock status records unique to this dataset, combined with rich contextual covariates, including promotional discounts, precipitation, and temporal features, enable innovative research beyond existing solutions. We demonstrate one such use case of two-stage demand modeling: first, we reconstruct the latent demand during stockouts using precise hourly annotations. We then leverage the recovered demand to train robust demand forecasting models in the second stage. Experimental results show that this approach achieves a 2.73% improvement in prediction accuracy while reducing the systematic demand underestimation from 7.37% to near-zero bias. With unprecedented temporal granularity and comprehensive real-world information, FreshRetailNet-50K opens new research directions in demand imputation, perishable inventory optimization, and causal retail analytics. The unique annotation quality and scale of the dataset address long-standing limitations in retail AI, providing immediate solutions and a platform for future methodological innovation. The data (https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K) and code (https://github.com/Dingdong-Inc/frn-50k-baseline) are openly released.

13.
arXiv (CS.LG) 2026-06-19

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

arXiv:2601.02322v2 Announce Type: replace-cross Abstract: A common approach to out-of-distribution prediction restricts models to causal or invariant covariates to avoid spurious associations that may change across environments. Despite its theoretical appeal, this strategy can underperform empirical risk minimization when only a subset of the causal parents of the outcome is observed. In such settings, non-causal covariates can serve as proxies for unobserved causal parents and improve prediction when the proxy relationship is stable, but they can hurt when shifts disrupt that relationship. Thus, the optimal covariate set can depend on the specific shift encountered. Because different shifts leave signatures in the unlabeled covariate distribution, we propose an environment-adaptive covariate selection algorithm that maps environment-level summaries to environment-specific covariate sets. These summaries may be hand-crafted or learned from multi-environment data, and prior causal knowledge can be incorporated as constraints. Across simulations and applied datasets, the proposed method improves over static causal, invariant, and other non-adaptive rules under diverse shifts.

14.
arXiv (CS.CV) 2026-06-16

Segmentation-based Detection for Efficient Multi-Task Spacecraft Perception

Vision-based perception is fundamental to Space Situational Awareness and autonomous on-orbit operations such as rendezvous, docking, servicing, and navigation. However, progress in this area is limited by the scarcity of annotated space imagery and by challenging visual-domain characteristics including severe illumination changes, low signal-to-noise ratio, and high contrast. We address Stream 1 of the SPARK 2026 Challenge, which requires a single model for spacecraft classification, detection, and fine-grained component segmentation across multiple target types. We propose a compact architecture that integrates a MobileNetV3 encoder with a U-Net-style decoder, combining computational efficiency with accurate dense prediction. Detection is derived analytically from the union of predicted component masks, avoiding a separate bounding-box regression head in the single-spacecraft setting. Our method achieved an overall leaderboard score of 0.9482, with task-specific scores of 1.0000 in classification, 0.9788 in detection, and 0.8917 in segmentation. The proposed approach ranked second overall in the SPARK 2026 Challenge, demonstrating that lightweight encoder-decoder architectures can deliver strong multi-task performance for practical onboard space vision systems.
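
The idea of deriving detection from segmentation can be sketched as follows: take the pixel-wise union of the predicted component masks and read off its tightest enclosing box. This is a minimal pure-Python illustration under the single-spacecraft assumption mentioned in the abstract, not the authors' code. The function names and the `(x_min, y_min, x_max, y_max)` box convention are assumptions.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask given as rows of 0/1 values


def union_mask(masks: List[Mask]) -> Mask:
    """Pixel-wise OR of equally sized binary component masks."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [int(any(m[r][c] for m in masks)) for c in range(width)]
        for r in range(height)
    ]


def bbox_from_mask(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tightest (x_min, y_min, x_max, y_max) box around foreground pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return (cols[0], rows[0], cols[-1], rows[-1])


# Two hypothetical component masks (e.g. a main body and a solar panel).
body = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
panel = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
box = bbox_from_mask(union_mask([body, panel]))
print(box)  # (1, 1, 4, 2)
```

Because the box is computed rather than regressed, no separate detection head has to be trained, at the cost of tying detection quality to segmentation quality.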

15.
arXiv (CS.AI) 2026-06-18

ThinkDeception: A Progressive Reinforcement Learning Framework for Interpretable Multimodal Deception Detection

arXiv:2606.18988v1 Announce Type: new Abstract: Multimodal deception detection is critical for identifying fraudulent intentions, yet existing approaches predominantly rely on end-to-end black-box paradigms. These methods suffer from a severe lack of interpretability, failing to provide transparent reasoning trajectories and struggling to explicitly capture the subtle, cross-modal inconsistencies inherent in deceptive behaviors. To transcend these limitations, we propose ThinkDeception, a novel and interpretable multimodal deception detection framework. As a pioneering effort, it introduces Multimodal Large Language Models (MLLMs) into this domain, transforming deception detection from a traditional binary classification task into an explicit cognitive reasoning process. Facilitated by the first meticulously annotated step-by-step multimodal Chain of Thought (CoT) dataset, we develop a foundational model, ThinkDeception Base, empirically validating the critical role of modal inconsistency in decoding deception. Building upon this foundation, our core innovation lies in proposing Visual-Audio Consistency Group Relative Policy Optimization (VAC-GRPO) equipped with a progressive training strategy. Distinct from standard GRPO, we stratify the training data into four progressive difficulty tiers, guiding the model through a psychologically grounded easy-to-hard cognitive transition. By innovatively coupling this dynamic curriculum scheduler with a multi-dimensional, process-aware reward mechanism and a reflective learning paradigm, we significantly elevate the model's overall reasoning quality. Extensive experiments on mainstream benchmarks demonstrate that ThinkDeception establishes a new SOTA, significantly outperforming existing methods in both detection accuracy and rationale quality. Ultimately, this work successfully drives the field of deception detection toward interpretable, multimodal cognitive reasoning.

16.
Nature (Science) 2026-06-18

Daily briefing: The proteins that protect us from deadly mutations

Proteins that ‘buffer’ the effects of mutations could help to treat diseases such as cancers. Plus, goats can follow human voices and the battle over a key ocean observatory project in the United States.

17.
arXiv (CS.CV) 2026-06-17

NeuroClaw Technical Report

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent research assistant for executable and reproducible neuroimaging research. NeuroClaw operates directly on raw neuroimaging data across formats and modalities, grounding decisions in dataset semantics and BIDS metadata so users need not prepare curated inputs or bespoke model code. The platform combines harness engineering with end-to-end environment management, including pinned Python environments, Docker support, automated installers for common neuroimaging tools, and GPU configuration. In practice, this layer emphasizes checkpointing, post-execution verification, structured audit traces, and controlled runtime setup, making toolchains more transparent while improving reproducibility and auditability. A three-tier skill/agent hierarchy separates user-facing interaction, high-level orchestration, and low-level tool skills to decompose complex workflows into safe, reusable units. Alongside the NeuroClaw framework, we introduce NeuroBench, a system-level benchmark for executability, artifact validity, and reproducibility readiness. Across multiple multimodal LLMs, NeuroClaw-enabled runs yield consistent and substantial score improvements compared with direct agent invocation. Project homepage: https://cuhk-aim-group.github.io/NeuroClaw/index.html

18.
arXiv (CS.CV) 2026-06-17

ProCUA-SFT Technical Report

Training computer-use agents (CUAs) – models that interact with graphical desktops through screenshots and keyboard/mouse actions – requires large-scale, diverse trajectory data collected in full desktop environments. The largest public resource, AgentNet (22.5K human trajectories), leads to negative transfer when used for supervised fine-tuning (SFT): continuing training UI-TARS 7B on AgentNet causes OSWorld success rate to fall from 26.3% to 8-10%. We present ProCUA-SFT, a dataset of 3.1M step-level SFT samples distilled from 93K synthetic trajectories across 2,484 application combinations. The dataset is produced by a fully automated pipeline that (i) synthesizes grounded tasks on live desktops seeded with real-world content – 912 spreadsheets from SpreadsheetBench, approximately 10K permissively-licensed presentations from Zenodo10K, and multi-application OSWorld configs – and (ii) verifies each task's feasibility through binary precondition checking before rollout. A single VLM (Kimi-K2.5) serves as goal generator, precondition judge, and trajectory executor, eliminating planner-actor capability gaps. Each trajectory is expanded into step-prefix samples that exactly reproduce the context layout seen at inference time. Fine-tuning UI-TARS 7B on ProCUA-SFT for one epoch yields 45.0% on OSWorld – an 18.7 percentage-point improvement over the base model and over 35 percentage points above AgentNet-trained counterparts. A subset of ProCUA was incorporated into the training data for the Nemotron 3 Nano Omni model, contributing to its computer-use capabilities.
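
The step-prefix expansion mentioned in this abstract can be pictured with a small sketch: a trajectory of T steps becomes T training samples, each holding the actions taken so far plus the current observation, with the next action as the target. The dictionary keys and the toy trajectory are hypothetical. The actual ProCUA-SFT sample layout is not given in the abstract.

```python
from typing import Dict, List


def expand_to_step_prefix(task: str, steps: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Turn one trajectory into one sample per step, each with the preceding actions as context."""
    samples: List[Dict[str, object]] = []
    for t, step in enumerate(steps):
        samples.append({
            "task": task,
            "history": [s["action"] for s in steps[:t]],  # actions before step t
            "observation": step["screenshot"],            # what the agent sees at step t
            "target": step["action"],                     # what the agent should do at step t
        })
    return samples


# Hypothetical three-step trajectory.
trajectory = [
    {"screenshot": "shot_0.png", "action": "click(File)"},
    {"screenshot": "shot_1.png", "action": "click(Save As)"},
    {"screenshot": "shot_2.png", "action": 'type("report.xlsx")'},
]
samples = expand_to_step_prefix("Save the sheet as report.xlsx", trajectory)
for sample in samples:
    print(len(sample["history"]), sample["target"])
```

With the figures quoted in the abstract, 3.1M samples from 93K trajectories works out to roughly 33 step-level samples per trajectory on average.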

19.
arXiv (CS.CV) 2026-06-16

Learning Directional Semantic Transitions for Longitudinal Chest X-ray Analysis

Chest X-ray (CXR) interpretation often requires longitudinal comparison to assess disease progression. Existing approaches typically rely on temporal feature fusion or inter-study discrepancy modeling, yet remain limited in capturing subtle progression semantics and overlook the inherently directional nature of disease trajectories. In this paper, we propose ProTrans, a novel vision-language pretraining framework that formulates disease progression as a directional semantic transition between paired CXR studies. ProTrans leverages radiology reports to anchor individual CXR representations within interpretable disease states, and introduces a learnable progression feature map to explicitly encode semantic shifts between states, aligned with report-derived progression descriptions. To enforce direction-aware perception, ProTrans incorporates a reversed temporal modeling process and imposes bidirectional reconstruction consistency across states and transitions, thereby disentangling directional semantics and promoting coherent trajectory modeling. Extensive experiments on longitudinal downstream tasks, including disease progression classification and progression captioning, demonstrate that ProTrans consistently outperforms existing methods, establishing a unified pretraining framework for longitudinal CXR understanding. https://github.com/RPIDIAL/ProTrans

20.
arXiv (CS.CL) 2026-06-19

Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?

Recent advances in mobile agents are dominated by the GUI paradigm, in which agents perceive UI information and emit screen interactions. However, mobile platforms also expose a command-line interface (CLI) that provides direct access to device services and data. We argue CLI deserves first-class consideration alongside GUI. We evaluate three coding agents (Claude Code, Terminus-2, mini-swe-agent) across four model APIs on AndroidWorld and MobileWorld without any mobile-specific post-training, comparing against three reproducible GUI baselines (GUI-Owl-1.5-32B, MAI-UI, Qwen3-VL-32B). Claude Code (Opus 4.7) reaches 71.8% and 51.9%, outperforming every reproducible GUI baseline (69.3/68.1/57.8% on AndroidWorld; 43.2/26.3/13.3% on MobileWorld), while every other CLI configuration remains competitive. To establish the paradigm's ceiling, we provide oracle CLI solutions that reach 88.8% on AndroidWorld (103/116 tasks CLI-solvable) and 86.3% on MobileWorld (101/117 tasks CLI-solvable), indicating substantial room for future improvement. To cover everyday user intents beyond the GUI scope, we introduce the CLI-Advantage Task Suite, comprising 45 templates across five categories: bulk operations, multi-condition filtering, aggregation, cross-app workflows, and hidden device state. Every CLI agent outperforms every GUI baseline in all five categories, with substantially fewer steps per task (10.7 vs. 18.6). To support future research on mobile CLI agents, we will open-source agent implementations, oracle solutions, the CLI-Advantage suite, and evaluation infrastructure.

21.
arXiv (CS.CV) 2026-06-15

FEMOT: Multi-Object Tracking using Frame and Event Cameras

Conventional RGB cameras have been widely used in multi-object tracking due to their ability to capture rich appearance and semantic information. However, their performance is often degraded under complex real-world challenges, such as motion blur, low illumination, and overexposure. Bio-inspired event cameras offer high temporal resolution and high dynamic range, providing complementary cues under extreme scenarios. Nevertheless, RGB-event multi-object tracking remains underexplored due to the lack of large-scale and well-annotated datasets. To address this issue, we propose FEMOT, a large-scale RGB-event multi-object tracking dataset that covers diverse real-world scenarios and 14 challenging attributes. With both RGB and event data as well as high-quality annotations, FEMOT provides a reliable platform for systematically evaluating RGB-event multi-object tracking methods. Based on FEMOT, we retrain and evaluate over ten strong trackers, thereby establishing a comprehensive benchmark for future research. Furthermore, we propose FEMOTR, a multimodal tracking framework that decouples RGB and event features and fuses them in the frequency domain, thereby effectively exploiting their complementary characteristics for robust object localization and identity association. Extensive experiments on FEMOT and DSEC-MOT datasets demonstrate the effectiveness of the proposed method. The source code and benchmark dataset have been released on https://github.com/Event-AHU/FEMOT.

22.
arXiv (CS.LG) 2026-06-16

Learning the generating functional for variance reduction in lattice QCD

arXiv:2606.15986v1. The generating functional in quantum field theory provides the natural framework for constructing correlation functions as derivatives with respect to source operators. We present a methodology that leverages machine-learned normalizing flows to reduce the variance of arbitrary $N$-point correlation functions of bosonic operators in lattice gauge field theory calculations by encoding a representation of the generating functional. We show that it is possible to systematically approach noiseless estimators of correlation functions in this framework. We demonstrate this methodology with applications to calculations of glueball correlation functions and Wilson loops in Quantum Chromodynamics and Yang-Mills theory. The results show up to three orders of magnitude variance reduction.

23.
arXiv (CS.CV) 2026-06-11

Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning

Neural network pruning reduces model size by removing less important parameters while aiming to preserve predictive performance. Although the Lottery Ticket Hypothesis (LTH) shows that sparse subnetworks can match dense networks when trained from suitable initializations, its iterative pruning procedure requires multiple complete training cycles. This work evaluates progressive magnitude-based pruning as a single-cycle alternative. The method gradually increases sparsity during training using a linear schedule and updates pruning masks based on active weight magnitudes. We conduct systematic experiments on CIFAR-10 and MNIST across ResNet, VGG-style, and LeNet architectures, comparing the proposed method with representative iterative and initialization-based pruning baselines, including LTH, SNIP, and GraSP. On CIFAR-10, the method achieves 95.12% accuracy on ResNet-18 at 72.9% sparsity, compared with 90.5% reported for LTH. At extreme sparsity, it achieves 93.13% accuracy on a VGG-like architecture at 97% sparsity, compared with approximately 92.0% for SNIP, and 93.44% accuracy on VGG-19 at 97.97% sparsity, compared with 92.19% for GraSP at 98% sparsity. A sparsity-accuracy analysis on ResNet-18 further shows that accuracy remains within 0.1 percentage points of the dense baseline across 70–85% sparsity. These results indicate that progressive magnitude-based pruning provides an effective single-cycle approach for neural network sparsification under the evaluated settings.
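
As a rough illustration of the schedule described in this abstract, the sketch below ramps a sparsity target linearly and re-derives a pruning mask from the magnitudes of the weights that are still active. It is not the authors' implementation: the function names `linear_sparsity` and `update_mask`, the flat weight list, and the rule that pruned weights stay pruned are all assumptions made for the example, and the weights are held fixed where a real training loop would update them between mask refreshes.

```python
def linear_sparsity(step, total_steps, final_sparsity):
    """Target sparsity at `step`, rising linearly from 0 to `final_sparsity`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * frac


def update_mask(weights, mask, sparsity):
    """Return a new 0/1 mask with (at least) `sparsity` of all weights pruned.

    Assumption for this sketch: weights that are already pruned stay pruned,
    and only the smallest-magnitude *active* weights are removed next.
    """
    n_target = int(sparsity * len(weights))   # total weights that should be zeroed
    extra = n_target - mask.count(0)          # how many more to prune now
    if extra <= 0:
        return list(mask)
    # Indices of active weights, ordered by increasing magnitude.
    active = [i for i, m in enumerate(mask) if m == 1]
    active.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in active[:extra]:
        new_mask[i] = 0
    return new_mask


# Toy run: 8 weights, 4 schedule steps, 75% final sparsity (illustrative values).
weights = [0.5, -0.1, 0.3, -0.9, 0.05, 0.7, -0.2, 0.4]
mask = [1] * len(weights)
for step in range(5):
    mask = update_mask(weights, mask, linear_sparsity(step, 4, 0.75))
    print(step, mask)
```

In this toy run the mask thins out gradually, and only the two largest-magnitude weights remain active at the final step.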

24.
arXiv (CS.AI) 2026-06-17

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv:2606.17399v1. When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.
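
The reduction of multiplication to addition mentioned in this abstract can be checked directly for $p = 113$. The sketch below is only an illustration of the underlying number theory, not the paper's analysis code: the helper names `find_primitive_root`, `dlog_table`, and `character` are invented for the example. It builds a discrete-log table from a primitive root $g$, so that $\log_g(a \cdot b) \equiv \log_g a + \log_g b \pmod{p-1}$, and defines the multiplicative characters $\chi_k(a) = e^{2\pi i k \log_g a / (p-1)}$ that play the role of Fourier modes on $(\mathbb{Z}/p\mathbb{Z})^*$.

```python
import cmath

P = 113  # modulus used in the abstract


def find_primitive_root(p):
    """Smallest g whose powers visit every nonzero residue mod the prime p."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = (x * g) % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")


def dlog_table(p, g):
    """Map each nonzero residue a to the exponent k with g**k == a (mod p)."""
    table = {}
    x = 1
    for k in range(p - 1):
        table[x] = k
        x = (x * g) % p
    return table


def character(k, a, p, table):
    """Multiplicative character chi_k(a) = exp(2*pi*i * k * log_g(a) / (p-1))."""
    return cmath.exp(2j * cmath.pi * k * table[a] / (p - 1))


g = find_primitive_root(P)
log = dlog_table(P, g)

# Multiplication mod P becomes addition mod P-1 in discrete-log space.
a, b = 7, 9
assert log[(a * b) % P] == (log[a] + log[b]) % (P - 1)
```

Reordering residues by `log[a]` is the same change of coordinates the abstract describes for the neuron activation heatmaps: in that ordering, each character $\chi_k$ is an ordinary complex sinusoid of frequency $k$.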

25.
arXiv (math.PR) 2026-06-19

Towards practical PDMP sampling: Metropolis adjustments, locally adaptive step-sizes, and NUTS-based time lengths

arXiv:2503.11479v2. Piecewise-Deterministic Markov Processes (PDMPs) hold significant promise for sampling from complex probability distributions. However, their practical implementation is hindered by the need to compute model-specific bounds. Conversely, while Hamiltonian Monte Carlo (HMC) offers a generally efficient approach to sampling, its inability to adaptively tune step sizes impedes its performance when sampling complex distributions like funnels. To address these limitations, we introduce three innovative concepts: (a) a Metropolis-adjusted approximation for PDMP simulation that eliminates the need for explicit bounds without compromising the invariant measure, (b) an adaptive step size mechanism compatible with the Metropolis correction, and (c) a No U-Turn Sampler (NUTS)-inspired scheme for dynamically selecting path lengths in PDMPs. These three ideas can be seamlessly integrated into a single, 'doubly-adaptive' PDMP sampler with favourable robustness and efficiency properties.