Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-11

Urban Heat MiniCubes: An AI-Ready dataset for urban heat research

arXiv:2606.11534v1 Announce Type: cross Abstract: Urban heat is amplified by impermeable surfaces and heterogeneous built environments, yet street-level variability remains difficult to quantify because multi-sensor observations are rarely available in consistent, analysis-ready form at the necessary spatiotemporal scales. We present "Urban Heat MiniCubes," a publicly available, FAIR-oriented dataset designed for machine learning applications in urban heat research. The dataset provides harmonized 90 x 90 km gridded data cubes for 48 cities in the Western Hemisphere spanning 2022-2023, with variables reprojected and collocated to a common grid to reduce preprocessing (e.g., reprojection, resampling, and spatiotemporal alignment). Urban Heat MiniCubes includes two complementary modalities: (i) higher-spatial-resolution, lower-frequency observations from Landsat 8/9 (e.g., surface reflectances) and Sentinel-1 (e.g., synthetic aperture radar backscatter), and (ii) higher-temporal-frequency, coarser observations from GOES-R (e.g., longwave infrared brightness temperatures) and a microwave land surface temperature product. We document variables and metadata and provide technical assessment using inter-variable analyses and autoencoder-based reconstruction-error summaries across pixel classes (e.g., water and cloud). Potential use cases and limitations are also discussed.

02.
arXiv (CS.AI) 2026-06-16

BRIDGE: Biological Evidence Refinement and Heterogeneous Dynamic Gating for Gene Regulatory Networks

arXiv:2606.14734v1 Announce Type: cross Abstract: Motivation: Gene regulatory network inference from single-cell RNA sequencing (scRNA-seq) data is important for uncovering cell-state-specific transcriptional programs. However, scRNA-seq measurements are sparse and noisy, and experimentally validated TF-target interactions remain limited, making reliable inference challenging. Although graph neural networks have advanced GRN prediction, existing methods often rely on biologically unconstrained graph augmentation, such as random edge perturbation, and insufficiently control information transfer between genes and cells. These limitations may distort regulatory structures and weaken robustness under noisy and weakly supervised settings. Results: To address these issues, we propose an innovative framework named Biological Evidence Refinement and Heterogeneous Dynamic Gating for Gene Regulatory Networks (BRIDGE). BRIDGE extracts gene and cell representations from the expression matrix and its matrix dual, and performs contrastive learning in the gene space and cell space between self and neighbors across the co-expression-refined regulatory view and the original graph. It then applies heterogeneous gated encoding to adaptively regulate information transfer between genes and cells, enabling robust transcription factor-to-target gene prediction. Experiments on benchmark datasets spanning three network types and seven cell types show that BRIDGE achieves state-of-the-art AUROC and AUPRC in most settings. In particular, on Specific networks, BRIDGE improves average AUPRC by 5% over the second-best baseline, GCLink. In cross-cell-type few-shot transfer, BRIDGE consistently outperforms GCLink and GENELink across all six target cell types. A case study on hESC further supports the biological relevance of the predictions, with 9 of the top 10 and 46 of the top 100 novel TF-target interactions validated by ChIPBase.

03.
arXiv (CS.CV) 2026-06-18

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

04.
arXiv (CS.LG) 2026-06-18

AGDN: Learning to Solve Traveling Salesman Problem with Anisotropic Graph Diffusion Network

arXiv:2606.19185v1 Announce Type: new Abstract: The Traveling Salesman Problem (TSP) is a cornerstone of combinatorial optimization and arises in many practical scenarios. Although graph-based learning approaches have been explored for TSP, the question of how to exploit graph structure more effectively remains open. We present the Anisotropic Graph Diffusion Network (AGDN), a new Graph Neural Network framework designed to solve TSP. Our method tackles two central difficulties: (1) the lack of informative topological prior in fully connected TSP graphs, and (2) losing connected nodes in the optimal solution after the commonly used graph sparsification techniques. To overcome these issues, we construct a MixScore transition matrix that merges node similarity with pairwise distance, and we develop an anisotropic graph diffusion strategy that supports efficient information exchange across multiple hops. Comprehensive experiments spanning diverse instance sizes and node distributions show that AGDN consistently outperforms existing methods while keeping computation time competitive. Furthermore, AGDN generalizes well to problem sizes and distributions beyond those seen during training. The implementation is publicly available at: https://github.com/LabRAI/AGDN.

05.
arXiv (CS.LG) 2026-06-16

Coercivity and Local Convergence of Physical Learning in Linear Circuits

arXiv:2606.15443v1 Announce Type: cross Abstract: Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods – Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) – for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.

06.
medRxiv (Medicine) 2026-06-15

Prevalence and Clinical Impact of Pathogenic Variants in Cardiomyopathy Genes Among Individuals with Cardiac Conduction Disorders

Importance: Cardiac conduction disorders have traditionally been regarded as a secondary manifestation of underlying structural heart diseases. However, isolated conduction disorders may precede the onset of heart failure (HF) suggesting shared mechanisms. Objective: To evaluate the prevalence and clinical significance of pathogenic/likely pathogenic (P/LP) rare variants in cardiomyopathy genes among individuals with conduction disorders. Design, Setting, and Participants: Biobank analysis of 192,834 participants with whole genome sequence data from Vanderbilt's BioVU and 353,092 participants from the All of Us Research Program (AoU). Participants with primary conduction disorder (left bundle branch block [LBBB], right bundle branch block [RBBB], high-grade atrioventricular block [AVB]) were identified after excluding secondary causes. Exposures: P/LP variants in cardiomyopathy genes. Main Outcomes and Measures: Primary outcome was P/LP carrier status by age and HF status. Secondary outcomes included incident HF and composite ventricular arrhythmias/sudden cardiac death/mortality (VA/SCD/mortality). Results: Among 16,959 participants with conduction disorders in BioVU and 13,442 in AoU, 432 (2.6%) and 206 (1.5%) were P/LP carriers, respectively. Conduction disorder was independently associated with carrier status (BioVU p

07.
arXiv (quant-ph) 2026-06-12

Effective Geometry and Position-Dependent Mass in Dual-$q$ Quantum Mechanics

arXiv:2606.12444v1 Announce Type: new Abstract: This work investigates the deformed-derivative formalism introduced by Borges, with emphasis on the relation between the linear operator $D_{(q)}$ and its nonlinear dual counterpart $D^{(q)}$. Directly inserting the dual derivative into the kinetic term leads to a nonlinear Schrödinger equation and obscures the usual interpretation of superposition and probability. We show that this nonlinearity can be removed by a simultaneous transformation of the coordinate and of the wave function. The transformed problem is an ordinary linear Schrödinger equation in a deformed coordinate, and its representation in the physical coordinate is equivalent to a Hermitian position-dependent-mass (PDM) Hamiltonian. In this formulation, the deformation parameter $q$ determines both the effective mass profile and the associated metric. The formalism is applied to the free particle, the infinite square well, the rectangular barrier, and the harmonic oscillator in the weak-deformation regime. Comparison with the nonadditive-translation approach of Costa Filho et al. shows that the Borges dual-$q$ framework provides an alternative route to the same effective geometric structure. For $q1$, the effective length is increased, which lowers the spectrum and suppresses tunneling relative to the undeformed limit $q=1$.

08.
arXiv (math.PR) 2026-06-16

Flowing to Normality and the Fate of the Single Ring Theorem

arXiv:2606.15791v1 Announce Type: cross Abstract: Random non-hermitian matrix ensembles with double-sided rotation invariance obey, in the limit of large matrix size, the Single Ring Theorem, which states that the support of the mean eigenvalue distribution in the complex plane is either a disk or an annulus. In contrast, rotational-invariant random normal matrix ensembles can have mean eigenvalue densities supported over any number of concentric annuli in the complex plane. In this paper we introduce and investigate, both analytically and numerically, a non-hermitian matrix model which flows from a generic matrix distribution obeying the Single Ring Theorem to a distribution of normal matrices by tuning a parameter which penalizes non-normality. We observe numerically breakdown of the Single Ring Theorem as the model flows towards normality, and determine the critical value of the parameter at which the transition occurs. We also study in detail the behavior of the singular values of these matrices under the flow. These singular values form a Fermi gas confined to the positive half-line. In particular, we find that at small values of the flow parameter, the interparticle spacings in the gas exhibit Wigner-Dyson repulsion, whereas for asymptotically large values of the flow parameter, at the normal matrix endpoint of the flow, the spacing statistics is Poissonian. The flow interpolates continuously between these two types of statistics. However, this change in statistics is not related directly to breaking of the Single Ring Theorem, which occurs very early-on along the flow, in the regime of Wigner-Dyson statistics. Finally, we introduce a certain ensemble of random permutations associated with the gas, and make a conjecture on how to use it in order to reconstruct approximately the average density of complex eigenvalues from that of the singular values in the large-$N$ limit.

09.
arXiv (CS.CV) 2026-06-19

Collaborative Multi-Modal Coding for High-Quality 3D Generation

3D content inherently encompasses multi-modal characteristics and can be projected into different modalities (e.g., RGB images, RGBD, and point clouds). Each modality exhibits distinct advantages in 3D asset modeling: RGB images contain vivid 3D textures, whereas point clouds define fine-grained 3D geometries. However, most existing 3D-native generative architectures either operate predominantly within single-modality paradigms, thus overlooking the complementary benefits of multi-modality data, or restrict themselves to 3D structures, thereby limiting the scope of available training datasets. To holistically harness multi-modalities for 3D modeling, we present TriMM, the first feed-forward 3D-native generative model that learns from basic multi-modalities (e.g., RGB, RGBD, and point cloud). Specifically, 1) TriMM first introduces collaborative multi-modal coding, which integrates modality-specific features while preserving their unique representational strengths. 2) Furthermore, auxiliary 2D and 3D supervision are introduced to raise the robustness and performance of multi-modal coding. 3) Based on the embedded multi-modal code, TriMM employs a triplane latent diffusion model to generate 3D assets of superior quality, enhancing both the texture and the geometric detail. Extensive experiments on multiple well-known datasets demonstrate that TriMM, by effectively leveraging multi-modality, achieves competitive performance with models trained on large-scale datasets, despite utilizing a small amount of training data. Furthermore, we conduct additional experiments on recent RGB-D datasets, verifying the feasibility of incorporating other multi-modal datasets into 3D generation.

10.
arXiv (quant-ph) 2026-06-12

New bounds on private simultaneous quantum message passing

arXiv:2606.12557v1 Announce Type: new Abstract: In the private simultaneous message (PSM) setting, $k$ players obtain inputs $x_i\in\{0,1\}^n$ and then each send messages to a referee, who should learn $f(x_1,...,x_k)$ but no other information about $(x_1,...,x_k)$. The PSM setting was introduced as a minimal model for secure multiparty computation and has connections to Boolean function complexity. In the quantum setting, PSM has been related to non-local quantum computation (NLQC). The communication and correlation cost of implementing PSM remains poorly understood. Here, we give new upper and lower bounds on the (quantum) PSM model. For lower bounds, we show: 1) Nečiporuk's measure lower bounds the entanglement required for $k$-player quantum PSM with perfect correctness. This leads to quadratic lower bounds for explicit functions. 2) The rank of the communication matrix of $f(x_1,x_2)$ lower bounds 2-player quantum PSM with perfect privacy but imperfect correctness. This implies a previously unknown lower bound on classical PSM with imperfect correctness. When allowing quantum communication and shared entanglement, these are the first lower bounds on quantum PSM that make use of the privacy condition. For upper bounds, we show: 1) Letting $s$ be the size of a quantum circuit computing $f$, $d_f$ be the circuit depth, $k$ the number of players, $n$ the number of bits received by each player, and $\epsilon$ a correctness parameter, we obtain $\mathsf{PSM}_k^*(f) \leq (kn +s) \cdot \log^{O(d_f)}(s/\epsilon)$. 2) The square of the Fourier 1 norm of $f$, $\Vert \hat{f}\Vert_1^2$, upper bounds the classical PSM complexity, $\mathsf{PSM}(f)\leq O(\Vert \hat{f} \Vert^2_1)$. In proving the first upper bound, we generalize existing $T$-depth based techniques for NLQC from $2$ to $k\geq 2$ parties, and consider cases where the Clifford layers are restricted to having small light cones.
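
The Fourier 1-norm in the second upper bound can be computed by brute force for small functions. The sketch below is illustrative only: it assumes the common convention that $f$ maps $\{0,1\}^n$ to $\{-1,+1\}$ with characters $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$, which may differ from the normalization used in the paper, and the helper names are hypothetical.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f on {0,1}^n, with S given as a 0/1 indicator tuple.

    Assumed convention: f_hat(S) = E_x[f(x) * (-1)^(sum of x_i for i in S)].
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = 0.0
        for x in points:
            parity = sum(si * xi for si, xi in zip(s, x)) % 2
            total += f(x) * (-1) ** parity
        coeffs[s] = total / len(points)
    return coeffs


def fourier_l1_norm(f, n):
    """Sum of absolute values of all Fourier coefficients of f."""
    return sum(abs(c) for c in fourier_coefficients(f, n).values())


def xor_pm(x):
    # XOR of two bits, encoded with outputs in {-1, +1}.
    return (-1) ** (x[0] ^ x[1])


def and_pm(x):
    # AND of two bits, encoded as -1 for "true" and +1 for "false".
    return -1 if (x[0] and x[1]) else 1


print(fourier_l1_norm(xor_pm, 2))  # 1.0: XOR is a single character
print(fourier_l1_norm(and_pm, 2))  # 2.0: four coefficients of magnitude 1/2
```

Under this convention a parity function has 1-norm exactly 1, so a bound of the form $O(\Vert \hat{f} \Vert_1^2)$ is constant for it, while functions with many sizable coefficients yield a larger bound.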

11.
Science (Express) 2026-05-06

A 481-meter-high landslide-tsunami in a cruise ship–frequented Alaska fjord

Early in the morning of 10 August 2025, a >64 × 10⁶ m³ landslide struck Tracy Arm fjord in Alaska. The landslide was preconditioned by glacial retreat caused by climate change. The resulting 481 m runup megatsunami followed an initial 100-m-high breaking wave traveling >70 m s⁻¹. The landslide was preceded by several days of microseismicity, which increased in rate and magnitude until ~1 hour before failure. The landslide produced globally observed long-period seismic waves equivalent in size to a M5.4 earthquake. A long-period (~66 s) global seismic signal, produced by a landslide-induced seiche trapped within the fjord, persisted for up to 36 hours, the second time a days-long seiche has been thus observed. With fjord regions increasingly visited by cruise ships, and climate change making similar events more likely, this unanticipated, near-miss event highlights the growing risk from landslides and tsunamis in coastal environments.

12.
arXiv (CS.CL) 2026-06-17

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

Retrieval-Augmented Generation (RAG) is the prevailing architecture for grounding language model outputs in external evidence, yet its dominant evaluation paradigms and default configurations remain oriented toward factual question-answering. For interpretive disciplines such as historical studies, RAG embeds assumptions that conflict with scholarly practice. We introduce HistoRAG, a framework that translates historiographical principles into concrete architectural interventions. Separated retrieval and generation decouples source discovery from interpretation, temporal windowing enforces balanced source representation across the research period as a methodological requirement of historical inquiry, and LLM-as-judge evaluation makes relevance judgments transparent and contestable. We evaluate these interventions using SPIEGELragged, applied to 102,189 articles from Der Spiegel (1950-1979). Each intervention addresses a measurable deficiency in standard RAG: era-specific vocabulary retrieves zero chunks from the 1950s when using 1970s terminology, evidence of the temporal skew that motivates windowing; vector similarity and LLM-assessed relevance correlate only weakly (Spearman rho = 0.275), motivating post-retrieval evaluation; and keyword-based and semantic retrieval surface largely disjoint source pools, motivating an architecture in which both operate as complementary retrieval layers under a shared LLM evaluation filter. We also introduce the concept of Zwischentexte (intermediate texts that function as interpretive proposals rather than findings) as a framework for responsible integration of LLM-generated text into scholarly practice. The architecture offers a model for how domain-specific epistemological commitments can be translated into RAG design decisions, and may transfer to other interpretive disciplines working with large corpora.
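
The weak agreement between vector similarity and LLM-assessed relevance is reported as a Spearman rank correlation. As a minimal sketch of how such a figure is obtained, the code below computes Spearman's rho as the Pearson correlation of average ranks. The scores are invented purely for illustration and are not taken from the SPIEGELragged evaluation; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical retrieved chunks: cosine similarity vs. an LLM relevance grade.
similarity = [0.91, 0.88, 0.85, 0.80, 0.76]
llm_relevance = [3, 5, 1, 4, 2]
print(round(spearman_rho(similarity, llm_relevance), 3))  # about 0.3
```

A value near 0.3, as in this toy case, means the similarity ranking says little about which chunks a judge would rate as relevant, which is the kind of gap the abstract cites as motivation for post-retrieval evaluation.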

13.
arXiv (CS.LG) 2026-06-11

Provable Recovery of Locally Important Signed Features and Interactions from Random Forest

arXiv:2512.11081v2 Announce Type: replace-cross Abstract: Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, local interpretations for individual predictions are often required, rather than global scores summarizing overall feature importance. Random Forests (RFs) are widely used in these settings, and existing interpretability methods typically exploit tree structures and split statistics to provide model-specific insights. However, theoretical understanding of local FII methods for RF remains limited, making it unclear how to interpret high importance scores for individual predictions. We propose a novel, local, model-specific FII method that identifies frequent co-occurrences of features along decision paths, combining global patterns with those observed on paths specific to a given test point. We prove that our method consistently recovers the true local signal features and their interactions under a Locally Spike Sparse (LSS) model and also identifies whether large or small feature values drive a prediction. We illustrate the usefulness of our method and theoretical results through simulation studies and a real-world data example.

14.
arXiv (CS.AI) 2026-06-15

StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

arXiv:2606.07027v2 Announce Type: replace Abstract: Reinforcement Learning (RL) has become a promising approach for improving GUI Agents in long-horizon, stochastic digital environments, but trajectory-level success feedback is too sparse to provide reliable credit assignment for intermediate exploration steps. To mitigate this issue, recent studies introduce Process Reward Models (PRMs), which provide finer-grained training feedback through global milestone verification or local step-level evaluation. However, these methods still suffer from two level-specific limitations: global milestone decomposition is subjective and singular, making it difficult to accommodate the multiple valid execution paths in real GUI tasks, while fixed local judging windows may miss long-range key evidence or dilute the decision signal with irrelevant frames. Inspired by stain-tracing mechanisms in network flow analysis, we propose StainFlow, an entity-stain-flow process reward model for GUI Agents. To reduce the subjectivity of global partitioning, we introduce the Global Entity Stain Tracking module, which extracts visually verifiable task entities and tracks how their stain concentrations and states evolve along the trajectory, allowing task phases to be objectively separated by changes in the entity evidence flow. To improve the accuracy of local verification, we introduce the Local Stain Evidence Linking module. Centered on the triggering entities of each candidate key node, it retrieves relevant steps based on their stain concentrations and state changes, and dynamically constructs high-density evidence windows for verifying true key nodes. Extensive experiments on AndroidWorld and OGRBench show that StainFlow relatively improves online RL success by 3.2% and trajectory completion judgment accuracy by 1.8%.

15.
arXiv (CS.CV) 2026-06-15

SED: Lightweight Saliency prediction for Event-based data via Distillation

Event-based saliency prediction has gained attention recently, as combining event cameras with saliency estimation can act as an upstream stage that naturally improves the efficiency of downstream event-based perception at the edge. However, current approaches are either neuromorphic, underperforming on event-based saliency benchmarks, or too heavy for resource-constrained edge applications due to their reliance on transformers or 3D convolutions. Drawing inspiration from efficient convolutional modules and aiming to exploit the temporal information in event data, we propose SED, a lightweight network trained through knowledge distillation and built on a Depthwise Spatio-Temporal Block (DSTconv), a factorization of the 3D depthwise separable convolution. Relative to its teacher, our model reduces the model size from 180 MB to 0.32 MB (562x) and the parameter count from 45M to 81k (554x), while matching or outperforming it on the N-DHF1K and N-UCF Sports datasets. Moreover, it generalizes strongly beyond its training distribution, transferring from synthetic to real event data where a model trained from scratch fails.

16.
arXiv (CS.AI) 2026-06-12

Graph Reduction in Multirelational Networks: A Spreading-Oriented Reduction Benchmark

arXiv:2606.12581v1 Announce Type: cross Abstract: Real-world networks are inherently incomplete, noisy, and dynamically evolving, making it difficult to capture all actors and their relationships. Their scale often renders direct analysis computationally demanding. While influence maximisation (IM) has been widely studied, the role of graph reduction as a preprocessing step, and its impact on IM accuracy, remains underexplored. In this work, we introduce the Spreading-Oriented Reduction Benchmark (SORB), an open-source, standardised framework for systematically evaluating IM models across diverse task settings. SORB provides an extensible pipeline operating on a representative collection of real-world networks, including single- and multilayer structures, and incorporates graph reduction directly into the evaluation process. This design shifts the focus from analysing IM algorithms in isolation to quantifying how graph reduction alters predictive performance. Using SORB, we study the effects of sparsification and coarsening across multiple IM scenarios. Our results show that the impact of reduction is strongly dependent on both the network type (single-layer vs. multirelational) and the downstream task ($Gain@k$ vs. $\mathrm{AUC}_{\mathrm{cutoff}}$): sparsification preserves seed set quality on single-layer networks, whereas flattened multilayer networks exhibit systematic ranking degradation regardless of reduction strategy. These findings highlight the importance of reduction-aware, multi-task evaluation when studying spreading processes in complex networks.

17.
arXiv (CS.CV) 2026-06-15

What Drives Test-Time Adaptation for CLIP? A Controlled Empirical Study from an Update Perspective

Vision-Language Models (VLMs) such as CLIP have become a standard backbone for open-vocabulary recognition, yet their zero-shot predictions remain vulnerable to distribution shifts encountered at deployment. Test-Time Adaptation (TTA) has recently been extended to CLIP as a lightweight solution, leading to a rapidly growing body of TTA4CLIP methods. However, empirical progress in this area has largely outpaced our understanding of what truly drives adaptation, where their gains originate, and under which shifts they remain reliable. In this paper, we take a step back from the pursuit of state-of-the-art accuracy and conduct a systematic controlled study of TTA4CLIP. We first organize existing methods into three unified paradigms according to what is updated at test time. We then introduce TTABC, an open-source TTA Benchmark for CLIP, which standardizes evaluation protocols and integrates more than 20 representative methods. Our controlled empirical analysis focuses on three key areas. First, we determine the driving factors in parameter-based methods, revealing that adaptation gains are primarily driven by test-time evidence and reliable proxies rather than heavy optimization. Second, we explore evidence utilization beyond heavy parameter tuning, showing that competitive and efficient performance can be achieved through cross- or current-sample evidence and lightweight prototype updates. Finally, we demonstrate that there is no silver bullet for TTA: no single adaptation paradigm is universally optimal, and the preferred paradigm depends on the nature of shift. We hope our benchmark and study provide a clearer understanding of the current TTA4CLIP landscape and establish a foundation for further research.

18.
arXiv (CS.AI) 2026-06-16

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

arXiv:2606.14824v1 Announce Type: cross Abstract: This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

19.
arXiv (CS.CL) 2026-06-16

PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.

20.
arXiv (CS.LG) 2026-06-17

A Dynamical Systems Perspective on the Analysis of Neural Networks

arXiv:2507.05164v2 Announce Type: replace-cross Abstract: In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.
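
The stability question for gradient descent can be illustrated on the simplest possible case. The sketch below is a standard textbook example, not a result or code from the chapter: on the one-dimensional quadratic loss L(x) = c x^2 / 2, each gradient step multiplies x by (1 - step_size * c), so the iteration contracts when step_size < 2 / c and diverges above that threshold. The function name and the numbers are illustrative assumptions.

```python
def gradient_descent_quadratic(curvature, step_size, x0=1.0, steps=50):
    """Run x <- x - step_size * curvature * x on L(x) = curvature * x**2 / 2."""
    x = x0
    for _ in range(steps):
        x -= step_size * curvature * x  # gradient of L at x is curvature * x
    return x


curvature = 4.0
threshold = 2.0 / curvature  # classical stability limit for this quadratic

# Slightly below the threshold the iterates shrink towards the minimum at 0.
stable = gradient_descent_quadratic(curvature, 0.9 * threshold)
# Slightly above it the iterates oscillate in sign and grow in magnitude.
unstable = gradient_descent_quadratic(curvature, 1.1 * threshold)

print(abs(stable) < 1e-3, abs(unstable) > 1e3)  # True True
```

The edge of stability phenomenon mentioned in the abstract concerns training that operates near this kind of threshold, with the largest curvature of the loss hovering around 2 divided by the step size.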

21.
arXiv (quant-ph) 2026-06-16

Weak continuous measurements require more work than strong ones

arXiv:2502.09732v4 Announce Type: replace Abstract: Understanding the energy cost of the quantum measurement process and its connection to the measurement performance faces the challenge of modeling the objectification process. The latter turns the measurement result into an objective fact, available to independent observers, and is responsible for the measurement irreversibility. To address this issue, we propose and analyze a dynamical model of quantum measurement, able to capture nonideal (weak and inefficient) measurements. In this model, the objectification is induced by a contact with a macroscopic reservoir at equilibrium which is responsible for the redundant broadcast of the measurement outcome (producing a Spectrum Broadcast Structure (SBS) state) while inducing decoherence in the pointer basis, in line with the theory of quantum Darwinism. We analyze the performance of the obtained measurement process by introducing figures of merit to quantify the strength of the measurement and its efficiency. We also derive a lower bound on the measurement work cost that we can relate to the measurement quality. We take as an illustration the readout of a qubit via its coupling to a harmonic oscillator. We investigate long sequences of extremely short and weak measurements (a.k.a. continuous measurements), to find under which conditions they converge to an ideal (projective) measurement and analyze their work cost. Surprisingly, we find that a sequence converging to projective measurement has a much larger work cost than an equivalent strong measurement obtained from a single intense interaction with the apparatus. We extend this result to a large class of models owing to scaling arguments. Our analysis offers new insights into the trade-offs between measurement strength, energy consumption, and information extraction in quantum measurement protocols.

22.
arXiv (CS.LG) 2026-06-19

Pseudo-Formalization for Automatic Proof Verification

arXiv:2605.20531v2 Announce Type: replace-cross Abstract: Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it Pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.
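
The module structure and per-module checking described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Module` class, the `block_verify` function, and the `check_module` callback are hypothetical names, and the callback stands in for whatever LLM call would judge a single module.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    """One self-contained block: premises, conclusion, natural-language proof."""
    premises: List[str]
    conclusion: str
    proof: str


def block_verify(modules: List[Module],
                 check_module: Callable[[Module], bool]) -> List[int]:
    """Check every module on its own and return the indices that fail.

    check_module is a hypothetical stand-in for an LLM judgment; it sees
    one module at a time and never the rest of the proof.
    """
    return [i for i, m in enumerate(modules) if not check_module(m)]


if __name__ == "__main__":
    proof = [
        Module(["n is even"], "n^2 is even", "Write n = 2k, so n^2 = 4k^2."),
        Module(["n^2 is even"], "n^2 + 1 is odd", ""),  # proof text missing
    ]

    # Toy checker (assumption): reject modules whose proof text is empty.
    def toy_checker(m: Module) -> bool:
        return bool(m.proof.strip())

    print(block_verify(proof, toy_checker))
```

Because each module carries its own premises, a failing index points at one localized step, which is plausibly what makes error-finding precision and recall measurable at the module level.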

23.
bioRxiv (Bioinfo) 2026-06-11

DLDN-Bench: A Benchmark Framework for Deep Learning de Novo Peptide Sequencing in Proteomics

De novo peptide sequencing is an essential approach for analyzing mass spectrometry data because it enables the identification of novel peptides without relying on protein sequence databases. Recent advances in deep learning have substantially improved the performance of de novo sequencing methods, but the rapid emergence of new models has led to heterogeneous evaluation practices and limited comparability. To address this, we introduce DLDN-Bench, a benchmark framework including a set of benchmark datasets derived from human muscle biopsy mass spectrometry data retrieved from PRIDE and annotated through consensus across multiple widely used database search engines. Using these datasets, we systematically benchmark recent deep learning-based de novo sequencing tools alongside traditional approaches. Performance is assessed using established metrics, including precision and coverage relative to a pseudo-ground truth defined by cross-engine agreement. To demonstrate the utility of DLDN-Bench, we benchmark four recent deep learning models and make all results publicly available. This benchmark framework provides a standardized basis for comparing state-of-the-art methods and offers an extensible resource for evaluating future tools in de novo peptide sequencing.
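
The abstract does not give formulas for its metrics, so the sketch below shows one plausible reading rather than the benchmark's actual definitions. It assumes the pseudo-ground truth keeps only spectra on which every search engine reports the same peptide, that precision is the share of scored predictions matching that label, and that coverage is the share of labeled spectra for which the tool produced any prediction. All function names and the example spectrum IDs and sequences are hypothetical.

```python
def consensus_labels(engine_results):
    """Keep spectra on which all engines agree (assumed consensus rule).

    engine_results: list of dicts mapping spectrum ID -> peptide string.
    """
    first = engine_results[0]
    shared = set(first).intersection(*engine_results[1:])
    return {s: first[s] for s in shared
            if all(r[s] == first[s] for r in engine_results)}


def precision_and_coverage(predictions, truth):
    """Score de novo predictions against the pseudo-ground truth.

    precision = exact matches / labeled spectra that received a prediction
    coverage  = labeled spectra that received a prediction / labeled spectra
    (both definitions are assumptions made for this sketch)
    """
    scored = [s for s in truth if s in predictions]
    correct = sum(predictions[s] == truth[s] for s in scored)
    precision = correct / len(scored) if scored else 0.0
    coverage = len(scored) / len(truth) if truth else 0.0
    return precision, coverage


if __name__ == "__main__":
    engine_a = {"s1": "PEPTIDE", "s2": "AAAK", "s3": "LLR", "s4": "GGK"}
    engine_b = {"s1": "PEPTIDE", "s2": "AAAR", "s3": "LLR", "s4": "GGK"}
    truth = consensus_labels([engine_a, engine_b])   # s2 is dropped
    preds = {"s1": "PEPTIDE", "s2": "AAAK", "s4": "GGR"}
    print(precision_and_coverage(preds, truth))
```

In this toy example the engines disagree on s2, so it is excluded from scoring; the tool is then right on one of the two labeled spectra it predicted and covers two of the three labeled spectra.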

24.
Nature (Science) 2026-06-17

Mapping the neuronal building blocks of human language with language models

Humans can convey new and highly diverse information through language. This ability to form and combine words into elaborate phrases and sentences enables us to express inexhaustible meanings and is fundamental to human cognition [1–5]. However, understanding the microscopic cellular building blocks and cortical landscape that precisely underlie human language has remained a challenge. Here we used wide-scale single-neuronal recordings combined with natural language processing models to identify fine-grained linguistic representations across the human frontotemporal cortex during language production. We find that, whereas certain neurons represented the detailed grammatical relationships between words or their parts of speech, others tracked the sentences’ higher-order syntactic structure, their phrase transitions and sequence. Collectively, these neurons reliably captured the words’ syntactic and semantic properties but also dynamically incorporated their specific sentence contexts, therefore enabling them to encode information combinatorially and at highly granular levels of detail. We show how these cell populations were locally organized and how their microscale representations differed from those of their wider field potential patterns. We also show how these neurons were distributed broadly across the frontotemporal cortex, but how their ability to encode linguistic information was left-lateralized and varied between cortical regions. Together, these findings identify some of the most basic cellular building blocks by which linguistic information is encoded in humans and begin to define the cortical landscape of language at a combined micro (cellular), meso (local population) and macro (regional) scale. Wide-scale recordings reveal neurons in the human brain that encode fundamental components of language such as the grammatical relationships between words, their parts of speech and the higher-order syntactic structure of phrases and sentences.