Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-12

Quantum Network Routing based on Surface Code Error Correction

arXiv:2606.12781v1 Announce Type: new Abstract: Quantum networks encounter unavoidable channel noises and erasure errors, presenting a huge obstacle in designing protocols that attain both high reliability and efficiency. Typically, quantum networks fall into two categories: those utilize quantum entanglements for quantum teleportation, and those directly transfer the actual quantum messages. In this paper, we present SurfNet, a quantum network that inherits the main advantages from both categories. It employs surface codes as logical qubits for encoding messages, and utilizes two parallel communication channels to fault-tolerantly transfer each surface code in a modular manner. Our approach of using surface codes can timely correct both operational and photon loss errors within the network, and the integration of the two channels within the network can greatly improve network throughput. For the implementation of SurfNet, we propose a novel network architecture, designed to better integrate surface codes into quantum networks. We also propose a novel error correction decoder, designed to fully utilize the modular characteristic of surface codes within our network. Simulation results demonstrate that SurfNet with its decoder significantly enhances the communication fidelity within quantum networks.

02.
arXiv (CS.AI) 2026-06-17

SketchXplain: Intuitive Visual Explanations of Image Classifiers with Sketches

arXiv:2606.17646v1 Announce Type: cross Abstract: Saliency map visualizations explain image-based AI predictions by pointing to regions, but these are often unintuitive and semantically unclear, leaving an interpretability gap. We argue that AI explanations should be intuitive – coherent to user knowledge, yet simple and selective to accelerate interpretation. Inspired by artistic drawings, we propose SketchXplain to generate sketch-based visual explanations for intuitive image-based explainable AI (XAI). Combining techniques in saliency maps, concept-bottleneck models, and sketch optimization, SketchXplain integrates saliency to select coherent observation artifacts, concepts for knowledge coherence, cues to represent them, and abstraction for simplicity. Evaluating on face expression recognition, modeling and user studies showed that SketchXplain supported quicker interpretation with more aligned visualizations than saliency maps or simple drawings. Further evaluation on skin lesion diagnosis found that SketchXplain more coherently visualized disease symptoms, better supporting lay diagnosis. Thus, this work illustrates the value of sketches for intuitive, simple, coherent, and quick image-based XAI visualizations.

03.
arXiv (quant-ph) 2026-06-24

Wavelet Matrix Product States for Quantum Fields

arXiv:2606.23823v1 Announce Type: new Abstract: We introduce a variational method to solve continuum quantum models with discrete tensor network techniques. The method leverages wavelet matrix product states (wMPS): matrix product states built on top of sufficiently regular ($N\geq 6$) Daubechies scaling functions. These states live in the continuum field theory Fock space, have finite energy density, and can be optimized with standard algorithms, without restriction to free theories. Further, exploiting the multi-resolution analysis built into wavelets, and its quantum circuit description, we can iteratively refine wMPS to obtain accurate approximations at arbitrarily fine length-scales. We showcase the efficiency of the method on the Lieb-Liniger model, computing energy density and correlation functions.

04.
arXiv (CS.LG) 2026-06-16

Semi-Supervised Speech Confidence Detection using Pseudo-Labelling and Whisper Embeddings

arXiv:2606.16505v1 Announce Type: cross Abstract: Understanding speaker confidence is crucial in educational settings, as it can enhance personalised feedback and improve learning outcomes. This study introduces a novel framework for detecting speaker confidence by integrating human-engineered features with embeddings from the Whisper encoder. To address data limitations, a pseudo-labelling technique is employed to expand the labelled dataset, allowing the model to learn from both human-annotated and model-generated labels. The framework combines traditional speech features including pitch, volume, rate of speech, and the presence of disfluencies and stress, with Whisper embeddings, and uses a co-attention mechanism to fuse these representations and achieve an overall accuracy of 75%. This study contributes to advancing speech analysis, enabling applications that support personalised learning and speaking skill development.

05.
bioRxiv (Bioinfo) 2026-06-10

Is level-1 blob reconstruction under the network multispecies coalescent easy?

Authors:

Hybridization is an important evolutionary process, commonly modeled by the network multispecies coalescent. Reconstructing evolutionary histories under this model is notoriously costly, even for level-1 networks where hybridization events are isolated from each other. The widely used methods that combine speed with statistical guarantees rely on quartet concordance factors computed for all subsets of four species, resulting in an o(n^4k) bottleneck that severely limits scalability to large numbers of species (n) and genes (k). Among quartet-based methods, NANUQ+ is notable because it decomposes the problem into two steps: first reconstructing a tree of blobs, which compresses each non-treelike part of the network, called a blob, into a single vertex, and second reconstructing the internal structure of each level-1 blob, specifically its circular order and hybrid vertex. Here, we investigate whether level-1 blob reconstruction is difficult once the tree of blobs is known. We present a fast and statistically consistent algorithm, called NetCS, based on two simple primitives: majority voting and merge sort, circumventing the bottleneck of computing all quartet concordance factors. In simulations, NetCS achieved comparable accuracy to NANUQ+ and was dramatically faster, enabling analyses of 200 taxa and 1000 genes in only a few minutes. Both methods attained near-perfect accuracy when given the true tree of blobs; however, their performance degraded in end-to-end pipelines due to errors in tree of blobs reconstruction. Strikingly, even methods that reconstruct level-1 networks directly struggled to accurately predict hybrid ancestry. Our results suggest that reconstructing level-1 blobs is unexpectedly easy once the tree of blobs is known, and that a major challenge for phylogenetic network inference lies in accurate tree of blobs reconstruction.

06.
arXiv (CS.CV) 2026-06-17

Bridging Spatial And Frequency Views For Disaster Assessment: Benefits And Limitations

Rapid assessment of building damage from satellite imagery is essential for effective disaster response and recovery. While most deep learning methods rely on spatial-domain features, frequency-domain representations can capture complementary structural cues such as debris patterns and collapse-induced textures. This study presents a controlled comparison of spatial-domain, frequency-domain, and dual-domain deep learning approaches for multi-class building damage classification using post-disaster imagery from the xView2 (xBD) dataset. To ensure fairness, all models are built on an EfficientNet-B0 backbone and trained under identical settings, differing only in their input representations and fusion strategies. Performance is evaluated using accuracy, macro F1-score, per-class metrics, and confusion matrices. Results show that dual-domain models provide measurable improvements over single-domain approaches. The dual spatial configuration achieves the highest test accuracy (0.4688) and lowest loss, while the spatial-only model attains the best macro F1-score (0.4254), indicating more balanced class performance. In contrast, frequency-only models perform worst and exhibit overfitting, suggesting limited generalization. Despite these gains, all models struggle to detect subtle damage levels, particularly the Minor class, due to class imbalance and fine-grained visual ambiguity. While dual-domain approaches improve detection of severe damage, challenges remain. These findings highlight the benefits and limitations of hybrid representations and motivate future work on data balancing, advanced fusion, and regularization.

07.
arXiv (CS.AI) 2026-06-18

EffiNav: Fusing Depth and Vision-Language for Efficient Object Goal Navigation

arXiv:2606.18634v1 Announce Type: cross Abstract: To locate a target object while exploring the unknown environment is a fundamental capability for autonomous agents, with applications ranging from search-and-rescue to field robots. A simplified version of such task is Object Goal Navigation (ObjNav). In ObjNav, successful arrival at the target object provides a basic measure of performance; however, the efficiency of the navigation trajectory is equally important, as it indicates how intelligently the agent explores and how much time remains for subsequent tasks. In unknown environments, the key to efficient navigation lies in deciding where to explore next. While many prior works aim to address this core challenge and achieved promising performance in certain settings, recent training-based models and non-training frameworks still suffer from generalization and efficiency issues respectively, which in the worst cases can lead to excessive exploration of already-visited areas or redundant back-and-forth motion. We evaluate EffiNav on two widely used simulation benchmarks Habitat Matterport 3D (HM3D) and Open-Vocabulary Object goal Navigation (OVON), and further validate its effectiveness on physical robots in real-world settings. We conduct failure analysis on massive simulation episodes. With minimal modification, we also extend EffiNav to a memory-augmented ObjNav task on the GOAT-BENCH dataset, demonstrating its adaptability beyond standard ObjNav settings. Across two standard metrics–Success Rate (SR) and Success weighted by Path Length (SPL), EffiNav matches or outperforms recent baselines, reflecting its efficiency, robustness, and practical applicability. Recognizing the different emphases of the two datasets, the performances reveals this framework is more balanced and generalizable for efficient ObjNav.

08.
arXiv (CS.LG) 2026-06-19

Recurrent neural networks approximate continuous functions

arXiv:2606.20325v1 Announce Type: new Abstract: Classical approximation theorems ask for a new neural network whenever the target accuracy is improved. This paper studies the opposite possibility: can the network be chosen once and for all, and can accuracy be bought only by letting it run longer? We prove that this is possible for every continuous function on [-1,1]. More precisely, each such function is uniformly approximated by the time evolution of a single ReLU recurrent neural network with fixed weights and fixed hidden dimension. The mechanism behind the construction is a new intermediate model, the Turing machine with neural units (TMNU). This model retains the algorithmic freedom needed to implement polynomial approximation schemes, while remaining rigid enough to be simulated by RNNs with explicit bounds on hidden dimension and weight magnitude. The resulting convergence rates reflect the underlying polynomial approximation rates. We complement the construction with minimax lower bounds showing that runtime is not merely a proof artifact, but an unavoidable resource in this fixed-network approximation paradigm.

09.
arXiv (CS.LG) 2026-06-18

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

arXiv:2606.18857v1 Announce Type: new Abstract: Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.

10.
arXiv (quant-ph) 2026-06-15

Link-Free Multi-Node Timing Synchronization for Scalable Quantum Networking

arXiv:2606.14077v1 Announce Type: new Abstract: Precise timing synchronization is essential for distributed quantum networking, enabling entanglement distribution, quantum teleportation, and entanglement swapping across remote nodes. Existing synchronization architectures rely on dedicated timing-distribution infrastructure, most notably White Rabbit networks, which constrain topology, scalability, and deployment in free-space and satellite environments. Here we demonstrate link-free synchronization of quantum network nodes using independently operating miniature rubidium atomic clocks and computational post-processing. We validate the approach on a deployed metropolitan-scale telecom fiber network spanning three geographically separated nodes. Following drift correction, atomic-clock-based synchronization achieves timing performance approaching that of a White Rabbit benchmark and remains stable over continuous 8-hour operation. As a stringent test of quantum-network functionality, we observe Hong-Ou-Mandel interference across spatially separated nodes with visibility exceeding 70%, statistically equivalent to that obtained using dedicated White Rabbit timing links. To the best of our knowledge, this represents the first observation of quantum interference across a deployed metropolitan-scale telecom fiber network synchronized entirely without dedicated timing-transfer infrastructure. These results establish atomic-clock-based synchronization as a scalable, topology-independent alternative to conventional timing-distribution architectures and a practical pathway toward terrestrial, airborne, and space-based quantum networks where dedicated timing links are unavailable.

11.
arXiv (CS.CV) 2026-06-24

ObsGraph: Hierarchical Observation Representation for Embodied Reasoning and Exploration

Embodied reasoning and exploration are increasingly considered crucial abilities for robots operating in complex and unfamiliar environments. To accomplish tasks in such settings, an agent must identify and acquire the information necessary for the task through exploration. We propose ObsGraph, an observation-centric hierarchical scene graph that unifies scene representation, retrieval, and exploration. It retains visual evidence and organizes it into room-view-object layers: rooms provide coarse semantic anchors, views preserve contextual object covisibility, and objects store fine-grained details. On top of this representation, we perform coarse-to-fine hierarchical retrieval under a bounded budget, and crucially use retrieval outcomes to structure the exploration candidate space–activating room-level exploration, view refinement, or frontier exploration–thereby tightly coupling representation, retrieval, and adaptive multi-scale exploration. Experiments across embodied reasoning and exploration benchmarks demonstrate improved success and efficiency, highlighting the benefits of structured scene representation and more targeted information gathering driven by identified evidence gaps.

12.
arXiv (CS.AI) 2026-06-19

Flickering Multi-Armed Bandits

arXiv:2602.17315v3 Announce Type: replace-cross Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's current choice. We formalize these constraints through stochastically evolving graphs where actions are limited to local neighborhoods. This mobility-constrained structure imposes a dual challenge: the statistical requirement of information acquisition and the physical overhead of navigation. We analyze FMAB under i.i.d. Erdős–R'enyi and Edge-Markovian process, proposing a two-phase lazy random walk algorithm for robust exploration. We establish high-probability sublinear regret bounds and prove near-optimality via a matching information-theoretic lower bound. Our results characterize the intrinsic cost of learning under local-move constraints, complemented by a robotic disaster-response simulation.

13.
arXiv (CS.AI) 2026-06-18

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

arXiv:2606.18790v1 Announce Type: cross Abstract: Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.

14.
arXiv (CS.CV) 2026-06-15

ZipSplat: Fewer Gaussians, Better Splats

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ${\sim}6{\times}$ fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1dB and 1.2dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.

15.
arXiv (CS.CL) 2026-06-16

Why Tree-Style Branching Matters for Thought Advantage Estimation in GRPO

Group Relative Policy Optimization (GRPO) trains Chain-of-Thought reasoning with verifiable rewards, but estimating thought-level advantages without value functions often suffers from high variance. Although tree-style branching is used in practice to reduce variance, it lacks a theoretical explanation of why it works and whether it is important or potentially necessary. We study thought-level advantage estimation in GRPO from a variance perspective under a minimal tree-style setting where multiple continuations are sampled for each thought. Using the multivariate delta method, we reveal a sampling-dimension asymmetry. Increasing sampled thoughts ($K$) leaves a strictly positive estimation-variance floor, whereas increasing continuations per thought ($M$) drives the leading-order estimation variance to zero at rate $1/M$. This implies that, within the fixed-temperature GRPO-style estimator without value models studied here, accurate thought-level advantage estimation cannot be achieved by scaling thought sampling alone, making continuation-level branching a principled and potentially necessary mechanism rather than a heuristic. Experiments further provide empirical evidence for its effectiveness and potential necessity, demonstrating improved optimization stability, training efficiency, and final performance not only in math but also across vision domains and under different model architectures and sizes.

16.
arXiv (CS.AI) 2026-06-19

Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

arXiv:2606.19376v1 Announce Type: cross Abstract: Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.

17.
arXiv (CS.AI) 2026-06-19

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

arXiv:2604.08552v2 Announce Type: replace-cross Abstract: Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. Even when standard metadata reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model's training knowledge alone. We present an LLM-based metadata standardization system that queries standard reporting guidelines and authoritative biomedical terminology services in real time to retrieve canonically correct standards on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical approach to automated standardization of biomedical metadata.

18.
arXiv (CS.LG) 2026-06-19

MassSpecGym in the Wild: Uncovering and Correcting Evaluation Pitfalls in AI-Driven Molecule Discovery

arXiv:2606.19624v1 Announce Type: new Abstract: Reliable benchmarking is critical for developing machine learning models for tandem mass spectrometry (MS/MS) based molecule discovery. Subtle issues in experimental design and model evaluation procedures can degrade the trustworthiness of such benchmarks and lead to erroneous conclusions. We conduct a thorough review of model evaluation issues in the recent MS/MS machine learning literature, using the standard MassSpecGym benchmark suite as a case study to illustrate the impact of these issues. We find evaluation issues in at least 17 of 26 papers reporting MassSpecGym benchmark results in the first year of its adoption. We isolate three classes of failures: (i) data leakage, (ii) shortcut learning, and (iii) implementation bugs and metric divergence. Through extensive experimentation and code replication, we quantify the impact of these issues and show how they corrupt the evaluation standards MassSpecGym was designed to enforce. We distill our findings into recommendations generalizable to MS/MS challenges, benchmarks, and custom evaluation setups. We also release MassSpecGym v1.5, an implementation of our recommendations in the MassSpecGym benchmarking suite which addresses the failure modes identified in this audit. MassSpecGym v1.5 is publicly available at https://github.com/pluskal-lab/MassSpecGym.

19.
arXiv (quant-ph) 2026-06-11

Emergent Bell Phase in an Electro-Nanomechanical Quantum Simulator

arXiv:2511.02613v2 Announce Type: replace Abstract: Suspended carbon nanotubes hosting electrostatically defined quantum dots allow for exceptionally strong and tunable electromechanical coupling as well as mechanical modes that can reach the quantum ground state of motion simply by cryogenic cooling. This makes them a unique platform for quantum simulation of electron-phonon coupling. Here, we propose an experimentally realisable setup with two such carbon nanotubes in parallel, each hosting four quantum dots. Our system not only exhibits phonon-mediated electron-electron attraction, but also supports a robust, maximally entangled Bell phase at mesoscopic scales shared across the subsystems. These features highlight its potential as a simulator of strongly correlated quantum systems.

20.
arXiv (CS.LG) 2026-06-24

Anticipating the Optimism Gap: Predicting Distribution-Shift Degradation of RF-Impairment Detectors from In-Distribution Statistics

arXiv:2606.22054v2 Announce Type: replace-cross Abstract: Detectors for GNSS radio-frequency impairments (jamming, spoofing, multipath) are usually reported with a single AUC measured on the distribution they were tuned on. That number falls once conditions move, and the size of the drop is rarely known in advance because labelled field data is scarce. We ask whether this optimism can be predicted before any out-of-distribution data is seen. On an open, parameter-grounded synthetic testbed with a tunable severity shift, we evaluate thirteen detectors (five physics baselines, full-feature logistic regression and multilayer perceptrons, and single-feature learned controls) across four impairment classes. The optimism gap, the difference between in-distribution and shifted AUC, grows monotonically as the shift deepens (mean Spearman correlation 0.50). It is driven by how many observables a detector uses rather than by whether it is learned, and it varies systematically by class. Centrally, a ridge model built only from in-distribution score statistics predicts the gap for a detector it has never seen (R^2 = 0.47) and for an impairment class it has never seen (R^2 = 0.46); both are significant against a 2000-fold permutation null (p < 0.001) and survive removing the feature that is, by construction, part of the target. The headline findings are synthetic. We then run the pre-registered protocol on three open field corpora: on Jammertest 2024 the cross-detector prediction holds (R^2 = 0.11, p = 0.009), and on SatGrid, whose spoofer power sweep gives a calibrated severity axis, in-distribution AUC overstates higher-severity AUC by up to 0.22 and to the point of sign inversion, with in-distribution AUC and realised gap perfectly rank-correlated (Spearman rho = 1.0). The mechanism survives contact with real data, at smaller magnitude than in simulation. We release the testbed, a software-receiver front end, the ingest adapters and the protocol.

21.
medRxiv (Medicine) 2026-06-17

Wearable-Grade Lead Reduction Disproportionately Degrades ECG AI Performance in Elderly Patients: Evidence from PTB-XL and MIT-BIH

Consumer wearable devices increasingly use single-lead electrocardiograms (ECGs) for cardiac monitoring, but these signals contain substantially less spatial information than the clinical 12-lead standard. Whether this reduction dispro- portionately affects older adults, who often present with more complex cardiac conditions, remains poorly understood. In this study, we evaluated the impact of lead reduction on AI-ECG diagnostic performance across age groups. A 1D resid- ual neural network was trained on 21,091 PTB-XL ECG recordings spanning five diagnostic superclasses and assessed using 12-, 6-, 2-, and 1-lead configurations. Under the full 12-lead setting, model accuracy declined from 84.5% in patients younger than 40 years to 66.2% in patients aged 75 years or older. Progressive lead reduction further widened this gap. Under the 1-lead configuration, accuracy decreased by 14.1 percentage points in the 75+ group but by only 0.4 percent- age points in the

22.
arXiv (CS.LG) 2026-06-17

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

arXiv:2606.18001v1 Announce Type: new Abstract: Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their generalization mechanisms highlighting how their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$ it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios when combinations of these half-links are observed or not. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can be a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.

23.
arXiv (CS.LG) 2026-06-15

Binary Black Hole Parameter Estimation with Hybrid CNN-Transformer Neural Networks

arXiv:2606.13941v1 Announce Type: cross Abstract: The detection of gravitational waves has revolutionized our ability to explore fundamental aspects of the Universe. Traditionally, modeled gravitational-wave signals have been identified using template-based matched filtering, followed by coincidence analysis across multiple detectors in the signal-to-noise ratio time series. Recent advances in Machine Learning and Deep Learning have sparked growing interest in their application to both signal detection and parameter estimation. In this study, a hybrid Deep Learning strategy is proposed that leverages the effectiveness of Transformer encoders alongside well-established Convolutional Neural Network architectures in an attempt to estimate the intrinsic and extrinsic parameters of non-precessing binary black hole systems. The primary focus of this work is point estimation, producing single best-fit values for each parameter rather than full posterior distributions. This method is evaluated on both simulated signals embedded in Gaussian noise and real gravitational-wave events, and it demonstrates strong predictive performance and robustness across key astrophysical parameters.

24.
arXiv (CS.AI) 2026-06-24

Weight-Space Geometry of Offline Reasoning Training

arXiv:2606.23740v1 Announce Type: cross Abstract: Offline reinforcement-learning losses (RFT, RIFT, DFT, Offline GRPO, DPO) are widely used to distill reasoning from large teachers into smaller students, and are typically compared on downstream accuracy alone. We ask whether they are mechanistically distinct or converge to a similar weight update. Training six methods (SFT, RFT, DFT, RIFT, Offline GRPO, DPO) on identical math rollouts from a single base model (Qwen3-4B) with attention-only LoRA, we analyze the resulting deltas via cosine similarity, principal-angle subspace analysis, linear mode connectivity, and CKA. We observe: (i) SFT, RFT, and RIFT have nearly colinear weight deltas (cosine >= 0.97, top-1 principal angle ~7 deg median over 144 modules) and comparable GSM8K accuracy (87-88%, n=1319; pairwise McNemar p >= 0.15); (ii) DFT diverges further in direction than any reward-weighted method despite using the same data; (iii) Offline GRPO adds a substantial component orthogonal to the SFT direction (~67% globally, up to ~86% in late layers) while staying in the SFT loss basin; (iv) DPO sits in a near-orthogonal subspace, shows a mode-connectivity barrier, and collapses late-layer CKA to ~0.46. DPO also reaches the highest accuracy in our protocol on both GSM8K (93.5%, McNemar p < 10^-9 vs. each other method) and AIME26 (30.0% vs. 3.3-10.0%); its training uses a 10x smaller learning rate than the others (the standard convention), so the update-norm and accuracy gaps reflect loss-function and optimizer choices jointly, and a learning-rate-matched DPO comparison is left for future work.

25.
arXiv (CS.LG) 2026-06-24

Forget Without Compromise: Nexus Sampling for Streaming KV-Cache Eviction Under Fixed Budgets

arXiv:2606.23961v1 Announce Type: new Abstract: Long-context and agentic LLM workloads push the KV cache past any fixed memory budget, forcing the inference stack to permanently evict tokens at every step of a continuous-inference stream. Existing methods all share the same template, a per-step direct-attention score followed by deterministic top-$K$ selection, which converts a single below-cutoff step into an irreversible verdict and permanently erases any subtly important token that direct attention cannot single out from noise. To address this challenge, we propose Nexus Sampling, a training-free eviction method that pairs Nexus scoring, an iterative walk over direct attention that surfaces bridge tokens, with weighted reservoir sampling, which retains tokens with inclusion probability in place of deterministic top-$K$. Theoretically, we show that Nexus Sampling dominates deterministic top-$K$ in long-run survival of subtly important tokens. Empirically, at 80% KV cache eviction, Nexus Sampling matches dense attention within 1% on LongBench while outperforming top-$K$ baselines on retrieval-heavy tasks, with up to 10x smaller per-sequence cache memory.