Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (math.PR) 2026-06-16

A small noise approximation for Muller's Ratchet

arXiv:2606.15842v1 Announce Type: new Abstract: We consider an infinite system of SDEs with Fleming-Viot noise indexed by $k=0,1,2,\dots$, whose parameters $\alpha,\lambda$, and $\nu$ are the (deleterious) selection coefficient, the (uni-directional) mutation rate, and a quantity which determines the size of the system's fluctuations. The SDE's unique weak solution $X(t) = (X_k(t))_{k=0,1,2,...}$ models what is known in population genetics as Muller's ratchet. Here, $X_k(t)$ stands for the frequency of individuals carrying $k$ deleterious mutations. Since the mutation process is uni-directional, $t\mapsto \inf\{k: X_k(t)> 0\}$ is non-decreasing for almost every path of $X$, and we refer to an increase as a click of Muller's ratchet. A long standing question concerns the clicking rate of Muller's ratchet. Using Duhamel's principle for semigroups, we give a partial answer by approximating $E(\sum_{k=1}^\infty kX_k(t) )$ and $E\big(X_0(t)\big)$ up to $O(1/\nu^2)$ for fixed $\alpha$, $\lambda$ and $t>0$. Our results suggest that $\psi:=\nu \alpha e^{-\lambda/\alpha}$ is a crucial quantity also when the mutation/selection ratio $\theta = \lambda/\alpha$ is moderately large: for large $\nu \alpha$, clicking of the ratchet on the time scale $\frac 1\alpha \log \theta$ becomes rare as soon as $\psi$ becomes large.

02.
arXiv (CS.AI) 2026-06-11

SirenFNO: Efficient and Full Frequency Learning of Fourier Neural Operators

arXiv:2606.11518v1 Announce Type: cross Abstract: Fourier neural operators (FNOs) are effective and efficient surrogates for approximating solutions of PDEs and generalize across discretizations. However, owing to the reliance on frequency truncation to maintain learning efficiency of FNOs, empirical studies suggest that FNOs exhibit spectral bias toward low-frequency information, which may hinder the learning capability especially for certain PDEs with strong high-frequency oscillations. To address this limitation, we propose SirenFNO, a novel framework that leverages sinusoidal representation networks (SIRENs) to learn implicit neural representations and performs mode-wise kernel parameterization. Our SIREN parameterization learns a full-grid spectrum with a constant and discretization-independent parameter count, thereby eliminating the need for frequency truncation. We further extend SirenFNO with functional tensor decompositions to enhance parameter and learning efficiency. Empirical results show that our SirenFNO consistently outperforms FNO with approximately $4$ to $15$ times parameter reductions with preserved discretization invariance, and our functional decomposition variants obtain performance improvements with a maximum of $73$ times fewer parameters across multiple PDE benchmarks.

03.
arXiv (CS.CL) 2026-06-15

Residual Context Diffusion Language Models

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a "remasking" mechanism that decodes only the most confident tokens and discards the rest, effectively wasting computation. We demonstrate that recycling computation from the discarded tokens is beneficial, as these tokens retain contextual information useful for subsequent decoding iterations. In light of this, we propose Residual Context Diffusion (RCD), a module that converts these discarded token representations into contextual residuals and injects them back for the next denoising step. RCD uses a decoupled two-stage training pipeline to bypass the memory bottlenecks associated with backpropagation. We validate our method on both long CoT reasoning (SDAR) and short CoT instruction following (LLaDA) models. We demonstrate that a standard dLLM can be efficiently converted to the RCD paradigm with merely ~300 million tokens. RCD consistently improves frontier dLLMs by 4-11 percentage points in accuracy with minimal extra computation overhead across a wide range of benchmarks. Notably, on the most challenging AIME tasks, RCD nearly doubles baseline accuracy and attains up to 4-5x fewer denoising steps at baseline's peak accuracy.

04.
arXiv (CS.AI) 2026-06-17

SketchXplain: Intuitive Visual Explanations of Image Classifiers with Sketches

arXiv:2606.17646v1 Announce Type: cross Abstract: Saliency map visualizations explain image-based AI predictions by pointing to regions, but these are often unintuitive and semantically unclear, leaving an interpretability gap. We argue that AI explanations should be intuitive – coherent to user knowledge, yet simple and selective to accelerate interpretation. Inspired by artistic drawings, we propose SketchXplain to generate sketch-based visual explanations for intuitive image-based explainable AI (XAI). Combining techniques in saliency maps, concept-bottleneck models, and sketch optimization, SketchXplain integrates saliency to select coherent observation artifacts, concepts for knowledge coherence, cues to represent them, and abstraction for simplicity. Evaluating on face expression recognition, modeling and user studies showed that SketchXplain supported quicker interpretation with more aligned visualizations than saliency maps or simple drawings. Further evaluation on skin lesion diagnosis found that SketchXplain more coherently visualized disease symptoms, better supporting lay diagnosis. Thus, this work illustrates the value of sketches for intuitive, simple, coherent, and quick image-based XAI visualizations.

05.
arXiv (quant-ph) 2026-06-15

Nanostructure modelling with early fault tolerant quantum computers

arXiv:2606.06442v2 Announce Type: replace Abstract: Semiconductor nanostructures are central to many developing technologies. Notably, double quantum dots are especially important for semiconductor spin-qubit architectures, quantum sensing applications, and quantum-dot solar cells. Accurate modelling is highly desirable but conventional methods can struggle when dynamics involve more than two interacting electrons. In this work, we present a quantum simulation framework capable of addressing multi-electron double quantum dots. We adopt an efficiently scaling 1$^st$ quantised representation of the system and develop algorithms based on both Trotterisation and Qubitisation. Incorporating insights from classical simulations enables us to produce resource estimates that are more realistic than those obtained from theoretical error bounds. Using a standard surface code model with physical noise at $10^{-3}$, our results indicate that the ground-state energy of four electrons in a double quantum dot can be estimated in approximately 22 hours using 226k physical qubits, or an eight-electron system in 3.3 days with 314k qubits (with runtimes falling dramatically when more qubits are available). We anticipate that incorporating recent advances in surface code architectures may reduce these costs significantly further. Our results suggest that early fault-tolerant quantum computers may become valuable tools for designing mature-era quantum technologies.

06.
arXiv (CS.AI) 2026-06-17

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

作者:

arXiv:2606.17107v1 Announce Type: cross Abstract: Prefix caching reuses prefill only across an exactly shared prefix, so one changed field invalidates the entire downstream cache. Yet overwriting the field's own key/value vectors and reusing the rest leaves the model acting on the old value. The reason, established causally across four model families: at prefill the model has already written the field-conditioned conclusion onto downstream notes; the field's own key/value drives under 1% of the decision. Read as a notebook of memoized conclusions, two capabilities follow. (1) It is editable. A salient erratum amends the notes; and with chain-of-thought, editing the field alone recovers the decision (1.00 at 8B, ~1% compute), while without CoT it is ignored. (2) It is composable. The notes are position-portable, so a precompiled skill can be RoPE-repositioned and spliced into any context, indistinguishable from full recompute (logit cosine 0.90-0.999, twelve models) at O(L) rather than O(L^2) time-to-first-token. A unified edit+compose agent stays decision-identical to recompute at up to 14.9x lower latency. The approach applies to any per-token attention KV cache, validated across scale, quantization, Mixture-of-Experts, and multimodal caches, and extends to several attention variants through small adapters. Because the erratum is append-only, it composes with production prefix caching: in an online vLLM benchmark it keeps the prefix cache-aligned (98.5% hit-rate), cutting p90 time-to-first-token by 53-398x.

07.
arXiv (CS.LG) 2026-06-15

On the Geometry and Optimization of Polynomial Convolutional Networks

arXiv:2410.00722v3 Announce Type: replace Abstract: We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging on tools from algebraic geometry, we explore the geometric properties of the image in function space of this map - typically referred to as neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

08.
arXiv (CS.AI) 2026-06-12

ARMOR-MAD: Adaptive Routing for Heterogeneous Multi-Agent Debate in Large Language Model Reasoning

arXiv:2606.13197v1 Announce Type: new Abstract: Multi-agent debate (MAD) can improve large language model reasoning, but fixed debate pipelines often waste computation and can amplify correlated errors among similar agents. We propose ARMOR-MAD, a training-free heterogeneous MAD framework that treats debate as conditional computation. ARMOR-MAD combines three components: Pre-debate Agreement Routing (PAR) decides whether independently generated Round-0 answers require debate; Early Agreement Stopping Evaluator (EASE) stops debate after convergence; and Semantic Outlier Detection (SOD) down-weights abnormal final answers during aggregation. Across MATH Level 5, GSM8K, MMLU, and MMLU-Pro, ARMOR-MAD consistently improves over fixed-round heterogeneous debate with the same model pool, reaching 65.5\%, 96.5\%, 90.0\%, and 81.5\% accuracy, respectively. The results suggest that genuine model heterogeneity and agreement-based control are both important for making MAD more accurate and efficient.

09.
arXiv (CS.CL) 2026-06-11

Massive Open-Vocabulary Keyword Spotting

Automatic speech recognition systems have been shown to under-perform when it comes to transcribing words rarely seen in the training data, namely specialized terminology. Open-vocabulary keyword spotting, combined with contextual biasing, has been shown to mitigate this issue. However, existing systems can only handle glossaries of a few hundred terms without becoming an infeasible bottleneck. We propose a system that stores features with a memory footprint up to 128 times smaller than a comparable baseline and allows users to process massive databases while remaining open-vocabulary. Without fine-tuning the speech recognition model, our system achieves a comparable entity recall as uncompressed solutions, even in languages not seen during training.

10.
arXiv (CS.CV) 2026-06-17

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving

Spatiotemporal intelligence in autonomous driving (AD) requires an agent to integrate multi-view observations into a coherent scene representation, maintain object continuity across viewpoints and time, and reason about spatial relations, interactions, and future dynamics. However, existing AD vision-language benchmarks largely focus on single-view, static, ego-centric, or single-source question answering, leaving it unclear whether current Vision-Language Models (VLMs) can truly construct and reason over dynamic driving scenes. We introduce DriveSpatial, a benchmark of 15.6K human-verified QA pairs across 20 tasks from five large-scale AD datasets. DriveSpatial evaluates four abilities: Cognitive Scene Construction, Multi-view Relational Understanding, Temporal Reasoning, and Generalization. Unlike prior benchmarks, DriveSpatial is generated from a dynamic multi-relational scene graph that encodes object states, spatial relations, interactions, camera visibility, and temporal correspondences, enabling QA pairs that enforce genuine cross-view and spatiotemporal reasoning. Evaluating 15 representative VLMs reveals a substantial human-model gap: the strongest model trails humans by 28.4 points, with Cognitive Scene Construction emerging as the key bottleneck. Further diagnostics show that language-only prompting is insufficient, while explicit BEV grounding consistently improves performance. These results suggest that current VLMs lack the scene-construction ability needed for reliable spatiotemporal driving intelligence. DriveSpatial and its construction pipeline will be released to support future research.

11.
medRxiv (Medicine) 2026-06-12

Integrative Mechanisms of Early Clinical and Research Training (ECART) in Orthopaedic Medical Education: A Qualitative Single-Case Study

Background: Early clinical exposure and student participation in research are important components of medical training. They may support learning motivation, evidence literacy, and self-directed learning. In many programmes, however, clinical training and research training remain separated. Few studies have explained, within a real teaching team, how learners turn clinical phenomena into researchable questions and how research participation can reshape their clinical understanding. Early Clinical and Research Training (ECART) is a clinical-research integration approach developed by an orthopaedic team at the Second Hospital of Shandong University. Methods: We conducted a theory-informed, interpretivist qualitative single-case study. The case was an orthopaedic clinical-research team at the Second Hospital of Shandong University. Participants included medical undergraduates, academic degree graduate students, professional degree graduate students, clinical teachers, and research platform leads. We used purposive sampling with maximum variation. Data were collected through semi-structured interviews and de-identified teaching documents. Data were analysed using the framework method and were interpreted with a Context-Activity-Mechanism-Outcome (CAMO) logic. Results: The analysis showed that ECART was not simply early entry into the clinic or early entry into the laboratory. It was a team-based learning process centred on real medical problems. Four themes were identified. First, early clinical exposure helped learners make real problems visible and nameable, rather than merely increasing exposure. Second, clinical-research connection followed different pathways. Professional degree graduate students often started from clinical uncertainties in residency training and case management, and moved toward evidence-informed small projects. Academic degree graduate students often started from literature gaps, experimental findings, and mechanistic hypotheses, and then used clinical feedback to calibrate meaning. Third, research training, through literature reading, group meetings, experimental design, data review, and mentor questioning, helped learners move from completing tasks to explaining problems. Fourth, sustained ECART depended on a tiered team ecology formed by clinical teachers, research mentors, research platforms, and senior peers. Based on these findings, we refined the ECART programme theory: real medical problems are translated through explanation, searching, experimentalisation, and feedback-based reinterpretation into research questions that learners can understand, discuss, and test. This process supports problem formation, evidence awareness, mechanistic reasoning, translational judgement, and career clarification. Conclusion: ECART is best understood as a clinical-research integrated learning ecology that emerges from real team practice, rather than as a fixed standardised course. Its educational value lies in a recurring cycle of real problems, research translation, multi-source feedback, and clinical reinterpretation. This framework may inform the design, evaluation, and contextual adaptation of clinical-research integration pathways in medical education.

12.
arXiv (CS.CL) 2026-06-15

The Holistic Storage of Verb+Up Phrases in Text-based and Audio-based Language Models

A crucial aspect of linguistic capability is the ability to trade off between stored representations and abstract knowledge: one must retrieve learned representations, but also generate novel ones by applying productive rules. While recent work has examined abstract knowledge in language models, holistic storage of multi-word units has received far less attention. We probe internal representations in text-based LLMs and an ASR model, testing whether V+up phrasal verbs develop distinct representations as a function of frequency and predictability. All models show evidence of holistic storage driven by frequency and predictability, further supporting usage-based theories of language.

13.
arXiv (CS.CL) 2026-06-19

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

Persian pretrained language models (PLMs) are still limited by the scarcity of large-scale, high-quality pretraining corpora and by insufficient evaluation beyond standard classification and NER tasks. We present IHUBERT, a monolingual Persian PLM trained from scratch with the RoBERTa-base encoder (125M parameters) on a 45 GB curated subset of the Sepahr-Danesh collection (about 7-8B tokens). To improve corpus quality and reduce redundancy, we employ a multi-stage preprocessing pipeline that includes normalization, exact and near-duplicate removal, anonymization, and vector-database-based semantic deduplication for distribution balancing control across domains and registers. We additionally train a 139k-vocabulary BPE tokenizer on the full pretraining corpus to better capture Persian morphology and orthographic variation. IHUBERT is evaluated on seven Persian NLU benchmarks covering NER, sentiment analysis, topic classification, NLI, extractive question answering, and relation extraction, using task-standard metrics (entity-level F1, Macro-F1, EM/F1). IHUBERT achieves its strongest gains on extractive QA, ranking first on both PQuAD (F1 88.3542) and ParsiNLU-RC (F1 49.0987), and attains the best result on FarsTail (Macro-F1 0.8350). On NER and topic classification, it remains competitive (e.g., 0.8308 F1 on ParsTwiNER; 0.7953 Macro-F1 on DigiMag), while relation extraction remains the main remaining gap (0.6684 Macro-F1 on PERLEX). A controlled tokenizer ablation on the IHUBERT pretraining corpus shows that BPE yields slightly lower subword fragmentation than WordPiece at matched vocabulary size, supporting our tokenization design. Overall, IHUBERT advances Persian language modeling through semantically curated large-scale pretraining and broad evaluation across both classification and comprehension-oriented tasks.

14.
arXiv (CS.CV) 2026-06-11

Continual Learning with Support Boundary Experience Blending

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated via differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, 2%, respectively.

15.
arXiv (CS.LG) 2026-06-11

Last-Iterate Convergence of Optimistic Multiplicative Weight Update

arXiv:2606.11773v1 Announce Type: cross Abstract: Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative-Weights Update (OMWU) are two very popular algorithms to solve convex/concave saddle-point problems, where OMWU is the non-Euclidean, entropic version of OGDA. It is known since the '80s that the last iterate of OGDA asymptotically converges to a saddle point in smooth problems. On the other hand, it is unknown if OMWU has the same property. In this paper, I show that OMWU converges asymptotically for smooth convex-concave saddle-point problems, with a small enough constant learning rate. The result does not require uniqueness, strict complementarity, an error bound, or initialization near a solution. The main new ingredient is a boundary argument showing that every cluster point satisfies the inactive-coordinate KKT inequalities. The boundary argument was discovered with assistance from ChatGPT and is documented in the appendix.

16.
arXiv (CS.CV) 2026-06-12

A Machine Learning Framework for Real-Time Personalized Ergonomic Pose Analysis

This paper introduces a new methodology for real-time prediction of ergonomic and non-ergonomic human poses using volumetric video data in three dimensions. Although the methodology was designed for ergonomic assessments, it can be adapted to other applications requiring real-time analysis of human posture. One aspect that makes this system stand out is its ability to analyze 3D point clouds during the assessment, enabling computation from multiple angles. This overcomes a critical limitation of cameras which provide often a fixed viewpoint, thereby restricting the data available for a thorough postural evaluation, especially when occlusions occur. The system continuously and automatically performs pose inference using the chosen perspective on the real-time streaming data; however, only the poses manually selected and labeled by the user are used to train the personalized deep learning classifier. The methodology has been refined through a case study in which RGB-D cameras captured subjects performing load-lifting tasks, enabling real-time skeletal labeling. The model was trained on this data and, following the training phase, performs inference on new streaming data in real time. This research offers a scalable and pragmatic approach for real-time ergonomic evaluation by combining state-of-the-art 3D data technologies and traditional 2D pose estimation algorithms. It addresses the increasing need for safety and health monitoring in workplace environments, marking a notable contribution to the domain.

17.
arXiv (CS.AI) 2026-06-16

Boosting Knowledge Graph Foundation Models via Enhanced Negative Sampling

arXiv:2605.27023v2 Announce Type: replace Abstract: Knowledge graphs (KGs) have become the core backbone of numerous downstream tasks such as question answering and recommender systems. However, despite all this, KGs are often very incomplete. To perform zero-shot knowledge graph completion in unseen KGs, which have different relational vocabularies from those used for pre-training, KG foundation models (KGFMs) receive a wide range of attention. Existing KGFMs often perform training using random negative triples, which are constructed by replacing the head or tail entity of a positive triple with a random entity. However, these negative triples are often constructed with limited quality, providing weak supervision for KGFM training. In this paper, we propose a simple yet effective adaptive negative sampling approach, KMAS, to enhance existing KGFMs. KMAS constructs hard negative triples through the updated relation embeddings generated from the existing KGFM's relation encoder. To further adaptively align with the evolving capability of the KGFM during the training process, KMAS adjusts the ratio of hard negative triples dynamically throughout the whole training process: after a warmup phrase, it increases the ratio linearly and then decreases linearly. Extensive experiments are conducted over 44 data sets. Experimental results demonstrate that our proposed negative sampling method can enhance many SOTA KGFMs without requiring excessive additional time or memory consumption.

18.
arXiv (CS.CV) 2026-06-16

A Multi-Center Benchmark for Abdominal Disease Diagnosis and Report Generation from Non-Contrast CT

Multiphasic contrast-enhanced CT (CECT) is widely used for abdominal lesion characterization, yet it carries inherent risks of contrast-induced nephropathy, escalates acquisition burden, and heavily contributes to radiologist workload. To address these challenges, we introduce a novel multi-center benchmark for multi-organ abdominal disease diagnosis and automated radiology report generation, which learns to synthesize contrast-enhanced findings from single-phase non-contrast CT (NCCT). To support this, we curated a large-scale dataset of paired NCCT-CECT studies and their corresponding contrast-enhanced radiology reports from two centers, partitioned into internal sets and an external validation cohort. Under a unified evaluation protocol, we benchmarked five contemporary deep learning architectures encompassing chest-specific, abdomen-specific, and general-purpose multimodal domains. Extensive experiments demonstrate that NCCT retains diagnostic signals, achieving an average multi-organ AUC of 69.1% on the internal cohort and 63.1% on the external cohort, respectively. By releasing this dataset and standardized benchmark publicly, this study aims to catalyze future research into safer, resource-efficient, and globally accessible contrast-free abdominal imaging workflows. Code is available at: https://github.com/xmed-lab/TriALS-Report.

19.
arXiv (CS.CL) 2026-06-16

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training. It also yields solver improvements averaging +4.4 points at 8B, with the largest gains on competition-level benchmarks. Our findings suggest that explicit action-space constraints, analogous to the structural role that game rules play in classical self-play, can help sustain productive co-evolution in language. Vocabulary dropout is one simple instantiation of this principle.

20.
arXiv (CS.LG) 2026-06-12

Detecting Explanatory Insufficiency in Learned Representations: A Framework for Representational Vigilance

arXiv:2606.13172v1 Announce Type: new Abstract: Learned representations are central to modern machine learning and are commonly evaluated through predictive performance, robustness, uncertainty estimation, or generalization. However, a learned representation may remain operationally successful while progressively failing to organize persistent residual structures that are not fully captured by conventional evaluation metrics. This article introduces VER, the Vigilant Evaluator of Representations, a conceptual framework for monitoring representational adequacy in learned representations. VER does not propose a new learning algorithm, loss function, or model architecture. Instead, it formalizes a diagnostic process through which persistent residual structures may be identified, analyzed, and interpreted as potential indicators of explanatory insufficiency. The framework distinguishes representational inadequacy from ordinary prediction error, uncertainty, noise, and distribution shift. It introduces a monitoring sequence based on representation identification, explanatory-domain delimitation, residual-structure detection, explanatory-resistance evaluation, and vigilance signaling. VER is intended as a contribution to representation diagnostics in machine learning. Its objective is not to replace existing evaluation methods but to complement them by treating representational adequacy as an explicit object of inquiry. A path toward empirical evaluation through representational-vigilance benchmarks is also outlined.

21.
arXiv (CS.AI) 2026-06-16

Topological Flow Matching

arXiv:2606.15897v1 Announce Type: cross Abstract: Flow matching is a powerful generative modeling framework, valued for its simplicity and strong empirical performance. However, its standard formulation treats signals on structured spaces, such as fMRI data on brain graphs, as points in Euclidean space, overlooking the rich topological features of their domains. To address this, we introduce topological flow matching, a topology-aware generalization of flow matching. We interpret flow matching as a framework for solving a degenerate Schrödinger bridge problem and inject topological information by augmenting the reference process with a Laplacian-derived drift. This principled modification captures the structure of the underlying domain while preserving the desirable properties of flow matching: a stable, simulation-free objective and deterministic sample paths. As a result, our framework serves as a drop-in replacement for standard flow matching. We demonstrate its effectiveness on diverse structured datasets, including brain fMRIs, ocean currents, seismic events, and traffic flows.

22.
Nature (Science) 2026-06-10

Mitochondria tethered to the nucleus secure its energy supply

Direct interactions between the cell’s powerhouses and nuclear pores might channel energy straight into the nucleus, fuelling cell division and differentiation. Direct interactions between the cell’s powerhouses and nuclear pores might channel energy straight into the nucleus, fuelling cell division and differentiation.

23.
arXiv (CS.AI) 2026-06-18

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

arXiv:2507.04219v5 Announce Type: replace-cross Abstract: Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method-Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from model outputs. Our central insight is that model collapse can be leveraged for machine unlearning by deliberately triggering it for data we aim to remove. We theoretically analyze that our approach converges to the desired outcome, i.e. the model unlearns the data targeted for removal. We empirically demonstrate that PMC overcomes four key limitations of existing unlearning methods that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs while preserving general model utility. Overall, our contributions represent an important step toward more comprehensive unlearning that better aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.

24.
arXiv (CS.CL) 2026-06-16

Does Traversal Order Matter? A Systematic Study of Tree Traversal Methods in Transformer Grammars

Transformer Grammars (TGs) enhance language modeling by incorporating syntactic tree structures. Despite the potentially significant impact on model performance of how syntactic trees are linearized in TGs, existing studies rely solely on Depth-First Traversal (DFT) for linearization. In this paper, we expand the traversal design space by exploring Breadth-First Traversal (BFT) and a novel hybrid traversal strategy, Production-Rule Traversal (PRT), which combines the structural lookahead of BFT with the early lexical generation of DFT. We integrate these traversal methods with varying tree configurations and masking strategies, and empirically evaluate their performance on language modeling, syntactic generalization and summarization. We reveal the inherent trade-offs between nested composition and global lookahead, providing actionable recommendations for designing task-aware Transformer Grammars.

25.
arXiv (CS.CV) 2026-06-19

CrossFlow: One-Step Generation Across Latent and Pixel Spaces

Most diffusion and flow-matching generators define the prior, probability path, and prediction target in the same representation space. Latent diffusion improves efficiency by moving this path into an autoencoder latent space, but the final sample is still produced by a separately trained decoder. This separation creates a mismatch: the generator is optimized for latent-space prediction, while final quality depends on how the decoder handles generated latents that may differ from clean encoder outputs. We introduce CrossFlow, a cross-space flow formulation that maps noisy latent inputs directly to pixel-space images. The key technical step is a velocity-free one-step objective: the latent trajectory defines the training path, but the supervised prediction is an image rather than a latent displacement. This lets one model act both as a one-step latent-to-pixel generator and as a decoder replacement for latent diffusion pipelines. On class-conditional ImageNet-1k at $256\times256$, CrossFlow-XL achieves 1.62 FID with one function evaluation. Ablations show that the latent encoder and pixel-space perceptual and adversarial losses are important for fidelity. These results indicate that cross-space flow objectives can combine the efficiency of latent representations with direct pixel-space supervision, without requiring a separate decoder at inference.