Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-17

Overcoming the Incentive Collapse Paradox

arXiv:2603.27049v2 Announce Type: replace-cross Abstract: AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025); Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this phenomenon in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. Our first contribution is a general impossibility result showing that incentive collapse is not merely a limitation of simple linear payments, but arises for any payment rule based only on observed task accuracy.To overcome this barrier, we propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.

02.
arXiv (CS.AI) 2026-06-19

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

arXiv:2606.20508v1 Announce Type: new Abstract: Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

03.
arXiv (math.PR) 2026-06-24

Gradient Mean-Field Dynamics with Measure-Valued States: Well-Posedness, Chaos, and Long-Time Stability

arXiv:2606.24385v1 Announce Type: new Abstract: We study a stochastic mean-field interacting particle system whose state space is $\Y = \Tt^d \times \cP(U)$, where the first component represents a spatial variable and the second one is a probability measure over a compact metric space $U$. The dynamics are driven by locally Lipschitz drift operators: the spatial component evolves according to a Brownian diffusion, while the measure-valued component is perturbed by a projected cylindrical noise acting in the Arens–Eells space. We first establish existence and uniqueness of strong solutions for both the $N$-particle system and the associated nonlinear McKean–Vlasov equation under locally Lipschitz and linear growth assumptions on the drift coefficients. We then prove propagation of chaos: as $N\to\infty$, the empirical measure converges in expectation in Wasserstein–1 distance towards the unique McKean–Vlasov solution. Further, we investigate exponential convergence of the nonlinear McKean–Vlasov dynamics towards a unique invariant measure.

05.
medRxiv (Medicine) 2026-06-12

The Acceptability of Three Co-Created Peer Support Interventions for People Living with Leprosy Reactions in Indonesia: A Mixed-Methods Pilot Study

Background: Leprosy reactions (LR) are immune-mediated complications associated with disability, emotional distress, and social isolation. We identified a gap in affected-individual-informed interventions that aim to improve the management of LR in healthcare settings. To address this gap, we assessed the acceptability of three peer-support interventions co-created with people affected by LR in Indonesia. Methods: Using an interactive learning and action approach, we co-created peer counselling, telesupport groups, and participatory video interventions which were piloted in an urban hospital and 13 rural community clinics. A mixed-methods design was applied with interviews, focus group discussions, and pre-post assessments involving four participant groups. Data were analyzed thematically using an acceptability framework. Results: One hundred participants were enrolled, and 92 completed the pilot intervention between November 2022 and July 2023. Qualitative findings showed that all interventions were acceptable. Peer counselling provided emotional reassurance through shared experiences and was perceived as trustworthy and supportive. Perceived burdens differed by setting, with time constraints in urban facilities and geographical barriers in rural clinics. Knowledge improved significantly among participants of peer counselling and telesupport groups in rural settings. Telesupport groups facilitated connection, information exchange, and continuity of care. Digital access and literacy limited participation for some, particularly in rural areas. The participatory video was perceived as reassuring and informative. Improvements in knowledge, attitude, practices, and mental well-being domain scores were observed among urban participants, but responses in rural settings showed less change. Participants and co-implementers reported increased self-efficacy, participants confidence to perform required behaviors within peer support interventions, with effects shaped by intervention and setting. Conclusions: The three co-created peer-support interventions were acceptable for individuals with LR in diverse healthcare settings. These outcomes highlight the importance and effectiveness of selective, and context-sensitive implementation of one or more peer-support modalities.

06.
arXiv (CS.CV) 2026-06-11

From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models

While multimodal large language models (MLLMs) have made substantial progress in single-image spatial reasoning, multi-image spatial reasoning, which requires integration of information from multiple viewpoints, remains challenging. Cognitive studies suggest that humans address such tasks through two mechanisms: cross-view correspondence, which identifies regions across different views that correspond to the same physical locations, and stepwise viewpoint transformation, which composes relative viewpoint changes sequentially. However, existing studies incorporate these mechanisms only partially and often implicitly, without explicit supervision for both. We propose Human-Aware Training for Cross-view correspondence and viewpoint cHange (HATCH), a training framework with two complementary objectives: (1) Patch-Level Spatial Alignment, which encourages patch representations to align across views for spatially corresponding regions, and (2) Action-then-Answer Reasoning, which requires the model to generate explicit viewpoint transition actions before predicting the final answer. Experiments on three benchmarks demonstrate that HATCH consistently outperforms baselines of comparable size by a clear margin and achieves competitive results against much larger models, while preserving single-image reasoning capabilities.

07.
arXiv (CS.LG) 2026-06-17

Learning Credal Ensembles via Distributionally Robust Optimization

arXiv:2602.08470v3 Announce Type: replace Abstract: Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.

08.
arXiv (CS.AI) 2026-06-16

Provenance-Enhanced Statements in Knowledge Graphs

arXiv:2606.15246v1 Announce Type: cross Abstract: Provenance-enhanced statements of the form "according to $X$, $\varphi$" are pervasive in contemporary knowledge graphs, especially in domains where graph content primarily represents claims, interpretations, and hypotheses (capta) rather than observer-independent facts (data). Current provenance models can record who asserted what, but they typically treat provenance as semantically neutral, leaving underspecified how attributed claims relate to factual commitment, to one another, and to reasoning. In this paper we introduce DEC, a framework that interprets provenance predicates as indicators of epistemic stance and groups provenance-homogeneous sets of statements into cognitive worlds. Drawing on cognitive modal logics (doxastic, epistemic, and conjectural), DEC characterizes locality, rationality, and controlled permeation between cognitive worlds and a distinguished factual core ("reality"), thereby enabling principled reasoning over attributed content without collapsing disagreements into inconsistencies. We formalize a DEC interpretation for RDF datasets that is conservative over RDF~1.2 semantics, clarify the role of intensionality and identity (including the Superman paradox), and illustrate the approach on common Semantic Web representations (named graphs, quoted triples/RDF-star, and reification). Finally, we describe our prototype DEC reasoner implemented as a Fuseki dataset module, supporting controlled factualisation and explicit detection of disagreements and delusions.

09.
arXiv (CS.LG) 2026-06-15

Compressed Computation is (probably) not Computation in Superposition

arXiv:2606.14673v1 Announce Type: new Abstract: We study whether the Compressed Computation (CC) toy model (Braun et al., 2025) is an instance of computation in superposition. The CC model appears to compute 100 ReLU functions with just 50 neurons, achieving a better loss than expected from only representing 50 ReLU functions. We show that the model mixes inputs via its noisy residual stream, corresponding to an unintended mixing matrix in the labels. Splitting the training objective into the ReLU term and the mixing term, we find that performance gains scale with the magnitude of the mixing matrix and vanish when the matrix is removed. The learned neuron directions concentrate in the subspace associated with the top 50 eigenvalues of the mixing matrix, suggesting that the mixing term governs the solution. Finally, a semi-non-negative matrix factorization (SNMF) baseline derived solely from the mixing matrix reproduces the qualitative loss profile and improves on prior baselines, though it does not match the trained model. These results suggest CC is not a suitable toy model of computation in superposition.

10.
arXiv (quant-ph) 2026-06-16

Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle

arXiv:2603.04377v3 Announce Type: replace Abstract: As quantum computing hardware rapidly advances, objectively evaluating the capabilities and error rates of new processors remains a critical challenge for the field. A clear and realistic understanding of current quantum performance is essential for guiding research priorities and driving meaningful progress. In this work, we apply and extend a protocol-based benchmarking methodology (Meirom, Mor, Weinstein Arxiv 2505.12441) that utilizes well-defined \underline{quantumness} thresholds. By evaluating performance at protocol level rather than the gate level, this approach provides a transparent and intuitive assessment of whether specific quantum processors, or isolated sub-chips within them, can demonstrate a practical quantum advantage. To illustrate the utility of this method, we compare two generations of IBM quantum computers: the older Eagle architecture and the newer Heron architecture. Our findings reveal the genuine operational strengths and limitations of these devices, demonstrating substantial performance improvements in the newer Heron generation. This work was made possible by IBM Quantum policies that enable independent and objective assessment of its quantum computers and sub-chips. We strongly encourage other companies to emulate the independent qubit availability and the fair pricing that allow researchers to perform such assessments.

11.
arXiv (CS.AI) 2026-06-12

Evaluation Sovereignty in Metadata-Driven Classification: A Multi-Track Framework for Weakly Supervised Information Systems

arXiv:2606.13436v1 Announce Type: new Abstract: Evaluation in machine learning is typically treated as a neutral measurement process. However, in operational information systems, evaluation outcomes are often conditioned by the processes used to generate labels. This paper does not seek to improve classification performance. Instead, it examines the validity of performance measurement under differing label-authority regimes. This issue is particularly relevant in large-scale metadata-driven systems, where labels are often incomplete, inconsistent, or weakly supervised. We introduce evaluation sovereignty, defined as the degree to which performance metrics are independent of label authority and supervision regime, and propose a multi-track evaluation framework that systematically varies training and evaluation label sources. Using hierarchical multi-label classification on large-scale scientific metadata, we demonstrate that models exhibiting strong performance under operational ("silver") evaluation degrade substantially under independent ("gold") evaluation, particularly for fine-grained classification. For example, Micro-F1 decreases from approximately 0.54 to 0.03. Notably, ranking-based metrics remain above baseline, revealing a divergence between latent model signal and classification validity. These findings suggest that commonly reported performance metrics may reflect alignment with labeling processes rather than true predictive capability. We therefore reconceptualize evaluation validity as a system-level property shaped by label governance and provide a practical methodology for auditing intelligent systems operating under weak supervision.

12.
arXiv (CS.CL) 2026-06-24

Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions

Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliable benchmark that effectively evaluates large language models' (LLMs) genuine capabilities in mathematical reasoning remains a critical challenge. To address these concerns, we propose RV-Bench, a novel evaluation methodology for Benchmarking LLMs with Random Variables in mathematical reasoning. Specifically, we build question-generating functions to produce random variable questions (RVQs), whose background content mirrors original benchmark problems, but with randomized variable combinations, rendering them "unseen" to LLMs. Models must completely understand the inherent question pattern to correctly answer RVQs with diverse variable combinations. Thus, an LLM's genuine reasoning capability is reflected through its accuracy and robustness on RV-Bench. We conducted extensive experiments on over 30 representative LLMs across more than 1,000 RVQs. Our findings propose that LLMs exhibit a proficiency imbalance between encountered and ``unseen'' data distributions. Furthermore, RV-Bench reveals that proficiency generalization across similar mathematical reasoning tasks is limited, but we verified it can still be effectively elicited through test-time scaling.

13.
arXiv (CS.AI) 2026-06-17

Trust-Aware Multi-Agent Traceability: Confidence-Calibrated Knowledge Graphs for Consistent Software Artifact Management

arXiv:2606.17203v1 Announce Type: cross Abstract: Multi-agent AI systems are increasingly used to automate software engineering tasks including requirements analysis, architecture design, test generation, and traceability linking. When these agents operate as a sequential pipeline over shared software artifacts, errors and low-confidence decisions made by upstream agents propagate to downstream stages, producing orphaned requirements, contradictory links, and compliance gaps that pose significant risks in safety-critical domains. We propose a trust-aware coordination framework where a shared knowledge graph serves as both centralized semantic memory and a coordination surface through which agents assess and build upon each other's contributions using calibrated confidence scores. Our approach introduces a two-stage traceability link prediction pipeline combining embedding-based retrieval with LLM-based multi-criteria analysis, a traceability seeding mechanism that enables comparison between derivation-time and validation-time confidence, and a consistency protocol governing pipeline interactions through confidence threshold gating, confidence divergence detection, and conflict resolution. We evaluate on an automotive software engineering case study measuring link prediction calibration, protocol effectiveness, threshold sensitivity, and the impact of traceability seeding. Ablation studies confirm that confidence calibration is essential for effective pipeline coordination.

14.
arXiv (CS.AI) 2026-06-11

Carbon-Aware Governance Gates: An Architecture for Sustainable GenAI Development

arXiv:2602.19718v2 Announce Type: replace-cross Abstract: The rapid adoption of Generative AI (GenAI) in the software development life cycle (SDLC) increases computational demand, which can raise the carbon footprint of development activities. At the same time, organizations are increasingly embedding governance mechanisms into GenAI-assisted development to support trust, transparency, and accountability. However, these governance mechanisms introduce additional computational workloads, including repeated inference, regeneration cycles, and expanded validation pipelines, increasing energy use and the carbon footprint of GenAI-assisted development. This paper proposes Carbon-Aware Governance Gates (CAGG), an architectural extension that embeds carbon budgets, energy provenance, and sustainability-aware validation orchestration into human-AI governance layers. CAGG comprises three components: (i) an Energy and Carbon Provenance Ledger, (ii) a Carbon Budget Manager, and (iii) a Green Validation Orchestrator, operationalized through governance policies and reusable design patterns.

15.
arXiv (quant-ph) 2026-06-24

On the localization transition from MAA to AA models

arXiv:2606.24720v1 Announce Type: cross Abstract: Despite their potential similarity between the mosaic Aubry-André (MAA) and AA models, the MAA model allows mobility edges (MEs), whereas the AA model does not. Here we develop a new double quasiperiodic MAA (DMAA) model consisting of one primitive MAA with nonzero even-site potentials and the other modified one with both nonzero odd-site potentials and a tunable amplitude factor, to reveal how localization transitions evolve from MAA to AA models. Interplays and competitions among the extended, critical and localized states arising from superpositions of double quasi-periodic MAA potentials enable new twice and multiple localization-delocalization transitions besides the original single localization transition. Our numerical calculations on inverse participation ratio, normalized participation ratio, fractal dimension and real-space wavefunction distribution confirm such localization features. The continuum model simulations on the experimental polariton modes also yield consistent results and hence validate their experimental feasibility. The constructed DMAA model provides a new framework for studying the localization transition processes between two analogous quasiperiodic models and broadens the understanding of Anderson localization.

16.
bioRxiv (Bioinfo) 2026-06-18

Calculation of sequence space coverage in a mutagenesis library

Directed evolution requires screening of large mutagenesis libraries, but accurate calculation of library sizes needed to discover functional variants remains challenging. Existing models provide baseline estimates, yet current computational approaches for finding the best variants scale poorly with library complexity. Here, we introduce a scalable algorithmic framework to compute exact discovery probabilities in saturation mutagenesis libraries with no requirement for explicit sequence enumeration. By aggregating variants into a composition log–sum distribution and applying log-space convolution across randomisation blocks, it is possible to extend this to massive sequence spaces and mixed codon schemes. By inverting these calculations, absolute mathematical ceilings for experimental design are established. Ultimately, this framework provides a rapid, quantitative tool to balance the statistical coverage-diversity trade-off within the limitations of laboratory screening. Finally, this is implemented as an open-source web application (SSCC) that allows researchers to construct heterogeneous library designs and compute required sampling depths, coverage probabilities, and absolute randomisation limits.

17.
arXiv (CS.CV) 2026-06-17

Similarity-based representation factorization for revealing interpretable dimensions in representational data

The study of representations is widespread across fields, including neuroscience, psychology, and artificial intelligence. While representations are often studied and compared through similarities between stimuli, current methods provide only limited access to the dimensions that shape these representations and are often limited in interpretability. To overcome these challenges, here we introduce Similarity-Based Representation Factorization (SRF), a general computational method for recovering low-dimensional, non-negative, interpretable embeddings from similarity matrices derived from measured data. Across simulations and many neural, behavioral, and computational datasets, SRF recovers interpretable dimensions from diverse forms of representational data, even for very sparsely sampled, incomplete data. The dimensions derived from these datasets match those obtained by task-specific models, predict independent behavioral properties, improve exploratory analysis, and offer higher power for confirmatory hypothesis testing than comparing similarity matrices. Together, these results establish SRF as a general-purpose method with broad applications for uncovering, understanding, and using the dimensions underlying representations.

18.
arXiv (CS.AI) 2026-06-17

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

arXiv:2606.18114v1 Announce Type: cross Abstract: State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,000x. Using grouped quantization-aware training (QAT) with knowledge distillation from a frozen FP16 teacher, we compress Mamba-2 1.3B to 3.61x (2,687 to 744 MB) and achieve 48.1% zero-shot accuracy (7-task average) in just 102M tokens (4 GPU-hours, single H100) – approaching Bi-Mamba's 48.4% (within +/-0.9pp CI). This QAT-from-pretrained setting reveals zero-ratio collapse, a novel instability caused by learnable quantization scales that does not arise in from-scratch training. We further show that post-hoc correction strategies effective for Transformers fail for SSMs due to error accumulation through the recurrence. These results demonstrate that ternary SSMs do not require expensive from-scratch training: QAT from pretrained checkpoints with KD is a data-efficient alternative.

19.
arXiv (quant-ph) 2026-06-11

A Pfaffian quantum Hall state of ultracold bosons

arXiv:2606.12409v1 Announce Type: cross Abstract: Fractional quantum Hall states are a cornerstone of topological physics, hosting fractionally charged quasiparticles with exotic statistics that promise to enable topologically protected quantum information processing. Among these, the Pfaffian state introduced by Moore and Read implements a p-wave pairing structure that supports excitations with non-Abelian exchange statistics. Despite extensive study in electronic systems, direct access to its pairing structure has remained limited. Here we realize a three-particle bosonic Pfaffian state of ultracold $^{87}\mathrm{Rb}$ atoms in an optical lattice subject to a Floquet-engineered synthetic magnetic field. Using a Bayesian-optimized adiabatic protocol, we prepare a state exhibiting Pfaffian pairing correlations. Site-resolved measurements of multi-point density correlations reveal a pronounced suppression of short-range three-body coincidences, reflecting the underlying pairing structure. We further probe the state's transport response through Hall drift measurements. Our results establish a bottom-up approach to engineering non-Abelian topological order and lay the groundwork for future explorations of anyonic braiding in synthetic matter.

20.
arXiv (quant-ph) 2026-06-17

Many-body spectral transitions through the lens of the variable-range SYK2 model

arXiv:2412.14280v2 Announce Type: replace-cross Abstract: The Sachdev-Ye-Kitaev (SYK) model is a cornerstone in the study of quantum chaos and holographic quantum matter. Real-world implementations, however, deviate from the idealized all-to-all connectivity, raising questions about the robustness of its chaotic properties. In this work, we investigate a quadratic SYK model with distance-dependent interactions governed by a power-law decay. By analytically and numerically studying the spectral form factor (SFF), we uncover how transitions present in the single-particle limit carry over to the many-body system. Non-trivial cancellations in the one-loop contributions lead to a robustness of the SFF under a considerable reduction of the interaction range. Further suppression leads to a breakdown of perturbation theory around the infinite-range path-integral saddle and the appearance of new spectral regimes, marked by a higher dip and the emergence of a secondary plateau. Our results highlight the interplay between single-particle criticality and many-body dynamics, offering new insights into the quantum chaos-to-localization transition and its reflection in spectral statistics.

21.
bioRxiv (Bioinfo) 2026-06-22

Benchmarking cell type annotation in spatial transcriptomics: resolving cellular hierarchies, biological fidelity, and dynamic cell states

Spatial transcriptomics enables the quantification of gene expression within its native tissue context, providing unprecedented insight into tissue architecture, cellular ecosystems, and local cell-cell interactions at regional and single-cell resolution. Accurate cell type annotation is a critical prerequisite for interpreting these data and is often the first and most essential step in downstream analysis. Despite rapid advances in computational methods, cell type annotation remains challenging and frequently requires extensive expert-driven manual curation based on marker-gene expression, spatial context, and prior biological knowledge. While early approaches relied primarily on transcriptional similarity, newer methods increasingly incorporate spatial information, histological features, and multimodal data to improve annotation accuracy. Nevertheless, reliable annotation remains difficult when biological interpretation requires fine-grained subtype resolution, particularly for platforms with limited gene panels, tissues undergoing dynamic cellular state transitions, and studies in which reference and query datasets differ substantially in biological context or technical modality. Here, we present a systematic benchmark of 20 state-of-the-art cell type annotation methods across four spatial transcriptomics datasets spanning diverse technologies, experimental conditions, cell numbers, and gene panel sizes. Importantly, all benchmark datasets contain expert-curated cell type labels, including well-resolved cell populations and subtype annotations, providing high-quality biological ground truth for evaluation. The benchmark encompasses both reference-based and reference-free methods representing a broad range of computational frameworks. Performance was assessed using conventional classification metrics, including accuracy and F1-based measures, together with structure-aware metrics that evaluate both cell-level annotation accuracy and preservation of higher-order biological organization. Across datasets, annotation performance varied substantially according to tissue context, reference-query similarity, and annotation granularity. Fine-grained subtype annotation and recovery of rare cell populations remained challenging for many methods, particularly in datasets capturing injury, repair, developmental, and regenerative processes characterized by continuous cellular state transitions. Notably, high classification accuracy did not necessarily correspond to preservation of global cellular relationships or biologically coherent downstream pathway and gene-set enrichment analyses. Overall, scANVI, Seurat, and TACCO consistently ranked among the top-performing methods, although their relative advantages were context dependent. Together, our results provide a comprehensive assessment of current annotation strategies for spatial transcriptomics and offer practical guidance for selecting methods that best align with specific biological questions, dataset characteristics, and analytical priorities.

22.
Nature (Science) 2026-06-09

A unicellular relative links aggregative multicellularity to animal origins

Authors:

How animals evolved complex multicellularity from their unicellular ancestors remains unanswered. Unicellular relatives of animals exhibit simple multicellularity through clonal division, formation of multinucleate coenocytes, or aggregation. 1 Therefore, animal multicellularity may have evolved from one (or a combination) of these behaviours. Aggregation has classically been dismissed as a means to complex multicellularity. 2 However, aggregation occurs in many extant animal cells and has also been recently described in three close unicellular relatives of animals (the choanoflagellates Salpingoeca rosetta and Choanoeca flexa, and the filasterean Capsaspora owczarzaki). 3-5 It is unclear whether aggregation in these species is derived or ancestral, and its relevance for animal origins remains unknown. To fill this gap, we investigated whether an additional close unicellular relative of animals can undergo aggregation. We discovered that the marine free-living bacterivorous filasterean Ministeria vibrans 6 forms homogeneous aggregates with reproducible kinetics that have long-term stability, and that improved feeding and mating may be evolutionary drivers of this aggregation. Notably, we found that homologs of many animal multicellularity genes involved in cell adhesion, signalling, and transcriptional regulation were deployed during the aggregation process, indicating that they may have been used for aggregation in the unicellular ancestors of animals before being co-opted into animal multicellular development. Thus, our results imply that aggregative multicellularity was key to the development of the multicellular animal genetic toolkit.

23.
arXiv (math.PR) 2026-06-12

Branching-selection particle systems and inverse first passage problems

Authors:

arXiv:2606.13487v1 Announce Type: new Abstract: A generalised inverse first passage problem asks whether, given a probability measure $p$ on $[0,\infty]$, one can find a boundary $b:[0,\infty]\to \mathbb{R}$ such that the stopping time:\[\tau:=\inf\left\{t:\Lambda\int_0^t \omega(W_s-b(s))ds \geq U\right\}\] has distribution $p$, where $U\sim Exp(1)$, $\Lambda\in(0,\infty)$ and $\omega$ is a monotonic decreasing function. We construct a branching-selection particle system whose hydrodynamic limit is governed by a free boundary problem and connect this to the generalised inverse first passage problem. In the $N$-particle system, particles move as independent Brownian motions, branch at a prescribed rate, and are removed at a rate proportional to their location relative to a position $b^N(t)$ which is a function of the empirical distribution. We identify the limit of $b^N$ as the solution of the inverse first passage problem.

24.
arXiv (CS.AI) 2026-06-17

Timestamp-Aware Spatio-Temporal Graph Contrastive Learning for Network Intrusion Detection

arXiv:2606.17109v1 Announce Type: cross Abstract: Given their effectiveness in modeling the relational structure among network traffic flows, graph neural networks (GNNs) have been widely adopted in network intrusion detection systems (NIDSs). However, most existing GNN-based NIDS approaches focus on the relational structure of traffic flows, and treat them as temporally independent, which limits their ability to cope with evolving attack behaviors. Moreover, their reliance on supervised or semi-supervised learning often restricts generalization to unseen attacks. To address these limitations, we propose a novel self-supervised GNN-based framework. To the best of our knowledge, the proposed model is among the first self-supervised GNN-based NIDS models to explicitly leverage real timestamps, which provides faithful temporal dependencies for representation learning. We first construct a series of temporal graphs from network traffic flows according to their timestamps, and then employ an E-GraphSAGE and LSTM based encoder to fully extract temporal information and spatial dependencies of network traffic, without introducing time-costly attention mechanisms. A multi-view graph contrastive learning (GCL) scheme is introduced, where temporal, spatial, and feature contrasts are jointly performed to capture temporal continuity, preserve structural consistency, and improve the generalization and robustness of the learned representations, respectively. In addition, a gradient-norm-based adaptive weighting strategy is designed to optimize the contrastive loss weights. Experimental results on four representative NIDS datasets with real timestamps demonstrate that our method significantly outperforms existing self-supervised approaches and achieves performance comparable to the supervised state-of-the-art GNN method, while maintaining high computational efficiency.

25.
medRxiv (Medicine) 2026-06-15

Long-read sequencing enables high-accuracy mitochondrial heteroplasmy detection in Parkinson's disease

Background: Low-frequency heteroplasmic mitochondrial DNA (mtDNA) variants are associated with aging and neurological diseases, including Parkinson's disease (PD). Targeted deep mtDNA sequencing using PacBio HiFi long reads has the potential to resolve heteroplasmy across the full mitochondrial genome with high accuracy. Methods: To validate Vega PacBio sequencing for detecting mtDNA heteroplasmy, we analyzed four predefined mixtures of two mtDNA haplotypes. We generated a single long-range PCR amplicon covering the entire mitochondrial genome. These amplicons were mixed at predefined ratios (minor mixture haplotype component: 5%, 2%, 1%, and 0.1%). Variant calling was performed using Mutserve2, and accuracy was assessed by calculating the F1 score from comparisons between expected and detected variants. Full-length mtDNA PacBio sequencing was applied to investigate heteroplasmy across fibroblast passages derived from five LRRK2 p.Gly2019Ser variant carriers (n=3 affected with PD and n=2 unaffected carriers). Changes in mtDNA heteroplasmy level and variant load were assessed longitudinally using a linear mixed model. Results: The single-amplicon approach enabled full-length haplotype resolution without amplification bias associated with overlapping PCR strategies. The F1 score of the predefined mixtures was 1.0 for heteroplasmy levels between 5% and 1% and remained high (0.91) at 0.1%. We detected n=10/62 variants discordant with the Illumina reference at the 0.1% mixture, but sensitivity remained very high at 1.00 in that mixture. Detected minor variants closely matched expected heteroplasmy levels, with average variant levels of 0.057 (5%), 0.022 (2%), 0.011 (1%), and 0.001 (0.1%). Across twelve fibroblast passages, we observed fewer mtDNA heteroplasmic variants ({beta}=-3.2, p=0.026). Increased heteroplasmic variant load over time was also associated with older age ({beta}=1.50, p=0.001) and PD affection status ({beta}=5.0, p=1.0 x 10-4) in LRRK2 variant carriers. Notably, we observed distinct patterns of heteroplasmic variants that either increased or decreased in heteroplasmy level across passages. Conclusion: PacBio HiFi sequencing, combined with a single-amplicon strategy, enables accurate full-length mtDNA heteroplasmy detection and longitudinal analysis, providing a valuable tool for studying mitochondrial variation and dynamics in disease.