Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (math.PR) 2026-06-15

The 1/4-phenomenon of placement probabilities of tilings in the Aztec diamond

arXiv:2512.08377v2 Announce Type: replace-cross Abstract: We consider domino tilings of the Aztec diamond. Using the Domino Shuffling algorithm introduced by Elkies, Kuperberg, Larsen, and Propp in arXiv:math/9201305, we are able to generate domino tilings uniformly at random. In this paper, we investigate the probability of finding a domino at a specific position in such a random tiling. We prove that this placement probability is always equal to $1/4$ plus a rational function, whose shape depends on the location of the domino, multiplied by a position-independent factor that involves only the size of the diamond. This result leads to significantly more compact explicit counting formulas compared to previous findings. As a direct application, we derive explicit counting formulas for the domino tilings of Aztec diamonds with $2\times 2$-square holes at arbitrary positions.
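
A small illustration of the objects in this abstract: the sketch below enumerates every domino tiling of a small Aztec diamond by brute force (it does not implement domino shuffling) and tabulates the exact placement probability of each domino. The coordinate convention and the function names are illustrative assumptions, not taken from the paper. The only external fact used is the classical count of $2^{n(n+1)/2}$ tilings due to Elkies, Kuperberg, Larsen, and Propp.

```python
from collections import Counter
from fractions import Fraction


def aztec_cells(n):
    """Unit cells (by lower-left corner) of the Aztec diamond of order n."""
    return sorted(
        (i, j)
        for i in range(-n, n)
        for j in range(-n, n)
        if abs(i + 0.5) + abs(j + 0.5) <= n
    )


def tilings(n):
    """Yield every domino tiling as a list of (cell, cell) pairs."""
    cells = aztec_cells(n)
    cell_set = set(cells)
    covered = set()
    placed = []

    def fill(idx):
        # Skip cells that an earlier domino already covers.
        while idx < len(cells) and cells[idx] in covered:
            idx += 1
        if idx == len(cells):
            yield list(placed)
            return
        i, j = cells[idx]
        # The first uncovered cell can only pair with its right or upper
        # neighbour, because all earlier cells are already covered.
        for other in ((i + 1, j), (i, j + 1)):
            if other in cell_set and other not in covered:
                covered.update(((i, j), other))
                placed.append(((i, j), other))
                yield from fill(idx + 1)
                placed.pop()
                covered.difference_update(((i, j), other))

    yield from fill(0)


def placement_probabilities(n):
    """Return (number of tilings, {domino: exact probability})."""
    counts = Counter()
    total = 0
    for tiling in tilings(n):
        total += 1
        counts.update(tiling)
    return total, {d: Fraction(c, total) for d, c in counts.items()}


if __name__ == "__main__":
    total, probs = placement_probabilities(2)
    print("tilings of order 2:", total)
    for domino, p in sorted(probs.items()):
        print(domino, p, "offset from 1/4:", p - Fraction(1, 4))
```

Exhaustive enumeration is only practical for very small orders, since the number of tilings grows as $2^{n(n+1)/2}$; it is useful mainly as a sanity check against closed-form expressions.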

02.
arXiv (CS.LG) 2026-06-24

FuseSampleAgg: One-Pass Neighborhood Estimation for Budgeted Knowledge-Graph Refresh and Validation

arXiv:2511.13645v2 Announce Type: replace Abstract: Operational knowledge-graph (KG) pipelines in networking and cybersecurity increasingly need to refresh embeddings under strict time, memory, and audit budgets, especially as curated feeds and LLM-assisted extraction accelerate KG updates. A recurring per-step cost in mini-batch KG learning is neighborhood-context estimation: uniform neighbor sampling without replacement followed by mean aggregation. Common frameworks implement this estimator through sampled-subgraph materialization and intermediate feature gathers, adding kernel launches, allocator pressure, and transient memory spikes. We present One-Pass Neighborhood Estimation, a fused PyTorch CUDA operator that samples neighbors and directly emits the sampled-neighborhood mean, avoiding explicit block construction while preserving GraphSAGE-mean semantics for the same sampled neighbor IDs. It supports seed-controlled sampling and optional saved-index replay for reproducible validation and regression testing. Across large-graph mini-batch workloads, it improves FP32 end-to-end step latency by 2.24x-3.48x over tuned DGL baselines and reduces transient GPU memory by up to 160x in our measurements. On OGB KG completion benchmarks such as WikiKG2 and BioKG, it reduces step time and peak VRAM while matching ranking quality within seed variability, improving time-to-quality for budgeted KG refresh.
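
The estimator named in this abstract (uniform neighbor sampling without replacement followed by mean aggregation) can be sketched in plain Python. This is a CPU illustration under stated assumptions, not the fused PyTorch CUDA operator: the function name, the per-node seeding scheme, and the `replay` argument are hypothetical stand-ins for the seed-controlled sampling and saved-index replay the abstract mentions.

```python
import random


def sampled_neighbor_mean(adj, features, node, fanout, seed=0, replay=None):
    """Return (mean_vector, sampled_ids) for one seed node.

    adj      : dict mapping node id -> list of neighbor ids
    features : list of feature vectors indexed by node id
    replay   : optional saved neighbor ids to reuse instead of sampling
    """
    neighbors = adj[node]
    if replay is not None:
        picked = list(replay)  # replay previously saved indices
    else:
        # Illustrative per-node random stream derived from the global seed.
        rng = random.Random(seed * 1_000_003 + node)
        picked = rng.sample(neighbors, min(fanout, len(neighbors)))

    dim = len(features[node])
    mean = [0.0] * dim
    # Running mean: one pass over the sampled ids, with no intermediate
    # list of gathered feature rows.
    for count, nb in enumerate(picked, start=1):
        row = features[nb]
        for j in range(dim):
            mean[j] += (row[j] - mean[j]) / count
    return mean, picked


if __name__ == "__main__":
    adj = {0: [1, 2, 3], 1: [0], 2: [], 3: [0, 1]}
    features = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    mean, ids = sampled_neighbor_mean(adj, features, 0, fanout=2, seed=7)
    replayed, _ = sampled_neighbor_mean(adj, features, 0, fanout=2, replay=ids)
    print(ids, mean, replayed == mean)
```

Returning the sampled ids alongside the mean is what makes replay possible: a validation run can pass them back in and obtain the same aggregate.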

03.
bioRxiv (Bioinfo) 2026-06-19

ContinuumCellAgent: A Framework-Guided Agent for Long-Horizon Scientific Research

AI-scientist systems are beginning to automate parts of scientific research. We present ContinuumCellAgent, an autonomous agent that executes literature review, hypothesis formation, computational experimentation, manuscript drafting, and adversarial peer review as a single unattended run. Existing AI scientist systems remain difficult to diagnose because they lack modularity, systematic prompt grounding, and observability into long-running behavior. ContinuumCellAgent addresses these gaps with a modular supernode architecture for stage-wise backend swapping, protocols grounded in curated research-method checklists that also define reviewer rubrics, and a diagnostics layer that records file-based artifacts, message traces, and state transitions. We evaluate the system on open-domain QA benchmarks and biomedical/longevity case studies, showing that it can produce checkable research artifacts while exposing pipeline dynamics for rigorous AI co-scientist research.

04.
arXiv (quant-ph) 2026-06-11

The Simplified Stabilizer ZX-Calculus is Minimal

arXiv:2606.12383v1 Announce Type: new Abstract: The stabilizer fragment of the ZX calculus is amongst the most important fragments of the theory. The closely related Clifford+T fragment is approximately universal (arXiv:1705.11151). Additionally, the stabilizer calculus can be described by a small collection of rewrites, most of which have been shown to be necessary (arXiv:1709.08903). However, two rules, describing the red/green compact-structure coincidence and the important bialgebra law, had not been shown to be necessary. We present a countermodel-style argument showing that both of these rules are individually necessary relative to the connectivity meta-rule of Backens–Perdrix–Wang (arXiv:1709.08903), and hence establish that the rule set presented in arXiv:1709.08903 has no redundant rewrite rule.

05.
arXiv (CS.CL) 2026-06-24

From Task-Guided Conversational Graphs to Goal-Oriented Dialogue Runtimes

Graph and multi-agent orchestration frameworks make production large language model (LLM) workflows practical, but they do not by themselves solve conversational continuity when users maintain several interdependent objectives. This conceptual systems paper focuses on the high-complexity end of that design space, where goals can be suspended, resumed, revised, and invalidated by actions in other goals. We introduce the Goal-Oriented Dialogue Runtime (GODR), a framework-neutral design pattern that treats goals, task frames, lifecycle state, invalidation rules, and resumption contracts as first-class runtime objects while delegating bounded execution to graph runtimes, agents, tools, or application programming interfaces (APIs). GODR is not proposed as a replacement for workflow graphs in simple guided processes; it is intended for complex, multi-domain, interruptible conversations where objective continuity cannot be recovered reliably from agent identity, chat history, or execution-graph position alone. The paper formalizes the problem, proposes runtime objects and architecture-selection criteria, and frames evaluation as an agenda for future empirical validation rather than as a measured performance claim.

06.
bioRxiv (Bioinfo) 2026-06-15

SMLMFlow: Improving Structural Resolution in Single Molecule Localization Microscopy with Flow Matching

While Single Molecule Localization Microscopy (SMLM) aims to generate precise coordinates of molecular targets in cells, the resulting point clouds are inherently blurred by additive noise sources across the experimental, imaging, and processing workflow. This blurring often limits SMLM's ability to accurately quantify complex assembled structures required to address biological issues, despite reported localization precision down to a couple of nanometers. Here, we present SMLMFlow, a machine learning framework for improving structural resolution in SMLM datasets that combines a graph neural network and a hierarchical transformer with flow matching. We show that SMLMFlow improves structural resolution and downstream quantification across different structures, including filaments and protein nano-clusters, and generalizes to new unseen photophysics models.

07.
arXiv (CS.AI) 2026-06-11

When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?

arXiv:2510.02660v2 Announce Type: replace-cross Abstract: When researchers claim AI systems possess ToM or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, missing a crucial distinction between simulation and experience. While recent studies show LLMs achieving human-level performance on ToM laboratory tasks, these results are based only on behavioral mimicry. More importantly, the entire testing paradigm may be flawed in applying individual human cognitive tests to AI systems rather than assessing human cognition directly in the moment of human-AI interaction. I suggest shifting focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms, emphasizing the interaction dynamics, instead of testing AI in isolation.

08.
arXiv (CS.AI) 2026-06-12

A Three-Layer Framework for AI in Scientific Discovery

arXiv:2606.13566v1 Announce Type: new Abstract: Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper proposes a three-layer view of AI in discovery. Layer 1 is search and retrieval by large language models. Layer 2, as the main innovation of this paper, is model formation through qualitative reasoning: the capacity to recognize when a current framework is structurally inadequate and to understand the problem within a broader representational space, not through trial and error, but through structural insight into what is missing and where it can be found. Layer 3 is execution, optimization, and refinement. The main claim is that Layer 2 is both the most important and the least developed. Search without model formation remains confined to inherited frameworks, while execution without conceptual revision only amplifies an existing formulation. We illustrate Layer 2 reasoning through three case studies: S. S. Chern's intrinsic proof of the Gauss-Bonnet theorem, the resolution of the Nesterov Accelerated Gradient convergence problem via Lyapunov functions, and the autonomous disproof of the Erdos unit distance conjecture by OpenAI in 2026. Each case exhibits the same structural signature: a framework that had become inadequate, a missing conceptual object, and a resolution found in an unexpected neighboring field.

09.
arXiv (CS.LG) 2026-06-11

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

arXiv:2511.14427v4 Announce Type: replace-cross Abstract: Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise, and changes in object dynamics. Evaluations in multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach exhibits strong robustness to perturbations and achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control. Website: https://msdp-pearl.github.io/

10.
arXiv (CS.CV) 2026-06-16

V2P-Manip: Learning Dexterous Manipulation from Monocular Human Videos

Achieving autonomous robotic dexterous manipulation requires precise, human-like action sequences at scale. As a scalable supplement to costly teleoperation data, extracting trajectories with both visual fidelity and physical plausibility from monocular videos represents a promising frontier in embodied AI. To this end, we introduce V2P-Manip, an efficient framework designed to learn dexterous manipulation policies directly from human demonstration videos. We establish an efficient, integrated pipeline encompassing 3D asset acquisition, trajectory estimation, and dexterous policy learning. To bridge the gap between visual perception and physical constraints, we introduce a two-stage refinement process to enforce spatial alignment and physical consistency. Evaluations on the TACO and OakInk benchmarks demonstrate that our approach significantly outperforms previous methods in pose accuracy, adaptability to unstructured environments, and training efficiency. Ultimately, experimental results confirm an average success rate of over 75% across multiple synthetic manipulation tasks and validate the adaptability of the extracted manipulation priors across diverse dexterous hand embodiments.

11.
arXiv (CS.CV) 2026-06-18

Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

With the growing volume of CT examinations, there is an increasing demand for automated tools such as organ segmentation, abnormality detection, and report generation to support radiologists in managing their clinical workload. Multi-label classification of 3D Chest CT scans remains a critical yet challenging problem due to the complex spatial relationships inherent in volumetric data and the wide variability of abnormalities. Existing methods based on 3D convolutional neural networks struggle to capture long-range dependencies, while Vision Transformers often require extensive pre-training on large-scale, domain-specific datasets to perform competitively. In this work, we propose a 2.5D alternative by introducing a new graph-based framework that represents 3D CT volumes as structured graphs, where axial slice triplets serve as nodes processed through spectral graph convolution, enabling the model to reason over inter-slice dependencies while maintaining complexity compatible with clinical deployment. Our method, trained and evaluated on 3 datasets from independent institutions, achieves strong cross-dataset generalization, and shows competitive performance compared to state-of-the-art visual encoders. We further conduct comprehensive ablation studies to evaluate the impact of various aggregation strategies, edge-weighting schemes, and graph connectivity patterns. Additionally, we demonstrate the broader applicability of our approach through transfer experiments on automated radiology report generation and abdominal CT data.

12.
arXiv (CS.CV) 2026-06-12

VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate 3D pose sequences that serve as a noisy teacher: these are diffused, denoised by the model in 3D, and supervised in 2D by reprojecting the prediction and comparing against accurate keypoints. We show that, under mild assumptions, a depth-weighted 2D reprojection loss is equivalent in expectation to direct 3D supervision, and we adapt standard 3D motion regularizers - velocity consistency and over-parameterized representation alignment - to this 2D setting. Unlike methods that lift 2D to 3D only at inference, VideoMDM learns a coherent 3D motion manifold during training. On HumanML3D it nearly closes the gap to fully 3D-supervised MDM (FID 0.88 vs 0.54); on the real video datasets Fit3D and NBA, the method learns to generate motions consistently preferred by humans, with strong quantitative results.

13.
medRxiv (Medicine) 2026-06-15

Primary care practitioners' preconception health literacy and information-seeking: A cross-sectional survey.

Background: Parental health before pregnancy influences maternal and child outcomes. Primary care professionals, including general practitioners (GPs), midwives, and naturopaths, can provide preconception care, yet many report limited knowledge and difficulty accessing relevant information. This study described the preconception health literacy of Australian GPs, midwives, and naturopaths, including knowledge and ability to access information. Methods: Between July and September 2022, Australian GPs, midwives, and naturopaths completed a 32-item online cross-sectional survey. Participants were recruited through professional associations, and data were analysed using descriptive and inferential statistics. Results: Participants (N=373) included naturopaths (40.7%), GPs (32.4%), and midwives (26.8%). Reported barriers to clinician health literacy included lack of preconception care resources (25.5%) and limited clinician knowledge (23.6%). The proportion identifying limited clinician knowledge differed significantly between professions (GP: 31.4%; midwives: 23.0%; naturopaths: 17.8%; p=0.030). The highest level of accurate knowledge regarding preconception exposures was for pre-pregnancy obesity (82.7%), while low birth weight was the most accurately identified preconception outcome (83.7%). Incorrect responses were most common for maternal multivitamin use as an exposure (28.3%) and childhood leukaemia as an outcome (26.3%). Differences between professions were strongest for infant outcomes, with moderate associations observed for shoulder dystocia (V=.2355), precipitous labour (V=.2173), macrosomia (V=.2060), labour dystocia (V=.2018) and cryptorchidism (V=.2018). Discussion: Preconception health literacy varies across primary care professions. Clinicians require greater access to targeted resources and education tailored to their differing scopes of practice and experience. Improving clinician preconception health literacy may strengthen consistent evidence-based care and support better maternal, child, and long-term family health outcomes.
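
The effect sizes reported as V in this abstract are presumably Cramér's V, although the abstract does not name the statistic; that reading is an assumption. Under it, V is computed from a profession-by-response contingency table as $V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}$. The sketch below uses a made-up table purely for illustration; none of its counts come from the study.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Hypothetical counts: rows are professions, columns are
    # (correct, incorrect) responses to one knowledge item.
    hypothetical = [[80, 40], [70, 30], [90, 60]]
    print(round(cramers_v(hypothetical), 4))
```

V ranges from 0 (no association) to 1 (perfect association), which is why it is convenient for comparing items with different table shapes.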

14.
medRxiv (Medicine) 2026-06-22

How knowledge shapes community stigma and social support for women seeking abortion in the Democratic Republic of Congo: A cross-sectional study.

Background: The Democratic Republic of Congo (DRC) bears one of the highest maternal mortality ratios globally (746 per 100,000 live births), with nearly 11% of deaths attributable to complications of unsafe abortion. Despite ratification of the Maputo Protocol and related national policies, access to safe abortion remains limited, largely due to entrenched stigma. Social support, encompassing emotional, informational, and instrumental assistance, is critical in shaping women's abortion-seeking behaviors and health outcomes. This study examines the influence of community-level knowledge on stigma and social support for women seeking abortion care. Methods: A cross-sectional survey was conducted from May 2024 to June 2024 among 1,715 adults in Kinshasa and North Kivu provinces. Analyses focused on a sub-sample of 574 respondents reporting familiarity with women who had undergone abortion. Structural Equation Modeling (SEM) was applied to estimate direct and indirect pathways linking community knowledge, stigma, and social support. Results: Two core knowledge indicators, recognition of abortion as a safe medical procedure and awareness of legal conditions for access, were significantly associated with outcomes. A one-unit increase in knowledge corresponded to a 0.39-point increase in social support and a 0.19-point reduction in stigma. Enhanced knowledge promoted empathetic attitudes, reinforced practical support, and mitigated moralizing judgments toward women seeking abortion. Conclusions: Strengthening community knowledge emerges as a strategic lever to reduce abortion-related stigma and enhance social support in the DRC. These findings underscore the importance of integrating stigma-reduction and knowledge-enhancement interventions into reproductive health programs to improve women's access to safe and dignified abortion care.

15.
arXiv (CS.LG) 2026-06-12

Robust State-Conditional Feature-Weighted Jump Models for Temporal Clustering

arXiv:2606.13146v1 Announce Type: cross Abstract: We propose a robust feature-weighted jump model for time-dependent clustering. A penalty is used to encourage smoothness of transitions over time, while robustness is achieved through the use of Tukey's biweight loss function. An additional parameter controls the variability of feature weights across states, allowing the model to assign state-specific relevance to each feature. We illustrate in simulation how the method accurately recovers the true cluster sequence and reliably identifies relevant features, outperforming competing approaches, particularly in the presence of outliers. We conclude with two empirical applications, one on the number of conflict-related homicides in Kosovo in the period 1998-2000, and another on macroeconomic performance of twelve European countries in the period 1949-2024.
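
For readers unfamiliar with the robust loss named in this abstract, the sketch below implements the standard textbook form of Tukey's biweight: it behaves like a squared loss for small residuals and saturates at $c^2/6$ once $|r| \ge c$, so outliers stop influencing the fit. The tuning constant `c = 4.685` is the conventional default for unit-scale residuals and is an assumption here; the abstract does not say which parametrization or constant the paper uses.

```python
def tukey_biweight(residual, c=4.685):
    """Tukey's biweight (bisquare) loss for a single residual.

    rho(r) = (c^2 / 6) * (1 - (1 - (r / c)^2)^3)   if |r| < c
           = c^2 / 6                               otherwise
    """
    if c <= 0:
        raise ValueError("c must be positive")
    cap = c * c / 6.0
    u = residual / c
    if abs(u) >= 1.0:
        return cap  # outliers contribute a constant, bounded loss
    return cap * (1.0 - (1.0 - u * u) ** 3)


if __name__ == "__main__":
    for r in (0.0, 0.5, 2.0, 5.0, 50.0):
        print(r, tukey_biweight(r), 0.5 * r * r)  # compare with squared loss
```

Because the loss is bounded, a single extreme observation can add at most $c^2/6$ to a cluster's cost, whereas a squared loss grows without limit.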

16.
arXiv (quant-ph) 2026-06-12

The Pound-Drever-Hall Method for Superconducting-Qubit Readout

arXiv:2512.03138v3 Announce Type: replace Abstract: Scaling quantum computers to large sizes requires the implementation of many parallel qubit readouts. Here we present an ultrastable superconducting-qubit readout method using the multi-tone self-phase-referenced Pound-Drever-Hall (PDH) technique, originally developed for use with optical cavities. In this work, we benchmark PDH readout of a single transmon qubit, using room-temperature heterodyne detection of all tones to reconstruct the PDH signal. We demonstrate that PDH qubit readout is insensitive to microwave phase drift, displaying $0.73^\circ$ phase stability over 2 hours, and capable of single-shot readout in the presence of phase errors exceeding the phase shift induced by the qubit state. We show that the PDH sideband tones do not cause unwanted measurement-induced state transitions for a transmon qubit, leading to a potential signal enhancement of at least $14$~dB.

17.
arXiv (CS.AI) 2026-06-24

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

arXiv:2606.24506v1 Announce Type: cross Abstract: Emerging LLM services increasingly host many sparse MoE models, yet most models receive sparse requests and remain cold. This creates a GPU memory problem: model weights are stable and model-determined, while KV-cache is transient and demand-determined. Because cold models rarely reach peak KV-cache demand at the same time, reserving worst-case KV capacity per model wastes memory; a shared KV-cache pool can instead provision aggregate active demand. However, KV-cache sharing is not sufficient when weights and KV-cache remain in a monolithic GPU memory pool. Static weights compete with dynamic KV-cache, and KV-head-limited attention under cold, low-concurrency traffic exposes only a fraction of replicated KV capacity, leading to low GPU memory utilization and weak long-context support. We present CrossPool, a serving engine for cold MoE models that separates FFN weights and KV-cache into two GPU memory pools: a weights pool that consolidates FFN weights across cold models, and a KV-cache pool that dynamically serves active requests while keeping attention local to KV-cache. CrossPool combines a KV-cache planner and virtualizer, a layer-wise pipeline scheduler that hides hidden-state transfers, and persistent kernels with control lowering to reduce CPU-GPU control overhead. With efficient GPU memory pooling, CrossPool underpins bursty long-context requests and outperforms the state-of-the-art kvcached-based multi-LLM serving system, reducing P99 TBT by up to $10.4\times$.
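
The provisioning argument in this abstract (reserving worst-case KV capacity per model versus sizing a shared pool for aggregate demand) comes down to comparing a sum of peaks with a peak of sums. The sketch below illustrates only that arithmetic; it is not CrossPool's planner, and the model names and demand traces are hypothetical.

```python
def per_model_reservation(traces):
    """Capacity needed if every model reserves its own worst-case demand."""
    return sum(max(trace) for trace in traces.values())


def pooled_reservation(traces):
    """Capacity needed if one shared pool covers the peak aggregate demand.

    Assumes all traces cover the same time steps (equal lengths).
    """
    return max(sum(step) for step in zip(*traces.values()))


if __name__ == "__main__":
    # Hypothetical KV-cache demand (GB) per time step for three cold models.
    traces = {
        "model_a": [8, 1, 0, 0],
        "model_b": [0, 7, 1, 0],
        "model_c": [1, 0, 0, 9],
    }
    print("per-model:", per_model_reservation(traces))
    print("pooled:   ", pooled_reservation(traces))
```

The pooled figure can never exceed the per-model figure, and the gap widens as the models' peaks become less correlated in time.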

18.
medRxiv (Medicine) 2026-06-23

Oxidative Stress Biomarker Profile Dynamics across Blood and Cerebrospinal Fluid

Peripheral blood measurements dominate oxidative stress research, yet whether they reflect central nervous system (CNS) redox status remains untested in humans. We simultaneously profiled five biomarkers, total antioxidant capacity (TAC), glutathione (GSH), thiobarbituric acid-reactive substances (TBARS), ferric reducing antioxidant power (FRAP), and hydroxyl radical scavenging activity (HRSA), in paired blood and cerebrospinal fluid (CSF) from 140 adults in the ALBION cohort. Only FRAP showed a significant positive cross-compartment correlation (ρ = +0.49, FDR-p < 0.001), supporting its role as a systemic antioxidant signal. TBARS showed a significant inverse cross-compartment association (ρ = -0.20, FDR-p = 0.042), suggesting compartmental compensation in lipid peroxidation regulation rather than parallel dynamics. TAC and GSH showed no meaningful intercompartmental alignment. Individual biomarker levels were largely stable across the 40-85 year age range in both compartments, suggesting that age effects operate through coordinated latent networks rather than single-marker trajectories. Principal component extraction with varimax rotation identified four latent factors explaining 66.6% of total variance, dominated by a coherent CSF-centred redox axis alongside multiple partially opposing peripheral components. Age stratification revealed progressive fragmentation: middle-aged adults retained four coherent cross-compartment factors, whereas older adults exhibited five more dispersed components. Sex-stratified analyses showed that females exhibited four-factor modular organisation centred on glutathione, while males showed a simpler three-factor structure with tighter cross-compartment coupling anchored by FRAP. Blood and CSF oxidative stress biomarkers are not interchangeable, a finding with direct implications for biomarker selection in clinical trials targeting neurological conditions.
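
The cross-compartment correlations above are reported as a coefficient written rho, which usually denotes Spearman's rank correlation; the abstract does not name the coefficient, so that reading is an assumption. Under it, a paired blood/CSF comparison for one biomarker can be sketched as below. The sample values are hypothetical, and the false-discovery-rate correction mentioned in the abstract is not shown.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Assumes equal-length inputs that are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical paired measurements of one biomarker per participant.
    blood = [1.2, 0.8, 1.9, 1.4, 1.1, 2.3]
    csf = [0.30, 0.35, 0.41, 0.28, 0.33, 0.45]
    print(round(spearman_rho(blood, csf), 3))
```

Working on ranks makes the coefficient insensitive to the different concentration scales of blood and CSF, which is one reason rank correlation suits paired-compartment data.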

19.
arXiv (CS.AI) 2026-06-12

Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests

arXiv:2606.13449v1 Announce Type: cross Abstract: AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create instruction files that guide the AI-agents, including how to navigate the project, locate the right components, run tests, respect best practices, and more. In this paper, we investigate the relationship between the creation of these instructions and the performance of AI-agents in creating better pull requests, which have a higher chance of success (i.e., the merge rate), address more complex tasks (e.g., code churn), and require less effort to be merged (e.g., time to merge). To this end, we analyze 15,549 agentic PRs from 148 projects in the AIDev dataset. Using the three dimensions, we compare each project before and after the creation of the instruction files. We find that specifying instructions for AI-agents does not necessarily lead to better results. With the instruction files, 27.7\% of the projects increased their merge rate by at least 20\%, while 26.35\% decreased it. The same observation is seen with the amount of changes (e.g., code churn, number of modified files) and with the efforts to merge an agentic PR (e.g., merge time and number of comments). From a first exploration, we find that projects that managed to increase their merge rate have substantially longer instruction files, which are also well structured into a higher number of sections and sub-sections. Our results motivate the need for research to assist practitioners in framing the development of instruction files as a software engineering activity (aka, Instructions-as-Code).

20.
medRxiv (Medicine) 2026-06-19

Hyperleukocytosis and outcomes in pediatric B-cell acute lymphoblastic leukemia: A report from the REDIAL Consortium

Hyperleukocytosis (white blood cell [WBC] count >100 000/uL) at diagnosis is an important prognostic risk factor in pediatric acute lymphoblastic leukemia (ALL), though its significance with contemporary therapy is unclear. We analyzed 1 826 pediatric ALL patients from a multi-institution cohort to determine whether hyperleukocytosis independently predicts outcomes using multivariable Cox proportional hazard modeling. Hyperleukocytosis occurred in 211 patients (12%), with 121 having B-ALL, and showed no prognostic significance in T-ALL patients. In B-ALL, 5-year event-free survival (EFS) was 65% versus 89% for non-hyperleukocytosis patients, and overall survival (OS) was 78% versus 93%. After adjustment for age, cytogenetic risk, central nervous system disease status, and treatment site, hyperleukocytosis remained an independent predictor of end-of-induction minimal residual disease (MRD) positivity (odds ratio 2.53 [95% confidence interval [CI]: 1.71-3.94; p

21.
arXiv (CS.LG) 2026-06-16

Learning Topological Representations for Molecular Dynamics

arXiv:2606.14737v1 Announce Type: cross Abstract: Molecular dynamics (MD) simulations generate trajectories in a high-dimensional configuration space whose analysis critically depends on molecular descriptors, typically handcrafted observables or learned kinetic embeddings. Designing descriptors that are both expressive and broadly applicable, however, remains challenging. We study persistent homology (PH) as a general-purpose representation for MD and introduce the masked Flood complex, a protein-tailored modification of a recently introduced simplicial complex construction that emphasizes inter-residue structure at low computational cost. Vectorized persistence diagrams then provide information-rich, geometry-aware summaries of protein conformations, which we evaluate on protein class prediction, frame-level observable regression, and Markov state model (MSM) estimation from learned low-dimensional coordinates in a single shared representation space. Results on the mdCATH dataset show that PH-based descriptors are competitive across tasks, with masked Flood PH yielding the most consistent overall performance. Further, when using topologically-informed MSMs as a drop-in replacement within the recent MarS-FM framework for generative modeling of protein conformations, we obtain consistently better ensemble statistics than MSMs based on physical observables. Finally, we explore the transferability of the generative model to qualitatively different, fast-folding proteins.

22.
arXiv (CS.AI) 2026-06-12

Echo2ECG: Enhancing ECG Representations with Cardiac Morphology from Multi-View Echos

arXiv:2603.08505v2 Announce Type: replace-cross Abstract: Electrocardiography (ECG) is a low-cost, widely used modality for diagnosing electrical abnormalities like atrial fibrillation by capturing the heart's electrical activity. However, it cannot directly measure cardiac morphological phenotypes, such as left ventricular ejection fraction (LVEF), which typically require echocardiography (Echo). Predicting these phenotypes from ECG would enable early, accessible health screening. Existing self-supervised methods suffer from a representational mismatch by aligning ECGs to single-view Echos, which only capture local, spatially restricted anatomical snapshots. To address this, we propose Echo2ECG, a multimodal self-supervised learning framework that enriches ECG representations with the heart's morphological structure captured in multi-view Echos. We evaluate Echo2ECG as an ECG feature extractor on two clinically relevant tasks that fundamentally require morphological information: (1) classification of structural cardiac phenotypes across three datasets, and (2) retrieval of Echo studies with similar morphological characteristics using ECG queries. Our extracted ECG representations consistently outperform those of state-of-the-art unimodal and multimodal baselines across both tasks, despite being 18x smaller than the largest baseline. These results demonstrate that Echo2ECG is a robust, powerful ECG feature extractor. Our code is accessible at https://github.com/michelleespranita/Echo2ECG.

23.
arXiv (CS.CV) 2026-06-16

Are Neuro-Inspired Multi-Modal Vision-Language Models Resilient to Membership Inference Privacy Leakage?

In the age of agentic AI, the growing deployment of multi-modal models (MMs) has introduced new attack vectors that can leak sensitive training data. This paper investigates a black-box privacy attack, namely the membership inference attack (MIA), on multi-modal vision-language models (VLMs). State-of-the-art research analyzes privacy attacks primarily on unimodal AI-ML systems, while recent studies indicate that MMs can also be vulnerable to privacy attacks. Although researchers have demonstrated that biologically inspired neural network representations can improve unimodal model resilience against adversarial attacks, it remains unexplored whether neuro-inspired MMs are resilient against privacy attacks. In this work, we introduce a systematic neuroscience-inspired topological regularization (tau) framework to analyze the resilience of multi-modal VLMs against image-text-based inference privacy attacks. We examine this phenomenon using three VLMs: BLIP, PaliGemma 2, and ViT-GPT2, across three benchmark datasets: COCO, CC3M, and NoCaps. Our experiments compare the resilience of baseline and neuro VLMs (with topological regularization), where the tau > 0 configuration defines the NEURO variant of a VLM. Our results on the BLIP model using the COCO dataset illustrate that MIA success in NEURO VLMs drops by 24% mean ROC-AUC, while achieving similar model utility (similarity between generated and reference captions) in terms of MPNet and ROUGE-2 metrics. This shows that neuro VLMs are comparatively more resilient against privacy attacks without significantly compromising model utility. Our extensive evaluation with the PaliGemma 2 and ViT-GPT2 models on two additional datasets, CC3M and NoCaps, further validates the consistency of the findings. This work contributes to the growing understanding of privacy risks in MMs and provides evidence on the resilience of neuro VLMs to privacy threats.

24.
arXiv (CS.AI) 2026-06-16

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

arXiv:2602.07883v4 Announce Type: replace Abstract: LLM-powered agentic systems excel at complex long-horizon tasks, but remain constrained by static configurations fixed before execution. Such rigidity forces a trade-off between domain-specific performance and cross-task generalization: strong priors and compact tool spaces aid specialization but weaken transfer, while task-agnostic workflows and broad action spaces expand coverage but dilute guidance. Existing pre-execution optimization, planner-worker orchestration, and configuration patching fall short of resolving this tension, as they decouple adaptation from execution, causing information loss, fragmented optimization, and ambiguous credit assignment. We propose ToolSelf, a tool-driven runtime self-reconfiguration paradigm that abstracts configuration updates as a standardized tool interface and unifies execution and adaptation within one policy's action space. The execution agent can dynamically update sub-goals, strategies, toolboxes, context, and context-management modes based on task progress and feedback. We further introduce Configuration-Aware Two-stage Training (CAT), which combines rejection sampling fine-tuning with trajectory-level KTO reinforcement learning to internalize self-reconfiguration. Across diverse benchmarks, zero-shot ToolSelf rivals task-specialized agents; after CAT training, ToolSelf gains 28.8 points over the static-configuration baseline on average, illuminating a path toward emergent adaptivity that obviates manually injected guidance. The code is available at https://github.com/lian-tian-mo-zun/ToolSelf.

25.
arXiv (CS.CL) 2026-06-24

Matching Tasks to Objectives: Fine-Tuning and Prompt-Tuning Strategies for Encoder-Decoder Pre-trained Language Models

Prompt-based learning has emerged as a dominant paradigm in natural language processing. This study explores the impact of diverse pre-training objectives on the performance of encoder-decoder pre-trained language models across generation and question answering tasks, with a focus on commonsense knowledge retrieval and completion. We highlight the benefits of incorporating multiple objectives during both pre-training and fine-tuning stages. We introduce the Match Task to Objective (MTO) framework and methods for determining the appropriate objective for a given task. This framework offers automated methods to prepare task-related data for adaptation through unsupervised training, based on the identified objective. In the fine-tuning stage, we design novel templates that align with the objectives of the pre-training and adaptation stages. When aligned with task requirements, these strategies can achieve a performance gain of over 120% compared to conventional methods in few-shot settings. They significantly outperform related works in few-shot settings and exceed the baseline even in full-dataset scenarios. Furthermore, we extend this approach to include prompt-tuning methodologies, providing guidance for more effective soft prompt engineering and optimization. Our strategies significantly enhance prompt-tuning performance as well. These insights hold substantial value, precisely guiding the selection and optimization of models customized for specific tasks. Code is available at https://github.com/puraminy/MTO/