Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-11

Automating Geometry-Intensive Compliance Checking in BIM: Graph-Based Semantic Reasoning Framework

arXiv:2606.12065v1 Announce Type: new Abstract: Automating compliance check for geometry-intensive regulations remains a significant technical bottleneck in Building Information Modeling (BIM), primarily due to the semantic disparity between high-level regulatory logic and structured IFC data. Existing methods, often reliant on static rule templates, struggle to traverse multi-hop reasoning chains or resolve latent spatial dependencies across multiple building entities. To address these challenges, a Spatial-Geometric Reasoning System for Building Information Modeling (SGR-BIM) is proposed as an integrative graph-driven reasoning framework. SGR-BIM dynamically constructs a cross-modal knowledge graph that aligns user intent, regulatory semantics, and BIM geometry, enabling interpretable reasoning without rigid hard-coding. Validated on 679 expert-verified queries from fire safety codes, the framework achieves 84.3% accuracy, representing an 8.6% improvement over enhanced-tool single-agent baselines. This research provides a graph-based semantic reasoning paradigm, enhancing the transparency and flexibility of automated geometric compliance check workflows in the Architecture, Engineering, and Construction (AEC) industry.

02.
arXiv (CS.CV) 2026-06-12

Context-Aware Feature-Fusion for Co-occurring Object Detection in Autonomous Driving

Object detection in autonomous driving requires precise localization and an inherent understanding of the relational context between co-occurring objects. In extremely complex heterogeneous environments rare classes, small-scale objects, and frequently appearing objects are difficult for standard object detection frameworks to handle. In this paper, we propose a novel framework called Context-Centric Feature Fusion (CCFF), which utilizes two attention-based modules, Local Context Fusion Module (LCFM) uses the RoI-to-RoI self-attention mechanism to resolve spatial interactions, mainly considering small and partially obscured objects, while Global Context Attention Module (GCAM) converts the co-occurrence of objects priors by pooling top-K RoI features into a global context attention token, avoiding the computational overhead of pixel-level global pooling. This fusion of local and object-centric global features yields contextualized embeddings that enhance classification results and co-occurring objects detection. Our method is evaluated on two datasets, Cityscapes and BDD100K which demonstrate significant improvement on relational consistency, achieving a Category-level Consistency Strategy (CCS) of 0.973 and 0.969, respectively. Furthermore, our approach produces substantial gains in small object detection (AP_S: 14.1%) and successfully recovers rare classes such as "Train" that are typically lost in large distributions. Our efficiency report shows that the framework processes images in real time with a 0.2 FPS overhead. The code is available at https://github.com/BinayKSingh/CCFF.

03.
arXiv (CS.AI) 2026-06-11

LLMs+Graphs: Toward Graph-Native, Synergistic AI Systems

arXiv:2606.11560v1 Announce Type: cross Abstract: Large Language Models (LLMs) have advanced rapidly, but their limitations in structured and multi-hop reasoning underscore the need for graph-native, synergistic artificial intelligence (AI) systems. Graph-structured data underpins critical applications across social, biological, financial, transportation, web, and knowledge domains, making it essential to understand how LLMs can leverage graph computation for grounded, context-rich inference. Three complementary synergies are emerging: LLMs augmented with graph computation for retrieval and reasoning; bidirectional integration between LLMs and knowledge graphs (KGs), where LLMs support KG construction and curation while KGs enforce semantic constraints and factual consistency; and AI agents strengthened by graph algorithms for planning, decision making, and multi-step reasoning. In parallel, LLMs introduce new capabilities for graph data management and graph machine learning (ML) through natural language interfaces and hybrid LLM-graph neural network (GNN) pipelines. This tutorial synthesizes the algorithms, systems, and design principles driving these converging directions, offering data science and data mining researchers a unified perspective on integrating LLMs, graph data management, graph mining, graph ML, and agentic computation into next-generation graph-native AI systems.

04.
arXiv (CS.AI) 2026-06-16

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

arXiv:2603.02668v2 Announce Type: replace Abstract: We present SorryDB, a dynamically-updating benchmark of open Lean tasks drawn from 78 real world formalization projects on GitHub. Unlike existing static benchmarks, often composed of competition problems, hillclimbing the SorryDB benchmark will yield tools that are aligned to the community needs, more usable by mathematicians, and more capable of understanding complex dependencies. Moreover, by providing a continuously updated stream of tasks, SorryDB mitigates test-set contamination and offers a robust metric for an agent's ability to contribute to novel formal mathematics projects. We evaluate a collection of approaches, including generalist large language models, agentic approaches, and specialized symbolic provers, over a selected snapshot of 1000 tasks from SorryDB. We show that current approaches are complementary: even though an agentic approach based on Gemini Flash is the most performant, it is not strictly better than other off-the-shelf large-language models, specialized provers, or even a curated list of Lean tactics.

05.
arXiv (CS.CL) 2026-06-18

Continuous Audio Thinking for Large Audio Language Models

Large audio language models (LALMs) have shown impressive capabilities on diverse audio understanding tasks, ranging from speech transcription to music analysis. However, because LALMs are typically trained to produce text-aligned responses, their hidden states are progressively shaped for text generation rather than for preserving acoustic information. As a result, the diverse acoustic content that audio carries, such as phonetic detail, prosody, sound events, affect, and pitch, is lost along the way and difficult to leverage in the response. We introduce Continuous Audio Thinking (CoAT), a framework that equips audio language models with a continuous latent workspace for organizing acoustic information prior to response generation, grounded by distillation from audio experts. Within the thinking space, the model can utilize the rich acoustic information provided by expert distillation when generating its response. Furthermore, the proposed continuous thinking block can be processed in a single prefill, so CoAT does not require additional autoregressive decoding cost over the baseline. Across three LALMs, Qwen2-Audio, Qwen2.5-Omni-7B, and Audio Flamingo~3, performance gains on a broad benchmark suite spanning audio reasoning, audio understanding, music classification, speech emotion, and speech transcription demonstrate the effectiveness of CoAT. Further analysis confirms that the auxiliary supervision propagates from the thinking positions to the model's textual responses.

06.
arXiv (CS.CL) 2026-06-19

Large Language Models Hack Rewards, and Society

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.=

07.
PLOS Medicine 2026-05-14

Antibody fine specificity correlates with protection from malaria for the RTS,S vaccine in young African children: A post hoc analysis of a phase IIb randomised controlled trial

Authors:

by Alessia Hysa, D. Herbert Opi, Joshua Waterhouse, Sandra Chishimba, Jessica L. Horton, Natalie Kingston, Hans J. Netter, David Wetzel, Michael Piontek, Gaoqian Feng, Jahit Sacarlal, Carlota Dobaño, Liriye Kurtovic, James G. Beeson Background The RTS,S/AS01 malaria vaccine was recently approved for implementation in children, but only provides modest and short-lived efficacy against malaria. RTS,S targets a portion of the Plasmodium falciparum (Pf) circumsporozoite protein (CSP), comprising the central NANP-repeat region and C-terminal domain. Mechanisms of immunity and correlates of protection for the RTS,S vaccine are not well defined, hindering progress towards generating highly effective CSP-based vaccines. Methods and findings We investigated epitope specificity and cross-reactivity of vaccine-induced antibodies to six peptides representing CSP epitopes in the N-terminal and central NANP-repeat region. We evaluated antibody reactivity in preclinical mouse vaccine studies, among CSP-specific monoclonal antibodies (mAbs), and in a large RTS,S phase IIb clinical trial in young children 1–4 years old (n = 735).The preclinical mouse vaccine studies and CSP-specific mAbs were used to initially evaluate IgG responses to the six peptides. Mice immunised with the central NANP-repeat region had IgG with cross-reactivity to an epitope in the N-terminal region. Additionally, we demonstrated that a single CSP-specific mAb could display cross-reactivity to several CSP epitopes. Through post hoc quantification and analysis of antibody responses in the RTS,S phase IIb clinical trial, we found that a subset of children generated IgG with specificity for a short NANP-repeat epitope (NANP2; amino acid sequence: NANPNANP) and cross-reactivity to an N-terminal epitope (J1; amino acid sequence: KQPADGNPDPNANPN). Notably, children with high IgG responses to NANP2 and J1 had a significantly reduced risk of clinical malaria, compared to children with low responses (IgG to NANP2 (aHR: 0.838 (95% CI [0.716, 0.981]; p = 0.028)) and J1 (aHR: 0.718 (95% CI [0.611, 0.844]; p 

08.
arXiv (CS.CV) 2026-06-17

Mordal: Automated Pretrained Model Selection for Vision Language Models

Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models. We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using $8.9\times$–$11.6\times$ lower GPU hours than grid search. We have also discovered that Mordal achieves about 69\% higher weighted Kendall's $\tau$ on average than the state-of-the-art model selection method across diverse tasks.

09.
arXiv (math.PR) 2026-06-18

Stable size-biasing and the positive scale-mixture order of generalized Gaussian laws

arXiv:2606.18458v1 Announce Type: new Abstract: Let $X_r\sim N_r(0,1)$ be the centered unit-scale generalized Gaussian random variable with density proportional to $\exp(-|x|^r/2)$. We prove that, for $p,q>0$, there exists a strictly positive random variable $V$, independent of $X_q$, such that $X_p\stackrel{d}{=}VX_q$ if and only if $p\le q$. Moreover, the law of $V$ is unique. For $pq$, the required Mellin quotient, viewed as the candidate characteristic function of $\log V$, is unbounded by Stirling's formula, and hence cannot be a characteristic function. The factor laws form a multiplicative cocycle, $V_{p,r}\stackrel{d}{=}V_{p,q}V_{q,r}$, for $p\le q\le r$, where the factors on the right-hand side are independent copies. Thus the Mellin quotient isolated by Dytso, Bustin, Poor and Shamai is realized constructively throughout the $p

10.
arXiv (CS.AI) 2026-06-17

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv:2604.18701v3 Announce Type: replace-cross Abstract: Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; since the critic only has to learn how hard a transition is to predict, its estimate of the irreducible noise floor converges well before the world model saturates, redirecting exploration toward learnable transitions. The reward is higher for learnable transitions and collapses toward zero for stochastic ones, thereby separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this error baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error, visitation-count, and Random Network Distillation methods in training speed and final world model accuracy.

11.
arXiv (CS.LG) 2026-06-15

Anytime-Valid Confirmation of Label-Shift Corrections

Authors:

arXiv:2606.14028v1 Announce Type: cross Abstract: In small-batch scientific deployments, labeled target outcomes may be too scarce for reliable shift estimation even when unlabeled target inputs are available. We address the complementary setting where the practitioner has a pre-specified label-shift correction from domain knowledge and asks whether incoming labeled outcomes support it. We show that the per-observation likelihood ratio between a label-shift-corrected predictive and the source predictive is a conditional e-value, so its running product is a nonnegative martingale and Ville's inequality yields an anytime-valid confirmation rule. The log martingale equals the cumulative negative log-predictive density (NLPD) gap between the source and the corrected predictive, converting routine model monitoring into a formal sequential test. Rejection means the incoming data support the posited correction relative to the source predictive, but it is not a precise estimate of the degree of shift. Closed forms are available for GP sources with Gaussian label-shift ratios. GP regression simulations validate Type I control, finite-sample power, miscalibration sensitivity, and the small-batch advantage of a reliable prior over label-based re-estimation.

12.
arXiv (quant-ph) 2026-06-16

Scalable generation of heralded single photons via active feed-forward switching of a fiber delay line

arXiv:2606.16741v1 Announce Type: new Abstract: Quasi-deterministic single-photon generation is a key requirement for many photonic quantum technologies. Photon sources based on spontaneous parametric down-conversion (SPDC) are widely used for producing high-quality photons; however, the probabilistic nature of the process limits the generation of synchronized multi-photon states. Here, we demonstrate temporal synchronization of multiple photon-generation events using a free-space-fiber hybrid delay line with feed-forward control, enabling fast and efficient switching and scalable operation. Narrow-band, telecom-wavelength photons compatible for fiber transmission are heralded from a monolithic cavity SPDC source and synchronized across 20 time bins. This yields a sixfold enhancement in synchronized rates and enables multi-photon synchronization, with only a marginal increase of higher-order photon-number contributions.

13.
bioRxiv (Bioinfo) 2026-06-11

Amylo-Pipe: an integrated web server for mechanistic and kinetic prediction of protein and peptide aggregation

Protein aggregation is central to amyloid-related disorders and remains a major developability challenge for protein therapeutics. Over the past two decades, significant advances have been made to predict aggregation-prone regions (APRs) and estimate aggregation propensity in proteins and peptides. In contrast, the prediction of aggregation kinetics has received relatively less attention due to the limited availability and heterogeneity of experimental data. Consequently, aggregation propensities from APR prediction algorithms were widely accepted as a means to predict relative changes in the aggregation kinetics of proteins and mutants. Previous studies have demonstrated, using large-scale datasets, that aggregation propensity shows a weak or inconsistent correlation with aggregation kinetics. In the present study, we have integrated complementary state-of-the-art mechanistic and kinetic prediction tools for protein aggregation into a unified, user-friendly web framework entitled "Amylo-Pipe". Amylo-Pipe also implements practical features that are especially useful for protein engineering, such as gatekeeper-residue mutational scanning to support the design of aggregation-resistant variants. By consolidating multiple prediction tasks in a single interface, Amylo-Pipe enables a more comprehensive assessment of aggregation behavior than APR-only workflows. The web server is freely accessible at: https://web.iitm.ac.in/bioinfo2/amylopipe/.

14.
arXiv (CS.CV) 2026-06-18

Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

Recent progress has shown promise in distilling multi-step video diffusion models into efficient few-step students. Among them, Distribution Matching Distillation (DMD) and its successor DMD2 achieved strong generation quality and fast convergence. However, due to the nature of the reverse Kullback–Leibler (KL) objective, these methods exhibit two persistent failure modes: a substantial drop in sample diversity, and visibly over-saturated outputs that deviate from real-video appearance. In this work, we propose Data-Forcing Distillation (DFD), a simple post-training framework that restores diversity and fidelity in DMD with only a single-line of code change. At its core is the teacher score discrepancy to guide the student toward the real-data distribution, pulling it to missing modes (mitigating mode collapse) and away from problematic modes absent in real data (avoiding over-saturation). We provide an in-depth theoretical analysis of our framework and validate our approach on text-to-video, image-to-video, and autoregressive video generation. With only 100–300 steps of finetuning, DFD effectively restores diversity and fidelity on both Wan2.1-1.3B and Cosmos-Predict2.5-2B model, resolving the over-saturation artifacts with significantly better video dynamics and appearance, and even outperforms the teacher model.

15.
arXiv (CS.CV) 2026-06-11

DrivingAgent: Design and Scheduling Agents for Autonomous Driving Systems

Many autonomous driving systems are increasingly incorporating foundation models to improve generalization and handle long-tail scenarios. However, this trend introduces two key challenges: (i) the manual and labor-intensive process of designing and integrating new models, and (ii) the lack of intelligent, dynamic scheduling mechanisms to meet strict real-time constraints. While Large Language Model (LLM)-based agents offer a promising avenue for automation, existing frameworks are ill-suited for autonomous driving. Specifically, they fail to distinguish between the fundamentally different requirements of system design and real-time scheduling, treat modules as opaque black boxes, and are not designed for continuous operation. To address these limitations, we propose DrivingAgent, a novel agent framework tailored to the dual challenges of autonomous driving system design and scheduling. In the design phase, DrivingAgent automates module development by interpreting system architecture, generating code, and validating modules via super-network training. In the scheduling phase, it employs a lightweight LLM trained with reinforcement learning to dynamically orchestrate system modules in real time, supported by a structured memory that integrates long-term storage with timestamped short-term context. Experimental results demonstrate that DrivingAgent achieves a superior speed–accuracy trade-off on both the nuScenes and Bench2Drive benchmarks.

16.
arXiv (CS.CV) 2026-06-16

Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation

Open-vocabulary 6D object pose estimation empowers robots to manipulate arbitrary unseen objects guided solely by natural language. However, a critical limitation of existing approaches is their reliance on unconstrained global matching strategies. In open-world scenarios, trying to match anchor features against the entire query image space introduces excessive ambiguity, as target features are easily confused with background distractors. To resolve this, we propose Fine-grained Correspondence Pose Estimation (FiCoP), a framework that transitions from noise-prone global matching to spatially-constrained patch-level correspondence. To systematically eliminate background interference, FiCoP first employs an object-centric disentanglement step to isolate the target from macro-level environmental noise. Building upon this localized region, our core methodological innovations are twofold. Firstly, a Cross-Perspective Global Perception (CPGP) module is proposed to fuse dual-view features, establishing structural consensus through explicit context reasoning and text-guided semantic injection. Secondly, we design a Patch Correlation Predictor (PCP) that leverages a patch-to-patch correlation matrix as a structural prior. This generates a precise block-wise association map, acting as a spatial filter to enforce fine-grained, noise-resilient matching. Experiments on the REAL275 and Toyota-Light datasets demonstrate that FiCoP improves Average Recall by 8.0% and 6.1%, respectively, compared to the state-of-the-art method, highlighting its capability to deliver robust and generalized perception for robotic agents operating in complex, unconstrained open-world environments. The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP.

17.
arXiv (CS.AI) 2026-06-19

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

arXiv:2606.19632v1 Announce Type: cross Abstract: Multi-agent reinforcement learning (MARL) enables agents to develop coordination strategies through emergent communication, but neural policies lack the formal safety guarantees required for safety-critical robotic deployment in drone swarms and autonomous vehicle fleets. We present the first end-to-end framework for safety verification of learned multi-agent communication policies through policy abstraction: neural policies are distilled into interpretable decision trees, then formally verified, with empirical validation confirming that verified safety properties transfer to original networks. Our four-stage pipeline consists of domain-specific feature extraction from agent observations, decision tree distillation achieving 97.9% +/- 1.2% fidelity to neural policies, automated translation to PRISM probabilistic model checker specifications with complete feature-to-state-variable correspondence, and compositional verification of Probabilistic Computation Tree Logic (PCTL) properties via pairwise decomposition with union-bound aggregation and empirical neighbor modeling. Evaluating Vector-Quantized Variational Information Bottleneck (VQ-VIB) policies for multi-drone coordination with 5-7 agents, we verify 18 temporal logic properties across safety, liveness, and cooperation, achieving 88.9% property satisfaction with all five safety thresholds satisfied (0.3% collision probability vs. 1% threshold). Monte Carlo validation of original neural policies confirms that verified safety properties transfer with

18.
medRxiv (Medicine) 2026-06-10

A Three-Tier Operational Benchmark for Evaluating Large Language Models on Hospital Medication Safety

Objective. To introduce PsiBench, a clinically validated medication-safety benchmark for evaluating large language models (LLMs) against the standards used to certify hospital computerized provider order entry (CPOE) and electronic health record (EHR) systems, and a non-overlapping three-tier evaluation framework separating highest-stakes discrimination, the operational CDS regime, and category-correct alerting. Materials and Methods. PsiBench comprises 492 medication-safety scenarios across 11 safety categories, created by clinical pharmacology experts whose work underpins an annualized testing procedure used by more than 2,000 U.S. hospitals. The three-tier framework partitions the scenarios non-overlappingly: Discrimination (98 scenarios, 50 fatal vs 48 deception, near-balanced 51%/49%); Operational (394 scenarios, 261 serious unsafe plus 133 safe including 41 Excessive Alerts reclassified as operational negatives); and Attribution (311 alert-required scenarios). We evaluated 40 frontier LLMs from 10 providers over 3 runs per scenario at temperature 0.2 (or the provider default where temperature is not configurable), yielding 59,040 evaluations conducted April 21-23, 2026. Results. Headline binary performance on the full benchmark spans a wide range across the 40 models: F1 78.5%-92.3%, accuracy 65.4%-89.8%, sensitivity 81.4%-100.0%, specificity 6.1%-81.8%. Leading models by F1 (o4-mini 92.3%; o3 92.2%) pair high sensitivity with meaningful specificity; three models saturate sensitivity at 100% but fall below 25% specificity, indistinguishable from a naive always-alert classifier. The wide spread on a single headline metric motivates tier-specific analyses, developed in a separate clinical paper. Discussion and Conclusion. PsiBench and the three-tier framework operationalize a rigorous evaluation rubric for LLM medication safety, grounded in two decades of national hospital audit experience. The framework generalizes to any binary medication-safety classifier (rule-based, conventional ML, or LLM-driven), supporting tier-aware model selection and post-deployment surveillance.

19.
arXiv (CS.CL) 2026-06-16

PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

Motivation: Patient-generated text contains critical information on patients' lived experiences, social context, and care engagement, but remains largely unstructured, limiting its use in patient-centered outcomes research. Prior work introduced the PV-Miner benchmark and PVMinerLLM models for structured extraction. However, supervised fine-tuning (SFT) alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs. Results: We present PVminerLLM2, an improved set of LLMs for structured patient voice extraction that applies preference optimization to address token-critical errors beyond the reach of supervised fine-tuning. Our method introduces (i) a preference objective with token-level gated stabilization term that prevents degradation of absolute token likelihood under preference optimization, and (ii) confusion-aware preference pair construction to better capture low-separation distinctions. We further incorporate token-importance weighting and inverse-frequency reweighing to address token imbalance and class skew. Across multiple model sizes, PVMinerLLM2 consistently outperforms strong baselines, achieving gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span), and outperforms baseline LLM trained with existing preference optimization methods. Availability and Implementation: The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available at: https://github.com/Data-Mining-Lab-Yale/PVminerLLM2

20.
arXiv (CS.LG) 2026-06-18

Clustering and Pruning in Causal Data Fusion

arXiv:2505.15215v3 Announce Type: replace-cross Abstract: Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.

21.
arXiv (CS.CV) 2026-06-12

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet extending them to jointly produce speech and 3D facial animation remains largely unexplored despite its importance for natural human-computer interaction. A key challenge is the mismatch between the discrete semantic reasoning of LLMs and the dense temporal dynamics required for 3D facial motion. We propose Expressive Omni (Ex-Omni), an open-source model that augments OLLMs with native speech-accompanied 3D facial animation. Ex-Omni decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder, where speech units provide temporal scaffolding and hidden speech representations carry facially relevant cues. We further introduce a unified token-as-query gated fusion (TQGF) mechanism for controlled semantic injection, as well as InstructS2SF-1200K, a dataset consisting of 1200K samples for pre-training. Extensive experiments show that Ex-Omni maintains competitive speech understanding and generation ability while achieving better audio-visual synchronization and lower face-generation latency than cascaded pipelines.

22.
arXiv (math.PR) 2026-06-16

Effective Resistances and Commute Times in Sparse Random Geometric Graphs

arXiv:2606.14895v1 Announce Type: new Abstract: The commute time between two nodes in a network - the expected number of steps for a random walk to travel from one node to the other and then return - is a metric of broad importance arising in community detection, network routing, dimensionality reduction, and diffusion modeling. For random geometric graphs (RGGs), in which nodes are placed at random in a spatial domain and connected pairwise wherever their Euclidean distance is below a threshold radius, the relationship between commute times and the embedding geometry remains poorly understood outside very dense settings (where the role of the geometry disappears and commute times degenerate to a sum of inverse degrees). We develop and numerically validate a model for approximating commute times in sparse RGGs on a torus by combining theoretically motivated geometric contributions with an inverse degree sum. The geometric terms include a universal logarithmic contribution from the Laplacian, a quadratic correction encoding the compact topology of the torus, and a quartic angular term reflecting the square anisotropy of the domain. We fit this model to samples of node pairs across a range of graph sizes and mean degrees, demonstrating good predictive performance and that the geometric terms contribute significantly to model fit. We then study the continuous perturbation of the model from a regular square lattice to a fully random geometric graph, further validating the functional model form through this transition and showing how commute times in sparse RGGs retain meaningful geometric information about the embedding space.

23.
arXiv (CS.LG) 2026-06-17

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

arXiv:2606.09770v2 Announce Type: replace-cross Abstract: Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the contiguity of cortical processing streams nor their integration across modalities. We introduce Topo-Omni, a topographic multimodal model in which visual, auditory, and language/cognitive processing share a single contiguous in-silico sheet. Built by fine-tuning a pretrained foundation model with a spatial smoothness objective, this architecture develops clusters across modalities that are consistent with human neuroimaging, from sensory to cognitive systems. Driving or suppressing a cluster selectively biases or impairs perception, paralleling human intervention studies. Finally, we use our model to screen for novel clusters in-silico and discover new natural landscape and animal networks which we validate in human data. A single spatial principle thus organizes representations across modalities and processing stages, yielding testable hypotheses about cortical organization.

24.
arXiv (CS.LG) 2026-06-19

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

arXiv:2604.15838v2 Announce Type: replace Abstract: Distribution shift severely degrades the performance of deep forecasting models. While this issue is well-studied for individual time series, it remains a significant challenge in the spatio-temporal domain. Effective solutions like instance normalization and its variants can mitigate temporal shifts by standardizing statistics. However, distribution shift on a graph is far more complex, involving not only the drift of individual node series but also heterogeneity across the spatial network where different nodes exhibit distinct statistical properties. To tackle this problem, we propose Reversible Residual Normalization (RRN), a novel framework that performs spatially-aware invertible transformations to address distribution shift in both spatial and temporal dimensions. Our approach integrates graph convolutional operations within invertible residual blocks, enabling adaptive normalization that respects the underlying graph structure while maintaining reversibility. By combining Center Normalization with spectral-constrained graph neural networks, our method captures and normalizes complex Spatio-Temporal relationships in a data-driven manner. The bidirectional nature of our framework allows models to learn in a normalized latent space and recover original distributional properties through inverse transformation, offering a robust and model-agnostic solution for forecasting on dynamic spatio-temporal systems.

25.
arXiv (CS.LG) 2026-06-16

Using Reinforcement Learning to Optimize the Global and Local Crossing Number

arXiv:2509.06108v2 Announce Type: replace-cross Abstract: Graph drawing concerns the algorithmic visualization of graphs. A good drawing of a graph is easy to read and facilitates solving tasks on the graph. Several properties have been identified to occur in good drawings of graphs. Such properties include a low number of crossings, large angles between edges, short edges, and depicting symmetries. Many of these properties are explicitly measurable metrics. This brings us to the insight that graph drawing can be seen as a game. In this paper, we study a single-player optimization game in which the player iteratively moves vertices of a straight-line graph drawing to reduce edge crossings. This game arose naturally from the automatic track of the Graph Drawing Challenge, where solutions are obtained by repeatedly performing local vertex movements. We formalize this process as a game with full information and investigate whether reinforcement learning can discover effective strategies for playing it. Our reinforcement-learning agent observes the local geometric and structural context of a vertex and selects a movement direction with the goal of reducing either the global or the local crossing number, that is, the total number of crossings or the maximum number of crossings per edge. We compare the resulting strategies to existing methods and established crossing-minimization heuristics on standard benchmark graphs. While our approach does not out-compete state-of-the-art methods for minimizing the global crossing number, it is competitive and often superior for minimizing the local crossing number.