Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

NeuroSymbolic AI for Legal AI-TRISM: Trustworthy, Reliable, Interpretable, Safe Models

arXiv:2606.15646v1 Announce Type: new Abstract: Large Language Models (LLMs) have transformed natural language processing, but their lack of interpretable reasoning and tendency to hallucinate pose significant challenges for legal applications. While LLMs show promise for legal text analysis and generation, they struggle with accurate citation attribution and precedent verification. For example, in legal contexts, a single incorrect precedent can jeopardize a case. Current approaches to improve LLM reliability in legal domains suffer from two key limitations: inadequate integration of structured legal knowledge during training or fine-tuning, and insufficient verification mechanisms for generated legal content. To address these challenges, we propose the TRISM (Trustworthy, Reliable, Interpretable, Safe Models) framework, which integrates NeuroSymbolic AI principles with LLMs to leverage both neural learning capabilities and symbolic reasoning over structured legal knowledge. The TRISM approach addresses the above limitations while maintaining interpretable decision pathways. Our framework formalizes the extraction of symbolic knowledge from legal textual documents and incorporates Retrieval-Augmented Generation (RAG) as a core component for grounding LLM outputs in verified legal sources. In this position paper, we make the following contributions: (1) An analysis of the limitations of AI in law; (2) Introduce RASOR RAG which creates foundations for neurosymbolic RAG by generating explicit interpretable rationales that could be formalized into symbolic representations; (3) A formalized methodology for creating symbolic legal knowledge bases that support both interpretable reasoning and output verification in LLMs; and (4) The TRISM framework for integrating symbolic legal knowledge with LLMs.

02.
arXiv (CS.AI) 2026-06-16

Autonomous End-to-End SOH Prediction Services for Battery Systems via Temporal-Contrastive Representation Learning

arXiv:2606.16434v1 Announce Type: cross Abstract: Accurate state of health (SOH) estimation is a critical diagnostic service for lithium-ion battery management. However, reliance on labor-intensive manual feature engineering and opaque black-box models hinders scalable industrial deployment. To address this, we introduce TC-SOH: a modular, plug-and-play service architecture for autonomous, end-to-end SOH prediction. TC-SOH employs a temporal-contrastive mechanism and a cross-window prediction pretext task to extract degradation-relevant representations directly from raw operational data. To improve transparency, we connect model efficacy with representation diagnostics: visualization, sensitivity analysis, redundancy analysis, bidirectional probing, future-SOH probing, and temporal shuffling show that learned features overlap with selected expert descriptors while retaining additional SOH-relevant variation, and that ordered temporal context improves subsequent-SOH prediction. Across four public datasets, TC-SOH outperforms the considered physics-informed and data-driven baselines, reducing MAPE by 1.91 times and RMSE by 2.13 times.

03.
medRxiv (Medicine) 2026-06-11

Plasma protein prioritisation in rheumatoid arthritis reveals druggable targets and shared biology with cardiovascular diseases

Abstract Background Rheumatoid arthritis (RA) is an autoimmune inflammatory disease with complex and incompletely understood molecular mechanisms. Understanding circulating proteins associated with RA may improve understanding of disease biology and clarify its pathological links with cardiometabolic comorbidities. Methods A proteome-wide two-sample Mendelian randomisation (MR) drug target analysis was conducted using plasma proteins measured in 54,219 participants from the UK Biobank Pharma Proteomics Project as exposures and RA and cardiometabolic diseases as the outcomes. Summary statistics for RA included 53,663 cases and 1,070,200 controls. Colocalisation analysis was performed to confirm shared single causal variants and prioritise RA proteins supported by both MR and colocalisation. The prioritised proteins were then evaluated in the Accelerating Medicines Partnership RA Phase II synovial single-cell dataset for cell-type expression patterns. Druggability was then assessed followed by analysis of genetic overlap between RA-associated proteins and cardiometabolic diseases. Results 37 plasma proteins had a causal effect on RA risk, supported by combined evidence from MR and conditional colocalisation. In synovial tissue, TPPP3, RARRES2, AKAP12, and GGT5 were predominantly expressed in stromal and endothelial cell clusters. Druggability assessment identified IFNGR2, IL6R, CD40, and FCGR2B as Tier 1 targets. However, several biologically relevant proteins, including RARRES2, AKAP12, TPPP3, and SNX2, had limited available druggability data. Genetic overlap analysis demonstrated shared protein signals between RA and cardiovascular diseases, including overlap of RARRES2 and TPPP3 with coronary artery disease (CAD) and FCGR2B with atrial fibrillation (AF). To approximate the therapeutic effect of target inhibition, the direction of effect estimates for proteins showing overlap between RA-CAD and RA-AF was reversed. Conclusion This study identified circulating proteins involved in RA pathogenesis and reveals shared mechanisms between RA and cardiovascular diseases. While some proteins showed clear translational potential targets, several prioritised proteins had limited available druggability information and could not be confidently classified. Addressing these gaps may help identify new targets relevant to RA management. Future work should also use phenome-wide MR studies to evaluate potential on-target adverse effects of protein inhibition across RA-CAD and RA-AF.

04.
arXiv (CS.CV) 2026-06-12

TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum

TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM). Pointing a phone at an exhibit, a visitor sees the artifact recognized in real time and can ask follow-up questions answered in English or Arabic. The work addresses three problems specific to in-gallery deployment: fine-grained visual similarity among 51 catalogued artifacts (many near-identical Ramesside statues), the gap between curated training data and handheld camera conditions, and the risk of an AI guide stating unsupported historical facts. Two engineering contributions are reported. First, an on-device artifact detector was developed through a data-quality-driven iteration study – from foundation-model auto-annotation (YOLO-World), through spatial label-cleaning rules, to a fully hand-annotated dataset – isolating label quality as the decisive factor: the final YOLOv8n model resolves every previously failing class while remaining a 5.97 MB TensorFlow Lite asset that runs in real time on a mid-range phone (mAP@0.5 = 0.995, mAP@0.5:0.95 = 0.924). Second, a bilingual Retrieval-Augmented Generation (RAG) guide, grounded in a 108-record ChromaDB knowledge base, was benchmarked across seven candidate language models, with Gemma 4 E2B (Q4 K M) selected; ten targeted optimizations reduce end-to-end latency from over 30 s to approximately 10 s. Both subsystems are integrated in a production Flutter application with bilingual interface, museum location gating, and text-to-speech support.

05.
arXiv (CS.AI) 2026-06-17

A Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

arXiv:2507.11178v3 Announce Type: replace-cross Abstract: With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt the component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing the sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model's ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradient between model's input and output to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model's effectiveness in reconstructing gene regulatory networks.

06.
arXiv (CS.AI) 2026-06-17

Unlocking LLM Code Correction with Iterative Feedback Loops

arXiv:2606.17514v1 Announce Type: cross Abstract: Large Language Models have shown remarkable capabilities in code generation. However, most existing evaluations focus only on single-attempt accuracy and overlook the iterative refinement process that is central to real-world programming. This study presents a systematic investigation of LLMs' ability to rectify their own code through execution feedback. Using real-world programming problems across four models and two major programming languages, this study evaluates performance using iterative refinement framework where LLMs receive compiler error messages and testcase feedback after each attempt. This study introduces metrics to evaluate code failures, analyze rectification patterns, and compare the effectiveness of reasoning and non-reasoning models, offering actionable insights into both the understanding and practical application of feedback loops in LLM-driven code generation systems. Results show that reasoning models consistently improve over iterations, substantially outperforming non-reasoning models in leveraging feedback, while syntactic and runtime errors are far more tractable than logical or algorithmic failures.

07.
arXiv (math.PR) 2026-06-16

Scaling Limits of Bivariate Nearly-Unstable Hawkes Processes and Applications to Rough Volatility

arXiv:2605.03703v3 Announce Type: replace Abstract: We study a pair of nearly-unstable Hawkes processes coupled through a one-directional, or triangular, cross-excitation: the first component evolves autonomously and excites the second, but not conversely. Each component is self-exciting through a heavy-tailed memory kernel, and the two kernels are allowed to have different tail indices, so that the limiting components exhibit genuinely different degrees of roughness. As the system approaches criticality, we prove that the suitably rescaled intensity vector converges weakly to the unique solution of a coupled system of stochastic Volterra equations of rough-volatility type. The first limiting component is autonomous, while the second is driven both by its own noise and by an inherited noise transmitted from the first component through an effective cross-kernel. This cross-kernel is the convolution of the two limiting Mittag-Leffler kernels and therefore combines the two memory structures. As a consequence, we obtain a short-time cross-decorrelation law: although the two components are coupled, their functional correlation vanishes at small time scales at an explicit polynomial rate. This time-dependent correlation distinguishes the limit from independent rough processes and from classical bivariate rough models with constant Brownian correlation.

08.
arXiv (CS.AI) 2026-06-16

Post-Hoc Merging is Not Enough: Many-Shot Model Merging with Loss-Gap Balancing

arXiv:2606.16501v1 Announce Type: new Abstract: Model merging has become a practical post-training strategy for building a single multi-task large language model (LLM) by combining multiple task-specialized models. However, most existing approaches rely on post-hoc merging, in which task-specific models are merged only once after training. This one-shot aggregation often suffers from task interference, leading to information erasure across individual tasks. In this work, we show that replacing post-hoc merging with an iterative many-shot merging protocol is effective in improving multi-task performance. Building on this insight, we propose METIS, Mitigating Erasure from Task Interference for Stable many-shot merging. METIS is a loss-aware many-shot merging method that addresses information erasure in post-hoc merging through task-wise loss-gap weighting and consensus-based masking. Notably, METIS exhibits significant performance improvement on the worst-performing task, effectively mitigating information erasure. (Project page: https://imkyungjin.github.io/METIS/)

09.
arXiv (CS.AI) 2026-06-12

Prism: Cost-Efficient Multi-LLM Serving via GPU Memory Ballooning

arXiv:2505.04021v3 Announce Type: replace-cross Abstract: Inference providers must maintain availability for many LLMs, including low-volume but essential models, making resource efficiency increasingly important as token prices fall. Analysis of production traces reveals a dynamic bursty-group pattern in which sets of models become active together and shift over time; existing space- and time-sharing approaches lack principled mechanisms to adapt to this variability, forcing trade-offs between SLO adherence and efficiency. We observe that elastic memory allocation can unify spatial and temporal sharing. Based on this insight, we have developed Prism, a memory-centric LLM co-serving framework that applies memory ballooning to reclaim memory across models and support both forms of sharing under a single scheme. Prism's balloon driver, referred to as kvcached, has been open-sourced at https://github.com/ovg-project/kvcached, and deployed in production environments across 10K+ GPUs.

10.
arXiv (CS.CV) 2026-06-16

HadBalance: A Plug-and-Play Unified Global Geometric Prior Framework for Generalizable Biomedical Segmentation

Precise biomedical image segmentation is crucial for clinical diagnosis. Geometric cues (e.g., boundary, shape, and topology) can improve structural consistency, yet most are task-specific and lack a unified geometric foundation that generalizes across organs and modalities. We are motivated by the observation that several medical segmentation targets can be approximated as globally near-convex shapes. A convex region is one in which any two interior points can be connected by a line segment entirely contained within the region. In practice, medical targets may exhibit small local concavities or boundary irregularities; we refer to such globally convex-like shapes as near-convex. Motivated by this, we derive Hadwiger Shape Priors from Hadwiger's theorem as an interpretable global regularizer using three 2D measures: area A, perimeter P, and Euler characteristic chi, enabling transfer across organs and modalities. However, because medical datasets are shape-heterogeneous, enforcing near-convex priors uniformly can over-regularize non-convex anatomy with significant concavities, washing out concavities and fine details and degrading segmentation accuracy. To address this challenge, we propose Conflict-Aware Objective Balancing (CAOB), which integrates shape priors with segmentation in a gradient-aware manner. For each prior, CAOB removes only the gradient component that conflicts with segmentation while preserving the remaining aligned component, and adaptively regulates objective influences to prevent prior dominance. This enables stable use of shape priors on shape-heterogeneous data without erasing genuine concavities or fine structural details. We call this plug-and-play framework HadBalance.

11.
arXiv (CS.AI) 2026-06-16

Optimal Transport for Machine Learners

arXiv:2505.06589v2 Announce Type: replace-cross Abstract: Modern machine learning repeatedly manipulates probability measures: empirical datasets, generated samples, latent distributions, class-conditional laws, particle systems, weights of wide networks and attention patterns. Optimal transport is useful in this setting because it compares such objects by asking how mass should move. It therefore combines a statistically meaningful notion of discrepancy with a geometry of interpolation, dual certificates and variational dynamics. This makes OT a common language for losses, generative modeling, domain adaptation, robust learning, barycenters, gradient flows and mean-field descriptions of learning algorithms. This book presents the main OT techniques with these machine-learning uses in mind. It starts from finite assignment and the Monge map viewpoint, passes to Kantorovich couplings and dual potentials, and then explains the algorithmic ideas that make transport usable: linear programming, semi-discrete cells, Sinkhorn scaling and low-dimensional projections. The same objects are then reused as a geometry of measures, giving Wasserstein distances, barycenters, gradient flows, dynamic formulations and Gaussian/Bures formulas. The final chapters emphasize the variants most relevant to modern ML: divergences and adversarial losses, entropic and unbalanced relaxations, robust or spectral ground geometries, Gromov and quantum extensions, and transport-based views of generative models, mean-field networks and attention dynamics. The goal is to keep the mathematics explicit while exposing the computational and geometric intuitions needed to turn OT into a working toolbox for machine learners.

12.
arXiv (CS.AI) 2026-06-16

AnonShield: Scalable On-Premise Pseudonymization for CSIRT Vulnerability Data

arXiv:2606.15650v1 Announce Type: cross Abstract: We present AnonShield, a high-throughput, on-premise pseudonymization system that combines GPU-accelerated NER, streaming processing, caching, and schema-aware configuration. Evaluated on datasets up to 550 MB (70,951 records), AnonShield reduces processing time from over 92 hours to under 10 minutes (up to 738x speedup) while achieving up to 94.2% F1-score and 96.7% recall. Our results show that scalable pseudonymization of vulnerability data is feasible without sacrificing analytical utility, enabling compliant data sharing in operational CSIRT environments.

13.
arXiv (CS.AI) 2026-06-17

Trustworthy Self-Composable Big-Data-as-a-Service: An LLM-Orchestrated Multi-Agent Framework for Automated Data Engineering, AutoML, MLOps Deployment, and Drift-Aware Lifecycle Optimization

arXiv:2606.17915v1 Announce Type: cross Abstract: Big-Data-as-a-Service (BDaaS) platforms require re liable automation across data ingestion, cleaning, feature engi neering, model development, deployment, and post-deployment monitoring. However, existing LLM-based data science agents and AutoML systems mainly focus on isolated workflow stages, leaving limited support for lifecycle-level orchestration, artifact governance, human oversight, and drift-aware adaptation. This paper proposes a trustworthy self-composable BDaaS frame work based on LLM-orchestrated multi-agent collaboration. The proposed architecture decomposes the BDaaS lifecycle into specialized agents for data ingestion, data cleaning, feature engineering, AutoML training, model evaluation, MLOps de ployment, monitoring, and drift detection. A central LLM or chestration layer coordinates agent execution, validates interme diate outputs, manages workflow context, and enables dynamic workflow composition. The framework also incorporates shared artifact governance, reproducibility support, human-in-the-loop checkpoints, and drift-aware feedback loops. A prototype-based evaluation is conducted using controlled tabular benchmark datasets with missing values, categorical variables, outliers, class imbalance, and simulated covariate drift. Compared with manual ML, AutoML-only, and single-agent LLM baselines, the pro posed multi-agent BDaaS pipeline achieves competitive predictive performance while improving lifecycle-level reliability, including workflow completion, artifact traceability, deployment readiness, reproducibility, and drift recovery. The results suggest that LLM-orchestrated multi-agent systems can extend conventional AutoML toward trustworthy, adaptive, and production-oriented BDaaS lifecycle automation.

14.
bioRxiv (Bioinfo) 2026-06-15

VrySure: A Multi-Task AI Scientific Fraud Detection Platform for Identifying Manipulated and AI-Generated Biomedical Research Images

Integrity of scientific data is critical in biomedical research, where images often serve as primary evidence for experimental observations and conclusions. Advances in image-editing technologies and generative artificial intelligence (AI) have increased the accessibility and realism of visual manipulation, making detection through manual review increasingly challenging. To empower our laboratory researchers to continuously monitor and uphold scientific rigor and data integrity, and serve the global scientific community, we developed VrySure, an easy-to-deploy, AI-driven multi-task platform for automated image-integrity screening in biomedical research. VrySure integrates four detection modules: cross-image transformation detection, within-image copy-move detection, splicing detection in blot and gel images, and AI-generated image detection. The system identifies potentially manipulated images and, when possible, localizes suspicious regions using bounding-box outputs to support downstream verification. To support development and evaluation, we constructed task-specific datasets by combining public biomedical image resources, curated manipulated examples, and synthetic images generated by multiple generative AI systems. We evaluated VrySure using region-level F1 score, recall, precision, false negative rate (FNR), and false discovery rate (FDR) across multiple manipulation categories and compared its performance with two commonly used commercial image-integrity screening platforms under a predefined benchmark protocol. Under the tested conditions, VrySure achieved a higher F1 score and recall, lower FNR, and maintained a low FDR for within-image copy-move detection, splicing detection, and AI-generated image detection, while showing comparable performance in transformation detection. Beyond automated screening, VrySure is designed to support source-data comparison and evidence-based assessment in scientific integrity investigations. By integrating multiple detection capabilities into a unified and scalable workflow, VrySure provides a practical framework to improve the efficiency and consistency of image-integrity screening in biomedical research.

15.
arXiv (math.PR) 2026-06-16

Effective Resistances and Commute Times in Sparse Random Geometric Graphs

arXiv:2606.14895v1 Announce Type: new Abstract: The commute time between two nodes in a network - the expected number of steps for a random walk to travel from one node to the other and then return - is a metric of broad importance arising in community detection, network routing, dimensionality reduction, and diffusion modeling. For random geometric graphs (RGGs), in which nodes are placed at random in a spatial domain and connected pairwise wherever their Euclidean distance is below a threshold radius, the relationship between commute times and the embedding geometry remains poorly understood outside very dense settings (where the role of the geometry disappears and commute times degenerate to a sum of inverse degrees). We develop and numerically validate a model for approximating commute times in sparse RGGs on a torus by combining theoretically motivated geometric contributions with an inverse degree sum. The geometric terms include a universal logarithmic contribution from the Laplacian, a quadratic correction encoding the compact topology of the torus, and a quartic angular term reflecting the square anisotropy of the domain. We fit this model to samples of node pairs across a range of graph sizes and mean degrees, demonstrating good predictive performance and that the geometric terms contribute significantly to model fit. We then study the continuous perturbation of the model from a regular square lattice to a fully random geometric graph, further validating the functional model form through this transition and showing how commute times in sparse RGGs retain meaningful geometric information about the embedding space.

16.
arXiv (CS.AI) 2026-06-16

SCAN: A Decision-Making Framework for Effective Task Allocation with Generative AI

arXiv:2606.15601v1 Announce Type: cross Abstract: We introduce SCAN – a human-centric decision-making framework to facilitate learners for effective task allocation with Generative Artificial Intelligence (GenAI) based on Vygotsky's Zone of Proximal Development and Metacognition. In SCAN, we systematize and formalize AI-human interaction by introducing a task-identification approach with four "sub-zones": Substitute, Complement, Aid, and Non-negotiable. After describing the four sub-zones, we demonstrate how SCAN framework can be applied for knowledge workers in the workplace and students in education to metacognitively "scan" their use of Generative AI. We then discuss how such framework can be related to cognitive load theory, cognitive offloading, sycophancy, three decision-making modes in human-AI interactions (automation, augmentation, and collaboration), future of work such as upskilling and deskilling, and how it accounts for both human-human and human-AI learning. We propose that SCAN offers a great starting point before discussing whether GenAI complements or replaces our abilities when completing a task, with a general objective of sustaining lifelong learning, and a specific goal of reaching hybrid intelligence.

17.
arXiv (CS.CL) 2026-06-16

ttda704 at SemEval-2026 Task 4: Modeling Narrative Structures via Pseudonymization and Multi-View Sentence Alignment

We present our approach to SemEval 2026 Task 4: Narrative Story Similarity and Narrative Representation Learning. Our solution uses contrastive learning with fine-tuned sentence transformers to capture narrative similarity across abstract themes, course of action, and outcomes. We develop two pipelines: (Track A) a single-view method that encodes full narratives with smart layer freezing to reduce overfitting, and (Track B) a multi-view method that models theme, plot, and outcome with view-specific projection heads and self-supervised alignment. Both pipelines build on sentence-transformers models and are trained with contrastive loss on synthetic data. The code is available at the following GitHub repository: https://github.com/dinhthienan33/SemEval2026-Task4-ttda704.

18.
arXiv (CS.CV) 2026-06-16

HSQ-VLM: A Novel Spatially-Constrained Quadrant Segmentation VLM Model for Explainability in Diabetic Retinopathy

Authors:

Diabetic Retinopathy (DR) is an aggressive retinal disease and a leading cause of global blindness, yet its clinical management is currently hindered by the black-box nature of diagnostic AI. While deep learning models achieve high classification accuracy, there is a critical lack of explainability methods capable of detailing the exact anatomical landmarks and lesion distributions that lead to a clinical decision for DR. Therefore, we propose HSQ-VLM, a novel quadrant segmentation pipeline on fundus images that utilizes a Landmark-Anchored Cartesian Cross-Attention mechanism to unify visual feature extraction with structured clinical reasoning. Unlike traditional methods that rely on arbitrary image partitioning, our pipeline implements 4-quadrant Topological Latent Partitioning (TLP) to dynamically align retinal features with a fovea-centered coordinate system. This allows the Vision-Language Model to generate natural language reports that quantify pathology with anatomical precision. On a dataset of 3,500 high-resolution fundus images, this innovative methodology achieved a lesion detection sensitivity of 99.6% for hemorrhages and 96.4% for microaneurysms, while demonstrating a significant reduction in boundary-ambiguity errors compared to standard segmentation baselines.

19.
arXiv (CS.AI) 2026-06-18

TMR-GGNN: Credit Card Fraud Detection based on Time-Aware Multi-Relational Guided Graph Neural Network

arXiv:2606.18444v1 Announce Type: cross Abstract: In recent years, credit card fraud detection has faced significant challenges due to highly imbalanced data, evolving fraud patterns, and complex relational structures among transaction entities. To address these issues, this research proposes a novel framework called Timeaware Multi Relational Guided Graph Neural Network (TMR GGNN). Particularly, the proposed TMR GGNN extends the encoder decoder Graph Neural Network GNN architecture by modeling heterogeneous interactions across customers, merchants, devices, and IPs over temporal windows. Subsequently, the proposed TMR GGNN approach constructs a dynamic, multi relational graph and incorporates a time aware relational attention mechanism within the encoder to adaptively weigh the transaction relevance based on temporal proximity and semantic context. Consequently, the decoder employs a contrastive learning module to distinguish between real and synthesized transaction patterns, while improving the models generalization of rare fraud cases. Additionally, to effectively manage severe class imbalances and emphasize discriminative learning, a composite loss function combining Information Noise Contrastive Estimation (InfoNCE) based contrastive loss with Focal Loss is introduced. This integration assists in improving fraud identification while mitigating false negatives.

20.
arXiv (CS.LG) 2026-06-15

Smoothing Dark Areas in Molecular Latent Diffusion

arXiv:2606.13955v1 Announce Type: new Abstract: Latent diffusion is a promising framework for scalable 3D molecular generation, but it requires a latent space that remains smooth, valid, and navigable beyond posterior samples. Existing molecular VAEs, however, are typically learned through reconstruction-based objectives, which do not guarantee such a latent space. We show that this leads to dark areas: regions of latent space that are reachable during diffusion sampling but decode to disconnected or chemically invalid molecules. Unlike in image generation, molecular decoding requires strict structural and chemical precision, so even small latent perturbations can produce catastrophic failures. We therefore propose TopVAE, a topology-optimized VAE that reduces dark areas by making the decoder internalize structural and chemical constraints during training, eliminating the need for test-time chemical correction. TopVAE greatly improves off-posterior robustness, and when paired with a standard DiT, achieves $77\%$ lower FCD-3D on QM9, the highest V&C, $52\%$ lower FCD-3D on GEOM-Drugs, and $1.29{\times}$ more stable and connected molecules on zero-shot scaffold inpainting.

21.
arXiv (math.PR) 2026-06-11

Sure-almost-sure and Sure-limit-sure Window Mean Payoff in Markov Decision Processes

arXiv:2605.12191v2 Announce Type: replace-cross Abstract: Given rationals $\alpha$ and $\beta$, the sure-almost-sure problem for a threshold Boolean objective $\varphi$ in a Markov decision process (MDP) asks if one can simultaneously ensure that all outcomes of the MDP have $\varphi$-value at least $\alpha$ (i.e. sure $\alpha$ satisfaction) and with probability $1$ the outcome has $\varphi$-value at least $\beta$ (i.e. almost-sure $\beta$ satisfaction). The sure-limit-sure problem asks if for all $\varepsilon > 0$ one can simultaneously ensure that all outcomes have $\varphi$-value at least $\alpha$ and with probability at least $1 - \varepsilon$ the outcome has $\varphi$-value at least $\beta$. Moreover, if simultaneous satisfaction of objectives is possible, then one would also like to construct a strategy (for sure-almost-sure) or a family of strategies (for sure-limit-sure) that achieves this. In this paper, we solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. The window mean-payoff objective strengthens the standard mean-payoff objective by requiring that eventually, from every point in the infinite run, the average payoff becomes greater than a given threshold within a finite window length. We study two variants of window mean payoff: in the fixed variant, the window length $\ell$ is given, while in the bounded variant, the length is not given but is required to be bounded throughout the run. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if $\ell$ is given in unary) and are both in NP $\cap$ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.

22.
arXiv (CS.CV) 2026-06-11

EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards

Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their autonomy and scalability. In this work, we strive to improve LMM reasoning capabilities in a purely unsupervised fashion (without any annotated data or reward distillation). To this end, we propose a self-evolving framework, named EvoLMM, that instantiates two cooperative agents from a single backbone model: a Proposer, which generates diverse, image-grounded questions, and a Solver, which solves them through internal consistency, where learning proceeds through a continuous self-rewarding process. This dynamic feedback encourages both the generation of informative queries and the refinement of structured reasoning without relying on ground-truth or human judgments. When using the popular Qwen2.5-VL as the base model, our EvoLMM yields consistent gains upto $\sim$3\% on multimodal math-reasoning benchmarks, including ChartQA, MathVista, and MathVision, using only raw training images. We hope our simple yet effective approach will serve as a solid baseline easing future research in self-improving LMMs in a fully-unsupervised fashion. Our code and models are available at https://github.com/mbzuai-oryx/EvoLMM.

23.
arXiv (CS.AI) 2026-06-16

Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression

arXiv:2602.22422v2 Announce Type: replace-cross Abstract: Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.

24.
arXiv (CS.CV) 2026-06-16

SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking

Rapid urban expansion has fueled the growth of informal settlements in major cities of low- and middle-income countries, with Lahore and Karachi in Pakistan and Mumbai in India serving as prominent examples. However, large-scale mapping of these settlements is severely constrained not only by the scarcity of annotations but by inherent data quality challenges, specifically high spectral ambiguity between formal and informal structures and significant annotation noise. We address this by introducing a benchmark dataset for Lahore, constructed from scratch, along with companion datasets for Karachi and Mumbai, which were derived from verified administrative boundaries, totaling approximately 900 $km^2$ of urban area. This collection is supplemented by four cities from prior literature across Sub-Saharan Africa and Latin America, with comprehensive data quality assessments provided for each city. We also propose a semi-supervised segmentation framework designed to mitigate the class imbalance and distribution mismatch inherent in standard semi-supervised learning pipelines. Our method integrates a Class-Aware Adaptive Thresholding mechanism that dynamically adjusts confidence thresholds to prevent minority class suppression, and a DINOv2-based unlabeled pool filter that removes out-of-distribution tiles prior to training to reduce covariate shift. Extensive experiments across seven cities spanning three continents, repeated over five random seeds, demonstrate gains of up to +5.9 pp mIoU over state-of-the-art semi-supervised baselines, with both components being architecture-agnostic and adding no inference overhead.

25.
arXiv (CS.CL) 2026-06-16

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

Authors:

When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using GPT-4o to evaluate DeepSeek-chat across text and visual tasks, we find that a single strategy (step_by_step) absorbs 48.4% of all weight – 3.2x the collapse observed in text-only self-evaluation – while three visual-domain strategies receive only 9.1% combined weight. We then demonstrate a novel phenomenon we term cross-modal contagion: evaluator preferences acquired on one modality transfer to and corrupt strategy selection on another. Through a four-phase isolation training paradigm, we measure contagion coefficients and document strategy inversion – the optimal strategy for a modality reverses after cross-modal exposure. A Phase 3 statistical validation across four evaluator configurations (N=53 total independent repetitions, 15,592 API calls) reveals a clear hierarchy: cross-model evaluation (GPT-4o, N=8) produces strong but symmetric bidirectional contagion (mean gamma_{T->V}=1.176, gamma_{V->T}=1.089, Delta=-0.088, p=0.575, Cohen's d=0.29); high round counts (DashScope, 50 rounds) cause collapse to single-strategy dominance (70% zero contagion); and self-evaluation provides near-complete immunity – 97% of runs (N=30, DeepSeek-chat) yield exactly zero contagion (mean gamma=0.033, 95% CI [-0.031, 0.010], p=0.642, d=0.07). No evaluator condition shows statistically significant directional asymmetry. We introduce the contagion matrix indexed by evaluator identity, release the MM-EPC experimental framework, and identify cross-model evaluator architecture as the primary risk factor for preference contagion.