Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-19

DynAMO:Dynamic Asset Management Orchestration via Topological Multi-Agent Scheduling

arXiv:2606.19382v1 Announce Type: cross Abstract: While LLM-powered agents offer end-to-end automation for industrial asset lifecycles, real-world Industry 4.0 deployment is hindered by latency, concurrency instability, and safety risks. We present DynAMO (Dynamic Asset Management Orchestration), a deployment-ready engine using a Plan-then-Execute architecture to generate verifiable workflow graphs. DynAMO supports both SequentialWorkflow (topological execution) and ParallelWorkflow (dependency-aware concurrency). By dynamically identifying independent tasks, DynAMO preserves structural correctness and safety while significantly improving efficiency through controlled reasoning overlap. Across six controlled experiments on the AssetOpsBench industrial benchmark, DynAMO demonstrates substantial performance and robustness gains. Parallel execution reduces end-to-end latency by a median of 1.6x over sequential orchestration, rising to 1.8x on highly parallelizable workflows. After instrumenting external tool calls with realistic latencies, a latency decomposition shows that LLM reasoning and orchestration still account for more than 90% of execution time, identifying model inference as the primary system bottleneck. Structured context pruning reduces inference latency by approximately 30%, and DynAMO maintains correct functional behaviour (task completion, agent sequencing, and output quality) while exhibiting graceful degradation under controlled fault injection. Reproducibility analysis further confirms stable execution under repeated runs, with parallel scheduling reducing latency variance. These findings establish DynAMO as a practical blueprint for scalable, safe, and latency-aware agent deployment in Industry 4.0 automation pipelines. Code is available at: https://github.com/kushwaha001/DynAMO

02.
arXiv (CS.AI) 2026-06-19

AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing

arXiv:2606.19714v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as judges for open-ended generation, as large-scale human evaluation is often expensive and difficult to scale, yet their preferences remain imperfect proxies for human judgment. Existing auditing pipelines often assume that a reliable subset of examples or clean supervision signals are available beforehand, for example from human annotation, heuristic filtering, or the outputs of strong judges. In LLM evaluation, this assumption is fragile: the initial split may inherit judge bias, while human verification is typically too scarce to define stable groups at scale. We propose AURA, an adaptive uncertainty–aware refinement framework for auditing pairwise LLM–as–a–judge decisions under selected human verification. AURA iteratively learns a human-consistency signal, propagates reliable evidence, and prioritizes uncertain comparisons for human review. The key idea is to treat trust in a judge as a latent quantity that is progressively refined as evidence accumulates. We provide a compact formulation, a stable refinement procedure, and a comprehensive evaluation on both synthetic and real pairwise LLM-answer data.

03.
arXiv (CS.CV) 2026-06-16

Pixel-TTS: Image based Text Rendering for Robust Text-to-Speech

Recent advances in pixel-based text modeling show that representing text as images enables models to exploit visual cues for language understanding. Grounding text in its visual form allows structurally similar characters with different Unicode encodings to produce similar embeddings, benefiting cross-lingual and zero-shot scenarios. Conventional text-based approaches treat each character independently, limiting generalization to unseen characters and requiring embedding expansion during cross-lingual adaptation. We propose Pixel-TTS, the first framework for visually grounded speech synthesis. It renders text as images and projects them through a 2D convolutional layer to generate embeddings. This design eliminates embedding matrix expansion during fine-tuning while improving robustness to unseen characters and orthographic variations. Extensive experiments show Pixel-TTS achieves competitive performance with strong baselines, faster convergence and robust zero-shot generalization.

04.
Nature (Science) 2026-06-08

Distributed control circuits across a brain-and-cord connectome

Just as genomes revolutionized molecular genetics, connectomes (maps of neurons and synapses) are transforming neuroscience. To date, the only organisms with complete connectomes are worms1–3, sea squirts4, and comb jellies5 (103–104 synapses). By contrast, the fruit fly is more complex (108 synaptic connections), with a brain that supports learning and spatial memory6,7 and an intricate ventral nerve cord analogous to the vertebrate spinal cord8–12. Here we report the first densely-reconstructed adult fly connectome that unites the brain and ventral nerve cord, and we leverage this resource to investigate principles of neural control. We show that effector neurons (motor neurons, endocrine cells, and efferent neurons targeting the viscera) are primarily influenced by sensory neurons in the same body part, forming local feedback loops. These local loops are linked by long-range circuits involving ascending and descending neurons organized into behavior-centric modules. Single ascending and descending neurons are often positioned to influence the voluntary movements of multiple body parts, together with the endocrine cells or visceral organs that support those movements. Brain regions involved in learning and navigation supervise these circuits. These results reveal an architecture that is distributed, parallelized, and embodied, reminiscent of distributed control architectures in engineered systems13,14.

05.
arXiv (quant-ph) 2026-06-19

Distinguishing quantum processes with bounded coherent memory

arXiv:2606.19511v1 Announce Type: new Abstract: Distinguishing multi-time quantum processes is a fundamental task underlying the diagnosis, benchmarking, and learning of temporally correlated quantum dynamics. The standard benchmark for distinguishing two processes is the strategy-norm distance, which optimizes over arbitrary adaptive probing strategies but can require large coherent memory and time-dependent control. We introduce machines for autonomous distinction~($\mathsf{MAD}$s): probing strategies that apply the same quantum instrument at each time step, retain the full classical outcome record, and carry a coherent memory of dimension $d_A$. Optimizing over these strategies defines a memory-parametrized distinguishability measure, $d^{(N)}_{\mathsf{MAD}}(\mathbf{P}^N,\mathbf{Q}^N;d_A)$. We show that the resulting hierarchy is monotone in coherent memory and complete at finite times. Specifically, any admissible $N$-step probing strategy can be compiled into a single $\mathsf{MAD}$ with an internal counter and sufficiently large coherent memory, so the hierarchy saturates the strategy-norm benchmark. For recurrent processes generated by repeated system–environment interactions, we derive a single-step description that separates the generation of new distinguishing information from the propagation and decay of information generated at earlier times. Numerical results in a repeated-interaction model show that increasing coherent memory systematically improves the $\mathsf{MAD}$ success probability and closes the gap to the strategy-norm distance while remaining substantially more tractable to evaluate. $\mathsf{MAD}$ distinguishability therefore provides an operational and scalable framework for quantifying what can be learned about genuinely multi-time quantum processes with bounded coherent memory.

06.
arXiv (CS.AI) 2026-06-11

LaQual: An Automated Framework for LLM App Quality Evaluation

arXiv:2508.18636v2 Announce Type: replace-cross Abstract: Representing a new paradigm in software distribution, LLM app stores are rapidly emerging, offering users diverse choices for content generation, coding assistance, education, and more. However, current ranking and recommendation mechanisms in LLM app stores predominantly rely on static metrics, such as user interactions and favorites, making it challenging for users to efficiently identify high-quality apps. At the same time, current academic research focuses on specific vertical fields and lacks a general, automated evaluation framework applicable to the diverse LLM app ecosystem. To address the above challenges, we present LaQual, an automated framework for LLM app quality evaluation. LaQual integrates three key stages: (1) LLM app labeling and hierarchical classification for precise scenario mapping; (2) static indicator evaluation using time-weighted user engagement and functional capability indicators to filter low-quality apps; and (3) dynamic scenario-adapted evaluation, where an LLM generates scenario-specific evaluation metrics, scoring criteria, and tasks for comprehensive quality evaluation. Experiments on a mainstream LLM app store demonstrate the effectiveness of LaQual. Its automated scores show high consistency with human judgments. Through effective screening, LaQual can reduce the candidate LLM app pool by 66.7% to 81.3%. User studies further validate its significant outperformance over baseline systems, particularly in comparison efficiency (mean 5.45 vs. 3.30) and value of explanatory information (4.75 vs. 2.25). These results demonstrate that LaQual provides a scalable, objective, and user-centric solution for high-quality discovery and recommendation of LLM apps in real-world scenarios.

07.
arXiv (CS.CL) 2026-06-16

Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation

Reliable turn-taking is essential for spoken dialogue systems. However, most existing methods are designed for two-speaker interaction and struggle with realistic multiparty audio containing overlap and rapid speaker changes. We study multiparty turn-taking on the VoxConverse dataset and propose an audio-only two-stage pipeline that separates when to trigger a turn boundary from whether the floor is actually transferring. A fast trigger scans the audio and proposes candidate end-of-turn times, while a lightweight verifier runs only at those times to decide \textsc{Hold} or \textsc{Shift} and support next-speaker prediction. We report results in the full multiparty setting and a controlled dyadic top-2 projection for comparability. We also investigate diffusion-based, label-preserving background-audio mixing as a data augmentation strategy. Results show improved shift detection over a baseline, with further improvements from diffusion augmentation.

08.
arXiv (CS.LG) 2026-06-18

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

arXiv:2507.17786v2 Announce Type: replace Abstract: We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.

09.
arXiv (quant-ph) 2026-06-16

Complete entanglement detection using polynomial invariants

arXiv:2606.16712v1 Announce Type: new Abstract: Existing methods for deciding whether a bipartite quantum state is separable or entangled typically fall into one of two categories: they are either complete but require access to an explicit density matrix followed by numerical optimization, or they can be evaluated directly by measuring the quantum system but are incomplete, in the sense that they cannot detect all forms of entanglement. In this work, we overcome both limitations in a unified framework. First, we bypass numerical optimization by deriving separability criteria in the form of universal bounds on tensor powers of separable states. We prove that these bounds are complete: every entangled state violates them for sufficiently large tensor powers. Second, we explicitly construct a corresponding complete family of nonlinear entanglement witnesses, which can detect all forms of entanglement without requiring an explicit density matrix. The witnesses we construct are moreover basis-independent, in the sense that they are invariant under conjugation by local unitaries. Altogether, our results expand the toolbox for entanglement detection in arbitrary local dimensions in a manifestly invariant way.

10.
arXiv (CS.LG) 2026-06-12

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

arXiv:2505.20076v4 Announce Type: replace Abstract: Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation, and are often tied to a particular level of granularity along the local-to-global spectrum. This leads to explanations that lack a unified view and may miss key interactions. We present ExPLAIND, a theoretically grounded, unified framework that integrates model components, data, and training trajectory while supporting explanations across granularities. We generalize recent work on gradient path kernels, reformulating models trained by AdamW as kernel machines. From the resulting kernel feature maps, we derive novel parameter-wise and step-wise influence scores. We empirically validate the resulting decomposition of model behavior in several settings and apply ExPLAIND to two case studies. Our findings on a Transformer exhibiting Grokking support previously proposed learning phases, while refining the final phase as one in which outer layers align around a representation pipeline learned after memorization. For EuroLLM pretraining, ExPLAIND reveals a two-phase dynamic, with the first characterized by outer-layer MLP learning and the second by increased relative influence of intermediate attention layers. These results establish ExPLAIND as a unified framework for interpreting model behavior and training dynamics.

11.
arXiv (CS.LG) 2026-06-12

Simultaneous Latent Budget Trees for Stratified Classification

arXiv:2606.13295v1 Announce Type: cross Abstract: In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees, a probabilistic machine learning framework for classification trees in the presence of a stratification factor such as a temporal, spatial, or demographic variable, acting as a control variable or potential confounder. Standard tree growth procedures are not designed to optimize a conditional split rule. A model-based split rule is proposed in which child nodes are interpreted as latent components of a simultaneous mixture model, such as the Simultaneous Latent Budget Model and its constrained versions, fitted to the parent node. Mixing parameters drive the observations, differently for each group, to the child nodes whereas latent budgets parameters update the response classes profile of each level of the control variable. Parameters are estimated by least squares considering a neural network perspective of the model. An informative tree structure can be interactively visualized with interpretation aids on the node and the paths, including visual pruning and decision tree selection procedure. Suitable measures are proposed to handle an unbalanced response class distribution. The proposed methodology is applied to investigate gender-related differences in disease progression of Amyotrophic Lateral Sclerosis. The SLBT library with the various tree-based algorithms is available in the linked GitHub repository.

12.
arXiv (CS.CV) 2026-06-25

Efficient Real-World Dehazing via Physics-Inspired Global-Local Decoupling

Real-world single image dehazing is highly ill-posed due to spatially and spectrally varying scattering, while practical deployment demands lightweight and low-latency models. Existing approaches either rely on fragile physical inversion under simplified assumptions or adopt heavy blind architectures unsuitable for edge deployment. To overcome these limitations, we propose PGL-Net (Physics-Inspired Global-Local Decoupling Network), a lightweight framework that incorporates physical inductive biases via operator-level emulation, avoiding explicit parameter estimation. It decouples dehazing into global distribution rectification and local structural refinement. A Physics-Inspired Affine Fusion (PAF) module performs globally conditioned alignment across hierarchical skip connections to compensate for haze-induced bias, while a compact Degradation-Aware Modulation (DAM) block adaptively restores spatially and spectrally variant details through dynamic feature modulation. Extensive experiments on multiple real-world benchmarks demonstrate that PGL-Net achieves state-of-the-art restoration quality with significantly reduced complexity. Compared with the recent SOTA SGDN, the Tiny variant (PGL-Net-T) improves PSNR by up to 2.6dB and consistently enhances downstream object detection accuracy, while achieving over a 10x reduction in inference latency. Code is publicly available at: https://github.com/sc-30-bit/PGL-Net.

13.
arXiv (CS.CL) 2026-06-25

CLEF HIPE-2026: Evaluating Accurate and Efficient Person-Place Relation Extraction from Multilingual Historical Texts

HIPE-2026 is a CLEF evaluation lab dedicated to person-place relation extraction from noisy, multilingual historical texts. Building on the HIPE-2020 and HIPE-2022 campaigns, it extends the series toward semantic relation extraction by targeting the task of identifying person-place associations in multiple languages and time periods. Systems are asked to classify relations of two types – $at$ ("Has the person ever been at this place?") and $isAt$ ("Is the person located at this place around publication time?") – requiring reasoning over temporal and geographical cues. The lab introduces a three-fold evaluation profile that jointly assesses accuracy, computational efficiency, and domain generalization. By linking relation extraction to large-scale historical data processing, HIPE-2026 aims to support downstream applications in knowledge-graph construction, historical biography reconstruction, and spatial analysis in digital humanities.

14.
arXiv (CS.LG) 2026-06-24

Information-Theoretic Classifier-Free Guidance with Adaptive Schedule Optimization

arXiv:2606.24025v1 Announce Type: new Abstract: Diffusion models have achieved strong performance in image, text-to-image, and video generation, where conditional generation is often controlled by classifier-free guidance (CFG). CFG improves condition consistency by increasing a guidance weight, but stronger guidance typically reduces diversity and distributional coverage. It remains unclear how this consistency-coverage trade-off should be controlled across the reverse trajectory, since the distribution induced by CFG is not simply the fixed-time tilted distribution given by the guided score field. To address this issue, we propose an information-theoretic framework for CFG schedule optimization. Our approach uses a clean endpoint reference to specify the desired consistency-coverage trade-off, while optimizing the actual distribution induced by the guided sampler toward this reference. We derive trajectory-level formulas to estimate the objective from samples and score evaluations, avoiding explicit density estimation. On ImageNet-512 with EDM-XXL and COCO with SD-XL, the learned schedules achieve competitive or improved trade-offs over constant guidance and allocate guidance selectively across noise levels.

15.
arXiv (CS.CV) 2026-06-19

TriFlow: Generating Artist-Like 3D Mesh Topology via Nearest-Vertex Vector Fields

We present TriFlow, a new generative approach for producing compact 3D meshes with artist-like triangle topology directly from input geometry conditions such as signed distance fields. Our key insight is to represent mesh topology as a nearest-vertex vector field (NVF) defined over the surface, where each point encodes its association to the nearest triangle vertex in the local barycentric frame. We train a latent flow-matching model to synthesize this field, enabling topology generation conditioned on the input geometry. To extract a coherent mesh, we cluster surface regions using the generated NVF and guide a constrained quadric error metric (QEM) mesh simplification with topology-aware optimization. This yields output meshes that closely match the input geometry while exhibiting structured, artist-like connectivity. Experiments demonstrate that TriFlow achieves stronger generalization and significantly improved topology quality compared to state-of-the-art learning-based approaches, alongside 90% lower Chamfer Distance and an 8x speedup.

16.
bioRxiv (Bioinfo) 2026-06-11

Hyper3D-lite: count-preserving representation auditing for long-read multi-contact genome data

作者:

Long-read and single-molecule sequencing technologies are rapidly increasing molecule-level data, with platforms such as Oxford Nanopore, PacBio HiFi, and Roche sequencing-by-expansion advancing at different technology readiness levels. In the specific context of Pore-C and HiPore-C multi-contact chromatin-conformation assays, long-read multi-contact 3D genome assays preserve molecule-level contact context, but common downstream pairwise projections can expand one multi-contact molecule into many pair records. This creates a representation problem: apparent contact evidence can increase through the counting frame before biological interpretation begins. Hyper3D-lite addresses this problem as a representation-first audit tool for read-to-fragment-style long-read multi-contact inputs. It compares all-pair projection with CPB, a count-preserving statistical accounting reference point, and separates broad software outputs from conservative higher-order candidate calls.

17.
arXiv (CS.LG) 2026-06-25

An Analysis of Posterior Collapse, Parameterization and Initialization in Variational Deep Gaussian Processes

arXiv:2606.25882v1 Announce Type: new Abstract: DGPs are probabilistic models with remarkable prediction performance that concatenate GPs across several layers. Exact inference in DGPs is intractable, and variational inference is often used to approximate the posterior with a parametric distribution tuned by minimizing the Kullback-Leibler divergence. Moreover, finding a good VI approximation is challenging. In particular, a problem of VI is posterior collapse, where VI converges to a variational posterior that matches the prior. In variational DGPs, this implies explaining the data as noise. This work studies posterior collapse in DGPs and identifies its connection to the DSVI algorithm and the widely used linear prior mean function employed in all but the last layer. We show that the benefit of the linear prior mean does not arise from avoiding the non-injective pathology in very deep DGPs, as previously believed, but from improving the conditioning of the optimization problem at initialization. Thus, we propose an alternative initialization of a zero prior mean DGP that mimics a DGP with a linear prior mean at initialization. This enables successful training of DGPs without imposing optimization-driven constraints on the prior, allowing to choose the prior based on modeling assumptions rather than optimization convenience. Our analysis considers three common parameterizations of DGPs and shows that not all of them benefit from a linear prior mean. We also explain why a whitened parameterization of the \operatorname{DGP} provides more stable convergence, something often assumed from experience, but lacking a rigorous analysis. Furthermore, we show that this stability is also beneficial to avoid the posterior collapse problem. Extensive experiments validate our findings: the proposed initialization prevents posterior collapse, improves stability, and achieves performance comparable to (and sometimes better than) DGPs with a linear prior mean.

18.
arXiv (CS.CL) 2026-06-16

The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage

Automatic semantic change detection aims to identify how word meanings shift over time, offering insights into both linguistic and societal change. Despite recent progress in computational lexical semantic change (LSC), existing benchmarks and methods struggle to capture bi-directional semantic change, particularly cases where words simultaneously gain and lose senses. This problem is especially challenging for words that have both slang and standard meanings. To address these gaps, we introduce two complementary benchmark datasets. The Bi-Directional Lexical Semantic Change (BD-LSC) dataset captures sense gain, sense loss, and stability across three time periods, enabling the study of complex semantic trajectories. The SlangTrack Word Sense Disambiguation (ST-WSD) dataset provides fine-grained, instance-level sense annotations for words combining slang and standard usages, supporting systematic benchmarking of WSD and semantic change detection models. Using these benchmarks, we systematically evaluate models across different methodological families: unsupervised clustering using contextualised embeddings, supervised machine learning, transformer-based models, and state-of-the-art large language models. Among the evaluated systems, the few-shot GPT-4o model achieved the strongest aggregate performance on Exact Sense Match (ESM) and multi-label accuracy; however, Macro-F1 scores near 0.5 across all systems show that rare slang senses remain difficult, which we identify as the central open challenge.

19.
arXiv (CS.CV) 2026-06-24

MambaRaw: Selective State Space Modeling for Efficient 4K Raw Image Reconstruction

In-camera JPEG previews are ubiquitous in raw image formats and provide an sRGB reference at negligible storage cost. Although existing metadata-based reconstruction frameworks can exploit this side information when recovering raw images, their context models often become computationally expensive especially at high resolution, eg, 4K raw image, given that attention mechanisms scale quadratically with feature maps, hindering its practical application. To address these limitations, we propose MambaRaw, a JPEG-conditioned metadata-based raw image reconstruction framework that uses State Space Models (SSMs) to estimate entropy parameters efficiently. Our key contribution comprises a Spatial-Energy Coupled Context Modeling mechanism with two lightweight modules: (1) TileMambaBlock, which performs Mamba-style selective scanning only on information-dense tiles to improve the efficiency; and (2) Energy-Aware Refinement (EAR), an identity-initialized residual module that enhance feature representation to match the long-tail energy distribution of raw signals. Extensive experiments on three camera datasets (Sony, Olympus, Samsung) show consistent improvements over strong metadata-based baselines and set a new state of the art for JPEG-guided raw reconstruction with great efficiency. Notably, at low metadata bitrates, MambaRaw increases PSNR by 1.2–1.4 dB and reduces end-to-end coding latency by about 9%. Code is released at https://github.com/Peizeli1/MambaRaw.

20.
arXiv (CS.LG) 2026-06-24

Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces

arXiv:2603.08287v2 Announce Type: replace-cross Abstract: We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is a heuristic for decision-making under uncertainty that has been used to develop successful algorithms for a variety of continuous control problems. However, theoretical work on GP-PSRL is limited. All known regret bounds either have a sub-optimal growth rate, require strong smoothness assumptions, or fail to properly account for the fact that the set of possible system states is unbounded. Through a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov inequality, we show that, with high probability, the states actually visited by the algorithm are contained within a ball of near-constant radius. We then use the chaining method to control the regret suffered by GP-PSRL under weak smoothness conditions. Our main result is a Bayesian regret bound of the order $\widetilde{\mathcal{O}}(H\sqrt{\gamma_TT})$, where $H$ is the horizon, $T$ is the number of time steps and $\gamma_T$ is the expected information gain. With this result, we resolve the limitations with prior theoretical work on PSRL, and provide the theoretical foundation and tools for analyzing PSRL in complex settings.

21.
arXiv (CS.AI) 2026-06-25

Clifford Kolmogorov-Arnold Networks

arXiv:2602.05977v2 Announce Type: replace-cross Abstract: We introduce Clifford Kolmogorov-Arnold Network (ClKAN), a flexible and efficient architecture for function approximation in arbitrary Clifford Algebra spaces. We propose the use of Randomized Quasi-Monte Carlo grid generation as a solution to the exponential scaling associated with higher-dimensional algebras. Our ClKAN also introduces new batch normalization strategies to deal with variable domain input. ClKAN finds application in scientific discovery and engineering, and is validated in synthetic and physics-inspired tasks.

22.
arXiv (CS.AI) 2026-06-24

Zero-Shot Test-Time Canonicalization using Out-of-Distribution Scoring

arXiv:2606.24178v1 Announce Type: cross Abstract: Pretrained vision models often misclassify inputs that are rotated, scaled, or sheared, even though these affine transformations leave the object class unchanged. Robustness is usually restored either by building equivariance into the architecture or by retraining with augmentation, both of which require changing or retraining the model. Test-time canonicalization instead leaves the classifier untouched. It undoes the transformation of each input, mapping it to a canonical form near the training distribution before classification. Existing canonicalizers, however, rely on a narrow set of logit-based energy scores and bespoke search procedures, leaving the design space of scoring functions and optimizers unexplored. We reframe canonicalization as out-of-distribution (OOD) detection, which lets any OOD score serve as the energy minimized over transformations. Across benchmarks ranging from handwritten characters and sketches to natural images and 3D point clouds, we systematically evaluate around twenty OOD scores and nine search algorithms, finding that distance-based scores paired with random search and local refinement perform best overall. Because canonicalizing an already-aligned input can hurt accuracy, we add a gated mechanism that transforms an input only when its OOD score indicates this is needed, preserving most in-distribution accuracy while retaining the robustness gains on transformed inputs. Code is available at github.com/johschm/its.

23.
arXiv (CS.LG) 2026-06-12

Viral Proteins Reveal Geometry of Protein Language Models

arXiv:2606.12609v1 Announce Type: new Abstract: Protein language models are trained on highly imbalanced datasets, raising the question of how they represent underrepresented biological sequences. Using viral proteins as a case study across ESM model families, we identify a dominant nativeness axis in embedding space, aligned with masked reconstruction perplexity, that orders sequences from well-modeled cellular proteins through viral proteins to shuffled and random sequences. Scaling contracts this axis unevenly across viral families. Despite this, protein language model embeddings retain viral-specific signal: viral proteins remain linearly separable beyond zero-shot perplexity and shallow sequence features. Together, these results suggest that pLM representations are structured by a general notion of nativeness while preserving information specific to distinct biological groups.

24.
bioRxiv (Bioinfo) 2026-06-11

STITCH links cellular morphology and gene expression in spatial transcriptomics

In situ spatial (ISS) sequencing can uncover co-variation between cellular morphology and gene expression in vivo. However, a principled and interpretable mathematical representation of morphology has not yet been applied in this context. In particular, current deep learning-based representations of cell images confound a cell's shape with its size. We present an interpretable representation of cellular boundary contours, based on tangent principal component analysis (TPCA) in a Kendall shape manifold, that captures size-independent contour shape features. This approach successfully recovers shape-perturbing genes in an RNAi screen than a previous metric geometry-based approach. We build on TPCA to develop STITCH (Shape-TranscriptomIc Correlation and Harmonization), an approach to reveal covariation between cell morphology with gene expression in ISS datasets. In a Xenium dataset, STITCH outperforms a deep learning-based approach in both recovering the layered organization of keratinocytes and a spatial gradient in nuclear eccentricity. Across samples in a melanoma CosMx dataset, STITCH reproducibly associates elongated and triangular fibroblasts with proximity to malignant cells and myofibroblast-like transcriptional program. Finally, STITCH independently recovers a known link between mesenchymal-like malignant cell states and increased cell area in two melanoma cohorts. STITCH can thus yield interpretable morphology-transcriptome relationships across cell types, patients, and spatial transcriptomics platforms.

25.
arXiv (CS.AI) 2026-06-17

A Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

arXiv:2507.11178v3 Announce Type: replace-cross Abstract: With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt the component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing the sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model's ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradient between model's input and output to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model's effectiveness in reconstructing gene regulatory networks.