Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (math.PR) 2026-06-11

Matrix Discrepancy for Representations of Finite Groups

arXiv:2606.12181v1 Announce Type: new Abstract: Given a finite group $G$, we prove that there exist signs $\varepsilon\in\{\pm1\}^G$ such that $$\left\| \sum_{g\in G} \varepsilon_g\rho(g) \right\|\leq C\, \sqrt{|G|},$$ where $\rho$ is the left regular representation of $G$, and $C$ is a universal constant. This special case of the Matrix Spencer conjecture was posed in [BKMZ24], where it was established for simple groups.

02.
arXiv (CS.CL) 2026-06-17

Speaking in Self-Assessing Tongues: On the Verbalized Confidence of LLMs in Machine Translation

The rapid rise in popularity of large language models (LLMs) for translation calls for a thorough study of the reliability of their confidence in their own outputs. Unlike many generation tasks, translation errors and confidence levels can be useful at different levels of granularity (tokens, words, or spans). Unsupervised approaches based on internal signals like predicted probabilities can be misleading because they reflect certainty among alternatives rather than correctness. In addition, they require access to such internal signals. Here, we devise five verbalized methods of extracting an LLM's per-token confidence without those shortcomings and compare their reliability with that of the model's internal signals of certainty. We evaluate reliability using two forms of alignment: fine-grained error detection and calibration. For both, internal and verbalized methods perform similarly, although results vary by model. Interestingly, we find little to no correlation between internal and verbalized methods.

03.
arXiv (CS.AI) 2026-06-19

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

arXiv:2606.20508v1 Announce Type: new Abstract: Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

04.
arXiv (CS.AI) 2026-06-12

Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach

arXiv:2606.13192v1 Announce Type: new Abstract: User experience (UX) centered on usability, perceived consistency, and functional clarity is fundamental to real-world user interfaces (UI). The application of multimodal large language models (MLLMs) in the field of user interfaces is evolving rapidly, such as visual element grounding, graphical user interface (GUI) agents, and design-to-code generation. However, research efforts on evaluating UX based on UI screenshots are still immature. To address this, we propose UXBench, a novel multimodal benchmark consisting of 2,000 VQA data samples designed to assess MLLMs' ability to perform UI-based reasoning. UXBench includes 8 tasks based on real-world UI screenshots that require fine-grained diagnosis of UX issues across layout relationships, visual hierarchy, and content consistency. Our extensive evaluation of mainstream MLLMs shows that they remain fundamentally limited in their capacity for UI-based reasoning. The results underscore the need for further advancements in this area. To bridge this gap, we propose UI-UX, an MLLM based on Qwen3-VL-4B-Thinking foundation model and enhanced via reinforcement learning with two key innovations: a reward routing mechanism that dynamically balances perceptual understanding and logical reasoning during inference, and an asymmetric transition reward that suppresses redundant or insufficient reasoning steps. Experiments demonstrate that UI-UX achieves state-of-the-art (SOTA) performance on UXBench, attaining an accuracy of 0.7963 – surpassing Claude-4.5-Sonnet's 0.6550 – while exhibiting strong generalization across diverse UI tasks and maintaining low inference latency.

05.
arXiv (CS.LG) 2026-06-16

Mean-Field Parallel Decoding for Discrete Diffusion Language Models

arXiv:2606.15805v1 Announce Type: new Abstract: Discrete diffusion language models enable parallel token generation, offering a pathway to low-latency decoding. However, selecting tokens independently by marginal confidence limits effective parallelism: tokens that appear reliable in isolation can form incompatible configurations when several positions are updated at once. We introduce a training-free decoding framework that coordinates these parallel updates. At each forward pass, the method assigns a commit score to each masked position and refines these scores using pairwise interactions derived from the model's predictive distributions. A variational relaxation yields a simple fixed-point update that suppresses conflicting simultaneous commitments within a single forward pass. This mechanism allows the decoder to commit more tokens in parallel while maintaining competitive generation quality. The method is lightweight, requires no auxiliary model or retraining, and drops into existing diffusion decoding pipelines without modification. Experiments on reasoning and code-generation benchmarks show consistent improvements in the quality-latency trade-off.

06.
arXiv (CS.AI) 2026-06-18

A Knowledge Theory of Capital:The Value of Natural and Artificial Intelligence

arXiv:2606.18288v1 Announce Type: cross Abstract: This volume develops a knowledge theory of capital for economies in which productive capacity increasingly resides in software, data, models, routines, expertise, platforms, organizations, commons, and public epistemic infrastructure. Beginning from Adam Smith's theory of labour, stock, specialization, and market extent, it asks what changes when knowledge becomes stock-like, mobile across forms, scalable, governable, recombinable, and imperfectly visible in accounting. The book introduces knowledge-bearing stock as the central object and analyses how it is generated, converted into governable form, deployed, improved through feedback, enclosed or shared, measured, impaired, and used as input to future production. It distinguishes embodied, disembodied, institutionalized, commons, and public knowledge forms and develops concepts such as first conversion, cognitive enclosure, feedback capture, dark capital, and expected knowledge loss. The argument is conditional and testable: modern wealth depends not only on capital accumulation, but on how productive knowledge is governed.

07.
arXiv (CS.LG) 2026-06-17

Perron–Frobenius Operator Matching for Generative Modeling

arXiv:2606.17465v1 Announce Type: new Abstract: We introduce Perron–Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback–Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. %On Gaussian mixtures and two-moons, PFOM achieves faster KL/$W_2$/MMD decrease and improved wall-clock efficiency with empirical validation. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

08.
arXiv (quant-ph) 2026-06-12

Approximability limits for bounded-degree max-LINSAT and implications for decoded quantum interferometry

arXiv:2606.13570v1 Announce Type: new Abstract: For general max-k-XORSAT with $k \geq 3$, no polynomial-time algorithm can do substantially better than random guessing on worst-case instances unless $\mathsf{P} = \mathsf{NP}$: approximating beyond the random-assignment value of $1/2$ is $\mathsf{NP}$-hard. The picture changes when each variable appears in at most $D$ constraints. In that bounded-degree setting, polynomial-time algorithms can provably beat the random baseline by an additive amount of order $1/\sqrt{D}$. For Boolean instances, this scaling is known to be optimal: the matching hardness result is due to Trevisan, while the corresponding algorithmic guarantee was established by Barak et al. Whether the same holds over general finite fields, and what it implies for quantum algorithms, has not been established. We make this connection explicit and extend the hardness to max-E$k$-LINSAT$(q,r)$ with bounded degree $D$ and over arbitrary finite fields $\mathbb{F}_q$, proving that it is $\mathsf{NP}$-hard to exceed $r/q + \mathcal{O}_{q,r}(1/\sqrt{D})$. These results provide the complexity-theoretic benchmark for the bounded-degree instances targeted by decoded quantum interferometry (DQI), QAOA, and classical heuristics. Any quantum advantage on bounded-degree instances is therefore confined to the constant prefactor. We further show that in the context of DQI and on $(k,D)$-regular instances, this prefactor is sensitive to the nature of the decoder: DQI with classical decoders faces an information-theoretic $1/\sqrt{D \log D}$ barrier that prevents it from matching the hardness scaling, while DQI with quantum decoders is compatible with the $1/\sqrt{D}$ scaling – identifying quantum decoding as the key ingredient for matching the complexity-theoretic scaling with DQI.

09.
arXiv (CS.AI) 2026-06-18

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

arXiv:2606.18698v1 Announce Type: cross Abstract: The energy-based method remains a comparatively underexamined approach for surface classification in mobile robotics, despite promising results in constrained environments. This study evaluated the viability of using energy-derived features as either a standalone classification modality or as supplementary input to inertial data. A comprehensive evaluation was conducted across three publicly available datasets, comparing the performance of modern deep learning architectures including recurrent neural networks, convolutional neural networks, encoder-only transformers, and Mamba state-space models, under automated hyperparameter tuning and input sequence length optimization. The models achieved higher accuracy than previously reported values on all evaluated datasets, with the convolutional neural network yielding the highest overall performance. When relying exclusively on energy-based features, the models attained classification accuracies in the range of 85-90%, approximately 5-10% lower than those achieved when combined with inertial features (96-99%). Augmenting inertial data with energy features resulted in a consistent mean accuracy improvement of 1-2%. These findings indicate that classifiers relying solely on energy features offer sufficient accuracy for standalone deployment, while also providing a consistent gain when used in combination with other sensing modalities.

10.
PLOS Computational Biology 2026-06-02

Linking reduced prefrontal microcircuit inhibition in schizophrenia to EEG biomarkers in silico

by Sana Rosanally, Frank Mazza, Heng Kang Yao, Faraz Moghbel, Hannah Seo, Etay Hay Reduced cortical inhibition by parvalbumin-expressing (PV) interneurons in schizophrenia is thought to be associated with impaired processing in the prefrontal cortex and altered EEG signals such as oddball mismatch negativity (MMN). Recent studies also suggest loss of somatostatin (SST) interneuron inhibition. However, establishing the link between reduced interneuron inhibition and reduced MMN experimentally in humans is currently not possible. To overcome these challenges, we simulated spiking activity and EEG during baseline and oddball response in detailed models of human prefrontal microcircuits in health and schizophrenia, with reduced PV and SST interneuron inhibition as constrained by postmortem patient data. We showed that reduced PV interneuron inhibition can account for the decreased MMN amplitude seen in schizophrenia, with a threshold below which the amplitude effect was low as seen in at-risk patients. In contrast, reduced SST interneuron inhibition did not affect the MMN amplitude. We further showed that both types of inhibition loss were necessary to account for changes in resting EEG in schizophrenia, with reduced SST interneuron inhibition increasing broadband power, and reduced PV and SST interneuron inhibition both leading to a right shift from alpha to beta frequencies. Our study thus links reduced PV and SST interneuron inhibition in schizophrenia to distinct EEG biomarkers that can serve to improve stratification and early detection using non-invasive brain signals.

11.
arXiv (CS.CV) 2026-06-17

Colab NAS: Obtaining lightweight task-specific convolutional neural networks following Occam's razor

The current trend of applying transfer learning from convolutional neural networks (CNNs) trained on large datasets can be an overkill when the target application is a custom and delimited problem, with enough data to train a network from scratch. On the other hand, the training of custom and lighter CNNs requires expertise, in the from-scratch case, and or high-end resources, as in the case of hardware-aware neural architecture search (HW NAS), limiting access to the technology by non-habitual NN developers. For this reason, we present ColabNAS, an affordable HW NAS technique for producing lightweight task-specific CNNs. Its novel derivative-free search strategy, inspired by Occam's razor, allows to obtain state-of-the-art results on the Visual Wake Word dataset, a standard TinyML benchmark, in just 3.1 GPU hours using free online GPU services such as Google Colaboratory and Kaggle Kernel.

12.
arXiv (math.PR) 2026-06-19

Asymptotic properties for fully coupled delayed forward-backward stochastic differential equations

arXiv:2606.19925v1 Announce Type: new Abstract: We investigate the asymptotic behavior of solutions to a class of fully coupled forward-backward stochastic differential equations with time-delayed generators. Such systems arise naturally in stochastic models with memory effects and constitute a significant extension of the classical fully coupled FBSDE framework. The presence of delay introduces additional analytical difficulties due to the dependence of the coefficients on the past trajectories of the solution processes and the resulting non-Markovian structure. Under suitable assumptions on the coefficients, we study the asymptotic properties of a perturbed delayed FBSDE driven by a small noise parameter. We first establish the convergence in distribution of the associated solution processes as the perturbation parameter tends to zero. We then prove almost sure convergence towards the solution of the corresponding deterministic limiting system. As a consequence of these asymptotic results, we derive a large deviation principle for the solution processes. Our results extend the asymptotic analysis of Cruzeiro, Gomes and Zhang (2014) from the classical fully coupled FBSDE setting to the delayed framework, and complement existing works on weakly coupled delayed forward-backward systems. They provide, to the best of our knowledge, the first large deviation principle for fully coupled forward-backward stochastic differential equations with delayed generators.

13.
arXiv (quant-ph) 2026-06-15

Who can compete with quantum computers? Lecture notes on quantum inspired tensor networks computational techniques

arXiv:2601.03035v2 Announce Type: replace Abstract: This is a set of lectures on tensor networks with a strong emphasis on the core algorithms involving Matrix Product States (MPS) and Matrix Product Operators (MPO). Compared to other presentations, particular care has been given to disentangle aspects of tensor networks from the quantum many-body problem: MPO/MPS algorithms are presented as a way to deal with linear algebra on extremely (exponentially) large matrices and vectors, regardless of any particular application. The lectures include well-known algorithms to find eigenvectors of MPOs (the celebrated DMRG), solve linear problems, and recent learning algorithms that allow one to map a known function into an MPS (the Tensor Cross Interpolation, or TCI, algorithm). The lectures end with a discussion of how to represent functions and perform calculus with tensor networks using the "quantics" representation. They include the detailed analytical construction of important MPOs such as those for differentiation, indefinite integration, convolution, and the quantum Fourier transform. Three concrete applications are discussed in detail: the simulation of a quantum computer (either exactly or with compression), the simulation of a quantum annealer, and techniques to solve partial differential equations (e.g. Poisson, diffusion, or Gross-Pitaevskii) within the "quantics" representation. The lectures have been designed to be accessible to a first-year PhD student and include detailed proofs of all statements.

14.
arXiv (CS.CV) 2026-06-16

LOCUS: Local Visual Cue Search for Enhancing Fine-Grained Perception in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) remain unreliable on fine-grained visual perception, even when high-resolution inputs preserve the necessary local details. We identify this limitation as visual context rot: decisive evidence may exist in the full image, yet fail to be reliably selected and used amid redundant visual context. We propose LOCUS (LOcal visual CUe Search), a training framework that teaches MLLMs to internalize local evidence search through a verifiable proxy task. During training, LOCUS provides a local crop as a visual cue and optimizes the model to recover its spatial support in the full image using an IoU-based reward. The visual cue is used only during training, leaving the standard image-question inference interface unchanged. Experiments across fine-grained perception, hallucination, general understanding, and reasoning benchmarks show that LOCUS improves localization-sensitive visual understanding while preserving broad capabilities. Attention analyses further indicate stronger focus on task-relevant evidence regions, suggesting that training-time visual cue search provides an effective route to internalized fine-grained evidence selection.

16.
arXiv (CS.CL) 2026-06-16

LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed number of reverse denoising steps selected before decoding, spending computation on already-stable positions and sometimes committing unstable ones too early. We present \textsc{LESS}, a training-free, model-agnostic adaptive sampler that treats token commitment as an online stopping problem. \textsc{LESS} implements mutual-stability sampling through a joint stability rule that makes a masked position eligible for unmasking only when its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under top-$K$ inter-step Jensen–Shannon divergence. We evaluate \textsc{LESS} on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B, covering full-sequence diffusion and semi-autoregressive blockwise sampling regimes, across seven benchmarks spanning general knowledge, math, and code. \textsc{LESS} improves average accuracy over strong training-free adaptive samplers while using $72.1\%$ fewer reverse steps than fixed-budget decoding. Since each reverse step requires a Transformer forward pass, these step-count reductions translate into fewer forward evaluations, lower measured wall-clock latency, and lower estimated inference compute.

17.
arXiv (CS.CL) 2026-06-16

HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools

Production LLM deployments increasingly maintain heterogeneous model pools spanning order-of-magnitude cost differences. Existing routers make binary strong-vs-weak decisions and couple learned parameters to specific model identities, requiring retraining whenever the catalog changes. We present HyDRA (Hybrid Dynamic Routing Architecture), a framework that predicts fine-grained, multi-dimensional capability requirements per query and matches them against configuration-defined model profiles via shortfall matching. A ModernBERT encoder with K=4 independent sigmoid heads scores each query along reasoning, code generation, debugging, and tool use; a shortfall-matching algorithm then selects the cheapest model whose capabilities meet the predicted requirements. The deployed predictor runs at 86 ms median CPU inference latency in production, and is fully decoupled from the model catalog – adding or removing models requires only a configuration change, with zero retraining. On SWE-Bench Verified (5-model pool: GPT-5.4-mini, Claude Haiku 4.5, GPT-5.3 Codex, Claude Sonnet 4.6, GPT-5.4), HyDRA's tunable shortfall threshold spans three regimes: peak-quality exceeds the always-strong Claude Sonnet 4.6 baseline (75.4% vs. 74.2% resolution) at 12.9% cost savings; iso-quality matches Sonnet at 54.1% cost savings, a 6x improvement over our prior in-house binary router at 9.1%; aggressive pushes savings to 72.5% for a 3.2-point quality trade. Results generalize across LiveCodeBench, BigCodeBench, and tau-bench. HyDRA is deployed to all users in GitHub Copilot's VS Code Chat auto-mode and – to our knowledge for the first time in the LLM routing literature – demonstrates language-invariant routing across CJK, European, and other script families.

18.
arXiv (CS.CV) 2026-06-16

Polyp-D2ATL: Deep Domain-Adaptive Transfer Learning for Colorectal Polyp Classification under Label Distribution Shift

Early and highly accurate prediction of colorectal polyps, as an important sign of one of the most dangerous types of cancer, will result in saving more lives. Despite the advancements in colorectal polyp classification, many challenges remain in obtaining an automated polyp prediction system that is able to diagnose the difficult-to-predict polyps accompanied by different features in real scenarios, where the model can handle imbalanced data, label distribution shift, and cross-modality generalization successfully. In this study, we propose Polyp-D2ATL, a novel framework accompanied by a specific training strategy, which mitigates these limitations and effectively predicts the different classes of polyps belonging to the NICE classification. Our extensive experiments on the PICCOLO validation and test sets demonstrate that the proposed Polyp-D2ATL significantly outperforms existing state-of-the-art models across various reliable metrics, achieving an accuracy of 82.38%, a Macro-F1 of 77.49%, and a specificity of 87.47% on the validation set, alongside consistent improvements on the held-out test set which demonstrates the generalization capacity and clinical applicability of the proposed approach.

19.
arXiv (CS.AI) 2026-06-15

Hierarchical ODE: Learning Continuous-Time Physical Prototypes for Early Link Failure Detection

arXiv:2606.14284v1 Announce Type: cross Abstract: Time series prototype learning is fundamentally challenged by observational ambiguity. Discrete architectures fail to resolve this, as they lack the capacity to decouple stochastic noise from continuous dynamics. Furthermore, rigid closed-set assumptions fail to capture unseen diversity. To address these limitations, we propose a hierarchical ordinary differential equation clustering network, which utilizes neural ordinary differential equation to model latent state evolution as a continuous integral curve. This formulation enforces temporal continuity to effectively disentangle smooth feature trends from stochastic noise, while our adaptive hierarchical mechanism autonomously determines the appropriate number of prototypes without rigid prior constraints. Validated on the early link failure detection task with irregularly sampled time series, the proposed method effectively extracts underlying physical prototypes, thereby enabling robust failure detection. Our code is available at https://github.com/NJ-LNN/Hierarchical-ODE.

20.
arXiv (CS.LG) 2026-06-18

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

arXiv:2406.07775v2 Announce Type: replace Abstract: Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.

21.
arXiv (CS.CL) 2026-06-11

Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

Large language models (LLMs) encode rich semantic information in their hidden states, yet it remains difficult to understand what information these internal representations capture. Latent concepts extracted from hidden states offer a promising direction for interpreting LLMs, but existing clustering-based methods face a trade-off: hierarchical clustering produces coherent concepts but is limited to small datasets due to its quadratic memory cost, while K-Means scales efficiently but may yield less semantically coherent concepts. We propose Vector Quantized Latent Concept (VQLC), a discrete concept learning framework that learns a codebook of latent concepts on frozen hidden states. Across 12 dataset-model settings, VQLC stays close to K-Means in computational cost, scales better than hierarchical clustering, and remains competitive in faithfulness, with the clearest gains on decoder-only models. LLMs-based evaluation, qualitative analysis, and a Sparse Autoencoder (SAE) comparison demonstrate that the learned concepts are interpretable and task-relevant.

22.
arXiv (CS.AI) 2026-06-17

FllumaOne: A Code-Native Multimodal CAD Dataset with Executable Programs and Kernel-Validated Feature Histories

作者:

arXiv:2606.17696v1 Announce Type: new Abstract: Parametric computer-aided design records both final geometry and the ordered construction history that determines how a part can be edited. Datasets for editable CAD research should therefore expose modeling operations, parameters, and feature dependencies together with validated geometry. We introduce FllumaOne, a code-native multimodal CAD dataset whose models are generated by executable Python programs in Flluma, a Qt/C++ OpenCASCADE-based CAD system. Each sample aligns its program with a structured feature tree, a training-oriented intermediate representation, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings. The primary release, FllumaOne-100K, contains 100,000 accepted samples across four template-level complexity regimes. Programs are executed and retained only after kernel geometry, solid validity, and export checks; release reports also record modality completeness and split-level duplicate tests. A Qwen2.5-Coder-1.5B LoRA baseline trained on 80,000 samples achieves 99.98% Python syntax validity, 99.97% Flluma build success, and 99.14% STEP-export validity on the held-out 10,000-sample test split. For the 9,909 predictions converted to surface point clouds, the mean normalized Chamfer Distance is 0.002124. The dataset supports conditioned CAD reconstruction, executable program synthesis, feature-tree prediction, B-Rep analysis, retrieval, design completion, and editable reverse engineering.

23.
PLOS Computational Biology 2026-06-22

pyhgf: A neural network library for predictive coding

by Nicolas Legrand, Lilian Weber, Peter Thestrup Waade, Anna Hedvig Møller Daugaard, Mojtaba Khodadadi, Nace Mikuš, Christoph Mathys Bayesian models of cognition have gained considerable traction in computational neuroscience and psychiatry. Their scopes are now expected to expand rapidly to artificial intelligence, providing general inference frameworks to support embodied, adaptable, and energy-efficient autonomous agents. A central theory in this domain is predictive coding, which posits that learning and behaviour are driven by hierarchical probabilistic inferences about the causes of sensory inputs. Biological realism constrains these networks to rely on simple local computations in the form of precision-weighted predictions and prediction errors. This can make this framework highly efficient, but its implementation comes with unique challenges on the software development side. Embedding such models in standard neural network libraries often becomes limiting, as these libraries’ compilation and differentiation backends can force a conceptual separation between optimization algorithms and the systems being optimized. This critically departs from other biological principles such as self-monitoring, self-organisation, cellular growth, and functional plasticity. In this paper, we introduce pyhgf: a Python package backed by JAX and Rust for creating, manipulating, and sampling dynamic networks for predictive coding. We improve over other frameworks by enclosing the network components as transparent, modular, and malleable variables in the message-passing steps. The resulting graphs can implement arbitrary algorithms as belief propagation. Moreover, the transparency of core variables can also translate into inference processes that leverage self-organisation principles and express structure learning, meta-learning, or causal discovery as the consequence of network structural adaptation to surprising inputs. The main functions of the library are differentiable and seamlessly integrate into sampling or optimization workflows. Additionally, we offer generalized Bayesian filtering and the hierarchical Gaussian filter as key examples of dynamic networks implemented in our library. The source code, tutorials, and documentation are hosted under the main repository at https://github.com/ComputationalPsychiatry/pyhgf.

24.
arXiv (CS.CL) 2026-06-15

Did You Forget What I Asked? Prospective Memory Failures in Large Language Models

作者:

Large language models often fail to satisfy formatting instructions when they must simultaneously perform demanding tasks. We study this behaviour through a prospective memory inspired lens from cognitive psychology, using a controlled paradigm that combines verifiable formatting constraints with benchmark tasks of increasing complexity. Across three model families and over 8,000 prompts, compliance drops by 2-21% under concurrent task load. Vulnerability is highly type-dependent: terminal constraints (requiring action at the response boundary) degrade most, with drops up to 50%, while avoidance constraints remain comparatively robust. A salience-enhanced format (explicit instruction framing plus a trailing reminder) recovers much of the lost compliance, restoring performance to 90-100% in many settings. Interference is bidirectional: formatting constraints can also reduce task accuracy, with one model's GSM8K accuracy dropping from 93% to 27%. In additional stacking experiments, joint compliance declines sharply as constraints accumulate. All results use deterministic programmatic checkers without an LLM-as-judge component on publicly available datasets.

25.
bioRxiv (Bioinfo) 2026-06-08

DDI_single: Single-Sequence-Based Protein Domain Assembly

作者:

Domains are the basic units of protein structure and function. Appropriate inter-domain organization is critical to enable cooperative execution of multiple related functions. It is thus a crucial step to determine the full-length structure of multi-domain proteins for the purpose of elucidating their functions and designing new drugs to regulate these functions. Existing structure prediction algorithms are generally better at solving the internal conformation of domains, rather than modeling the relative positions between domains. To address the challenge of accurately determining multi-domain protein conformations, we develop a single-sequence-based domain assembly algorithm called DDI_single. DDI_single directly extracts features from the amino acid sequence using the protein language model ESM-1b, and accurately predicts the interactions between residue pairs of structural domains through a novel gated cross-attention module, thus achieving the correct assembly of structural domains. With the knowledge of domain definition, DDI_single achieves more than 20% higher accuracy in the task of predicting the relative distances of residue pairs between domains than that of the single-sequence-based structure prediction algorithm trRosettaX_single. When assembling domains with known spatial conformations, DDI_single correctly assembles 74.4% of the samples in the test set (TM-score>0.5). When assembling domains with unknown spatial conformations, in cases where the internal spatial conformations of domains are correctly modeled, DDI_single correctly assembles 73.9% of the samples.