Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
Science (Express) 2026-06-04

Long-range extended chains arising from polymerization-driven spontaneous assembly | Science

作者: 未知作者

A central challenge for conjugated polymers is to achieve long-range order while remaining solution-processable, which is essential for matching the electrical performance of their counterparts of crystalline inorganic semiconductors. Here we show that n-doped poly(benzodifurandione) (n-PBDF) can undergo polymerization-driven spontaneous assembly (PSA), in which chain growth, chemical doping, and structural ordering are intrinsically coupled, yielding long-range chain extension over hundreds of nanometers. We reveal that the spontaneously formed n-PBDF nanoribbons arise from a self-initiated, convergent growth mechanism driven by cooperative monomer–polymer interactions and stabilized by proton-coupled duplex chains and the polymer’s intrinsic polyelectrolyte character. With long-range extended chains in the nanoribbons, the aligned n-PBDF thin films demonstrate metallic-level conductivity (>10 4 Siemens per centimeter).

02.
arXiv (CS.CL) 2026-06-16

Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI

Large language models (LLMs) are increasingly used for knowledge-intensive question answering, including religious and legal questions. Islamic knowledge is a particularly demanding setting: answers are expected to be grounded in authoritative sources, citations must be exact, Arabic varieties differ substantially from the language of classical sources, and legitimate jurisprudential disagreement must be represented rather than collapsed into a single answer. This survey reviews the emerging field of Islamic LLMs and trustworthy Islamic AI. We organize the literature around Arabic NLP and Arabic-centric LLMs, Islamic NLP resources, Qur'anic question answering, Islamic knowledge benchmarks, retrieval-augmented generation, Islamic legal reasoning, inheritance reasoning, hallucination evaluation, and trustworthiness. We argue that fluency in Arabic is not sufficient for Islamic AI. Reliable systems require curated sources, retrieval and verification modules, citation-aware generation, madhhab-aware reasoning, human expert evaluation, and benchmarks that measure not only answer accuracy but also faithfulness, source validity, and reasoning quality. The survey concludes with a research agenda for hallucination-resistant Islamic AI systems.

03.
arXiv (CS.LG) 2026-06-16

David vs. Goliath in Next Activity Prediction: Argmax vs. LSTM, Transformer, and LLM

arXiv:2606.15868v1 Announce Type: new Abstract: Next activity prediction (NAP) is a cornerstone of predictive process monitoring (PPM), enabling organizations to move from retrospective analysis to proactive process steering. The PPM field has progressed from classical machine learning through deep learning architectures such as LSTMs and Transformers to large language models (LLMs). Despite growing model complexity, no benchmark jointly compares LLMs, Transformers, LSTMs, and simple baselines in a direct sequence modeling setting for NAP. In this paper, we fill this gap with a systematic benchmark. We compare vocabulary-adapted LLMs, Transformers trained from scratch, LLM-distilled Transformers, and LSTMs against a simple counting-based argmax baseline across seven real-life event logs. Our results tell a David vs. Goliath story: pretraining confers no consistent improvement over training from scratch, model size shows little effect on performance, and on most datasets the argmax baseline matches or approaches the performance of billion-parameter LLMs.

04.
arXiv (CS.CV) 2026-06-19

Timage: A Generative Text-in-Image Paradigm for Fine-Tuning Vision-Language Models

Multimodal Large Language Models (MLLMs) often lose track of the right image regions during fine-grained spatial reasoning, because a textual query rarely carries any explicit geometric anchor into the pixel domain. Prevailing remedies either rewire the model's weights or pad the prompt with verbose instructions, yet neither reliably pins the language to the correct visual coordinates without eroding the backbone's general competence. We introduce Timage, a paradigm that recasts multimodal understanding as an alignment problem solved at the input: the query is drawn, as a typeset overlay, onto the image itself. The placement and appearance of this overlay are produced by a Constrained Schrödinger Bridge (cSB), an entropic optimal-transport sampler that factorizes layout synthesis into two coupled stochastic stages. The first stage, Region Search, transports noise toward query-aligned image zones while obeying a hard occlusion barrier that protects salient foreground content; the second stage, Appearance Shaping, sizes the glyphs through an ``ink-budget'' regularizer so that the rendered text stays legible and visually balanced. The resulting overlay behaves as an explicit attention beacon that channels the model's focus along spatial semantics. On the VMCBench suite, Timage paired with a modest 7B backbone clearly overtakes far larger proprietary systems as well as parameter-tuned baselines. The study positions deliberate input reconstruction as a powerful, architecture-neutral lever for strengthening multimodal reasoning.

05.
arXiv (CS.CL) 2026-06-24

Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions

Recent studies have raised significant concerns regarding the reliability of current mathematics benchmarks, highlighting issues such as simplistic design and potential data contamination. Consequently, developing a reliable benchmark that effectively evaluates large language models' (LLMs) genuine capabilities in mathematical reasoning remains a critical challenge. To address these concerns, we propose RV-Bench, a novel evaluation methodology for Benchmarking LLMs with Random Variables in mathematical reasoning. Specifically, we build question-generating functions to produce random variable questions (RVQs), whose background content mirrors original benchmark problems, but with randomized variable combinations, rendering them "unseen" to LLMs. Models must completely understand the inherent question pattern to correctly answer RVQs with diverse variable combinations. Thus, an LLM's genuine reasoning capability is reflected through its accuracy and robustness on RV-Bench. We conducted extensive experiments on over 30 representative LLMs across more than 1,000 RVQs. Our findings propose that LLMs exhibit a proficiency imbalance between encountered and ``unseen'' data distributions. Furthermore, RV-Bench reveals that proficiency generalization across similar mathematical reasoning tasks is limited, but we verified it can still be effectively elicited through test-time scaling.

06.
arXiv (CS.AI) 2026-06-19

Human-AI Agent Interaction in a Business Context

arXiv:2606.18716v1 Announce Type: cross Abstract: As AI agents are increasingly integrated into core business processes, understanding and designing effective interaction patterns between humans and AI agents becomes crucial for value creation. This study identifies and evaluates principles and criteria for a positive User Experience (UX) with AI agents, along with methods for its measurement. We identify user expectations and needs to facilitate adoption, build trust, and support user-centered decision-making by development teams. Using a mixed-methods approach that combines qualitative and quantitative techniques, we explore interaction patterns between humans and AI agents. The findings from this exploratory research serve as the basis to develop a survey experiment which evaluates the effectiveness of specific design elements on a larger scale. This foundational research contributes to the development of more intuitive and effective human-AI agent interactions in business settings.

07.
bioRxiv (Bioinfo) 2026-06-18

Benchmarking attention-based methods for vision transformers' interpretability in retinal fundus imaging

Deep learning models based on Vision Transformers (ViTs) have shown strong performance in retinal fundus imaging, but their interpretability remains poorly understood. In particular, attention-based attribution methods are widely used to explain ViT predictions, despite limited evaluation of their faithfulness and biological relevance in medical imaging. Here, we systematically benchmark four attention-based interpretability methods for RETFound, a retinal ViT-based foundation model, that we previously fine-tuned to predict 17 retinal vascular phenotypes from UK Biobank fundus images1. We compare raw attention, attention rollout, gradient-weighted attention rollout, and Chefer's hybrid relevance-based method using both qualitative visualisation and quantitative evaluation frameworks. To assess attribution faithfulness, we perform perturbation-based deletion and insertion experiments, quantifying changes in model predictions as highly attended image regions are progressively removed or restored. To evaluate biological specificity, we run structure-aware analyses combining attribution maps with vessel segmentation and artery-vein labels through the Relative ratio of Attention Intensity (RAI) metric. Across models, attribution maps differed substantially depending on the selected interpretability method, highlighting the need for rigorous quantitative evaluation. Among the evaluated approaches, gradient-weighted attention rollout consistently achieved the strongest perturbation performance and produced attribution maps most closely aligned with the anatomical definition of the predicted retinal traits. Furthermore, vessel-type specific models systematically concentrate attention on the corresponding vascular structures despite being trained using only a single scalar value per image as supervision. These findings demonstrate that attention-based attribution methods capture biologically meaningful vascular representations, while also revealing method-dependent variability in attribution behaviour. This work provides a quantitative framework for evaluating interpretability methods in medical imaging with annotated segmentation and contributes toward more transparent and biologically grounded medical AI systems.

08.
arXiv (CS.CV) 2026-06-11

IB-HFN: Information Bottleneck-Driven SAR-Optical Fusion Network for High-Fidelity Cloud Removal

Synthetic aperture radar (SAR)-assisted optical cloud removal aims to recover surface information obscured by clouds in optical remote sensing images by exploiting complementary SAR observations. Existing multimodal fusion methods typically rely on direct spatial concatenation and pixel-wise supervision, which can propagate SAR speckle noise into optical reconstruction and lead to over-smoothed results. To address these limitations, we propose an Information Bottleneck-driven High-Fidelity Network (IB-HFN) for SAR-assisted optical cloud removal. IB-HFN employs a dual-stream backbone to preserve modality-specific representations before deep semantic fusion, thereby mitigating premature cross-modal contamination. At the fusion stage, we introduce a Spatial Information Bottleneck Fusion module that compresses SAR features through a channel-wise variational information bottleneck to suppress unstructured speckle noise. In parallel, a local-global gating mechanism predicts clear-sky regions and routes reliable optical details through a Dirac-initialized skip connection, decoupling noise suppression from texture preservation. We further develop a joint optimization strategy that integrates feature-level bottleneck regularization with image-level constraints on reconstruction accuracy, structural consistency, spectral fidelity, and contrastive sharpness. A dynamic weighting schedule balances these objectives to stabilize training and reduce hazy artifacts. Experiments on the SEN12MS-CR dataset under challenging spatio-temporal splits demonstrate that IB-HFN achieves superior structural preservation and spectral fidelity over existing methods.

09.
arXiv (math.PR) 2026-06-16

Phase Transition in Convex Relaxations for Graph Alignment

arXiv:2606.15581v1 Announce Type: cross Abstract: We study the graph alignment problem for correlated Gaussian Orthogonal Ensemble (GOE) matrices, where the goal is to recover a hidden vertex permutation given two correlated symmetric Gaussian matrices $(A, B)$ with correlation $1/\sqrt{1+\sigma^2}$. While the maximum likelihood estimator is information-theoretically optimal, its computation, which reduces to a quadratic assignment problem, is intractable. Motivated by this, we analyze convex relaxations based on minimizing $\|AX - XB\|_F$ over the set of doubly stochastic matrices and the unit hypercube. We show that when the correlation parameter satisfies $\sigma = o(n^{-1/2}/\log^4 n)$, the solution of either relaxation $(X^\star)$ concentrates around the ground-truth permutation matrix $(\Pi^\star)$, i.e., $\|X^\star-\Pi^\star\|_F^2 = o(n)$, implying recovery of all but a vanishing fraction of vertices after simple post-processing. Combined with existing lower bounds, our results precisely characterize that $\|X^\star-\Pi^\star\|_F^2$ transitions from $o(n)$ for $\sigma = \tilde{o}(n^{-1/2})$ to $\Omega(n)$ for $\sigma = \tilde{\Omega}(n^{-1/2})$. In doing so, our analysis significantly tightens prior results and extends them beyond doubly stochastic relaxations.

10.
bioRxiv (Bioinfo) 2026-06-23

Automated Segmentation of Prostatic Gold Fiducial Markers for MR-Only Radiotherapy Planning Using Multi-Modal Consensus Deep Learning

Purpose: To develop and evaluate a multi-model consensus deep learning approach for automated gold fiducial marker (FM) segmentation in T1-weighted prostate MRI. Materials and Methods: In this retrospective study, T1-weighted MRI and CT-derived reference standard segmentations were collected from 127 prostate cancer patients (all male; mean age, 70 years +/- 7 [standard deviation]; age range, 50-88 years; collected between October 2020 and January 2026) who each had three implanted gold FMs. A 3D U-Net was trained on 93 subjects using four random seeds to produce an ensemble. At inference, marker-class probability maps were averaged across models and the top three connected components selected. Performance was evaluated on 34 temporally held-out subjects (9 tuning, 25 test) using marker-level sensitivity and precision with exact (Clopper-Pearson) 95% confidence intervals (CIs). A model count ablation study was performed. The pipeline was deployed for on-scanner processing on Siemens MRI systems via the OpenRecon framework and as a browser-based application using WebAssembly, executing entirely client-side. Results: The four-model consensus achieved 96% (70 of 73) sensitivity and 95% (70 of 74) precision on 25 test subjects, with 29 of 34 (85%) subjects achieving perfect marker detection. Single models had a mean sensitivity of 84% (SD, 9%), improving to 96% with four-model consensus (SD,

11.
medRxiv (Medicine) 2026-06-22

Impact of Antidiabetic Medications on IgG and Plasma Protein N-Glycosylation in Type 2 Diabetes Patients

Introduction. Diabetes is a growing global health challenge, necessitating effective management strategies. Glycosylation, a highly regulated post-translational protein modification, has emerged as a pivotal factor in diabetes pathophysiology. However, the modulation of protein glycosylation by antidiabetic treatment is still largely unknown. This study explored the longitudinal effects of four distinct antidiabetic therapies - metformin, insulin, sodium-glucose cotransporter-2 (SGLT2) inhibitors, and glucagon-like peptide-1 receptor agonists (GLP-1RA) - on plasma protein and immunoglobulin G (IgG) glycosylation in patients with type 2 diabetes (T2D). Research Design and Methods. Plasma protein and IgG N-glycans were enzymatically released, purified and chromatographically profiled in a cohort of 124 patients, examined at four time points, to assess therapy-induced glycan alterations. Linear mixed models adjusting for covariates and multiple testing (FDR

12.
medRxiv (Medicine) 2026-06-13

Projected population level impact and cost-effectiveness of clinic and community-based tuberculosis screening approaches

The South Africa National Department of Health have set ambitious targets to scale up TB testing, focusing primarily on clinic attendees. In the context of declining funding for TB care and prevention, the most cost-effective approaches for targeting testing should be identified. We developed a mathematical model of TB in South Africa, explicitly incorporating clinic attendance by sex and HIV/ART status. We simulated six screening approaches over 2026-2035 (individually and in combination): three clinic-based (symptom screening, intensified targeted universal TB testing [TUTT, symptom-agnostic sputum testing of clinic attendees in key risk groups], and intensified TUTT allowing saliva samples) and three targeted community-based (community radiographic screening, symptom screening, and universal Xpert Ultra testing), each implemented at a range of coverage levels. Model outputs were combined with a mechanistic cost function to estimate potential impact and cost-effectiveness from a societal perspective. The most cost-effective standalone approach was community radiographic screening at 10% annual population coverage, with an incremental cost-effectiveness ratio (ICER) of $421 per disability-adjusted life year (DALY) averted. 10/11 scenarios along the expansion path included community radiographic screening at progressively higher coverage, combined with a clinic-based approach. Combining complementary approaches to reach both groups at increased risk of TB (e.g. clinic-based screening) and groups with lower screening coverage (e.g. community-based screening) may increase cost-effectiveness of TB screening, compared to standalone approaches. When designing TB screening strategies, both population risk and existing screening coverage should be considered.

13.
arXiv (CS.CV) 2026-06-16

Double-Helix Vision (DH-V2): A Geometry-Based Visual Sampler for Bandwidth-Constrained Perception

作者:

We present Double-Helix Vision (DH), a geometry-based visual sampler that compresses 2D images into compact 1D signals using paired golden-ratio-inspired spiral trajectories. Rather than processing every pixel uniformly, DH employs two phase-shifted helices (Alpha and Beta, offset by 180 degrees) to sample the image with biologically-inspired foveation: high density at the center, sparse coverage at the periphery. At 4K resolution, DH achieves a 1,433x compression ratio (99.93% reduction) while preserving the geometric structure of the scene. The full perception pipeline – including spatial mapping, temporal collision detection, and intra-frame structural disparity estimation – runs in 0.52 ms at 1080p on CPU-only hardware, with no neural network dependencies. On CIFAR-10 at extreme sampling budgets (K=128 points per helix), DH achieves a +6.03% accuracy gain over uniform random sampling. A JSON-serializable Robotics API is provided, delivering sub-millisecond spatial perception reports in 2.7 KB packets. Code and benchmarks are available under the MIT License.

14.
medRxiv (Medicine) 2026-06-15

Anti-Platelet Factor 4 Antibody Clonal Heterogeneity and MGUS Status in HIT

Background Monoclonal gammopathy of thrombotic significance (MGTS) is a recently described chronic prothrombotic condition characterized by monoclonal anti-PF4 antibodies that are detected above the polyclonal antibody background in patient sera (i.e. present as monoclonal gammopathy of undetermined significance, MGUS). Due to conflicting data in the published literature on antibody clonality in heparin-induced thrombocytopenia (HIT), we evaluated clonality and abundance of anti-PF4 antibodies in HIT, including investigating whether an MGUS, if present in HIT, represents the causative anti-PF4 antibody. Methods Blood samples from 15 patients with HIT were subject to Platelet Factor 4-dependent antigen-based and functional tests. The unmanipulated serum antibody repertoire and isolated anti-PF4 antibodies were subjected to mass spectrometric evaluation. Results Two of the 15 HIT patients had an IgG MGUS. Notably, anti-PF4 antibodies were not synonymous with the MGUS antibody in either of the two patients. Eight of the 15 patients demonstrated monoclonal anti-PF4 antibodies, however, none of the anti-PF4 antibodies were detectable as an MGUS upon evaluation of the entire serum antibody repertoire, reflecting their low abundance. In the seven patients with multiple anti-PF4 antibodies, non-monoclonality was confirmed by analysis of deglycosylated antibody heavy chains. Conclusions Anti-PF4 HIT antibodies are monoclonal in approximately 50% of HIT patients, however, antibody abundance is low such that they are not detectable over the polyclonal IgG background (i.e. are MGUS-negative), differentiating HIT from MGTS. This observation helps explain the transient nature of HIT relative to the persistent prothrombotic state seen in MGTS.

15.
medRxiv (Medicine) 2026-06-15

Multidimensional nutritional assessment in Crohns disease: cross-sectional comparison of active disease and remission

Malnutrition is common in Crohns disease (CD), and its assessment requires multiple tools. Comprehensive evaluation of nutritional status in a population with CD, predominantly characterized by metabolic phenotype, was inadequately reported. This study evaluated the nutritional status of CD patients using anthropometric, clinical, and biochemical measures and compared patients with active disease with those in remission. This cross-sectional study included 127 adults with CD: 63 with active disease and 64 in remission. Disease activity was classified using the Crohns Disease Activity Index, the Simple Endoscopic Score for Crohns Disease, and magnetic resonance enterography. Nutritional assessment included body mass index (BMI), mid-upper arm circumference, calf circumference, triceps skinfold thickness, mid-arm muscle circumference, Mini Nutritional Assessment-Short Form (MNA-SF), and biochemical markers including hemoglobin, serum iron, folate, vitamin B12, albumin, and zinc. Malnutrition was defined using the Global Leadership Initiative on Malnutrition criteria. Overall, 47.2% of participants were malnourished. Malnutrition was significantly more frequent in active disease than in remission (81.0% vs. 14.1%, P

16.
arXiv (quant-ph) 2026-06-12

Symmetry and Topology of Monitored Quantum Dynamics

arXiv:2412.06133v4 Announce Type: replace-cross Abstract: The interplay between unitary dynamics and quantum measurements induces diverse phenomena in open quantum systems with no counterparts in closed quantum systems at equilibrium. Here, we generally classify Kraus operators and their effective non-Hermitian dynamical generators, thereby establishing the tenfold classification for symmetry and topology of monitored free fermions. Our classification elucidates the role of topology in measurement-induced phase transitions and identifies potential topological terms in the corresponding nonlinear sigma models. Furthermore, we establish the bulk-boundary correspondence in monitored quantum dynamics: nontrivial topology in spacetime manifests itself as topologically nontrivial steady states and gapless boundary states in Lyapunov spectra, such as Lyapunov zero modes and chiral edge modes, leading to the topologically protected slowdown of dynamical purification.

17.
Nature Biotechnology 2026-06-11

Large-scale, spatially resolved panoramic CRISPR screening in native tissue environments using Perturb-DBiT

作者:

Spatially resolved CRISPR screening in vivo has been limited to small perturbation panels and subsets of protein-coding RNAs. We present Perturb-DBiT, a method for co-sequencing of spatial total RNA whole transcriptomes and single guide RNAs (sgRNAs) on the same tissue section in situ. In a human cancer metastatic colonization model, we applied large (80,000+) sgRNA panels across tumor colonies in multiple consecutive tissue sections alongside their corresponding total RNA transcriptomes. We linked perturbations affecting long noncoding RNA covariation, microRNA–mRNA interactions and distinct amino acid-specific tRNA alterations to tumor migration and growth. By integrating transcriptional pseudotime trajectories, we further observed the impact of perturbations on clonal dynamics and cooperation. In an immune-competent syngeneic mouse model, investigation of the tumor immune microenvironment indicated distinct, synergistic effects on immune infiltration and suppression. Perturb-DBiT provides a spatially resolved comprehensive view of perturbation responses in complex tissues, including small and large RNA regulation, tumor proliferation, migration, metastasis and immune interactions. In vivo CRISPR genetic perturbations are spatially mapped at scale.

18.
arXiv (CS.LG) 2026-06-15

Recipe-Controlled Decoder Audit for Structural Knowledge-Graph Completion

arXiv:2606.14492v1 Announce Type: new Abstract: We present a recipe-controlled decoder audit (RCDA) for structural transductive knowledge-graph completion (KGC). The audit asks a simple reporting question: before attributing gains to an encoder or training recipe, what changes when the decoder is swapped under the same recipe? Using ComplEx and DistMult as the primary controlled pair, with targeted RotatE/TransE spot-checks, we evaluate seven benchmarks. On five standard KGs, ComplEx-vs-DistMult differences are modest but consistent under our recipe (+0.005 to +0.012 MRR), whereas CompGCN-style encoder effects vary more by dataset. On small KGs, decoder effects become the main diagnostic: Kinship shows a stable ComplEx advantage of +0.143 MRR (6 seeds), while UMLS favours ComplEx by +0.022 MRR in a clean 6-seed server rerun but reverses in an earlier provenance variant. We therefore treat small-KG decoder choice as recipe- and provenance-sensitive rather than as a fixed dataset winner. We further show that decoder choice interacts with encoder depth on WN18RR, and that under our recipe L=0 ComplEx on YAGO3-10 reaches 0.6971 +/- 0.0048 MRR at d=128. The result is a compact audit protocol: report matched decoder rows, log small-KG provenance, and sweep decoder x depth before making encoder-level claims.

19.
bioRxiv (Bioinfo) 2026-06-21

OracleScreen-LILRB4: Machine Learning-Guided Discovery of Myeloid Immune Checkpoint Binders Validated in Patient-Derived Cells

The identification of small molecule modulators of immune checkpoint proteins remains a significant challenge in drug discovery due to the flat, featureless nature of protein-protein interaction interfaces and the characteristically low hit rates observed in conventional high-throughput screening campaigns. Here we report OracleScreen-LILRB4, an ensemble machine learning framework trained on quantitative biophysical screening data from two structurally diverse compound libraries (19,800 compounds total) screened against the myeloid immune checkpoint leukocyte immunoglobulin-like receptor B4 (LILRB4/ILT3). By formulating binding prediction as a regression task targeting continuous {Delta}Fnorm values rather than binary hit classifications, OracleScreen-LILRB4 achieved a mean Spearman R of 0.61 and ROC-AUC of 0.86 under scaffold-aware cross-validation. Prospective virtual screening of a 45,760-member compound library and experimental validation of the top 200 predictions yielded a 28.5% hit rate, representing a 15.0-fold enrichment over baseline, with 16 compounds demonstrating nanomolar-affinity LILRB4 (ILT3) engagement. Lead compounds ORS-22 and ORS-14 restored anti-tumor immune activity across patient-derived colorectal cancer and acute myeloid leukemia co-culture systems, reversing SCG2-mediated immunosuppression and recovering cytotoxic T-cell function. These findings establish OracleScreen-LILRB4 as an effective computational framework for accelerating small molecule discovery against non-enzymatic immune checkpoint targets.

20.
arXiv (CS.AI) 2026-06-16

NeuroSymbolic AI for Legal AI-TRISM: Trustworthy, Reliable, Interpretable, Safe Models

arXiv:2606.15646v1 Announce Type: new Abstract: Large Language Models (LLMs) have transformed natural language processing, but their lack of interpretable reasoning and tendency to hallucinate pose significant challenges for legal applications. While LLMs show promise for legal text analysis and generation, they struggle with accurate citation attribution and precedent verification. For example, in legal contexts, a single incorrect precedent can jeopardize a case. Current approaches to improve LLM reliability in legal domains suffer from two key limitations: inadequate integration of structured legal knowledge during training or fine-tuning, and insufficient verification mechanisms for generated legal content. To address these challenges, we propose the TRISM (Trustworthy, Reliable, Interpretable, Safe Models) framework, which integrates NeuroSymbolic AI principles with LLMs to leverage both neural learning capabilities and symbolic reasoning over structured legal knowledge. The TRISM approach addresses the above limitations while maintaining interpretable decision pathways. Our framework formalizes the extraction of symbolic knowledge from legal textual documents and incorporates Retrieval-Augmented Generation (RAG) as a core component for grounding LLM outputs in verified legal sources. In this position paper, we make the following contributions: (1) An analysis of the limitations of AI in law; (2) Introduce RASOR RAG which creates foundations for neurosymbolic RAG by generating explicit interpretable rationales that could be formalized into symbolic representations; (3) A formalized methodology for creating symbolic legal knowledge bases that support both interpretable reasoning and output verification in LLMs; and (4) The TRISM framework for integrating symbolic legal knowledge with LLMs.

21.
arXiv (CS.CV) 2026-06-16

Spatial Priors via Space Filling Curves for Small and Limited Data Vision Transformers

Though Vision Transformers (ViTs) have become the dominant backbone in many computer vision tasks, due to permutation equivariance, their attention mechanism lacks explicit spatial inductive biases. This become particularly important in two settings: when model capacity is small or training data is limited. Inspired by the attention masking strategies in Linear Transformers and the scanning patterns of Vision SSMs, we introduce VIOLIN, a lightweight masked attention mechanism that encodes spatial structure within attention via Space Filling Curves (SFCs) with less than 0.0015% extra parameters and negligible computational overhead. VIOLIN scans the image using multiple SFCs to construct curve-specific decay masks, which are then combined and multiplied with the attention matrix. Across a wide range of evaluations, VIOLIN consistently improves performance. In limited data regimes such as fine-tuning on VTAB-1K, it boosts accuracy across all task groups and by up to 8.7% on the tasks where spatial information is essential. It can be combined with parameter-efficient fine-tuning methods such as LoRA to further increase the performance. Beyond fine-tuning, VIOLIN improves various small scale ViT architectures (e.g., DeiT, DINO) during pretraining on ImageNet-1K. Additionally, on pixel-level CIFAR-100 training, a task that is highly dependent on location information, VIOLIN increases accuracy by up to 7.2%. Overall, VIOLIN provides a computationally efficient yet effective way to inject spatial inductive bias into ViTs, especially benefiting small models and limited data settings.

22.
arXiv (CS.AI) 2026-06-16

MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance

arXiv:2606.16923v1 Announce Type: new Abstract: Simulation-based inference (SBI) of latent parameters is often hindered by simulator misspecification, the mismatch between simulated and real-world observations caused by inherent modeling simplifications. RoPE, the recent state-of-the-art for robust SBI, addresses this through optimal transport between learned representations of real and simulated observations, but requires ground-truth parameter calibration pairs that are typically unavailable in the very settings where SBI is needed. What practitioners do have is unstructured side-information such as regime labels, instruction text, and policy bulletins. We propose Misspecification-Aware Simulation-Based Inference (MA-SBI), a calibration-free framework that turns this side-channel into a posterior correction. A learned corrector maps side-channel text to an observation-space shift applied before any pre-trained amortized posterior, requiring no retraining and no parameter ground-truth. Our main theorem bounds achievable bias reduction by the mutual information between misspecification and side-channel, with a non-vacuous constant that extends to all sub-Gaussian noise via Donsker-Varadhan. On hide-the-calibration benchmarks, MA-SBI with text alone matches the oracle posterior across 10 seeds and two backbones (TOST equivalence), while RoPE given more data does not. The two approaches are complementary: where misspecification is structural and recoverable from parameter pairs, RoPE dominates, as the theory predicts. A stochastic variant improves posterior-predictive log-likelihood on real COVID and OxCGRT epidemiological data, and correctly leaves the posterior unchanged on a well-specified cognitive-science corpus.

23.
arXiv (CS.AI) 2026-06-11

Human-Enhanced Loop Modeling (HELM): Agent-Based Finite Element Modeling of Concrete Bridge Barriers

arXiv:2606.12025v1 Announce Type: new Abstract: Finite element (FE) modeling of safety-critical infrastructure such as bridge barriers requires high-fidelity nonlinear dynamic analysis, yet the current FE modeling process remains labor-intensive and lacks automation. This paper presents the Human-Enhanced Loop Modeling (HELM) framework, a collaborative human-agent protocol that decomposes long-sequence finite element modeling into discrete, visually verifiable checkpoints across geometry generation, boundary condition definition, and material assignment. The framework is demonstrated through a 20-case matrix of reinforced concrete bridge barriers under MASH TL-4 and TL-5 lateral loading conditions, interfacing specialized agents with two widely used commercial FE softwares, i.e., ANSYS and LS-PrePost. Experimental results show that HELM improves the baseline autonomous modeling success rate from 20% to 75%, with agent-level pass rates for geometry and boundary condition tasks approximately doubling. Error analysis reveals that spatial reasoning and algebraic logic limitations constitute the primary failure modes, underscoring the value of structured human-in-the-loop intervention for modeling automation. The complete agent design code and prompts are open-sourced and can be accessed at: https://github.com/SimAgentDev/Ansys-LSPP-AgentKit.

24.
arXiv (CS.CL) 2026-06-24

Measuring User's Mental Models of Speech Translation in Human-AI Collaboration

Millions of people use machine translation (MT) tools daily, yet little is known about their perception of what systems can and cannot do. This paper studies users' mental models of speech translation systems through a new framework based on cross-lingual question answering, where users either accept MT output or request professional re-translation to answer questions based on the information presented in a foreign language. By analyzing user behavior and accuracy trends across varying translation qualities, we examine to what extent they can predict where the system is likely to be wrong, and how this mental model evolves. Users develop stronger mental models with practice, especially when they have some knowledge of the source language, primarily by relying on surface-level error cues. Moreover, providing speech transcriptions can help users develop better mental models. Our results show the promise of cross-lingual question answering as a downstream task for studying MT mental models and advancing our understanding of human-AI collaboration.

25.
medRxiv (Medicine) 2026-06-17

Postoperative Cognitive Decline in Older Patients with Cardiovascular Disease and Preoperative Mild Cognitive Impairment

Objective. Older adults undergoing cardiac surgery may be vulnerable to postoperative cognitive decline. However, no studies have examined postoperative cognitive outcomes in older patients with cardiovascular disease (CVD) according to preoperative mild cognitive impairment (MCI). This study examined 12-month postoperative cognitive outcomes in older CVD patients according to preoperative MCI diagnosis and explored predictors of postoperative cognitive decline. Method. Twenty-two older CVD patients ([≥]65 years) and twenty-five controls were included. Neuropsychological assessment was conducted at baseline in both groups and repeated 12 months after surgery in the CVD group. MCI was diagnosed using current clinical criteria. Postoperative cognitive change was examined across preoperative MCI groups. Results. Fifty percent of patients met criteria for postoperative MCI, showing high diagnostic stability relative to preoperative frequency (45.5%). The preoperative CVD-MCI group showed a decline in working memory, executive functions, visual memory, and naming, whereas CVD-nMCI group declined only in verbal memory. Furthermore, CVD-MCI showed more heterogeneous postoperative cognitive trajectories of change than CVD-nMCI, who showed stability. Estimated IQ, APACHE-II score, and postoperative frailty were important variables in predicting the postoperative pattern. Conclusions. MCI frequency remained high and stable in older CVD patients across the preoperative and one-year postoperative period. However, this apparent diagnostic stability masks subclinical cognitive decline, particularly among patients with preoperative MCI, who showed greater susceptibility to further impairment. Estimated IQ, APACHE-II score, and postoperative frailty may be considered relevant predictors of outcome. These results highlight the value of preoperative neuropsychological assessment for characterizing postoperative cognitive risk in older CVD patients.