Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-17

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

arXiv:2606.17872v1 Announce Type: cross Abstract: Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since scaling pre-trained language models improves downstream capability [zhao2023survey], the key-value (KV) cache becomes a dominant inference bottleneck. Recent KV cache compression methods [jo2025fastkv,li2024snapkv,zhou2024dynamickv] reduce this cost by retaining only a subset of attention-relevant tokens. However, while these approaches preserve accuracy on benign workloads, their compression policies either fail to defend against jailbreak attacks [jiang2024robustkv] or degrade safety alignment under aggressive eviction. We propose AnchorKV, a drop-in modification to KV cache compression that biases token retention scores away from directions in key space associated with harmful prompts. AnchorKV constructs an offline safety anchor by adapting a difference-of-means representation engineering approach [arditi2024refusal,zou2023representation] to the layer-specific key projection space used in KV caching. Based on this anchor, a soft penalty token selection rule trades a small amount of utility for substantially improved safety alignment, while reducing to the original compressor when the penalty is zero.

02.
arXiv (CS.LG) 2026-06-18

ChronoSurv: A Clinical Pathway-Guided Graph Framework for Multimodal Survival Analysis

arXiv:2606.19140v1 Announce Type: new Abstract: Accurate survival prediction is essential for personalized treatment planning in head and neck cancer, yet remains challenging due to the heterogeneous and high-dimensional nature of multimodal clinical data. While deep survival models have improved predictive performance over classical statistical approaches, existing methods typically rely on static fusion strategies or temporally agnostic modeling, limiting their ability to capture structured clinical workflows. In this work, we propose ChronoSurv, a heterogeneous hierarchical directed graph framework for multimodal survival analysis. ChronoSurv represents patient care as a progression-aware clinical trajectory using directed graphs aligned with key diagnostic steps. A hierarchical topology incorporates fine-grained, coarse, and global representations, further supporting flexible adaptation to missing modalities, while heterogeneous message passing models complex and asymmetric relationships across modalities and clinical steps. Experimental results on two public datasets demonstrate that ChronoSurv achieves state-of-the-art discriminative performance while maintaining statistically reliable calibration. Comprehensive ablation studies further confirm the contribution of each architectural component, highlighting the potential of trajectory-aware graph modeling for multimodal survival prediction.

03.
PLOS Computational Biology 2026-06-02

A comparative study of simulation-based inference methods for epidemic models with identifiability considerations

作者:

by Geunsoo Jang, K. Selçuk Candan, Gerardo Chowell Epidemic models play a critical role in understanding transmission dynamics, generating forecasts, and informing public health interventions when they are properly calibrated to epidemiological data. Traditional Bayesian inference methods rely on the likelihood function to update prior knowledge using observed data. However, for realistic epidemic models, likelihood functions are often analytically intractable or computationally prohibitive, which can limit the applicability of these methods. Simulation-based inference provides a promising alternative by approximating posterior distributions through forward simulations rather than an explicit likelihood evaluation. In this study, we present a systematic comparison of four approaches: Approximate Bayesian Computation (ABC), Neural Posterior Estimation (NPE), a neural method with temporal embedding, and Preconditioned Neural Posterior Estimation (PNPE), which integrates elements of both classical and neural techniques. These methods are evaluated across epidemic models of increasing complexity under fixed simulation budgets and varying levels of observational noise, with explicit attention to both structural and practical identifiability. Our results show that neural methods generally improve posterior fidelity and predictive accuracy compared with ABC under constrained simulation budgets. PNPE achieved strong performance in several simulation settings, whereas temporal embeddings improved inference in models with complex epidemic dynamics by capturing sequential dependencies. These gains come with important trade-offs: PNPE required substantially greater computational resources and, unlike fully amortized NPE-based methods, may require reconditioning for each new observation. In contrast, ABC remained computationally efficient and provided reasonable, though often more conservative, posterior estimates. Overall, our findings highlight trade-offs among computational efficiency, posterior accuracy, uncertainty calibration, and inference reusability, suggesting that method selection should depend on model complexity, data quality, identifiability, and available computational resources.

04.
arXiv (CS.AI) 2026-06-19

Which Pairs to Compare for LLM Post-Training?

arXiv:2606.19607v1 Announce Type: new Abstract: Preference-based post-training has become a central paradigm for aligning language models. A common data-collection strategy is to generate a small set of completions for each prompt and label the resulting comparison pairs. However, human preference labels are often much more expensive than generating additional completions, suggesting a different use of the same labeling budget: generate a larger pool of completions, but label only the most informative comparison pairs. This paper studies which pairs should be compared in preference-based post-training. We formulate comparison curation as a sampling-design problem and evaluate designs by the quality of the final policy under the preference-based post-training objective. We instantiate this framework for Direct Preference Optimization (DPO), analyzing how the choice of labeled pairs propagates through DPO training to downstream policy performance. Our main results provide matching upper and lower bounds on the post-training optimality gap of the DPO-trained policy. The bounds show that comparison selection affects downstream performance through a single design-dependent information matrix, which links label allocation to parameter estimation error and policy suboptimality. This yields an explicit optimization criterion for budgeted comparison curation and motivates practical sampling designs for selecting informative pairs from large generated completion pools. Experiments on synthetic settings and language-model post-training benchmarks show that the proposed designs consistently improve sample efficiency over common comparison-selection heuristics.

05.
arXiv (CS.CL) 2026-06-19

Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring

LLMs can personalize education, although current static-prompt tutoring systems struggle to adapt to diverse academic disciplines. We develop and test a system with subject-aware prompting, based on 14 pedagogical features (e.g., tutor scaffolding, student understanding) extracted from raw transcripts. We first train a prompt routing model in a simulation environment, and then deploy it for online adaptation with actual high-school students. The simulation benchmark shows the router outperforming two static baselines ($0.694$ vs. $0.647$ and $0.64$, $p

06.
arXiv (CS.CV) 2026-06-16

SiGnature: Explicit Motion Diffusion for Stylized Semantic Gesture

While recent advances in co-speech gesture generation have achieved impressive rhythmic synchronization, synthesizing gestures that are both semantically meaningful and faithful to a speaker's unique non-verbal style remains an open challenge. Semantic gestures, such as iconic shapes or deictic pointing, are statistically sparse, making them difficult to learn effectively within standard generative models. We present SiGnature, a framework for Stylized and Semantic Gesture generation that reconciles precise semantic control with high-fidelity style preservation. Unlike prevalent methods that rely on entangled latent representations, SiGnature operates in an explicit joint-rotation space. This design enables our core contribution, Joint Motion Integration (JMI), a training-free inference mechanism capable of injecting any external motion sequence, particularly in-the-wild semantic gestures, directly into the diffusion process. JMI automatically identifies the specific ``active joints'' conveying a semantic action and injects them into the generation, while relying on the diffusion backbone to synthesize the remaining body dynamics, including posture and flow, in accordance with the pre-learned style of the target speaker. This allows for the plug-and-play integration of arbitrary motions, including complex semantic gestures, without retraining or introducing the ``Frankenstein'' artifacts typical of cut-and-paste methods. Extensive experiments and perceptual studies demonstrate that SiGnature offers superior semantic motion control while maintaining smooth and natural co-speech gesture generation and preserving the distinct characteristics of the speaker, thereby outperforming state-of-the-art baselines.

07.
arXiv (CS.AI) 2026-06-12

A Quantitative Experimental Repeated Measures Study of Training Dynamics in a Small Llama Style Language Model Under a Compute-Aware Token Budget

作者:

arXiv:2606.13370v1 Announce Type: new Abstract: This study examines training dynamics in a small Llama-style language model trained under a fixed, compute-constrained token budget. Rather than evaluating efficiency solely through endpoint performance, the study uses a quantitative experimental repeated measures design to analyze how validation loss, validation perplexity, rolling volatility, backslide behavior, spike behavior, and between-seed variability change across token-based training intervals. Six independent training runs were conducted on a 4.26-million-parameter model using the TinyStories corpus, CPU-based full-precision training, and a target budget of approximately 20 million cumulative training tokens. Metrics were collected across 21 intervals, producing 126 seed-by-interval observations. Repeated measures ANOVA showed statistically significant interval effects for validation loss, validation perplexity, and rolling volatility. Descriptive trajectories revealed rapid early improvement followed by non-monotonic degradation during later training intervals. Mean validation loss decreased from 8.3552 at initialization to 2.7996 near 4 million tokens, but increased to 3.9010 by the final checkpoint. Validation perplexity followed the same pattern, falling sharply early in training before rising later. Derived telemetry further showed recurrent validation-loss backslides and no interval-summary evidence of a stable phase under the predefined criteria. These findings suggest that compute-aware language model evaluation should examine training trajectories rather than endpoint metrics alone. In constrained compute settings, additional token exposure may increase computational cost without producing proportional generalization gains, and interval-level telemetry can reveal instability, regression, and diminishing returns that final metrics may obscure.

08.
arXiv (quant-ph) 2026-06-11

Dissociative recombination and ion-pair formation in $\mathrm{HeH^+}$ isotopologues: A time-dependent wave-packet study including rotational coupling

arXiv:2606.11352v1 Announce Type: cross Abstract: We present a comprehensive theoretical investigation of dissociative recombination (DR) and resonant ion-pair (RIP) formation in $\mathrm{HeH^+}$ isotopologues using time-dependent wave-packet propagation methods. Nuclear dynamics are treated on a set of 23 coupled electronic states, including $^2\Sigma$, $^2\Pi$, and $^2\Delta$ symmetries, in both adiabatic and strictly diabatic representations, with rotational couplings explicitly included. Reaction cross sections are computed over collision energies ranging from 0 to 50 eV. The results reveal that inclusion of a large manifold of resonant states and rotational couplings significantly enhances the DR cross section relative to earlier theoretical studies. In the diabatic representation, $^2\Sigma$ states dominate the recombination dynamics, while in the adiabatic representation, $^2\Pi$ and $^2\Delta$ states contribute significantly at low collision energies. For RIP formation, two different diabatization schemes yield systematically larger cross sections than previous models, highlighting the sensitivity of ion-pair production to electronic coupling structure. Isotopic effects are examined, showing a clear inverse dependence of cross section magnitude on reduced mass. The present results underscore the importance of multi-state coupling and nonadiabatic effects in accurately describing electron-molecule collision processes in primordial and astrophysical plasmas.

09.
medRxiv (Medicine) 2026-06-17

Low-Density Lipoprotein Cholesterol and Dementia Risk: Integrating Mendelian Randomization and Target Trial Emulation Within the Heart-Brain Axis

Background: The heart-brain axis links cardiovascular and neurodegenerative disease through shared vascular and inflammatory mechanisms. Although low-density lipoprotein cholesterol (LDL-C) is an established causal factor in atherosclerotic cardiovascular disease (ASCVD), its relationship with dementia remains uncertain, with midlife elevations associated with increased risk but late-life associations often appearing null or inverse. To address this cholesterol paradox, we integrated mendelian randomization (MR) with an active-comparator new-user target trial emulation. Methods: We applied a triangulated causal inference framework integrating two-sample MR with observational target trial emulation. Genetic variants associated with LDL-C were used as instrumental variables to evaluate Alzheimer disease (AD), dementia with Lewy bodies (DLB), frontotemporal dementia (FTD), and any dementia (AnyDem), with causal estimates derived using inverse-variance weighted models and sensitivity analyses for heterogeneity and pleiotropy. In parallel, an active-comparator new-user design compared statin versus ezetimibe initiation among adults aged 60 years or older using propensity score (PS) overlap weighting and Cox proportional hazards models to evaluate cardiovascular and dementia outcomes. Results: Genetically predicted LDL-C was associated with increased risk of DLB (OR 1.65, 95% CI 1.30-2.10; p

10.
arXiv (CS.CV) 2026-06-16

Sustainable Face Recognition on Low-Power Devices with VQ-VAE Embeddings

Face recognition has become a cornerstone of modern AI applications, yet conventional approaches often rely on computationally intensive models deployed in cloud environments, leading to increased network traffic, high energy consumption, and a heavy carbon footprint. This work introduces a sustainable, edge-deployable face recognition framework based on Vector-Quantized Variational Autoencoders (VQ-VAE), which generates compact and semantically rich latent representations of facial images. By leveraging the compression capacity and reconstruction quality of VQ-VAE embeddings on the edge and combining them with the power of pre-trained face embeddings in a knowledge distillation setup, our system achieves comparable accuracy to state-of-the-art face embedding models while significantly reducing memory and computation requirements on the edge, making it suitable for low-power edge devices. The integration of VQ-VAE compression minimizes network overhead while keeping the matching accuracy high by retaining only the most informative facial features in the latent space. As a result, the reconstructed images preserve the key identity characteristics, improving the robustness and overall performance of the face embeddings.

11.
bioRxiv (Bioinfo) 2026-06-18

Benchmarking attention-based methods for vision transformers' interpretability in retinal fundus imaging

Deep learning models based on Vision Transformers (ViTs) have shown strong performance in retinal fundus imaging, but their interpretability remains poorly understood. In particular, attention-based attribution methods are widely used to explain ViT predictions, despite limited evaluation of their faithfulness and biological relevance in medical imaging. Here, we systematically benchmark four attention-based interpretability methods for RETFound, a retinal ViT-based foundation model, that we previously fine-tuned to predict 17 retinal vascular phenotypes from UK Biobank fundus images1. We compare raw attention, attention rollout, gradient-weighted attention rollout, and Chefer's hybrid relevance-based method using both qualitative visualisation and quantitative evaluation frameworks. To assess attribution faithfulness, we perform perturbation-based deletion and insertion experiments, quantifying changes in model predictions as highly attended image regions are progressively removed or restored. To evaluate biological specificity, we run structure-aware analyses combining attribution maps with vessel segmentation and artery-vein labels through the Relative ratio of Attention Intensity (RAI) metric. Across models, attribution maps differed substantially depending on the selected interpretability method, highlighting the need for rigorous quantitative evaluation. Among the evaluated approaches, gradient-weighted attention rollout consistently achieved the strongest perturbation performance and produced attribution maps most closely aligned with the anatomical definition of the predicted retinal traits. Furthermore, vessel-type specific models systematically concentrate attention on the corresponding vascular structures despite being trained using only a single scalar value per image as supervision. These findings demonstrate that attention-based attribution methods capture biologically meaningful vascular representations, while also revealing method-dependent variability in attribution behaviour. This work provides a quantitative framework for evaluating interpretability methods in medical imaging with annotated segmentation and contributes toward more transparent and biologically grounded medical AI systems.

12.
arXiv (CS.CL) 2026-06-16

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

Large language models (LLMs) have shown promising first-order logic (FOL) reasoning capabilities with applications in various areas. However, their effectiveness in complex mathematical reasoning involving multi-step FOL deductions is still under-researched. While LLMs perform competitively on established mathematical reasoning benchmarks, they struggle with multi-step FOL tasks, as demonstrated by Deepseek-Prover-V2-7B's low accuracy (4.2%) on our proposed theorem proving dataset. This issue arises from the limited exploration of diverse proof strategies and the potential for early reasoning mistakes to undermine entire proofs. To address these issues, we propose DREAM, a self-adaptive solution that enhances the Diversity and REAsonability of LLMs' generation strategies. DREAM incorporates an Axiom-Driven Strategy Diversification mechanism to promote varied strategic outcomes and a Sub-Proposition Error Feedback to help LLMs reflect on and correct their proofs. Our contributions include pioneering advancements in LLMs' mathematical reasoning through FOL theorem proving, introducing a novel inference stage solution that improves performance by 0.6% to 6.4%, and providing a curated dataset of 447 mathematical theorems in Lean 4 format for evaluation.

13.
arXiv (CS.CV) 2026-06-16

BioAutoML-NAS: An End-to-End AutoML Framework for Multimodal Insect Classification via Neural Architecture Search on Large-Scale Biodiversity Data

Insect classification is important for agricultural management and ecological research, as it directly affects crop health and production. However, this task remains challenging due to the complex characteristics of insects, class imbalance, and large-scale datasets. To address these issues, we propose BioAutoML-NAS, the first BioAutoML model using multimodal data, including images, and metadata, which applies neural architecture search (NAS) for images to automatically learn the best operations for each connection within each cell. Multiple cells are stacked to form the full network, each extracting detailed image feature representations. A multimodal fusion module combines image embeddings with metadata, allowing the model to use both visual and categorical biological information to classify insects. An alternating bi-level optimization training strategy jointly updates network weights and architecture parameters, while zero operations remove less important connections, producing sparse, efficient, and high-performing architectures. Extensive evaluation on the BIOSCAN-5M dataset demonstrates that BioAutoML-NAS achieves 96.81% accuracy, 97.46% precision, 96.81% recall, and a 97.05% F1 score, outperforming state-of-the-art transfer learning, transformer, AutoML, and NAS methods by approximately 16%, 10%, and 8% respectively. Further validation on the Insects-1M dataset obtains 93.25% accuracy, 93.71% precision, 92.74% recall, and a 93.22% F1 score. These results demonstrate that BioAutoML-NAS provides accurate, confident insect classification that supports modern sustainable farming.

14.
arXiv (CS.AI) 2026-06-11

Autoregressive Direct Preference Optimization

arXiv:2602.09533v2 Announce Type: replace Abstract: Direct preference optimization (DPO) has emerged as a promising approach for aligning large language models (LLMs) with human preferences. However, the widespread reliance on the response-level Bradley-Terry (BT) model may limit its full potential, as the reference and learnable models are assumed to be autoregressive only after deriving the objective function. Motivated by this limitation, we revisit the theoretical foundations of DPO and propose a novel formulation that explicitly introduces the autoregressive assumption prior to applying the BT model. By reformulating and extending DPO, we derive a novel variant, termed Autoregressive DPO (ADPO), that explicitly integrates autoregressive modeling into the preference optimization framework. Without violating the theoretical foundations, the derived loss takes an elegant form: it shifts the summation operation in the DPO objective outside the log-sigmoid function. Furthermore, through theoretical analysis of ADPO, we show that there exist two length measures to be considered when designing DPO-based algorithms: the token length $\mu$ and the feedback length $\mu'$. To the best of our knowledge, we are the first to explicitly distinguish these two measures and analyze their implications for preference optimization in LLMs.

15.
Nature (Science) 2026-06-10

Efficient and accurate neural-field reconstruction using resistive memory

作者:

Applications such as medical imaging, augmented and virtual reality, and embodied artificial intelligence (AI) depend on the ability to reconstruct complex signals from sparse observations. These applications are characterized by incomplete measurements and limited computational resources. Traditional approaches to digital hardware face the following challenges: explicit signal representations require heavy sampling and storage, data movement across the von Neumann bottleneck dominates energy and latency, and CMOS (complementary metal–oxide–semiconductor)-based circuits offer limited parallel efficiency. Here we present a software–hardware co-optimization framework for sparse-input signal reconstruction. At the software level, we use neural fields1 to implicitly represent signals using neural networks, which are further compressed by low-rank decomposition and structured pruning. At the hardware level, we design a resistive-memory-based computing-in-memory platform, featuring a Gaussian encoder and a multi-layer perceptron processing engine. The Gaussian encoder leverages the intrinsic stochasticity of resistive memory for efficient encoding, whereas the processing engine enables precise weight mapping through a hardware-aware quantization circuit. On a 40-nm 256 Kb resistive-memory macro, the system delivers 23.5×, 21.0× and 32.3× gains in projected energy efficiency, together with 10.8×, 38.8× and 6.2× gains in projected parallelism, for three-dimensional computed tomography sparse reconstruction, novel view synthesis and dynamic-scene novel view synthesis, without compromising on reconstruction quality. This work advances AI-driven signal reconstruction technology and paves the way for future efficient and robust medical AI and three-dimensional vision applications. A co-optimized AI hardware–software system using resistive-memory computing improves energy efficiency and parallelism for sparse signal reconstruction in imaging and three-dimensional vision applications.

16.
arXiv (quant-ph) 2026-06-19

Random Local Stabilizer Codes in Three Dimensions without String or Self-Similar Fractal Logical Operators

作者:

arXiv:2606.19873v1 Announce Type: new Abstract: Quantum error-correcting codes (QECs) are essential components quantum computation and have deep connections to quantum phases of matter. A key obstruction to passive self-correcting QECs is the presence of string logical operators, which can generate logical errors through constant-energy-barrier processes. Haah's Codes (fracton codes) showed that three-dimensional stabilizer codes can forbid such string logical operators, but their translation-invariant structure supports self-similar fractal logical operators with a logarithmic energy barrier. We introduce the qutrit random cubic codes, a family of local qutrit Calderbank-Shor-Steane stabilizer Hamiltonians with similar cube-check structure as Haah's Code 1 but built from spatially varying stabilizers. We prove that these models retain the no-string property and numerically observe that they have properties distinct from translation-invariant fracton codes: the smallest ground-state degeneracy exponent is $k=2$ for odd $L$ and $k=4$ for even $L$; noncontractible plane-logical operators span the entire logical space; and charge-push diagnostics show that the self-similar fractal operators are absent. These results demonstrate that constrained randomness can fundamentally change the nature of stabilizer codes and improve their self-correction properties. They further point to broader families of quantum error-correcting codes and quantum phases beyond canonical topological and fracton orders.

18.
arXiv (CS.AI) 2026-06-17

ParkingTransformer: LLM-Enhanced End-to-End Trajectory Planning for Autonomous Parking

arXiv:2606.17082v1 Announce Type: cross Abstract: End-to-end autonomous parking has emerged as a critical task within the realm of autonomous driving. However, existing methods suffer from black-box characteristics, lacking high-level semantic understanding and interpretability, which impedes the realization of seamless long-distance autonomous parking from the road to the target spot. To address these limitations, we propose ParkingTransformer, a novel framework that leverages multi-view perception and the scene understanding capability of Large Language Models (LLMs). By combining trajectory queries with LLMs implicit state features, our method interacts directly with historical information and raw sensor data to output planning trajectories, eliminating the need for dense Bird's-View (BEV) representations. To compensate for the inadequate spatial reasoning ability of LLMs, we introduce 3D positional encoding to explicitly inject spatial geometric awareness. Furthermore, a fixed-window streaming mechanism is designed for historical information processing, significantly improving long-term temporal processing efficiency and inference speed. Additionally, a coarse-to-fine decoding strategy is employed to progressively enhance trajectory precision. Extensive closed-loop experiments are conducted on the CARLA simulator and real-world vehicle platforms. The results demonstrate that our method achieves a driving score of 61.32 in CARLA simulator and an average success rate of 88.70% in real-world experiments, validating the feasibility and effectiveness of the proposed algorithms.

19.
arXiv (CS.LG) 2026-06-17

Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

arXiv:2606.17233v1 Announce Type: new Abstract: In many engineering applications, a single high-fidelity model produces multiple quantities of interest (QoIs) under the same input parameters, e.g. finite element models of complex physical systems. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design (ED). However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized for vector valued QoIs. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance-based exploration of the input space and exploitation of aggregated variance information across all outputs. Its performance is compared with non-sequential Latin Hypercube Sampling through several numerical examples from engineering problems. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second-order statistics.

20.
arXiv (CS.CV) 2026-06-17

WeaveLA: Event Driven Cross-Subtask Latent Memory Weaving for Repetitive Robot Manipulation

Vision-Language-Action (VLA) policies have achieved remarkable single-step manipulation, yet they remain brittle precisely where each stage depends on what was just completed. The core issue is structural: short-window VLAs lack an explicit channel for rouxting information across sub-task boundaries, and existing memory-augmented variants either write at every frame, retrieve from demonstration-time stages, or fire at sub-goal events without performing an explicit sub-task-to-sub-task hand-off into the action expert. We identify the sub-goal completion event as the natural temporal unit for cross-subtask memory hand-off, and present WeaveLA (Weave Latent memory for Vision-Language-Action policies), a cross-subtask memory interface that, on top of a frozen VLA backbone, compresses each completed segment into latent tokens via query-driven attention pooling and routes them directly into the action-generation path of the next sub-task. This event-triggered, action-side design preserves the base policy's short-window interface while adding a lightweight cross-subtask channel. Through stratified evaluation on RoboMME with a $\pi_{0.5}$ backbone, WeaveLA's gains land exactly where the channel is needed: on the hardest repetition slice (SwingXtimes, $N{=}3$), success rises from $0\%$ to $47.8\%$, while single-execution episodes remain unchanged. Per-episode paired analysis confirms the gains are confined to tasks whose causal structure requires cross-subtask information.

21.
arXiv (CS.AI) 2026-06-11

TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

arXiv:2606.11637v1 Announce Type: new Abstract: Touch is a key modality for embodied agents to understand the physical world. Although recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, scaling such systems to realistic open-world settings remains challenging due to two key bottlenecks: (1) current tactile reasoning datasets remain limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; (2) Tactile signals are inherently redundant and action-specific, yet existing methods often overlook these properties, resulting in inefficient representations with limited semantic expressiveness. To address these limitations, we propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types, providing a solid data foundation for open-world generalization. We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Then, we propose action-aware modeling mechanism to improve tactile representation efficiency and enable efficient reasoning. Experimental results demonstrate that TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets. Our code and dataset will be made available at: https://github.com/lvkailin0118/TouchThinker.

22.
arXiv (CS.CV) 2026-06-19

One-Shot Novel View and Pose Human Image Synthesis via 3D Prior Guided Diffusion Model

This paper addresses the challenge of one-shot novel view and pose human image synthesis. The existing methods transfer the reference human image to a target pose using a set of 2D pose keypoints or synthesize human images based on generalizable human NeRF which uses human model priors to extract point-wise features. However, pose transfer based methods can not handle complex human pose using ambiguous 2D pose as the condition, while generalizable human NeRFs may be inaccurate to recover occluded/invisiable human parts without extracted reliable features. To solve these problems, we propose a novel approach for novel view and pose synthesis from a singe human image via conditional denoising diffusion model. Our diffusion model divides the novel view and pose synthesis problem into a sequence of conditional denoising steps. Specifically, to generate humans with complex and arbitrary poses, we introduce 3D human priors, i.e., 3D normal map and color prompt, as geometry and color conditions into the generation process. By transferring the reference human into the target human with a series of diffusion steps, our diffusion model enables high-quality synthesis including the occluded/invisible parts. Further, we propose a self-reconstruction based customized refinement to enhance fine details when tested on novel persons.Experimental results on different public datasets demonstrate that our approach significantly outperforms previous methods and also shows better generalization ability across datasets. The code will be made publicly available at https://github.com/Yankeegsj/3DPGDM.

23.
arXiv (CS.AI) 2026-06-16

Model Graph Inductive Learning for Knowledge Graph Completion

arXiv:2606.16509v1 Announce Type: new Abstract: Link prediction in knowledge graphs fundamentally depends on the quality of learned embeddings for entities and relations. However, most existing methods derive these embeddings by aggregating only the local neighborhood of each entity, neglecting the global structure of the knowledge graph. This limited view prevents models from capturing higher-level structural patterns that are essential for accurate and generalizable link prediction. To address these limitations, we introduce Model Graph Inductive Learning (MGIL), a framework that constructs a model graph by clustering entities based on the similarity of their incoming and outgoing relational structures or their entity types. A GNN is then applied to this model graph to produce embeddings that capture the global view of the knowledge graph. These embeddings subsequently serve as high-quality initial features %embeddings for the original knowledge graph, replacing random initialization and leading to more stable and expressive representations. Extensive experiments on standard and recently proposed inductive benchmarks demonstrate that MGIL achieves state-of-the-art or highly competitive performance in inductive link prediction, highlighting its effectiveness across diverse graph settings.

24.
arXiv (CS.CL) 2026-06-19

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

作者:

Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials partition function, free energy, spectral entropy, heat capacity together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i)~Lipschitz stability of Fes under attention perturbation; (ii)~an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii)~a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by $+6.5$ AUROC points and over GoR-4 by $+2.4$ points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC $0.71$, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.

25.
medRxiv (Medicine) 2026-06-10

Developing a Unified Criminal Justice Pathway into Drug and Alcohol Treatment from Police Custody: A Public Health Service Evaluation and Pathway-Design Project in Blackpool, United Kingdom

Introduction: Blackpool, England's most deprived local authority, has the highest drug-related death rate in the country. People in police custody with problem substance use are a key Core20PLUS5 inclusion-health group, yet referral from the police into structured drug and alcohol treatment is fragmented and relies heavily on self-report. We evaluated the current police-to-treatment route in Blackpool and designed an evidence-informed unified pathway. Materials and Methods: A mixed-methods service evaluation and pathway-design project was conducted during a six-month General Practice / Public Health rotation. Routinely collected referral data from Horizon (the local specialist drug and alcohol service) covering the 47-month period from December 2019 to October 2023 were analysed. Findings were triangulated with national policy, the Project ADDER and Liaison and Diversion evaluations, and the international evidence on police-led pre-arrest diversion. Results: Of 5,900 total referrals into Horizon over 47 months, only 269 (4.56%) originated from the police. Police referrals accounted for fewer than 5% of monthly referrals in 30 of 47 months, for 5 to 9.9% in 16 months, and for >/= 10% in only one month (10.8%, December 2022). Blackpool recorded 76 drug-misuse deaths in 2019-21 (19.4 per 100,000, approximately four times the England rate). A six-step unified pathway is proposed: Initiate Referral (opt-out, from ADDER Police and Liaison and Diversion); Initial Assessment; Tailored Treatment Plan; Continuous Support; Collaboration and Monitoring; and Evaluation and Adjustment. Conclusions: Police contact is markedly under-used as a gateway to treatment despite Blackpool having the highest drug-related mortality in England. An opt-out, multi-agency pathway anchored in Core20PLUS5 has the potential to narrow the treatment gap, reduce re-offending, and address the structural health inequalities that drive premature mortality.