Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-11

Where Do Backdoors Live? A Component-Level Analysis of Backdoor Propagation in Speech Language Models

Speech language models (SLMs) are systems of systems: independent components that unite to achieve a common goal. Despite their heterogeneous nature, SLMs are often studied end-to-end; how information flows through the pipeline remains obscure. We investigate this question through the lens of backdoor attacks. We first establish that backdoors can propagate through the SLM, leaving all tasks highly vulnerable. From this, we design a component analysis to discover the role each component takes in backdoor learning. We find that backdoor persistence or erasure is highly dependent on the targeted component. Beyond propagation, we examine how backdoors are encoded in shared multitask embeddings, showing that poisoned samples are not directly separable from benign ones, challenging a common separability assumption used in filtering defenses. Our findings emphasize the need to treat multimodal pipelines as intricate systems with unique vulnerabilities, not solely extensions of unimodal ones.

02.
arXiv (CS.CL) 2026-06-17

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

MDLMs generate text by denoising a preallocated masked response canvas, making response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of using repeated \texttt{[EOS]} tokens for padding during instruction tuning, giving \texttt{[EOS]} a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of \texttt{[EOS]} overflow under large-block decoding. To decouple these roles, we propose VoidPadding, which introduces \texttt{[VOID]} for padding and reserves \texttt{[EOS]} for termination. During inference, the learned \texttt{[EOS]} signal enables early stopping, while the learned \texttt{[VOID]} signal guides adaptive response canvas expansion. On Dream-7B-Instruct, VoidPadding improves the block-size-averaged four-task mean across mathematical reasoning and code generation benchmarks by \(+17.84\) points over the original model and \(+6.95\) points over RainbowPadding, while reducing decoding NFE by 55.7\% on average. Code is available at https://github.com/Haru-LCY/VoidPadding.

03.
arXiv (CS.LG) 2026-06-17

Geometrical fairness in graph neural networks

arXiv:2606.17684v1 Announce Type: cross Abstract: Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.

04.
arXiv (math.PR) 2026-06-18

Cramér-Type Moderate Deviations for Engel's Series via a Martingale Approach

arXiv:2606.18866v1 Announce Type: new Abstract: Let $x$ be uniformly distributed on $(0,1)$, and let $(q_n)_{n\geq1}$ be the digits of its Engel series expansion. We establish a Cramér-type moderate deviation expansion for $(\log q_n-n)/\sqrt n$. The proof is based on a martingale decomposition and asymptotic results for martingales. As consequences, we obtain a moderate deviation principle over the full range of scales between the central limit theorem and the law of large numbers, without the additional lower rate restriction required in several earlier works. We also derive a uniform Berry–Esseen bound of order $(\log n)/\sqrt n$.

05.
arXiv (CS.CL) 2026-06-19

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

Court proceedings contain valuable evidence about human smuggling networks, but this information is often buried within unstructured, jargon-heavy legal documents. While large language models (LLMs) can support knowledge graph construction through automated information extraction, existing approaches rely on general-purpose models that are not tailored to the entity and relationship definitions required in this domain. We introduce FineREX, a streamlined knowledge graph construction pipeline built around a fine-tuned LLM for named entity recognition and relationship extraction (NER-RE). Using a manually annotated dataset of $512$ text chunks, FineREX achieves absolute improvements of 15.50% and 31.46% in entity and relationship F1-score, respectively, compared to a larger general-purpose baseline. These gains translate into higher-quality knowledge graphs, reducing legal noise by nearly half and lowering node duplication on long documents from 17.78% to 11.17%. By eliminating document rewriting and redundant extraction stages, FineREX also reduces end-to-end processing time by 50.0%. Our results demonstrate that domain-specific fine-tuning can substantially outperform larger general-purpose models while improving both the quality and efficiency of knowledge graph construction for illicit network analysis.

06.
arXiv (CS.LG) 2026-06-16

Learning Policy from a Single Trajectory in Average-Reward Markov Decision Process

arXiv:2606.16729v1 Announce Type: new Abstract: While there is an extensive body of work characterizing the sample complexity of discounted cumulative-reward MDPs, finite sample analyses for average-reward MDPs have been limited, and most existing works rely on restrictive assumptions such as ergodicity or access to a generative model. In this work, we establish the first finite sample complexity guarantees from a single trajectory for weakly communicating average-reward MDPs. To this end, we study the dynamics of a single trajectory in weakly communicating MDPs and based on this analysis, we develop novel model-free methods. Notably, our value-based and policy-based methods provide finite sample complexity guarantees of $\widetilde{O}(1/\varepsilon^2)$ and $\widetilde{O}(1/\varepsilon^4)$ from a single trajectory in weakly communicating MDPs, respectively. Furthermore, we introduce the first model-free method that requires no prior knowledge of problem-dependent quantities for communicating MDPs.

07.
arXiv (CS.AI) 2026-06-12

Beyond Problem Solving: UOJ-Bench for Evaluating Code Generation, Hacking, and Repair in Competitive Programming

arXiv:2606.12864v1 Announce Type: cross Abstract: Despite strong performance in competitive programming, the role of Large Language Models (LLMs) in supporting human learning in the same setting remains largely unexplored. In this work, we introduce UOJ-Bench, a benchmark designed to evaluate not only the problem-solving ability of LLMs, but also their ability to identify errors in human-written code – a crucial educational activity traditionally supported by running test cases over online judge systems. UOJ-Bench consists of three distinct tasks: code generation, code hacking, and code repair, all constructed from real-world code submissions on the Universal Online Judge (UOJ) and evaluated through UOJ's native judging infrastructure. Our results show that under one-shot evaluation, even the strongest models fail to identify errors in more than 50% of a set of submissions that have been found to be incorrect by UOJ users. While test-time scaling improves success rates to above 90%, the substantial computational costs incurred from model inference limit its practicality for large-scale deployment. Despite these limitations, we find that the best-performing models under test-time scaling can uncover errors in over 5% of full-score submissions across roughly 30 problems, suggesting that frontier LLMs can already provide complementary signals beyond standard judging systems.

08.
arXiv (CS.AI) 2026-06-17

Quantum Cinema: An Interactive Cinematic Exploration of Quantum Computing Hardware via Generative World Models

arXiv:2606.17102v1 Announce Type: cross Abstract: Quantum computing promises transformative advances across science and industry, yet the physical hardware that enables these computations remains invisible to the public: quantum processors operate inside sealed dilution refrigerators at temperatures near absolute zero, making direct observation impossible. This "imagination gap" between quantum computing's growing societal impact and the public's ability to visualize it represents a significant barrier to quantum literacy and workforce development. We present Quantum Cinema, an open-source, browser-based interactive application that closes this gap by transforming invisible quantum hardware into explorable, cinematic experiences using generative world models. Quantum Cinema guides users through a four-act narrative – from the foundational Nobel Prize-winning science of quantum entanglement, through curated video introductions to three major quantum computing architectures (trapped-ion, neutral-atom, and superconducting systems), into immersive three-dimensional generative worlds that make invisible quantum phenomena observable, and finally to interactive radar-chart comparisons grounded in real quantum device specifications. All three-dimensional environments are generated using WorldLabs' generative world model platform and are scientifically grounded in curated metrics from Amazon Web Services (AWS) Braket quantum hardware. Quantum Cinema requires no installation, no specialized hardware, and no quantum computing background. It is designed to serve two distinct communities: scholars and developers seeking to replicate or extend the platform, and educators, researchers, and science communicators seeking an intuitive tool for explaining quantum hardware to diverse audiences. This paper describes the system architecture, the generative world model pipeline, use cases for both communities, and directions for future work.

10.
medRxiv (Medicine) 2026-06-16

Development and reliability and validity test of the Questionnaire on Knowledge, Attitude and Practice of ICU Nurses on Blood Oxygen Saturation Management in Mechanically Ventilated Patients

Objective: A questionnaire on the knowledge, attitude and practice of ICU nurses regarding the management of blood oxygen saturation in patients with mechanical ventilation was compiled, and its reliability and validity were tested. Method: Drawing upon the knowledge-attitude-practice theory, the initial questionnaire draft was developed through literature review and consultation with Delphi experts. Employing convenience sampling, 32 nurses from the General ICU of Wuxi Second People's Hospital were surveyed between 1 August 2025 and 27 September 2025, enabling item screening and assessment of reliability and validity.The full version of the developed questionnaire is provided as Supporting Information (S1 File). All items are published under a CC BY 4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Result: A questionnaire on the knowledge, attitude and practice of ICU nurses regarding the management of blood oxygen saturation in mechanically ventilated patients was finalised, comprising 26 items: 11 in the knowledge dimension, 6 in the attitude dimension and 9 in the behaviour dimension. The overall Cronbach's coefficient for the questionnaire was 0.88, with dimension-specific coefficients of 0.787, 0.722, and 0.781 respectively. The Spearman-Brown coefficient for the entire questionnaire was 0.967, while dimension-specific coefficients were 0.796, 0.666, and 0.728 respectively. The content validity index at the questionnaire level (S-CVI) was 0.886, and the item-level content validity index (I-CVI) ranged from 0.913 to 0.967. 0.728. The questionnaire's level content validity index (S-CVI) was 0.886, and the item level content validity index (I-CVI) ranged from 0.913 to 1.00. Conclusion: The questionnaire on knowledge, attitude and practice of blood oxygen saturation management in mechanically ventilated patients demonstrates good reliability and validity. It may serve as an assessment tool for intensive care unit nurses regarding their knowledge, attitude, and practices concerning blood oxygen saturation management in mechanically ventilated patients, thereby establishing a foundation for developing targeted intervention strategies in future practice.

11.
arXiv (quant-ph) 2026-06-17

SPICE-Q and Large-Scale Quantum Chip Production

arXiv:2606.17907v1 Announce Type: new Abstract: We propose SPICE-Q, a SPICE-inspired design-technology co-optimization framework for superconducting quantum processors. Rather than replacing tools such as HFSS, Qiskit Metal, pyEPR, SQcircuit, SQuADDS, scqubits, or QuTiP, SPICE-Q aims to connect them through a unified, traceable data chain spanning process rules, layout, electromagnetic simulation, energy-participation-ratio and circuit quantization, Hamiltonian extraction, noise analysis, cryogenic test, and manufacturing feedback. The central mapping is from process and PDK constraints to layout geometry, electromagnetic modes, equivalent circuit parameters, effective Hamiltonians, and finally metrics such as frequency, coupling, anharmonicity, decoherence, readout performance, and yield. This flow must capture Josephson-junction variability, transmon frequency allocation, resonator and Purcell constraints, coupler crosstalk, microwave routing, 3D interconnects, material/interface loss, package modes, and wafer-scale process statistics. By introducing standardized model interfaces, statistical parameter models, model cards, version governance, and closed-loop calibration from cryogenic and fabrication data, SPICE-Q frames superconducting quantum-chip design as an engineering workflow rather than a collection of isolated simulations. We argue that scalable and fault-tolerant quantum processors will require such a continuous model chain from device physics and electromagnetic fields to quantum dynamics, noise, manufacturability, and system-level yield.

12.
arXiv (quant-ph) 2026-06-16

Diagonal-Budgeted Trotterization for Efficient Quantum Hamiltonian Simulation

arXiv:2606.16959v1 Announce Type: new Abstract: Efficient classical simulation of quantum Hamiltonian dynamics is often bottlenecked by exponential state growth and the overhead of generic sparse linear algebra. We introduce diagonal-budgeted Trotterization, a structure-aware strategy that decomposes Hamiltonians into factors preserving diagonal sparsity while tightly controlling fidelity loss. Our implementation, HamSim, utilizes a compact diagonal-sparse data layout and specialized C++/CUDA kernels to bypass the overheads of generic formats like CSR. By leveraging SIMD vectorization, multithreading, and GPU acceleration, HamSim achieves high performance across heterogeneous architectures. Benchmarks on the HamLib suite show that HamSim significantly outperforms Qiskit-Aer. On CPUs, HamSim attains speedups of $182$–$1,269\times$ on optimization instances (TSP, MaxCut) and $4.8$–$841\times$ on physical models (TFIM, Heisenberg). On GPUs, it achieves up to $178\times$ speedup for $12$–$16$ qubit problems. Unlike traditional Trotterization, HamSim maintains near-perfect fidelity without requiring exponential steps. This demonstrates that diagonal-aware numerical kernels provide a scalable foundation for high-fidelity classical Hamiltonian simulation.

13.
arXiv (CS.CV) 2026-06-16

Through-Foliage Surface-Temperature Reconstruction for Early Wildfire Detection

We present a method to reconstruct surface temperatures through forest vegetation by combining signal processing and machine learning, enabling fully automated aerial wildfire monitoring with drones for early fire detection. Synthetic aperture (SA) sensing reduces canopy occlusion but introduces thermal blur. To overcome this, we train a visual state space model to recover subtle thermal signals of partially occluded soil and fire hotspots from blurred data. To address limited real-world training data, we generate realistic surface temperature simulations using a latent diffusion model, temperature augmentation, and procedural thermal forest modeling. On simulated datasets, our method reduces RMSE by 2-2.5 versus conventional thermal and uncorrected SA imaging; in field experiments on hotspots, RMSE improved by 12.8-fold and 2.6-fold, respectively. Our approach also generalizes to other thermal signals, including human signatures, capturing morphology and extent – critical where simple thresholding fails – while conventional imaging struggles with partial occlusion.

14.
arXiv (quant-ph) 2026-06-11

Towards the implementation of a quantum classifier

arXiv:2606.10150v2 Announce Type: replace Abstract: In this work, we investigate the use of a quantum circuit as a binary classification model in the context of quantum machine learning. We call this model, binary quantum classifier. First, we describe fundamental concepts of quantum computing and introduce the computational tool used: Qibo, an open-source framework for efficient quantum simulations and quantum hardware control. Then, we describe how to design a binary quantum classifier for the classification of images and small arrays of variables by showing how to input data in the circuit, defining a quantum circuit model Ansatz with trainable parameters and a loss function, and implementing multiple minimizers. We test our quantum classifier with two data sets. The first one is the MNIST data set which is composed of handwritten digits (reduced to only handwritten zeros and handwritten ones for binary classification). We study the behavior of different minimizers by increasing the number of layers of the Ansatz. The second data set represents two different high energy collisions that can occur at colliders such as LHC (CERN). Due to in-time proton-proton interactions known as pile-up, we distinguish two different data sets: "without pile-up" and "with pile-up". These collisions can be represented by images of size 32x32 or by six high-level variables that we call features. By increasing the size of the training data set and the number of layers of the Ansatz, we search for the best minimizer. Splitting the data set in training set and test set, we compute: ROC curve, AUC score, confusion matrices and test set accuracy. For "with pile-up" images, we compare the results obtained with the quantum classifier with a small convolutional neural network. We conclude that is possible to build a binary quantum classifier with a quantum circuit and we highlight its performances and limitations in comparison with classical technologies.

15.
Nature (Science) 2026-06-17

Analysis of 173,303 exomes and genomes in the Pakistan Genome Resource

Naturally occurring loss-of-function variants in human genes enable drug target discovery because they mimic pharmacological inhibition of proteins. However, the study of these genetic variants is constrained by their rarity. Sequencing of diverse populations, particularly those enriched in familial relatedness, has been postulated to promote discovery of rare genetic variants1–3. Here we present the Pakistan Genome Resource, a South Asian biobank with high familial relatedness comprising 173,303 participants, who collectively carry naturally occurring homozygous loss-of-function variants in 6,476 genes. We describe the genetic architecture of this population, associations between genes and biomarkers, the distribution of loss-of-function variants across molecular pathways, and recall-by-genotype studies of therapeutically relevant genes. The Pakistan Genome Resource expands the catalogue of human genetic variants, provides a comprehensive genetic reference resource for the Pakistani population, and demonstrates the value of studying diverse cohorts to advance human health. The Pakistan Genome Resource compiles biobank data from 173,303 individuals with high familial relatedness, broadening the catalogue of human genetic variation and establishing a population-specific genomic reference for Pakistan.

16.
arXiv (quant-ph) 2026-06-12

More efficient Clifford+T synthesis for small-angle rotations and application to Trotterization

arXiv:2605.31544v2 Announce Type: replace Abstract: Clifford+T synthesis of rotation gates is an important routine in fault-tolerant quantum compilation. While Clifford+T synthesis is scalable, it has a high overhead of tens of T gates per rotation in practice, translating to high resource estimates for many fault-tolerant algorithms. However, these well-known results, including those using probabilistic mixtures [Quantum 7, 1208 (2023)], are independent of the rotation angle $\theta$, requiring $O(\log 1/\delta)$ T gates. We show that it is possible to do much better for small angles, reducing the T cost to $\tilde{O}(\theta^2/\delta)$, and returning to existing $O(\log1/\delta)$ results in the worst case. This is particularly important since many algorithms, such as Trotterization, are dominated by small-angle rotations. Further, we perform a detailed theoretical and numerical study of quasi-probabilities, which can further reduce the total T cost of large circuits by orders of magnitude with only a small overhead in sample complexity. We also develop a scheme based on quasi-probability mixtures of Clifford+T fallback channels. We derive new $\theta$-dependent formulas that can be used for resource estimation of fault-tolerant quantum algorithms. As an application of our results, we show that the gate cost of Trotterization circuits compiled to a Clifford+T gate set is constant in the small Trotter step size limit, and can be reduced by orders of magnitude even for large step sizes. The cost of fault-tolerant Trotterization for a variety of applications should be re-examined in light of these results. Our work dispels the widely-stated claim that Clifford+T rotation synthesis has a high cost independent of $\theta$, and further develops a scalable quasi-probability method for rotation synthesis. We also expect our results to bring forward useful early fault-tolerant quantum computing by reducing required magic state resources.

17.
arXiv (quant-ph) 2026-06-15

Synchronization of Quasi-Particle Excitations in a Quantum Gas with Cavity-Mediated Interactions

arXiv:2504.17731v2 Announce Type: replace-cross Abstract: Driven-dissipative quantum systems can undergo transitions from stationary to dynamical phases, reflecting the emergence of collective non-equilibrium behavior. We study such a transition in a Bose-Einstein condensate coupled to an optical cavity and develop a cavity-assisted Bragg spectroscopy technique to resolve its collective modes. We observe dissipation-induced synchronization at the quasiparticle level, where two roton-like modes coalesce at an exceptional point. This reveals how dissipation microscopically drives collective dynamics and signals a precursor to a dynamical phase transition.

18.
medRxiv (Medicine) 2026-06-22

Efficacy and safety of semaglutide for obesity and hyperphagia in adults with Prader-Willi syndrome

Context: Prader-Willi syndrome is a genetic neurodevelopmental disorder characterized by hyperphagia and early-onset obesity from hypothalamic dysfunction with endocrinopathies and learning disability. Management is challenging with strict control of the food environment needed. While newer glucagon-like peptide-1 receptor agonists, such as semaglutide, have efficacy in non-PWS obesity, there have been limited case reports in PWS. Objective/Design/Setting: Retrospective records review of 12 adults with PWS and overweight/obesity treated with semaglutide at a UK academic hospital centre specialist clinic. Patients: mean +/- SD age 28.3 +/- 10.1 years, 83% female, BMI 46.6 +/- 8.2kg/m2, 75% type 2 diabetes mellitus. Intervention: Median follow-up 17.2 months (range 8.7-36.1) with median semaglutide dose 2.4mg once weekly (1.0-2.4). Results: Although there was no significant weight loss on semaglutide, there was stabilisation of the weight gain prior to treatment over previous 12.4 months (7.6-23.0) (post -3.1 +/- 9.9% vs. pre +5.7 +/- 5.6%: d -0.72, P=0.037). There was a significant decrease in hyperphagia on semaglutide from hyperphagia questionnaire for clinical trials (n=11, -7.3 +/- 6.1 (max 36), d -1.19, P=0.003), having been stable before treatment. HbA1c improved in those with elevated baseline levels (n=6, -4.2 +/- 4.9%, d -0.74, P=0.13). Mild gastrointestinal side effects were seen in 25% but did not lead to discontinuation. Conclusions: In adults with PWS, semaglutide produced weight maintenance, reduced hyperphagia, and improved glycaemic control, with good tolerability. Larger placebo-controlled trials are needed to confirm these findings in adults and adolescents with PWS, especially in those without T2DM, where efficacy may be greater.

19.
arXiv (CS.CL) 2026-06-19

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

Persian pretrained language models (PLMs) are still limited by the scarcity of large-scale, high-quality pretraining corpora and by insufficient evaluation beyond standard classification and NER tasks. We present IHUBERT, a monolingual Persian PLM trained from scratch with the RoBERTa-base encoder (125M parameters) on a 45 GB curated subset of the Sepahr-Danesh collection (about 7-8B tokens). To improve corpus quality and reduce redundancy, we employ a multi-stage preprocessing pipeline that includes normalization, exact and near-duplicate removal, anonymization, and vector-database-based semantic deduplication for distribution balancing control across domains and registers. We additionally train a 139k-vocabulary BPE tokenizer on the full pretraining corpus to better capture Persian morphology and orthographic variation. IHUBERT is evaluated on seven Persian NLU benchmarks covering NER, sentiment analysis, topic classification, NLI, extractive question answering, and relation extraction, using task-standard metrics (entity-level F1, Macro-F1, EM/F1). IHUBERT achieves its strongest gains on extractive QA, ranking first on both PQuAD (F1 88.3542) and ParsiNLU-RC (F1 49.0987), and attains the best result on FarsTail (Macro-F1 0.8350). On NER and topic classification, it remains competitive (e.g., 0.8308 F1 on ParsTwiNER; 0.7953 Macro-F1 on DigiMag), while relation extraction remains the main remaining gap (0.6684 Macro-F1 on PERLEX). A controlled tokenizer ablation on the IHUBERT pretraining corpus shows that BPE yields slightly lower subword fragmentation than WordPiece at matched vocabulary size, supporting our tokenization design. Overall, IHUBERT advances Persian language modeling through semantically curated large-scale pretraining and broad evaluation across both classification and comprehension-oriented tasks.

20.
medRxiv (Medicine) 2026-06-15

Recruitment, Retention Approaches and Community Engagement in the THRIVE pilot Trial: Lessons Learned from a Food is Medicine Trial

Background: Recruitment of underrepresented populations, including Black and Hispanic populations, for Food is Medicine (FIM) and cardiovascular trials, may pose significant challenges. Methods: We implemented a multi-component recruitment approach for the THRIVE (AdapTive personalized dietitian coacHing and messaging with pRoduce prescrIptions to improVE healthy dietary behaviors) pilot trial to engage primarily Black and Hispanic adults in a Food is Medicine for hypertension intervention. The recruitment approaches included community engagement at approximately 40 community events (cultural festivals and neighborhood gatherings); partnerships with 8 community and faith-based service hubs and food distribution sites; recruitment through safety net primary care clinics, digital outreach via the study website, and social media campaigns; and direct recruitment at places of worship. We report lessons learned from the community engagement process, recruitment efficiency, representativeness, and retention outcomes. Results: Within 6 months, the enrollment target was exceeded by 40%, with an accrual index of 1.04. Over 1,000 individuals were reached through the direct-to-community engagement process, while faith-based partnerships engaged about 900 adults. There were 2,673 visits to the study webpage, and social media achieved 12,259 impressions with 399 clicks. About 95% of participants resided within 10 miles of the faith-based recruitment sites. Face-to-face engagement at the food distribution sites within faith-based organizations or community service hubs outperformed digital methods. Faith leader endorsements and follow-up in-person meetings (following unsuccessful email outreach) dramatically increased recruitment. Regarding retention, pre-randomization attrition was 6%, and 82% of participants completed the study. Conclusion: Culturally tailored, community-engaged recruitment grounded in faith-based and local community partnerships, was highly effective in engaging Black and Hispanic populations in this FIM cardiovascular trial. This provides a replicable model for implementing equitable and sustainable cardiovascular health interventions.

21.
arXiv (CS.AI) 2026-06-16

RoboPIN: Grounded Embodied Reasoning via Pinned Chain-of-Thought

arXiv:2606.15753v1 Announce Type: new Abstract: Embodied reasoning requires models to perceive task-relevant objects and spaces in physical environments and maintain consistent visual grounding throughout multi-step reasoning. However, current vision-language models rely on text-only or coordinate-augmented chain-of-thought, where entity references remain implicit and ambiguous. This may cause the reasoning process to decouple from visual evidence, entity references to drift across steps, and a causal disconnection between the reasoning trajectory and the final answer, with these problems further amplified in multi-view scenarios due to cross-view appearance changes. To address these issues, we propose Pinned Chain-of-Thought (\pincot{}), a structured reasoning paradigm that pins every reasoning step to visual evidence. \pincot{} introduces the concept of \reasoninganchor{}, which binds each task-relevant entity to a structured visual anchor with entity name, unique identity, view index, and spatial grounding, enabling consistent entity tracking across reasoning steps and views. We build a fully automated data generation pipeline to construct \dataset{}, a high-quality \pincot{}-formatted reasoning dataset. We then train \method{} through three-stage post-training that progressively injects embodied knowledge, structured reasoning ability, and process-supervised alignment, with rewards that directly constrain both anchor localization and identity consistency during reasoning. On 14 benchmarks covering embodied spatial reasoning, multi-view reasoning, and pointing, \method{} with only 4B parameters consistently outperforms 7B level open-source embodied models, achieving a 12\% average improvement over the strongest 7B baseline, Mimo-Embodied. Further analysis shows that \pincot{} improves grounding accuracy and cross-step identity consistency, validating the effectiveness of process supervision.

22.
arXiv (CS.AI) 2026-06-11

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

arXiv:2606.11657v1 Announce Type: cross Abstract: Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior, is the internal behaviour consistent with known physics, and how does it relate to where the emulator succeeds or fails? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles. We apply a sparse autoencoder (SAE) to probe a selected layer, and address the practical challenge of triaging a large feature set (over 20,000) using enstrophy as a physically grounded metric. As a deliberately simple testbed, we focus on shear flow and compare feature recruitment across multiple shear-flow setups, i.e. parameter values in the numerical simulation. Across setups we find evidence of piecewise consistency, with subsets of features recurring in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. In parallel, direct comparisons between numerical simulation and the emulator reveal systematic output-level discrepancies, including regimes where energy/structures become too diffuse or too localized. We connect parts of these discrepancies to changes in specific SAE feature usage. Our work highlights open questions for scientific foundation models: how to robustly prioritize mechanistically meaningful features, how to separate stable structure from analysis artifacts (including single-layer and SAE limitations), and how to use established benchmarks to decide when "different" internal representations are genuinely informative rather than merely effective.

23.
arXiv (CS.AI) 2026-06-11

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

arXiv:2606.11671v1 Announce Type: cross Abstract: Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0\% accuracy with an 88.0\% true positive rate and an 8.0\% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19–20 out of 20 malicious skills across rounds.

24.
arXiv (CS.CV) 2026-06-16

GOOSE-M2F: Adapting Mask2Former for High-Fidelity, Long-Tailed Fine-Grained Semantic Segmentation in Unstructured Outdoor Terrain

We present GOOSE-M2F, a task-specific adaptation of Mask2Former for the GOOSE 2D Fine-Grained Semantic Segmentation (FGSS) Challenge at ICRA~2026. The GOOSE benchmark spans 64 fine-grained classes across unstructured outdoor terrain with a severely long-tailed distribution, where rare classes occupy fewer than 50 pixels per image. We extend the Swin-Large Mask2Former baseline with three targeted contributions: (1)200 Object Queries to eliminate representational saturation; (2)a Feature Refinement Module (FRM) combining ASPP-lite and CBAM dual-attention; and (3)an Auxiliary Supervision Head that delivers direct per-pixel gradients for rare classes. A multi-stage training strategy pairs Distribution-Balanced loss, Rare-Class Copy-Paste augmentation, dynamic IoU-aware re-weighting, and EMA. At inference, a dense sliding-window engine with 2D Gaussian kernel blending and 4-scale TTA adds +10.57\%. GOOSE-M2F achieves 70.08\% Official Composite mIoU (63.55\% fine, 76.61\% coarse), placing 3rd on the GOOSE 2D FGSS leaderboard. Code and trained models are publicly available at: \href{https://github.com/Aditya-Lingam-9000/GOOSE-M2F}{Github GOOSE-M2F Code} and \href{https://huggingface.co/XYZ9843/GOOSE-M2F}{Hugging Face GOOSE-M2F}.

25.
arXiv (CS.CL) 2026-06-11

Language Shapes Mental Health Evaluations in Large Language Models

Multilingual large language models (LLMs) are increasingly used in socially sensitive mental health contexts, including support chatbots, screening, and content moderation. This raises a reliability question: do semantically equivalent mental health inputs elicit comparable evaluations across languages, or systematic shifts consistent with language-associated social and cultural contexts? We examine this question in an English-Chinese setting with GPT-4o and Qwen3-32B using a two-level framework: construct-level evaluative orientation, measured by psychometric stigma instruments, and decision-level behavior, measured by binary stigma detection and four-class depression severity classification. Across instruments and models, Chinese prompts elicit higher stigma-related scores than English prompts. At the decision level, Chinese prompts reduce sensitivity to stigmatizing content and produce more conservative depression severity judgments, leading to more under-estimation errors. These findings show that prompt language can shift both evaluative orientation and downstream behavior in LLM-based mental health evaluation. They highlight the need to evaluate multilingual LLMs not only for aggregate performance, but also for whether they apply comparable evaluative standards across languages in socially sensitive domains.