Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?

arXiv:2606.15762v1 Announce Type: cross Abstract: We ran 300 repeated vulnerability-finding scans to measure how repeatable agentic large language model (LLM) security review is on the same JavaScript code, prompt, and benchmark harness. The headline result is that LLM security findings were unevenly repeatable: reference-matched findings were stable, but extra model reports varied heavily from run to run. Across 250 model runs, 80 of 161 unique unmatched findings appeared in only one of five identical repetitions, while only 22 appeared in all five. By contrast, when Claude matched a Snyk Code reference finding, the behavior was much more stable: 134 of 158 unique reference-matched findings appeared in all five repetitions. The benchmark also shows complementarity. Models consistently found familiar, high-signal exploit shapes, and in one case surfaced a likely Snyk Code product gap. Snyk Code static application security testing (SAST) was deterministic and better at systematically enumerating repeated data-flow sinks. The results support combining agentic LLM review with deterministic SAST rather than treating either technique as a replacement for the other.

02.
bioRxiv (Bioinfo) 2026-06-18

Structure-Based Immunoinformatics Design of a CTB-Adjuvanted Multi-Epitope Mucosal Vaccine Against Helicobacter pylori

Background: Helicobacter pylori coloniz the gastric mucosa of nearly half of the global population and is classified as a Group I carcinogen by the World Health Organization due to its strong association with gastric cancer. The growing prevalence of antibiotic-resistant H. pylori strains significantly compromises current therapeutic strategies, emphasizing the urgent need for effective prophylactic approaches. Research design and methods; In this study, a novel multi-epitope vaccine was designed targeting H. pylori, incorporating epitopes from four key virulence proteins: BabB, SabB, SabA, and VacA. Using an immunoinformatics-guided structural vaccinology approach, B- and T-cell epitopes were predicted, prioritized based on immunogenicity, conservation, population coverage, and non-homology to human proteins, and assembled into the final vaccine construct. To enhance immunogenicity and specifically stimulate mucosal immune responses, the cholera toxin B subunit (CTB) was fused at the N-terminal via an EAAAK linker, a novel application in H. pylori multi-epitope vaccines. The PADRE universal epitope and additional linkers were incorporated to optimize epitope presentation and helper T-cell activation. Results: Comprehensive evaluations of physicochemical, antigenic, allergenic, and toxic properties were conducted, followed by secondary and tertiary structure modeling, refinement, and validation. Conformational B-cell epitopes were mapped, and molecular docking, binding affinity analysis, energy minimization, and molecular dynamics simulations confirmed structural stability and receptor interactions. Codon optimization and in silico cloning predicted efficient expression in Escherichia coli, while immune simulations suggested robust humoral and cellular responses. Conclusions: This study presents a promising multi-epitope vaccine candidate against H. pylori, offering a rational framework for future experimental validation and potential clinical application.

03.
arXiv (CS.CL) 2026-06-16

From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text

Modeling dimensional affect in longitudinal text requires distinguishing current affect estimation from future affective change forecasting. Existing approaches often treat each text as an independent observation and apply similar assumptions to both tasks, without testing whether they rely on different information sources. This paper investigates that distinction using longitudinal self-reported ecological essays and feeling-word entries. We propose the Trait–State Affective Prediction (TSAP) framework and its temporal extension E-TSAP for per-text valence and arousal prediction, evaluated on a held-out prediction test set of 1,737 entries from 91 users. We further propose the Affective Change Forecaster Hybrid (ACF-Hybrid) for next-step affective change forecasting, evaluated on a held-out forecasting test set of 46 users. For prediction, E-TSAP achieves composite Pearson correlations of 0.670 for valence and 0.449 for arousal. For forecasting, textual representations perform worse than compact numeric trajectory baselines: the text-inclusive model achieves only r=0.316 for valence and r=0.284 for arousal, whereas a simple prior-state baseline reaches r=0.615 and r=0.670, respectively. ACF-Hybrid, using dimension-specific numeric trajectory features, achieves r=0.659 for valence and $r=0.658$ for arousal. These results show that textual semantics support current affect prediction, whereas future affective change is better captured through prior numeric trajectory dynamics.

04.
medRxiv (Medicine) 2026-06-22

Three multimodal large language models fail at clinically actionable breast pathology in three different directions

Background. Breast cancer treatment depends on histopathological features, such as grade and receptor-defined subtype; however, specialist pathologist access is constrained when the workforce is limited. Commercial multimodal large language models (MLLMs) accept hematoxylin and eosin (H&E) image tiles through paid interfaces without local hardware or fine-tuning. However, prior pathology evaluations addressed only coarse tasks. Whether they reach treatment-determining accuracy and whether vendors agree remain unclear. Methods. We aimed to evaluate three vendor-designated flagship MLLMs (Claude Sonnet 4.6, Gemini 2.5 Pro, GPT-5.5) in 427 invasive breast cancer cases. Each case went to all three with identical H&E tiles and prompts, and the subtype was inferred in the second call. The reference was an institutional sign-out report of an immunohistochemistry-derived subtype. We calculated the concordance, sensitivity, specificity, Cohen's kappa, and pairwise McNemar and Bowker tests. Findings. Claude ranked highest by raw histologic-type concordance but lowest by kappa, classifying all 23 lobular and seven micropapillary carcinomas as invasive breast carcinoma of no special type. The models anchored the Nottingham grade to three modal grades. None of the models reliably identified human epidermal growth factor receptor 2-positive disease. The failure direction was vendor-specific: Claude and GPT-5.5 were under-detected, whereas Gemini was over-called. Twelve prompt variants (4,056 calls) did not recover sensitivity. Interpretation. No current commercial MLLM reaches deployment-ready accuracy for any treatment-determining feature of breast pathology. As each vendor fails in its own fixed direction, changing vendors alters the type of error rather than removing it; therefore, the value of these models is assistive rather than autonomous. At USD 0.20-0.50 per case, they may serve as supervised draft generators that leave the diagnosis with the pathologist.

05.
arXiv (math.PR) 2026-06-16

A Concavity Theorem for the Parisi PDE

作者:

arXiv:2606.15432v1 Announce Type: new Abstract: We prove that the map sending the diffusion profile to the solution of a time-changed Parisi PDE evaluated at time-space $(0,0)$ is concave. This result strengthens the raywise concavity result proven by Auffinger and Chen (2016). As an application, for the balanced multispecies Ising spin glasses, the lower bound of Bates and Sohn (2025) matches the Hopf-type upper bound given by the Hamilton–Jacobi framework developed by Mourrat, Chen and Xia.

07.
arXiv (math.PR) 2026-06-16

Stein's method for the matrix normal distribution

arXiv:2601.11422v2 Announce Type: replace-cross Abstract: This work presents the first systematic development of Stein's method for matrix distributions. We establish the basic essential ingredients of Stein's method for matrix normal approximation: we derive an extended-generator-based Stein identity from a matrix Ornstein-Uhlenbeck diffusion with two-sided scales, provide an explicit semigroup representation for the solution of the Stein equation, and obtain regularity estimates for the solution. The new methodology is demonstrated in three examples: (i) smooth Wasserstein distance bounds to quantify the matrix central limit theorem (a didactic example), (ii) a Wasserstein distance bound for the matrix normal approximation of the centered matrix $T$ distribution, and (iii) a Stein's method-of-moments approach to estimating the row and column covariance factors of the matrix normal, yielding a flexible class of weighted flip-flop Stein estimators that generalize Dutilleul's classical flip-flop algorithm and naturally accommodate row/column importance weights, systematic missingness, and projection onto structured covariance families. The latter two examples are intrinsically matrix-valued and cannot be treated using naive vectorization.

08.
arXiv (CS.CL) 2026-06-18

Dual Dimensionality for Local and Global Attention

Decoder-only Transformers compute attention over the KV cache of preceding tokens. Keys (and Values) are typically represented with the same dimensionality, regardless of its distance from the prediction target. In natural language, however, the next word is most strongly influenced by the immediately preceding tokens. We hypothesize that local and distant tokens impose asymmetric demands on representational capacity: local tokens are more critical for predicting immediate outputs and thus require richer representations, whereas distant tokens primarily serve as long-range memory, for which lower-dimensional representations may suffice. We formalize this idea as Distance-Adaptive Representation (DAR), implemented in a controlled setting that preserves full-dimensional representations within a local context window while assigning reduced-dimensional representations (e.g. 1/4 of the original dimensionality) to tokens beyond that window. Across multiple pretraining scales (70M to 410M parameters), as well as continued supervised fine-tuning on a 1B-scale model, this approach closely matches the performance of full-dimensional baselines. In contrast, uniformly reducing dimensionality across all token positions leads to worse performance. These results challenge the common assumption that key and value dimensionality should be uniform across token positions. Our findings suggest a new direction for designing attention architectures that adaptively allocate representational capacity across sequences, enabling further reductions in KV cache during inference.

09.
arXiv (CS.CV) 2026-06-18

Semantic Robustness Certification for Vision-Language Models

Vision-language models (VLMs) are now widely used in downstream tasks. However, real-world applications often expose VLMs to distribution shifts induced by semantic variation (e.g., shape, size, and style). Robustness certification determines if a model's prediction changes when transformations are applied to its input. While most certification frameworks study geometric or pixel-level transformations over inputs, this work proposes a novel framework that enables certifying VLM robustness under semantic-level transformations. Leveraging the open-vocabulary capability of VLMs, we use text prompts as semantic proxies to construct transformations parameterized by an extent that controls the degree of semantic variation. By characterizing the VLM decision boundary in closed form, our framework quantitatively certifies extent intervals for which the predicted class remains unchanged under the semantic transformation. Our framework is the first to certify VLM robustness under semantic-level variations without requiring additional data for each variation, making it practical to apply. Experiments on both synthetic and real-world data show that our framework enables certifying robustness under diverse semantic variations across scenarios.

10.
arXiv (CS.CL) 2026-06-18

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tasks require simultaneous reasoning from the stances of all involved stakeholders whose decisions are mutually dependent and thus cannot be solved in isolation. We characterize this challenge as stance entanglement, a form of decision complexity distinct from execution complexity. To address it, we propose Multi-Agent Fictitious Play (MAFP), a novel MAS paradigm that represents stakeholder stances as agents and formulates decision-making as an equilibrium-seeking process. Built on the game-theoretic principle of fictitious play, MAFP iteratively updates each agent's decision by best responding to the empirical mixture of other agents' past decisions. This enables agents to expose and address one another's weaknesses, progressively improving decision quality and robustness. We evaluate MAFP on challenging decision-making tasks that test the capability of deciding strategies for competitive scenarios prior to acting. MAFP outperforms both single-round and multi-round baselines on two complementary metrics, tournament strength and robustness, demonstrating its effectiveness in addressing stance entanglement.

11.
arXiv (CS.CV) 2026-06-15

ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning

Building trustworthy medical multimodal large language models (MLLMs) is critical for reliable clinical decision support. Existing medical hallucination benchmarks mainly focus on data collection, but often ignore where hallucinations originate within the reasoning process. We find that hallucination sources vary across samples: errors may arise from visual misrecognition, incorrect medical knowledge recall, or flawed reasoning integration. To enable source-level hallucination diagnosis, we introduce ClinHallu, a benchmark for stage-wise hallucination diagnosis in medical MLLM reasoning. ClinHallu contains 7,031 validated instances, where each instance is augmented with a structured reasoning trace decomposed into Visual Recognition, Knowledge Recall, and Reasoning Integration. We also use stage-replacement interventions to measure how correcting specific stages affects the final answer. Beyond evaluation, we show that trace-supervised fine-tuning reduces stage-wise hallucinations. ClinHallu provides a fine-grained hallucination testbed for diagnosing and mitigating reasoning failures in medical MLLMs. The benchmark is publicly available at https://github.com/alibaba-damo-academy/ClinHallu.

12.
arXiv (CS.LG) 2026-06-15

A Complexity Measure for Active Learning in Multi-group Mean Estimation

arXiv:2606.14690v1 Announce Type: new Abstract: We study a max-risk objective for active learning in a multi-group mean estimation $d$-armed bandits: a learner adaptively allocates a budget of $T$ samples across $d$ groups to minimize the worst-case uncertainty index $\max_{k\in[d]}\sigma_k^2/n_k$, where $\sigma_k$ is the standard deviation of the distribution of arm $d$, and $n_k$ is the number of times arm $d$ is sampled. We develop a local minimax framework and prove the first general lower bound for this objective, valid for any finite-variance hypothesis class. The bound separates difficulty into three orthogonal factors: a budget term, a heteroscedasticity index measuring how unevenly the uncertainty is spread across arms, and a model-dependent complexity measure, the Variance Local Curvature ($\mathrm{VLC}$), which captures how much information a local change of variance creates inside the hypothesis class. For smooth classes, the $\mathrm{VLC}$ is a reparametrization of a variance–Fisher information, with closed-form values for common families. Benchmarking against the strongest available upper bound shows near-optimality up to logarithmic factors in broad regimes, and pinpoints a systematic gap in highly heterogeneous instances. Our proof introduces two key ingredients: a loss-induced $\ell_1$ geometry on the decision space, and a representation-based instance generator that reduces hard-instance construction to an explicit random matrix calculation.

13.
arXiv (CS.AI) 2026-06-17

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

arXiv:2606.17657v1 Announce Type: new Abstract: People make decisions differently in strategic interactions. Some update beliefs like a Bayesian; others exhibit biases like motivated reasoning. Although creators of large language models use simulated humans for safety evaluations and training, they often fail to cover this breadth of human behavior. We argue that cognitive science and economics provide a convenient tool for doing so, making use of mathematical models of human decision-making. We propose an approach that we call Equation-to-Behavior Prompting for guiding large language models to match cognitive models, and evaluate this approach on persuasion games based on legal decision-making. We find that large models can approximate equation-based specifications – Bayesian updating, affine distortion, motivated updating, and Grether's $\alpha$-$\beta$ model – using prompting, but small models fail to do so. However, training small models with reinforcement learning to adhere to mathematical rules, Equation-to-Behavior RL, reduces belief error by 26.5% in out-of-distribution parameterizations. We show that these simulations can help create diverse training environments; training small models to consider different kinds of decision-makers improves average belief change by 2.5%–12% over Bayesian-only training, even when persuading GPT-5-mini. Our work could improve human simulations for training and evaluation in increasingly realistic settings, and could also enable novel research into more complicated mathematical models of human decision-making.

14.
medRxiv (Medicine) 2026-06-15

Epileptogenicity alters intrahippocampal ripple propagation

Objective: Tracing the propagation of high-frequency oscillations (HFOs) aids in localizing epileptogenic regions and improving surgical outcomes. We examined how hippocampal epileptogenicity influences the propagation properties of the HFOs it generates. Methods: We analyzed non-REM sleep stereo-EEG from 49 patients (68 hemispheres) with verified hippocampal contacts. Hippocampi were stratified by excitability: 28 seizure onset zone (SOZ), 22 more-irritative non-SOZ (>6 interictal epileptiform discharges [IED]/min), and 18 less-irritative non-SOZ (

15.
arXiv (CS.CL) 2026-06-16

When Cognitive Graphs Meet LLMs: BDEI Cognitive Pathways for Panic Emotional Arousal Prediction

Predicting individual panic emotional arousal timing before manifestation is essential for proactive emergency intervention. Existing methods incorporate cognitive elements but none explicitly model the emotional arousal process, making them ill-suited for emotional arousal timing prediction. We argue that grounding prediction in appraisal emotion theory is necessary because it explicitly models this process, but three problems must be solved. (1) Appraisal theory posits that emotion arises from simultaneous evaluation across multiple threat dimensions, yet no prior work fuses these inputs into risk perception. (2) Existing cognitive models lack an Emotion node, decoupling threat appraisal from emotional arousal and forcing emotions to be inferred indirectly from behaviors. (3) Given their generalizable cognitive reasoning, current approaches adopt LLMs as the primary decision-maker, yet overlook the fragility and hallucination-proneness of their outputs. To address these issues, we introduce PanicCognitivePath (PCP), a framework that addresses all three. A Psychological Safety Distance (PSD) model, grounded in psychological distance theory, maps four-domain signals into a unified risk metric as the entry condition for subsequent cognitive reasoning. An explicit Emotion node grounded in appraisal emotion theory is introduced into BDI, forming a Belief-Desire-Emotion-Intention (BDEI) pathway. Agents whose risk metric exceeds the PSD threshold enter this pathway, coupling threat appraisal directly to emotional arousal. The BDEI pathway governs all state transitions while the LLM is confined to parameter estimation for the Belief-to-Desire transition, confining hallucinations to a single step and preventing error propagation. Experiments on Hurricane Sandy show PCP improves arousal timing accuracy by 10.68% over baselines, reduces peak count error to 7.07%.

16.
arXiv (CS.LG) 2026-06-17

Continuous-time Optimal Stopping through Deep Reinforcement Learning

arXiv:2606.17545v1 Announce Type: new Abstract: Simulation based solvers for optimal stopping problems must discretize the stopping decision. Under classical dynamic programming, a coarse exercise grid with only a few stopping opportunities can materially undervalue the optimal expected reward, whereas on a very fine grid, approximation errors accumulate through the backward recursion. To remove this limitation, we develop a new reinforcement-learning inspired algorithm that enables us to learn the exercise rule at arbitrarily fine time resolution. Our CARLOS (Continuous-time Adaptive Reinforcement Learning for Optimal Stopping) algorithm utilizes an aggregate deep neural network (ADNN) to learn a joint space-time decision boundary. Starting from a coarse time grid, we progressively increase the frequency of stopping opportunities, while in parallel training the ADNN to refine its timing-value estimates. We moreover design an adaptive sampling strategy that gradually concentrates training effort near the stopping boundary. Benchmarked results show that CARLOS delivers higher prices than existing Bermudan solvers, approaching the American upper bound, and achieves high computational efficiency relative to non-RL comparators.

18.
arXiv (CS.LG) 2026-06-11

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information directly, such as those based on point clouds, offer a stronger geometric prior over purely image-based ones, yet their performance remains highly task-dependent. We hypothesize that this discrepancy may be due to the spectral bias of neural networks towards learning low frequency functions, which especially affects architectures conditioned on slow-moving Cartesian features. We thus propose to map point clouds from Cartesian space into high-dimensional Fourier space, effectively equipping the point cloud encoder with direct access to high-frequency features. We experimentally validate the use of Fourier features on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks and on a real robot setup. Despite their simplicity, we find that Fourier features provide significant benefits across diverse encoder architectures and benchmarks and are robust across hyperparameters. Our results indicate that Fourier features let policies leverage geometric details more effectively than Cartesian features, showing their potential as a general-purpose tool for point cloud-based imitation learning. We provide source code and videos on our project page: https://fourier-il.github.io/fourier-il

19.
arXiv (math.PR) 2026-06-16

Scaling Limits of Bivariate Nearly-Unstable Hawkes Processes and Applications to Rough Volatility

arXiv:2605.03703v3 Announce Type: replace Abstract: We study a pair of nearly-unstable Hawkes processes coupled through a one-directional, or triangular, cross-excitation: the first component evolves autonomously and excites the second, but not conversely. Each component is self-exciting through a heavy-tailed memory kernel, and the two kernels are allowed to have different tail indices, so that the limiting components exhibit genuinely different degrees of roughness. As the system approaches criticality, we prove that the suitably rescaled intensity vector converges weakly to the unique solution of a coupled system of stochastic Volterra equations of rough-volatility type. The first limiting component is autonomous, while the second is driven both by its own noise and by an inherited noise transmitted from the first component through an effective cross-kernel. This cross-kernel is the convolution of the two limiting Mittag-Leffler kernels and therefore combines the two memory structures. As a consequence, we obtain a short-time cross-decorrelation law: although the two components are coupled, their functional correlation vanishes at small time scales at an explicit polynomial rate. This time-dependent correlation distinguishes the limit from independent rough processes and from classical bivariate rough models with constant Brownian correlation.

20.
medRxiv (Medicine) 2026-06-17

Adverse Childhood Experiences Reorganise the Brain-Personality Network Across the Psychosis Spectrum

Exposure to adverse childhood experiences is a pervasive risk factor for psychosis, exhibiting a linear relationship across the psychosis spectrum from subclinical schizotypal traits to schizophrenia spectrum disorders. While this association is often conceptualised within the vulnerability-stress framework, the systemic mechanisms through which childhood trauma reconfigures the brain-personality interactome remain poorly understood. We examined clinical, neuropsychological, and neuroimaging data from a sample of low- and high-schizotypy individuals, and patients with a diagnosis of schizophrenia spectrum disorder (N=120). Our aim was to map how trauma reconfigures interactions between neurobiology and schizotypal phenomenology. We adopted a mixed graphical model approach to jointly estimate conditional dependencies between childhood trauma, regional brain morphometry, and schizotypal traits across the psychosis spectrum. Our results show that childhood trauma reconfigures the brain-personality network, shifting it from a state driven by cognitive processes to one anchored in emotional (limbic) reactivity. This transition is marked by the increased influence of impulsive traits and a significant strengthening of connections within the salience network. These changes converge with a reduced thickness of the frontal executive regions, the brain's control centres, identified in our models. Collectively, our results suggest a structural phenomenological decoupling, where trauma conditioned affective circuits may bypass weakened top-down regulatory controls. These findings highlight the necessity of using integrative frameworks to capture how trauma fundamentally reshapes the relationship between the brain and schizotypal personality.

21.
bioRxiv (Bioinfo) 2026-06-11

OMIO: A policy-driven Python library for reproducible microscopy image I/O

Modern fluorescence and multiphoton microscopy workflows operate within a heterogeneous ecosystem of file formats, partially overlapping metadata standards, and reader-specific conventions. In practice, this frequently leads to silent axis misinterpretations, loss or corruption of physical voxel size information, and laboratory-specific glue code that is fragile, poorly documented, and difficult to reproduce. OMIO, short for Open Microscopy Image I/O, addresses these issues by providing a lightweight, policy-driven image I/O layer for Python that enforces a canonical, OME-compatible data representation at the API boundary. The central contribution of OMIO is the explicit separation of low-level format access from semantic normalization. Existing reader libraries are used as interchangeable backends for extracting pixel data and available metadata, while OMIO enforces axis conventions, metadata interpretation, and fallback decisions in a centralized and auditable policy layer. This design allows heterogeneous microscopy inputs to be converted into a stable representation without propagating backend-specific assumptions into downstream analysis code. The core design principles of OMIO include canonical axis semantics (TZCYX), robust metadata normalization with explicit and auditable fallbacks, memory-aware operation via optional Zarr-based backends, and workflow-level semantics that extend beyond individual files to folder stacks and BIDS-like project structures. This architecture allows OMIO to orchestrate existing reader libraries into a coherent and reproducible I/O pipeline without replacing or duplicating their functionality. OMIO is implemented as an open-source and community-oriented system in which support for additional file formats and metadata conventions can be added incrementally through modular reader backends. By encouraging the contribution of example datasets, backend extensions, and feature requests, OMIO is designed to evolve alongside emerging acquisition systems while preserving strict semantic guarantees at the interface level. The resulting standardized OME-TIFF outputs are immediately suitable for downstream quantitative analysis and interactive inspection in scientific Python workflows, including workflows based on ImageJ and Napari.

22.
medRxiv (Medicine) 2026-06-16

Adherence to Red Reflex and Vision Screening Recommendations: A Deep Dive into Primary Care Implementation Gaps

Introduction: Early childhood vision screening is critical for detecting amblyopia and other vision-threatening conditions. Despite screening recommendations during well-child visits, rates remain low. Red reflex assessment is recommended to identify serious ocular pathology, yet its use in primary care is not well described. We examined rates and drivers of vision screening in pediatric primary care. Methods: We conducted a retrospective review of electronic health records for children 3 to 5 years attending well-child visits in 2022 in one of three representative primary care clinics within a university health system. Outcomes were documented red reflex and functional vision tests. We evaluated associations with patient demographics and clinic site using multivariable logistic regression Results: Among 1,003 visits, 21.1% (n=212) had a documented red reflex assessment, and 60.8% (n=610) a functional vision test. Younger children (ages 3 and 4 vs. 5 years) had higher odds of red reflex assessment [adjusted odds ratio (aOR) 9.00 and 8.64], and lower odds of a functional vision (aOR 0.47 and 0.59) test. Females had higher odds of red reflex assessment (aOR 1.53). Other/Multiracial children had lower odds of red reflex assessment than Non-Hispanic White children (aOR 0.48). Screening rates varied significantly by clinic site Conclusions: Visual function and red reflex assessment are inconsistently performed in pediatric primary care, with particularly low rates of red reflex documentation. Screening rates varied between clinics and were affected by age. These findings highlight missed opportunities for early detection of vision-threatening conditions and identify targets for improving adherence to pediatric vision screening recommendations

23.
bioRxiv (Bioinfo) 2026-06-11

ANCHOR: haplotype-aware allelic and isoform inference from single-cell long-read RNA sequencing with de novo variant calling

Long-read RNA sequencing enables haplotype- and isoform-resolved allelic analysis of transcriptomes, yet extending this capability to single cells and distinct cell types remains computationally challenging due to sparse coverage, sequencing errors, incomplete variant information, and reference-biased transcript assignment. Here we present ANCHOR, a haplotype-aware framework for single-cell long-read RNA sequencing that performs de novo expressed-variant discovery, molecule-level haplotype assignment and isoform-resolved allelic quantification. ANCHOR combines a signed-graph variant caller, pair hidden Markov modelling and beta-binomial UMI aggregation to infer parental allele counts for genes and splice-resolved isoforms, without requiring a pre-existing phased genotype or deep learning. In human single-cell long-read RNA benchmarks, ANCHOR improved variant-calling performance over tested long-read RNA callers at single-cell and low-to-moderate coverage, and its beta-binomial model reduced depth-driven false positives in allele-specific expression testing. Applied to newly generated single-cell long-read RNA-seq data from reciprocal mouse crosses during gastrulation, ANCHOR resolved cell-type- and isoform-specific parent-of-origin imprinting and identified an antagonistic maternally biased Sgce isoform. ANCHOR provides a general framework for allele- and isoform-resolved analysis of diploid single-cell long-read transcriptomes.

24.
arXiv (CS.LG) 2026-06-18

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

arXiv:2606.18451v1 Announce Type: new Abstract: Single-image-to-3D generators are improving quickly, but there is no agreed, human-free way to tell whether one generated mesh is better than another. Practitioners commonly rely on cheap automatic proxies (render-space CLIP similarity and mesh geometry-validity statistics), yet how well these track perceived quality is unestablished. We make two contributions. First, we propose and validate a reproducible VLM-judge evaluation protocol: a fixed 24-view headless render rig, two independent vision-language judge families, and a mandatory position-bias correction that queries both presentation orders and keeps only order-consistent verdicts. The two judge families agree substantially with each other (Cohen's kappa = 0.66), well above the chance-agreement floor. Second, using this protocol as the reference, we show the cheap proxies do not substitute for it. Geometry validity is only a weak signal on average (because, as we show, it is bimodal) and stays below our pre-registered target, while render-CLIP is at chance. A learned Bradley-Terry head collapses onto a single manifoldness statistic (giving render-CLIP a negative weight) and matches geometry-only exactly, so learning the feature weights buys nothing. The proxy is also bimodal: it is significantly above chance on contrasts with visible geometric defects but at chance on ambiguous contrasts, consistent with geometry validity tracking the judge only when the defect is visually salient. We therefore recommend the VLM-judge protocol as a reliable, reproducible evaluator under the conditions tested (two feed-forward generators on Google Scanned Objects, with a face-drop degradation regime) and advise against geometry/CLIP proxies as optimization targets.

25.
arXiv (CS.CL) 2026-06-17

Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping

Large language model applications build prompts from templates, and Handlebars is a widely used templating engine and the default prompt-template format in Microsoft Semantic Kernel. Its double-brace {{x}} expression HTML-escapes the interpolated value and is documented as the safe default; its triple-brace {{{x}}} expression inserts the value raw. We show that this choice silently governs an application's exposure to structural role injection, where attacker-controlled data carries chat role delimiters that forge a higher-privilege turn. A model-free analysis establishes the mechanism: Handlebars escaping rewrites angle brackets but not square brackets, colons, or Markdown hashes, so it neutralises ChatML, Llama-3, and XML role delimiters (survival rate 0.00) while leaving Llama-2 [INST], legacy Human:/Assistant:, and Markdown ### delimiters intact (survival rate 1.00 for the last two). We then run 5760 trials across seven delimiter families, two attack objectives, and four models (GPT-3.5 Turbo, GPT-4o mini, GPT-4.1 mini, Claude Haiku 4.5) at a combined API cost of 1.63 USD. GPT-3.5 Turbo follows the task-hijack instruction in 97% of raw and 91% of escaped trials, with the escaping protection concentrated in the angle-bracket families and absent for the colon- and Markdown-based families; the harder secret-exfiltration objective, which does not saturate, exposes the same family interaction more cleanly. Claude Haiku 4.5 resists both objectives almost entirely. The escaped default protects only the delimiter schemes whose characters HTML escaping happens to cover, gives no protection for the rest, and cannot substitute for a structural separation of instruction and data.