Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-16

Chronological Blindness: Benchmarking Temporal Reasoning in Vision-Language Models with CHRONOSIGHT

Human perception of visual scenes is inherently temporal. We instinctively recognise whether a fruit is ripening or rotting, whether construction is progressing or being demolished, and approximately how much time separates two photographs of the same subject. Whether large vision-language models (VLMs) share this competence remains an open and practically important question. We introduce CHRONOSIGHT, a rigorously controlled benchmark evaluating five dimensions of visual temporal reasoning: CHRONORANK (chronological ordering of image sequences), CHRONOLOCATE (ordinal stage localisation from a single image), CHRONODELTA (estimation of time elapsed between two images on a logarithmic scale), CHRONOREVERSE (detection of temporally reversed sequences), and CHRONOODD (identification of a temporal outlier within a set). The benchmark comprises 1{,}000 items across eight process families (biological growth, food transformation, physical weathering, construction, environmental change, human ageing, astronomical phenomena, and urban dynamics) spanning timescales from minutes to millennia. We evaluate eight open-source VLMs (500 M to 19 B parameters) under two prompting regimes and collect human performance baselines. Human performance averages 0.89 across tasks; the best open model (Qwen2.5-VL-7B) reaches 0.40 under direct prompting, a gap we term chronological blindness. Lightweight LoRA fine-tuning on 151 examples raises CHRONODELTA accuracy from near-zero to 0.43, transferring zero-shot to related tasks (CHRONOODD: 0.37; CHRONOREVERSE: 0.64)suggesting the bottleneck is partly instruction following rather than visual perception. Benchmark, code, and predictions will be released upon acceptance.

02.
arXiv (CS.LG) 2026-06-12

Retrieval-Augmented Foundation Models for Water Level Prediction in the Everglades

arXiv:2508.04888v2 Announce Type: replace Abstract: Accurate water level forecasting in the Everglades is essential for flood mitigation, drought management, water resource planning, and biodiversity conservation. While recent time-series foundation models have shown strong performance on generic tasks (represented in their pre-training), their effectiveness in domain-specific applications remains insufficiently understood. In this work, we curate a domain-specific dataset for water-level forecasting in the Everglades and observe that the performance of current state-of-the-art models remains limited. To address this gap, we leverage a retrieval-augmented mechanism that retrieves analogous multivariate hydrological episodes from an external archive of historical observations to enrich the input context of those pre-trained models. We study two retrieval strategies, statistical similarity-based retrieval and mutual information-based retrieval, and analyze how incorporating retrieved historical contexts affects predictive performance. Extensive experiments show that retrieval augmentation consistently improves long-horizon water level forecasts and yields disproportionately larger gains during extreme events, which is particularly critical for environmental decision-making. Our study provides empirical evidence that analog-based retrieval can benefit pretrained time-series foundation models in environmental science, offering practical insights into their strengths, limitations, and failure modes when applied to hydrological forecasting in the Everglades. Although evaluated in the Everglades, the proposed framework is general and can be applied to other hydrological systems given time series data. The code and data have been made publicly available at https://github.com/rahuul2992000/WaterRAF.

03.
arXiv (CS.AI) 2026-06-24

Fast and Slow Variational Continual Learning

arXiv:2606.24007v1 Announce Type: cross Abstract: Continual learning remains a major challenge for modern deep networks, partly because commonly used optimizers lack inherent mechanisms for continual adaptation. One such natural mechanism is fast and slow adaptation to balance stability and plasticity. This mechanism has deep roots in neuroscience and biology, but there is no consensus on how to best incorporate it in commonly used optimizers. Here, we show that this can be easily done via the VCL framework, where past posteriors are used as priors in the future. Our key idea is to incorporate slow adaptation via merging of past posteriors to slow down the drift in the knowledge as learning progresses. The merged posterior is then used as the prior in the VCL update to implement the fast-weight updates. These steps can be seamlessly implemented in the IVON optimizer, whose form and costs are nearly identical to that of Adam. We call this new optimizer the Continual IVON (CoVON) optimizer and show that it not only consistently improves over existing VCL optimizers, but also performs better than other weight-regularization strategies across domain-incremental learning, continual pre-training, and fine-tuning of large language models.

04.
arXiv (CS.AI) 2026-06-24

A Unified Framework for Runtime Verification and Model-Based Diagnosis in LOLA

arXiv:2606.23720v1 Announce Type: cross Abstract: We present an integrated framework that unifies runtime verification and model-based diagnosis within the stream specification language LOLA. By encoding system descriptions, component health states, and observations into a single stream-based formalism, the approach enables continuous, online fault localization directly alongside fault detection, without requiring separate toolchains. The framework supports both time-invariant and transient faults, and naturally accommodates nondeterministic observations.

05.
arXiv (CS.AI) 2026-06-19

"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems

arXiv:2606.03090v2 Announce Type: replace-cross Abstract: The emergence of large language models (LLMs) has significantly accelerated recent research on LLM-based automatic grading (AG) systems. Benefiting from the strong instruction-following capabilities and broad prior knowledge of LLMs, educators can deploy AG systems across diverse tasks using only natural language rubrics while achieving satisfactory grading performance. Despite these advantages, new security concerns may also arise. In particular, prompt injection (PI) attacks have recently become a major threat to LLM-based applications. In the context of AG, attackers can potentially exploit PI vulnerabilities to manipulate grading systems into assigning artificially high scores regardless of the actual answer quality. Such behavior poses serious risks to the fairness, reliability, and integrity of educational assessment. In this work, we study PI attacks in AG systems, and systematically investigate the effectiveness of such attacks in educational scenarios. We further evaluate the effectiveness of existing defensive strategies against these attacks. Through comprehensive experiments under rubric-based grading settings, we demonstrate that current LLM-based AG systems remain highly vulnerable to PI attacks. We hope that our findings raise awareness of this emerging threat and motivate future research toward secure, robust, and trustworthy LLM-based educational systems.

06.
arXiv (CS.CV) 2026-06-12

Objects Before Words: Object-First Inductive Biases for Grounding Language in Child-View Video

Learning grounded word meaning from natural experience requires resolving two ambiguities in infant-view recordings: when the named referent appears and where it is in a cluttered frame. In SAYCam-style data, caregiver speech is sparse and weakly synchronized with egocentric video, so single-frame contrastive pairing yields noisy positives in which the intended object is absent or entangled with distractors. We propose BabyMind, an object-first bias for child-view contrastive learning under sparse, noisy supervision. BabyMind extracts candidate object embeddings using an offline mask-based region interface, links candidates across a short utterance-centered window into lightweight object files via tracking, and aligns utterances to bags of object files with a prototype-space multiple-instance contrastive objective. Track-coherence and global-object agreement regularizers stabilize learning and transfer object-file structure into the global frame embedding used at evaluation. On SAYCam-S, BabyMind improves Labeled-S 15 forced-choice accuracy by +2.6 points over CVCL and yields consistent gains on in-vocabulary out-of-distribution benchmarks. Code is available at https://github.com/sathiiii/BabyMind.

07.
arXiv (CS.CL) 2026-06-16

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

With the rapid progress of speech language models (SLMs), discrete speech tokens have emerged as a core interface between speech and text, enabling unified modeling across modalities. Recent speech tokenization approaches aim to isolate semantic information from low-level acoustics to better align with language models (LMs). In particular, previous methods use self-supervised learning (SSL) teachers such as HuBERT to extract semantic representations, which are then distilled into a semantic quantizer to suppress acoustic redundancy as well as capture content-related latent structures. However, these tokenizers often operate at relatively high frame rates, producing token sequences significantly longer than their textual counterparts and hindering seamless integration with pretrained LMs. Although recent methods attempt to reduce the token rate by applying uniform average pooling to SSL features, this can over-smooth content-bearing regions and dilute the structural information, thereby potentially limiting the LM alignment. To address this, we propose LM-SPT, an LM-aligned speech tokenization method based on semantic speech-resynthesis distillation. Instead of directly matching teacher and student features via pooling, LM-SPT resynthesizes speech from semantic tokens only and minimizes the discrepancy between representations extracted from the original and resynthesized waveforms using a frozen, LM-aligned speech encoder. This indirect supervision avoids rigid temporal alignment and encourages dedicated semantic units that are more semantically aligned with LMs under reduced frame rates. Experimental results show that the proposed LM-SPT consistently outperforms previous semantic-enhanced speech tokenizers when applied to SLMs for the tasks of automatic speech recognition and text-to-speech, even without compromising the speech reconstruction fidelity at the codec level.

08.
arXiv (CS.AI) 2026-06-18

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

arXiv:2606.18820v1 Announce Type: cross Abstract: Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information–action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information–action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.

09.
arXiv (CS.LG) 2026-06-19

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

arXiv:2606.20174v1 Announce Type: new Abstract: Cell-free DNA (cfDNA) is a promising avenue for non-invasive multicancer early detection (MCED), in that, it can enable multiple cancer detection simultaneously from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomics and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges into technical, computational, and methodological while outlining open problems in the field. This review shows that multimodal ensemble approaches have the strongest promise for clinical integration and the highest readiness. However, for better assessment of future work and side-by-side comparison, standardization of evaluation protocols and reporting results will be crucial.

10.
arXiv (math.PR) 2026-06-24

On the convex hull of a planar Brownian bridge with a random Gaussian endpoint

arXiv:2606.24485v1 Announce Type: new Abstract: We consider a one-parameter family of isotropic planar Gaussian processes \[ X_\sigma(t) =B_t+\sigma t Z,\qquad 0\le t\le 1,\quad 0\le \sigma\le 1, \] where $B$ is a standard ($0$-to-$0$) planar Brownian bridge on $[0,1]$, and $Z\sim \mathrm N(0,I)$ is a standard Gaussian random vector independent of $B$. The family interpolates between standard planar Brownian bridge ($\sigma=0$) and standard planar Brownian motion ($\sigma=1$). As the main result of the paper we compute the expected perimeter and area of the convex hull of the random set $\left\{X_\sigma(t) \colon 0\le t\le 1\right\}$ as closed formulas in terms of $\sigma$, and recover the classical Brownian bridge and Brownian motion values at $\sigma=0$ and $\sigma=1$. We also consider the convex hull spanned by multiple independent processes of this type and the possibilities for closed formulas in special cases. The key observation in our argument is that the isotropy property reduces the expected perimeter and area to one-dimensional quantities through the support function and Cauchy's formulas.

11.
medRxiv (Medicine) 2026-06-22

Why drinking episodes escalate differently: Event-level pathways linking hazardous alcohol consumption and sexual risk

Background: Alcohol-involved drinking episodes vary in whether they involve hazardous alcohol consumption alone, near-miss sexual risk, or sexual risk behavior, but the within-event mechanisms underlying this variability remain unclear. Methods: Guided by syndemic theory, we conducted a qualitative event-level analysis using modified grounded theory among adults in the San Francisco Bay Area who reported hazardous alcohol consumption, defined as an Alcohol Use Disorder Identification Test score [≥]16. In-depth interviews elicited narratives of recent heavy drinking episodes and yielded 64 discrete drinking events across 22 participants. We focused on 35 events with evidence of within-event interaction between biopsychosocial and contextual factors. Using constant comparison, we identified escalation pathways, characterized interruption, and examined how events diverge into three outcomes: hazardous alcohol consumption only, hazardous alcohol consumption with near-miss sexual risk (when risk was plausible but not enacted), and hazardous alcohol consumption with sexual risk behavior. Results: Two primary escalation pathways emerged. Dose-driven escalation involved cumulative alcohol or substance exposure that progressively impaired awareness and self-regulation. Meaning-driven escalation involved prioritizing connection, intimacy, or belonging despite awareness of risk. Time-driven continuation extended exposure across contexts and amplified both pathways. Hazardous alcohol consumption-only events more often followed dose-driven pathways, whereas events involving sexual risk behavior more often followed meaning-driven pathways. Near-miss events occurred across both pathways and illustrated how interruption before the escalation constraint point, when the capacity to modify behavior became reduced, could redirect escalation before sexual risk behavior occurred. Across events with similar levels of intoxication narratives, outcomes diverged according to when the interruption occurred and whether it altered escalation. Conclusion: Hazardous drinking episodes diverge into different outcomes based on escalation pathways and the timing and effectiveness of interruption. Early and effective interruption before the escalation constraint point may represent a key target for harm-reduction strategies to prevent progression to sexual risk behavior.

12.
arXiv (CS.CV) 2026-06-25

SparseGS: Sparse View Synthesis using 3D Gaussian Splatting

3D Gaussian Splatting (3DGS) has recently enabled real-time rendering of unbounded 3D scenes for novel view synthesis. However, this technique requires dense training views to accurately reconstruct 3D geometry. A limited number of input views will significantly degrade reconstruction quality, resulting in artifacts such as "floaters" and "background collapse" at unseen viewpoints. In this work, we introduce SparseGS, an efficient training pipeline designed to address the limitations of 3DGS in scenarios with sparse training views. SparseGS incorporates depth priors, novel depth rendering techniques, and a pruning heuristic to mitigate floater artifacts, alongside an Unseen Viewpoint Regularization module to alleviate background collapses. Our extensive evaluations on the Mip-NeRF360, LLFF, and DTU datasets demonstrate that SparseGS achieves high-quality reconstruction in both unbounded and forward-facing scenarios, with as few as 12 and 3 input images, respectively, while maintaining fast training and real-time rendering capabilities.

13.
arXiv (CS.AI) 2026-06-19

StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation

arXiv:2606.20005v1 Announce Type: cross Abstract: Attention distillation, which trains one attention distribution to match another by minimizing their Kullback-Leibler (KL) divergence, is widely used in knowledge distillation, model compression, continual learning, and sparse-attention LLM training. However, existing approaches materialize both attention distributions before computing the KL reduction, incurring $O(N_QN_K)$ memory and IO costs that become prohibitive at long context lengths. We present StreamKL, the first fused GPU primitive for attention KL divergence that eliminates this quadratic materialization. StreamKL derives a novel online formulation for the coupled two-distribution KL reduction, enabling a single one-pass forward kernel that streams query-key tiles through on-chip SRAM. For the backward pass, StreamKL recomputes attention probabilities tile-by-tile, avoiding storage of quadratic intermediates. We further design and implement efficient GPU kernels with dedicated optimizations. Experiments show StreamKL delivers up to $43\times$ and $14\times$ speedups over baseline methods in the forward and backward passes, respectively. Most importantly, StreamKL reduces the extra HBM footprint of attention distillation from $O(N_QN_K)$ to $O(1)$, enabling long-context distillation on a single GPU.

14.
arXiv (quant-ph) 2026-06-25

A Contactless Heat Engine Driven by Nonreciprocal Fluctuation-Induced Torques

arXiv:2606.25053v1 Announce Type: new Abstract: We describe a contactless heat engine in which quantum and thermal electromagnetic fluctuations act as the working medium. The setup consists of two concentric cylinders held at different temperatures. The inner cylinder stably levitates within the outer one due to repulsive nonequilibrium Casimir forces. The chirality of the setup is broken by using nonreciprocal dielectric materials, akin to application of a magnetic field along the common cylinder axis. Using Rytov fluctuational electrodynamics, we show that heat transfer and torque can be expressed in terms of an angular-momentum-resolved heat flux density, $\Phi_n(\omega)$: each exchanged photon carries energy $\hbar \omega$ and angular momentum $\hbar n$. In reciprocal media contributions from modes $n$ and $-n$ cancel and there is no net torque; nonreciprocity breaks this symmetry and powers rotation of the inner cylinder. Even in the absence of contact, electromagnetic fluctuations produce a frictional torque opposing rotation that we compute. This enables computation of characteristic steady state rotations, and estimation of the engine efficiency (which remains bounded by the Carnot limit). The cylindrical setup provides a natural realization of fluctuation-induced angular-momentum transfer and a possible route toward nanoscale contactless engines.

15.
arXiv (CS.LG) 2026-06-16

In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning

arXiv:2510.10981v3 Announce Type: replace-cross Abstract: This paper develops a finite-sample statistical theory for in-context learning (ICL), analyzed within a meta-learning framework that accommodates mixtures of diverse task types. We introduce a principled risk decomposition that separates the total ICL risk into two orthogonal components: Bayes Gap and Posterior Variance. The Bayes Gap quantifies how well the trained model approximates the Bayes-optimal in-context predictor. For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts and their context length. The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty. Our key finding is that this term is determined solely by the difficulty of the true underlying task, while the uncertainty arising from the task mixture vanishes exponentially fast with only a few in-context examples. Together, these results provide a unified view of ICL: the Transformer selects the optimal meta-algorithm during pretraining and rapidly converges to the optimal algorithm for the true task at test time.

16.
arXiv (CS.LG) 2026-06-15

Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families

arXiv:2606.02231v2 Announce Type: replace-cross Abstract: Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctuations in patients with type-1 diabetes. One way to model non-stationarity is through discrete latent regimes, i.e., stationary segments of time. Such systems induce a Markov Switching Model (MSM), a class of Hidden Markov Models with autoregressive dependencies among latent regimes and observed variables. Identifying latent regimes is challenging in the presence of frequent regime switches and nonlinear and non-Gaussian dynamics, particularly when there are instantaneous effects between the variables, e.g., due to slow rates of measurements. In this work, we establish the identifiability of both latent regimes and regime-dependent causal structures under temporal regime dependencies, nonlinear lagged and instantaneous effects, and independent noise from the exponential family. Our identifiability theory subsumes non-temporal mixtures of causal models. Furthermore, we introduce FlowMSM, a regime detection framework that can be paired with any stationary causal discovery method to recover regime-dependent causal structures. Experiments on synthetic benchmarks and a financial economics dataset demonstrate the effectiveness of our approach to detect latent regimes and discover causal structures from non-stationary time series.

17.
arXiv (CS.CV) 2026-06-16

HorusEye: Language as Dynamic Attention for Emergency Visual Analysis

Authors:

We introduce HorusEye, Language as Dynamic Attention for Emergency Visual Analysis. Our investigation followed five stages. The first one is benchmarking RefCOCO-Degraded, a dataset of 15,244 images (3,811 base images x 4 conditions: Clean, Fog, Smoke and Thermal) with systematic visual degradation. Through four research questions, we evaluate multiple VLMs (Gemini, Qwen2-VL, BLIP-2, LLaVA, Kosmos-2) across visual grounding the second stage, language feedback recovery the third one, health VQA tasks the fourth, and hallucination analysis the final stage. Our key finding is that language feedback effectiveness is model-dependent: Gemini achieves +47.3% improvement in thermal conditions through iterative language feedback, while Qwen2-VL shows -5.1% degradation under the same protocol. We also identify the 'Thermal Paradox' where cropping strategies that improve RGB performance catastrophically fail in thermal imagery. Furthermore, BLIP-2 uniquely hallucinates more under degradation, making it unsuitable for emergency deployment

18.
arXiv (CS.LG) 2026-06-16

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

arXiv:2606.08583v2 Announce Type: replace Abstract: Deep learning on physiological time series is interpreted through domain-specific features – oscillatory rhythms in EEG, morphological complexes in ECG – yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32–0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.

19.
arXiv (CS.CL) 2026-06-25

Automatic Generation of Highlights for Academic Paper Via Prompt-based Learning

Highlights provide a concise summary of the main contributions of an academic paper and help readers quickly understand its focus. However, many journals do not provide highlights, which limits their use in literature retrieval, text mining, and bibliometric analysis. Existing studies have explored supervised learning methods for automatic highlight extraction, but these methods usually require large amounts of labeled training data. This study investigates prompt-based learning for automatic highlight generation. We design task-specific prompt templates and combine them with paper abstracts as model inputs. Several language models are evaluated, including locally deployed pre-trained models such as GPT-2 and T5, as well as ChatGPT accessed through an API. Experiments on three datasets show that ChatGPT with prompt templates achieves performance comparable to previous supervised methods without using task-specific training samples. When a small number of examples are added to the prompts, the model significantly outperforms state-of-the-art methods on two datasets. We further analyze how prompt design affects generation quality and find that, although ChatGPT has strong language modeling ability, its performance on this task is highly sensitive to the information provided in the prompt. Case studies also show that the generated highlights are generally coherent, informative, and close to author-written highlights. This study is among the first to apply prompt-based learning to academic highlight generation. The proposed method does not rely on domain-specific training corpora and can generate highlights for papers that lack such information, thereby supporting downstream text mining and bibliometric research.

20.
arXiv (CS.AI) 2026-06-25

Reasonable Motion: A General ASP Foundation for Environment Constrained Movement Trajectory Computation

arXiv:2606.25626v1 Announce Type: new Abstract: We present a general answer set programming based hybrid quantitative-qualitative method for computing constrained branching trajectory modes for moving objects in real-world settings. The method performs constrained traversal of an environment graph, enumerating geometrically admissible motion behaviours as stable models, each constituting a distinct trajectory mode characterised by both domain-dependent and independent factors such as derived event sequence, map topology, and domain norms. The hybrid trajectory computation method is generally applicable across motion characteristics typically encountered in diverse dynamic domains with moving objects, e.g., autonomous driving. We demonstrate applicability and highlight how computed trajectories are traceable to their underlying stable model, thereby affording verifiable interpretability that purely learned approaches cannot provide. We also perform an empirical evaluation with Argoverse 2, a large-scale real-world autonomous driving benchmark representative of the class of dynamic domains within the scope of the proposed method.

21.
PLOS Medicine 2026-05-06

Pathways of emergency care for severely ill children in Nigerian and Ugandan hospitals: A process mapping study

Authors:

by Rami Subhi, Abiodun Sogbesan, Dan Muramuzi, Mikael Burhin, Ayobami A. Bakare, Adegoke G. Falade, Freddy E. Kitutu, Freddie Ssengooba, Carina King, Sumit Kane, Belinda Dawson-McClaren, Hamish R. Graham, the MOXY-Implementation Research Collaboration Background Child mortality remains high in countries with weak emergency care systems. Facility organisation for paediatric emergency care is heterogeneous and under-described. We examined how hospitals in Uganda and Nigeria are organised to deliver emergency care for neonates and children. Methods and findings We conducted a qualitative, multi-method study in 26 purposively selected secondary and tertiary facilities in Uganda and Nigeria from October 2023 to December 2024. Embedded researchers documented patient pathways, resources for care, and care processes for severely ill children (

22.
medRxiv (Medicine) 2026-06-22

Discovering Novel intracranial EEG Biomarkers of Seizure Generating Tissue through Time-Frequency Analysis

Objective: EEG biomarkers for seizure-generating tissue have historically been identified visually, which lacks objectivity and limits utility of automated approaches. For example, high frequency oscillations and interictal epileptiform discharges were promising markers to improve surgical outcomes for refractory epilepsy, but low specificity has hindered clinical implementation, and automated algorithms have not improved this. Methods: We developed Intracranial EEG Pattern Identification and Categorization, an automated, data-driven time-frequency framework for EEG biomarker discovery. It detects transient high-power intracranial EEG waveforms (1-500 Hz) and characterizes them using eight features. In seizure-free patients, waveforms occurring predominantly in resected intracranial EEG channels are candidate biomarkers. Results: In retrospective data from 14 seizure-free post-surgical patients from University of California, Los Angeles, we identified 9 waveform categories strongly associated with resected intracranial EEG channels. These included beta, gamma, and ripple band bursts, sometimes co-occurring with interictal epileptiform discharges; however, many were visually imperceptible in the broadband EEG. Using a support vector machine, we generated a unified classification metric based on these waveforms and tested it on 87 seizure-free subjects from Detroit Medical Center. This metric achieved higher area under the precision-recall curve than six state-of-the-art benchmark algorithms (p

23.
medRxiv (Medicine) 2026-06-19

Fine-Tuning SAM2 for Coronary Artery Segmentation in X-Ray Fluoroscopy

Authors:

SAM2 (Meta, 2024) provides a strong starting point for segmentation, but given the unique challenges in medical imaging (noise from patient movement, the projection-based nature of X-ray fluoroscopy, and low contrast between vessels and background), direct application is difficult. We fine-tune MedSAM2 on annotated coronary angiograms and apply it to video data for point-of-care use. On the ARCADE validation set (200 images), the fine-tuned model achieves Dice 0.767 compared to 0.033 zero-shot. On 10 fluoroscopic video studies from CoronaryDominance, it tracks vessels coherently and avoids falsely segmenting ribs, stents, and bypass grafts in 9 of 10 studies. Code is available at https://github.com/elakiyasivakumar/SAM2-Coronary-Angiography-VA and the fine-tuned checkpoint at https://huggingface.co/Elakiya17/CA-SAM2.

24.
arXiv (CS.CV) 2026-06-24

Jolia: Concept-Level Vision-Language Alignment for 3D CT Contrastive Learning

Vision-language contrastive pretraining has become the dominant recipe for 3D medical foundation models, leveraging the large volumes of paired scans and reports produced in clinical practice. However, medical images usually span dozens of organs, and radiological reports are much longer than typical natural image captions and are composed of multiple structured sections. CLIP-style pretraining compresses this structure by encoding each modality into a single global token, at the risk of losing important details. We introduce ConQuer (Concept Queries), an image-text pretraining method that augments CLIP's global alignment with a set of localized alignments, one per concept. ConQuer splits the report into concept-specific sections and learns cross-attention queries that pool the matching image features without using any segmentation mask or spatial supervision. Contrastive learning is then applied independently for each concept. Concepts can be any unit of semantic localization; here, they are anatomical regions, one query per organ or gross body region. As a byproduct, each query learns attention maps focused on its concept, providing built-in spatial interpretability. We use ConQuer to train Jolia, a 3D CT foundation model on chest and abdominal CT. Jolia consistently outperforms a CLIP baseline on findings classification, report generation, and cross-center transfer, and sets a new state of the art across multiple public benchmarks. Jolia's weights will be released upon acceptance.