Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-23

Post Hoc Localization of Beam F3 Stimulation Targets: An MRI-Derived Geodesic Approach for Refined TMS E-Field Simulations

Background: Transcranial magnetic stimulation (TMS) targeting the left dorsolateral prefrontal cortex (dlPFC) is an established treatment option in major depressive disorder. One of the most common approaches for targeting the dlPFC is the Beam F3 method, which determines the stimulation site (F3Beam) as a function of external cranial measurements. Precise knowledge of the individual stimulation site is essential for imaging-based analyses of TMS effects. However, due to the method's reliance on individual anatomy, retrospective identification of F3Beam targets across cohorts is challenging, limiting the analysis of existing datasets. We developed a scalable method to reconstruct subject-specific F3Beam target locations for e-field simulations based on structural imaging. Methods: High-resolution three-dimensional (3D) T1-weighted MRI was used to generate individual scalp meshes via the ''Simulation of Non-Invasive Brain Stimulation'' (SimNIBS) software. Subject-specific anatomical distances and coordinates of interest were measured geodesically using a Python-based script to reconstruct the individual F3Beam targets. Validation included a retrospective comparison between digital geodesic measurements and manual cranial measurements in 20 patients and a prospective comparison with MR-visible scalp markers in 2 healthy controls. To assess the impact of our targeting algorithm on e-field simulations, volumetric e-field maps based on three potential targets (F3Beam, F3MNI, F3Geo) were generated in SimNIBS and compared using voxel-wise statistics in SPM12. Results: Retrospective analysis revealed a systematic bias towards higher in vivo measurements compared to digital geodesic measurements, though deviations in the final distances determining F3Beam (xBeam and yBeam) were minimal ({Delta}xBeam: 0.11 {+/-} 0.08 cm; {Delta}yBeam: 0.14 {+/-} 0.21 cm). Prospective validation demonstrated that F3Beam coordinates better matched in vivo coil positions than group-template-derived targets (F3MNI). Group-level analysis showed method-dependent clustering of coil positions with corresponding voxel-wise e-field differences. Conclusions: Individualized geodesic measurements may enable accurate, scalable and retrospective identification of Beam F3 targets and coil orientations. This approach may yield more accurate e-field simulations than group-template based targeting and provides a practical method for retrospective analysis of existing TMS treatment cohorts. This could be leveraged to identify response predictors or imaging-based biomarkers of treatment response.

02.
arXiv (CS.AI) 2026-06-24

Invariant Graph Representations for Continuous-Time Dynamic Graphs Under Distribution Shifts

arXiv:2405.19062v2 Announce Type: replace-cross Abstract: Continuous-Time Dynamic Graphs (CTDGs) enable fine-grained modeling of evolving relational systems. However, most existing CTDG representation learning methods are tailored to in-distribution settings and exhibit limited robustness under out-of-distribution (OOD) shifts. Although recent causal approaches learn invariant representations via interventions, they are primarily designed for static or discrete-time graphs and become computationally prohibitive for CTDGs due to the combinatorial explosion of structural and temporal variations. To address these challenges, we propose CIR, a framework grounded in a novel structural causal model termed the ICCM. To avoid exhaustive interventions, we leverage the Normalized Weighted Geometric Mean (NWGM) to efficiently approximate interventional predictions. We further instantiate ICCM within a practical deep learning architecture that jointly captures invariant structural and temporal patterns through dedicated subgraph extractors, and maintains an environment memory bank to model distributional shifts across evolving contexts. Extensive experiments demonstrate that CIR consistently outperforms existing methods under diverse OOD scenarios.

03.
medRxiv (Medicine) 2026-06-15

Iron deficiency testing among people with incident heart failure in primary care

Background: Given around 50% of people with heart failure have a degree of iron deficiency, guidelines recommend screening. It is uncertain to what extent this is done in primary care and whether testing is equitable. Aim: To report the proportion of people with incident heart failure who undergo a ferritin test within 12 months. Design and setting: Retrospective primary care cohort study using Clinical Practice Research Datalink Aurum data, between 2016 and 2021. Methods: We report the proportion of adults with an incident diagnosis of heart failure who received a ferritin test within 12 months. Multivariable logistic regression was used to examine the odds of testing based on key demographic covariates and co-morbidities. Results: Among 105,749 individuals with an incident diagnosis of heart failure (mean age 71.6 years, SD 14.3), only 35,688 (33.7%) received a ferritin test within the subsequent year. Increasing age (odds ratio 1.25 per 10-year increase, 95% CI: 1.24-1.27), female sex (male sex OR 0.86, 0.84-0.89) and Asian ethnicity (OR 1.70, 1.59-1.80) were all associated with increased odds of testing as were diagnoses of coeliac disease (OR 1.86, 1.58-2.21), type 1 diabetes (OR 1.82, 1.51-2.19) and cirrhosis (OR 1.64, 1.43-1.87). There was geographic variation in testing, even in adjusted analyses. Conclusion: In a large primary care dataset, two thirds of people with incident heart failure did not receive a ferritin test for iron deficiency within a year of diagnosis demonstrating a gap in current practice and an opportunity for improvements in service delivery.

04.
medRxiv (Medicine) 2026-06-22

Paired plasma and EV-enriched plasma proteomics reveal nonredundant sepsis-associated host-response signatures in critical illness

Background: Plasma proteomics may identify host-response signatures in sepsis, but it is unclear whether extracellular vesicle (EV)-enriched plasma provides distinct or redundant information compared with plasma. We compared paired plasma and EV-enriched plasma proteomes in critically ill patients with sepsis and critically ill non-sepsis controls (CINS). Methods: In this prospective observational study, paired plasma and EV-enriched plasma samples were analyzed from 56 critically ill adults, including 40 patients with sepsis and 16 CINS patients. Protein abundance was quantified using liquid chromatography-tandem mass spectrometry. Analyses compared proteomic depth, protein overlap, global concordance between compartments, and differential protein abundance between CINS and sepsis. Exploratory Gene Ontology enrichment was performed as a supplementary analysis. Results: EV-enriched plasma expanded proteomic detection, identifying 2,476 filtered proteins compared with 506 in plasma. Only 386 proteins were detected in both compartments, while 2,090 were unique to EV-enriched plasma and 120 were unique to plasma. Among shared proteins, plasma and EV-enriched plasma showed modest global concordance across critically ill patients (Spearman coeff = 0.322, p = 9.19 x 10^-11), with similar findings in sepsis alone. Differential abundance analysis identified 11 sepsis-associated proteins in plasma and 22 in EV-enriched plasma. Only SAA1, SAA2, and IGFBP6 were significant in both compartments. Exploratory pathway analysis supported acute-phase and inflammatory enrichment in plasma sepsis-associated proteins, while EV-enriched signals were directionally plausible but did not meet prespecified FDR thresholds. Conclusion: Plasma and EV-enriched plasma proteomics capture related but nonredundant sepsis-associated host-response information in critically ill patients.

05.
arXiv (CS.AI) 2026-06-15

From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces

arXiv:2507.13263v4 Announce Type: replace-cross Abstract: Bayesian Optimization (BO) is a powerful tool for black-box optimization, but its application to high-dimensional permutation spaces is severely limited by the challenge of defining scalable representations. The current state-of-the-art BO approach for permutation spaces relies on an exhaustive $\Omega(n^2)$ pairwise comparison, inducing a dense representation that is impractical for large-scale permutations. To break this barrier, we introduce a novel framework for generating efficient permutation representations via kernel functions derived from sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from enumeration sort. Further, we introduce the Merge Kernel , which leverages the divide-and-conquer structure of merge sort to produce a compact, $\Theta(n\log n)$ to achieve the lowest possible complexity with no information loss and effectively capture permutation structure. Our central thesis is that the Merge Kernel performs competitively with the Mallows kernel in low-dimensional settings, but significantly outperforms it in both optimization performance and computational efficiency as the dimension $n$ grows. Extensive evaluations on various permutation optimization benchmarks confirm our hypothesis, demonstrating that the Merge Kernel provides a scalable and more effective solution for Bayesian optimization in high-dimensional permutation spaces, thereby unlocking the potential for tackling previously intractable problems such as large-scale feature ordering and combinatorial neural architecture search.

06.
arXiv (quant-ph) 2026-06-12

Efficient certification of intractable quantum states with few Pauli measurements

arXiv:2511.07300v2 Announce Type: replace Abstract: Efficient verification of quantum computational resources is crucial as experiments advance toward fault-tolerance. Universal quantum computation can be achieved by consuming resource states through simple Pauli measurements, yet a significant gap remains between states that are easy to certify and those required for universality. We focus on Clifford-enhanced Product States, a class of resource states obtained by applying Clifford circuits to a product of single-qubit, potentially magic, states. While essential for universal computation, the certification of such states has previously relied on query oracles that are \#P-hard to implement, leaving their efficient, oracle-free verification an open challenge. In this work, we demonstrate that such classically intractable resource states can be efficiently verified using only Pauli measurements. Our protocol achieves sample- and time-efficiency in both i.i.d.\ and adversarial settings. This work fills a gap in Pauli-based certification, providing a new practical pathway to verify resource states that drive universal Pauli-based quantum computation.

07.
arXiv (CS.LG) 2026-06-17

Loss Landscape Poisoning: Targeted Extraction of Unseen Training Data from LLMs

arXiv:2606.17110v1 Announce Type: cross Abstract: Large Language Models are increasingly trained on proprietary or sensitive data, from private healthcare and financial records to user conversations containing secrets. Ensuring the privacy of such data against extraction attacks has become a central concern. In this paper, we ask whether an attacker who can poison a portion of the training data can facilitate the leakage of a separate target record they have no access to. We answer in the affirmative and show that such leakage can be induced by a poisoning mechanism that reshapes the model's local loss landscape around the target completion. Our key insight is that poisoning to create a sharp loss minimum at the target, surrounded by elevated loss on nearby alternatives, forces the model to memorize the target as the unique low-loss solution in its neighborhood. The attack requires no architectural changes, and generalizes across centralized and federated learning settings. We demonstrate that the attack amplifies privacy leakage across language (up to 100% successful extraction), and vision-language models (up 90% successful extraction). We show that the attack is thwarted when the model is trained to be differentially private. However, we introduce a new attack that directly probes the loss landscape bypassing even differential privacy defenses.

08.
arXiv (CS.LG) 2026-06-12

SMGFM: Spectral Multimodal Graph Pretraining for Multimodal-Attributed Graphs

arXiv:2606.12867v1 Announce Type: new Abstract: Multimodal-attributed graphs (MAGs) couple graph topology with node semantics from text, images, and other modalities. Traditional graph learning contextualizes node semantics by coupling topology with node features. However, this coupling design becomes troublesome in MAGs, where structure-induced and modality-intrinsic semantics may contribute differently to downstream tasks. Structure-induced semantics promote relational consistency through smooth topological variation, whereas modality-intrinsic semantics often encode local, fine-grained distinctions that should not be uniformly smoothed or aligned. Therefore, the key challenge is to identify semantic roles before cross-modal fusion. To this end, we leverage graph-frequency variation as a prior, where low-frequency components capture topology-consistent semantics and high-frequency components preserve modality-specific semantics. Based on this intuition, we propose SMGFM, a spectral multimodal graph pretraining framework that decomposes each modality-specific node signal into graph-frequency bands and assigns band-level semantic roles before cross-modal interaction. Concretely, SMGFM constructs frequency-resolved modality tokens with scalable Chebyshev filters, estimates their coupling reliability through topology-conditioned routing, and performs band-modality interaction before fusion. Its frequency-routed objectives align smooth consensus routes while preserving modality-specific routes, mitigating spatial-domain entanglement and uniform cross-modal alignment. Extensive experiments conducted on the MAG datasets demonstrate that SMGFM achieves state-of-the-art performance across graph-level and modality-level tasks.

09.
arXiv (CS.CV) 2026-06-18

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Image-to-Video diffusion models leverage input images to generate visually stunning content, yet frequently produce motion that violates physical laws. We reveal a surprising finding: a 2-step generation often exhibits better physical consistency than a 50-step output from the same model. Through spectral analysis, we trace this to phase erosion during denoising; the phase degrades significantly (dropping by $\approx 18\%$ from step 2 to step 50), whereas the magnitude remains relatively stable. Building on this insight, we propose PhaseLock, a training-free framework that preserves the valid motion priors from few-step inference throughout the denoising trajectory. Rather than relying on full-step inference for physical consistency, PhaseLock extracts a motion prior from just 2 steps and enforces it onto high-fidelity generation via Latent Delta Guidance. Our approach effectively mitigates phase degradation, improving physical consistency by an average of 6.2 points across diverse models while largely maintaining visual fidelity, with negligible overhead ($1.06\times$ time, $1.02\times$ memory) and reduced reliance on expensive external guidance methods ($\sim5\times$ time). Project Page: https://dnwjddl.github.io/phaselock

10.
arXiv (CS.LG) 2026-06-17

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arXiv:2606.17526v1 Announce Type: new Abstract: Efficient optimization is essential for training large language models. Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step-sizes to a selected fixed proportion of parameters in each iteration, while applying smaller, non-zero step-sizes to the rest. As a nearly {plug-and-play} module, MGUP seamlessly integrates with optimizers such as AdamW, Lion, and Muon. This yields powerful variants such as MGUP-AdamW, MGUP-Lion, and MGUP-Muon. Under standard assumptions, we provide theoretical convergence guarantees for MGUP-AdamW (without weight decay) in stochastic optimization. Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We offer a principled, versatile, and theoretically grounded strategy for efficient intra-layer selective updates, accelerating and stabilizing the training of large-scale models. The code is publicly available at https://github.com/MaeChd/MGUP.

11.
arXiv (CS.CL) 2026-06-24

SURGELLM: Rethinking Multi-Task Evaluation through Task-Aware Feature Gating with Class-Balanced Normalization

Fine-tuned encoders deployed across heterogeneous NLP tasks face three compounding problems: mismatched inductive biases, class-imbalance corruption of feature statistics, and no mechanism to condition attention on external lexical knowledge. We introduce \surgellm, a unified transformer framework that addresses each with a dedicated lightweight module: a surgical feature gate (learned per-dimension sigmoid over curated lexical indicators and \texttt{[CLS]}; provably degenerates to identity when features are uninformative), task-conditioned prefix tokens (quantized feature values and task identity prepended to every input), and Instance-Weighted Normalization (IWN; removes class-prior bias from gate statistics). We prove an excess-risk bound linking gate benefit to surgical feature alignment. Across four tasks, SST-2, multi-hop retrieval, LLM-prompt attribution, and authorship detection, covering 17,830 examples and eleven model variants over three seeds, the IWN variant achieves macro-F1 0.940 ($+0.036$ over the strongest non-IWN baseline; $+0.130$ on authorship detection). A random-vocabulary control ($-0.028$ avg.\ F1) confirms gains are lexical, not parametric. Code, vocabularies, and a $99.5\%$-recovery auto-extraction recipe are released.

12.
arXiv (CS.AI) 2026-06-11

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

arXiv:2606.11634v1 Announce Type: new Abstract: The rapid progress of reasoning and agentic large language models (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window Attention with Reinforced Adaptation for Math Reasoning), a practical recipe for adapting SWA models to mathematical reasoning. SWARR has two stages: (1) efficient conversion from a pretrained SA model to SWA with supervised fine-tuning (SFT), which avoids pretraining a new base model, and (2) policy adaptation with reinforcement learning (RL). We find that SWA still underperforms SA after SFT, and we hypothesize that this gap is caused in part by a data-architecture mismatch: most SFT data are prepared for SA models and may contain long-range dependencies that are difficult for SWA to model. Because on-policy RL optimizes self-generated trajectories under the SWA constraint, it can adapt trajectories to better match SWA. Experiments on mathematical reasoning benchmarks show that this recipe substantially narrows the gap between SWA and SA, recovering much of the accuracy lost during SWA conversion while preserving the efficiency benefits of linear-complexity attention. Our central contribution is the empirical finding that RL changes the conclusion one would draw from conversion and SFT alone about SWA's viability for math reasoning.

13.
arXiv (CS.AI) 2026-06-18

SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

arXiv:2606.18936v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate not only scientific competence, but also whether models recognize and avoid risks in high-stakes scientific contexts. Existing AI4Science safety datasets cover several disciplines and task formats, leaving the underlying risk dimensions underspecified. We introduce SciRisk-Bench, a benchmark designed to evaluate AI4Science safety from two complementary perspectives: explicit risk dimensions and scientific disciplines. SciRisk-Bench covers 7 disciplines, 31 subdisciplines and 10 risk dimensions. In the experimental section, we evaluate both mainstream LLMs and science-oriented LLMs across risk dimensions, disciplines, and sub-disciplines, enabling fine-grained diagnosis of where scientific models remain unsafe.

14.
arXiv (CS.AI) 2026-06-16

Epileptic Seizure Detection in Separate Frequency Bands Using Feature Analysis and Graph Convolutional Neural Network (GCN) from Electroencephalogram (EEG) Signals

arXiv:2604.00163v2 Announce Type: replace-cross Abstract: Epileptic seizures are neurological disorders characterized by abnormal and excessive electrical activity in the brain, resulting in recurrent seizure events. Electroencephalogram (EEG) signals are widely used for seizure diagnosis due to their ability to capture temporal and spatial neural dynamics. While recent deep learning methods have achieved high detection accuracy, they often lack interpretability and neurophysiological relevance. This study presents a frequency-aware framework for epileptic seizure detection based on ictal-phase EEG analysis. The raw EEG signals are decomposed into five frequency bands (delta, theta, alpha, lower beta, and higher beta), and eleven discriminative features are extracted from each band. A graph convolutional neural network (GCN) is then employed to model spatial dependencies among EEG electrodes, represented as graph nodes. Experiments on the CHB-MIT scalp EEG dataset demonstrate high detection performance, achieving accuracies of 97.1%, 97.13%, 99.5%, 99.7%, and 51.4% across the respective frequency bands, with an overall broadband accuracy of 99.01%. The results highlight the strong discriminative capability of mid-frequency bands and reveal frequency-specific seizure patterns. The proposed approach improves interpretability and diagnostic precision compared to conventional broadband EEG-based methods.

15.
arXiv (quant-ph) 2026-06-11

Power-law-graded Ising Interactions Stabilize Time Crystals Realizing Quantum Energy Storage and Sensing

arXiv:2508.14847v3 Announce Type: replace Abstract: We study discrete time-crystalline (DTC) phases in one-dimensional spin-1/2 chains with power-law-graded Ising interactions under periodic Floquet driving. By generalizing Stark localization to power-law-graded Ising interaction profiles, we identify robust period-doubled dynamics across a wide range of interaction exponents, stabilized by the interplay between coherent driving and spatially varying coupling. Within the DTC phase, the energy stored in the system, interpreted as a quantum battery, increases superlinearly with system size, although no scaling advantage persists in normalized power. Beyond energy storage, we demonstrate that the DTC phase supports enhanced quantum sensing. The quantum Fisher information associated with estimating timing deviations in the drive scales superextensively with system size, surpassing the Heisenberg limit. The degree of quantum advantage can be tuned by varying the interaction exponent, though DTC behavior remains robust throughout. Our results position power-law-graded Ising interacting Floquet systems as robust platforms for storing quantum energy and achieving metrological enhancement.

16.
arXiv (CS.AI) 2026-06-24

Polycepta: Object-Centric Appearance Estimation for Multi-Object Tracking

arXiv:2606.23604v2 Announce Type: replace-cross Abstract: The tracking-by-detection paradigm in multi-object tracking (MOT) typically relies on static appearance descriptors to complement motion estimation. However, these descriptors are frame-independent, limiting their robustness as visual cues. Since such descriptors are often obtained from computationally intensive pretrained backbones, real-time MOT systems frequently abandon appearance cues altogether and rely solely on motion prediction and geometric association. In this work, we introduce Polycepta, an object-centric appearance state estimation framework that reformulates appearance modeling as a recursive estimation problem rather than a frame-wise matching task. Polycepta constructs and continuously updates an independent appearance state for each tracked object, enabling future appearance representations to be estimated from accumulated observations. Polycepta is encouraged to learn the appearance-state construction of object-specific representations rather than memorize them through a proposed learning strategy, enabling appearance estimation for unseen classes. A key property of Polycepta is that the quality of appearance estimation improves as object states evolve during inference. While conventional appearance descriptors remain static or degrade over time, Polycepta progressively refines appearance estimates as additional observations are accumulated. Extensive experiments on KITTI, the Waymo Open Dataset, and MOT17 demonstrate consistent reductions in identity switches and improvements in tracking performance when integrated into the tracking-by-detection pipelines. Polycepta operates at 90.57 Hz and delivers state-of-the-art performance on the KITTI benchmark when integrated into the RobMOT framework, achieving a MOTA of 92.27\%.

17.
PLOS Medicine 2026-06-04

Comparative impacts and cost-effectiveness of tuberculosis systematic screening strategies in prisons in Brazil, Colombia, and Peru: A mathematical modeling study

Authors:

by Yiran E. Liu, José Victor Bortolotto Bampi, Ronan F. Arthur, Argita D. Salindri, Caroline Busatto, Pedro Avedillo Jiménez, Daniele Maria Pelissari, Fernanda Dockhorn Costa Johansen, Robert Arana-Narvaez, Alvaro Fernando Moreno Roca, Wilfredo Santos Solís Tupes, Esther Mori Jiu, Christian Alfredo Moreno Roca, Erika Albertina Abregú Contreras, Valentina Antonieta Alarcón Guizado, Julián Trujillo Trujillo, Belkys Marcelino, Mónica Alonso Gonzalez, Mayra Cecilia Córdova Ayllon, Ted Cohen, Moises A. Huaman, Jeremy D. Goldhaber-Fiebert, Julio Croda, Jason R. Andrews Background Incarceration is a leading driver of tuberculosis in Latin America. Systematic screening in prisons may reduce tuberculosis burden, but optimal strategies and cost-effectiveness remain uncertain. We examined the population-wide health impacts and cost-effectiveness of systematic screening in prisons in Brazil, Colombia, and Peru, comparing different timepoints, frequencies, and screening algorithms. Methods and findings Using dynamic transmission models calibrated to Brazil, Colombia, and Peru, we simulated annual or biannual (twice-yearly) prison-wide screening, alone or combined with entry and exit screening from 2026 to 2035. We evaluated four algorithms: (1) symptom screening, (2) chest X-ray with computer-aided detection (CXR-CAD), (3) symptoms and CXR-CAD (follow-up testing if either is positive), and (4) GeneXpert Ultra (Xpert) with pooled sputum. Individuals screening positive then received individual Xpert. We projected impacts on within-prison and population-level tuberculosis incidence in 2035, along with discounted costs (2023 US dollars) and disability-adjusted life years (DALYs). Model projections showed that combined entry, exit, and biannual screening with CXR-CAD was highly impactful and cost-effective across countries, reducing tuberculosis incidence by 61%–87% in prisons and 18%–28% population-wide. Compared to only biannual CXR-CAD (the next best strategy), the incremental cost per DALY averted of adding entry and exit screening was $2,984 (Brazil), $2,925 (Colombia), and $645 (Peru). Adding symptom screening to CXR-CAD marginally increased benefit and was only cost-effective in Peru’s higher-incidence prisons. Biannual screening alone remained cost-effective at prison incidence levels well below national averages, as well as at far lower willingness-to-pay thresholds. In settings without CXR-CAD, pooled Xpert was an impactful, cost-effective alternative. Key limitations include the model’s simplified representation of tuberculosis disease states and lack of stratification by age, gender/sex, HIV, or drug resistance. Conclusions These modeling results support immediate national-level adoption of prison-wide tuberculosis screening twice-yearly and at entry and exit, using CXR-CAD or pooled Xpert.

18.
arXiv (CS.AI) 2026-06-11

Grounding Computer Use Agents on Human Demonstrations

arXiv:2511.07332v2 Announce Type: replace-cross Abstract: Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile interactions, high-quality resources for desktop environments are limited. To address this gap, we introduce GroundCUA, a large-scale desktop grounding dataset built from expert human demonstrations. It covers 87 applications across 12 categories and includes 56K screenshots, with every on-screen element carefully annotated for a total of over 3.56M human-verified annotations. From these demonstrations, we generate diverse instructions that capture a wide range of real-world tasks, providing high-quality data for model training. Using GroundCUA, we develop the GroundNext family of models that map instructions to their target UI elements. At both 3B and 7B scales, GroundNext achieves state-of-the-art results across five benchmarks using supervised fine-tuning, while requiring less than one-tenth the training data of prior work. Reinforcement learning post-training further improves performance, and when evaluated in an agentic setting on the OSWorld benchmark using o3 as planner, GroundNext attains comparable or superior results to models trained with substantially more data,. These results demonstrate the critical role of high-quality, expert-driven datasets in advancing general-purpose computer-use agents.

19.
arXiv (CS.CV) 2026-06-18

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch features from all views onto a single equirectangular panoramic canvas. Namely, each patch is unprojected to a 3D world coordinate using its depth and camera pose, then placed on the canvas at the continuous longitude and latitude of that point as seen from the canvas origin, with no rasterization or aggregation across overlapping views. A 3D position embedding of the patch's metric coordinates is added to its feature, restoring the depth lost when collapsing the world position to an angular canvas coordinate. Patches from all frames thus share one spatial coordinate system with no fusion or major architectural modifications of the backbone. The pretrained VLM consumes this representation as if it were an ordinary image. Because the canvas can be centered on any pose of interest, the same representation directly supports situated reasoning from a specific viewpoint, a common requirement in robotics and embodied AI. Thanks to this representation, we can also introduce a spatial pretraining curriculum: by procedurally placing patch features of objects, drawn from real images, at chosen 3D world positions on an otherwise empty canvas, we generate on-the-fly supervision spanning a broad range of spatial reasoning tasks, with answer distributions controlled to reduce spatial reasoning shortcuts. OneCanvas achieves state-of-the-art accuracy on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using an order of magnitude less training compute than the strongest competing methods.

20.
medRxiv (Medicine) 2026-06-15

Filum Terminale Diameter on Routine Pediatric MRI: A Large-Cohort Clinical Reference in 3,406 Children and the Age-Dependent Meaning of the 2-mm Thickened-Filum Threshold

Background. A filum diameter >2 mm is the conventional MRI threshold for a thickened filum, but it derives from small, mostly adult series showing no age dependence; whether one cutoff suits all of childhood is untested. Objective. To build an age-specific filum-diameter reference on routine pediatric MRI and test, adjusting for image resolution, whether the 2-mm threshold is age-stationary. Materials and methods. In this retrospective study an nnU-Net tracer measured the maximal filum diameter on consecutive lumbosacral MRI; versus manual tracing it showed negligible bias but moderate single-measure agreement. After excluding report-confirmed fatty filum, lipoma, or tethered cord, the proportion >2 mm was analysed within one acquisition protocol and by logistic regression adjusting for voxel size and slice thickness. Results. Of 7,245 examinations, 3,869 (53%) were traceable; untraced ones were younger (median 0.75 vs 2.0 years). The presumed-normal cohort had median diameter 1.48 mm. At matched resolution, 2 mm marked the 94th percentile in infants (5.6% exceeded it) but the 83rd by 3-6 years (17.4%); the age effect persisted after adjusting for voxel size and slice thickness (3-6 years vs infants, adjusted OR 4.7; P < .001). Conclusion. Filum diameter clusters near 1.5 mm, and the fixed 2-mm cutoff flags ~5% of infants but ~17% of preschoolers. Caliber should be judged against an age-specific clinical reference, not one fixed cutoff; a thick filum is not itself a diagnosis of tethered cord.

21.
arXiv (CS.CL) 2026-06-11

Language Shapes Mental Health Evaluations in Large Language Models

Multilingual large language models (LLMs) are increasingly used in socially sensitive mental health contexts, including support chatbots, screening, and content moderation. This raises a reliability question: do semantically equivalent mental health inputs elicit comparable evaluations across languages, or systematic shifts consistent with language-associated social and cultural contexts? We examine this question in an English-Chinese setting with GPT-4o and Qwen3-32B using a two-level framework: construct-level evaluative orientation, measured by psychometric stigma instruments, and decision-level behavior, measured by binary stigma detection and four-class depression severity classification. Across instruments and models, Chinese prompts elicit higher stigma-related scores than English prompts. At the decision level, Chinese prompts reduce sensitivity to stigmatizing content and produce more conservative depression severity judgments, leading to more under-estimation errors. These findings show that prompt language can shift both evaluative orientation and downstream behavior in LLM-based mental health evaluation. They highlight the need to evaluate multilingual LLMs not only for aggregate performance, but also for whether they apply comparable evaluative standards across languages in socially sensitive domains.

22.
arXiv (CS.LG) 2026-06-16

Petrov-Galerkin Variational Physics-Informed Neural Network Framework for Two-Dimensional Singularly Perturbed Problems

arXiv:2606.16510v1 Announce Type: cross Abstract: This study proposes a Petrov-Galerkin based Variational Physics-Informed Neural Network (VPINN) for efficiently solving two-dimensional singularly perturbed problems (SPPs) with one and two small perturbation parameters. The approach employs neural networks to construct the trial solution space, while tensor-product hat functions are adopted as test functions to enforce the variational form. To accurately resolve of sharp boundary layers, the variational form is implemented using a Petrov-Galerkin formulation. Dirichlet boundary conditions are imposed directly, while the source terms are computed using automatic differentiation. Computational experiments on standard two-dimensional problems demonstrate that the proposed method achieves high accuracy in both the maximum and L_2 norms. These results confirm the efficiency and robustness of the Petrov-Galerkin VPINN approach in accurately capturing the multiscale features of two-dimensional SPPs.

23.
arXiv (CS.CL) 2026-06-24

When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents

Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B classifiers. Under gold-policy conditioning, a compact structured state improves macro-F1 over raw trajectories by 0.13-0.17 after tuning. We then replace the benchmark-designated policy clause with the top-ranked clause retrieved from decision-time context. Although the exact governing clause is retrieved at rank 1 for only 7% of airline states, the primary 3B classifier obtains macro-F1 0.58 with retrieved clauses versus 0.60 with gold clauses (Delta=-0.02, task-cluster 95% CI [-0.23,+0.21]); mismatched-policy and no-policy controls score 0.32 and 0.21. We do not detect a macro-F1 difference between retrieved and gold clauses in this configuration, although the interval remains too wide to establish non-inferiority. The same qualitative pattern appears with a second retriever and at 7B, while varying across fine-tuning configurations. These results indicate that exact-match clause recall can underestimate downstream policy utility in this benchmark setting, motivating evaluation with retrieved policies in the classification loop rather than recall alone.

24.
medRxiv (Medicine) 2026-06-15

Wellbeing After Stroke-2 (WAterS-2): a feasibility study with process evaluation exploring inclusive, accessible, online psychological support after stroke

Objectives: Explore feasibility and acceptability of upskilling a workforce to deliver a co-developed intervention, based on Acceptance and Commitment Therapy (ACT), to support psychological adjustment post-stroke targeting underserved groups. Design: Multi-site, single-arm feasibility study with embedded mixed-methods process evaluation (ISRCTN17628580). Setting: Four NHS community stroke services across England. Participants: 1. Stroke survivors [&ge;]18 years of age, [&ge;]4 months post-stroke, reporting psychological difficulties adjusting to stroke, able to consent and access remote group sessions in English; 2. Group facilitators from NHS stroke services, not ACT specialists. Intervention: WAterS-2: an eight-session, remotely-delivered ACT-informed group intervention. Outcome measures: Recruitment, fidelity, safety, acceptability and perceived value were assessed using fidelity checklists, post-intervention surveys and semi-structured interviews with stroke survivors and facilitators. Clinical outcomes including mood (HADS), wellbeing (ONS4), psychological flexibility (AAQ-ABI), measured post-group and three-months later. Results: Nineteen stroke survivors recruited (mean 9.6 months post-stroke; n=5 (26%) minoritised ethnicities; n=10 (52%) with aphasia). Thirteen facilitators - including two peer support workers - delivered the intervention with fidelity following structured training across four services. Drop-out was low (2/19; 11%); with 15 (79%) attending [&ge;]5/8 sessions. Remote data collection was feasible (79% follow-up completion), with no adverse events recorded. Acceptability was high: survivors valued peer connection, grounding and mindfulness practices. ACT metaphors were helpful for some but challenging for others, including some with aphasia. Online delivery was suitable but limited informal connection. Facilitators reported increased capability, incorporating ACT skills into routine care. NHS workforce pressures and geographically-constrained referral pathways limited recruitment reach. Conclusions: WAterS-2 is feasible, safe, acceptable and inclusive. A mixed workforce, including NHS peer support workers, can be upskilled to deliver with fidelity. Inclusion of underserved groups is achievable but requires active strategies beyond standard NHS referral routes. Findings inform a provisional logic model and a future pragmatic trial.

25.
arXiv (CS.CV) 2026-06-16

Sub-Semantic Image Segmentation

Images can be segmented based on visual cues (i.e., texture segmentation) or into objects (i.e., semantic segmentation). We propose a new category of sub-semantic image segmentation that blurs the line between the two. In sub-semantic image segmentation, language is not used to name whole objects. Instead, it is used to partition an image into stable appearance patterns that can be described by language. To do that, we couple a general-purpose vision-language model to SAM 3, a promptable segmentation backbone whose native text pathway can ground rich descriptions into masks. Simple coupling fails for a number of reasons that we identify in the paper, and we overcome them by introducing DETECTURE that resolves three concrete failure modes – language leakage between texture regions, prompt competition inside the segmentation backbone, and semantic distortion at the language-to-mask interface. Since there is no dataset of sub-semantic image segmentation, we introduce one, termed TextureADE. The new dataset is derived from the ADE20K dataset using a system we designed. We compare DETECTURE to a number of baselines and find that it achieves the strongest performance on several datasets using different metrics. Code is available at https://github.com/Scientific-Computing-Lab/TextureDetecture.