Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

Beyond Correctness: Enhancing Architectural Reasoning in Code LLMs via Scalable Labeling with Agentic Judgment

arXiv:2606.14948v1 Announce Type: cross Abstract: LLMs have substantially improved software engineering, yet real-world development requires architectural understanding. Such understanding is prohibitively expensive to label manually and impossible to verify through tests alone. We propose an agentic judging pipeline that uses a strong LLM as a scalable proxy for expert architectural evaluation. It comprises two judges: the Architecture Complexity Judge (ACJ), which estimates how much codebase-specific architectural understanding a task demands, and the Architecture Quality Judge (AQJ), which evaluates whether a patch conforms to repository-specific architectural conventions via source-grounded rubrics. Fine-tuning Qwen3-8B/14B/32B on 3,360 curated instances achieves resolved rates of up to 27.2% on SWE-bench Verified, up to 540% over the base model and 256% over unfiltered fine-tuning. The trained models also achieve strong cross-language generalization and consistent improvements in architectural patch quality.

02.
arXiv (CS.LG) 2026-06-16

Deep Learning-Based Lunar Crater Terrain Relative Navigation

arXiv:2606.14776v1 Announce Type: cross Abstract: Accurate position estimation is crucial for future autonomous lunar landings, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm that combines our deep-learning crater detector, designed specifically for the NASA Crater Detection Challenge, with an Extended Kalman Filter (EKF). The detector analyzes crater features in monocular images acquired from orbit, and matches with craters in a global database are identified via Hungarian assignment followed by consensus-based outlier removal. The resulting measurements then update an EKF that estimates the spacecraft pose in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference; augmenting it with altitude aiding information constrains radial drift. Simulation results indicate that even when the spacecraft's position estimate is off from its actual location by up to 5 km, TRN can recover, reducing the navigation error to a few hundred meters. Note that maintaining crater feature correspondences requires matching the image resolution and scene scales to the distribution of the detector's training set.
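The EKF refinement described above relies on the standard measurement update. The sketch below shows that update in generic form; the direct position measurement, the noise levels, and the function name `ekf_update` are illustrative assumptions, not the authors' crater-based measurement model, which would supply its own nonlinear `h` and Jacobian.

```python
import numpy as np


def ekf_update(x, P, z, h, H_jac, R):
    """One extended Kalman filter measurement update.

    x: prior state mean (n,), P: prior covariance (n, n)
    z: measurement (m,), h: measurement function, H_jac: its Jacobian
    R: measurement noise covariance (m, m)
    """
    H = H_jac(x)                      # linearize the measurement model at the prior mean
    y = z - h(x)                      # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_post = x + K @ y
    # Joseph form keeps the updated covariance symmetric positive semidefinite
    I_KH = np.eye(len(x)) - K @ H
    P_post = I_KH @ P @ I_KH.T + K @ R @ K.T
    return x_post, P_post


# Hypothetical example: a 3D position state (km) starting 5 km from the truth at
# the origin, observed directly with 0.2 km noise. With a linear h this reduces
# to an ordinary Kalman filter update.
x_prior = np.array([5.0, 0.0, 0.0])
P_prior = np.eye(3) * 5.0**2
z = np.array([0.1, -0.05, 0.02])
R = np.eye(3) * 0.2**2
x_post, P_post = ekf_update(x_prior, P_prior, z, lambda x: x, lambda x: np.eye(3), R)
```

Here the posterior position moves to within about 0.1 km of the origin and the covariance shrinks toward the measurement noise level.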

03.
medRxiv (Medicine) 2026-06-24

Automated Text Message Outreach to Increase Diabetes Screening: A Pragmatic Randomized Trial

Background. Despite evidence that early intervention can prevent or delay progression to type 2 diabetes, more than 80% of individuals with prediabetes in the United States remain undiagnosed, underscoring the need for scalable strategies to increase screening uptake. In this study, we evaluated whether a single text message could increase completion of HbA1c-based diabetes screening in routine clinical practice. Methods. We conducted a pragmatic randomized controlled trial within Duke University Health System (DUHS). Patients aged 35 years or older who met American Diabetes Association 2022 screening criteria, had no previous diagnosis of diabetes, had not undergone HbA1c testing within the preceding 3 years, and had opted to receive text messages from DUHS were randomly assigned to receive either a single text message encouraging guideline-based diabetes screening and discussion with a primary care provider (intervention group; n=55,494) or usual care (control group; n=5,748). The primary outcome was HbA1c test completion within 24 weeks following message delivery (or no message for controls), analyzed using a Cox proportional hazards model stratified by wave. Secondary outcomes included piecewise hazard ratios for early (weeks 1-4), mid (weeks 5-12), and late (weeks 13-24) intervals and the between-group difference in cumulative testing rate. Findings. Text message outreach significantly increased HbA1c test completion over 24 weeks (HR, 1.18 [95% CI, 1.07-1.30]), with the strongest effect in the first four weeks (HR, 1.48 [95% CI, 1.18-1.86]). By the end of the 24-week observation period, cumulative testing reached 9.14% in the messaged group vs 7.83% in controls (between-group difference, 1.31% [95% CI, 0.59-2.07]), corresponding to one additional HbA1c test per 76 messages delivered ($0.51 in messaging costs per additional HbA1c test performed). Rates of prediabetes and diabetes among those screened were similar between groups, indicating no selection bias toward higher-risk patients. One additional dysglycemia case was identified per 213 messages sent ($1.43 per case detected).
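The cost figures can be checked against the reported cumulative testing rates. The arithmetic below uses only numbers stated above. The per-message cost is implied by those numbers rather than reported directly, and small mismatches reflect rounding.

```latex
\begin{aligned}
\Delta &= 9.14\% - 7.83\% = 1.31\%, &
\text{messages per additional test} &= 1/0.0131 \approx 76,\\
\text{implied cost per message} &\approx \$0.51/76 \approx \$0.0067, &
\text{cost per dysglycemia case} &\approx 213 \times \$0.0067 \approx \$1.43.
\end{aligned}
```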

04.
arXiv (CS.LG) 2026-06-11

Projected random forests and conformal prediction of circular data

arXiv:2410.24145v3 Announce Type: replace-cross Abstract: We apply conformal prediction techniques to regression problems with circular responses, producing prediction sets with adaptive arc length and finite-sample coverage guarantees for any circular predictive model under the assumption of data exchangeability. Leveraging the high performance of existing predictive models designed for linear responses, we analyze a general projection procedure that converts any linear-response regression model into one suitable for circular responses. When random forests are used as base models in this projection procedure, we leverage the random forest out-of-bag mechanism to eliminate the need for a separate calibration sample in the construction of prediction sets. On synthetic and real datasets, the resulting projected random forest model produces more efficient out-of-bag conformal prediction sets, with shorter median arc length, than the split conformal prediction sets generated by two existing alternative models.
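As a rough illustration of split conformal prediction for circular responses, the sketch below shows the calibration-sample approach used by the baselines above, not the out-of-bag variant. It scores calibration points by absolute wrapped angular residual and returns an arc of fixed half-width around each prediction. The function names are hypothetical, and the fixed-width arc is a simplification, since the abstract describes arcs with adaptive length.

```python
import math


def angular_residual(y_true, y_pred):
    """Signed angular difference wrapped to [-pi, pi)."""
    return (y_true - y_pred + math.pi) % (2 * math.pi) - math.pi


def conformal_arc_halfwidth(cal_true, cal_pred, alpha):
    """Split conformal half-width for arcs centered on the point prediction.

    Nonconformity score = absolute wrapped angular residual. Returns pi
    (the whole circle) when the calibration set is too small for level 1 - alpha.
    """
    scores = sorted(abs(angular_residual(t, p)) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank; the tiny epsilon guards against float round-up.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    if k > n:
        return math.pi
    return scores[k - 1]


def prediction_arc(y_pred, halfwidth):
    """Arc (start, end) in radians, counterclockwise, covering y_pred +/- halfwidth."""
    start = (y_pred - halfwidth) % (2 * math.pi)
    end = (y_pred + halfwidth) % (2 * math.pi)
    return start, end
```

Under exchangeability, a new response falls in the arc with probability at least 1 - alpha. In the out-of-bag variant the abstract describes, the residuals would instead come from the random forest's out-of-bag predictions on the training points, so no data need to be held out.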

05.
arXiv (quant-ph) 2026-06-24

Reachability and optimal-time certificates for quantum control

arXiv:2606.24645v1 Announce Type: new Abstract: Finite-time control is central to quantum technologies, yet rigorous limits on reachable targets and optimal control times remain largely unknown. We develop a framework for finite-time reachability and optimal-time certificates in constrained quantum control based on moment relaxations with implicitly time-dependent differential constraints. For fixed control horizons and control constraints, the method yields rigorous upper bounds on achievable terminal fidelities, lower bounds on the optimal control times required to reach them, and certificate gaps for benchmarking explicit control pulses. We demonstrate the versatility of our framework in three use cases: entangled-state preparation in two and three qubits, one-qubit gate synthesis across different control geometries, and excitation transfer in an $N$-qubit $XX$ chain. Our work establishes differential moment hierarchies as a practical tool for certifying reachability limits and optimal control times in quantum control, providing hardware-aware quantum speed limits while highlighting structure exploitation as a key ingredient for scalable certification.

06.
arXiv (CS.LG) 2026-06-11

A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth

arXiv:2601.21817v3 Announce Type: replace-cross Abstract: Evaluating large language models (LLMs) on open-ended tasks without ground-truth labels is increasingly done via the LLM-as-a-judge paradigm. A critical but under-modeled issue is that judge LLMs differ substantially in reliability; treating all judges equally can yield biased leaderboards and misleading uncertainty estimates. More data can make evaluation more confidently wrong under misspecified aggregation. We propose a judge-aware ranking framework that extends the Bradley-Terry-Luce model by introducing judge-specific discrimination parameters, jointly estimating latent model quality and judge reliability from pairwise comparisons without reference labels. We establish identifiability up to natural normalizations and prove consistency and asymptotic normality of the maximum likelihood estimator, enabling confidence intervals for score differences and rank comparisons. Across multiple public benchmarks and a newly collected dataset, our method improves agreement with human preferences, achieves higher data efficiency than unweighted baselines, and produces calibrated uncertainty quantification for LLM rankings.
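Here is a minimal sketch of a judge-aware likelihood. It assumes the discrimination parameter multiplies the score difference inside a logistic link, P(w beats l | judge k) = sigmoid(gamma_k (theta_w - theta_l)). The model is fit by plain gradient ascent, with location and scale fixed by centering theta and log gamma. The paper's exact parametrization, normalization, and estimator may differ, and confidence intervals are not shown.

```python
import numpy as np


def fit_judge_aware_btl(winner, loser, judge, n_models, n_judges, lr=0.5, iters=3000):
    """Maximum-likelihood fit of a BTL model with judge-specific discrimination.

    Assumed model: P(winner beats loser | judge k) = sigmoid(gamma_k * (theta_w - theta_l)),
    with gamma_k = exp(eta_k) > 0. Normalization: mean(theta) = 0 and mean(eta) = 0.
    Returns (theta, gamma).
    """
    theta = np.zeros(n_models)
    eta = np.zeros(n_judges)
    n = len(winner)
    for _ in range(iters):
        gamma = np.exp(eta)
        z = gamma[judge] * (theta[winner] - theta[loser])
        r = 1.0 / (1.0 + np.exp(z))          # d/dz log sigmoid(z) = 1 - sigmoid(z)
        g_theta = np.zeros(n_models)
        np.add.at(g_theta, winner, r * gamma[judge])
        np.add.at(g_theta, loser, -r * gamma[judge])
        g_eta = np.zeros(n_judges)
        np.add.at(g_eta, judge, r * z)        # chain rule: dz/deta_k = z
        theta += lr * g_theta / n
        eta += lr * g_eta / n
        # Remove location and scale ambiguities without changing the likelihood:
        # gamma * theta is invariant to gamma -> gamma / c, theta -> theta * c.
        theta -= theta.mean()
        m = eta.mean()
        eta -= m
        theta *= np.exp(m)
    return theta, np.exp(eta)
```

On simulated comparisons where one judge is far noisier than the other, the fitted gamma ranks the judges by reliability. An unweighted BTL fit would pool both judges' votes equally.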

07.
arXiv (CS.AI) 2026-06-18

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

arXiv:2606.18598v1 Announce Type: new Abstract: Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties but also the choice of extraction method, from direct lithium extraction to hard-rock mining. Prior work explored models of this problem and different methods to optimize mining decisions, but these models did not account for uncertainty in pricing, uncertainty in demand, or the different technologies available to extract lithium. Incorporating different pricing models and extraction technologies into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve it with belief-state planning methods to obtain optimal decisions. In our study, we show that POMDP solvers outperform human-inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief-state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic and environmental outcomes over the project's lifetime in all pricing and deposit scenarios considered.
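The belief-state planning mentioned above rests on a Bayesian belief update over hidden states. The sketch below applies one such update to the four price regimes named in the abstract. The transition matrix, the observation likelihoods, and treating the regime as the only hidden state are hypothetical simplifications. The paper's state also covers geology and demand, and the planner itself is not shown.

```python
import numpy as np

REGIMES = ["static", "linear", "exponential", "stochastic"]


def belief_update(belief, transition, obs_likelihood):
    """Discrete Bayes filter: b'(s') is proportional to O(o | s') * sum_s T(s' | s) b(s).

    belief: (S,) prior over hidden states
    transition: (S, S) matrix with transition[s, s'] = T(s' | s)
    obs_likelihood: (S,) likelihood of the received observation under each s'
    """
    predicted = belief @ transition            # prediction step
    unnormalized = obs_likelihood * predicted  # correction step
    return unnormalized / unnormalized.sum()


# Hypothetical example: regimes persist with probability 0.9, and the latest
# price observation is most likely under the stochastic regime.
transition = np.full((4, 4), 0.1 / 3)
np.fill_diagonal(transition, 0.9)
prior = np.full(4, 0.25)
likelihood = np.array([0.1, 0.2, 0.3, 0.6])
posterior = belief_update(prior, transition, likelihood)
```

Starting from a uniform belief, the posterior places half its mass on the stochastic regime. A belief-state planner would choose exploration, production, or technology actions as a function of this distribution rather than of a single assumed regime.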

08.
arXiv (math.PR) 2026-06-25

On the entropic convergence for piecewise deterministic samplers: speedup and obstruction

arXiv:2606.26086v1 Announce Type: new Abstract: For piecewise deterministic samplers such as Randomized Hamiltonian Monte Carlo (RHMC), the Bouncy Particle Sampler (BPS), or the Zig-Zag Process (ZZP), long-time exponential convergence rates have been established in previous works using Harris or $L^2$ hypocoercivity approaches. In particular, in the $L^2$ framework, a so-called diffusive-to-ballistic speedup was known for log-concave targets: with suitable parameters, the convergence rates of these samplers are quadratically improved relative to the standard overdamped Langevin diffusion process. A recent work by Jianfeng Lu showed that this speedup also holds for the kinetic Langevin diffusion process when convergence is stated in terms of relative entropy, raising the question of whether it also holds for piecewise deterministic samplers. The present work gives both a positive and a negative answer: first, we show that the speedup holds in entropy for RHMC; second, we show that for BPS or ZZP, even for a standard Gaussian target, no similar result can hold, and that even exponential convergence in entropy, at any rate, fails.
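For readers unfamiliar with piecewise deterministic samplers, the sketch below simulates a one-dimensional Zig-Zag process for the standard Gaussian target mentioned above. Event times are drawn exactly by inverting the integrated switching rate, which is linear in time between events. The sketch only checks that time averages approach the target moments; it does not measure entropy decay, which is the subject of the paper. The function name and event count are illustrative.

```python
import math
import random


def zigzag_gaussian_1d(n_events, x0=0.0, v0=1.0, seed=0):
    """Simulate a 1D Zig-Zag process targeting N(0, 1) and return the
    time-averaged mean and variance along the continuous trajectory.

    Potential U(x) = x^2 / 2, so the switching rate is max(0, v * x), with v in {-1, +1}.
    """
    rng = random.Random(seed)
    x, v = x0, v0
    total_time = s1 = s2 = 0.0
    for _ in range(n_events):
        a = v * x                    # rate at time t after now is max(0, a + t)
        e = rng.expovariate(1.0)
        # Invert the integrated rate: solve int_0^tau max(0, a + s) ds = e.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Exact integrals of x(t) = x + v t and x(t)^2 over [0, tau], using v^2 = 1.
        s1 += x * tau + v * tau ** 2 / 2.0
        s2 += x * x * tau + x * v * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += v * tau
        v = -v                       # velocity flip at the switching event
    mean = s1 / total_time
    return mean, s2 / total_time - mean ** 2
```

With a few tens of thousands of events, the returned mean is close to 0 and the variance close to 1.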

09.
arXiv (CS.AI) 2026-06-11

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

arXiv:2605.00545v2 Announce Type: replace-cross Abstract: Inferring cellular trajectories from destructive snapshots is complicated by stochasticity and by non-conservative mass dynamics such as cell proliferation and apoptosis. Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level. However, this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution, which is essential for understanding lineage branching and fate decisions. We present Unbalanced Schrödinger Bridge (USB), a simulation-free framework for learning the underlying dynamics that integrates both stochastic and unbalanced effects while also modeling the discrete, jump-like birth-death dynamics at single-cell resolution. Theoretically, USB provides a tractable solution to the Branching Schrödinger Bridge (BSB) problem, offering a rigorous microscopic interpretation where individual cells undergo both Brownian motion and discrete birth-death jumps. Technically, the method implements an efficient solver by introducing a simulation-free training objective that scales to high-dimensional omics data. Empirically, we demonstrate on both simulated and real-world datasets that USB not only achieves trajectory reconstruction performance better than or comparable to deterministic baselines but also uniquely enables realistic discrete simulation of birth-death dynamics at single-cell resolution.
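To make the microscopic picture concrete, the sketch below simulates a one-dimensional forward model in which each cell diffuses as Brownian motion and undergoes discrete birth or death jumps. It is a simple time-discretized illustration of that kind of process, not USB's simulation-free training objective. The rates, step size, and function name are hypothetical.

```python
import math
import random


def simulate_branching_bm(positions, birth_rate, death_rate, sigma, dt, n_steps, seed=0):
    """Time-discretized 1D branching Brownian motion with death.

    In each step of length dt, every cell independently dies with probability
    death_rate * dt, divides into two copies with probability birth_rate * dt,
    and otherwise survives; every surviving copy then takes an independent
    Brownian step of scale sigma. Requires (birth_rate + death_rate) * dt <= 1.
    """
    if (birth_rate + death_rate) * dt > 1.0:
        raise ValueError("step too large: event probabilities exceed 1")
    rng = random.Random(seed)
    cells = list(positions)
    for _ in range(n_steps):
        next_cells = []
        for x in cells:
            u = rng.random()
            if u < death_rate * dt:
                continue                                    # apoptosis: cell removed
            divides = u < (death_rate + birth_rate) * dt    # proliferation event
            for _ in range(2 if divides else 1):
                next_cells.append(x + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        cells = next_cells
    return cells
```

With both rates at zero, the population size is conserved and only the positions diffuse. This is the continuous, mass-conserving limit that balanced methods assume.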

10.
arXiv (CS.AI) 2026-06-17

A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

arXiv:2606.18075v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) has emerged as a paradigm for enhancing large language models (LLMs) with external knowledge, yet existing graph-based methods face a fundamental limitation: entity-centric and chunk-centric approaches operate on representations anchored to the original text without true knowledge fusion. While entity-centric methods connect logically related content and chunk-centric methods preserve context, both retrieve information separately through similarity search, missing the emergent understanding that comes from their synthesis. In this paper, we propose HyGRAG, a hierarchical graph RAG framework that transcends source documents by addressing three core challenges: constructing summaries that genuinely integrate contextual and relational information, leveraging these synthesized representations to access emergent knowledge during retrieval, and efficiently updating hierarchical structures for dynamic corpora. Specifically, we design hierarchical index structures over hybrid graphs with both chunk and entity nodes, then iteratively cluster them and generate LLM-based summaries. We then design context- and relation-aware retrieval that searches across all abstraction levels while expanding through community membership. Moreover, we enable dynamic knowledge updates through attachment-based algorithms that require only local re-summarization. Experimental results show that HyGRAG improves the average accuracy of multi-hop reasoning tasks by 9.7% while maintaining reasonable efficiency.
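The sketch below is a toy version of retrieval with community expansion. It assumes each chunk or entity node has an embedding and a single community id, and each community has an embedded LLM summary. The flat single-level hierarchy, cosine scoring, and function name are assumptions for illustration. HyGRAG's multi-level hierarchy, scoring, and update algorithms are not reproduced.

```python
import numpy as np


def retrieve(query_vec, node_vecs, node_community, summary_vecs, k_nodes=2, k_summaries=1):
    """Toy retrieval over a hybrid chunk/entity graph index.

    node_vecs: (N, d) embeddings of chunk and entity nodes
    node_community: (N,) community id of each node (one level only, for brevity)
    summary_vecs: (C, d) embeddings of LLM-generated community summaries
    Returns (node ids, summary ids): the top nodes by cosine similarity expanded
    with the other members of their communities, plus the top summaries.
    """
    def cosine(matrix, v):
        return (matrix @ v) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(v) + 1e-12)

    communities = np.asarray(node_community).tolist()
    top_nodes = np.argsort(-cosine(node_vecs, query_vec))[:k_nodes]
    # Expansion through community membership: pull in every co-member of a hit.
    hit = {communities[i] for i in top_nodes}
    expanded = [i for i, c in enumerate(communities) if c in hit]
    top_summaries = np.argsort(-cosine(summary_vecs, query_vec))[:k_summaries]
    return sorted(expanded), top_summaries.tolist()
```

Expansion lets a query that matches one entity also surface co-clustered chunks whose text is not itself similar to the query. This is one simple reading of what expanding through community membership buys.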

11.
arXiv (CS.CV) 2026-06-24

Sat2City v2: Native 3D City Asset Generation from a Single Satellite Image

Generating explicit 3D city assets from a single satellite image is important for digital twins, urban simulation, and geospatial intelligence. Unlike satellite-to-street-view synthesis, the task requires a reusable textured mesh with plausible geometry and controllable appearance rather than a 3D proxy optimized only for rendering a small set of images or videos. The ICCV Sat2City framework made a first step by conditioning cascaded sparse-voxel latent diffusion on satellite-derived height maps, but its appearance could not be controlled, its training data were synthetic, and its task-specific VAE did not scale well to noisy real-world reconstructions. We present Sat2City v2, a journal extension that adapts a pretrained native structured-latent 3D foundation model to weakly aligned satellite images and textured meshes. We build a real-world dataset with 16,241 satellite-mesh pairs across 24 regions in 9 cities. Instead of learning a 3D representation from noisy city meshes, Sat2City v2 encodes each mesh into a pretrained native 3D latent space, fine-tunes a satellite-conditioned geometry flow, and uses the decoded shape to anchor satellite-conditioned texturing. This retains Sat2City's geometry-to-appearance cascade while enabling appearance-controllable generation from the satellite input. Experiments on metric-scale DSM reconstruction and generative city-asset benchmarks for geometry and appearance show that Sat2City v2 achieves the best overall performance among evaluated baselines. Overall, Sat2City v2 advances satellite-to-city generation from rendering-oriented 3D proxies to explicit textured mesh assets, supported by, to the best of our knowledge, the first documented satellite-mesh paired dataset collected from matched geographic crops for this asset-level task. Project page: https://ai4city-hkust.github.io/Sat2City-v2/

12.
arXiv (CS.LG) 2026-06-24

Dirac-Frenkel dynamics with inertia for nonlinearly parametrized solutions of evolution problems

arXiv:2606.24769v1 Announce Type: cross Abstract: Even when Dirac-Frenkel dynamics determine a well-defined evolution in function space, the corresponding parameter dynamics can be non-unique or ill-conditioned for redundant nonlinear parametrizations such as neural networks or mixture models. We propose to add inertia to the Dirac-Frenkel dynamics and show that this allows useful parameter velocity information to persist from the past trajectory in directions that are weakly informed, while well-informed parameter velocity directions continue to follow the Dirac-Frenkel dynamics. We prove that the inertial formulation yields well-posed parameter dynamics and provide a posteriori error bounds. After time discretization, the method requires the solution of the same type of regularized linear least-squares problem as standard Dirac-Frenkel dynamics, but with the previous velocity appearing as an anchor. Numerical experiments demonstrate the increased robustness obtained with inertia.
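The time-discretized step described above can be sketched as a Tikhonov-regularized least-squares problem. The sketch assumes the anchor enters as a penalty on the distance to the previous velocity. With a zero anchor, the same code gives a standard regularized Dirac-Frenkel velocity. The function name and penalty form are illustrative, not taken from the paper.

```python
import numpy as np


def parameter_velocity(J, f, lam, v_prev=None):
    """Regularized least-squares parameter velocity.

    Solves min_v ||J v - f||^2 + lam * ||v - anchor||^2, where the anchor is the
    previous velocity (inertial variant) or zero (standard regularized Dirac-Frenkel).
    J: (m, p) Jacobian of the parametrized solution w.r.t. the parameters at sample points
    f: (m,) right-hand side of the evolution equation at the same points
    """
    p = J.shape[1]
    anchor = np.zeros(p) if v_prev is None else np.asarray(v_prev, dtype=float)
    # Normal equations: (J^T J + lam I) v = J^T f + lam * anchor
    A = J.T @ J + lam * np.eye(p)
    b = J.T @ f + lam * anchor
    return np.linalg.solve(A, b)
```

Take a Jacobian that leaves one parameter direction uninformed, such as `J = [[1, 0], [0, 0]]`. The standard solve sets that velocity component to zero, while the inertial solve keeps the previous value there and leaves the informed component essentially unchanged.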

13.
arXiv (CS.LG) 2026-06-17

Informative Missingness to Generate Irregular Clinical Time Series

arXiv:2606.17106v1 Announce Type: new Abstract: Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology, making it important to model it directly rather than treat it as a preprocessing artifact. Here we present a diffusion-based approach for generating clinical time series that jointly models laboratory values and their observation patterns using the public Data Analytics Challenge on Missing Data Imputation (DACMI) benchmark derived from MIMIC-III. To preserve realistic sampling, we align chart times into 4-hour intervals and segment admissions into 7-day windows, producing trajectories that pair each lab value with a corresponding observation indicator. Standard transformations and normalization are applied to stabilize training. Our method extends the TimeDiff framework to learn continuous lab values and discrete missingness patterns through complementary diffusion objectives. Experiments show that the generated data closely match real patient trajectories across individual lab distributions and joint value-missingness embeddings, demonstrating that diffusion models can capture clinically meaningful dependencies between patient physiology and clinicians' testing behavior under MNAR-like (missing-not-at-random) missingness. These preliminary results indicate that our model can serve as an initial component toward developing clinical foundation models. By producing synthetic priors that preserve key physiology-missingness relationships, this work motivates the subsequent training of Prior-Data Fitted Networks capable of leveraging informative missingness, which we will investigate in the extended work.
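The preprocessing described above uses 4-hour bins within 7-day windows, with each value paired with an observation indicator. It can be sketched for a single lab as follows. Keeping the last measurement in a bin and marking empty bins with NaN are assumptions, since the abstract does not state how collisions or gaps are handled.

```python
import math

BIN_HOURS = 4
WINDOW_DAYS = 7
N_BINS = WINDOW_DAYS * 24 // BIN_HOURS   # 42 bins per 7-day window


def bin_lab_series(observations, window_start_hours=0.0):
    """Map irregular (hours_since_admission, value) pairs for one lab into
    fixed 4-hour bins of a 7-day window.

    Returns (values, mask): mask[i] = 1 if the lab was observed in bin i.
    Unobserved bins hold NaN. If several measurements fall into one bin,
    the last one is kept (an assumption; other aggregations are possible).
    """
    values = [math.nan] * N_BINS
    mask = [0] * N_BINS
    for hours, value in sorted(observations):
        offset = hours - window_start_hours
        if 0 <= offset < WINDOW_DAYS * 24:      # drop measurements outside the window
            i = int(offset // BIN_HOURS)
            values[i] = value
            mask[i] = 1
    return values, mask
```

The mask is what carries the informative missingness: a generative model trained on `(values, mask)` pairs sees when tests were ordered, not only what they returned.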

14.
arXiv (CS.LG) 2026-06-12

Feature-preserving Latent-EnKF for Data Assimilation of Flows with Shocks

arXiv:2606.12559v1 Announce Type: cross Abstract: The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to the physical state through a decoder shared by all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and Mach 2 shock interaction with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.
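The analysis step itself is the usual EnKF update. In the latent variant, it is applied to latent ensemble members, while predicted observations come from decoding each member. The sketch below is a standard stochastic (perturbed-observation) EnKF update that takes those predicted observations as input. The abstract does not say whether the paper uses this variant or a deterministic square-root filter, and the learned encoder and decoder are not shown.

```python
import numpy as np


def enkf_analysis(Z, Y_pred, y_obs, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    Z: (N, d) forecast ensemble, here taken to be latent states
    Y_pred: (N, m) predicted observations for each member, e.g. the decoded
            state sampled at the sensor locations
    y_obs: (m,) observation vector, R: (m, m) observation-error covariance
    Returns the (N, d) analysis ensemble.
    """
    N = Z.shape[0]
    Za = Z - Z.mean(axis=0)                  # ensemble anomalies
    Ya = Y_pred - Y_pred.mean(axis=0)
    C_zy = Za.T @ Ya / (N - 1)               # state-observation cross-covariance (d, m)
    C_yy = Ya.T @ Ya / (N - 1)               # predicted-observation covariance (m, m)
    K = C_zy @ np.linalg.inv(C_yy + R)       # Kalman gain (d, m)
    # Each member assimilates its own perturbed copy of the observation.
    perturbed = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R, size=N)
    return Z + (perturbed - Y_pred) @ K.T
```

For a scalar state with prior N(0, 1), observed directly with unit noise and value 1, the analysis ensemble has mean and variance near 0.5, as the Gaussian update predicts. The latent formulation aims to keep that Gaussian picture reasonable even when the physical state contains a shock.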

15.
arXiv (CS.AI) 2026-06-25

Securing Time Integrity in Energy IoT Against Clock Drift and Y2K38 Failures

arXiv:2601.23147v2 Announce Type: replace-cross Abstract: The integrity of time in distributed Internet of Things (IoT) devices is crucial for reliable operation in energy cyber-physical systems, such as smart grids and microgrids. However, IoT systems are vulnerable to clock drift, time-synchronization manipulation, and timestamp discontinuities, such as the Year 2038 (Y2K38) Unix overflow, all of which disrupt temporal ordering. Conventional anomaly-detection models, which assume reliable timestamps, fail to capture temporal inconsistencies. This paper introduces STGAT (Spatio-Temporal Graph Attention Network), a framework that models both temporal distortion and inter-device consistency in energy IoT systems. STGAT combines drift-aware temporal embeddings and temporal self-attention to capture corrupted time evolution at individual devices, and uses graph attention to model spatial propagation of timing errors. A curvature-regularized latent representation geometrically separates normal clock evolution from anomalies caused by drift, synchronization offsets, and overflow events. Experimental results on energy IoT telemetry with controlled timing perturbations show that STGAT achieves 95.7% accuracy, outperforming recurrent, transformer, and graph-based baselines with significant improvements (d > 1.8, p < 0.001). Additionally, STGAT reduces detection delay by 26%, achieving a 2.3-time-step delay while maintaining stable performance under overflow, drift, and physical inconsistencies.
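The Y2K38 failure mode is easy to reproduce. A signed 32-bit Unix timestamp tops out at 2038-01-19 03:14:07 UTC and wraps to 1901-12-13 20:45:52 UTC one second later. The sketch below emulates that wrap and flags timestamps that fail to increase or that jump too far. It is a simple rule-based check with hypothetical helper names, shown only to make the failure concrete; it is not STGAT.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(seconds):
    """Emulate storing a Unix timestamp in a signed 32-bit integer."""
    return (seconds + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    # timedelta arithmetic avoids platform limits on negative timestamps
    return EPOCH + timedelta(seconds=seconds)


def find_discontinuities(timestamps, max_step):
    """Indices where a timestamp does not strictly increase or jumps forward
    by more than max_step seconds."""
    return [i for i in range(1, len(timestamps))
            if not 0 < timestamps[i] - timestamps[i - 1] <= max_step]
```

A device logging once per second across the rollover produces `[INT32_MAX - 1, INT32_MAX, -2**31, -2**31 + 1]`, and the check flags index 2. Drift and synchronization offsets are subtler, which is why the paper models them jointly across devices.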

16.
arXiv (quant-ph) 2026-06-16

QALM: Escaping Local Minima via Interleaved Exploration and Exploitation in Quantum Circuit Optimization

arXiv:2606.16221v1 Announce Type: new Abstract: Quantum circuit optimizers face a fundamental limitation in how they tolerate temporary cost increases. At one extreme, greedy rule-based optimizers immediately apply any cost-reducing transformation, achieving high efficiency but quickly becoming trapped in local minima. At the other extreme, search-based optimizers accept cost-increasing moves to explore the circuit space and escape such minima. However, because search-based optimizers cannot determine within a reasonable time budget whether a given point is promising, that is, whether its neighborhood contains a deeper local minimum, they must blindly explore higher-cost regions. As a result, escaping the current basin to reach a promising point takes exponentially many steps. In this work, we show that this limitation can be overcome with a hybrid framework that interleaves the exhaustive exploration capabilities of search algorithms with the efficiency of rule-based optimization. We implement this framework as QALM, a novel optimizer designed to escape local minima without incurring the runtime penalties of pure search. Crucially, our results demonstrate that QALM does not merely strike a balance; it outperforms existing rule-based and search-based optimizers in circuit reduction rates while operating with the computational efficiency of rule-based systems. In a comprehensive evaluation across 248 circuits, QALM matches or exceeds the fidelity of the strongest baseline on 83.9% of these circuits, given the same time budget.
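The interleaving idea can be illustrated on a toy one-dimensional cost landscape. Integer positions stand in for circuits, and neighboring positions stand in for single rewrite steps. The sketch below alternates greedy descent with exhaustive restarts from every state within a fixed radius. The landscape, radius, and function names are hypothetical, and QALM's actual move set and search strategy are not reproduced.

```python
def greedy_descent(cost, start):
    """Repeatedly move to the cheapest strictly better neighbor (rule-based style)."""
    current = start
    while True:
        neighbors = [n for n in (current - 1, current + 1) if 0 <= n < len(cost)]
        best = min(neighbors, key=lambda n: cost[n])
        if cost[best] >= cost[current]:
            return current
        current = best


def interleaved_search(cost, start, radius, rounds=10):
    """Alternate greedy exploitation with bounded exhaustive exploration.

    After each greedy descent, every state within `radius` moves of the current
    local minimum, including cost-increasing ones, is used as a restart point for
    another greedy descent; the best resulting minimum is kept.
    """
    current = greedy_descent(cost, start)
    for _ in range(rounds):
        lo, hi = max(0, current - radius), min(len(cost) - 1, current + radius)
        candidates = [greedy_descent(cost, s) for s in range(lo, hi + 1)]
        best = min(candidates, key=lambda s: cost[s])
        if cost[best] >= cost[current]:
            break                       # no deeper minimum within reach
        current = best
    return current
```

Take the cost list `[5, 3, 4, 6, 2, 1, 3]` and start from position 0. Greedy descent alone stops at the local minimum at position 1 (cost 3), while the interleaved search with radius 3 reaches the global minimum at position 5 (cost 1).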

17.
medRxiv (Medicine) 2026-06-22

Vaccine introductions in the WHO African Region, 2023-26: a country-level ecological analysis by Gavi eligibility and conflict-affected status

Background. The Immunization Agenda 2030 (IA2030) tracks new and underused vaccine introduction as an access metric, and its mid-term review calls for stronger country ownership, prioritisation, data use and tailored support in conflict-affected and resource-constrained settings; however, national launch status does not measure recurrent financing, implementation, safety or equity. We examined how recent vaccine-introduction activity was distributed across the WHO African Region. Methods. We conducted a descriptive country-level ecological analysis of all 47 Member States from January 2023 to June 2026. The country was the unit of analysis and contributed one cumulative, unweighted count of nationally endorsed vaccine-introduction and programme-change events. Counts were linked to Gavi eligibility, World Bank FY26 conflict-affected status, broader fragile and conflict-affected situation status in sensitivity analysis, and concurrent system-performance indicators, and modelled with Poisson regression using HC1 robust standard errors. Two Expanded Programme on Immunization (EPI) manager survey waves were summarised at country level. Reporting followed STROBE and RECORD. Results. Seventy-two events were recorded across 38 of 47 Member States: 48 new-antigen introductions, 20 dose or schedule expansions and four combination-vaccine introductions; malaria vaccines accounted for 21. Gavi-eligible conflict-affected countries averaged 2.50 events per country versus 1.27 in both comparison groups. Gavi-eligible conflict-affected status was associated with a higher count (incidence rate ratio [IRR] 1.97, 95% confidence interval [CI] 1.38-2.81; p

18.
arXiv (CS.CV) 2026-06-25

Shift Variant Image Degradation and Restoration Using Singular Value Decomposition

Shift-variant image degradation is frequently encountered in practical imaging systems where the point spread function (PSF) varies across the image field due to motion, optical aberrations, atmospheric turbulence, or sensor-related effects. Unlike shift-invariant degradation, shift-variant degradation presents significant challenges for image restoration because it cannot be represented by a single convolution kernel. This paper proposes a singular value decomposition (SVD)-based framework for restoring images degraded by shift-variant motion blur. The proposed approach determines the contribution of small singular values using a singular-value energy retention criterion. Specifically, the number of small singular values is selected based on a specified percentage of cumulative singular-value energy, providing a systematic approach for controlling noise amplification while preserving useful image information. The degradation model is formulated using a position-dependent PSF represented by a shift-variant imaging operator. Three representative one-dimensional shift-variant motion PSFs are considered: bidirectional linear motion, Gaussian motion, and simple harmonic motion. The image degradation process is modeled as a linear system, and SVD is employed to analyze and invert the corresponding degradation operator. The singular-value representation provides insight into the ill-conditioned nature of the restoration problem and enables the development of stable inversion techniques. The proposed SVD-based restoration algorithm is applied to three degraded images. Experimental results demonstrate the effectiveness of the proposed approach in recovering image details and reducing blur artifacts under different motion models.
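Here is a minimal sketch of the energy-retention rule. It assumes the degradation operator is available as a dense matrix and that "energy" means squared singular values. The helper `shift_variant_blur_matrix` builds a hypothetical 1D operator whose centered box PSF (bidirectional linear motion) grows in length across the field. Both function names and the 0.99 default are illustrative.

```python
import numpy as np


def shift_variant_blur_matrix(n, max_len):
    """Hypothetical 1D shift-variant linear-motion blur: row i averages a centered
    window whose length grows linearly from 1 to max_len pixels across the field."""
    A = np.zeros((n, n))
    for i in range(n):
        length = 1 + round((max_len - 1) * i / max(n - 1, 1))
        lo = max(0, i - length // 2)
        hi = min(n, lo + length)
        A[i, lo:hi] = 1.0 / (hi - lo)   # normalized so each row sums to 1
    return A


def svd_restore(A, b, energy=0.99):
    """Truncated-SVD solution of A x = b.

    Keeps the smallest number k of leading singular values whose cumulative
    energy sum(s_i^2, i <= k) / sum(s_i^2) reaches `energy`, and discards the
    rest to limit noise amplification.
    """
    U, s, Vt = np.linalg.svd(A)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    coeffs = (U[:, :k].T @ b) / s[:k]   # invert only the retained singular values
    return Vt[:k].T @ coeffs
```

Raising `energy` toward 1 keeps more small singular values, which sharpens noise-free data but amplifies noise. This is the trade-off the retention percentage controls.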

19.
arXiv (CS.AI) 2026-06-11

INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration

arXiv:2606.11440v1 Announce Type: new Abstract: Existing multi-agent LLM orchestration methods, ranging from brute-force ensembles to learned routers, select models and topologies based on task and model features. However, these methods do not consider the runtime state of the serving infrastructure. On shared GPU clusters under concurrent load, this infrastructure blindness causes systematic resource underutilization: preferred models accumulate deep request queues while equally capable alternatives sit idle. In multi-agent pipelines, where each query triggers multiple sequential model calls, these delays then compound across every downstream step. Closing this gap is challenging because the relevant infrastructure signals (queue depths, KV-cache pressure, latencies) are dynamic and noisy, and they must drive three different decisions: planning, per-step routing, and scheduling. We introduce INFRAMIND, a framework that makes the entire multi-agent stack infrastructure-aware. An infra-aware planner conditions topology and role selection on real-time system load and remaining budget, biasing toward simpler graphs under congestion and richer ones at low load. An infra-aware executor then observes per-model queue depths, cache utilization, and response latencies at each agent step to decide which model to call and how deeply to reason; a budget-aware scheduler further reorders each model's queue so that urgent requests are served first. Cast as a hierarchical constrained MDP and solved end-to-end via reinforcement learning, the system learns to balance quality against latency automatically. Across five benchmarks, INFRAMIND delivers up to +7.6 pp accuracy over the prior baseline at low load with up to 7x lower latency, and sustains up to 99.9% SLO compliance under high load where every baseline drops below 50%.
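The sketch below is a hand-written stand-in for two of the decisions described above. It keeps each model's queue ordered by deadline (earliest deadline first) and routes a step to the capable model with the lowest estimated wait. INFRAMIND learns these policies with reinforcement learning; the deadline ordering, the wait estimate, and all names here are assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    deadline: float                          # absolute time by which the request should finish
    request_id: str = field(compare=False)   # excluded from ordering


class BudgetAwareQueue:
    """Per-model queue that serves the request with the earliest deadline first."""

    def __init__(self):
        self._heap = []

    def push(self, request):
        heapq.heappush(self._heap, request)

    def pop(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


def route(capable_models, queues, mean_latency):
    """Pick the capable model with the lowest estimated wait, (queue depth + 1) x mean latency."""
    return min(capable_models, key=lambda m: (len(queues[m]) + 1) * mean_latency[m])
```

A fast model with five queued requests can lose to a slower but idle one. This is the kind of infrastructure signal that, per the abstract, task-only routers ignore.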

20.
medRxiv (Medicine) 2026-06-10

Prediction of immunotherapy response using live tumor fragments from routine clinical biopsies

Functional ex vivo assays using live tumor tissues have demonstrated strong predictive accuracy for response to immune checkpoint inhibitors (ICIs) but are not scalable, requiring manual processing of large resections collected at academic centers. Here, an ex vivo live tumor fragment (LTF) platform was developed using standard-of-care biopsies from 228 patients with suspected malignancy collected across prospective, multicenter observational trials and biobanks. Hierarchical clustering of ICI-mediated changes in cytokine production identified two groups: responders and nonresponders. A binary classifier (elive index) using 8 cytokines achieved an AUC of 0.99 for cluster prediction. elive index correctly predicted clinical benefit in 93% (26/28) of patients (P = 3.2 × 10⁻⁵) and accurately identified 83% (10/12) of objective responders. Critically, elive responders were identified among biomarker-negative patients, highlighting the platform as a scalable approach that complements existing companion diagnostics and expands the population of patients identified to benefit from ICI therapy.

21.
Nature (Science) 2026-06-25

Oo oo, ha ha: why humans and great apes giggle alike when tickled

The rhythmic patterns of laughter found in apes and humans reveal that complex primate vocal control might have started evolving 15 million years ago.

22.
bioRxiv (Bioinfo) 2026-06-16

RetroMol: Parsing a shared encoding from natural products and their biosynthetic gene clusters

Natural products such as polyketides and nonribosomal peptides (NRPs) are important sources of bioactive compounds, including many antibiotics. Many of them are assembled by modular enzyme complexes and further modified and diversified by tailoring reactions encoded by biosynthetic gene clusters (BGCs). Although natural products and their coding BGCs describe different data modalities of the same biochemical process, a unified language to jointly describe their biochemistry is lacking. Here we introduce a sequence-based representation of the core biosynthesis of modular natural products, which we call primary sequences, that bridges chemical structures and BGCs. We also present RetroMol, an algorithm that parses either natural product structures or the BGCs that encode them into primary sequences of natural product building blocks. RetroMol allows for similarity scoring between natural products and BGCs, enabling retrieval of compounds, BGCs, or both, based on their biosynthetic similarity. This can, for instance, be used to retrieve biosynthetically similar but structurally dissimilar compounds, or to link natural products to candidate coding BGCs in large experimental datasets. We demonstrate the latter by rediscovering the nocardichelin B BGC as a proof of principle. We also exemplify the utility of biosynthetic similarity by showing various pairs of biosynthetically similar compounds with low structural similarity. Together, these results establish primary sequences as a shared biosynthetic encoding for natural product comparison and BGC prioritization.
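Because primary sequences are ordered lists of building blocks, one natural way to score the similarity of two of them is sequence alignment. The abstract does not state which scoring scheme RetroMol uses, so the sketch below assumes a plain Needleman-Wunsch global alignment with hypothetical match, mismatch, and gap scores, and treats each building block as an opaque token. The token names in the test are illustrative only.

```python
def align_score(a: list[str], b: list[str],
                match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score between two token sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per token.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align the two tokens
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[-1][-1]


def similarity(a: list[str], b: list[str],
               match: int = 2, mismatch: int = -1, gap: int = -1) -> float:
    """Alignment score scaled so that identical sequences score 1.0."""
    best = match * max(len(a), len(b))
    return align_score(a, b, match, mismatch, gap) / best if best else 0.0
```

Scoring building-block order rather than atoms is what lets two compounds look similar biosynthetically even when their final structures diverge after tailoring.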

23.
arXiv (CS.CV) 2026-06-17

Spatio-Temporal Fusion Model for Standard View Classification of Echocardiographic Videos

Automated classification of standard echocardiographic views is crucial for efficient clinical workflow but faces three main challenges. First, publicly available datasets are scarce and limited in scale and view coverage. Second, the performance of some modern video-level architectures for echocardiographic view classification remains underexplored. Third, some view categories exhibit highly similar spatial appearances, making single-frame features insufficient for discrimination, while heterogeneous frame quality complicates robust temporal information fusion. To address these challenges, we release the Echocardiographic Videos of Nine Views (EV9V) dataset, comprising 5,138 videos, 910,579 frames, and 9 standard views, which is, to the best of our knowledge, the largest publicly available echocardiography video dataset. Using EV9V, we systematically benchmark representative video classification architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. Furthermore, we propose a Spatio-Temporal Fusion Model (STFM), an efficient dual-stream CNN-LSTM (Long Short-Term Memory) framework that jointly captures spatial anatomical structures and temporal cardiac dynamics. The proposed framework uses uncertainty-aware learning to preferentially sample representative video segments during training and applies evidence-based fusion during inference, improving robustness to variations in frame quality across echocardiographic videos. Extensive experiments show that our method achieves competitive performance against diverse video classification models, validating the effectiveness of uncertainty-aware spatio-temporal learning for echocardiographic view classification. The code is available at https://github.com/bgx666/stfm.
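The abstract does not define its evidence-based fusion rule. One common formulation, assumed here and not confirmed as STFM's, comes from evidential deep learning: each prediction emits non-negative per-class evidence, which is mapped to subjective-logic belief masses plus an explicit uncertainty mass, and opinions are combined with a reduced Dempster-Shafer rule. The function names and the idea of fusing per-segment evidence vectors are hypothetical.

```python
def opinion_from_evidence(evidence: list[float]) -> tuple[list[float], float]:
    """Map class evidence e_k to beliefs b_k = e_k / S and uncertainty u = K / S."""
    k = len(evidence)
    s = sum(evidence) + k           # Dirichlet strength, with alpha_k = e_k + 1
    belief = [e / s for e in evidence]
    return belief, k / s            # beliefs and uncertainty sum to 1


def combine(op1, op2):
    """Reduced Dempster-Shafer combination of two opinions over the same classes."""
    b1, u1 = op1
    b2, u2 = op2
    # Conflict: belief mass the two opinions place on different classes.
    conflict = sum(x * y for i, x in enumerate(b1)
                   for j, y in enumerate(b2) if i != j)
    scale = 1.0 / (1.0 - conflict)
    belief = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    return belief, scale * u1 * u2


def fuse(evidences: list[list[float]]):
    """Fold per-segment evidence vectors into one fused opinion."""
    fused = opinion_from_evidence(evidences[0])
    for e in evidences[1:]:
        fused = combine(fused, opinion_from_evidence(e))
    return fused
```

Under this rule a segment with zero evidence (uncertainty 1) leaves the fused opinion unchanged, which is one way low-quality frames can be kept from dominating the video-level decision.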

24.
arXiv (CS.CL) 2026-06-18

Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification

Educational dialogue is a valuable but sensitive resource for research: the same transcripts that capture authentic learning often capture personally identifiable information (PII) entangled with curricular content, where "Riemann" may refer to a real student or to a mathematical concept. Existing approaches force a tradeoff between governance and accuracy. Commercial Large Language Models (LLMs) can handle this ambiguity but require sending student data to third parties, while local named entity recognition (NER) systems preserve governance but over-redact curricular terms. We propose a fully local cascade framework that reframes de-identification from open-ended entity recognition to constrained privacy triage. A recall-first union proposer combines two lightweight encoders with deterministic rules to over-generate candidate spans; a context-aware reviewer then makes a binary Redact/Keep decision for each candidate using surrounding dialogue and speaker role. We evaluate three reviewer configurations against same-family LLM-only baselines and a commercial API on math tutoring transcripts from two large platforms. The strongest local configuration reaches 0.958 macro F1, compared with 0.767 for a same-family LLM-only baseline and 0.706 for the commercial API, while running entirely on a single laptop. On a targeted challenge set of curricular-personal name ambiguity, the same configuration degrades by only 0.03 F1 versus 0.19 to 0.25 for smaller reviewers. These results suggest that for educational de-identification, problem formulation matters more than model scale.
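The two-stage cascade can be sketched as follows, under stated assumptions: candidate spans are character offsets, the recall-first union keeps every span any proposer emitted and merges overlaps, and the reviewer is any function returning True for Redact. The function names, the overlap-merging rule, and the "[REDACTED]" placeholder are hypothetical; in the paper the reviewer is a local context-aware model that also sees speaker role.

```python
Span = tuple[int, int]  # [start, end) character offsets into the transcript


def union_spans(*proposals: list[Span]) -> list[Span]:
    """Recall-first union: keep every span any proposer found, merging overlaps."""
    spans = sorted(s for p in proposals for s in p)
    merged: list[Span] = []
    for start, end in spans:
        if merged and start < merged[-1][1]:
            # Overlaps the previous candidate: extend it instead of duplicating it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def triage(text: str, candidates: list[Span], review) -> str:
    """Apply a binary Redact/Keep reviewer to each sorted, non-overlapping candidate."""
    out, last = [], 0
    for start, end in candidates:
        out.append(text[last:start])
        # review(text, start, end) -> True means Redact, False means Keep.
        out.append("[REDACTED]" if review(text, start, end) else text[start:end])
        last = end
    out.append(text[last:])
    return "".join(out)
```

For example, with the transcript "Hi Riemann, the Riemann sum is 4." both occurrences of "Riemann" become candidates, and only a reviewer that uses context can keep the second one as curricular content.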

25.
arXiv (math.PR) 2026-06-15

Semiclassical limit of Polyakov-Liouville measure and Q-Curvature Uniformization on even-dimensional manifolds

arXiv:2606.14443v1 Announce Type: new Abstract: We study the semiclassical limit of the Polyakov-Liouville measure $\boldsymbol{\nu}_\gamma$, which is a non-Gaussian measure on $H^{-\varepsilon}(M)$ that has recently been extended from Riemann surfaces to general Riemannian manifolds $(M,g)$ of even dimension. We show that under an appropriate rescaling in the semiclassical limit as $\gamma\to0$, the normalized Polyakov-Liouville measure $\mathbb{Q}_\gamma$ concentrates on the unique smooth weight $u$ for which the conformal metric $e^{2u}g$ on $M$ has constant $Q$-curvature.
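For orientation, the constant $Q$-curvature condition on the weight $u$ can be written as a PDE. The form below is the standard one under a common normalization, in which the critical GJMS operator $P_g$ reduces to $-\Delta_g$ and $Q_g$ to the Gaussian curvature $K_g$ in dimension 2 (other papers insert dimension-dependent constants, e.g. $P_g u + 2Q_g = 2Q_{e^{2u}g}e^{4u}$ for the Paneitz operator in dimension 4); it is not quoted from the paper.

```latex
% Dimension n = 2, with \Delta_g the Laplace-Beltrami operator (\Delta = div grad):
K_{e^{2u}g} = e^{-2u}\bigl(K_g - \Delta_g u\bigr)
\quad\Longleftrightarrow\quad
-\Delta_g u + K_g = K_{e^{2u}g}\, e^{2u}.

% Even dimension n, with P_g the critical GJMS operator (P_g = -\Delta_g when n = 2):
P_g u + Q_g = Q_{e^{2u}g}\, e^{n u}.

% Uniformization: find u such that Q_{e^{2u}g} \equiv \bar{Q} is constant, i.e.
P_g u + Q_g = \bar{Q}\, e^{n u}.
```

Setting $n = 2$ in the general equation recovers the classical Gaussian-curvature case, which is the Riemann-surface setting the measure was originally defined on.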