Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-16

Infections and suicide and self-harm: a population-based matched cohort study

Background Infections have been associated with adverse mental health outcomes, including suicide, but evidence beyond severe or central nervous system infections is limited. We investigated associations between a range of acute infections and subsequent suicide/self-harm outcomes. Methods We conducted six infection-specific matched cohort studies using English primary care records from the Clinical Practice Research Datalink Aurum (2007-2024), linked to hospital admissions and mortality data. Adults ([≥]18 years) with a primary care record of infection (gastroenteritis, lower respiratory tract [LRTI], skin/soft-tissue [SSTI], urinary tract [UTI], sepsis, meningitis/encephalitis [positive control]) were matched (age, sex, practice, calendar period) to up to five comparators without infection. We estimated hazard ratios (HRs) for suicide/self-harm outcomes using Cox regression, stratified by matched set and implicitly adjusting for matching factors, with additional adjustment for deprivation, lifestyle factors, and comorbidities. We examined whether associations varied over time, by infection severity, antimicrobial treatment, sex, and prior mental health conditions. Findings Cohorts ranged from 18,192 individuals with meningitis/encephalitis (matched to 90,915 without) to 398,099 with SSTI (matched to 1,743,747). After adjustment, individuals with infection had a higher hazard of suicide/self-harm outcomes than comparators across all cohorts: sepsis (HR 1.79, 95% CI 1.65-1.93), gastroenteritis (1.62, 1.55-1.70), meningitis/encephalitis (1.56, 1.32-1.84), UTI (1.41, 1.33-1.50), SSTI (1.37, 1.31-1.43), and LRTI (1.37, 1.31-1.44). Risk was highest in the year post-infection, attenuating over time, and was higher among severe infections and those without prior mental health conditions. Interpretation Common acute infections recorded in primary care are associated with increased risk of suicide and self-harm, particularly following severe infections and in the year post-infection. Findings support suicide risk monitoring following acute infection, particularly among individuals without prior mental health conditions, and highlight infection prevention as a potentially modifiable strategy in vulnerable populations. Funding Wellcome and La Caixa. Copyright This work is licensed under a Creative Commons Attribution (CC BY) licence.

02.
arXiv (CS.CV) 2026-06-15

LiAuto-GeoX: Efficient Grounded Driving Transformer

Dense 3D reconstruction has demonstrated immense potential for spatial understanding, yet its viability as a real-time, onboard representation for autonomous driving remains an open challenge. Existing large-scale visual geometry models typically require substantial computational resources and lack the long-range geometric fidelity, surround-view consistency, and real-time efficiency demanded by dynamic driving environments. To bridge this gap, we present LiAuto-GeoX, an efficient grounded driving transformer designed for deployable, ego-centric 3D scene understanding. Our approach begins by learning a high-capacity driving geometry model from large-scale surround-view data, utilizing sparse LiDAR priors to provide robust geometric grounding in distant, ambiguous, or structure-sparse regions. We then instantiate this capability into a highly compact 155M-parameter onboard model through a novel geometry-preserving distillation framework. This framework employs mask-guided depth-aware distillation to retain fine-grained metric structures by emphasizing geometrically informative regions, and relative-pose relational distillation to enforce cross-view spatial consistency through pose-induced geometric relations. Extensive evaluations reveal that LiAuto-GeoX runs at 220 FPS on KITTI while maintaining high-fidelity dense reconstruction, enabling real-time deployment. The learned geometry transfers seamlessly to downstream autonomy tasks, achieving 90.6 PDMS in trajectory prediction, 24.63 mIoU in occupancy prediction, and 47.67 IoU in future-frame prediction. These all demonstrate that efficient dense 3D reconstruction can transcend its traditional role as a perception target to serve as a scalable, foundational geometric representation for next-generation autonomous driving.

03.
arXiv (CS.LG) 2026-06-24

Layer-wise Geometric Approximation Rates for Deep Networks

arXiv:2604.20219v2 Announce Type: replace Abstract: Depth is widely viewed as a central contributor to the success of deep neural networks, whereas standard neural network approximation theory typically provides guarantees only for the final output and leaves the role of intermediate layers largely unclear. We address this gap by developing a quantitative framework in which depth admits a precise scale-dependent interpretation. Specifically, we design a single shared mixed-activation architecture of fixed width $2dN+d+2$ and any prescribed finite depth such that each intermediate readout $\Phi_\ell$ is itself an approximant to the target function $f$. For $f\in L^p([0,1]^d)$ with $p\in [1,\infty)$, the approximation error of $\Phi_\ell$ is controlled by $(2d+1)$ times the $L^p$ modulus of continuity at the geometric scale $N^{-\ell}$ for all $\ell$. The estimate reduces to the geometric rate $(2d+1)N^{-\ell}$ if $f$ is $1$-Lipschitz. Our network design is inspired by multigrade deep learning, where depth serves as a progressive refinement mechanism. For every prescribed terminal depth, the construction yields a finite nested family of prefix readouts whose earlier correction terms remain embedded in later readouts. Thus the approximation may be truncated within the prescribed depth range once the desired certified accuracy is reached.

04.
arXiv (quant-ph) 2026-06-24

On the Limits of Stretching Quantum Pseudorandomness

arXiv:2606.24736v1 Announce Type: new Abstract: Pseudorandom states, introduced by Ji, Liu, and Song (CRYPTO '18), are quantum analogues of classical pseudorandom generators. A fundamental property of classical pseudorandom generators is that their output can be stretched to arbitrary polynomial length. Whether an analogous stretching property holds for quantum pseudorandom states remains unclear. In this work, we prove the first black-box separation between single-copy secure pseudorandom states ($\mathsf{1PRS}$) with different output lengths. Specifically, we construct a quantum oracle relative to which $\mathsf{1PRS}$ with output length $m(n)=1.1n$ exist, but $\mathsf{1PRS}$ with output length $m(n)=\Omega(n^{2+\epsilon})$ do not, for any $\epsilon>0$. Our proof leverages the Common Haar Random State (CHRS) model introduced by Chen, Coladangelo, and Sattath (EUROCRYPT '25), and introduces a technique to bound the effective number of resource CHRS states utilized by any $\mathsf{1PRS}$ generator in this model.

05.
Nature (Science) 2026-06-10

Light-induced quantum friction of carbon nanotubes in water

Friction slows down moving objects at both macroscopic and microscopic scales1. At the electronic level, quantum friction describes direct transfer of momentum between a liquid and the electrons of a solid2. Owing to its microscopic nature, this phenomenon remains experimentally challenging to capture3. Here we show that near-infrared fluorescent single-walled carbon nanotubes (SWCNTs) exhibit light-induced quantum friction in water. It is measured by observing an excitation-power-dependent linear decrease of around 50% in the diffusion constants of functionalized SWCNTs in aqueous solution. This effect disappears when excitons are localized, as in the case of SWCNTs with quantum defects. We further show that the chemical manipulation of exciton concentration by molecules that increase or decrease SWCNT fluorescence also modulates the diffusion constant by up to a factor of 2. Optical pump terahertz (THz) probe spectroscopy shows an instantaneous response (around 30 cm−1) that we assign to direct exciton–water coupling in the range of water Debye modes. It is followed by an increasing (>100 ps) response in the range of intermolecular translational modes of the hydrogen bond network of water (>100 cm−1), resembling heating. Classical molecular dynamics simulations further support a mechanism in which the fluctuating dipole moments of excitons create frictional forces. These findings establish light-induced quantum friction between excitons in SWCNTs and water and show that electronic excitations can be used to control nanoscale motion and fluid properties. Near-infrared fluorescent carbon nanotubes exhibit light-induced quantum friction in water, in which exciton interactions slow nanoscale motion and enable optical control of diffusion and fluid dynamics.

06.
arXiv (CS.AI) 2026-06-16

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

arXiv:2601.19697v2 Announce Type: replace-cross Abstract: Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize the inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement learning based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely-used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score compared to baselines on the CrossCodeEval benchmark. The results show that our framework achieves superior performance and exhibits high generalizability across various code LLMs and programming languages.

07.
arXiv (CS.CL) 2026-06-18

Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction

Current conversational AI systems have made significant progress in language generation, personalization, and long-context interaction. However, most existing methods model social behavior through isolated components such as emotion modeling, memory retrieval, or persona conditioning, lacking a unified framework to explain the emergence of stable social relationships and social intelligence in long-term human-AI interaction.To address this, we propose the Human-AI Coevolution Dynamics Framework (HACD-H), a formal model of human-AI interaction as a self-organizing social cognitive system. HACD-H integrates emotional adaptation, relational organization, social memory, and personality consistency into a unified dynamical framework and introduces principles including multi-timescale social cognition, relational attractors, trust basins, developmental phase transitions, and social cognitive energy dynamics.We construct a conversational dataset with approximately 14,700 interaction turns and develop a theory-driven empirical evaluation framework. Results reveal a hierarchy of temporal persistence in social cognition, stable relational attractors, phase-transition-like developmental patterns, and a structured social cognitive energy landscape. Social intelligence shows a significant negative correlation with social cognitive energy (r = -0.391, p < 0.001), and interaction trajectories exhibit progressive energy reduction over time.These findings suggest that social intelligence emerges from long-term social cognitive coevolution rather than isolated conversational capabilities. HACD-H provides a unified theoretical foundation for modeling adaptive human-AI social interaction and developing socially intelligent AI systems.

08.
arXiv (CS.AI) 2026-06-19

Policy-Embedded Graph Expansion: Networked HIV Testing with Diffusion-Driven Network Samples

arXiv:2601.16233v2 Announce Type: replace-cross Abstract: HIV is a retrovirus that attacks the human immune system and can lead to death without proper treatment. In collaboration with the WHO and the University of Witwatersrand, we study how to improve the efficiency of HIV testing with the goal of eventual deployment, directly supporting progress toward UN Sustainable Development Goal 3.3. While prior work has demonstrated the promise of intelligent algorithms for sequential, network-based HIV testing, existing approaches rely on assumptions that are impractical in our real-world implementations. Here, we study sequential testing on incrementally revealed disease networks and introduce Policy-Embedded Graph Expansion (PEGE), a novel framework that directly embeds a generative distribution over graph expansions into the decision-making policy rather than attempting explicit topological reconstruction. We further propose Dynamics-Driven Branching (DDB), a diffusion-based graph expansion model that supports decision making in PEGE and is designed for data-limited settings where forest structures arise naturally, as in our real-world referral process. Experiments on real HIV transmission networks show that the combined approach (PEGE + DDB) consistently outperforms baselines (e.g., 17.3% improvement in discounted reward and 15.4% more HIV detections with 25% of the population tested) and explore key tradeoffs that drive solution quality.

09.
arXiv (CS.LG) 2026-06-15

Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

arXiv:2606.14187v1 Announce Type: new Abstract: Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappreciated vulnerability: their core operation, Newton-Schulz iteration, depends critically on input conditioning, yet the raw momentum matrices exhibit severe coordinate-wise scale heterogeneity. In this paper, we first verify this scale heterogeneity through a chi-square uniformity test, showing that intra-matrix scale imbalance is prevalent across Transformer layers and that coordinate whitening effectively corrects it. Motivated by this finding, we propose Zeta, a dual whitening optimizer that applies coordinate whitening and spectral whitening in a strictly ordered pipeline. The ordering is not a tunable choice but follows from a mathematical dependency: coordinate whitening establishes the statistical isotropy that spectral whitening requires to function reliably. We further prove that this dual pipeline strictly reduces orthogonalization error relative to pure spectral methods by improving the condition number of the input. Empirically, Zeta matches or surpasses strong baselines across language modeling (0.6B to 8B parameters), mixture-of-experts architectures, and vision tasks, demonstrating that resolving scale imbalance before orthogonalization leads to faster convergence and better generalization. Code is available at https://gitcode.com/kevin259/MindSpeed.

10.
arXiv (CS.LG) 2026-06-12

Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability

arXiv:2606.10069v2 Announce Type: replace Abstract: In this paper we build upon a previous study in which we demonstrated, using XGBoost and earthquake catalogue data from Japan and Chile, that a set of 60 seismic statistical features (SSFs) had much greater predictive value than a set of 428 generic time series features from the tsfresh package. We here extend this previous work in two key ways, focusing on data from Japan as a large dataset is necessary in order to allow for the training of a deep learning (autoencoder) model. First, we move from whole-region prediction (considering, for each candidate event, the likelihood of an event M $\geq$ 5.0 anywhere in the region in the next 15 days) to localised predictions in which both the region of feature computation and the region of prediction are restricted to a circle of radius 24 km around the candidate event, and we show that performance remains excellent, similar to our previous whole-region study for the same area. Second, we here couple this proven set of SSFs, based on one-dimensional (catalogue) data, with a novel feature based on two-dimensional seismic maps, obtained by training a VQ-VAE model to reproduce such maps as output and identifying a measure of its error in doing so with a localised build-up of crustal stress. We show that while localised prediction based on SSFs can be effective alone, with test AUC values as high as those obtained in the case of Japan in our previous whole-region study, the inclusion of the new natively-spatial VQ-VAE-derived feature, top-ranked by SHAP analysis, can enhance performance and additionally appears to near-wholly replace the traditionally-computed $b$-value in terms of feature usage.

11.
arXiv (CS.CL) 2026-06-16

When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models

Failures in multi-turn reasoning models are largely invisible to terminal-score evaluation. A model can lock onto an unsafe stance early in a long dialogue, yet its final-turn refusal rate may appear indistinguishable from a robustly aligned baseline. To expose these hidden temporal dynamics, we propose a trace-level diagnostic - the CoT-Output 2x2 safety matrix. This framework labels every turn along two independent axes (internal reasoning and visible output), yielding four operationally defined failure cells: robust alignment, alignment faking, overt jailbreak, and a distinct failure mode we term context-injection failure (where the CoT maintains safe reasoning, but the visible output produces harm, highlighting a multi-turn manifestation of reasoning unfaithfulness). We evaluate three distilled reasoning targets against a fixed attacker across five oversight conditions, collecting 6750 turn-level observations on the Information-Hazard scenario. Our analysis reveals two reproducible vulnerabilities: an oversight paradox where explicit monitoring cues paradoxically increase alignment-faking rates rather than suppress them, and a context-injection failure where models lock onto unsafe external outputs despite safe internal states. We release the full dataset of multi-turn dialogues and CoT traces to support follow-up trace-diagnostic research.

12.
arXiv (CS.CV) 2026-06-19

ReA-OVCD: Reliability-Aware Open-Vocabulary Change Detection via Semantic and Spatial Refinement

Unlike traditional remote sensing change detection that relies on predefined categories, Open-Vocabulary Change Detection (OVCD) identifies land cover changes flexibly using arbitrary text prompts. However, existing methods suffer from an inherent trade-off when modeling changes: instance-level comparison overlooks fine-grained semantic variations (e.g., partial building extensions), while direct pixel comparison proves unreliable, yielding unstable responses and boundary artifacts due to semantic ambiguity and spatial inconsistency. To this end, we propose an efficient training-free Reliability-Aware Open-Vocabulary Change Detection (ReA-OVCD) framework. It first derives candidate change regions from pixel-wise semantic discrepancies to ensure flexible and detailed localization. To ensure reliability, it subsequently introduces a collaborative refinement strategy to explicitly model change validity from both semantic and spatial perspectives. Specifically, we develop a Semantic Change Reasoning (SCR) module that reassesses changes by jointly analyzing distributional divergence and response variation, enabling the suppression of incidental inconsistencies while preserving reliable semantic shifts. In addition, a Boundary-aware Change Refinement (BCR) module is designed to mitigate artifacts stemming from boundary misalignment and uncertainty through validating whether candidate regions are supported by reliable interior pixels. Extensive experiments across multiple datasets (LEVIR-CD, WHU-CD, DSIFN, and SECOND) demonstrate that our method consistently outperforms state-of-the-art approaches, achieving $\mathrm{F}_{1}^{C}$ improvements of 2.13\% to 9.75\% with higher computational efficiency. The code is publicly available at \https://github.com/Funny0101/ReA-OVCD

13.
arXiv (CS.LG) 2026-06-15

Decoupled Latent Optimization of Diffusion Models for Full Waveform Inversion

arXiv:2606.14139v1 Announce Type: new Abstract: Full waveform inversion (FWI) recovers subsurface velocity from seismic recordings by solving a severely ill-posed, nonconvex PDE-constrained optimization. Classical regularizers stabilize the inversion but fail to reproduce realistic geological structures; recent diffusion-prior methods improve realism at the cost of a fragile trade-off between data fidelity and prior consistency. We propose Decoupled Latent Optimization (DLO), which relaxes the standard latent-optimization formulation into a quadratic-penalty objective over an auxiliary physical variable and a latent variable. The data-fidelity gradient acts in physical space, the diffusion sampler contributes only through a decoded prior sample, and the standard smoothed-velocity initialization of classical FWI is preserved. On the OpenFWI benchmark, DLO outperforms classical regularizers and existing diffusion-based methods under clean, noisy, and missing-trace acquisitions. The prior, trained on 70*70 OpenFWI models, transfers directly to the Marmousi and Overthrust benchmarks, where DLO recovers intricate fault structures and remains robust to initialization smoothing and measurement noise.

14.
arXiv (CS.CV) 2026-06-15

Avatar V: Scaling Video-Reference Avatar Video Generation

Generating avatar videos that are not merely visually similar to a target individual but behaviorally recognizable, faithfully reproducing their talking rhythm, gestural tendencies, and expression dynamics, remains an open challenge. Existing methods predominantly condition on single static images, which provide insufficient identity information and cannot capture dynamic motion traits, while standard pixel-level objectives underserve the perceptually critical facial regions that determine avatar fidelity. We present Avatar V, a production-scale framework that addresses these limitations through video-reference-conditioned identity modeling. Rather than compressing identity into fixed-size embeddings, the model conditions directly on the full token sequence of a reference video, learning to reproduce both static identity attributes (facial geometry, skin texture) and dynamic behavioral patterns (talking rhythm, micro-expressions) through attention over the reference context. We introduce Sparse Reference Attention, an asymmetric mechanism achieving linear-complexity conditioning on arbitrarily long references; a motion representation stream enabling closed-loop talking style transfer; and an identity-aware super-resolution refiner inheriting the full reference conditioning. These are supported by a data engine curating 100M+ training clips from 50M raw videos, and a five-stage training pipeline with flow matching pre-training, personality fine-tuning, two-phase distillation (>10x acceleration), and RLHF alignment, deployed across thousands of GPUs. Avatar V generates 1080p videos of unlimited duration, achieving state-of-the-art identity preservation, lip synchronization, and generation quality on our cross-scene benchmark, consistently outperforming leading systems including Seedance 2.0, Kling O3 Pro, Veo 3.1, and OmniHuman 1.5 in both automated metrics and human evaluation.

15.
arXiv (quant-ph) 2026-06-12

Simple analytical flux-tuned iSWAP pulses for leakage suppression

arXiv:2606.13052v1 Announce Type: new Abstract: Fast, high-fidelity two-qubit gates are a key requirement for fault-tolerant quantum computation. Tunable coupler architectures provide a flexible approach for implementing entangling gates through flux control with large on-off ratios, but fast flux modulation can induce diabatic transitions and population leakage to non-computational states, limiting gate performance. Here we present an analytical flux control method enabling derivative removal by adiabatic gate ($\Phi$-DRAG) for suppressing leakage in flux tunable two-qubit gates. We show that $\Phi$-DRAG differs fundamentally from conventional microwave implementations and derive modified flux modulation protocols that suppress leakage below $10^{-4}$ for fast entangling gates. The method remains effective across a range of asymmetry between qubit anharmonicities and different circuit parameters, enabling high-fidelity two-qubit gates within the fifteen nanosecond range.

16.
medRxiv (Medicine) 2026-06-16

Fidelity-Derived Quantum Dissimilarity-Enhanced k-Nearest Neighbor Algorithm for Arterial Hypertension Prediction

We present a quantum-enhanced version of the classic k-Nearest Neighbors (kNN) classification algorithm, applied to the prediction of arterial hypertension. The traditional Euclidean distance metric of the kNN algorithm is replaced with a Fidelity-derived quantum dissimilarity measure to evaluate the similarity between data samples. We map classical real-world clinical and ECG-derived data features into quantum states via the Dense-Angle Encoding, which efficiently utilizes parameterized rotation gates to pack multiple features into minimal qubits while maintaining pure states. We evaluate the performance of the dissimilarity measure using both the noiseless state vector Simulator and the IBM Qiskit Estimator primitives. The quantum circuit demonstrates robust predictive capabilities comparable to the classical model. While it does not claim computational supremacy over the classical baseline, the framework proves that fidelity-based similarity is a physically meaningful and efficient approach for hybrid quantum classical classification.

17.
medRxiv (Medicine) 2026-06-23

Differential Recovery Trajectories of Emergency Otolaryngologic Conditions across the COVID-19 Pandemic: A Six-year Longitudinal Study from an Urban Emergency Center

Authors:

Objective: The COVID-19 pandemic markedly altered social activity patterns, healthcare utilization, and the epidemiology of infectious diseases. However, its long-term impact on emergency otolaryngologic conditions remains incompletely understood. This study investigated long-term trends in emergency otolaryngologic conditions before, during, and after the COVID-19 pandemic using comprehensive data from a large urban emergency clinic in Osaka, Japan. Methods: All new otolaryngologic outpatients who visited the Chuo Emergency Medical Clinic (CEMC) in Osaka City between 2019 and 2024were retrospectively analyzed. Annual trends in absolute numbers and relative proportions of emergency otolaryngologic conditions were examined by anatomical region and disease category, using 2019 as the pre-pandemic baseline. Results: A total of 99,324 new otolaryngologic outpatients were analyzed. Overall emergency visits declined sharply to approximately half of baseline in 2020, followed by a gradual but incomplete recovery toward pre-pandemic levels by 2024. Most anatomical categories declined to 45-61% of baseline in 2020 and exhibited gradual yet incomplete recovery through 2023; in stark contrast, laryngeal conditions diverged sharply, surging beyond pre-pandemic levels after 2022. Acute infectious otorhinolaryngologic diseases fell to 23-50% of baseline in 2020 and showed variable recovery (69-103%) by 2024. Notably, laryngitis exceeded the baseline, reaching 132% in 2023, whereas epiglottic edema exhibited only a transient increase approaching the baseline in 2021. Non-infectious emergency conditions generally showed only a marginal decrease in 2020 and remained relatively stable throughout the study period, except for sudden sensorineural hearing loss (SSNHL), which dropped sharply to 39% of the baseline in 2020 and remained persistently reduced through 2024. Traumatic emergencies declined variably to 53-81% of the baseline in 2020, followed by an incomplete recovery, reaching only 55-69% by 2024. Conclusion: Emergency otolaryngologic conditions demonstrated heterogeneous recovery trajectories following the COVID-19 pandemic. While most infectious and traumatic conditions gradually but incompletely normalized, laryngeal conditions showed a distinct post-pandemic surge, and SSNHL remained persistently suppressed. These findings reveal heterogeneous, condition-specific recovery trajectories that reflect both genuine shifts in community pathogen burden, true traumatic incidence, and persistent alterations in healthcare-seeking behaviors, insights essential for resource allocation during future public health emergencies.

18.
bioRxiv (Bioinfo) 2026-06-16

PhenoBIC: operator-free single-cell spatial phenotyping in multiplex imaging data using deep learning of cell staining patterns

Multiplex imaging is a valuable tool for spatially examining tissue microenvironments at the single-cell level to uncover biological and clinical insights. However, most multiplex image analysis workflows currently require manual intervention for cell phenotyping, which slows progress, demands human effort, and yields operator-dependent outputs. Here, we developed PhenoBIC, a pre-trained deep learning model for image classification of the multiplexed biomarker signals in a cell (Biomarker Imprint of a Cell) to classify cell phenotypes. We show that PhenoBIC (F1-score ~0.88) outperforms manual gating (widely used) and other machine learning-based computational approaches for cell marker expression classification. We validated this across multiple biomarkers, tissue sampling strategies (whole biopsies and tissue microarrays), multiplex panels, imaging platforms, and tissue types. We have released our in-house training and validation datasets of ~1.4 million manually curated cell expression ground truth labels. We have also open-sourced PhenoBIC and enabled its community-wide deployment via the QuPath interface.

19.
arXiv (CS.AI) 2026-06-19

A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models

arXiv:2606.19868v1 Announce Type: new Abstract: Although large language models (LLMs) have shown strong capabilities across a wide range of tasks, their outputs often remain unreliable and may contain hallucinations, making uncertainty estimation (UE) essential for building trustworthy LLMs. In practice, many mainstream LLMs are only accessible through restricted APIs, where internal signals such as logits and hidden states are unavailable, making black-box UE especially important. However, existing work on black-box UE for LLMs remains fragmented in methodology and lacks a unified empirical comparison. To address this gap, we present a systematic review of black-box UE methods and organize them into five categories: verbalization-based, sampling-based, explanation-based, multi-agent, and hybrid methods. We further build a unified evaluation framework and benchmark 24 representative methods across 4 models and 4 dataset settings. Our results show that no single method consistently dominates across all settings. Nevertheless, methods that reason over and compare candidates in the answer space are generally effective, and hybrid methods that combine multiple uncertainty signals perform well under most conditions. By releasing the benchmark data and a unified evaluation framework, we aim to facilitate reproducible comparisons and support future research, while our empirical findings provide practical guidance for developing future black-box UE methods for LLMs.

20.
arXiv (CS.CL) 2026-06-12

MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To do so, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria including identifying moral considerations, weighing trade-offs, and giving actionable recommendations to cover cases on AI advising humans moral decisions as well as making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.

21.
arXiv (CS.CL) 2026-06-15

GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge

Large language model (LLM) reasoning is ephemeral: chains of thought vanish with the context window, pruned search branches leave no record, and memory buffers cannot be diffed, merged, or audited. Every other complex software process (code, infrastructure, data, experiments) is version-controlled; reasoning is not. We introduce GitOfThoughts, which stores an agent's reasoning tree as a git repository: every scored thought is a commit, scores are notes, outcomes are tags, and retrieval is "git log" over the agent's own history. This makes reasoning replayable, auditable, and mergeable across agents at near-zero engineering cost. We then ask the harder question: does memory, in any substrate, actually improve accuracy? Across five substrates (none, markdown, vector, graph, git), two benchmarks, two model scales, and pre-registered replications, the answer for novel problems is no. No memory format reliably helps, and a promising early result collapsed under its own pre-registered replication. Memory pays only above what we call the copyability threshold: when the retrieved case is a near-duplicate of the current problem (similarity >~ 0.8), accuracy jumps sharply; below it, nothing. The gain is answer retrieval, not method transfer: a 4.5x larger model doubles the near-duplicate payoff yet still cannot extract a transferable method from a worked example. The only general lever we find is test-time sampling. The case for git-as-substrate is therefore auditability, provenance, and mergeability at accuracy parity. We document a retracted result and a refuted hypothesis to model the evaluation standard we hold ourselves to.

22.
arXiv (quant-ph) 2026-06-19

Effects of interaction range on the mean-field dynamics of Bose polarons

arXiv:2606.20020v1 Announce Type: cross Abstract: We consider the three-dimensional Bose polaron problem in the regime of finite range interactions and competing length scales. Working in the reference frame of the impurity, we study both static and out of equilibrium properties of the system, in particular the transfer of momentum between the impurity and the host gas. We find that relaxation dynamics can occur via damped oscillations of the impurity velocity with simple dependence on the interaction strength. Furthermore, the equilibration process is sensitive to the type of the impurity-bath interaction. Specifically, interatomic forces describing ion-atom systems lead to much longer timescales and more pronounced oscillations in the strong coupling regime with respect to local interaction potentials. We also find that the effective masses can differ by a large amount between the two scenarios, even if the number of atoms in the polaron cloud remains similar for both cases.

23.
arXiv (CS.CL) 2026-06-11

BioDivergence: A Benchmark and Evaluation Framework for Hidden Contextual Contradictions in Biomedical Abstracts

Biomedical findings often seem to conflict across studies, but many of these differences are context-dependent rather than true contradictions. Variations in cohort, geography, assay protocol, disease subtype, and clinical setting can make both claims locally valid. Existing NLI and scientific claim-verification benchmarks reduce such cases to entailment, contradiction, or neutral, failing to capture the contextual structure behind divergence. To address this, we introduce BioDivergence, an evaluation framework with a six-class conflict taxonomy, a 13-axis divergence ontology, and four structured outputs per claim pair: conflict type, divergence axes, dominant confounder, and reconciliation explanation. We release BioDivergence-Silver-v1.0, an article-disjoint silver benchmark of 11,865 claim pairs across five biomedical domains, alongside a legacy deduplicated variant for comparison. Results show notable ranking differences between the two variants, with the fine-tuned reference model dropping about 12 points under the article-disjoint setting, while Mistral-7B-Instruct-v0.3 achieves 0.5523 accuracy and 0.3894 contextual-F1 on the 842-example primary test set. BioDivergence offers a more faithful way to distinguish contextual divergence from direct contradiction and to separate article-level memorization from genuine task learning.

24.
arXiv (CS.LG) 2026-06-15

On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization

arXiv:2605.04954v2 Announce Type: replace-cross Abstract: Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

25.
arXiv (CS.LG) 2026-06-16

Coercivity and Local Convergence of Physical Learning in Linear Circuits

arXiv:2606.15443v1 Announce Type: cross Abstract: Physical learning methods train physical networks to perform computational tasks using only local update rules, exploiting the physics of the system to handle the global transfer of information. We provide the first local convergence analysis of three such methods – Equilibrium Propagation (EP), Coupled Learning (CL), and a new method we call Adjoint Coupled Learning (AL) – for linear circuits, in the limit of small-nudging for both discrete and continuous time. EP and AL perform gradient descent on a natural loss function, while CL follows modified dynamics with an additional cubic correction. Assuming the existence of a solution, we identify a coercivity condition, expressed as a rank condition on a matrix built from the network's incidence structure, under which the training loss decays exponentially and the parameters converge to the solution manifold. We show that coercivity can fail by exhibiting a kite circuit in which a symmetry causes the coercivity constant to degenerate on the solution manifold, but prove using Sard's theorem that such degeneracies are non-generic: coercivity holds at every point of the solution manifold for almost every choice of desired output.