Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

arXiv:2603.02668v2 Announce Type: replace Abstract: We present SorryDB, a dynamically-updating benchmark of open Lean tasks drawn from 78 real world formalization projects on GitHub. Unlike existing static benchmarks, often composed of competition problems, hillclimbing the SorryDB benchmark will yield tools that are aligned to the community needs, more usable by mathematicians, and more capable of understanding complex dependencies. Moreover, by providing a continuously updated stream of tasks, SorryDB mitigates test-set contamination and offers a robust metric for an agent's ability to contribute to novel formal mathematics projects. We evaluate a collection of approaches, including generalist large language models, agentic approaches, and specialized symbolic provers, over a selected snapshot of 1000 tasks from SorryDB. We show that current approaches are complementary: even though an agentic approach based on Gemini Flash is the most performant, it is not strictly better than other off-the-shelf large-language models, specialized provers, or even a curated list of Lean tactics.

02.
arXiv (math.PR) 2026-06-19

Establishing an $\Omega(\sqrt{d})$ complexity lower bound for PDMP samplers and how to break it: a sub-$\sqrt{d}$ algorithm for Gaussian-tailed targets

arXiv:2606.19909v1 Announce Type: cross Abstract: Despite the theoretical appeal of their non-reversibility, to date, no Piecewise Deterministic Markov Process (PDMP) samplers have been developed that scale better than $\mathcal{O}(\sqrt{d})$ in computational complexity with respect to the target dimension $d$. We prove that this is a fundamental limitation by establishing an $\Omega(\sqrt{d})$ lower bound on the algorithmic complexity of PDMP samplers in a standard setup. By relaxing the assumption that the target density must remain invariant at all continuous times, we then demonstrate how to bypass this barrier. Specifically, we introduce a novel PDMP sampling scheme and show that it achieves an empirical complexity of $\mathcal{O}(d^\alpha)$, where $\alpha \in [0.2, 0.3]$ for Gaussian-tailed targets. In addition, this PDMP scheme is locally adaptive in both trajectory length and distance between velocity updates.

03.
arXiv (CS.CV) 2026-06-17

LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams

Despite the remarkable progress of Video Large Language Models (Video-LLMs), current online architectures still struggle to simultaneously process continuous video streams, decide autonomously when to respond, and preserve long-horizon contextual memory. These obstacles undermine real-time responsiveness and cause severe forgetting throughout prolonged interactions. In this work, we introduce LiveStarPro, a live streaming assistant that is designed for proactive video understanding over long-horizon streams. The design of LiveStarPro rests on three complementary components. The first component is Streaming Verification Decoding (SVeD), an inference framework that identifies the appropriate response timing through single-pass perplexity verification, thereby eliminating the dependency on explicit silence tokens. The second component is Streaming Causal Attention Masks (SCAM), a training strategy that enforces incremental video-language alignment over variable-length streams. The third component is Tree-Structured Hierarchical Memory (TSHM), a recursive memory architecture that organizes evicted historical information into event chains and consequently enables efficient retrieval from effectively unbounded video streams. To facilitate a comprehensive evaluation under realistic online conditions, we further present OmniStarPro, a large-scale benchmark that spans 15 diverse real-world scenarios and that extends to hour-scale streams for the assessment of long-term recall. Extensive experiments demonstrate that LiveStarPro consistently surpasses existing methods, attaining a 28.9% improvement in semantic correctness and an 18.2% reduction in timing error, while its streaming key-value cache further yields a 1.58x inference speedup over the same model without caching. The model and the code are publicly available at https://github.com/sotayang/LiveStarPro.

04.
medRxiv (Medicine) 2026-06-23

What Is the Optimal Timing and Frequency of Workload-Matched Postprandial Physical Activity Breaks? A Randomized Controlled Crossover Study of Cardiometabolic and Cognitive Responses During Sedentary Behavior

Purpose Postprandial sedentary behavior is associated with negative health effects and constitutes a large part of daily life in modern society. This study investigated how the timing of physical activity after eating influences glucose levels, cerebral and muscle oxygenation, cognitive performance, and well-being during subsequent sitting. Methods In a four-armed randomized crossover trial, healthy adults consumed four standardized meals separated by 48-hour washout periods. Each meal was followed by 2 hours of sitting combined, in random order, with one of four interventions: (1) sitting only, (2) 15 minutes of moderate intensity cycling immediately after eating, (3) 15 minutes of cycling 20 minutes after eating, or (4) three workload-matched five-minute cycling bouts during sitting. Interstitial glucose (continuous glucose monitoring), cerebral and muscle oxygenation (Functional near infrared spectroscopy), cognitive performance (Stroop test), heart rate, blood pressure, and subjective ratings were assessed every 30 minutes. Data were analyzed using repeated-measures ANOVA. Results Twenty participants (mean age 27.1{+/-}10.3 years, 12 females) completed the study. Cycling immediately after eating reduced mean glucose levels during postprandial sitting, while both 15-minute cycling bouts increased cerebral oxygenation. All active conditions enhanced muscle oxygenation. Heart rate and arousal increased with delayed cycling and active breaks. No effects were observed for blood pressure, cognitive performance, focus, or well-being. Conclusion A short bout of physical activity immediately after eating reduces postprandial hyperglycemia and improves brain oxygenation during sitting, whereas delayed activity and brief breaks increase physiological activation without cognitive or perceptual benefits.

05.
arXiv (CS.CV) 2026-06-17

MeiBRD: Meta-Learning Intraoperative Biomechanical Residual Deformation

Accurate intraoperative liver registration is challenging due to substantial soft-tissue deformation yet sparse intraoperative measurements. Biomechanical models regularize this ill-posedness with prior knowledge but exhibit persistent prediction bias due to simplifying assumptions, while data-driven learning solutions struggle with data efficiency, generalization, and physical plausibility. We propose a hybrid registration framework that adapts a biomechanical prior using sparse intraoperative correspondences. Rather than learning a full deformation field, we learn a residual deformation function that corrects linear biomechanical predictions, modeled as a graph neural diffusion function with geometry-aware attention over the 3D liver mesh. To enable long-range information transfer of sparse observations, we take a novel perspective of sparse intraoperative measurements as context samples where input-output pairs of the residual deformation function are fully observed, casting the problem into learning-to-learn this residual function from intraoperative context samples with feedforward meta-learners. Experiments on a deformable liver phantom dataset demonstrate improved registration accuracy and generalization compared to rigid, biomechanical, and data-driven baselines, particularly for out-of-distribution geometries and deformations.

06.
arXiv (CS.AI) 2026-06-15

Robustness without Wrinkles: Parallel Simulation and Robust MPC for Certified Deformable Manipulation

arXiv:2606.14188v1 Announce Type: cross Abstract: We present CORD-SLS, a real-time control method for safe deformable object manipulation, with a focus on ropes and cloth. At its core is a GPU-parallel differentiable simulator with contact smoothing which enables efficient gradient-based planning through intermittent contact. To robustly satisfy constraints under model and sensing uncertainty, we develop a real-time, GPU-parallel output-feedback robust model predictive control (MPC) algorithm that plans with this simulator. We further show that the simulator accelerates model-based RL for training neural manipulation policies. To improve real-world robustness, we use conformal prediction to calibrate visual-feedback and perception-error bounds for MPC, producing reachable tubes that enable high-probability safe control. We evaluate CORD-SLS on high-dimensional, contact-rich rope and cloth manipulation tasks in simulation and hardware, including obstacle avoidance, routing, folding, and smoothing. Across settings, CORD-SLS achieves millisecond-speed planning, exceeding baselines in safety, speed, and task success.

07.
arXiv (CS.CL) 2026-06-18

LLM Compression by Block Removal with Constrained Binary Optimization

In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks (``block removal'') as a constrained binary optimization (CBO) problem that can be mapped to a physical system (Ising glass), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations yielding many high-quality, non-trivial solutions beyond those only removing consecutive regions. Our method performs strongly in the deep compression regime, such as for 50% compression of Llama-3.3-70B-Instruct, where we achieve an almost 23 percentage point increase on the MMLU benchmark compared to other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.1-8B-Instruct, Qwen3-14B (both before and after retraining), as well as Llama-3.3-70B-Instruct. The approach is computationally efficient and requires only forward and backward passes on a calibration dataset for a few active parameters. Additionally, we demonstrate that using good heuristic solvers for the CBO problem provides solutions that perform well on downstream tasks in negligible runtime when it is unfeasible to solve the problem exactly. The method can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure, and where we outperform SOTA for AIME25 and GPQA when removing either 2 attention layers or 3 mixture-of-experts layers.

08.
arXiv (CS.AI) 2026-06-17

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv:2606.17399v1 Announce Type: cross Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.

09.
arXiv (CS.AI) 2026-06-19

Temporal Self-Imitation Learning

arXiv:2606.19752v1 Announce Type: cross Abstract: Long-horizon robot manipulation policies trained with reward shaping can still exploit dense rewards through inefficient interaction, while rare efficient behaviors may be forgotten during training. We argue that temporal efficiency itself provides a powerful and underutilized source of self-supervision for reinforcement learning. We introduce Temporal Self-Imitation Learning (TSIL), a reinforcement learning framework that mines temporally efficient successful trajectories generated during learning and converts them into reusable supervision for future policy improvement. TSIL progressively refines learning using configuration-conditioned adaptive temporal targets derived from fast successful trajectories, while preserving and replaying efficient behaviors through efficiency-weighted self-imitation learning. Across 15 distinct long-horizon manipulation tasks, TSIL consistently improves learning efficiency, task-completion efficiency, revisitation of fast successful behaviors, and robustness to unstable training conditions. More broadly, our results suggest that the temporal structure of successful behavior itself provides a scalable self-supervisory signal for reinforcement learning beyond manually engineered reward shaping alone.

10.
arXiv (quant-ph) 2026-06-11

Fermions are fundamentally more nonlocal than Bosons

arXiv:2606.12363v1 Announce Type: new Abstract: Bell's theorem shows that entangled quantum particles can exhibit correlations that classical particles cannot reproduce without an additional nonlocal resource, such as communication. In this sense, quantum particles are fundamentally more nonlocal than classical ones, and entanglement becomes unavoidable in physics. Here we prove the analogous result within quantum theory itself: indistinguishable fermions transmitted through a quantum network can generate correlations that distinguishable particles or indistinguishable bosons cannot reproduce without additional communication. In the same sense, fermions are fundamentally more nonlocal than bosons or distinguishable particles, motivating fermionic anticommutation and indistinguishability as unavoidable operational resources. Our result further implies that fermions can strictly surpass all qubit-based protocols for certain distributed computing tasks, demonstrating that a complete understanding of information processing requires going beyond qubits to fermionic information carriers - febits.

11.
arXiv (CS.LG) 2026-06-17

AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning

arXiv:2505.03509v3 Announce Type: replace Abstract: Anomaly detection in large datasets is essential in astronomy and computer vision. However, due to a scarcity of labelled data, it is often infeasible to apply supervised methods to anomaly detection. We present AnomalyMatch, an anomaly detection framework combining the semi-supervised FixMatch algorithm using EfficientNet classifiers with active learning. AnomalyMatch is tailored for large-scale applications and integrated into the ESA Datalabs science platform. In this method, we treat anomaly detection as a binary classification problem and efficiently utilise limited labelled and abundant unlabelled images for training. We enable active learning via a user interface for verification of high-confidence anomalies and correction of false positives. Evaluations on the GalaxyMNIST astronomical dataset and the miniImageNet natural-image benchmark under severe class imbalance display strong performance. Starting from five to ten labelled anomalies, we achieve an average AUROC of 0.96 (miniImageNet) and 0.89 (GalaxyMNIST), with respective AUPRC of 0.82 and 0.77. After three active learning cycles, anomalies are ranked with 76% (miniImageNet) to 94% (GalaxyMNIST) precision in the top 1% of the highest-ranking images by score. We compare to the established Astronomaly software on selected 'odd' galaxies from the 'Galaxy Zoo- The Galaxy Challenge' dataset, achieving comparable performance with an average AUROC of 0.83. Our results underscore the exceptional utility and scalability of this approach for anomaly discovery, highlighting the value of specialised approaches for domains characterised by severe label scarcity

12.
arXiv (CS.LG) 2026-06-16

Stochastic trace estimation with tensor train random vectors

arXiv:2606.15679v1 Announce Type: cross Abstract: Stochastic trace estimation is a standard tool for approximating the trace of a large-scale matrix available only through matrix-vector products. However, in tensor-structured settings, unstructured Gaussian or Rademacher test vectors may be prohibitively expensive to store and compute with, while cheaper rank-one tensor-product vectors can require sample complexities that grow exponentially with the tensor order. This work studies Gaussian random tensor train vectors as a structured alternative for stochastic trace estimation. We show that, with a suitable choice of the tensor train rank, random tensor train vectors recover dimension-independent guarantees for the Girard–Hutchinson estimator. In particular, a median-of-means variant with tensor train rank $r \geq d-1$ achieves the same dependence on the accuracy $\varepsilon$ and failure probability $\delta$ as the classical estimator based on unstructured Gaussian vectors. We further prove an oblivious subspace injection result for sketches formed from independent Gaussian random tensor train vectors: tensor train rank $r\geq d-1$ and $\mathcal{O}(\varepsilon^{-2}(k+\log(1/\delta)))$ samples suffice for a $k$-dimensional target subspace. Finally, we investigate the use of such sketches within the Nystr\"{o}m++ framework. We show that the resulting estimator can achieve the desired $\mathcal{O}(\varepsilon^{-1})$ sample complexity under an additional spectral-tail condition. These results provide clarififcation on both the potential and the limitations of random tensor train vectors in stochastic trace estimation.

13.
medRxiv (Medicine) 2026-06-10

Developing a Unified Criminal Justice Pathway into Drug and Alcohol Treatment from Police Custody: A Public Health Service Evaluation and Pathway-Design Project in Blackpool, United Kingdom

Introduction: Blackpool, England's most deprived local authority, has the highest drug-related death rate in the country. People in police custody with problem substance use are a key Core20PLUS5 inclusion-health group, yet referral from the police into structured drug and alcohol treatment is fragmented and relies heavily on self-report. We evaluated the current police-to-treatment route in Blackpool and designed an evidence-informed unified pathway. Materials and Methods: A mixed-methods service evaluation and pathway-design project was conducted during a six-month General Practice / Public Health rotation. Routinely collected referral data from Horizon (the local specialist drug and alcohol service) covering the 47-month period from December 2019 to October 2023 were analysed. Findings were triangulated with national policy, the Project ADDER and Liaison and Diversion evaluations, and the international evidence on police-led pre-arrest diversion. Results: Of 5,900 total referrals into Horizon over 47 months, only 269 (4.56%) originated from the police. Police referrals accounted for fewer than 5% of monthly referrals in 30 of 47 months, for 5 to 9.9% in 16 months, and for >/= 10% in only one month (10.8%, December 2022). Blackpool recorded 76 drug-misuse deaths in 2019-21 (19.4 per 100,000, approximately four times the England rate). A six-step unified pathway is proposed: Initiate Referral (opt-out, from ADDER Police and Liaison and Diversion); Initial Assessment; Tailored Treatment Plan; Continuous Support; Collaboration and Monitoring; and Evaluation and Adjustment. Conclusions: Police contact is markedly under-used as a gateway to treatment despite Blackpool having the highest drug-related mortality in England. An opt-out, multi-agency pathway anchored in Core20PLUS5 has the potential to narrow the treatment gap, reduce re-offending, and address the structural health inequalities that drive premature mortality.

14.
arXiv (CS.CV) 2026-06-17

Contactless Respiratory Monitoring on Heterogeneous Mobile Robots: A Multimodal Edge-Computing Framework

Respiratory-rate (RR) monitoring is a critical component of remote triage and victim assessment in emergency response, disaster recovery, and infectious-disease scenarios, where minimizing physical contact can reduce responder risk and improve operational safety. However, field deployment of contactless RR monitoring remains challenging due to variable illumination, posture changes, platform heterogeneity, and the impracticality of wearable sensors in hazardous environments. In this paper, we present a modality-adaptive contactless RR monitoring framework for heterogeneous mobile robots with onboard edge computing. The proposed system combines brightness-adaptive sensor selection across RGB, thermal, near-infrared (NIR), and low-light cameras, keypoint-guided chest ROI extraction for posture-robust monitoring, and a signal-quality-index (SQI)-based filtering mechanism for reliable respiratory estimation. We implement and evaluate the framework on three robotic platforms spanning quadruped and wheeled locomotion and multiple edge-computing architectures. Experiments conducted across diverse lighting conditions, subject poses, and robot-to-subject distances demonstrate that the framework generalizes across platforms without per-platform algorithmic retuning, while revealing modality-specific operational boundaries. RGB provides the broadest coverage up to 8m, NIR remains effective up to 6m, thermal is reliable only at short range, and low-light sensing supports monitoring in complete darkness up to 8m. Overall, the results demonstrate the feasibility of multimodal contactless RR monitoring on mobile robots and support its use as a foundation for autonomous triage and victim assessment in hazardous search-and-rescue settings.

15.
arXiv (math.PR) 2026-06-17

On Injectivity of Phase Retrieval

Authors:

arXiv:2606.17922v1 Announce Type: cross Abstract: In this short note, we prove that if $A \in \mathbb C^{N \times M}$ with $N=4M-5$ has i.i.d.\ standard complex Gaussian entries, then the probability that the phase retrieval map generated by $A$ is not injective is positive. This proves Part (1) of a conjecture of Cynthia Vinzant, which was later restated by Afonso S. Bandeira in [BDL+26]. The main result of this paper was obtained using generative AI, in particular the Rethlas system.

16.
arXiv (CS.CV) 2026-06-15

QualiaNet: An Experience-Before-Inference Network

Authors:

Human 3D vision involves two distinct stages: an Experience Module, where stereo depth is extracted relative to fixation, and an Inference Module, where this experience is interpreted to estimate 3D scene properties. Paradoxically, although stereo vision does not provide us with absolute distance information, it nonetheless affects our inferences about distance. We propose the Inference Module exploits a natural scene statistic: near scenes produce vivid disparity gradients, while far scenes appear comparatively flat. QualiaNet implements this two-stage architecture computationally: disparity maps simulating human stereo experience are passed to a CNN trained to estimate distance. The network can recover distance from disparity gradients alone, validating this approach.

17.
arXiv (CS.LG) 2026-06-17

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

arXiv:2605.29669v2 Announce Type: replace-cross Abstract: Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). Such equivalents make theoretical predictions tractable by reducing a complex model to a simpler one with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful for classification of high-dimensional nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a one-layer feedforward NN, under a canonical nonlinearly separable dataset for the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent of the CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. We identify regimes in which these knobs move the CK beyond the linear equivalent and produce BBP-type transitions to label-aligned outlier eigenspaces. Our analysis helps bring deterministic-equivalence tools from RMT to bear on problems of practical relevance in ML.

18.
medRxiv (Medicine) 2026-06-22

A Plasmodium vivax controlled human infection and transmission model to evaluate interventions across the life cycle

Background Plasmodium vivax is an underappreciated cause of malaria disease burden. No reproducible and standardized full life-cycle controlled human malaria infection (CHMI) model to accelerate development of novel interventions is available. Methods This transmission-CHMI trial was conducted in Nijmegen, Netherlands. Healthy, malaria-naive adults were sequentially enrolled into three cohorts of four and inoculated with the asexual blood-stage isolate PvW1. Primary endpoint was proportion of oocyst-positive laboratory-reared Anopheles stephensi mosquitoes. The sequential design allowed for adaptations between cohorts. At parasitemia >10 parasites/microL or symptom onset, participants received oral gametocyte-sparing treatment (GST): mepacrine (Cohort 1 and 3; 100 mg at 0, 8 16 hours, then once daily for 3 days) or piperaquine (Cohort 3; 480 mg single-dose). Transmission was assessed by direct skin feeding (DSF) and membrane feeding assay (DMFA) with and without enrichment of gametocytes. End-of-study treatment was atovaquone-proguanil (1000/400 mg once daily for 3 days). The trial was registered: NL-OMON57011. Findings Participants were enrolled between September 17, 2024 and March 25, 2025, all (12/12) developed parasitemia and transmitted PvW1 to mosquitoes. No serious adverse events occurred. Most adverse reactions were related to malaria. Mepacrine and piperaquine reduced asexual parasitemia while preserving gametocytemia and transmission. Peak transmission occurred within 3 days after GST and depended on the parasite developmental cycle, with highest gametocyte-infectivity ~48 h post ring-stage. In Cohort 3, mosquito infection reached 100% in all transmission assays. Median peak oocyst counts were 24 (IQR: 14-31) for DSF, 17 (12-19) for DMFA, and 150 (116-199) for enriched DMFA. A two-fold increase in pre-GST maximal parasitemia was associated with 20 additional oocysts (95% CI 8,6-32) in enriched DMFA. Sporozoites were viable in primary human hepatocytes. Interpretation A PvW1 transmission-CHMI is reproducible and safe, enabling P. vivax sporozoite production, relapse models and evaluation of transmission-blocking interventions.

19.
Nature (Science) 2026-06-10

A first-in-class pulsatile FXR agonist for bile-acid-related liver diseases

Authors:

Nuclear receptors are central regulators of metabolism1, yet therapeutic strategies that enforce continuous receptor activation frequently lead to reduced efficacy and unacceptable toxicity. Here we report a first-principles drug design strategy that aligns pharmacokinetics with physiological signalling cycles. We developed linafexor, a potent non-bile-acid agonist of the farnesoid X receptor (FXR)2; it is engineered for rapid systemic clearance, which enables pulsatile receptor activation that mirrors endogenous bile acid dynamics3–5. Linafexor has robust efficacy across multiple preclinical models of metabolic dysfunction-associated steatohepatitis6, liver fibrosis7, primary biliary cholangitis and primary sclerosing cholangitis8,9. Transcriptomic analyses reveal that, unlike long-acting FXR agonists10,11, linafexor preserves cyclic FXR signalling, avoids receptor downregulation and prevents broad transcriptional dysregulation. Direct manipulation of delivery patterns demonstrates that sustained FXR activation—independent of compound identity—induces severe toxicity, establishing activation duration as a determinant of therapeutic index. In phase 1 clinical studies (ClinicalTrials.gov; NCT05082779), linafexor administered once daily produces transient FXR pathway engagement, marked by (1) induction of FGF1912–14, a key endocrine mediator of bile acid feedback regulation; and (2) suppression of C415, an intermediate reflecting hepatic bile acid synthesis, with no treatment-related adverse events. Together, these findings identify pulsatile FXR activation as a mechanistically grounded and clinically translatable strategy, and establish linafexor as a first-in-class therapeutic for bile acid–related liver diseases. Linafexor is a rapidly cleared FXR agonist designed to mimic natural bile acid signalling, achieving transient receptor activation with strong efficacy and reduced toxicity in preclinical and early clinical studies.

20.
arXiv (CS.CV) 2026-06-18

Clinically Aligned Geometry Constraints for Robust IVUS Vessel Boundary Segmentation

Intravascular ultrasound (IVUS) lumen and external elastic membrane (EEM) segmentation is important for quantitative coronary plaque burden assessment. Errors in lumen or EEM delineation directly propagate to plaque area, plaque burden and geometric measurements. However, standard methods prioritising overlap scores often suffer from boundary drift and topology errors, leading to inaccurate clinical measurements. We present GeoCat, a geometry-consistent network that processes 5-frame IVUS clips using dual Cartesian-polar encoders with cross-domain attention and temporal fusion. A differentiable geometry consistency loss directly supervises clinically relevant descriptors including diameters, orientations, and cross-sectional areas. The model is trained on 12,242 annotated frames from 146 patients acquired with two commercial IVUS systems. We evaluate performance using both segmentation accuracy and plaque-relevant clinical metrics, including Dice/IoU, boundary measures(95HD (mm), ASSD), topology violation rate, and clinical geometry errors (dmax/dmin, angles, and areas). On our dataset, GeoCat achieves a Dice of 0.93, reduces 95HD to 0.14 mm, and lowers topology violations to 1.0%. Importantly, it significantly improves geometric fidelity, yielding diameter errors of 0.13-0.16 mm and angular errors of ~8 degrees, supporting reliable plaque burden quantification.

21.
arXiv (CS.AI) 2026-06-16

Integrating Reasoning and Generalization in Text-to-SQL via Self-Enhanced Fine-Tuning

arXiv:2606.15598v1 Announce Type: new Abstract: Text-to-SQL aims to translate natural language questions into executable SQL queries over structured databases, enabling non-expert users to access data intuitively. While recent advances in large language models (LLMs) have shown promise in this task, existing LLM-based approaches often struggle to strike a balance between strong reasoning capabilities and robust generalization. To address these limitations, we propose CoTE-SQL to enhance the LLM-based text-to-SQL generation with three key innovations: (i) self-enhanced reasoning traces distilled from LLMs without human annotation, (ii) structured chain-of-thought (CoT) prompting with modular decomposition and examples retrieval, and (iii) error-aware revision based on SQL execution feedback. Extensive experiments on the Spider and Bird benchmarks demonstrate that CoTE-SQL achieves new state-of-the-art performance among methods built on open-source LLMs with comparable model sizes on Bird (53.39% EX / 59.02 VES) and strong results on Spider (79.60% EX / 77.19 VES), with especially significant gains on complex queries. Results highlight the effectiveness of combining self-enhancement, structured reasoning, and execution-time feedback within an LLM-based framework for text-to-SQL design.

22.
arXiv (CS.AI) 2026-06-15

Learning What to Predict: Downstream-Guided Task Design for Continued Pretraining

arXiv:2601.22108v2 Announce Type: replace-cross Abstract: Continued pretraining is optimized with fixed self-supervised tasks but selected by downstream performance, creating a coarse feedback loop in which practitioners evaluate checkpoints, change data mixtures or objectives, and restart runs, while individual updates remain blind to target capabilities. We ask whether a small set of verifiable downstream examples can provide step-level feedback without directly supervising the learner. We introduce V-pretraining, which decouples a learner trained only with a self-supervised loss from a lightweight task designer that constructs targets or views for unlabeled batches. Given the current learner and batch, V-pretraining scores a candidate construction by predicting the first-order reduction in downstream loss after the induced self-supervised update. The designer maximizes this value; the learner then applies the update with targets or views detached, so downstream labels never update learner parameters. We instantiate V-pretraining as adaptive top-K soft targets for language modeling and learned views or masks for self-supervised vision. Across both modalities, V-pretraining improves target capabilities without degrading generalization. Under wall-clock-matched continued pretraining, it improves GSM8K Pass@1 for Qwen models using 1,024 GSM8K examples only as feedback, including a +7.4 point single-run gain for Qwen2.5-0.5B. In vision, it improves DINOv3 transfer to ADE20K semantic segmentation and NYUv2 depth estimation while preserving ImageNet linear accuracy, suggesting that feedback-guided task construction can improve target capabilities without collapsing general-purpose representations.

23.
arXiv (CS.LG) 2026-06-24

Extended pseudo-spectral physics-informed neural networks for phase-field models

arXiv:2606.24660v1 Announce Type: cross Abstract: Phase-field models play a central role in the continuum description of phase separation, in which the bulk free-energy density and the interfacial thickness parameter determine pattern formation and microstructural evolution. In practice, these constitutive quantities are rarely known a priori and must be inferred from limited dynamical observations. In this work, an extended pseudo-spectral physics-informed neural network (ESPINN) framework is developed for the inverse identification of phase-field models from transient snapshot data. It enables the simultaneous recovery of both the bulk chemical potential and unknown gradient coefficients. Numerical experiments on the one-dimensional Cahn-Hilliard equation demonstrate accurate and statistically stable reconstruction in the noiseless regime, with substantial constitutive information recoverable from even a single snapshot pair. In the presence of noise, reconstruction accuracy degrades gracefully, and increasing the number of snapshots improves robustness by reducing variance across runs. These results establish ESPINN as a data-efficient and physically consistent approach for learning free-energy structure in continuum models of phase separation.

24.
arXiv (CS.AI) 2026-06-19

GDGU: A Gradient Difference-based Graph Unlearning Method for Cyberattack Localization in Electric Vehicle Charging Networks

arXiv:2606.19566v1 Announce Type: cross Abstract: Electric vehicle charging stations (EVCSs) can expose distribution feeders to cyberattacks. While machine learning methods, including graph neural networks, can localize which bus is compromised, significant challenges remain in data sharing and model training. For example, privacy regulations grant EVCS owners the right to delete their training data from a deployed model, yet retraining from scratch on every request is computationally prohibitive. To address this, we study graph unlearning (GU) for EVCS cyberattack localization, formulated as a feature-level unlearning problem on a graph-level multi-label classification task. Specifically, we propose gradient difference-based graph unlearning (GDGU), which removes the influence of the requested deletion data through a first-order parameter correction. The correction is computed from the gradient difference between the original training data and a modified dataset in which only the charging power features at the requested EVCS buses are unlearned. Then, a batch-normalization recalibration and a brief recovery fine-tuning step are applied to restore localization utility. We benchmark GDGU against two second-order GU baselines on the IEEE 34-bus, 123-bus, and 8500-node distribution networks across three graph neural network backbones and cumulative unlearning scenarios. GDGU matches the strongest baseline on localization utility and reaches forgetting fidelity close to full-retraining, while unlearning 10 to 12 times faster than retraining from scratch and using far less memory than the second-order GU baselines.

25.
arXiv (CS.LG) 2026-06-19

We Need to Rethink Benchmarking in Anomaly Detection

arXiv:2507.15584v2 Announce Type: replace Abstract: Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. In current benchmarks, a trivial algorithm that only checks for extreme values in individual features performs competitively with state-of-the-art deep learning methods, despite failing on simple cases such as anomalies within an annulus of normal points. Moreover, existing benchmarks do not adequately reflect the diversity of anomaly detection applications, making it difficult for practitioners to reliably select algorithms for their applications. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that group applications sharing relevant characteristics, defined through a common taxonomy. Benchmarking within scenarios enables scenario-specific choices for preprocessing, metrics, and model selection, clarifying which advances transfer across similar applications and providing practitioners with reliable guidance for their specific contexts.