Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-19

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear Attention (LA) with Full Attention (FA), suggesting that the design space of attention hybridization remains underexplored. To probe this space, we conduct interpretability analysis and observe that layers exhibit block-wise functional similarity, while individual heads within the same layer display distinct functional specialization despite sharing input features. This head-level heterogeneity suggests that the head dimension provides a natural and principled granularity for fusing heterogeneous attention signals. Building on this insight, we introduce HydraHead, a novel architecture that hybridizes FA and LA along the head axis. HydraHead features two key innovations: (1) an interpretability-driven selection strategy that identifies retrieval-critical heads and preserves FA only for them, and (2) a scale-normalized fusion module that reconciles the distributional gap between FA and LA head outputs. By leveraging a three-stage transfer pipeline with parameter reuse and distillation, we achieve high-performance hybrid models with minimal training overhead. Under a unified training setup, HydraHead outperforms other hybrid designs in long-context tasks while maintaining strong general reasoning. With interpretability-driven head selection, it matches a 3:1 layer-wise hybrid's long-context performance at a 7:1 LA-to-FA ratio. Crucially, trained on only 15B tokens, HydraHead achieves over 69% improvement over the baseline at 512K context length, approaching Qwen3.5, a leading model of comparable size with a native context length of 256K. This highlights the significant scaling potential of head-level hybridization.

02.
arXiv (quant-ph) 2026-06-16

3D Ising criticality with Platonic lattice superconducting qubits

arXiv:2606.16854v1. The three-dimensional (3D) Ising model is a foundational model in statistical physics and critical phenomena, yet its analytical intractability has long impeded the precise determination of universal critical exponents. While high-precision estimates have been obtained through classical numerical methods and conformal bootstrap techniques, a direct quantum simulation of the 3D Ising criticality remains challenging, requiring nontrivial connectivity, sufficient system size, and high spectral resolution. In this work, assisted by the state-operator correspondence of conformal field theory, we perform a digital quantum simulation of the 3D Ising critical exponents using a multiply-connected 9-qubit superconducting quantum processor with a Platonic lattice geometry. Employing an extended variational quantum eigensolver equipped with a phase-based loss function, we variationally prepare the low-energy eigenstates of the transverse-field Ising model on a cubic Platonic lattice encoded in an 8-qubit register. The four lowest eigenenergies are extracted via Fourier-transform analysis and high-precision numerical fitting, agreeing with the exact diagonalization values to within ±0.001. The resulting scaling dimension $\Delta_\epsilon = 1.5850$ and critical exponent $\nu = 0.7067$ match well with theory.
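
The two numbers quoted at the end of this abstract can be cross-checked against each other. The abstract does not say how the critical exponent was obtained; the sketch below assumes the standard conformal-field-theory relation between the scaling dimension of the energy operator and the correlation-length exponent in $d = 3$ dimensions.

```latex
\[
  \Delta_\epsilon = d - \frac{1}{\nu}
  \quad\Longrightarrow\quad
  \nu = \frac{1}{d - \Delta_\epsilon}
      = \frac{1}{3 - 1.5850}
      = \frac{1}{1.4150}
      \approx 0.7067 .
\]
```

Under that assumption the reported scaling dimension and exponent are mutually consistent to the four decimal places given.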

03.
arXiv (CS.CL) 2026-06-16

The BD-LSC Dataset: Facilitating the Benchmarking of Models for Lexical Semantic Change Detection in Slang and Standard Usage

Automatic semantic change detection aims to identify how word meanings shift over time, offering insights into both linguistic and societal change. Despite recent progress in computational lexical semantic change (LSC), existing benchmarks and methods struggle to capture bi-directional semantic change, particularly cases where words simultaneously gain and lose senses. This problem is especially challenging for words that have both slang and standard meanings. To address these gaps, we introduce two complementary benchmark datasets. The Bi-Directional Lexical Semantic Change (BD-LSC) dataset captures sense gain, sense loss, and stability across three time periods, enabling the study of complex semantic trajectories. The SlangTrack Word Sense Disambiguation (ST-WSD) dataset provides fine-grained, instance-level sense annotations for words combining slang and standard usages, supporting systematic benchmarking of WSD and semantic change detection models. Using these benchmarks, we systematically evaluate models across different methodological families: unsupervised clustering using contextualised embeddings, supervised machine learning, transformer-based models, and state-of-the-art large language models. Among the evaluated systems, the few-shot GPT-4o model achieved the strongest aggregate performance on Exact Sense Match (ESM) and multi-label accuracy; however, Macro-F1 scores near 0.5 across all systems show that rare slang senses remain difficult, which we identify as the central open challenge.

04.
medRxiv (Medicine) 2026-06-15

A controlled human infection model for symptomatic pertussis in North America using the pertactin-producing clinical isolate D420

Background: Despite widespread vaccination, pertussis remains a poorly controlled disease globally and results in substantial annual morbidity and mortality, particularly in young children. Controlled human infection models (CHIMs) using the causative agent Bordetella pertussis are promising systems to enable the study of pertussis disease pathogenesis and immunology and to rapidly assess vaccines and therapeutics. While a pertussis CHIM that produces asymptomatic infection has been established in Europe, the development of a CHIM that leads to symptomatic illness would be advantageous for evaluating vaccine efficacy against both infection and disease. Methods: Healthy participants 18-40 years of age were inoculated intranasally with one of eight doses (ranging from $10^4$ to $10^8$ colony forming units (CFU)) of the pertactin-producing B. pertussis isolate D420 at the challenge facility within the Canadian Center for Vaccinology (Nova Scotia, Canada). The study occurred in two stages. In stage one, the B. pertussis dose was escalated in cohort groups of five to six participants until reaching an endpoint where 70-90% of participants exhibited mild (non-severe, Grade 1 or 2) symptomatic infection, defined as the Human Infectious Dose 70-90 (HID70-90). In stage two, additional challenges were conducted for doses below, at, and above the identified HID70-90 to characterize the emerging pertussis model. For all challenge doses, participants were closely monitored during an inpatient stay of up to 24 days and post-discharge for laboratory-confirmed infection, pertussis symptoms, safety, and IgG antibody responses to four B. pertussis antigens including pertussis toxin, filamentous hemagglutinin, fimbriae, and pertactin. All participants received a five-day course of azithromycin, where timing of initiation depended on B. pertussis testing and symptoms. The study was conducted between July 4, 2022 and March 19, 2025.
Findings: Seventy-five participants were inoculated with one of the eight B. pertussis D420 challenge doses and completed the inpatient stay. From the stage-one dose escalation, we found that $10^7$ CFU of B. pertussis D420 was the lowest dose that achieved the HID70-90, where 9 of 12 participants (75.0%) exhibited mild symptomatic infection. Following stage-two challenges, 16 of 22 total participants at $10^7$ CFU (72.7%) developed mild symptomatic infection, thus verifying the HID70-90. The symptomatic infection rate below the HID70-90, at $5\times10^6$ CFU of D420, was 20.0%, and the rates above the HID70-90, at $5\times10^7$ and $10^8$ CFU, were 58.3% and 55.6%, respectively. Symptoms with elevated frequency for symptomatic infection (relative to background symptoms in non-infected) included nasal congestion, runny nose, fatigue, malaise, and cough. At the HID70-90, 50% of symptomatic infections included cough. Serological analyses of the four highest (stage-two) challenge doses ($5\times10^6$, $10^7$, $5\times10^7$, $10^8$ CFU) revealed that antibody titres increased over time post-challenge. Seroconversion for at least one of the four studied antibodies was nearly twice as common for symptomatic (70.0%) than asymptomatic (35.7%) infection and was absent (0%) for non-infected. All infections were cleared following azithromycin treatment (100%) and there were no study-related serious adverse events. Interpretation: A safe and reproducible symptomatic pertussis CHIM was achieved, providing a model for research on pertussis disease pathogenesis and immunology and for assessing vaccines and therapeutics. (ClinicalTrials.gov, NCT05136599).
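
The HID70-90 criterion described above is simple arithmetic on attack rates, so it can be checked directly from the counts and percentages quoted in the Findings. The snippet below is only an illustrative sketch; the function names and the inclusive 70-90% window check are assumptions made here, not part of the study's analysis code.

```python
# Counts quoted in the Findings for the 10^7 CFU dose: (symptomatic, total).
stage_one = (9, 12)
combined = (16, 22)

# Rates quoted for the neighbouring doses, in percent.
rate_below = 20.0          # 5x10^6 CFU
rates_above = [58.3, 55.6]  # 5x10^7 and 10^8 CFU


def attack_rate(symptomatic: int, total: int) -> float:
    """Percentage of challenged participants with symptomatic infection."""
    return 100.0 * symptomatic / total


def within_hid_window(rate_pct: float, low: float = 70.0, high: float = 90.0) -> bool:
    """True if the rate falls inside the (assumed inclusive) HID70-90 window."""
    return low <= rate_pct <= high


stage_one_rate = attack_rate(*stage_one)
combined_rate = attack_rate(*combined)

print(f"stage one: {stage_one_rate:.1f}% in window: {within_hid_window(stage_one_rate)}")
print(f"combined:  {combined_rate:.1f}% in window: {within_hid_window(combined_rate)}")
print("neighbouring doses in window:",
      [within_hid_window(r) for r in [rate_below, *rates_above]])
```

Of the four dose levels with quoted rates, only the $10^7$ CFU dose falls inside the target window, which matches the abstract's statement that this dose was verified as the HID70-90.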

05.
arXiv (CS.LG) 2026-06-19

Data Bias Mitigation under Coverage Constraints & The Price of Fairness

arXiv:2606.20461v1. Machine learning models have been shown to exhibit discriminatory outcomes or degraded performance for individuals at the intersection of multiple sensitive attributes, such as race and gender. This stems in part from two interrelated challenges: the lack of principled measures for quantifying bias (potentially intersectional), and insufficient representation of intersectional subgroups in training data. We extend a recent bias mitigation framework to incorporate coverage constraints that enforce sufficient representation across groups, including intersectional subgroups. Since achieving exactly zero bias for all groups may not be data efficient (meaning it may require large amounts of data), our solution trades small approximation errors in bias for greater data efficiency while satisfying coverage constraints. We also formulate bias mitigation as an integer linear program that optimizes over all mitigation strategies, and characterize the price of fairness, the minimum data modification cost, as a function of fairness tolerance. This is essential both for legal compliance, where regulations may mandate specific fairness thresholds, and for data governance, enabling practitioners to make informed trade-offs between bias reduction and data modification (particularly, data purchasing) costs. We evaluate our techniques on publicly available datasets, demonstrating that bias mitigation via our framework preserves predictive accuracy across multiple classifiers, and that coverage constraints, while motivated by statistical considerations, are essential for preserving downstream ML performance.

06.
arXiv (CS.LG) 2026-06-12

Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

arXiv:2606.13092v1. Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel-by-channel by the predictor's Lyapunov spectrum, $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$. The horizon is two-sided – a matching lower bound makes approximate equivariance provably horizon-limited – and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2{=}0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate acts, a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot – with zero calibration data. The same read-out, unchanged, audits public pretrained world models training-free: TD-MPC2 checkpoints land on the certificate's own scope taxonomy – calibrated where strongly expansive (ratio 0.94-1.02), optimistic where weakly expansive, correctly abstaining where contracting – a map a deployed monitor replicates cell-by-cell, out-of-sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameters. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum – the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.
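
The horizon scaling $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$ quoted in this abstract is easy to evaluate numerically. The following is a minimal sketch only: it assumes a proportionality constant of 1, exponents measured per rollout step, and a made-up three-channel spectrum; none of these values come from the paper. Returning an infinite horizon for non-expanding channels is likewise a convention chosen here, not the paper's certificate.

```python
import math


def predictable_horizon(epsilon: float, lyapunov_exponent: float) -> float:
    """Steps until a unit-scale error budget shrinks to epsilon, T = log(1/eps) / lambda.

    Assumes the proportionality constant in T ~ log(1/eps)/lambda is exactly 1.
    Channels with a non-positive exponent do not amplify errors, so this sketch
    reports an unbounded horizon for them (an assumption of this example).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if lyapunov_exponent <= 0.0:
        return math.inf
    return math.log(1.0 / epsilon) / lyapunov_exponent


# Hypothetical per-step Lyapunov spectrum, ordered from most to least expansive.
hypothetical_spectrum = [0.9, 0.3, -0.1]
tolerance = 1e-3

horizons = [predictable_horizon(tolerance, lam) for lam in hypothetical_spectrum]
for lam, horizon in zip(hypothetical_spectrum, horizons):
    print(f"lambda = {lam:+.2f}  ->  horizon = {horizon:.2f} steps")
```

The most expansive channel sets the shortest horizon, which is why a channel-by-channel stratification is more informative than a single average error figure.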

07.
arXiv (CS.CL) 2026-06-16

MedSynth: Realistic, Synthetic Medical Dialogue-Note Pairs

Physicians spend significant time documenting clinical encounters, a burden that contributes to professional burnout. To address this, robust automation tools for medical documentation are crucial. We introduce MedSynth – a novel dataset of synthetic medical dialogues and notes designed to advance the Dialogue-to-Note (Dial-2-Note) and Note-to-Dialogue (Note-2-Dial) tasks. Informed by an extensive analysis of disease distributions, this dataset includes over 10,000 dialogue-note pairs covering over 2000 ICD-10 codes. We demonstrate that our dataset markedly enhances the performance of models in generating medical notes from dialogues, and dialogues from medical notes. The dataset provides a valuable resource in a field where open-access, privacy-compliant, and diverse training data are scarce. Code is available at https://github.com/ahmadrezarm/MedSynth/tree/main and the dataset is available at https://huggingface.co/datasets/Ahmad0067/MedSynth.

08.
arXiv (CS.CL) 2026-06-16

Sycophancy as Material Failure under Pushback Loading: A Multi-Axis Characterization Across Three Loading Cases and up to Seventeen Material Charges

Sycophancy in LLMs is documented across 70+ papers, but expert agreement on construct boundaries remains low (ICC=.184; Ye et al., 2026). The construct fragments because behavioral classification depends on which surface form is privileged. We adopt a materials-science framing: conversation as test specimen under load, LLM-model as material charge, pushback as progressive load, stance-flip as material failure. We characterize this failure across three loading cases (debate n=1000; false-presuppositions n=3400; ethical-setting n=3400; 10-17 material charges per case; 7800 specimens total) using 14 turn-level axis-measurements spanning velocity, damage accumulation, frame-drift, brittleness, and direction stability, plus three speaker-resolved axes from an independent pipeline. The measurements are Hooke-coupled ($\sigma = E \cdot \varepsilon$ analog) and reproduce across loading cases with effects up to $|r_{rb}| = 0.35$ on debate; the sign structure adds a second pattern: the ethical-setting case inverts the velocity and accumulation blocks. Variance composition partitions into two profiles: debate is charge-dominated (brittle-fracture-like: the material grade decides), false-presuppositions and ethical-setting are topic-dominated (creep-like: the load decides); the ratios (2.03 vs 0.13/0.17) are estimator-dependent, for debate even in direction. Cross-judge reliability (GPT-4o vs Haiku 4.5) shows debate scoring is judge-robust (Cohen's $\kappa = 0.88$) while false-presupposition scoring is judge-sensitive ($\kappa = 0.36$) – a caveat single-judge benchmarks must report. This is the methodological move Ye et al.'s diagnosis calls for: a multi-axis characterization that does not depend on which surface form of the construct one privileges.
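
The cross-judge reliability figures in this abstract are Cohen's $\kappa$ values. As a reminder of what that statistic measures, here is a small from-scratch sketch; the judge labels are invented for illustration and the function is not taken from the paper's pipeline.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two raters coincide.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labelled independently at their own base rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (observed - expected) / (1.0 - expected)


# Hypothetical stance-flip verdicts from two judges on four specimens.
judge_a = ["flip", "flip", "hold", "hold"]
judge_b = ["flip", "flip", "hold", "flip"]
kappa = cohens_kappa(judge_a, judge_b)
print(f"kappa = {kappa:.2f}")
```

In this toy case the judges agree on three of four items, but half of that agreement is expected by chance, so $\kappa$ is well below the raw 75% agreement rate.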

09.
medRxiv (Medicine) 2026-06-15

Association of Genetic Liability to Psychiatric Disorders with Peripheral Metabolic Dysregulation

Importance: Individuals with psychiatric disorders face elevated cardiometabolic risk which is linked to increased mortality. The extent to which this reflects shared pathogenesis or the downstream effects of illness and treatment remains poorly understood. Objective: To characterize the direct pleiotropic effects of psychiatric genetic liability on circulating metabolites and aggregate cardiometabolic risk, independent of psychiatric diagnosis and psychotropic medication use. Design: Cohort study. Setting: Mass General Brigham Biobank (MGBB). Participants: MGBB participants with metabolomic profiling, genomic data, and linked electronic health records. Exposures: Genetic liability to nine psychiatric disorders quantified using polygenic risk scores (PRS): attention deficit/hyperactivity disorder (ADHD), anorexia nervosa (ANO), anxiety disorder (ANX), autism spectrum disorder (ASD), bipolar disorder (BD), major depressive disorder (MDD), PTSD, schizophrenia (SCZ), and substance use disorder (SUD). Main Outcomes and Measures: 249 circulating metabolites and four metabolomic risk scores (MRS) for type 2 diabetes, myocardial infarction, ischemic stroke, and vascular dementia. PRS-metabolite associations were estimated using nested models adjusting for lifetime psychiatric diagnosis and psychotropic medication use. Results: Across 25,290 participants, we identified 604 significant PRS-metabolite associations (Bonferroni $p < 1.36 \times 10^{-4}$), of which 89% persisted after adjustment for lifetime diagnosis and medication use, suggesting that the direct genetic effects on metabolism are largely independent of illness or treatment. PRS for MDD, PTSD, and ADHD showed the most extensive dysregulation, with a transdiagnostic pattern of elevated lipids and systemic inflammation, specifically triglycerides ($\beta$ = 0.04 to 0.05, all $p < 4.4 \times 10^{-13}$) and glycoprotein acetyls ($\beta$ = 0.05, all $p < 2.2 \times 10^{-16}$).
Notably, PRS for SCZ and BD showed minimal metabolite dysregulation despite having the strongest association with their target diagnoses. PRS for MDD, PTSD, ADHD, and SUD were associated with increased MRS across cardiometabolic conditions ($\beta$ = 0.03 to 0.08, all $p < 2.1 \times 10^{-4}$). Sensitivity analyses controlling for BMI or excluding participants without any psychiatric history (N: 21,305 and 11,150, respectively) showed a similar pattern. Conclusions and Relevance: Psychiatric genetic liability is associated with systemic metabolic dysregulation independent of illness onset or treatment, supporting a partially pleiotropic basis for psychiatric-cardiometabolic comorbidity.

10.
arXiv (CS.CL) 2026-06-18

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain and may themselves be noisy, incomplete, or partially incorrect; even when the final solution is correct, an imperfect rationale can interfere with learning. Reinforcement learning with verifiable rewards, on the other hand, typically compresses evaluative feedback into a scalar signal, obscuring which aspects of a response should be improved. We propose Rubric-Conditioned Self-Distillation, a framework that incorporates rubrics as structured, fine-grained feedback for on-policy self-distillation. Our method conditions the teacher model on criterion-level rubrics and uses it to provide token-level guidance on the student's own sampled trajectories. This design avoids treating a single reference rationale as the sole supervision target. Instead, rubrics specify what a strong response should satisfy, enabling more fine-grained credit assignment over the reasoning process than scalar reward optimization. We instantiate this framework with a two-stage pipeline that first learns to generate task-specific rubrics and then trains a rubric-guided reasoner. We evaluate on a diverse suite of science reasoning benchmarks, and the results show that rubric-conditioned self-distillation effectively converts rubric-level criteria into token-level guidance over the reasoning process, surpassing GRPO by 1.0 points and OPSD by 0.9 points on average.

11.
arXiv (CS.AI) 2026-06-16

ArtNet: A JEPA-Like Articulatory Predictive Framework for Robust Zero-Shot Phoneme Recognition

arXiv:2606.16595v1. Zero-shot cross-lingual phoneme recognition is often hindered by the fragility of direct acoustic-to-symbol mapping, which is susceptible to language-specific variations. Echoing joint-embedding predictive architecture (JEPA) work in vision, we propose ArtNet, a framework that explores a structured feature prediction task based on articulatory features to enhance acoustic robustness. Specifically, ArtNet integrates an articulatory predictor, designed to extract universal articulatory representations from self-supervised learning (SSL) features, with a variational information bottleneck (VIB) to suppress language-specific variations. Experiments on seven unseen languages demonstrate that ArtNet, particularly when synergized with the proposed vector-space inventory alignment (VSIA) strategy, significantly outperforms competitive baselines, achieving a 20.56% relative reduction in phoneme error rate (PER) and 7.01% in phoneme feature error rate (PFER).

12.
arXiv (CS.CV) 2026-06-18

VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads – a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.

13.
arXiv (CS.AI) 2026-06-24

NoContactNoWorries: Estimating Contact through Vision and Proprioception for In-Hand Dexterous Manipulation

arXiv:2606.24450v1. Perceiving physical contact is fundamental to dexterous manipulation. While robots often rely on dedicated hardware tactile sensors, humans exhibit a remarkable ability to infer contact by integrating visual information with an innate sense of their body's pose and movement. Inspired by this embodied perceptual skill, we investigate whether a robot can learn to infer contact from vision, an approach that also offers a scalable alternative to tactile hardware specifically for binary contact estimation, which faces practical challenges in cost, fragility, and integration. We present NoContactNoWorries, a transformer-based multimodal framework that fuses RGB-D vision with the robot's proprioception to infer binary contact states as a pseudo-tactile signal for hand-object interactions. We validate by training a single contact prediction model on multiple objects and show that the inferred contact signal supports downstream reinforcement learning agents for in-hand object reorientation, generalizing to novel objects. Experiments in both simulation and on a real-world robot validate our approach, highlighting the feasibility of inferring contact from vision and proprioception. Project Page: https://soham2560.github.io/no-contact-no-worries/

14.
arXiv (CS.LG) 2026-06-11

Open Materials Generation with Inference-Time Reinforcement Learning

arXiv:2602.00424v2. Continuous-time generative models for crystalline materials enable inverse materials design by learning to predict stable crystal structures, but incorporating explicit target properties into the generative process remains challenging. Policy-gradient reinforcement learning (RL) provides a principled mechanism for aligning generative models with downstream objectives but typically requires access to the score, which has prevented its application to flow-based models that learn only velocity fields. We introduce Open Materials Generation with Inference-time Reinforcement Learning (OMatG-IRL), a policy-gradient RL framework that operates directly on the learned velocity fields and eliminates the need for the explicit computation of the score. OMatG-IRL leverages stochastic perturbations of the underlying generation dynamics, preserving the baseline performance of the pretrained generative model while enabling exploration and policy-gradient estimation at inference time. Using OMatG-IRL, we present the first application of RL to crystal structure prediction (CSP). Our method enables effective reinforcement of an energy-based objective while preserving diversity through composition conditioning, and it achieves performance competitive with score-based RL approaches. Finally, we show that OMatG-IRL can learn time-dependent velocity-annealing schedules, enabling accurate CSP with order-of-magnitude improvements in sampling efficiency and, correspondingly, reduction in generation time. The OMatG-IRL code is included in a new release of the Open Materials Generation (OMatG) framework available at https://github.com/FERMat-ML/OMatG.

15.
arXiv (CS.LG) 2026-06-19

Topological Data Analysis for High-Dimensional Dynamic Process Monitoring

arXiv:2606.20443v1. Real-time process monitoring requires methods that extract actionable information from high-dimensional time-series data. In this work, we present a new approach for process monitoring that combines tools of topological data analysis (TDA) and machine learning. In the proposed approach, we represent multivariate time-series data as manifolds and use topological descriptors to summarize the structure of such data; we then use a neural ordinary differential equation to learn the dynamic evolution of the topological structure of the system. Using real data from an industrial process, we show that this trajectory-based event detection approach is effective at detecting diverse types of events. We contrast this approach against reconstruction-based approaches such as principal component analysis and autoencoders and against a trajectory-based approach that uses Koopman autoencoders.

16.
medRxiv (Medicine) 2026-06-24

Genetically Proxied Interleukin-6 Inhibition and Cancer Risk: A Multi-Ancestry Drug-Target Mendelian Randomization Study of Hepatocellular Carcinoma and Colorectal Cancer

Background: Interleukin-6 (IL-6) signalling drives chronic inflammation and is therapeutically targeted by tocilizumab, an approved IL-6 receptor inhibitor. Whether genetically proxied lifelong IL-6 inhibition causally influences the risk of hepatocellular carcinoma (HCC) or colorectal cancer (CRC) remains unanswered. Prior single-variant estimates from pooled observational data are methodologically limited and may reflect confounding. Methods: A two-sample drug-target Mendelian randomization (MR) study was conducted. Four independent cis-acting protein quantitative trait loci (pQTL) variants within the IL6 and IL6R gene loci (rs2228145, rs4129267, rs7529229, rs1800795) were selected as genetic instruments, with F-statistics ranging from 32.3 to 120.5, confirming instrument strength. Outcome data were obtained from four independent genome-wide association studies: HCC from BioBank Japan (BBJ; 1,866 cases, 195,745 controls), HCC from FinnGen Release 10 (674 cases, 218,118 controls), CRC from a European meta-analysis (19,948 cases, 12,124 controls), and CRC from BBJ (7,062 cases, 195,745 controls). Causal estimates were derived using inverse variance weighted (IVW) regression as the primary method, with MR-Egger and weighted median analyses as sensitivity methods. Cochran Q statistics assessed heterogeneity and MR-Egger intercept testing assessed directional pleiotropy. Results: Genetically proxied IL-6 inhibition showed no significant causal effect on HCC risk in East Asian populations (IVW odds ratio [OR] 0.997, 95% confidence interval [CI] 0.903 to 1.101, p=0.953) or European populations (IVW OR 0.984, 95% CI 0.802 to 1.208, p=0.880). Similarly, no causal effect was observed on CRC risk in European populations (IVW OR 1.015, 95% CI 0.957 to 1.075, p=0.623) or East Asian populations (IVW OR 0.999, 95% CI 0.948 to 1.052, p=0.971). Sensitivity analyses confirmed the absence of directional pleiotropy and heterogeneity across all four analyses.
Leave-one-out analyses demonstrated that no single instrument drove the null findings. Conclusions: Genetically proxied IL-6 receptor inhibition, modelling the therapeutic effect of tocilizumab, showed no causal effect on HCC or CRC risk across four independent cohorts and two ancestries. These findings do not support a role for IL-6 pathway inhibition in the prevention of these cancers and provide reassuring genetic safety evidence regarding cancer risk in patients receiving tocilizumab. Larger HCC-specific GWAS are needed to definitively evaluate modest effects in this cancer type.
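
The odds ratios, confidence intervals, and p-values reported for the four IVW analyses can be checked for internal consistency. The sketch below assumes a normal approximation on the log-odds scale (standard error recovered from the 95% CI width, two-sided Wald p-value); this is a common back-of-envelope check, not the authors' analysis code, and the function name is made up for this example.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Two-sided p-value implied by an OR and its 95% CI on the log scale."""
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z_score = math.log(odds_ratio) / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z_score) / math.sqrt(2.0))


# (label, OR, CI low, CI high, reported p) as quoted in the Results.
reported = [
    ("HCC, East Asian", 0.997, 0.903, 1.101, 0.953),
    ("HCC, European", 0.984, 0.802, 1.208, 0.880),
    ("CRC, European", 1.015, 0.957, 1.075, 0.623),
    ("CRC, East Asian", 0.999, 0.948, 1.052, 0.971),
]

implied = {label: wald_p_from_or_ci(or_, lo, hi) for label, or_, lo, hi, _ in reported}
for label, _, _, _, p_reported in reported:
    print(f"{label}: implied p = {implied[label]:.3f}, reported p = {p_reported:.3f}")
```

Small differences between implied and reported p-values are expected, because the published odds ratios and interval bounds are rounded to three decimals.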

17.
arXiv (CS.CV) 2026-06-25

Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation

Video generation models are increasingly capable of producing realistic videos, but they still struggle to generate videos that follow basic physical laws. Compounding this is a lack of reliable granular evaluation methods for localizing and specifying physical law violations in videos. We address this by introducing Physics Question Scene Graph (PQSG), a hierarchical question-based evaluation pipeline. PQSG evaluates generated videos by checking their faithfulness to a prompt across objects, actions, and adherence to physical laws using a graph-based hierarchy of questions generated by a vision-language model (VLM), guided by high-quality in-context examples. By representing questions as a graph, PQSG introduces logical dependencies within questions, ensuring that each query is contextually valid. Moreover, PQSG provides granular assessments of which qualities of the video violate physical plausibility constraints. We validate PQSG by creating FinePhyEval, a dataset with physics-based prompts and corresponding generated videos from diverse state-of-the-art video generation models (Sora 2, Veo 3, and Wan 2.1), with each video annotated across multiple categories by humans. Using FinePhyEval, we measure the correlation between PQSG's fine-grained scores and human judgments, showing higher overall correlations than prior work. We also find that PQSG ranks closed-source models higher than Wan 2.1 on physical realism. Lastly, we show that the annotations we provide in FinePhyEval can also be used for subtask evaluation: we benchmark two strong VLMs on generating and answering questions, finding that while models can create human-like questions, they still fall short of human performance in answering them.

18.
arXiv (CS.CL) 2026-06-19

Uncertainty Decomposition for Clarification Seeking in LLM Agents

Recent position papers argue that the classical aleatoric/epistemic uncertainty framework is insufficient for interactive large language model (LLM) agents and call for underspecification-aware, decomposed, and communicable uncertainty representations that can unlock new agent capabilities such as proactive clarification seeking and shared mental-model building. Practical deployment constraints – black-box APIs, interactive latency budgets, and the absence of labeled trajectories – rule out logprob-based, multi-sampling, and training-based methods, leaving prompt-based estimation as the most viable family for surfacing such signals at deployment time. We answer this call with a simple prompt-based decomposition that separates action confidence from request uncertainty (u), enabling the agent to ask for clarification when the task specification is ambiguous. To evaluate it, we introduce two clarification-augmented benchmarks (WebShop-Clarification and ALFWorld-Clarification) in which 50% of tasks are deliberately underspecified, and systematically compare the proposed decomposition against ReAct+UE and Uncertainty-Aware Memory (UAM) across five LLM backbones (GPT-5.1, DeepSeek-v3.2-exp, GLM-4.7, Qwen3.5-35B, GPT-OSS-120B) on these variants together with the standard WebShop, ALFWorld, and REAL benchmarks for fault detection. Averaged across the five backbones, the proposed decomposition improves clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and by 36% over UAM, and leads clarification F1 on every backbone on WebShop-Clarification and on four of five backbones on ALFWorld-Clarification, indicating that the gains generalize beyond a single LLM.

19.
arXiv (math.PR) 2026-06-24

On domains of elliptic operators with distributional coefficients

arXiv:2509.24950v2. We show how one can use recently gained insights from the study of singular SPDEs, more particularly the study of singular operators via the theory of Paracontrolled Distributions, to construct domains for (singular) elliptic operators. Formally we consider \[ A (u) = (1 - \Delta) u + \nabla V \cdot \nabla u + \xi u + \operatorname{div} (\rho u), \] where $V \in \mathcal{C}^{\delta}$, $\xi \in \mathcal{C}^{- 2 + \delta}$, $\rho \in \mathcal{C}^{- 1 + \delta}$, $\operatorname{div} \rho = 0$, and which satisfy a structural assumption that is notably satisfied when $\xi$ is a sub-critical noise, see [MvZ22]. We also show that under this assumption, one can construct a continuous change of variables $\Theta$ which satisfies \[ A \Theta - (1 - \Delta) \in \mathcal{L} (H^{2 - \delta''} ; H^{\delta'}) \] which allows us to define $A$ rigorously and parametrise a domain. Moreover, for suitably regularised operators \[ A_{\varepsilon} (u) := (1 - \Delta) u + \nabla V_{\varepsilon} \cdot \nabla u + (\xi_{\varepsilon} + c_{\varepsilon}) \cdot u + \operatorname{div} (\rho_{\varepsilon} \cdot u), \] we show that for a strongly converging regularised change of variables $\Theta_{\varepsilon} \rightarrow \Theta$ we have \[ A_{\varepsilon} \Theta_{\varepsilon} \rightarrow A \Theta \quad \text{in } \mathcal{L} (H^2 ; L^2) \] which in particular implies norm resolvent convergence to a limiting closed operator. Finally, we give a class of examples and show how to apply these results to prove strong analytical local well-posedness for a singular Schrödinger equation formally given by \[ i \partial_t u + (1 - \Delta) u + \nabla V \cdot \nabla u + \xi \cdot u = - | u |^2 u \] for singular $V, \xi$ and that its solution is the limit of the classical solutions of a regularised equation.

20.
arXiv (CS.CL) 2026-06-16

SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark signals by changing sentence order. In this work, we propose SAMark, a self-anchored watermarking framework that removes the dependency on sentence order by establishing a step-independent green region in semantic space. To improve detectability, we introduce a multi-channel hyperbolic scoring mechanism that amplifies watermark signals while suppressing noise from weakly aligned candidates. We further propose a diversity-aware filtering strategy that combines hard filtering with soft regularization, extending beyond simple n-gram repetition filters to address semantic redundancy. Experimental results show that SAMark achieves up to 90.2% TP@FP1% under typical paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average, while maintaining generation quality competitive with unwatermarked text and breaking the robustness-quality trade-off that limits prior methods.
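For readers unfamiliar with the TP@FP1% metric, a minimal sketch of one common way to compute it follows. The function name and the tie-handling convention (a text is flagged only when its score is strictly above the threshold) are assumptions, not details taken from the paper.

```python
import math

def tp_at_fp(neg_scores, pos_scores, max_fpr=0.01):
    """True-positive rate at the lowest threshold whose false-positive rate is <= max_fpr.

    neg_scores: detector scores on unwatermarked text.
    pos_scores: detector scores on watermarked (possibly paraphrased) text.
    """
    neg = sorted(neg_scores, reverse=True)
    # Number of unwatermarked texts that may be (wrongly) flagged.
    allowed = math.floor(max_fpr * len(neg))
    # Flag a text when score > threshold; at most `allowed` negatives exceed this value.
    threshold = neg[allowed] if allowed < len(neg) else -math.inf
    true_positives = sum(score > threshold for score in pos_scores)
    return true_positives / len(pos_scores)

# Toy data: 100 unwatermarked scores (0..99) and 4 watermarked scores.
rate = tp_at_fp(list(range(100)), [50, 99, 100, 120])
```

With 100 negatives, a 1% budget allows exactly one false positive, so the threshold sits at the second-highest unwatermarked score.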

21.
arXiv (CS.LG) 2026-06-24

Machine-Learning Emulation of Satellite Greenhouse Gas Retrievals: Stability over Time

arXiv:2606.09313v2. Retrieval algorithms are used to estimate atmospheric concentrations of greenhouse gases (GHGs), such as carbon dioxide (CO2) and methane (CH4), by solving inverse problems from high-spectral-resolution satellite radiance measurements. However, these algorithms are computationally expensive, which makes real-time estimation at scale difficult. Machine-learning models have therefore been proposed as fast emulators of retrieval algorithms. Most existing studies, however, evaluate them only on test data from the same period as the training data. We study the stability over time of such emulators using data from the Greenhouse Gases Observing SATellite (GOSAT). We show that prediction accuracy generally deteriorates when the test period moves away from the training period. We also show that including time as an input feature substantially improves XCH4 prediction for Lasso and neural-network models. Among the methods considered, a simple Lasso model performs as well as or better than more complex methods such as neural networks, and yields more stable predictions over time. We further validate the results using the Total Carbon Column Observing Network (TCCON), a ground-based observation network. On the TCCON-matched dataset, the time-augmented Lasso achieves errors against TCCON that are comparable to the disagreement between GOSAT and TCCON for both XCO2 and XCH4.

22.
arXiv (math.PR) 2026-06-24

Sub-Poisson distributions: Concentration inequalities, optimal variance proxies, and closure properties

arXiv:2508.12103v2. We introduce a nonasymptotic framework for sub-Poisson distributions with moment generating function dominated by that of a Poisson distribution. At its core is a new notion of optimal sub-Poisson variance proxy, analogous to the variance parameter in the sub-Gaussian setting. This framework allows us to derive a Bennett-type concentration inequality without boundedness assumptions and to show that the sub-Poisson property is closed under key operations including independent sums and convex combinations, but not under all linear operations such as scalar multiplication. We derive bounds relating the sub-Poisson variance proxy to sub-Gaussian and sub-exponential Orlicz norms. Taken together, these results unify the treatment of Bernoulli and Poisson random variables and their signed versions in their natural tail regime.
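For orientation, the standard Chernoff computation shows how a Poisson-type bound on the moment generating function yields a Bennett-type tail bound. The definition used below (centred variable, one-sided range $\lambda \ge 0$, proxy written $\sigma^2$) is an assumption for illustration and may differ from the paper's exact formulation.

```latex
% Assumed definition: X is sub-Poisson with variance proxy \sigma^2 if, for all \lambda \ge 0,
\[
  \mathbb{E}\, e^{\lambda (X - \mathbb{E} X)}
  \le \exp\bigl( \sigma^2 ( e^{\lambda} - 1 - \lambda ) \bigr),
\]
% with equality when X is Poisson with mean \sigma^2.
% Chernoff bound for t \ge 0, optimised at \lambda = \log(1 + t/\sigma^2):
\[
  \mathbb{P}( X - \mathbb{E} X \ge t )
  \le \inf_{\lambda \ge 0} \exp\bigl( \sigma^2 ( e^{\lambda} - 1 - \lambda ) - \lambda t \bigr)
  = \exp\Bigl( - \sigma^2 \, h\bigl( t / \sigma^2 \bigr) \Bigr),
  \qquad h(u) = (1 + u) \log (1 + u) - u .
\]
```

No boundedness of $X$ enters this argument; only the assumed bound on the moment generating function is used.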

23.
arXiv (quant-ph) 2026-06-11

Enhancing Many-Body Chaos via Entropy Injection from Environment

arXiv:2606.11784v1. In closed quantum systems, local information spreads throughout the entire system and becomes highly complex under unitary evolution. In contrast, when the system is embedded in an environment, system-environment coupling can transfer information from the system into the environment, thereby reducing the rate of complexity growth within the system. This leads to the environment-induced scrambling transition established in previous works. In this work, we identify entropy injection from the environment as a different physical process that instead enhances many-body chaos. Our setup consists of coupling a system that is already in equilibrium with one environment to another environment, which serves as an entropy reservoir and drives the system into a non-equilibrium state. When entropy flows into the system through either heat transfer or particle transfer, the effective Hilbert space explored by the system enlarges, a mechanism that can enhance many-body chaos. We explicitly demonstrate this idea by constructing a solvable complex Brownian SYK model, in which both the relaxation toward the steady state and the steady-state quantum Lyapunov exponent can be computed analytically. Our results provide a controllable mechanism for tuning quantum scrambling through entropy flow in quantum many-body systems coupled to environments.

24.
arXiv (CS.CV) 2026-06-16

BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering

Inverse rendering of urban scenes from captured videos enables numerous applications, including content creation and autonomous driving simulation. Physically-based rendering methods follow and control lighting physics, but suffer from reconstruction and rendering artifacts. While generative models produce realistic videos, they offer limited consistency and controllability. We present BRDFusion, a unified framework that combines two complementary models for inverse and forward rendering. Specifically, BRDFusion recovers explicit, consistent scene properties with physical modeling and alleviates optimization ambiguity with generative priors. During forward rendering, the physical model provides controllable rendering from the scene configuration, and the generative model denoises and fixes artifacts. Therefore, our method produces high-quality videos while allowing precise control, outperforming baselines in real and synthetic scenes. Moreover, BRDFusion supports novel-view relighting, night simulation, and dynamic object insertion/editing. Project page: https://shigon255.github.io/brdfusion-page/

25.
arXiv (quant-ph) 2026-06-11

Sharing quantum indistinguishability with multiple parties

arXiv:2512.15199v3. Quantum indistinguishability of non-orthogonal quantum states is a valuable resource in quantum information applications such as cryptography and randomness generation. In this article, we present a sequential state-discrimination scheme that enables multiple parties to share quantum uncertainty, in terms of the max relative entropy, generated by a single party. Our scheme is based upon maximum-confidence measurements and takes advantage of weak measurements to allow a number of parties to perform state discrimination on a single quantum system. We review known sequential state discrimination and show how our scheme would work through a number of examples where ensembles may or may not contain symmetries. Our results will have a role to play in understanding the ultimate limits of sequential information extraction and guide the development of quantum resource sharing in sequential settings.