Academic Intelligence · Curated Daily

Explore the Frontiers of Global Research

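AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.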

01.
arXiv (CS.CV) 2026-06-19

OncoReg: Medical Image Registration for Oncological Challenges

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

02.
arXiv (quant-ph) 2026-06-17

Photon anti-bunching in high harmonic generation

arXiv:2606.17620v1. Photon anti-bunching is direct evidence for the existence of photons and has no classical counterpart. Unlike bunching of photons, which can have a semi-classical description, the effect of photon anti-bunching can only be understood with quantized electromagnetic fields. However, for the process of high harmonic generation (HHG), where many photons of the driving field are upconverted to a single photon of higher energy, there is as yet no clear evidence for the presence of individual photon emission. The key result of this work is the prediction of photon anti-bunching in the process of HHG, making it the first theoretical discovery of non-classicality in the temporal correlations of HHG photons. While other non-classical signatures in HHG, such as sub-Poissonian statistics or squeezing, have been discussed for an ensemble of photons, the anti-bunching signature reported here is a signature of a single photon. This is achieved by using the recently developed Heisenberg picture approach for quantum optical HHG, revealing clear anti-bunching signatures in the intensity correlation function across the entire harmonic spectrum.
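
For orientation, the intensity correlation function mentioned in this abstract is conventionally the normalized second-order correlation function of quantum optics. The textbook form below is not taken from the paper; the mode operator \hat{a} and the delay \tau are standard notation assumed here.

```latex
g^{(2)}(\tau) =
  \frac{\langle \hat{a}^{\dagger}(t)\,\hat{a}^{\dagger}(t+\tau)\,\hat{a}(t+\tau)\,\hat{a}(t) \rangle}
       {\langle \hat{a}^{\dagger}(t)\,\hat{a}(t) \rangle \,
        \langle \hat{a}^{\dagger}(t+\tau)\,\hat{a}(t+\tau) \rangle},
\qquad
\text{anti-bunching: } g^{(2)}(0) < g^{(2)}(\tau)
```

A stationary classical field satisfies g^{(2)}(0) \ge 1 and g^{(2)}(0) \ge g^{(2)}(\tau), so a dip at zero delay cannot be reproduced without quantizing the field.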

03.
arXiv (CS.CL) 2026-06-17

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional coding tasks, game generation takes place within a game engine, where scripts, scenes, assets, rendering, and runtime interactions must jointly produce coherent gameplay. We formalize end-to-end game generation as the problem of producing a complete game artifact that realizes a specification through observable player-game interaction in a target environment. We argue that evaluating this setting requires three desiderata: Engine Grounding, Artifact Completeness, and Interactive Verification. We propose an interaction-grounded evaluation framework that assesses executable gameplay through replayed demonstrations and rubric-guided multimodal judging. We instantiate this framework as GameCraft-Bench, a benchmark comprising 140 Godot tasks across 15 game families. Evaluations of frontier coding agents show that end-to-end game generation remains highly challenging: the strongest agent achieves only 41.46%, and most agents score below 40%. Further analysis reveals that while agents often implement recognizable mechanics, they struggle to deliver complete games with sufficient content, functional visual feedback, and coherent presentation. See https://tongxuluo.github.io/gamecraft-bench-website for demos, code, and data.

04.
arXiv (CS.AI) 2026-06-16

TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate

arXiv:2605.13909v2. Negotiation is a central mechanism of economic exchange, shaping markets, procurement, labor agreements, and resource allocation. It is also a canonical testbed for agentic language models, requiring multi-turn interaction under hidden preferences, strategic communication, and binding constraints. These properties make negotiation hard to evaluate: unlike math or code, it has no intrinsic verifier. Existing LLM negotiation evaluations rely on LLM-vs.-LLM interaction or aggregate outcomes such as deal rate, leaving failures opaque. We introduce Terms-Bench, short for Testbed for Economic Reasoning in Multi-turn Strategy, a Bayesian-game framework that makes the environment itself the verifier by specifying the counterpart's latent type, policy, and payoff structure. We instantiate it in bilateral price negotiation, where the counterpart's private state and simulator policy are hidden from the agent but observable to the evaluator. This turns the counterpart from a black-box opponent into a diagnostic instrument, enabling agent-attributable failure analysis and oracle-reference optimality gaps. Evaluating 13 LLM agents spanning frontier systems from major providers, Terms-Bench turns negotiation evaluation from aggregate ranking into actionable diagnosis: where agents fail, why they fail, and what to strengthen. Empirically, frontier models saturate deal rate yet diverge in surplus extraction, cue use, belief calibration, and compliance, revealing agent-specific bargaining bottlenecks masked by prior benchmarks.

05.
arXiv (CS.AI) 2026-06-12

TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?

arXiv:2606.13148v1. Climate and environmental decision-making increasingly requires reasoning across heterogeneous inputs, including gridded physical data, satellite imagery, geospatial context, and simulator outputs. Weather and climate foundation models can forecast well, but do not reason interactively in language, while large language models (LLMs) reason in language but cannot operate directly on high-dimensional Earth-system data. As a result, real scientific workflows in Earth science remain underserved. We introduce TerraBench, a benchmark for grounded Earth-science reasoning, built on TerraAgent, a ReAct-style executable framework that interleaves reasoning, tool calls, and observations to couple LLM planning with scientific tools for environmental retrieval, geospatial processing, simulation, and artifact-backed computation. TerraBench unifies analysis of Earth observation imagery, gridded data, GIS reasoning, and simulation in a single executable interface, whereas prior benchmarks isolate these capabilities into narrow individual tasks. It is also the first in this space to pair process-level tool-use metrics with tolerance-aware numeric scoring. The benchmark comprises 403 extensive agentic tasks across three tracks (Fundamentals, Simulator-Grounded, and Document-Grounded Verification) and eight application domains with 24,500 verified execution steps. These results indicate that reliable Earth-science agents must go beyond tool access to coordinate heterogeneous workflows, parameterize tools precisely, and preserve artifact provenance.
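
The abstract does not say how its tolerance-aware numeric scoring is defined. As a rough illustration of the general idea only, the sketch below accepts a numeric answer when it falls within a relative or absolute tolerance of the reference. The function name `tolerance_score` and the default tolerances are assumptions made for this example, not TerraBench's actual metric.

```python
def tolerance_score(predicted: float, reference: float,
                    rel_tol: float = 0.05, abs_tol: float = 1e-9) -> float:
    """Return 1.0 if `predicted` is close enough to `reference`, else 0.0.

    Hypothetical scoring rule: the allowed error is the larger of a
    relative band around the reference and a small absolute floor, so
    that references equal to zero can still be matched.
    """
    allowed_error = max(rel_tol * abs(reference), abs_tol)
    return 1.0 if abs(predicted - reference) <= allowed_error else 0.0


# A 2% miss passes under a 5% tolerance; a 10% miss does not.
within = tolerance_score(102.0, 100.0)
outside = tolerance_score(110.0, 100.0)
```

Measuring the tolerance relative to the reference, rather than symmetrically, keeps the acceptance band fixed per task regardless of what the agent answers.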

06.
PLOS Computational Biology 2026-06-18

Ten simple rules for turning your qualifying exam into an NIH-style fellowship proposal: A guide for graduate students

by Courtney Peña-Lima, Cameron S. Bader, Brendan K. Ball, Troy C. Dildine, Mekhala V. Dissanayake, Iris van ‘t Erve, Albina Ibrayeva, Amy Nippert, M.K. Quinn, Chelse Spinner, Samuel Thompson, Antonio Tomasso, Crystal M. Botham

Qualifying exams, often referred to as “quals” or candidacy exams, are an important milestone in doctoral programs. Although the style of quals varies greatly by program and institution, it is usually a proposal that requires students to develop research ideas as well as their scientific writing skills. Many quals are modeled after funding mechanisms that graduate students can apply to and on a topic that the student will pursue in their dissertation. This paper offers graduate students a step-by-step guide on how to turn their quals into a fellowship-style research proposal, using National Institutes of Health (NIH) mechanisms as a benchmark, as this is the norm within US research institutions. This paper will be most useful for students who have completed or are in the process of completing proposal-based qualifying exams, usually in the second year of a doctoral program.

07.
arXiv (CS.CV) 2026-06-17

Neural Tree Reconstruction for the Open Forest Observatory

The Open Forest Observatory (OFO) is a collaboration across universities and other partners to make low-cost forest mapping accessible to ecologists, land managers, and the general public. The OFO is building both a database of geospatial forest data and open-source methods and tools for forest mapping by uncrewed aerial vehicle. Such data are useful for a variety of climate applications including prioritizing reforestation efforts, informing wildfire hazard reduction, and monitoring carbon sequestration. In the current iteration of the OFO's forest map database, 3D tree maps are created using classical structure-from-motion techniques. This approach is prone to artifacts, lacks detail, and has particular difficulty on the forest floor where the input data (overhead imagery) has limited visibility. These reconstruction errors can potentially propagate to the downstream scientific tasks (e.g., a wildfire simulation). Advances in 3D reconstruction, including methods like Neural Radiance Fields (NeRF), produce higher quality results that are more robust to sparse views and support data-driven priors. We explore ways to incorporate NeRFs into the OFO dataset, outline future work to support even more state-of-the-art 3D vision models, and describe the importance of high-quality 3D reconstructions for forestry applications.

08.
medRxiv (Medicine) 2026-06-22

Why drinking episodes escalate differently: Event-level pathways linking hazardous alcohol consumption and sexual risk

Background: Alcohol-involved drinking episodes vary in whether they involve hazardous alcohol consumption alone, near-miss sexual risk, or sexual risk behavior, but the within-event mechanisms underlying this variability remain unclear. Methods: Guided by syndemic theory, we conducted a qualitative event-level analysis using modified grounded theory among adults in the San Francisco Bay Area who reported hazardous alcohol consumption, defined as an Alcohol Use Disorder Identification Test score ≥16. In-depth interviews elicited narratives of recent heavy drinking episodes and yielded 64 discrete drinking events across 22 participants. We focused on 35 events with evidence of within-event interaction between biopsychosocial and contextual factors. Using constant comparison, we identified escalation pathways, characterized interruption, and examined how events diverge into three outcomes: hazardous alcohol consumption only, hazardous alcohol consumption with near-miss sexual risk (when risk was plausible but not enacted), and hazardous alcohol consumption with sexual risk behavior. Results: Two primary escalation pathways emerged. Dose-driven escalation involved cumulative alcohol or substance exposure that progressively impaired awareness and self-regulation. Meaning-driven escalation involved prioritizing connection, intimacy, or belonging despite awareness of risk. Time-driven continuation extended exposure across contexts and amplified both pathways. Hazardous alcohol consumption-only events more often followed dose-driven pathways, whereas events involving sexual risk behavior more often followed meaning-driven pathways. Near-miss events occurred across both pathways and illustrated how interruption before the escalation constraint point, when the capacity to modify behavior became reduced, could redirect escalation before sexual risk behavior occurred. Across events with similar levels of intoxication narratives, outcomes diverged according to when the interruption occurred and whether it altered escalation. Conclusion: Hazardous drinking episodes diverge into different outcomes based on escalation pathways and the timing and effectiveness of interruption. Early and effective interruption before the escalation constraint point may represent a key target for harm-reduction strategies to prevent progression to sexual risk behavior.

09.
arXiv (CS.CV) 2026-06-18

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Geometric problem solving, a typical multimodal reasoning problem, has attracted much attention and made great progress recently. However, most works focus on plane geometry and usually fail on solid geometry because of its 3D spatial diagrams and complex reasoning. To bridge this gap, we introduce Hilbert-Geo, the first unified formal language framework for solid geometry, including an extensive predicate library and a dedicated theorem bank. Based on this framework, we propose a Parse2Reason method containing two steps: first parsing, then reasoning. In the parsing step, we utilize conditional description language (CDL), a formalized language composed of predicates specifically designed to construct geometric conditions, to represent both the problem description (natural text) and solid diagrams (visual image). In the reasoning step, we leverage the formal CDL and the theorem bank to perform relational inference and algebraic computation, generating strictly correct, verifiable, and human-readable reasoning processes. Notably, our proposed Hilbert-Geo is also applicable to plane geometry. To advance geometric reasoning, we curate two expert-annotated datasets, SolidFGeo2k and PlaneFGeo3k, which are furnished with geometric formal language annotations, solutions, and answers. Extensive experiments show that our proposed method achieves state-of-the-art (SOTA) performance of 77.3% on SolidFGeo2k and 84.1% on MathVerse-Solid (a small subset of MathVerse dedicated to solid geometry), substantially outperforming leading MLLMs such as Gemini-2.5-pro (54.2% on SolidFGeo2k) and GPT-5 (62.9% on MathVerse-Solid). In addition, our method achieves SOTA accuracy of 80.2% on PlaneFGeo3k, demonstrating the generality of Hilbert-Geo in geometric reasoning. Our code and datasets are released at https://github.com/PremiLab-Math/Hilbert-Geo.

10.
arXiv (CS.CV) 2026-06-16

Unified Multimodal Model for Brain MRI Imputation and Understanding

Multimodal large language models (MLLMs) hold great potential for medicine, as they inherit knowledge from LLMs and allow multiple data modalities to be integrated, analysed and interpreted in natural language. However, the field of medical MLLMs is constrained by non-trivial challenges, notably the scarcity of high-quality training data and the frequent occurrence of missing data in the real-world clinical setting. Here, we propose a novel unified multimodal model, UniBrain, for brain magnetic resonance image (MRI) analysis. To address potential missing brain MRI modalities, we employ a unified training strategy to perform joint imaging modality imputation and brain image understanding. During training, an interleaved and description-enriched data flow is constructed to train the model in an autoregressive manner, enabling medical reasoning with generated multimodal data. A self-alignment strategy is introduced to leverage dense image embeddings to learn fine-grained anatomical features without requiring detailed image captions. Furthermore, we propose a dynamic hidden state mechanism to alleviate the exposure bias during long-context multimodal inference. Extensive experiments on a multi-disease brain MRI dataset demonstrate that UniBrain achieves high performance for brain image imputation, understanding, and disease diagnosis under various extents of modality incompleteness.

11.
arXiv (CS.CV) 2026-06-18

CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator

Conditional image editing aims to modify a source image according to textual prompts and optional reference guidance. Such editing is crucial in scenarios requiring strict structural control (e.g., anomaly insertion in driving scenes and complex human pose transformation). Despite recent advances in large-scale editing models (e.g., Seedream and Nano Banana), most approaches rely on single-step generation. This paradigm often lacks explicit quality control, may introduce excessive deviation from the original image, and frequently produces structural artifacts or environment-inconsistent modifications, typically requiring manual prompt tuning to achieve acceptable results. We propose CAMEO, a structured multi-agent framework that reformulates conditional editing as a quality-aware, feedback-driven process rather than a one-shot generation task. CAMEO decomposes editing into coordinated stages of planning, structured prompting, hypothesis generation, and adaptive reference grounding, where external guidance is invoked only when task complexity requires it. To overcome the lack of intrinsic quality control in existing methods, evaluation is embedded directly within the editing loop. Intermediate results are iteratively refined through structured feedback, forming a closed-loop process that progressively corrects structural and contextual inconsistencies. We evaluate CAMEO on anomaly insertion and human pose switching tasks. Across multiple strong editing backbones and independent evaluation models, CAMEO consistently achieves a 20% higher win rate on average compared to multiple state-of-the-art models, demonstrating improved robustness, controllability, and structural reliability in conditional image editing.

12.
arXiv (CS.AI) 2026-06-11

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

arXiv:2504.21072v3. The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat. Across six state-of-the-art erasure methods, including robust ones that explicitly search for alternative representations of the target concept, EEB consistently exposes harmful content: up to 82% success against celebrity-identity unlearning, up to 94% for object erasure, and up to 16 times amplification of explicit-content exposure. While EEB uncovers a blind spot in current erasure methods, it also provides a diagnostic tool for stress-testing future concept erasure techniques.

13.
arXiv (CS.CV) 2026-06-12

Emerging Flexible Designs for Geospatial Multimodal Foundation Models

Foundation models are rapidly transforming Earth observation by enabling scalable pretraining across diverse unlabeled geospatial modalities. However, their architectural diversity, ranging from encoder-only to encoder-decoder and masked autoencoding paradigms, makes it challenging to assess performance trade-offs in a consistent manner. In this work, we present an apples-to-apples comparison of leading FM architectures designed for geospatial multimodal reasoning, with a particular focus on flexibility across varied spectral band configurations. We standardize pretraining using identical self-supervised learning objectives and training datasets, and evaluate all models under consistent parameterization on the GEOBench benchmark across classification and segmentation tasks. Our results offer new insights into the design trade-offs between model flexibility, modality alignment, and downstream task performance. By highlighting architectural strengths and limitations under controlled conditions, this study provides practical guidance for building next-generation geospatial foundation models capable of robust multimodal reasoning.

14.
PLOS Medicine 2026-06-18

Association between initial benzodiazepine prescribing patterns and time to benzodiazepine discontinuation: A population-based retrospective cohort study

by Nikki Bozinoff, Tanya S. Hauck, Robert A. Kleinman, Matthew E. Sloan, Beth A. Sproule, Simone N. Vigod, Jennifer Wyman, Priscila Pequeno, Tara Gomes

Background: Long-term benzodiazepine use has been associated with increased risk of morbidity and mortality. Preventing long-term use through safer prescribing practices has received little attention to date. We sought to better understand associations between initial prescription characteristics and duration of benzodiazepine use. Methods and findings: This was a retrospective population-based cohort study of 1,820,808 adults in Ontario with incident benzodiazepine prescriptions between January 1, 2013 and December 31, 2020, with follow-up to December 31, 2021. The primary exposure was duration of the index prescription (≤7 days, the referent group; 8–14 days; 15–30 days; or >30 days). Secondary exposures were: (a) duration of action of index benzodiazepine(s) prescription (short-acting, long-acting or both); (b) number of benzodiazepines dispensed on index (1 or 2+); and (c) mean daily dose of the index prescription in Diazepam Milligram Equivalents (DMEs). The primary outcome was time to benzodiazepine discontinuation in days. Multivariable models were adjusted for age, sex, anxiety, insomnia, and substance use disorders as well as other important comorbidities and socio-demographic characteristics. The median age at index was 53 years (Interquartile Range (IQR) 38–67), and 62.6% were women. The median time to discontinuation in women was 16 days (IQR: 6–29) while the median time to discontinuation in men was 19 days (IQR: 6–29). Lorazepam was the most commonly prescribed benzodiazepine on index (63.9%), followed by clonazepam (17.3%) and diazepam (5.8%). In multivariable Cox Proportional Hazards Models, longer index prescriptions were associated with a lower likelihood of benzodiazepine discontinuation (adjusted Hazard Ratio (aHR) 0.54 (95% Confidence Interval (CI) [0.54,0.54]) for 8–14 days; aHR 0.26 (95% CI [0.25,0.26]) for 15–30 days; and aHR 0.14 (95% CI [0.14,0.14]) for >30 days, each compared to ≤7 days). Being prescribed two or more benzodiazepines versus one was also associated with a reduced likelihood of discontinuation (aHR 0.59 (95% CI [0.57,0.61])), as was being prescribed long-acting benzodiazepines (aHR 0.80 (95% CI [0.80,0.80])) or a combination of short- and long-acting benzodiazepines (aHR 0.84 (95% CI [0.80,0.88])) versus short-acting benzodiazepines alone. Mean daily doses of >5 to ≤10 DME and >10 to ≤20 DME were associated with an increased likelihood of discontinuation (aHR 1.03 (95% CI [1.03,1.03]); aHR 1.03 (95% CI [1.03,1.04])), whereas doses >20 DME were associated with a reduced likelihood of discontinuation (aHR 0.98 (95% CI [0.97,0.98])) compared with ≤5 DME. Findings may be subject to bias from unmeasured confounding. Conclusion: This large population-based cohort study found that prescribing shorter courses of benzodiazepines, use of a single benzodiazepine, and use of a short-acting agent were associated with reduced likelihood of long-term benzodiazepine use. Findings suggest that simple changes to prescribing practices could reduce prolonged benzodiazepine use and the morbidity and mortality associated with long-term use of these medications.

15.
Nature (Science) 2026-06-10

Human migration has surged since 2000 — these maps reveal where people are going

Modelling with artificial-intelligence tools has filled gaps in migration data, revealing detailed global population movements from 1990 to 2023.

16.
medRxiv (Medicine) 2026-06-12

Sociodemographic and health correlates of reimbursement authorizations for cannabis for medical purposes in Canadian veterans: A cross-sectional study linking the Life After Services Studies 2019 and Health Administrative Databases

Background Evidence on factors associated with cannabis for medical purposes (CMP) authorizations among Veterans Affairs Canada (VAC) clients remains limited and inconsistent, particularly concerning mental health and posttraumatic stress disorder (PTSD), a leading indication for use. We investigated demographic, clinical and service characteristics associated with VAC authorizations for CMP reimbursement. Method We linked VAC administrative CMP program data with responses from the 2019 Life After Services Studies cross-sectional survey of Regular Force veterans released between 1998 and 2018. Multivariable logistic regressions examined associations between CMP reimbursement (yes/no) and demographic, clinical and well-being factors, with analyses stratified by PTSD status. Results Among 1,289 respondents (weighted n=33,131), 18.4% were authorized for CMP reimbursement. Younger age (

17.
arXiv (CS.AI) 2026-06-18

Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times

arXiv:2606.19199v1. The recent growth of EV adoption poses challenges for power systems, including increased peak demand and potential grid instability. Smart control of EV charging – e.g., based on reinforcement learning (RL) – can alleviate these issues by learning temporal and contextual patterns from historical data. Yet, in real-world scenarios, key features, such as departure time, are often unavailable. This, in turn, makes it harder for an RL agent to learn and execute an effective charging policy. To mitigate this uncertainty, a trained forecaster can approximate the unknown features from available data. However, since these forecasting models are typically trained for accuracy (rather than their impact on a downstream agent's decision quality), their errors may propagate and hinder the overall performance of a controller that is using the forecasts. To avoid this, we propose a decision-focused RL (DF-RL) framework in which the forecaster is trained end-to-end, i.e., with feedback from the charging policy actions taken by the RL agent. Such joint training of both the forecaster and controller ultimately results in higher-quality actions: our proposed DF-RL method yields superior charging decisions compared to other baselines, achieving up to a 14% improvement in total reward and a 55% reduction of unsupplied energy (i.e., charging that failed to happen because the EV already left), relative to the RL method without departure time forecasting.

18.
arXiv (CS.CL) 2026-06-16

Rhythm of the Deep: A Computational-Linguistic Test of Duality of Patterning in Sperm Whale Codas

Human language has often been described as combining structure at two levels: lower-level units combine into larger units, which then combine into larger sequences. We test for this design feature, duality of patterning, in sperm whale codas using 1,483 codas from the Dominica Sperm Whale Project. Because acoustic similarity can imitate symbolic structure, we treat the problem as computational-linguistic structure discovery from continuous audio rather than as a direct claim about language or meaning. We use a consensus of frozen audio encoders, held-out structural tests, per-statistic nulls, and acoustic-null recoverability gates. The evidence supports a narrow two-tier architecture. At the lower tier, clicks compose into codas not by a stable ordered rule, but by which clicks are present together with their inter-click rhythm. At the upper tier, coda tokens show bout-level sequential dependence, with an NSB second-order transfer-entropy lift of 0.132 bits (p = 0.002). Under tempo scaling, encoder-derived click identity is strongly rate-bound, while coda identity remains substantially more stable, yielding a measurable abstraction gradient across the click-to-coda step. Rhythm-only baselines recover substantial lower-tier structure but fail to reproduce the upper-tier sequential-dependence signal. We do not claim language, semantics, perception, or human-like phonemes. Instead, we report representation-level evidence for a duality-of-patterning-like architecture whose lower tier is rhythmic rather than segmental, and provide a portable null-controlled framework for testing combinatorial structure in induced acoustic token systems.

19.
arXiv (CS.CL) 2026-06-12

Causal Inference with Generative Artificial Intelligence: Application to Texts as Treatments

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed GPI methodology to the settings in which the treatment feature is based on human perception. The GPI is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.

21.
bioRxiv (Bioinfo) 2026-06-11

TifBERT: a self-supervised foundation model for normalization-robust bulk RNA-seq representation learning

Bulk RNA sequencing remains central to translational genomics, yet foundation-model development has largely focused on single-cell data. Existing transformer approaches for bulk RNA-seq often rely on expression discretization, numerical reconstruction, external gene embeddings, or restricted gene sets, limiting robustness across normalization schemes and cohorts. Here, we introduce TifBERT, a self-supervised framework for full-transcriptome bulk RNA-seq representation learning. TifBERT converts each unordered expression profile into a sample-specific gene sequence using term frequency-inverse document frequency (TF-IDF) ordering, prioritizing genes that are both highly expressed within a sample and selectively expressed across the cohort. It is then pretrained using masked gene modeling, predicting gene identities from transcriptomic context rather than reconstructing expression values. Pretrained on harmonized TCGA Pan-Cancer data spanning five RNA-seq normalization schemes, TifBERT learns contextual representations across approximately 10,000 genes without expression binning, landmark-gene restriction, or external biological embeddings. Across 33 TCGA cancer types, TifBERT achieved 90.83% accuracy, 0.996 macro AUC-ROC, and 0.903 MCC. It also captured pathway-level biology, achieving mean sample-wise and pathway-wise Pearson correlations of 0.754 and 0.762 across 1,387 PARADIGM pathway activities. Independent evaluation on GTEx healthy tissues showed preservation of tissue-level transcriptomic structure without retraining. In comparison with existing models, TifBERT achieves competitive subtype discrimination with substantially greater stability and produces markedly richer embedding geometry (effective rank 95.6 versus 6.3), without requiring expression discretization or in-distribution pretraining exposure. Together, TifBERT provides a scalable, normalization-independent foundation model for reusable bulk transcriptomic representation learning.
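
The TF-IDF ordering step can be pictured with a small sketch. The abstract does not give the exact weighting, so everything here is an assumption made for illustration: term frequency is taken as a gene's share of a sample's total expression, document frequency counts the samples in which the gene is expressed above a threshold, and the IDF uses a common smoothed form. The function name `tfidf_gene_order` and the toy cohort are hypothetical and are not TifBERT's implementation.

```python
import math


def tfidf_gene_order(expression, threshold=0.0):
    """Order each sample's expressed genes by a TF-IDF style score.

    expression: dict mapping sample id -> dict mapping gene -> expression value.
    Returns a dict mapping sample id -> list of genes, highest score first.
    """
    n_samples = len(expression)

    # Document frequency: in how many samples is each gene expressed?
    doc_freq = {}
    for profile in expression.values():
        for gene, value in profile.items():
            if value > threshold:
                doc_freq[gene] = doc_freq.get(gene, 0) + 1

    orders = {}
    for sample, profile in expression.items():
        total = sum(profile.values()) or 1.0
        scores = {}
        for gene, value in profile.items():
            if value <= threshold:
                continue  # unexpressed genes are left out of the sequence
            tf = value / total
            # Smoothed IDF (an assumed variant): rarer genes get more weight.
            idf = math.log((1 + n_samples) / (1 + doc_freq[gene])) + 1.0
            scores[gene] = tf * idf
        # Sort by descending score, breaking ties by gene name.
        orders[sample] = sorted(scores, key=lambda g: (-scores[g], g))
    return orders


# Toy cohort: gene A is expressed everywhere, B and C only in one sample each.
toy = {
    "s1": {"A": 10.0, "B": 8.0, "C": 0.0},
    "s2": {"A": 10.0, "B": 0.0, "C": 8.0},
    "s3": {"A": 10.0, "B": 0.0, "C": 0.0},
}
orders = tfidf_gene_order(toy)
```

In this toy cohort the sample-specific genes B and C move ahead of the ubiquitous gene A even though A has the higher raw expression, which is the prioritization the abstract describes.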

22.
medRxiv (Medicine) 2026-06-19

"Us with them": Co-designing a caesarean section consent and debriefing intervention in West Cameroon

Background: Women-centred maternity care is a rights issue that determines the use of services. Such care ensures responsiveness to women's needs, which is enacted through shared decision-making, review and response. In the West Region of Cameroon, informed consent (IC) and debriefing for caesarean section (c-section) have been shown to be suboptimal or absent. This paper describes the participatory design of a hospital-based quality-improvement intervention. Methods: From February to May 2025, we conducted a co-design process with three groups of stakeholders: 59 post c-section women and community representatives, 78 frontline c-section providers, and 29 directors of public and private hospitals. We followed four phases: planning, conducting, evaluating, and reporting. The conduct phase comprised five all-day workshops with post c-section women and community representatives, followed by five all-day workshops with the c-section providers. Finally, we held an 11th workshop with the hospital directors to scrutinize suggested interventions, evaluate their feasibility, and establish a consensus on their components. We described the intervention using the TIDieR (Template for Intervention Description and Replication) checklist. We documented the co-design process, using open-ended narratives to delineate interventions, and carried out real-time synthesis on visual aids (whiteboards and flipcharts). Intervention feasibility was quantified using a structured ad hoc matrix, while insights on facilitators and barriers were captured through qualitative free-text entries. We coupled data collection with constant comparison and triangulation through contemporaneous field notes, photographic documentation, and thematic mapping of stakeholders' perceptions and interactive dynamics. Results: Participants' perspectives on the co-design were positive, and their motivation was very high, although fewer than 50% reported previous involvement in co-design processes. More than 80% of participants rated the co-design process as either good or very good. The final intervention comprised four components: (i) an in-service training; (ii) a standard operating procedure including a harmonised consent form and debriefing checklist; (iii) systematic supportive supervision, monitoring and evaluation; and (iv) a routine clinical audit. Each group of stakeholders upheld specific dimensions of the consent and debrief intervention. Post c-section women and community members emphasized emotional support, written discharge advice after debriefing, and zero tolerance of suboptimal consent and debriefing practices. Frontline c-section providers insisted on robust documentation for medico-legal protection. Hospital directors emphasized capacity-building and cultural friendliness. All the groups supported women's autonomous decision-making. The intervention feasibility was rated high or very high by hospital directors except for the financial, infrastructural and technical domains. Conclusion: This co-design process yielded a context-specific, multi-component intervention that was well accepted and deemed feasible across stakeholders. It provides a methodological approach to strengthening informed consent and debriefing as core elements of women-centred, accountable maternity care, and warrants implementation.

23.
arXiv (CS.AI) 2026-06-19

Policy-aware Vector Search: A Vision for Fine Grained Access Control in Vector Databases

arXiv:2606.19803v1 Announce Type: cross Abstract: Vector databases are increasingly used in security-sensitive contexts with Retrieval Augmented Generation and organizational AI pipelines; however, their security capabilities remain limited. Specifically, Fine-grained Access Control (FGAC), which is required to ensure that data access adheres to user-specific policies, is not fully supported in modern vector databases. Unlike relational databases, vector databases combine structured and unstructured attributes to provide semantic, approximate query results, which complicates FGAC implementation. This creates an inherent tension between enforcing FGAC policies correctly, achieving high approximate nearest neighbor (ANN) search recall, and maintaining low query latency. In this paper, we present a vision for Policy-aware Vector Search by formalizing the FGAC policy model in vector databases as well as the enforcement problem. We compare various enforcement strategies, present preliminary findings, and identify key open challenges for future research in policy-aware vector search.
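
The tension between policy correctness and recall can be made concrete with a small Python sketch. The abstract does not name the enforcement strategies it compares, so the two shown here, pre-filtering (apply the policy, then search) and post-filtering (search, then apply the policy), are commonly discussed options chosen for illustration, not the paper's method. The role-based policy model, the brute-force cosine search standing in for an ANN index, and all names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, docs, k):
    """Exact nearest-neighbour search (a stand-in for an ANN index)."""
    return sorted(docs, key=lambda d: -cosine(query, d["vec"]))[:k]


def allowed(user_roles, doc):
    """Hypothetical policy: a user may read a document if they share a role."""
    return bool(user_roles & doc["roles"])


def pre_filter_search(query, docs, user_roles, k):
    """Apply the policy first, then search only the permitted documents."""
    visible = [d for d in docs if allowed(user_roles, d)]
    return top_k(query, visible, k)


def post_filter_search(query, docs, user_roles, k):
    """Search everything, then drop results the user may not see."""
    return [d for d in top_k(query, docs, k) if allowed(user_roles, d)]


docs = [
    {"id": "a", "vec": [1.0, 0.0], "roles": {"hr"}},
    {"id": "b", "vec": [0.9, 0.1], "roles": {"hr"}},
    {"id": "c", "vec": [0.5, 0.5], "roles": {"eng"}},
    {"id": "d", "vec": [0.0, 1.0], "roles": {"eng"}},
]
query, roles = [1.0, 0.0], {"eng"}

print([d["id"] for d in pre_filter_search(query, docs, roles, k=2)])   # ['c', 'd']
print([d["id"] for d in post_filter_search(query, docs, roles, k=2)])  # []
```

Both variants return only permitted documents, but post-filtering returns nothing here because the two nearest vectors belong to another role. Pre-filtering avoids that loss but must restrict the search space per user, which is harder to do efficiently once a real ANN index replaces the exhaustive scan.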

24.
arXiv (CS.CL) 2026-06-19

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome supervision provides weak learning signals and leaves latent trajectories prone to semantic drift. In this work, we analyze Latent CoT from an information-theoretic perspective and identify this failure as a dual collapse: gradient attenuation along the optimization path and representational drift in the latent space. We further decompose process supervision into two complementary dimensions: Trajectory Supervision, which injects dense stepwise reasoning signals, and Space Supervision, which preserves the semantic structure of the latent manifold. Our analysis shows that rigid geometric compression can collapse the reasoning space, whereas generative reconstruction provides a more flexible semantic anchor that better preserves information capacity. To measure these effects, we introduce the Unified Latent Probe (ULP), which quantifies the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal a clear Information-Performance Binding: reasoning accuracy depends on the information fidelity preserved in the latent chain. These findings provide a principled framework for latent reasoning supervision and suggest shifting from geometric imitation toward mutual information maximization. Our code is available at https://github.com/EIT-NLP/Supervision-in-Latent-CoT.

25.
arXiv (CS.LG) 2026-06-16

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

arXiv:2606.16612v1 Announce Type: cross Abstract: The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions, struggling to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global structure features, and their combinations, we present their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest emerging generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
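
As a rough illustration of how feature-specific experts might be combined by a Mixture-of-Experts module, the Python sketch below weights per-expert scores with a softmax gate. This is a generic MoE formulation, not Sofia's architecture: the abstract gives no details of the gating mechanism, and the function names, gate logits, and expert scores are hypothetical. In a trained model the gate logits would be produced from the input rather than fixed by hand.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(expert_scores, gate_logits):
    """Weight each expert's score by its softmax gate and sum the result.

    expert_scores : one "synthetic" probability per feature expert
    gate_logits   : one gate logit per expert (input-dependent in a real model)
    Returns (combined_score, gate_weights).
    """
    weights = softmax(gate_logits)
    combined = sum(w * s for w, s in zip(weights, expert_scores))
    return combined, weights


# Hypothetical experts: vocal, audio-effect, global-structure.
scores = [0.9, 0.6, 0.3]
uniform_score, _ = moe_combine(scores, [0.0, 0.0, 0.0])
vocal_score, vocal_weights = moe_combine(scores, [2.0, 0.0, 0.0])
print(round(uniform_score, 3))  # 0.6, the plain average of the three experts
print(vocal_score > uniform_score)  # True: the gate now favours the vocal expert
```

With equal gate logits the module reduces to averaging the experts; raising one logit shifts the combined decision toward that expert, which is the sense in which the gating is adaptive.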