Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-17

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

02.
arXiv (CS.AI) 2026-06-24

BioMedArena: An Open-source Toolkit for Building and Evaluating Biomedical Deep Research Agents

arXiv:2605.06177v2 Announce Type: replace Abstract: Reproducing and comparing deep research agents today is hard: the same backbone evaluated on the same benchmark can report different accuracies across papers because the harness and tool registry differ, and integrating a new model into a comparable evaluation surface costs weeks of model-specific engineering. These are symptoms of a broader reproducibility problem in deep research agent research. Here, we introduce BioMedArena, an open-source toolkit that addresses this reproducibility gap and provides an arena for comparing deep research agents under a shared evaluation environment. BioMedArena decouples six layers of biomedical agent evaluation – benchmark loading, tool exposure, tool selection, harness mode, context management, and scoring – and exposes 166 biomedical benchmarks and 75 biomedical tools across 9 functional families. Adding a new model, benchmark, or tool can be accomplished with a few-line provider adapter. Beyond evaluation infrastructure, BioMedArena ships a library of high-quality reference components: 6 agent harnesses (including our proposed Mutual-Evolve) and 6 context-management strategies, any of which can be equipped on any backbone. Equipping these components substantially improves all 12 backbones; on each of 8 representative biomedical benchmarks, the best equipped backbone surpasses prior state-of-the-art (SOTA), by 15.01 percentage points on average. The toolkit, configurations, and per-task traces are available at https://github.com/AI-in-Health/BioMedArena.

03.
arXiv (CS.CL) 2026-06-15

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

AI coding agents have rapidly transformed software engineering, powering widely used interactive coding assistants. Despite their interactive real-world use, existing benchmarks evaluate them as fully-autonomous systems. In this work, we introduce Dialogue SWE-Bench, an automatic benchmark dataset for evaluating the ability of coding agents to resolve real-world software engineering problems through dialogue with a user. We design a novel, persona-grounded user simulator to support our task evaluation, and augment our task evaluation with automatic evaluations of dialogue quality. We also propose a new schema-guided agent, aimed at improving the dialogue capabilities of off-the-shelf coding agents, which improves over strong baselines by 3-14%. Our results indicate that better coding models do not always correspond to better dialogue models, suggesting that dialogue capability is a distinct and currently understudied dimension of coding agent performance.

04.
arXiv (CS.LG) 2026-06-19

Pose6DAug: Physically Plausible Multi-view Object Swapping for Robot Data Augmentation

arXiv:2606.20118v1 Announce Type: cross Abstract: Vision-language-action (VLA) policies have shown strong potential for general-purpose manipulation, yet they often fail on novel, out-of-distribution objects whose appearance or geometry deviates from the training distribution. The standard remedy is to collect multi-view teleoperation data for every failure case, but this scales poorly in both cost and time. We introduce Pose6DAug, a failure-driven data augmentation framework that turns a policy's own successful episodes into targeted demonstrations for its failure modes, without any new data collection. Our key insight is that each successful episode already encodes a physically valid action trajectory together with calibrated multi-view observations. By swapping only the manipulated object while preserving this trajectory, we obtain new and physically grounded demonstrations. However, naive 2D video editing breaks multi-view consistency and physical plausibility, particularly under heavy occlusion and egocentric viewpoints. Our method instead operates directly in 3D, anchoring the target object with an explicit mesh driven by a temporally coherent 6D pose trajectory, ensuring geometrically consistent renderings across all camera views. Fine-tuning a VLA on data augmented by our method improves success rates by 16.5% relative to the state-of-the-art baseline on novel objects, while preserving in-distribution performance. These results show that multi-view and physically consistent augmentation is a practical path to scalable VLA generalization.

05.
arXiv (CS.AI) 2026-06-15

AFFORDANCE20Q: Evaluating Affordance Reasoning from Physical Properties

arXiv:2606.14240v1 Announce Type: new Abstract: Affordance reasoning, the inference of an object's action possibilities from its physical properties (e.g., shape and material), is fundamental to human physical understanding and increasingly critical for Large Language Models (LLMs). However, existing affordance benchmarks largely expose explicit object identities in the evaluation setup, allowing models to rely on memorized object-affordance mappings rather than reasoning over physical properties. To address this gap, we introduce Affordance20Q, a novel affordance reasoning benchmark formulated as a 20-Questions game without exposing the object's identity. In each game, the model identifies a hidden object's affordance from a candidate set by asking yes/no questions about its physical properties. Affordance20Q comprises 1,009 games over 454 objects and 59 affordances, all manually filtered, refined, and annotated. We conduct comprehensive experiments with 15 state-of-the-art LLMs and find a substantial gap (~20 points) compared to human performance. A KL-based information-gain (IG) analysis further shows that models fail to ask discriminating questions as the game progresses. To close the gap, we develop KB-Anchored Rule Induction (KARI), a pipeline based on LLMs that generates affordance rules grounded in evidence from knowledge bases (KBs). KARI improves open-source LLMs by up to 15.2 points, while the limited coverage of KBs hinders further gains. We release all our code and data at https://github.com/1171-jpg/Affordance20Q.git

06.
arXiv (CS.CL) 2026-06-18

CEO-Bench: Can Agents Play the Long Game?

Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3) adapting to a changing world; (4) orchestrating multiple moving parts toward a coherent goal. We introduce CEO-Bench, which evaluates these capabilities together by simulating a representative real-world task: operating a startup for 500 days. An agent manages pricing, marketing, budgeting, and many other aspects of a fictional company through a programmable Python interface, operating in the same environment and facing the same challenges as a human CEO. Success demands analyzing noisy, interconnected business databases, translating signals into sound strategy, and coordinating many decisions with programming. The strongest agents write sophisticated code that simulates customer cohorts to forecast future cash and mines negotiation history to uncover hidden customer preferences. Even so, most state-of-the-art models struggle in this environment. Only Claude Opus 4.8 and GPT-5.5 finish above the $1M starting balance, and neither consistently turns a profit. CEO-Bench takes a first step toward measuring the intelligence required to drive sustained, adaptive progress over time.

07.
PLOS Medicine 2026-06-12

Comparison of count-based and clustering definitions of multimorbidity and their association with prevalence of multimorbidity, health profiles, and mortality: A cohort study of UK Biobank participants

by Gabriella C. Silva, Aurore Fayosse, Louis Jacob, Séverine Sabia, Archana Singh-Manoux, Benjamin Landré Background Multimorbidity, the presence of several chronic conditions, is linked to higher mortality and healthcare use and thus poses a major challenge for aging populations. While most studies rely on simple counts of conditions, clustering approaches have been proposed to describe patterns of co-occurring diseases. We aimed to evaluate the extent to which these methodological choices influence prevalence and association with health profiles and mortality. Methods and findings Using UK Biobank baseline data (n = 474,397), collected between 2006 and 2010, we compared six count-based definitions of multimorbidity based on different condition lists (extended, most prevalent, or body systems) and thresholds (≥2 versus ≥3 conditions). We also applied a clustering analysis to characterize subtypes of multimorbidity among participants with at least two chronic conditions. We compared prevalence and associations with concurrent health outcomes (polypharmacy, self-rated health, frailty, falls, surgery, chronic pain), blood-based measures (C-reactive protein, Cystatin-C, HDL, LDL Cholesterol, IGF-1), and 3- and 10-year mortality risks. Analyses were undertaken separately in men and women using multivariable regression models adjusted for sociodemographic characteristics and body mass index. Multimorbidity prevalence ranged from 1.0% (cluster-based) to 35.3% (count-based). Count-based definitions using lists with more conditions yielded higher prevalence. Higher thresholds identified more severe health profiles on all measured health outcomes, blood-based measures, but not higher mortality risks. Associations with blood-based measures were more pronounced using clustering, with the highest differences from the standard definition distributed across clusters. Odds ratios for 3-year mortality ranged from 1.44 [1.26; 1.64] to 4.60 [3.73; 5.62] for men and 1.35 [1.07; 1.69] to 3.83 [2.78; 5.14] for women. For 10-year mortality, they ranged from 1.42 [1.34; 1.50] to 3.86 [3.46; 4.30] in men and 1.29 [1.21; 1.39] to 3.33 [2.93; 3.77] for women, with clustering identifying groups with low prevalence and high mortality risks. Findings should be interpreted in light of the selected nature of the UK Biobank cohort and the cross-sectional assessment of several health indicators. Conclusion Operational definitions of multimorbidity substantially influence prevalence estimates, while associations with mortality appear more robust across count-based approaches. Clustering analyses provide complementary insights into heterogeneity within multimorbid populations. Future translational studies are warranted to determine how multimorbidity definitions can be optimized to ultimately improve clinical management and health outcomes in practice.

08.
arXiv (CS.AI) 2026-06-12

SciR: A Controllable Benchmark for Scientific Reasoning in LLMs

arXiv:2606.13020v1 Announce Type: new Abstract: Three paradigmatic forms of inference recur across scientific reasoning: deduction, induction, and causal abduction. Reliably evaluating LLMs on these in scientific settings is currently out of reach: scientific benchmarks built on human annotations are costly and lack mechanistic ground truth, while synthetic logical-reasoning benchmarks do not resemble real scientific documents. We introduce SciR, a benchmark that combines multi-paradigm reasoning with controllable scientific rendering, anchored on three paradigmatic scientific problems. Tasks are generated from formal objects (deduction tree, inductive rule hypothesis, causal graph) to guarantee verifiable answers, then rendered into multi-document scientific discourse via per-track domain-tuned genres. The construction lets us independently vary two difficulty axes: how hard it is to extract the key information needed for inference, and how hard the principled inference itself is. We test six models. Both axes hurt every model, and their effects compound. The rendering even hurts neurosymbolic pipelines, which hand inference to a verified solver. The two axes yield a per-model extraction-vs-inference profile: for instance, reasoning models like deepseek-r1 mostly surpass non-reasoning instruct models on the inference axis. To our knowledge, SciR is the first multi-paradigm scientific-reasoning benchmark with parametric control on both extraction and inference difficulty.

09.
arXiv (CS.AI) 2026-06-12

Algorithmic Constitutionalism

arXiv:2606.12437v1 Announce Type: cross Abstract: The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

10.
arXiv (CS.CV) 2026-06-15

Aligned but Stereotypical? How System Prompts Shape Demographic Bias in LLM-Based Text-to-Image Models

Text-to-image (T2I) systems increasingly rely on Large Language Model (LLM)-based text conditioning to interpret and expand user prompts. While this improves prompt understanding and text-image alignment, we find that it can also introduce implicit demographic assumptions, even when demographic attributes are unspecified. To systematically investigate this behavior across varying levels of prompt ambiguity and complexity, we construct a comprehensive benchmark covering diverse prompt settings. Evaluations on eight recent T2I models show that LLM-based systems consistently exhibit stronger demographic skew than non-LLM-based baselines. We further analyze system prompts, a component unique to LLM-based T2I systems that guides prompt interpretation and expansion. Our analyses show that these instructions strongly influence text embeddings, which subsequently leads to biased image generations. Motivated by these findings, we propose FairPro, a training-free debiasing framework that adaptively generates fairness-aware instructions while preserving user intent. Experiments demonstrate that FairPro substantially reduces demographic disparities while maintaining prompt fidelity.

11.
arXiv (quant-ph) 2026-06-16

Atom–photon Entanglement with a Single Trapped Cesium Atom

arXiv:2605.28968v2 Announce Type: replace Abstract: We demonstrate atom–photon entanglement using a single cesium atom trapped in an optical tweezer. Entanglement is generated by resonant excitation and subsequent spontaneous decay, which entangles the atomic Zeeman state with photon polarization. The photon is collected with a high numerical aperture objective (NA = 0.55) and coupled into a single-mode fiber, enabling atom photon measurements and measurement of the Bell-state fidelity. We obtain raw entanglement fidelity of ${\mathcal F} = 0.942(16)$ and inferred fidelity of ${\mathcal F}_inf = 0.962(26)$ after correcting independently characterized atom measurement errors. Compared with related free-space experiments using $^{87}$Rb, the multilevel structure of the relevant excited state in $^{133}$Cs requires the use of a single short excitation pulse in each entanglement attempt in order to suppress unwanted re-excitation. These results establish a free-space Cs atom–photon interface and provide a step toward dual-species Rb–Cs quantum networking.

12.
arXiv (CS.LG) 2026-06-24

Experiments with Optimal Model Trees

arXiv:2503.12902v4 Announce Type: replace Abstract: Model trees provide an appealing way to perform interpretable machine learning for both classification and regression problems. In contrast to ``classic'' decision trees with constant values in their leaves, model trees can use linear combinations of predictor variables in their leaf nodes to form predictions, which can help achieve higher accuracy and smaller trees. Typical algorithms for learning model trees from training data work in a greedy fashion, growing the tree in a top-down manner by recursively splitting the data into smaller and smaller subsets. Crucially, the selected splits are only locally optimal, potentially rendering the tree overly complex and less accurate than a tree whose structure is globally optimal for the training data. In this paper, we empirically investigate the effect of constructing globally optimal model trees for classification and regression with linear support vector machines at the leaf nodes. To this end, we present mixed-integer linear programming formulations to learn optimal trees, compute such trees for a large collection of benchmark data sets, and compare their performance against greedily grown model trees in terms of interpretability and accuracy. We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines. Our results show that optimal model trees can achieve competitive accuracy with very small trees. We also investigate the effect on the accuracy of replacing axis-parallel splits with multivariate ones, foregoing interpretability while potentially obtaining greater accuracy.

13.
Science (Express) 2026-05-21

DNA polymerization activates RNA cleavage of a reverse transcriptase–like antiviral enzyme | Science

Authors: Unknown Author

Defense-associated reverse transcriptases (DRTs) transcribe noncoding RNAs (ncRNAs) for antiviral defense, but the mechanisms of ncRNA-independent DRTs remain unclear. In this work, we show that a single DRT4 mediates RNA-targeting antiphage defense by integrating DNA polymerase, exonuclease, and RNA endonuclease activities. First, through an equilibrium between its DNA polymerase and exonuclease activities, DRT4 senses phage infection, as elevated dNTP levels shift the equilibrium toward polymerase activity, thereby promoting protein-primed single-stranded DNA (ssDNA) synthesis. Second, ssDNA of sufficient length, phage DNA-binding proteins, and deoxyguanosine triphosphate collectively activate an unusual RNA endonuclease activity of DRT4, excising 3′–guanosine monophosphate from both phage and host RNA to terminate infection. These findings reveal a distinctive immune strategy combining nucleic acid synthesis and degradation, expanding the functional landscape of DRTs for new DNA- and RNA-processing technologies.

14.
arXiv (quant-ph) 2026-06-16

Non-Gaussian Phase Transition and Cascade of Instabilities in the Dissipative Quantum Rabi Model

arXiv:2507.07092v3 Announce Type: replace Abstract: The open quantum Rabi model describes a two-level system coupled to a harmonic oscillator. A Gaussian phase transition for the nonequilibrium steady states has been predicted when the bosonic mode is soft and subject to damping. We show that oscillator dephasing is a relevant perturbation, which leads to a non-Gaussian phase transition and an intriguing cascade of instabilities for $k$-th order bosonic operators, as well as a jump in the steady-state qubit polarization. For the soft-mode limit, the equations of motion form a closed hierarchy and spectral properties can be efficiently studied. To this purpose, we establish a fruitful connection to non-Hermitian Hamiltonians. The results for the phase diagram, stability boundaries, and relevant observables are based on mean-field analysis, exact diagonalization, perturbation theory, and Keldysh field theory.

15.
arXiv (CS.CV) 2026-06-16

Position: The Systemic Lack of Agency in Visual Reasoning

This paper argues that a systemic lack of Agency constrains the implicit reasoning capabilities of current Vision-Language Models (VLMs). Implicit reasoning refers to the ability to autonomously discover and utilize hidden visual evidence to bridge information gaps, rather than merely relying on explicitly specified targets. This capacity underlies human visual understanding and everyday reasoning. We argue that this limitation arises from a tendency to approach visual reasoning primarily as passive semantic retrieval, rather than as active, situated reasoning that depends on autonomous visual exploration. As a result, most existing benchmarks primarily assess Passive Capacity, leaving this aspect of reasoning largely unmeasured. To address this gap, we introduce the Visual Implicit Reasoning Diagnosing Benchmark (V-IRD), which targets this missing quadrant by requiring models to derive answers strictly through autonomous visual analysis. Our results show that, despite strong retrieval abilities, prominent VLMs struggle to utilize reference objects and to attend to visual evidence that requires self-directed inquiry. Simply put, strong semantic recognition does not equate to active visual exploration, revealing a critical gap in current VLMs. More information can be found at https://haoychen.github.io/Implicit-Reasoning/

16.
bioRxiv (Bioinfo) 2026-06-11

DLDN-Bench: A Benchmark Framework for Deep Learning de Novo Peptide Sequencing in Proteomics

De novo peptide sequencing is an essential approach for analyzing mass spectrometry data because it enables the identification of novel peptides without relying on protein sequence databases. Recent advances in deep learning have substantially improved the performance of de novo sequencing methods, but the rapid emergence of new models has led to heterogeneous evaluation practices and limited comparability. To address this, we introduce DLDN-Bench, a benchmark framework including a set of benchmark datasets derived from human muscle biopsy mass spectrometry data retrieved from PRIDE and annotated through consensus across multiple widely used database search engines. Using these datasets, we systematically benchmark recent deep learning-based de novo sequencing tools alongside traditional approaches. Performance is assessed using established metrics, including precision and coverage relative to a pseudo-ground truth defined by cross-engine agreement. To demonstrate the utility of DLDN-Bench, we benchmark four recent deep learning models and make all results publicly available. This benchmark framework provides a standardized basis for comparing state-of-the-art methods and offers an extensible resource for evaluating future tools in de novo peptide sequencing.

17.
arXiv (quant-ph) 2026-06-11

Exact Dynamics of Topological Order Across a CDW–SPT Transition

arXiv:2606.11303v1 Announce Type: cross Abstract: We investigate the nonequilibrium dynamics of a one-dimensional interacting system across a transition from a charge-density-wave (CDW) phase to a symmetry-protected topological (SPT) phase. Starting from a CDW initial state, we study both sudden quenches and slow ramps into the SPT regime. While the CDW order melts under both protocols, the fate of topological order is sharply different. Following a sudden quench, long-range SPT order does not emerge because the post-quench state contains a finite density of excitations above the topological ground state. In contrast, slow ramps allow the system to follow the instantaneous ground state away from the critical region, enabling the buildup of SPT order with deviations governed by Kibble-Zurek defect production. The dynamics is solvable via a unitary mapping to a quadratic fermionic Hamiltonian, allowing us to compute the Loschmidt echo, correlation functions, and string correlator. The Loschmidt rate function exhibits cusps signaling dynamical quantum phase transitions, while the correlation dynamics reveal the contrasting mechanisms governing quenches and ramps across the transition. These results demonstrate that entering the topological regime is not sufficient for the emergence of topological order; the decisive factor is the suppression of excitation production during the evolution.

18.
arXiv (CS.CV) 2026-06-17

Beware of Aliases – Signal Preservation is Crucial for Robust Image Restoration

Image restoration networks are usually comprised of an encoder and a decoder, responsible for aggregating image content from noisy, distorted data and to restore clean, undistorted images, respectively. Data aggregation as well as high-resolution image generation both usually come at the risk of involving aliases, i.e.~standard architectures put their ability to reconstruct the model input in jeopardy to reach high PSNR values on validation data. The price to be paid is low model robustness. In this work, we show that simply providing alias-free paths in state-of-the-art reconstruction transformers supports improved model robustness at low costs on the restoration performance. We do so by proposing BOA-Restormer, a transformer-based image restoration model that executes downsampling and upsampling operations partly in the frequency domain to ensure alias-free paths along the entire model while potentially preserving all relevant high-frequency information.

19.
medRxiv (Medicine) 2026-06-24

Cardiologists perspectives on sociocultural and structural factors shaping cardiovascular genetic testing

Introduction: Genetic testing is increasingly central to the diagnosis and management of cardiovascular genetic conditions. However, use and follow-through vary across patient populations. Examining clinician perspectives on sociocultural and structural factors influencing testing is important for understanding these differences and informing public health genomics research and implementation efforts. Methods: We conducted semi-structured interviews with 15 cardiologists from health systems across the United States who have integrated cardiogenetics in their practice. Interviews explored experiences diagnosing cardiovascular genetic conditions among patients from underrepresented backgrounds, as well as approaches to incorporating social and contextual information into care. Data were coded thematically and analyzed using a framework analysis guided by the Health Equity Implementation Framework and Social Determinants of Health domains. Results: Clinicians described multi-level factors shaping genetic testing practices, including patient-provider interactions, clinical workflows, health system infrastructure, and broader policy contexts. Key themes included challenges communicating complex genetic information across language and literacy differences; patient trust shaped by prior healthcare experiences; fragmented insurance coverage separating genetic testing from genetic counseling; and challenges interpreting variants of uncertain significance, particularly for populations underrepresented in genomic reference databases. Clinicians also described adaptive strategies, such as interdisciplinary collaboration, telehealth, and patient assistance programs, that supported testing in some settings but were often inconsistent or resource-dependent. Conclusion: Among cardiologists using genetic testing, system-level and sociocultural factors shape the feasibility and downstream use of cardiovascular genetic testing. Findings highlight considerations for public health-informed genomic infrastructure that accounts for social context, supports communication, and reduces reliance on individual clinician workarounds, with implications for clinical decision support and related public health genomics initiatives.

20.
arXiv (CS.AI) 2026-06-18

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

arXiv:2606.18598v1 Announce Type: new Abstract: Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technology into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve using belief state planning methods to get optimal decision making. In our study, we show that POMDP solvers outperform human inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic environmental outcomes over the projects lifetime in all different pricing and deposit scenarios.

21.
arXiv (CS.AI) 2026-06-17

A Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

arXiv:2507.11178v3 Announce Type: replace-cross Abstract: With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt the component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing the sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model's ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradient between model's input and output to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model's effectiveness in reconstructing gene regulatory networks.

22.
arXiv (CS.CV) 2026-06-16

CPS4: Class Prompt driven Semi-Supervised Spine Segmentation with Class-specific Consistency Constraint

Vision Language Model (VLM) has great potential to enhance the quality of pseudo labels in semi-supervised spine segmentation by leveraging textual class prompts to generate segmentation map, but no one has studied it yet. Although promising, it lacks explicit constraints to ensure consistency between spine class prompts and spine unit region, resulting in unsatisfactory performance in multi-class segmentation map generation. In this paper, we propose CPS4, the first text-guided semi-supervised spine segmentation network using class prompts to enhance the quality of spine pseudo labels. Specifically, CPS4 is implemented through two training stages. (i) Class-specific consistency constrained VLM pretraining stage: we propose token- and pixel-level attention loss to optimize the consistency between class prompts and spine units, forcing the textual class prompt to be closely coupled with the target spine unit in the semantic space. (ii) Class Prompt driven semi-supervised spine segmentation stage: using the pretrained vision-text encoder, we derive each class-specific binary segmentation map for the unlabeled spine image and integrate them into an unified multi-class segmentation map, improving the quality of the spine pseudo label generated by the semi-supervised spine segmentation network. Experimental results show that our CPS4 achieves superior spine segmentation performance with Dice of 80.44%, only using 5% labeled data on the public spine segmentation dataset, surpassing popular semi-supervised learning and VLM methods. Our code will be available.

23.
arXiv (CS.LG) 2026-06-19

Toward all-optical unsupervised Hebbian learning in deep photonic neuromorphic networks

arXiv:2601.22300v3 Announce Type: replace-cross Abstract: We propose a deep photonic neuromorphic network (PNN) architecture based on phase-change material (PCM) synapses and local optical feedback for online, unsupervised Hebbian learning. The proposed architecture combines optical vector-matrix multiplication, non-volatile PCM synaptic weighting, and local coincidence-driven synaptic adaptation within a multilayer photonic crossbar framework compatible with photonic integrated circuits. Unlike conventional PNNs that rely on externally computed gradients, repeated optical-electrical-optical conversions, or global backpropagation, the proposed framework employs local Hebbian learning governed directly by correlated pre- and post-synaptic optical activity. To investigate the feasibility of the proposed learning mechanism, we implemented the PNN design using fiber-optic components, programmable variable optical attenuators, and real-time software control that incorporates PCM thermal dynamics. Supervised and unsupervised learning behaviors were experimentally evaluated under both offline and online learning conditions using representative image-recognition tasks. The experimental results demonstrate adaptive synaptic evolution, successful optical inference, and autonomous pattern encoding through local Hebbian learning under realistic fiber-optic hardware conditions. These results establish a pathway toward future integrated photonic neuromorphic systems capable of scalable and energy-efficient online Hebbian learning.

24.
arXiv (CS.CL) 2026-06-16

The Truth Stays in the Family: Enhancing Contextual Grounding via Inherited Truthful Heads in Model Lineages

Recent advances in large language models (LLMs) have produced many specialized multimodal LLMs (MLLMs) that share common foundational LLMs, forming distinct model lineages. It remains unclear whether a fundamental behavioral link exists between the foundational LLMs and downstream variants. We investigate this question by quantifying head-level context-truthfulness scores. Across diverse LLM and MLLM lineages, including Vicuna-, Qwen2.5-, LLaMA2-, and Mistral-based models, we find that Truth Scores are strongly preserved within model families, even after instruction tuning or multimodal adaptation. We further show that this inheritance is consistent with attention-head weight preservation, and that context-truthful heads attend to query-relevant evidence. Building on this finding, we propose TruthProbe, a soft-gating strategy that amplifies context-truthful heads while preserving other head contributions. TruthProbe improves contextual truthfulness on HaluEval and reduces multimodal hallucination on POPE and CHAIR, with base-LLM Truth Scores transferring effectively to their fine-tuned LLM and MLLM descendants. Code is available at https://github.com/miso-choi/TruthProbe.

25.
arXiv (CS.CV) 2026-06-19

SketchKeyAnime: Reference-anchored Sparse Key-Sketch Animation Synthesis

Traditional animation production relies heavily on manual drawing and iterative refinement, particularly for key-pose design, in-betweening, and character coloring. While existing animation and video generation methods have made notable progress, they typically depend on RGB boundary frames, dense frame-wise conditions, or complete sketch sequences, limiting their applicability under low-cost input conditions. We present SketchKeyAnime, a video diffusion framework for generating structurally controllable, appearance-consistent, and temporally coherent animations from sparse key-sketch inputs. Given a single reference RGB image and a few temporally indexed key sketches, SketchKeyAnime introduces a dual-branch conditioning mechanism to encode local geometric constraints alongside semantic-temporal context. It leverages Sketch Cross Attention to fuse reference image and sketch conditions with learnable gating, and incorporates an Adaptive Weighted Loss to strengthen supervision on key-sketch frames and line-art regions. Experimental results on the Aesthetic subset of Sakuga-42M show that our approach consistently outperforms representative animation interpolation and sketch-guided generation baselines. Compared to the best-performing baseline, SketchKeyAnime reduces EDMD by 31.9\% and FVD by 9.5\%, demonstrating superior sketch fidelity and temporal coherence, while achieving the best overall performance across most quantitative metrics. These results validate the proposed framework and highlight its potential for low-cost, highly controllable animation creation.