Predicting Depression and Anxiety Progression in Multiple Sclerosis from Longitudinal Clinical Data Using Machine Learning
Depression and anxiety are highly prevalent in multiple sclerosis (MS), yet tools for predicting mental health trajectories from clinical data remain limited. We investigated what structured electronic health record (EHR) data can predict about depression and anxiety progression in MS, and where its limits lie. We developed gradient boosting models to predict change in PHQ-9 (depression) and GAD-7 (anxiety) scores using EHR data from 2,163 MS patients (7,327 observations) and 1,465 patients (3,319 observations), respectively. The models achieved R² of 0.22 (PHQ-9) and 0.28 (GAD-7). Baseline score was the dominant predictor, but this largely reflects regression to the mean: patients with high baseline scores tend to improve, while those with low scores tend to worsen. Age emerged as a consistent secondary predictor in both models: younger patients showed smaller improvements independent of baseline severity. Feature importance differed between models: PHQ-9 prediction relied on symptom subscales, whereas GAD-7 prediction incorporated pain and disease duration. These results suggest that structured clinical data alone capture only a fraction of what drives mental health trajectories, and that richer data sources (clinical notes, patient-reported outcomes, digital phenotyping) will be needed to enable meaningful individual-level prediction.
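Why a high-importance baseline score can reflect regression to the mean, rather than real clinical signal, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the study's analysis. Each hypothetical patient has a fixed latent severity, so there is no true change at all. Each PHQ-9-like measurement adds independent noise. The mean of 10, the standard deviations, and the function names are illustrative assumptions, and the 0 to 27 range and integer scoring of the real PHQ-9 are ignored. Change is defined as follow-up minus baseline, so a negative change means improvement.

```python
import random


def simulate_change_scores(n=5000, true_sd=4.0, noise_sd=3.0, seed=0):
    """Hypothetical patients whose latent severity never changes.

    Assumption: observed score = latent severity + independent measurement noise.
    Score bounds and integer scoring are ignored for simplicity.
    """
    rng = random.Random(seed)
    baselines, changes = [], []
    for _ in range(n):
        latent = rng.gauss(10.0, true_sd)             # stable "true" severity
        baseline = latent + rng.gauss(0.0, noise_sd)  # first noisy measurement
        follow_up = latent + rng.gauss(0.0, noise_sd) # second noisy measurement
        baselines.append(baseline)
        changes.append(follow_up - baseline)          # negative = improvement
    return baselines, changes


def pearson(x, y):
    """Pearson correlation computed from sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_change_by_baseline_quartile(baselines, changes):
    """Mean change in the lowest and highest baseline quartiles."""
    pairs = sorted(zip(baselines, changes))
    q = len(pairs) // 4
    low = [c for _, c in pairs[:q]]
    high = [c for _, c in pairs[-q:]]
    return sum(low) / len(low), sum(high) / len(high)


if __name__ == "__main__":
    base, chg = simulate_change_scores()
    r = pearson(base, chg)
    low, high = mean_change_by_baseline_quartile(base, chg)
    print(f"corr(baseline, change) = {r:.2f}; R^2 from baseline alone = {r * r:.2f}")
    print(f"mean change, lowest quartile = {low:.2f}; highest quartile = {high:.2f}")
```

Under these assumptions the expected correlation between baseline and change is −σ²_noise / √((σ²_true + σ²_noise) · 2σ²_noise) = −9 / √(25 · 18) ≈ −0.42. Baseline alone therefore "explains" roughly 18% of the variance in change even though nobody truly changed. Meanwhile, the highest-baseline quartile appears to improve and the lowest appears to worsen.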