

Could Machine Learning models serve to predict depression in early pregnancy in racial/ethnic minority women?

In a recent study uploaded to the medRxiv* pre-print server, researchers built and assessed machine learning models to predict depression in pregnant women using electronic medical record data.

The study cohort consisted primarily of low-income Hispanic and Black female patients from the University of Illinois Hospital & Health Sciences System. The findings revealed that while machine learning can predict mental health conditions during early pregnancy, the models' predictive performance was poorer for low-income minority women.

Study: Predicting prenatal depression and assessing model bias using machine learning models. Image Credit: NicoElNino/Shutterstock.com

*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Perinatal depression and its associated risk factors

Perinatal depression (PND) is a form of depression that affects women during pregnancy and for up to one year after childbirth. It is a growing concern, especially in the United States (US), where PND affects 10-20% of pregnant women.

PND incidence has reportedly increased more than threefold between 2000 and 2005, with incidence among Black (2-fold) and Hispanic (5-fold) women much higher than among their non-Hispanic White counterparts.

The COVID-19 pandemic further exacerbated PND, with 27-32% of perinatal women in the US affected. Research has shown that PND leads to several complications beyond mental health, including preterm labor, reduced infant birth weight, longer and costlier hospital stays, and increased maternal morbidity and mortality.

Infants of mothers with PND face a significantly increased risk of inadequate cognitive development, underdeveloped social-emotional behavior, and altered stress responses. Research has also linked maternal PND to stunted infant growth and an increased risk of later mental disorders in the children.

Perinatal depression has been associated with numerous risk factors, including unplanned pregnancy, adverse childhood experiences, prior mental health conditions, and lack of social support. Although minority women are at higher risk than White women, reports suggest that social stigma makes them less likely to be screened for PND or to seek professional help.

Machine learning (ML) models have been shown to predict pregnancy outcomes using Electronic Medical Records (EMRs). However, previous studies employing ML on PND have focused on predicting depression following childbirth and have been conducted on cohorts of middle-class White women, largely ignoring racial or economic minorities.

This is likely to introduce bias into ML models, reducing their predictive accuracy for minority patients, including Black and Hispanic women.

About the study

In the present preprint, researchers developed ML models to predict depression and assess its severity in women of color. They collected EMR data from women who received obstetric care from the University of Illinois Hospital & Health Sciences System (UIHealth) between 2014 and 2020.

The dataset was skewed toward Black (51%) and Hispanic (29%) women, whereas non-Hispanic White (9%) and Asian and Native American (10%) women were the minority groups in this dataset.

Of the 5,875 individuals initially included, researchers identified 2,414 women who met their screening criteria: complete EMR data for the Patient Health Questionnaire-9 (PHQ-9, a test of depression presence and severity) and a first obstetric visit before 24 weeks of pregnancy. The researchers used PHQ-9 scores to assign the study cohorts: women with scores of 1-4 (low depression) served as controls, while those with scores of 9 and above formed the case group.
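The screening and cohort-assignment rules above can be expressed as a small function. This is a minimal sketch, not the authors' code: the `PrenatalRecord` fields and the `assign_group` name are hypothetical, "complete PHQ-9 data" is simplified to "a PHQ-9 total is present", and scores of 0 or 5-8 are assumed to fall outside both groups because the summary assigns them to neither.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrenatalRecord:
    # Hypothetical, simplified fields; real EMR extracts are far richer.
    phq9_total: Optional[int]  # None if PHQ-9 data are missing
    first_visit_week: float    # gestational week of the first obstetric visit


def assign_group(rec: PrenatalRecord) -> Optional[str]:
    """Return "case", "control", or None (ineligible or outside both groups)."""
    # Screening criteria: PHQ-9 available and first visit before week 24.
    if rec.phq9_total is None or rec.first_visit_week >= 24:
        return None
    if not 0 <= rec.phq9_total <= 27:  # PHQ-9 totals range from 0 to 27
        raise ValueError("PHQ-9 totals range from 0 to 27")
    if 1 <= rec.phq9_total <= 4:
        return "control"
    if rec.phq9_total >= 9:
        return "case"
    return None  # 0 and 5-8: not assigned to either group (assumption)
```

For example, `assign_group(PrenatalRecord(phq9_total=3, first_visit_week=10))` returns `"control"`, while a score of 12 with a first visit at week 26 returns `None` because the visit is too late.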

The variables used in ML model training comprised 29 broad classes of prescribed medication, race, and health insurance (a proxy for financial status). Demographic variables (employment status, marital status) and lifestyle variables (smoking and alcohol consumption) were used for model selection and tuning.

The researchers tested multiple models, including XGBoost, Random Forest, and Elastic Net, and then used Shapley values to identify the variables contributing most to perinatal depression.

Shapley values are a game-theoretic approach to quantifying the individual contribution of each variable to a model's predicted outcome (in this case, perinatal depression and its severity).
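To make the idea concrete, the sketch below computes exact Shapley values by averaging each feature's marginal contribution over every possible coalition of the other features. The "model" is a hypothetical toy risk function whose feature names and numbers are invented for illustration, not taken from the study. Exact enumeration grows as 2^n in the number of features, which is why practical tools rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution
        phi[p] = total
    return phi


def toy_risk(features):
    """Hypothetical predicted risk, given which features are 'present'."""
    score = 0.10  # baseline risk with no features
    if "unplanned_pregnancy" in features:
        score += 0.15
    if "tobacco_use" in features:
        score += 0.05
    if "unplanned_pregnancy" in features and "tobacco_use" in features:
        score += 0.04  # interaction term, shared equally by the two features
    return score  # "age" deliberately has no effect


phi = shapley_values(["unplanned_pregnancy", "tobacco_use", "age"], toy_risk)
```

Here the unplanned-pregnancy feature receives about 0.17 (its own 0.15 plus half of the 0.04 interaction), tobacco use about 0.07, and age 0. The values sum to the gap between the full prediction and the baseline, which is the "efficiency" property that makes Shapley values attractive for attribution.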

Finally, the researchers used their ML model to assess the risk of perinatal depression in both the case and control cohorts.

Study findings

Based on the study criteria, the 2,414 women included were divided into 657 cases and 1,757 controls. To account for the bias introduced by these imbalanced cohort sizes, the researchers trained each of the 20 developed models on 400 pairs of randomly selected cases and controls.
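This balancing step amounts to repeated undersampling: each model gets its own random draw of 400 cases and 400 controls, so no model is dominated by the larger control group. The sketch below assumes sampling without replacement within each draw and independent draws across the 20 models; the function name, the seed, and the placeholder IDs are hypothetical, and the summary does not specify the preprint's exact procedure.

```python
import random


def balanced_subsamples(cases, controls, n_per_class=400, n_models=20, seed=0):
    """Draw one balanced (cases, controls) training subset per model."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    draws = []
    for _ in range(n_models):
        # Without replacement within a draw; each draw is independent.
        draws.append((rng.sample(cases, n_per_class), rng.sample(controls, n_per_class)))
    return draws


# Placeholder patient IDs matching the reported cohort sizes.
cases = [f"case_{i}" for i in range(657)]
controls = [f"control_{i}" for i in range(1757)]
subsets = balanced_subsamples(cases, controls)
```

Drawing a fresh subset per model means that, across the 20 models, far more of the 1,757 controls contribute to training than a single 400-control draw would allow.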

Statistical analyses of the raw EMR data revealed that low-income Black and Hispanic women made up 81% of the study cohort. Black women had significantly higher rates of unplanned pregnancy and unemployment than other ethnic groups.

They were also more likely to be single. Lifestyle and health factors (unplanned pregnancy and tobacco use) appeared to play a role in depression incidence independent of ethnicity.

Researchers identified the Elastic Net model as the best of the 20 models developed. Although the Random Forest model matched the Elastic Net model in predicting depression in race-agnostic simulations, the Elastic Net model required significantly less computational time and was therefore used for training and assessment.
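Elastic Net is a regularized regression that blends lasso (L1) and ridge (L2) penalties. For a binary case/control outcome, a standard penalized logistic formulation (the one used by, for example, the glmnet package; this summary does not give the preprint's exact loss or hyperparameters) minimizes the expression below, where $y_i \in \{0, 1\}$ marks case status, $x_i$ is a patient's feature vector, $\lambda$ sets the overall penalty strength, and $\alpha \in [0, 1]$ sets the mix ($\alpha = 1$ is pure lasso, $\alpha = 0$ pure ridge):

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```

The L1 term can shrink uninformative coefficients to exactly zero, which suits a setting with hundreds of candidate EMR variables; this is a general property of the method rather than a claim about the fitted model in the preprint.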

Of the more than 600 variables in the EMR dataset, the ML model identified marital status, unplanned pregnancy, age, employment status, health insurance, and tobacco consumption as the most predictive of PND.

“…our model also identified features that have not been previously associated with depressive symptom severity in pregnancy, or just reported in a few studies. For instance, we discovered that elevated depressive symptoms were positively associated with self-reported levels of pain, an asthma diagnosis, carrying a male fetus (82), using antihistamines, analgesics, or antibiotics, and with lower platelet levels in blood.”

The ML model further revealed that PND severity was most strongly associated with self-reported pain levels and previous mental illness, with self-reported pain being highest among Black women.

Finally, performance tests on the case and control cohorts revealed that, although the model predicted depression and its severity with moderate accuracy (50-66%) in race-agnostic simulations, its sensitivity was significantly higher for high-income White women (85%) than for Black women (70%), even though the sample was skewed toward the latter.
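Sensitivity is the share of true cases that a model correctly flags, TP / (TP + FN), where TP counts true positives and FN false negatives. Computing it separately for each racial group is one simple way to surface the kind of disparity described above. The minimal sketch below assumes a hypothetical record format of `(group, y_true, y_pred)` with 1 = case, and uses made-up toy records rather than study data.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Sensitivity (recall) = TP / (TP + FN), computed per group.

    records: iterable of (group, y_true, y_pred), labels 1 = case, 0 = control.
    Groups with no true cases are omitted (sensitivity is undefined for them).
    """
    tp = defaultdict(int)
    fn = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:  # sensitivity only considers true cases
            if y_pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}


# Made-up toy records for illustration only, NOT data from the study.
records = (
    [("White", 1, 1)] * 8 + [("White", 1, 0)] * 2 + [("White", 0, 0)] * 10
    + [("Black", 1, 1)] * 6 + [("Black", 1, 0)] * 4 + [("Black", 0, 1)] * 5
)
result = sensitivity_by_group(records)
```

With these toy counts, the function returns a sensitivity of 0.8 for the "White" group and 0.6 for the "Black" group; the controls, including the Black controls wrongly flagged as cases, do not affect sensitivity at all, which is why fairness audits usually report it alongside specificity.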


In the present preprint, researchers built, selected, and tested the sensitivity of machine learning models for predicting perinatal depression. They identified the Elastic Net and Random Forest models as the most accurate, and used the former for testing given its lower computational requirements.

Despite the sample size being biased toward low-income minorities (Black and Hispanic women), model accuracy was higher for high-income White women (85% vs. 66%).

Despite these accuracy limitations, this research suggests that ML models applied to EMR data can help identify women at risk of depression in the early stages of pregnancy. If incorporated into obstetric care practices, such tools could improve maternal and infant health.


Journal reference:
  • Preliminary scientific report.

    Huang, Y. et al. (2023) "Predicting prenatal depression and assessing model bias using machine learning models". medRxiv. doi: 10.1101/2023.07.17.23292587. https://www.medrxiv.org/content/10.1101/2023.07.17.23292587v1




Written by

Hugo Francisco de Souza

Hugo Francisco de Souza is a scientific writer based in Bangalore, Karnataka, India. His academic passions lie in biogeography, evolutionary biology, and herpetology. He is currently pursuing his Ph.D. from the Centre for Ecological Sciences, Indian Institute of Science, where he studies the origins, dispersal, and speciation of wetland-associated snakes. Hugo has received, amongst others, the DST-INSPIRE fellowship for his doctoral research and the Gold Medal from Pondicherry University for academic excellence during his Masters. His research has been published in high-impact peer-reviewed journals, including PLOS Neglected Tropical Diseases and Systematic Biology. When not working or writing, Hugo can be found consuming copious amounts of anime and manga, composing and making music with his bass guitar, shredding trails on his MTB, playing video games (he prefers the term ‘gaming’), or tinkering with all things tech.


  • Posted on July 20, 2023