TY - JOUR
T1 - Machine learning guided postnatal gestational age assessment using new-born screening metabolomic data in South Asia and sub-Saharan Africa
AU - Sazawal, Sunil
AU - Ryckman, Kelli K.
AU - Das, Sayan
AU - Khanam, Rasheda
AU - Nisar, Imran
AU - Jasper, Elizabeth
AU - Dutta, Arup
AU - Rahman, Sayedur
AU - Mehmood, Usma
AU - Bedell, Bruce
AU - Deb, Saikat
AU - Chowdhury, Nabidul Haque
AU - Barkat, Amina
AU - Mittal, Harshita
AU - Ahmed, Salahuddin
AU - Khalid, Farah
AU - Raqib, Rubhana
AU - Manu, Alexander
AU - Yoshida, Sachiyo
AU - Ilyas, Muhammad
AU - Nizar, Ambreen
AU - Ali, Said Mohammed
AU - Baqui, Abdullah H.
AU - Jehan, Fyezah
AU - Dhingra, Usha
AU - Bahl, Rajiv
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Background: Babies born early and/or small for gestational age (SGA) in low- and middle-income countries (LMICs) contribute substantially to global neonatal and infant mortality. Tracking these outcomes is critical at a population level for informed policy, advocacy, resource allocation and program evaluation, and at an individual level for targeted care. Because early prenatal ultrasound examination is not available in these settings, gestational age (GA) is estimated using new-born assessment, last menstrual period (LMP) recall and birth weight, which are unreliable. Algorithms in developed settings, using metabolic screen data, provided GA estimates within 1–2 weeks of ultrasonography-based GA. We sought to leverage machine learning algorithms to improve the accuracy and applicability of this approach in LMIC settings. Methods: This study uses data from AMANHI-ACT, prospective pregnancy cohorts in Asia and Africa in which early-pregnancy ultrasonography-estimated GA and birth weight are available, along with metabolite screening data for a subset of 1318 new-borns. We used this opportunity to develop machine learning (ML) algorithms. A Random Forest Regressor was used, with the data randomly split into model-building and model-testing datasets. Mean absolute error (MAE) and root mean square error (RMSE) were used to evaluate performance. Bootstrap procedures were used to estimate confidence intervals (CI) for RMSE and MAE. For pre-term birth identification, ROC analysis with bootstrap and exact estimation of the CI for the area under the curve (AUC) was performed. Results: Overall, model-estimated GA had an MAE of 5.2 days (95% CI 4.6–6.8), which was similar to performance in SGA new-borns, MAE 5.3 days (95% CI 4.6–6.2). GA was correctly estimated to within 1 week for 85.21% (95% CI 72.31–94.65). For preterm birth classification, AUC in ROC analysis was 98.1% (95% CI 96.0–99.0; p < 0.001). This model performed better than the Iowa regression model, with an AUC difference of 14.4% (95% CI 5–23.7; p = 0.002). Conclusions: Machine learning algorithms and models applied to metabolomic gestational age dating offer a ladder of opportunity for providing accurate population-level gestational age estimates in LMIC settings. These findings also point to an opportunity for investigation of region-specific models, more focused feasible analyte models, and broad untargeted metabolome investigation.
KW - Gestational age
KW - Machine learning
KW - New born screening
KW - Pre-term births
UR - http://www.scopus.com/inward/record.url?scp=85114399035&partnerID=8YFLogxK
U2 - 10.1186/s12884-021-04067-y
DO - 10.1186/s12884-021-04067-y
M3 - Article
C2 - 34493237
AN - SCOPUS:85114399035
SN - 1471-2393
VL - 21
JO - BMC Pregnancy and Childbirth
JF - BMC Pregnancy and Childbirth
IS - 1
M1 - 609
ER -