Model-based machine learning to explore the nexus between COVID-19 and environmental factors in the United States

T. Munir, I. L. Hudson, S. A. Cheema, R. Muhammad, M. Shafqat, T. Kifayat

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

The aim of this study is to demonstrate how machine learning (ML) methods can be applied to understand the transmission of COVID-19 with respect to various environmental factors. Daily data on newly reported COVID-19 cases from 1 March 2020 to 30 November 2020 are examined for six US states: New York, New Jersey, Illinois, Massachusetts, Georgia and Michigan. The daily data are assembled from the official websites of the US health department and the Weather Underground Company (WUC). A diverse set of environmental factors, including temperature, humidity, dew point, wind speed, atmospheric pressure and precipitation, is used to represent possible environmental determinants. The distributions of daily reported new cases are clearly asymmetric in all states. The average number of newly reported cases is highest in Illinois, whereas the maximum number of cases reported in a single day occurred in Georgia. The lowest average number of new cases is found in Massachusetts. We test six of the most widely used model-based ML methods, namely linear discriminant analysis (LDA), classification and regression trees (CART), k-nearest neighbours (KNN), support vector machines (SVM), random forest (RF) and naïve Bayes (NB). The comparative performance of these ML schemes is expressed using statistics such as kappa, balanced accuracy, detection rate, information preservation rate, accuracy, sensitivity and specificity. Moreover, predictive orderings of the environmental factors are reported for each state with respect to the most promising ML method, to highlight the hierarchical significance of the climatic determinants.
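The comparison statistics listed above can all be derived from a confusion matrix. The minimal sketch below assumes a binary outcome (for example, a hypothetical split of days into "high" and "low" new-case classes; the abstract does not state how case counts were discretised) and computes the binary-case definitions. Detection rate is taken as true positives over all observations, the convention used by R's caret package. "Information preservation rate" is not defined in the abstract, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; the 'positive' class is an assumed label such as a high-case day."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def summary_stats(cm: ConfusionMatrix) -> dict[str, float]:
    """Binary-case versions of the comparison statistics named in the abstract."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / n
    sensitivity = cm.tp / (cm.tp + cm.fn)  # true positive rate
    specificity = cm.tn / (cm.tn + cm.fp)  # true negative rate
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance
    chance_yes = ((cm.tp + cm.fp) / n) * ((cm.tp + cm.fn) / n)
    chance_no = ((cm.fn + cm.tn) / n) * ((cm.fp + cm.tn) / n)
    expected = chance_yes + chance_no
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "detection_rate": cm.tp / n,  # true positives as a share of all observations
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Illustrative counts only, not results from the study
stats = summary_stats(ConfusionMatrix(tp=40, fp=10, fn=5, tn=45))
```

With these made-up counts, accuracy is 0.85, chance agreement is 0.5 and kappa is therefore 0.7; kappa is useful alongside accuracy because it discounts agreement that would arise from class proportions alone.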
The performance orderings of the ML approaches vary across states, with RF the most promising model for exploring the underlying nexus between the environmental covariates and case numbers in every state. The ML hierarchies are as follows, where P_X denotes the performance of method X:

  • New York: P_RF > P_KNN = P_CART = P_SVM > P_LDA > P_NB
  • New Jersey: P_RF > P_LDA = P_SVM > P_NB = P_CART > P_KNN
  • Illinois: P_RF > P_KNN = P_SVM > P_NB = P_CART > P_LDA
  • Massachusetts: P_RF > P_SVM > P_CART > P_KNN > P_NB > P_LDA
  • Georgia: P_RF > P_SVM > P_CART > P_KNN = P_NB > P_LDA
  • Michigan: P_RF > P_KNN > P_SVM > P_CART = P_NB > P_LDA

Note that CART, NB and LDA show questionable performance for Michigan. Across the states, average temperature emerges as the most important candidate for explaining the underlying nexus between environment and COVID-19 case numbers, consistent with Shahzad et al. (2020). However, other climate variables also matter: dew point is a close second in Georgia and Michigan, and humidity and wind speed play a role similar to that of dew point in Illinois and Michigan. Note that Georgia and Michigan have the highest average temperature and dew point, and both record low average wind speed. Michigan is reported to have a large Black community, as does Georgia. For Illinois, temperature is dominant, followed by dew point and then, closely, by both humidity and wind speed; Illinois has the lowest average wind speed and a low average temperature. In all states there is weaker evidence for an association between COVID-19 cases and either air pressure or precipitation. Finally, based on the outcomes of this research, we believe that a more rigorous study targeting other variables, such as population density, mobility, air quality, the nature of travel bans, race and the degree of health interventions, is required.
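The abstract reports predictive orderings of the environmental factors for the best model but does not say how importance was measured. One common, model-agnostic option is permutation importance: shuffle one covariate at a time and record how much accuracy falls. The minimal sketch below assumes that technique and uses only the Python standard library; the toy data and the `toy_model` rule are hypothetical stand-ins for the study's weather data and fitted random forest.

```python
import random
from typing import Callable

Matrix = list[list[float]]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Matrix], list[int]],
    X: Matrix,
    y: list[int],
    feature_names: list[str],
    n_repeats: int = 10,
    seed: int = 0,
) -> list[tuple[str, float]]:
    """Order features by the mean drop in accuracy when each column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    scores = []
    for j, name in enumerate(feature_names):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        scores.append((name, sum(drops) / n_repeats))
    # Largest mean drop first, i.e. the most important covariate heads the ordering
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: the label depends on temperature only, never on pressure
temps = [float(t) for t in range(10, 30)]
pressures = [1000.0 + (i % 7) for i in range(20)]
X = [[t, p] for t, p in zip(temps, pressures)]
y = [int(t > 20) for t in temps]


def toy_model(rows: Matrix) -> list[int]:
    """Hypothetical stand-in for a fitted classifier such as a random forest."""
    return [int(row[0] > 20) for row in rows]


ranking = permutation_importance(toy_model, X, y, ["temperature", "pressure"])
```

Because `toy_model` never reads pressure, shuffling that column leaves accuracy unchanged and its score is exactly zero, while temperature heads the ordering. Computing the drop on held-out data rather than training data is usually preferred in practice.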
Furthermore, given the longer time series of COVID-19 data now available in 2021, understanding the potential for seasonality and the association with weather is particularly relevant for further work, as is modelling new cases and transmission together with deaths, the reproduction number and the severity of COVID-19. Given the skewed distribution of the number of reported cases in each state, future work could also employ the quantile regression approach.
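Quantile regression estimates a chosen quantile of the response (for example the median, tau = 0.5) instead of its mean, by minimising the check or "pinball" loss; this makes it less sensitive to the kind of skewness the abstract reports in daily case counts. The minimal sketch below assumes the simplest, intercept-only case and finds the minimiser by direct search over the observed values; the counts are invented for illustration.

```python
def pinball_loss(y: list[float], q: float, tau: float) -> float:
    """Mean check (pinball) loss of predicting the constant q at quantile level tau."""
    total = 0.0
    for value in y:
        residual = value - q
        # Under-prediction costs tau per unit; over-prediction costs (1 - tau) per unit
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(y)


def fit_constant_quantile(y: list[float], tau: float) -> float:
    """Intercept-only quantile regression by direct search.

    The loss is piecewise linear and convex in q, so a minimiser lies at one of the data points.
    """
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


# Made-up, right-skewed daily counts for illustration (not data from the study)
daily_cases = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 40.0, 120.0]
median = fit_constant_quantile(daily_cases, tau=0.5)
mean = sum(daily_cases) / len(daily_cases)
```

Here the median (3) sits far below the mean (20), the kind of gap that motivates modelling quantiles rather than the mean. A full analysis would regress the chosen quantile on the weather covariates, for example with `QuantReg` in Python's statsmodels; this intercept-only version only illustrates the loss being minimised.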

Original language: English (UK)
Title of host publication: Proceedings of the 24th International Congress on Modelling and Simulation, MODSIM 2021
Editors: R. Willem Vervoort, A. Alexey Voinov, Jason P. Evans, Lucy Marshall
Publisher: Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ)
Pages: 428-434
Number of pages: 7
ISBN (Electronic): 9780987214393
Publication status: Published - 2021
Externally published: Yes
Event: 24th International Congress on Modelling and Simulation, MODSIM 2021 - Sydney, Australia
Duration: 5 Dec 2021 to 10 Dec 2021

Publication series

Name: Proceedings of the International Congress on Modelling and Simulation, MODSIM
ISSN (Electronic): 2981-8001

Conference

Conference: 24th International Congress on Modelling and Simulation, MODSIM 2021
Country/Territory: Australia
City: Sydney
Period: 5/12/21 to 10/12/21

Keywords

  • COVID-19
  • environmental covariates
  • machine learning model
