TY - GEN
T1 - Model-based machine learning to explore the nexus between COVID-19 and environmental factors in the United States
AU - Munir, T.
AU - Hudson, I. L.
AU - Cheema, S. A.
AU - Muhammad, R.
AU - Shafqat, M.
AU - Kifayat, T.
N1 - Publisher Copyright:
© 2021 Proceedings of the International Congress on Modelling and Simulation, MODSIM. All rights reserved.
PY - 2021
Y1 - 2021
AB - The aim of this study is to demonstrate the applicability of machine learning (ML) methods to understanding the transmission of COVID-19 with respect to various environmental factors. Daily data on newly reported COVID-19 cases from six states of the United States (US), namely New York, New Jersey, Illinois, Massachusetts, Georgia and Michigan, are examined for the period 1 March 2020 to 30 November 2020. The daily COVID-19 data are assembled from the official websites of the US health department and the Weather Underground Company (WUC). A diverse set of environmental factors, including temperature, humidity, dew point, wind speed, atmospheric pressure and precipitation, is used to represent possible environmental determinants. The distributions of daily reported new cases are asymmetric in all states. The average number of newly reported cases is highest in Illinois, whereas the maximum number of cases reported in a single day occurred in Georgia. The lowest average number of new cases is found in Massachusetts. We test six of the most widely used model-based ML methods, namely linear discriminant analysis (LDA), classification and regression trees (CART), k-nearest neighbours (KNN), support vector machines (SVM), random forest (RF) and naïve Bayes (NB). The comparative performance of these ML schemes is expressed using statistics such as kappa, balanced accuracy, detection rate, information preservation rate, accuracy, sensitivity and specificity. Moreover, predictive orderings of the environmental factors for each state, under the most promising ML method, are also reported to highlight the hierarchical significance of climatic determinants.
The performance orderings of the ML approaches vary across states, but the RF model is the most promising in exploring the underlying nexus between the environmental covariates and case numbers in all states. The ML hierarchies are: New York: PRF > PKNN = PCART = PSVM > PLDA > PNB; New Jersey: PRF > PLDA = PSVM > PNB = PCART > PKNN; Illinois: PRF > PKNN = PSVM > PNB = PCART > PLDA; Massachusetts: PRF > PSVM > PCART > PKNN > PNB > PLDA; Georgia: PRF > PSVM > PCART > PKNN = PNB > PLDA; and Michigan: PRF > PKNN > PSVM > PCART = PNB > PLDA. Note that procedures such as CART, NB and LDA show questionable performance for Michigan. Across the states, average temperature emerges as the most important candidate in explaining the underlying nexus between environment and COVID-19 numbers, consistent with Shahzad et al. (2020). However, other climate variables also matter: dew point is a close second in Georgia and Michigan, and humidity and wind speed play a role similar in importance to dew point in Illinois and Michigan. Georgia and Michigan have the highest average temperature and dew point, and both states record low average wind speed. Michigan reportedly has a large Black community, as does Georgia. For Illinois, temperature is dominant, followed by dew point and then closely by both humidity and wind speed, with Illinois having the lowest average wind speed and a low temperature. There is less evidence for an association of air pressure or precipitation with COVID-19 cases in any state. Finally, based on the outcomes of this research, we believe that a more rigorous study targeting other variables, such as population density, mobility, air quality, the nature of travel bans, race and the degree of health interventions, is required.
Furthermore, understanding the potential for seasonality and the association with weather is particularly relevant for further work given the longer time series of COVID-19 data now available in 2021, as is modelling new cases and transmission along with deaths, the reproduction number and the severity of COVID-19. Given the skewed distribution of the number of reported cases in each state, future work could also employ quantile regression.
KW - COVID-19
KW - environmental covariates
KW - machine learning model
UR - https://www.scopus.com/pages/publications/85177051041
M3 - Conference contribution
AN - SCOPUS:85177051041
T3 - Proceedings of the International Congress on Modelling and Simulation, MODSIM
SP - 428
EP - 434
BT - Proceedings of the 24th International Congress on Modelling and Simulation, MODSIM 2021
A2 - Vervoort, R. Willem
A2 - Voinov, A. Alexey
A2 - Evans, Jason P.
A2 - Marshall, Lucy
PB - Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ)
T2 - 24th International Congress on Modelling and Simulation, MODSIM 2021
Y2 - 5 December 2021 through 10 December 2021
ER -