Abstract


The emerging 2019 novel coronavirus (2019-nCoV) has taken the world by storm. This disease is not the same as the coronaviruses that are commonly found among humans and cause mild symptoms, such as the common cold. In particular, a diagnosis of coronavirus 229E, NL63, OC43, or HKU1 is not the same as a 2019-nCoV diagnosis. The new coronavirus therefore leaves many questions unanswered. Big Data is an important tool that health care officials are using to track and control the disease. This paper aims to deploy a modular regression time-series model to forecast the confirmed cases, deaths, and recovered cases of COVID-19 worldwide, based on the data published by WHO and multiple international government organizations.

  1. Introduction

The 2019 Novel Coronavirus (2019-nCoV) is a respiratory illness first confirmed in early December 2019 in Wuhan, Hubei Province, China. The Chinese government initially reported that patients had some link to a large seafood and animal market, suggesting animal-to-person spread. The disease has since been confirmed to spread from person to person as well [1]. The disease is novel with regard to its molecular differences from other identified coronaviruses such as 229E, NL63, OC43, and HKU1; in addition, as of the writing of this article, no confirmed vaccine or antiviral medicine has been made available to treat this disease [2]. It is no surprise that health organizations worldwide have focused their attention on controlling the spread of 2019-nCoV.

Multiple governments and government health organizations, such as the World Health Organization (WHO), the National Health Commission of the People’s Republic of China (NHC), China CDC (CCDC), the Hong Kong Department of Health, the Macau Government, Taiwan CDC, US CDC, the Government of Canada, the Australian Government Department of Health, the European Centre for Disease Prevention and Control (ECDC), the Ministry of Health Singapore (MOH), and the Italian Ministry of Health, have published public data about the confirmed cases, deaths, and recovered cases among their respective citizens (WHO, however, reports cases worldwide). The Johns Hopkins University Center for Systems Science and Engineering (supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab) has compiled these sources into the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE [3,4]. This dataset can be used for data analysis, manipulation, and forecasting.

For data visualization and forecasting, this paper uses Python, a popular programming language. Using Plotly, a Python graphing library [5], and the Johns Hopkins 2019-nCoV Data Repository, the numbers of confirmed cases, deaths, and recovered cases can be visualized; all three variables appear to be increasing exponentially (Figure 1).
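One way to check the visual impression of exponential growth is to fit a straight line to the logarithm of the counts and convert the slope into a doubling time. The sketch below is only an illustration of that check, not the analysis performed in this paper; the function names are hypothetical and the input series is synthetic rather than taken from the repository.

```python
import math

def fit_exponential_rate(counts):
    """Least-squares slope of log(count) against the day index.

    Assumes at least two strictly positive counts, one per day.
    """
    logs = [math.log(c) for c in counts]
    n = len(logs)
    t_mean = (n - 1) / 2
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(logs))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def doubling_time(rate):
    """Days needed for the count to double at a constant per-day growth rate."""
    return math.log(2) / rate

# Synthetic series that doubles every 3 days (illustrative, not real case data).
synthetic = [100 * 2 ** (day / 3) for day in range(10)]
rate = fit_exponential_rate(synthetic)
print(round(doubling_time(rate), 3))
```

If the log-transformed counts deviate strongly from a straight line, a single exponential rate is a poor description of the series.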

Figure 1 Worldwide confirmed cases, deaths, and recovered cases from 01/22/20 to 03/24/20.


This paper focuses on forecasting confirmed cases, deaths, and recovered cases. It is therefore important to define a recovered case. A patient is counted as recovered after two negative swab tests on consecutive days; an absence of fever, without the use of fever-reducing medication, for three full days; improvement in other symptoms, such as coughing and shortness of breath; and a period of seven full days since symptoms first appeared [6].

Data forecasting is a common task that extrapolates data points further in time. It has been used in a multitude of fields, ranging from predicting fossil locations to predicting astronomical anomalies [7,8]. Forecasting is a useful way to make sense of large amounts of data and to help anticipate future events. There are many forecasting models, such as the drift method, the seasonal naïve approach, support vector machines, and artificial neural networks [9]. In this paper we use Python to apply a very common approach: time series forecasting.
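To make the simplest of the methods listed above concrete, the drift method extends the straight line through the first and last observations. The sketch below is a generic illustration of that baseline, not the model used in this paper; the function name and the toy series are hypothetical.

```python
def drift_forecast(history, horizon):
    """Drift method: extrapolate the average change between first and last points."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * slope for step in range(1, horizon + 1)]

# Toy series for illustration only (not real case counts).
toy_history = [10, 14, 19, 25]
print(drift_forecast(toy_history, 3))
```

A baseline like this is mainly useful as a point of comparison for a more flexible model.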

  2. Materials and Methodology

The data repository provided by the Johns Hopkins University Center for Systems Science and Engineering is the primary dataset for the forecasting model presented in this paper. The version of the dataset used here was last updated on 03/24/20. The dataset includes the following variables: serial number, observation date, province/state, country/region, last update, confirmed cases, deaths, and recovered cases.
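Because the dataset is reported per region, a worldwide series has to be built by summing the counts for each observation date. The sketch below shows one way this could be done with the standard library. The column names and the sample rows are assumptions made for illustration; they mirror the variables listed above but are not copied from the repository.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed column layout; all values are made up.
SAMPLE_CSV = """SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,03/23/2020,Region A,Country X,2020-03-23,100,5,60
2,03/23/2020,,Country Y,2020-03-23,80,8,10
3,03/24/2020,Region A,Country X,2020-03-24,101,5,65
4,03/24/2020,,Country Y,2020-03-24,95,10,12
"""

FIELDS = ("Confirmed", "Deaths", "Recovered")

def worldwide_totals(csv_text):
    """Sum confirmed, deaths, and recovered over all regions for each date."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0.0))
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = totals[row["ObservationDate"]]
        for field in FIELDS:
            day[field] += float(row[field] or 0)  # treat blank cells as zero
    return dict(totals)

totals = worldwide_totals(SAMPLE_CSV)
print(totals["03/24/2020"])
```

The resulting per-date totals are the kind of single time series that a forecasting model can be fitted to, one series per variable.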

Python 3.5 and the following libraries are used to implement the forecasting model: pandas, numpy, seaborn, matplotlib, plotly, fbprophet, and pycountry.

The model implemented in this paper is adapted from Facebook Core AI’s Prophet forecasting paper [10]. It is a time series forecasting model based on an additive procedure in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. Prophet works best with time series that have strong seasonal effects and several seasons of historical data. It is robust to missing data and shifts in the trend, and it typically handles outliers well.

The Prophet model is defined in Figure 2, where g(t) is the trend function, which models non-periodic changes in the value of the time series; s(t) represents periodic changes (e.g., weekly and yearly seasonality); and h(t) represents the effects of holidays, which occur on potentially irregular schedules over one or more days. The error term ε(t) represents any idiosyncratic changes that are not accommodated by the model. It is important to note, however, that the logistic growth model used is a special case of generalized logistic growth curves, which is only a single type of sigmoid curve. The Prophet model is implemented using the Facebook Prophet library, which simplifies the forecasting process: the model only needs to be fed the previously observed data.
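For reference, the additive decomposition described above can be written out as follows, using the same symbols as the text. The second line gives the basic form of the logistic growth trend from the Prophet paper [10], where C (carrying capacity), k (growth rate), and m (offset) are that paper's parameters and are not otherwise discussed here.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon(t)

g(t) = \frac{C}{1 + \exp\bigl(-k\,(t - m)\bigr)}
```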

The model forecasts the confirmed cases, deaths, and recovered cases from 04-01-2020 to 04-15-2020 using Prophet with 95% uncertainty intervals. No seasonality settings were tuned, and no additional regression components were added.
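The forecasting step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used for this paper: the function names are hypothetical, `interval_width=0.95` is assumed to correspond to the 95% intervals mentioned in the text, and the input is assumed to be one worldwide daily series. The third-party imports are kept inside the function, and the import falls back to the older `fbprophet` package name listed in the methods.

```python
from datetime import date

LAST_OBSERVED = date(2020, 3, 24)   # last dataset update stated in the text
FORECAST_START = date(2020, 4, 1)
FORECAST_END = date(2020, 4, 15)

def horizon_days(last_observed, forecast_end):
    """Number of daily steps to predict beyond the last observation."""
    return (forecast_end - last_observed).days

def forecast_series(dates, values, periods, interval_width=0.95):
    """Fit Prophet with default settings and return point forecasts with bounds."""
    import pandas as pd
    try:
        from prophet import Prophet      # package name from version 1.0 onward
    except ImportError:
        from fbprophet import Prophet    # older package name

    # Prophet expects a frame with a date column "ds" and a value column "y".
    frame = pd.DataFrame({"ds": pd.to_datetime(dates), "y": values})
    model = Prophet(interval_width=interval_width)
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods)  # daily steps by default
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]

periods = horizon_days(LAST_OBSERVED, FORECAST_END)
print(periods)
```

Calling `forecast_series` once per variable (confirmed, deaths, recovered) and keeping the rows dated from `FORECAST_START` onward would give tables with a prediction and a lower and upper bound for each day.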

Figure 2 The Facebook Prophet time series model summarized.

  3. Results

Using Prophet, confirmed cases, deaths, and recovered cases are forecasted from 04/01/20 to 04/15/20, with a predicted value and lower and upper uncertainty bounds for each date. The results for confirmed cases are shown in Table I and graphed in Figure 3. The results for deaths are shown in Table II and graphed in Figure 4. The results for recovered cases are shown in Table III and graphed in Figure 5.


Table I Forecasted confirmed cases by date including lower and upper uncertainty predictions

Date        Predicted        Lower            Upper
--------    -------------    -------------    -------------
04-01-20    453079.445301    419513.649591    489212.379745
04-02-20    469124.886850    433109.124770    505562.892242
04-03-20    486153.186869    447807.002236    527074.312065
04-04-20    502529.593721    463820.128418    541716.175332
04-05-20    519217.466290    477959.049949    559228.976027
04-06-20    536899.806302    493526.477808    577874.905225
04-07-20    553624.325152    507990.543179    599764.276492
04-08-20    558972.941367    512067.448528    603994.585677
04-09-20    575018.382917    529097.743649    621851.808099
04-10-20    592046.682936    543933.863237    644608.498571
04-11-20    608423.089788    554294.063989    662345.878965
04-12-20    625110.962356    568976.305970    684913.009929
04-13-20    642793.302369    579227.765196    704431.586162
04-14-20    659517.821218    597737.597275    720024.128186
04-15-20    664866.437434    603810.623149    726738.134273


Table II Forecasted deaths by date including lower and upper uncertainty predictions

Date        Predicted       Lower           Upper
--------    ------------    ------------    ------------
04-01-20    22585.187266    21178.557044    24080.738468
04-02-20    23417.122506    21972.416925    24916.567463
04-03-20    24328.257159    22606.544818    26014.643357
04-04-20    25259.147869    23540.297159    27038.111695
04-05-20    26206.462505    24432.292052    28165.841966
04-06-20    27197.541807    25047.127419    29000.991058
04-07-20    28230.167361    25946.760013    30254.327814
04-08-20    28832.410283    26404.584039    30966.692089
04-09-20    29664.345523    27207.506090    32099.965154
04-10-20    30575.480176    27940.697342    33150.336532
04-11-20    31506.370885    28739.088122    34265.891853
04-12-20    32453.685522    29468.474615    35504.261682
04-13-20    33444.764824    30155.166010    36665.576906
04-14-20    34477.390378    31011.621609    37981.901815
04-15-20    35079.633299    31276.591941    38824.239081


Table III Forecasted recovered cases by date including lower and upper uncertainty predictions

Date        Predicted        Lower            Upper
--------    -------------    -------------    -------------
04-01-20    121728.856961    118364.783725    125295.153700
04-02-20    124095.768386    120061.339507    128195.299243
04-03-20    126560.755391    122519.559616    131212.448196
04-04-20    129609.682297    124625.420052    134590.361581
04-05-20    132473.766993    127668.713193    137927.978113
04-06-20    135096.620843    129442.154843    140453.558490
04-07-20    138229.146223    131994.325687    144038.737917
04-08-20    140073.486312    133492.175692    146196.948247
04-09-20    142440.397737    135479.818246    149118.426320
04-10-20    144905.384742    136716.940202    152145.464183
04-11-20    147954.311649    139758.738004    156329.311780
04-12-20    150818.396345    142215.670193    159890.308617
04-13-20    153441.250194    144086.838825    163330.860406
04-14-20    156573.775575    146928.498743    167165.276032
04-15-20    158418.115664    147715.166883    169698.813546


Figure 3 Confirmed cases plotted and forecasted over time.


Figure 4 Deaths plotted and forecasted over time.


Figure 5 Recovered cases plotted and forecasted over time.
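As a small worked check on the tabulated results, the width of the uncertainty interval can be expressed relative to the point forecast. The sketch below does this for the first and last rows of Table I; the helper name is hypothetical, and the numbers are copied from the table.

```python
# (predicted, lower, upper) for the first and last forecast dates in Table I.
TABLE_I = {
    "04-01-20": (453079.445301, 419513.649591, 489212.379745),
    "04-15-20": (664866.437434, 603810.623149, 726738.134273),
}

def relative_width(predicted, lower, upper):
    """Width of the uncertainty interval as a fraction of the point forecast."""
    return (upper - lower) / predicted

for day, row in TABLE_I.items():
    print(day, round(relative_width(*row), 3))
# prints:
# 04-01-20 0.154
# 04-15-20 0.185
```

The interval widens from about 15% to about 18% of the predicted value over the two weeks, which reflects the growing uncertainty as the forecast moves further from the last observation.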

  4. Discussion

The results show clear upward trends. The graphs further illustrate the projected trajectory of the spread of COVID-19 worldwide. It is important to note that the forecasts are based on data last updated on 03/24/20; the more recent the data, the more accurate the forecast is likely to be. Because confirmed cases, deaths, and recovered cases are changing minute by minute, however, the model may be inaccurate.

It will be interesting to compare the results of the model with the reported figures as the cases increase every hour and every day. Another factor to account for is a spike in reported cases (possibly caused by the lack of testing in some countries in earlier months), which may also throw the model off.
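Once observed counts for the forecast dates are available, such a comparison could be quantified with two simple measures: the mean absolute percentage error of the point forecasts, and the share of observations that fall inside the uncertainty interval. The sketch below illustrates both. The function names are hypothetical, and the numbers are placeholders chosen for easy arithmetic, not real case counts or values from the tables.

```python
def mean_absolute_percentage_error(observed, predicted):
    """Average of |observed - predicted| / |observed| over all dates."""
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)

def interval_coverage(observed, lower, upper):
    """Fraction of observations lying within [lower, upper] (bounds inclusive)."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)

# Placeholder values for illustration only.
observed = [100.0, 120.0, 150.0, 200.0]
predicted = [110.0, 120.0, 135.0, 160.0]
lower = [95.0, 105.0, 120.0, 140.0]
upper = [125.0, 135.0, 150.0, 180.0]

print(round(mean_absolute_percentage_error(observed, predicted), 3))
print(interval_coverage(observed, lower, upper))
```

For a well-calibrated 95% interval, coverage over many dates would be expected to be close to 0.95; a much lower value would suggest the model is too confident.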

  5. Conclusion

In this paper, data from a reliable data repository are processed and visualized, and the Prophet time series prediction model is applied to the data using Python. Data analysis shows clear upward trends that follow a specific slope and pattern. The fit of the model can be evaluated in real time as the cases continue to increase. Future work may include introducing a recurrent neural network to adjust the parameters of the Prophet model based on its ability to predict on a day-by-day basis.

The Code to this paper can be found here.

  6. References

[1] "What You Need To Know About The Coronavirus 2019". 2020. Cdc.Gov. https://www.cdc.gov/coronavirus/2019-ncov/downloads/2019-ncov-factsheet.pdf.

[2] "Coronavirus". 2020. Who.Int. https://www.who.int/emergencies/diseases/novel-coronavirus-2019.

[3] Dong, Ensheng, Hongru Du, and Lauren Gardner. 2020. "An Interactive Web-Based Dashboard To Track COVID-19 In Real Time". The Lancet Infectious Diseases. doi:10.1016/s1473-3099(20)30120-1.

[4] "CSSEGISandData/COVID-19". 2020. GitHub. https://github.com/CSSEGISandData/COVID-19.

[5] "Plotly Python Graphing Library". 2020. Plotly.Com. https://plotly.com/python/.

[6] "Coronavirus Disease 2019 (COVID-19)". 2020. Centers For Disease Control And Prevention. https://www.cdc.gov/coronavirus/2019-ncov/about/index.html.

[7] Block, Sebastián, Frédérik Saltré, Marta Rodríguez-Rey, Damien A. Fordham, Ingmar Unkel, and Corey J. A. Bradshaw. 2016. "Where To Dig For Fossils: Combining Climate-Envelope, Taphonomy And Discovery Models". PLOS ONE 11 (3): e0151090. doi:10.1371/journal.pone.0151090.

[8] Feigelson, Eric D., G. Jogesh Babu, and Gabriel A. Caceres. 2018. "Autoregressive Times Series Methods For Time Domain Astronomy". Frontiers In Physics 6. doi:10.3389/fphy.2018.00080.

[9] Chretien, Jean-Paul, Dylan George, Jeffrey Shaman, Rohit A. Chitale, and F. Ellis McKenzie. 2014. "Influenza Forecasting In Human Populations: A Scoping Review". PLOS ONE 9 (4): e94130. doi:10.1371/journal.pone.0094130.

[10] Taylor, Sean J, and Benjamin Letham. 2017. "Prophet: Forecasting At Scale". doi:10.7287/peerj.preprints.3190v2.