In this project, the aim is to predict foot traffic for week 44 (counted from the week Indiana recorded its first COVID-19 case) at 1804 Points of Interest (POIs) in Tippecanoe County, Indiana, United States of America. This is a regression task. The data covers week 1 to week 43 and is available as a BigQuery project. The schema of the tables used in this project can be found here.
To start with, the six tables are imported from the BigQuery project and stored as pandas DataFrames. Exploratory Data Analysis (EDA) is then carried out to better understand the relationships in the data and to guide feature selection for the modeling task. For this challenge, the models I tried included:
The results presented in this report were obtained using CatBoost. The report also visualizes how effective the policies adopted during the fall were at the five POI locations that reported the highest number of cases of the virus.
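The feature set behind the CatBoost model is not described in this section. As a minimal sketch, assuming a hypothetical long-format table `visits` with columns `poi_id`, `week` and `visits`, the weekly series can be framed as a regression problem by using each POI's previous few weeks as features and the following week as the target. The column names, the three-week window and the toy numbers are illustrative assumptions, not the report's actual pipeline.

```python
import pandas as pd

# Hypothetical long-format table: one row per (poi_id, week).
# Toy numbers only; the real data has 1804 POIs and weeks 1 to 43.
visits = pd.DataFrame({
    "poi_id": ["a"] * 6 + ["b"] * 6,
    "week": list(range(1, 7)) * 2,
    "visits": [10, 12, 11, 15, 14, 16, 5, 7, 6, 8, 9, 11],
})

N_LAGS = 3  # assumed window length, chosen for illustration


def make_lag_table(df: pd.DataFrame, n_lags: int = N_LAGS) -> pd.DataFrame:
    """Turn each POI's weekly series into rows of (previous n_lags weeks -> this week)."""
    df = df.sort_values(["poi_id", "week"]).copy()
    grouped = df.groupby("poi_id")["visits"]
    for k in range(1, n_lags + 1):
        # lag_k is the value k weeks earlier for the same POI
        df[f"lag_{k}"] = grouped.shift(k)
    # Rows without a full n_lags-week history cannot be used for training
    return df.dropna(subset=[f"lag_{k}" for k in range(1, n_lags + 1)])


train = make_lag_table(visits)
X = train[[f"lag_{k}" for k in range(1, N_LAGS + 1)]]
y = train["visits"]

# Input for the next, unseen week: each POI's most recent values, newest first
latest = {
    poi: grp.sort_values("week")["visits"].iloc[-N_LAGS:][::-1].tolist()
    for poi, grp in visits.groupby("poi_id")
}
```

With the real data, `X` and `y` could be passed to a regressor such as `catboost.CatBoostRegressor` via `fit(X, y)`, and each POI's `latest` window (weeks 41 to 43 when `N_LAGS = 3`) would then be the input row for the week-44 prediction.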
Data:
Terms:
The time series of the twenty POIs (by poi_id) with the highest predicted values for week 44 were plotted to show their development over the 33 weeks, as shown below. Purdue University, Tippecanoe Plaza and the University's Main Campus stand out for their sharp reduction in the number of cases. They can help inform policy for curbing the spread in places like Beck Plaza, which has been experiencing a steady increase in reported cases.
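import matplotlib.pyplot as plt

# top_20, new_data and plot_gra are helpers defined elsewhere in the notebook (not shown here).
# One shared figure; each plot_gra call draws one POI's weekly series onto it.
fig, ax = plt.subplots(figsize=(20, 9))
ax.set_title('Temporal Development for the 20 most Crowded Places')
for j in top_20:
    plot_gra(new_data(j))
plt.savefig('temporal_20.png')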
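How `top_20` was built is not shown in this section. A minimal sketch, assuming a hypothetical wide table `weekly` with one row per `poi_id` and one column per week (column `44` holding the prediction), is to take the POIs with the largest week-44 values. The table layout and the helper name `top_by_prediction` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide table: index = poi_id, columns = week numbers,
# column 44 = predicted value. Toy numbers only.
weekly = pd.DataFrame(
    {40: [100, 80, 300, 50], 43: [110, 90, 280, 70], 44: [120, 85, 260, 95]},
    index=pd.Index(["p1", "p2", "p3", "p4"], name="poi_id"),
)


def top_by_prediction(df: pd.DataFrame, n: int, week: int = 44) -> list:
    """Return the poi_ids with the n largest values in the given week's column."""
    return df[week].nlargest(n).index.tolist()


top_n = top_by_prediction(weekly, n=2)  # the report plots the top 20
```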
To find the twenty places with the highest increase, the reported values for week 40 were subtracted from the predicted values for week 44. The time series for these places were then plotted as shown below. Meijer and Menard's are two notably worrying places for the pandemic.
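import matplotlib.pyplot as plt

# top_20_increase, new_data and plot_gra are defined elsewhere in the notebook (not shown here).
fig, ax = plt.subplots(figsize=(20, 9))
# Shrink the axes to 80% of their width, leaving room on the right for a legend
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
# ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.set_title('Temporal Development for the 20 most Increased Places from week 40 to 44')
# Plot the other POIs first, then the top-ranked POI last so its line is drawn over the rest
for j in top_20_increase[1:]:
    plot_gra(new_data(j))
plot_gra(new_data(top_20_increase[0]))
plt.savefig('temporal_increased.png')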
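The ranking by increase described above amounts to a column difference followed by a top-n selection. A minimal sketch, reusing the same hypothetical wide-table layout (index `poi_id`, integer week columns) with toy values, is:

```python
import pandas as pd

# Hypothetical wide table: index = poi_id, columns = week numbers. Toy numbers only.
weekly = pd.DataFrame(
    {40: [100, 80, 300, 50], 44: [120, 85, 260, 95]},
    index=pd.Index(["p1", "p2", "p3", "p4"], name="poi_id"),
)


def top_by_increase(df: pd.DataFrame, n: int, start: int = 40, end: int = 44) -> list:
    """Rank POIs by (value in week `end`) minus (value in week `start`)."""
    increase = df[end] - df[start]  # positive = more traffic in week `end`
    return increase.nlargest(n).index.tolist()


top_n_increase = top_by_increase(weekly, n=2)  # the report uses n=20
```

The order of the subtraction matters: computing `df[start] - df[end]` instead would rank the largest decreases first, putting `p3` at the top of this toy example.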
The task featured a hundred Census Block Groups (CBGs). The predicted values were aggregated for each CBG, and the ten CBGs with the highest totals were selected for the time series analysis shown below. The CBGs around Mike Raisor Lincoln, Bob Automotive group and Aster Place are notably of concern given the predicted spikes.
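import matplotlib.pyplot as plt

# top_cbg, mod_new_data and mod_plot_gra are CBG-level helpers defined elsewhere in the notebook (not shown here).
fig, ax = plt.subplots(figsize=(20, 9))
ax.set_title('Temporal Development for the 10 most Crowded CBGs')
for j in top_cbg:
    mod_plot_gra(mod_new_data(j))
plt.savefig('temporal_cbg_11.png')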
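The CBG-level aggregation can be sketched as a join from POIs to their CBG followed by a grouped sum. This assumes a hypothetical per-POI prediction table `pred` and a hypothetical POI-to-CBG lookup `poi_cbg`; the column names and toy numbers are illustrative, not taken from the report's tables.

```python
import pandas as pd

# Hypothetical inputs. Toy numbers only.
pred = pd.DataFrame({
    "poi_id": ["p1", "p2", "p3", "p4", "p5"],
    "week_44": [120, 85, 260, 95, 40],
})
poi_cbg = pd.DataFrame({
    "poi_id": ["p1", "p2", "p3", "p4", "p5"],
    "cbg": ["c1", "c1", "c2", "c3", "c3"],
})


def top_cbgs(pred: pd.DataFrame, poi_cbg: pd.DataFrame, n: int,
             value_col: str = "week_44") -> pd.Series:
    """Sum the per-POI values within each CBG and return the n largest totals."""
    merged = pred.merge(poi_cbg, on="poi_id", how="inner")
    totals = merged.groupby("cbg")[value_col].sum()
    return totals.nlargest(n)


top = top_cbgs(pred, poi_cbg, n=2)  # the report keeps the top 10
```

An inner join drops POIs without a known CBG; a left join with a check for missing `cbg` values would make such gaps visible instead.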
Similar to the time series analyses above, the increase was aggregated for each CBG and the CBGs with the highest increase were plotted. The CBG around Gas America Service stood out; therefore, efforts should be directed towards controlling the spread rate there.
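import matplotlib.pyplot as plt

# top_cbg_1, mod_new_data_1 and mod_plot_gra_1 are defined elsewhere in the notebook (not shown here).
fig, ax = plt.subplots(figsize=(20, 9))
ax.set_title('Temporal Development for the 10 most Increased CBGs')
for j in top_cbg_1:
    mod_plot_gra_1(mod_new_data_1(j))
plt.savefig('temporal_cbg_10.png')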
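Ranking CBGs by increase combines the two previous steps. A minimal sketch, assuming a hypothetical per-POI table `weekly` that carries each POI's CBG and its week-40 and week-44 values (toy numbers), sums both weeks per CBG and then takes the difference:

```python
import pandas as pd

# Hypothetical per-POI table with CBG labels. Toy numbers only.
weekly = pd.DataFrame({
    "poi_id": ["p1", "p2", "p3", "p4", "p5"],
    "cbg": ["c1", "c1", "c2", "c3", "c3"],
    "week_40": [100, 80, 300, 50, 20],
    "week_44": [120, 85, 260, 95, 40],
})


def top_cbgs_by_increase(df: pd.DataFrame, n: int) -> pd.Series:
    """Return the n CBGs whose summed value grew most from week 40 to week 44."""
    totals = df.groupby("cbg")[["week_40", "week_44"]].sum()
    # For sums, the difference of CBG totals equals the sum of per-POI differences
    increase = totals["week_44"] - totals["week_40"]
    return increase.nlargest(n)


top_inc = top_cbgs_by_increase(weekly, n=2)  # the report keeps the top 10
```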
Based on the results of the analysis, some locations appear to be responding well to preventive measures, while others are recording an upsurge in the number of cases. A notable instance is Purdue University Main Campus in the first plot: the weekly cases recorded at the beginning of fall 2020 peaked at about 8000 and then fell to about 3000 per week, a drop that can be attributed to adherence to the Protect Purdue pledge. The curve for Purdue University also fell correspondingly. These are classic illustrations that simple measures such as masking up, washing hands and social distancing can indeed help flatten the curve. The second chart illustrates that supermarkets are among the places with the highest increase in the number of cases, and this needs to be monitored. The CBGs around Mike Raisor and Bob Automotive group are some of the most crowded areas, and policy efforts should be focused there.
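To put the Purdue Main Campus figures quoted above in proportion, the relative drop from the approximate peak of 8000 to about 3000 per week works out to:

```latex
\frac{8000 - 3000}{8000} = \frac{5000}{8000} = 0.625
```

that is, a reduction of roughly 62%, bearing in mind that both inputs are approximate readings from the chart.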
In conclusion, the charts and maps provide visual cues for gauging the performance of the policies used so far and for planning further measures. The Purdue case in particular affirmatively shows that simple measures are indeed effective in curbing the spread of the virus.