The New Brunswick Department of Health, Health Analytics branch, fosters the strategic use of information and analytics to inform decision making as it relates to the Department’s mandate of planning, funding and monitoring a high quality & sustainable health system for the citizens of New Brunswick.

Virtual Care Utilization

About the Data

The world of digital health is continuously expanding and has enormous potential to provide adequate, cost-efficient, safe andscalable eHealth interventions to improve health and health care.It is time to reimagine the traditional, in-person approach to care.

Digital health solutions can change the way New Brunswickers receive services and how citizens and providers engage with the health-care system. These services or interventions should bedesigned around the patient’s needs and pertinent informationshould be shared in a proactive and efficient way through smarter use of data, devices, communication platforms and people.

This data set contains an aggregation of patient visits that have occurred during the pandemic. As part of the pandemic response, which includes the new program to support virtual care delivery for the public. The data elements within the data set will allow you to trend the progress of virtual visits, and to describe the characteristics of patients and physicians who have embraced virtual care.The data set contains 113,202 rows including title headers.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import linear_model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

from sklearn.metrics import mean_squared_error
from math import sqrt
import statsmodels.api as sm

import warnings
warnings.filterwarnings("ignore")
C:\Users\Shivam Goyal\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:7: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import (to_datetime, Int64Index, DatetimeIndex, Period,
C:\Users\Shivam Goyal\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:7: FutureWarning: pandas.Float64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import (to_datetime, Int64Index, DatetimeIndex, Period,
df.head()
YYYYMM Year Month Age Group Code (Patient) Patient Age Group Age Group (Patient) Gender (Patient) Patient Gender Health Zone (Patient) Patient Health Zone Gender (Physician) Physician Gender Health Zone (Physician) Physician Health Zone Visit Type Number of Visits
0 202112 2021 12 4 0-20 Years 15 - 19 Years 1 Male 3 Fredricton Area 1 Male 7 Miramichi Area Office 9103
1 202103 2021 3 4 0-20 Years 15 - 19 Years 1 Male 3 Fredricton Area 2 Female 1 Moncton Area Office 4591
2 202111 2021 11 4 0-20 Years 15 - 19 Years 1 Male 3 Fredricton Area 2 Female 1 Moncton Area Office 4448
3 202104 2021 4 4 0-20 Years 15 - 19 Years 1 Male 3 Fredricton Area 2 Female 1 Moncton Area Office 4435
4 202102 2021 2 4 0-20 Years 15 - 19 Years 1 Male 3 Fredricton Area 2 Female 1 Moncton Area Office 4370
df.columns, len(df.columns)
(Index(['YYYYMM', 'Year', 'Month', 'Age Group Code (Patient)',
        'Patient Age Group', 'Age Group (Patient)', 'Gender (Patient)',
        'Patient Gender', 'Health Zone (Patient)', 'Patient Health Zone',
        'Gender (Physician)', 'Physician Gender', 'Health Zone (Physician)',
        'Physician Health Zone', 'Visit Type', 'Number of Visits'],
       dtype='object'),
 16)
(df.groupby(['Year','Month'])['Number of Visits'].sum().reset_index()).groupby(['Year'])['Number of Visits'].sum().reset_index().mean()
Year                   2020.50
Number of Visits    2436942.75
dtype: float64
(df.groupby(['Year','Patient Gender'])['Number of Visits'].sum().reset_index()).groupby(['Patient Gender'])['Number of Visits'].mean().reset_index()
Patient Gender Number of Visits
0 Female 1378825.25
1 Male 1058107.50
2 Not Specified 10.00
df.groupby(['Year'])['Physician Gender'].count().reset_index()
Year Physician Gender
0 2019 22627
1 2020 36737
2 2021 43025
3 2022 10812
(df.groupby(['Year','Month'])['Number of Visits'].sum().reset_index()).groupby(['Year'])['Number of Visits'].mean().reset_index().to_csv("Monthly Average Visits.csv")
(df.groupby(['Year','Patient Health Zone'])['Number of Visits'].sum().reset_index()).groupby(['Patient Health Zone'])['Number of Visits'].mean().reset_index()

# .to_csv("Monthly Average Visits.csv")
Patient Health Zone Number of Visits
0 Bathurst Area 299642.25
1 Campbellton Area 69923.50
2 Edmunston Area 115816.75
3 Fredricton Area 566323.50
4 Miramichi Area 133572.25
5 Moncton Area 627299.75
6 Saint John Area 615620.50
7 Unknown 8744.25

image

# Visits Across Time
pd.pivot_table(df, values='Number of Visits', index=['Year', 'Month'],
                    columns=['Visit Type'], aggfunc=np.sum).reset_index().to_csv("Visitsacrosstime.csv",index = False)


image

# Visits Across Patient Health Zone
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Patient Health Zone'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("VisitsPatientHealthZone.csv",index = False)


image

# Visits Across Physician Health Zone
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Physician Health Zone'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("VisitsPhysicianHealthZone.csv",index = False)


image

# Visits Across Patient Gender
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Patient Gender'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("VisitsPatientGender.csv",index = False)

image

# Visits Across Physician Gender
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Physician Gender'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("VisitsPhysicianGender.csv",index = False)

image

# Visits Across Patient Age Group
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Age Group (Patient)'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("VisitsPatientAgeGroup.csv",index = False)

image

# Visits Across Patient and Physician Health Zone Combined
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Patient Health Zone','Physician Health Zone'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("VisitsPatientPhysicianHealthZone.csv",index = False)


# Visits Across Patient Gender & Health Zone Combined
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Patient Health Zone','Patient Gender'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("patient healthzone gender.csv",index=False)
# Visits Across Physician Gender & Health Zone Combined
pd.pivot_table(df[(df['Year']==2020) | (df['Year']==2021)], 
               values='Number of Visits', index=['Physician Health Zone','Physician Gender'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index().to_csv("Physician healthzone gender.csv",index=False)

Forecast

Moving Average

# Visits Across Patient and Time
forecast_data = pd.pivot_table(df[~(df['Year']==2022)], 
               values='Number of Visits', index=['YYYYMM'],
               columns=['Visit Type'], 
               aggfunc=np.sum).reset_index()
forecast_data = forecast_data.loc[:,['YYYYMM','Virtual Care']]
forecast_data
train = forecast_data.iloc[13:,]
train
Visit Type YYYYMM Virtual Care MA
13 202005 73908.0 NaN
14 202006 170473.0 NaN
15 202007 145645.0 130008.666667
16 202008 125760.0 147292.666667
17 202009 144209.0 138538.000000
18 202010 137580.0 135849.666667
19 202011 140555.0 140781.333333
20 202012 133632.0 137255.666667
21 202101 169229.0 147805.333333
22 202102 154023.0 152294.666667
23 202103 165160.0 162804.000000
24 202104 146006.0 155063.000000
25 202105 140076.0 150414.000000
26 202106 144661.0 143581.000000
27 202107 108489.0 131075.333333
28 202108 105513.0 119554.333333
29 202109 124829.0 112943.666667
30 202110 122944.0 117762.000000
31 202111 128506.0 125426.333333
32 202112 115565.0 122338.333333
train['MA'] = train['Virtual Care'].rolling(window=2).mean()
rms = sqrt(mean_squared_error(train.loc[30:,'Virtual Care'],train.loc[30:,'MA']))
print(rms)
4102.423572312672
test = train.loc[30:,]
test
Visit Type YYYYMM Virtual Care MA
30 202110 122944.0 123886.5
31 202111 128506.0 125725.0
32 202112 115565.0 122035.5
# 3 --5234.994913153281
# 2 --4102.423572312672
fit2.forecast(7)
array([119851.48, 119851.48, 119851.48, 119851.48, 119851.48, 119851.48,
       119851.48])

Simple Exponential Smoothening

from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt

fit2 = SimpleExpSmoothing(np.asarray(train.loc[30:,'Virtual Care'])).fit(smoothing_level=0.6,optimized=False)
test['SES'] = fit2.forecast(len(test))
plt.figure(figsize=(16,8))
plt.plot(train['Virtual Care'], label='Train')
plt.plot(test['Virtual Care'], label='Test')
plt.plot(test['SES'], label='SES')
plt.legend(loc='best')
plt.show()

png

Holt’s Winters

# import statsmodels.api as sm
# sm.tsa.seasonal_decompose(train['Virtual Care']).plot()
# result = sm.tsa.stattools.adfuller(train['Virtual Care'])
# plt.show()
sns.barplot(x = 'Year', y = 'Number of Visits', hue = 'Visit Type', data = df)
#print(df.groupby(['Year', 'Visit Type']).mean()['Number of Visits'])
plt.show()

plt.figure(figsize=(25,5))
sns.barplot(x = 'Age Group (Patient)', y = 'Number of Visits', hue = 'Visit Type', data = df)
plt.show()

sns.barplot(x = 'Patient Gender', y = 'Number of Visits', hue = 'Visit Type', data = df)
plt.show()

sns.barplot(x = 'Physician Gender', y = 'Number of Visits', hue = 'Visit Type', data = df)
plt.show()

plt.figure(figsize=(25,5))
sns.barplot(x = 'Patient Health Zone', y = 'Number of Visits', hue = 'Visit Type', data = df)
plt.show()

plt.figure(figsize=(25,5))
sns.barplot(x = 'Physician Health Zone', y = 'Number of Visits', hue = 'Visit Type', data = df)
plt.show()


png

png

png

png

png

png

Linear Regression Model

dummies=pd.get_dummies(df,drop_first=True)
df=pd.concat([df,dummies.iloc[:,16:46]],axis=1)
#list(dummies.columns)

df_reg=pd.concat([df.iloc[:,0],df.iloc[:,16:46],df.iloc[:,15]],axis=1)
#df_reg=pd.concat([df.iloc[:,0],df.iloc[:,3],df.iloc[:,28:45],df.iloc[:,15]],axis=1)

x=df_reg.loc[:,df_reg.columns!="Number of Visits"]
y=df_reg["Number of Visits"]

x_train, x_test,y_train,y_test = train_test_split(x,y,test_size =0.3)
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)

#print('Intercept: \n', regr.intercept_)
#print('Coefficients: \n', regr.coef_)

x = sm.add_constant(x_train) # adding a constant
 
model = sm.OLS(y_train, x_train).fit()
predictions = model.predict(x_test) 
 
print_model = model.summary()
print(print_model)

regr.score(x_test,y_test)

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:       Number of Visits   R-squared (uncentered):                   0.218
Model:                            OLS   Adj. R-squared (uncentered):              0.217
Method:                 Least Squares   F-statistic:                              710.9
Date:                Fri, 04 Nov 2022   Prob (F-statistic):                        0.00
Time:                        01:20:15   Log-Likelihood:                     -5.3945e+05
No. Observations:               79240   AIC:                                  1.079e+06
Df Residuals:                   79209   BIC:                                  1.079e+06
Df Model:                          31                                                  
Covariance Type:            nonrobust                                                  
==========================================================================================================
                                             coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------
Year                                       0.0361      0.002     22.958      0.000       0.033       0.039
Age Group (Patient)_35 - 39 Years         10.6933      3.476      3.076      0.002       3.881      17.506
Age Group (Patient)_40 - 44 Years         15.3216      3.490      4.391      0.000       8.482      22.161
Age Group (Patient)_45 - 49 Years         22.3762      3.482      6.426      0.000      15.552      29.201
Age Group (Patient)_50 - 54 Years         30.1556      3.442      8.761      0.000      23.410      36.902
Age Group (Patient)_55 - 59 Years         48.4215      3.391     14.280      0.000      41.776      55.068
Age Group (Patient)_60 - 64 Years         57.7919      3.349     17.254      0.000      51.227      64.357
Age Group (Patient)_65 - 69 Years         95.5130      3.347     28.534      0.000      88.952     102.074
Age Group (Patient)_70 - 74 Years        134.8046      3.422     39.390      0.000     128.097     141.512
Age Group (Patient)_75 - 79 Years        107.7460      3.646     29.554      0.000     100.600     114.892
Age Group (Patient)_80 - 84 Years         75.6499      3.932     19.237      0.000      67.942      83.358
Age Group (Patient)_85 - 89 Years         31.6560      4.133      7.659      0.000      23.555      39.757
Age Group (Patient)_90+ Years              5.8787      4.565      1.288      0.198      -3.068      14.826
Patient Gender_Male                      -19.8723      1.560    -12.738      0.000     -22.930     -16.815
Patient Gender_Not Specified            -129.7541     42.995     -3.018      0.003    -214.024     -45.484
Patient Health Zone_Campbellton Area     -58.0391      3.421    -16.965      0.000     -64.745     -51.334
Patient Health Zone_Edmunston Area       -39.6609      3.556    -11.155      0.000     -46.630     -32.692
Patient Health Zone_Fredricton Area       27.3165      3.022      9.040      0.000      21.394      33.239
Patient Health Zone_Miramichi Area       -48.5354      3.151    -15.403      0.000     -54.711     -42.359
Patient Health Zone_Moncton Area          39.6895      2.942     13.491      0.000      33.923      45.456
Patient Health Zone_Saint John Area       60.7020      3.258     18.632      0.000      54.316      67.088
Patient Health Zone_Unknown             -102.6974      3.586    -28.637      0.000    -109.726     -95.669
Physician Gender_Male                     10.5345      1.566      6.727      0.000       7.465      13.604
Physician Health Zone_Campbellton Area   -32.0513      3.597     -8.911      0.000     -39.101     -25.002
Physician Health Zone_Edmunston Area     -28.1643      3.739     -7.533      0.000     -35.492     -20.837
Physician Health Zone_Fredricton Area      9.6762      2.993      3.233      0.001       3.810      15.542
Physician Health Zone_Miramichi Area     -54.1623      3.079    -17.592      0.000     -60.197     -48.128
Physician Health Zone_Moncton Area        29.3353      2.826     10.382      0.000      23.797      34.874
Physician Health Zone_Saint John Area     34.7255      3.088     11.247      0.000      28.674      40.777
Physician Health Zone_Unknown            -87.6980      4.727    -18.552      0.000     -96.963     -78.433
Visit Type_Virtual Care                  -28.6571      1.611    -17.792      0.000     -31.814     -25.500
==============================================================================
Omnibus:                    90032.920   Durbin-Watson:                   1.985
Prob(Omnibus):                  0.000   Jarque-Bera (JB):         21699933.674
Skew:                           5.593   Prob(JB):                         0.00
Kurtosis:                      83.295   Cond. No.                     1.12e+05
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[3] The condition number is large, 1.12e+05. This might indicate that there are
strong multicollinearity or other numerical problems.





0.11132790904846412

Final Selected Model Prediction

image