vignettes/epipredict.Rmd
30 additions & 22 deletions
@@ -13,20 +13,6 @@ vignette: >
13
13
source(here::here("vignettes/_common.R"))
14
14
```
15
15
16
-
```{r setup, message=FALSE, include = FALSE}
17
-
library(dplyr)
18
-
library(parsnip)
19
-
library(workflows)
20
-
library(recipes)
21
-
library(epidatasets)
22
-
library(epipredict)
23
-
library(epiprocess)
24
-
library(ggplot2)
25
-
library(purrr)
26
-
forecast_date <- as.Date("2021-08-01")
27
-
used_locations <- c("ca", "ma", "ny", "tx")
28
-
library(epidatr)
29
-
```
30
16
31
17
At a high level, the goal of `{epipredict}` is to make it easy to run simple machine
32
18
learning and statistical forecasters for epidemiological data.
@@ -86,6 +72,27 @@ For a more in-depth treatment with some practical applications, see also the
86
72
# Panel forecasting basics
87
73
88
74
This section gives usage examples that go beyond the most basic case of calling `arx_forecaster()` with the default engine to forecast a single ahead.
75
+
Before we start building forecasters, let's load the relevant libraries:
76
+
77
+
```{r setup, message=FALSE}
78
+
library(dplyr)
79
+
library(parsnip)
80
+
library(workflows)
81
+
library(recipes)
82
+
library(epidatasets)
83
+
library(epipredict)
84
+
library(epiprocess)
85
+
library(ggplot2)
86
+
library(purrr)
87
+
library(epidatr)
88
+
```
89
+
90
+
We also set a default forecast date and a small set of states; restricting the data to these keeps the discussion manageable:
91
+
92
+
```{r}
93
+
forecast_date <- as.Date("2021-08-01")
94
+
used_locations <- c("ca", "ma", "ny", "tx")
95
+
```
89
96
90
97
## Example data
91
98
@@ -435,14 +442,11 @@ autoplot(
435
442
436
443
The 8 graphs represent all combinations of the `geo_values` (`"Quebec"` and `"British Columbia"`), `edu_quals` (`"Undergraduate degree"` and `"Professional degree"`), and age brackets (`"15 to 34 years"` and `"35 to 64 years"`).
437
444
438
-
## Fitting a non-geo-pooled model
445
+
## Fitting a forecaster without geo-pooling
439
446
440
-
The methods shown so far fit a single model across all geographic regions.
441
-
This is called "geo-pooling".
442
-
To fit a non-geo-pooled model that fits each geography separately, one either needs a multi-level
443
-
engine (which at the moment `{parsnip}` doesn't support), or one needs to loop over
444
-
geographies.
445
-
Here, we're using `purrr::map` to perform the loop.
447
+
The methods shown so far fit a single model across all geographic regions, treating them as if they are independently and identically distributed (see [Mathematical description] for an explicit model example).
448
+
This is called "geo-pooling".
449
+
In the context of `{epipredict}`, the simplest way to avoid geo-pooling and use different parameters for each geography is to loop over the `geo_value`s:
446
450
447
451
```{r fit_non_geo_pooled, warning=FALSE}
448
452
geo_values <- covid_case_death_rates |>
@@ -475,7 +479,9 @@ Fitting separate models for each geography is both 56 times slower[^7] than geo-
475
479
If a dataset contains relatively few observations for each geography, fitting a geo-pooled model is likely to produce better, more stable results.
476
480
However, geo-pooling can only be used if values are comparable in meaning and scale across geographies or can be made comparable, for example by normalization.
477
481
478
-
If we wanted to build a geo-aware model, such as a linear regression with a different intercept for each geography, we would need to build a [custom workflow](custom_epiworkflows) with geography as a factor.
482
+
If we wanted to build a geo-aware model, such as a linear regression with a
483
+
different intercept for each geography, we would need to build a [custom
484
+
workflow](custom_epiworkflows) with geography as a factor.
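
As a rough illustration of what "a different intercept for each geography" means, the sketch below uses base R's `lm()` on simulated data rather than an `epi_workflow`. The `toy` data frame, its columns, and the `offsets` vector are made up for this example; it is not the custom workflow referred to above.

```r
# Simulated data, purely illustrative (not taken from the vignette).
set.seed(1)
toy <- data.frame(
  geo_value = factor(rep(c("ca", "ma", "ny", "tx"), each = 50)),
  x = rnorm(200)
)
# Hypothetical per-geography intercepts used to generate the outcome.
offsets <- c(ca = 0, ma = 1, ny = 2, tx = 3)
toy$y <- unname(offsets[as.character(toy$geo_value)]) +
  0.5 * toy$x + rnorm(200, sd = 0.1)

# Geo-pooled: one intercept and one slope shared by all geographies.
pooled <- lm(y ~ x, data = toy)

# Geo-aware: shared slope, but a separate intercept per geography,
# obtained by adding geo_value as a factor.
geo_aware <- lm(y ~ x + geo_value, data = toy)
coef(geo_aware)
```

In the geo-aware fit, the `geo_value*` coefficients are intercept shifts relative to the reference level (`"ca"` here), while the slope on `x` is still shared across geographies.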
479
485
480
486
# Anatomy of a canned forecaster
481
487
@@ -573,6 +579,8 @@ $$
573
579
574
580
For example, $a_1$ is `lag_0_death_rate` above, with a value of `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_0_death_rate"], 3)`,
575
581
while $a_5$ is `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_7_case_rate"], 4) `.
582
+
Note that unlike $d_{t,j}$ or $c_{t,j}$, these coefficients *don't* depend on either the time $t$ or the location $j$.
583
+
This is what makes it a geo-pooled model.
576
584
577
585
The training data for fitting this linear model is constructed within the `arx_forecaster()` function by shifting a series
578
586
of columns the appropriate amount -- based on the requested `lags`.
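
To make the idea of "shifting a series of columns" concrete, here is a simplified single-location sketch in base R. The `toy` data frame and the `shift()` helper are hypothetical and only mimic the naming pattern of the lagged columns; this is not how `arx_forecaster()` is implemented internally.

```r
# Ten days of made-up daily data for a single location.
toy <- data.frame(
  time_value = as.Date("2021-01-01") + 0:9,
  death_rate = 1:10
)

# Hypothetical helper: shift x back by k rows, padding the start with NA.
shift <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

# With daily data, a lag of k days corresponds to a shift of k rows.
toy$lag_0_death_rate <- shift(toy$death_rate, 0)
toy$lag_7_death_rate <- shift(toy$death_rate, 7)
toy
```

Rows where any lagged column is `NA` have no complete set of predictors, so they would not be usable for fitting.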