
Commit 4a9f43e

moving library, geo-pooling phrasing
1 parent ef1fd58 commit 4a9f43e

1 file changed, +30 -22 lines changed

vignettes/epipredict.Rmd

@@ -13,20 +13,6 @@ vignette: >
 source(here::here("vignettes/_common.R"))
 ```
 
-```{r setup, message=FALSE, include = FALSE}
-library(dplyr)
-library(parsnip)
-library(workflows)
-library(recipes)
-library(epidatasets)
-library(epipredict)
-library(epiprocess)
-library(ggplot2)
-library(purrr)
-forecast_date <- as.Date("2021-08-01")
-used_locations <- c("ca", "ma", "ny", "tx")
-library(epidatr)
-```
 
 At a high level, the goal of `{epipredict}` is to make it easy to run simple machine
 learning and statistical forecasters for epidemiological data.
@@ -86,6 +72,27 @@ For a more in-depth treatment with some practical applications, see also the
 # Panel forecasting basics
 
 This section gives basic usage examples for the package beyond the most basic usage of `arx_forecaster()` for forecasting a single ahead using the default engine.
+Before we start building forecasters, let's import some relevant libraries:
+
+```{r setup, message=FALSE}
+library(dplyr)
+library(parsnip)
+library(workflows)
+library(recipes)
+library(epidatasets)
+library(epipredict)
+library(epiprocess)
+library(ggplot2)
+library(purrr)
+library(epidatr)
+```
+
+Next, we set our default forecast date and the selected states (we will use these to limit the data and keep the discussion simple):
+
+```{r}
+forecast_date <- as.Date("2021-08-01")
+used_locations <- c("ca", "ma", "ny", "tx")
+```
 
 ## Example data
 
@@ -435,14 +442,11 @@ autoplot(
 
 The 8 graphs represent all combinations of the `geo_values` (`"Quebec"` and `"British Columbia"`), `edu_quals` (`"Undergraduate degree"` and `"Professional degree"`), and age brackets (`"15 to 34 years"` and `"35 to 64 years"`).
 
-## Fitting a non-geo-pooled model
+## Fitting a forecaster without geo-pooling
 
-The methods shown so far fit a single model across all geographic regions.
-This is called "geo-pooling".
-To fit a non-geo-pooled model that fits each geography separately, one either needs a multi-level
-engine (which at the moment `{parsnip}` doesn't support), or one needs to loop over
-geographies.
-Here, we're using `purrr::map` to perform the loop.
+The methods shown so far fit a single model across all geographic regions, treating them as if they are independently and identically distributed (see [Mathematical description] for an explicit model example).
+This is called "geo-pooling".
+In the context of `{epipredict}`, the simplest way to avoid geo-pooling and use different parameters for each geography is to loop over the `geo_value`s:
 
 ```{r fit_non_geo_pooled, warning=FALSE}
 geo_values <- covid_case_death_rates |>
@@ -475,7 +479,9 @@ Fitting separate models for each geography is both 56 times slower[^7] than geo-
 If a dataset contains relatively few observations for each geography, fitting a geo-pooled model is likely to produce better, more stable results.
 However, geo-pooling can only be used if values are comparable in meaning and scale across geographies or can be made comparable, for example by normalization.
 
-If we wanted to build a geo-aware model, such as a linear regression with a different intercept for each geography, we would need to build a [custom workflow](custom_epiworkflows) with geography as a factor.
+If we wanted to build a geo-aware model, such as a linear regression with a
+different intercept for each geography, we would need to build a [custom
+workflow](custom_epiworkflows) with geography as a factor.
 
 # Anatomy of a canned forecaster
 
@@ -573,6 +579,8 @@ $$
 
 For example, $a_1$ is `lag_0_death_rate` above, with a value of `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_0_death_rate"], 3)`,
 while $a_5$ is `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_7_case_rate"], 4) `.
+Note that unlike $d_{t,j}$ or $c_{t,j}$, these coefficients *don't* depend on either the time $t$ or the location $j$.
+This is what makes it a geo-pooled model.
 
 The training data for fitting this linear model is constructed within the `arx_forecaster()` function by shifting a series
 of columns the appropriate amount -- based on the requested `lags`.
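
The lines added in the last hunk (coefficients that depend on neither $t$ nor $j$) and the geo-aware model mentioned in the fourth hunk can be contrasted side by side. What follows is a hedged sketch, not the vignette's own equation (which this diff does not show). It writes the lagged death-rate and case-rate columns generically as predictors $x^{(k)}_{t,j}$, $k = 1, \dots, p$, with forecast target $\hat{y}_{t+h,j}$ at horizon $h$; the symbols $x^{(k)}$, $p$, $y$, and $h$ are notational assumptions introduced here, with $x^{(1)}_{t,j} = d_{t,j}$ so that $a_1$ keeps its meaning as the `lag_0_death_rate` coefficient.

$$
\begin{aligned}
\text{geo-pooled:} \quad & \hat{y}_{t+h,j} = a_0 + \sum_{k=1}^{p} a_k \, x^{(k)}_{t,j} \\
\text{geo-aware (intercept per geography):} \quad & \hat{y}_{t+h,j} = a_{0,j} + \sum_{k=1}^{p} a_k \, x^{(k)}_{t,j} \\
\text{no geo-pooling (separate fit per geography):} \quad & \hat{y}_{t+h,j} = a_{0,j} + \sum_{k=1}^{p} a_{k,j} \, x^{(k)}_{t,j}
\end{aligned}
$$

Only the first form uses a single coefficient vector shared by every location $j$. Looping over the `geo_value`s estimates a separate vector $(a_{0,j}, \dots, a_{p,j})$ for each $j$ from that location's data alone. This is why the vignette expects geo-pooling to give more stable results when each geography has relatively few observations.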
