
Commit 9f0af0a

fit -> estimate
1 parent 4a9f43e commit 9f0af0a

2 files changed: +34 -24 lines changed


vignettes/epipredict.Rmd

Lines changed: 29 additions & 19 deletions
@@ -20,7 +20,7 @@ To do this, we have extended the [tidymodels](https://www.tidymodels.org/)
framework to handle the case of panel time-series data.

Our hope is that it is easy for users with epidemiological training and some statistical knowledge to
-fit baseline models, while also allowing those with more nuanced statistical
+estimate baseline models, while also allowing those with more nuanced statistical
understanding to create complex custom models using the same framework.
Towards that end, `{epipredict}` provides two main classes of tools:

@@ -33,7 +33,7 @@ We currently provide the following basic forecasters:
  with increasingly wide quantiles.
* `climatological_forecaster()`: predicts the median and quantiles based on the historical values around the same date in previous years.
* `arx_forecaster()`: an AutoRegressive eXogenous feature forecaster, which
-  fits a model (e.g. linear regression) on lagged data to predict quantiles
+  estimates a model (e.g. linear regression) on lagged data to predict quantiles
  for continuous values.
* `arx_classifier()`: fits a model (e.g. logistic regression) on lagged data
  to predict a binned version of the growth rate.
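
As an editorial aside (not part of this commit), the forecasters listed above share a common calling pattern; a minimal sketch of an `arx_forecaster()` call is shown below. The dataset `covid_case_death_rates` and the column names are borrowed from examples referenced elsewhere in this diff, and the exact argument defaults are assumptions.

```r
# Minimal sketch, assuming epipredict is installed and covid_case_death_rates
# (an epi_df of COVID case and death rates) is available as in the vignette.
library(epipredict)

out <- arx_forecaster(
  covid_case_death_rates,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate")
)

# The returned object bundles the fitted workflow and a tibble of quantile predictions.
out$predictions
```
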
@@ -133,7 +133,7 @@ Let's expand on the basic example presented on the [landing
page](../index.html#motivating-example), starting with adjusting some parameters in
`arx_forecaster()`.

-The `trainer` argument allows us to set the fitting engine. We can use either
+The `trainer` argument allows us to set the computational engine. We can use either
one of the relevant [parsnip models](https://www.tidymodels.org/find/parsnip/),
or one of the included engines, such as `smooth_quantile_reg()`:
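
To illustrate the `trainer` argument described in the added line (an editorial sketch, not content from this commit), one of the included quantile engines can be passed in place of the default; `quantile_reg()` is used here as a stand-in, and `smooth_quantile_reg()` could be substituted, with the same caveats about assumed dataset and argument names as above.

```r
# Sketch: swap the default engine for an included quantile regression engine.
library(epipredict)

out_q <- arx_forecaster(
  covid_case_death_rates,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate"),
  trainer = quantile_reg()  # or smooth_quantile_reg(), as named in the text above
)

out_q$predictions
```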

@@ -383,7 +383,8 @@ which define the bin boundaries.

In this example, the custom `breaks` passed to `arx_class_args_list()` correspond to 2 bins:
`(-∞, 0.0357]` and `(0.0357, ∞)`.
-The bins can be interpreted as: the outcome variable is decreasing, approximately stable, slightly increasing, or increasing quickly.
+The bins can be interpreted as: `death_rate` is decreasing/growing slowly,
+or `death_rate` is growing quickly.
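
To make the bin construction concrete (an editorial sketch, not part of this commit), a single custom break at 0.0357 produces exactly the two bins described above; the call below assumes the same dataset and column names as the earlier sketches, and the `arx_class_args_list()` usage is an assumption about how the vignette's example is set up.

```r
# Sketch: one break at 0.0357 yields the bins (-Inf, 0.0357] and (0.0357, Inf).
library(epipredict)

out_class <- arx_classifier(
  covid_case_death_rates,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate"),
  args_list = arx_class_args_list(breaks = 0.0357)
)

# Each location is assigned to one of the two growth-rate bins.
out_class$predictions
```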

The returned `predictions` assigns each state to one of the growth rate bins.
In this case, the classifier expects the growth rate for all 4 of the states to fall into the same category,
@@ -403,14 +404,16 @@ growth_rates <- covid_case_death_rates |>
growth_rates |> filter(time_value == "2021-08-14")
```

-The accuracy is 50%, since all 4 states were predicted to be in the interval `(-Inf, 0.0357]`, while two, `ca` and `ny` actually were.
+The accuracy is 50%, since all 4 states were predicted to be in the interval
+`(-Inf, 0.0357]`, while two, `ca` and `ny` actually were.


-## Fitting multi-key panel data
+## Handling multi-key panel data

-If multiple keys are set in the `epi_df` as `other_keys`,
-`arx_forecaster` will automatically group by those in addition to the required geographic key.
-For example, predicting the number of graduates in each of the categories in `grad_employ_subset` from above:
+If multiple keys are set in the `epi_df` as `other_keys`, `arx_forecaster` will
+automatically group by those in addition to the required geographic key.
+For example, predicting the number of graduates in each of the categories in
+`grad_employ_subset` from above:

```{r multi_key_forecast, warning=FALSE}
# only fitting a subset, otherwise there are ~550 distinct pairs, which is bad for plotting
@@ -442,9 +445,9 @@ autoplot(

The 8 graphs represent all combinations of the `geo_values` (`"Quebec"` and `"British Columbia"`), `edu_quals` (`"Undergraduate degree"` and `"Professional degree"`), and age brackets (`"15 to 34 years"` and `"35 to 64 years"`).

-## Fitting a forecaster without geo-pooling
+## Estimating models without geo-pooling

-The methods shown so far fit a single model across all geographic regions, treating them as if they are independently and identically distributed (see [Mathematical description] for an explicit model example).
+The methods shown so far estimate a single model across all geographic regions, treating them as if they are independently and identically distributed (see [Mathematical description] for an explicit model example).
This is called "geo-pooling".
In the context of `{epipredict}`, the simplest way to avoid geo-pooling and use different parameters for each geography is to loop over the `geo_value`s:
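
The vignette's own looping code falls between this hunk and the next and is not shown in the diff. Purely as a hedged sketch of the idea (not the vignette's actual code), a per-geography loop might look like the following; the dataset and column names are the same assumptions used in the sketches above.

```r
# Sketch: estimate one model per geography instead of a single geo-pooled model.
library(dplyr)
library(purrr)
library(epipredict)

geos <- unique(covid_case_death_rates$geo_value)

fits_by_geo <- map(geos, function(g) {
  covid_case_death_rates |>
    filter(geo_value == g) |>
    arx_forecaster(
      outcome = "death_rate",
      predictors = c("case_rate", "death_rate")
    )
})

# Combine per-geography predictions, mirroring the list_rbind() call in the next hunk.
map(fits_by_geo, \(f) f$predictions) |> list_rbind()
```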

@@ -475,7 +478,7 @@ all_fits |>
  list_rbind()
```

-Fitting separate models for each geography is both 56 times slower[^7] than geo-pooling, and fits each model on far less data.
+Estimating separate models for each geography is both 56 times slower[^7] than geo-pooling, and uses far less data for each estimate.
If a dataset contains relatively few observations for each geography, fitting a geo-pooled model is likely to produce better, more stable results.
However, geo-pooling can only be used if values are comparable in meaning and scale across geographies or can be made comparable, for example by normalization.

@@ -568,7 +571,7 @@ hardhat::extract_fit_engine(four_week_small$epi_workflow)
```

If $d_{t,j}$ is the death rate on day $t$ at location $j$ and $c_{t,j}$ is the
-associated case rate, then the model we're fitting is:
+associated case rate, then the corresponding model is:

$$
\begin{aligned}
@@ -577,14 +580,21 @@ d_{t+28, j} = & a_0 + a_1 d_{t,j} + a_2 d_{t-7,j} + a_3 d_{t-14, j} +\\
\end{aligned}
$$

-For example, $a_1$ is `lag_0_death_rate` above, with a value of `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_0_death_rate"], 3)`,
-while $a_5$ is `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_7_case_rate"], 4) `.
-Note that unlike `d_{t,j}` or `c_{t,j}`, these *don't* depend on either the time $t$ or the location $j$.
+For example, $a_1$ is `lag_0_death_rate` above, with a value of `r
+round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_0_death_rate"],
+3)`,
+while $a_5$ is `r
+round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_7_case_rate"],
+4) `.
+Note that unlike `d_{t,j}` or `c_{t,j}`, these *don't* depend on either the time
+$t$ or the location $j$.
This is what make it a geo-pooled model.

-The training data for fitting this linear model is constructed within the `arx_forecaster()` function by shifting a series
-of columns the appropriate amount -- based on the requested `lags`.
-Each row containing no `NA` values is used as a training observation to fit the coefficients $a_0,\ldots, a_6$.
+The training data for estimating the parameters of this linear model is
+constructed within the `arx_forecaster()` function by shifting a series of
+columns the appropriate amount -- based on the requested `lags`.
+Each row containing no `NA` values in the predictors is used as a training observation to fit the
+coefficients $a_0,\ldots, a_6$.
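
As an editorial illustration of the column-shifting the added lines describe (this is not epipredict's internal implementation), the lagged predictors and leading outcome for the model above could be built by hand roughly as follows; the lags (0, 7, 14 days), the 28-day horizon, and the dataset name are assumptions mirroring the equation, and the sketch assumes complete daily data so a 7-row shift is a 7-day lag.

```r
# Sketch: construct lagged predictors and a 28-day-ahead outcome by shifting columns
# within each geography, then keep only rows with no NA values in those columns.
library(dplyr)

lagged <- covid_case_death_rates |>
  arrange(geo_value, time_value) |>
  group_by(geo_value) |>
  mutate(
    lag_0_death_rate  = death_rate,
    lag_7_death_rate  = lag(death_rate, 7),
    lag_14_death_rate = lag(death_rate, 14),
    lag_0_case_rate   = case_rate,
    lag_7_case_rate   = lag(case_rate, 7),
    lag_14_case_rate  = lag(case_rate, 14),
    ahead_28_death_rate = lead(death_rate, 28)
  ) |>
  ungroup()

# Rows like these would serve as training observations for a_0, ..., a_6.
training <- lagged |>
  filter(if_all(starts_with(c("lag_", "ahead_")), \(x) !is.na(x)))
```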

[^4]: in the case of a `{parsnip}` engine which doesn't explicitly predict
  quantiles, these quantiles are created using `layer_residual_quantiles()`,

vignettes/panel-data.Rmd

Lines changed: 5 additions & 5 deletions
@@ -109,7 +109,7 @@ sample_n(employ, 6)
```

In the following sections, we will go over pre-processing the data in the
-`epi_recipe` framework, and fitting a model and making predictions within the
+`epi_recipe` framework, and estimating a model and making predictions within the
`epipredict` framework and using the package's canned forecasters.

# Autoregressive (AR) model to predict number of graduates in a year
@@ -213,9 +213,9 @@ our `epi_recipe`:
`lag_2_num_graduates_prop` correspond to $y_{tijk}$, $y_{t-1,ijk}$, and $y_{t-2,ijk}$
respectively.

-## Model fitting and prediction
+## Model estimation and prediction

-Since our goal for now is to fit a simple autoregressive model, we can use
+Since our goal for now is to estimate a simple autoregressive model, we can use
[`parsnip::linear_reg()`](
https://parsnip.tidymodels.org/reference/linear_reg.html) with the default
engine `lm`, which fits a linear regression using ordinary least squares.
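
For orientation only (an editorial sketch, not part of this diff), estimating such a model in the `epipredict` framework typically pairs the recipe with the parsnip model inside an `epi_workflow` and then calls `fit()`; the object names `r` (the recipe defined earlier in the vignette) and `employ_small` are assumptions borrowed from the surrounding vignette code.

```r
# Sketch: combine an epi_recipe `r` with linear_reg() (default engine "lm")
# and estimate it on the employment panel data (object names are assumptions).
library(epipredict)
library(parsnip)

wf <- epi_workflow(r, linear_reg()) |>
  fit(employ_small)

wf
```
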
@@ -333,9 +333,9 @@ rx <- epi_recipe(employ_small) %>%
bake_and_show_sample(rx, employ_small)
```

-## Model fitting & post-processing
+## Model estimation & post-processing

-Before fitting our model and making predictions, let's add some post-processing
+Before estimating our model and making predictions, let's add some post-processing
steps using a few [`frosting`](
https://cmu-delphi.github.io/epipredict/reference/frosting.html) layers to do
a few things:
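
The bullet list that this colon introduces lies outside the hunk. Purely as a hedged illustration (not the vignette's actual layers), a `frosting` pipeline generally starts from `layer_predict()` and stacks post-processing layers on top; the specific layers chosen here are assumptions, and `wf` refers to a fitted `epi_workflow` such as the one sketched earlier.

```r
# Sketch: a generic frosting pipeline attached to a fitted epi_workflow `wf`.
library(epipredict)

f <- frosting() |>
  layer_predict() |>            # generate predictions from the fitted model
  layer_naomit(.pred) |>        # drop rows where no prediction could be made
  layer_add_forecast_date() |>  # record the date the forecast was made
  layer_add_target_date()       # record the date being predicted

wf_post <- add_frosting(wf, f)
predict(wf_post, new_data = get_test_data(rx, employ_small))
```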
