-The accuracy is 50%, since all 4 states were predicted to be in the interval `(-Inf, 0.0357]`, while two, `ca` and `ny` actually were.
+The accuracy is 50%, since all 4 states were predicted to be in the interval
+`(-Inf, 0.0357]`, while two, `ca` and `ny` actually were.


-## Fitting multi-key panel data
+## Handling multi-key panel data

-If multiple keys are set in the `epi_df` as `other_keys`,
-`arx_forecaster` will automatically group by those in addition to the required geographic key.
-For example, predicting the number of graduates in each of the categories in `grad_employ_subset` from above:
+If multiple keys are set in the `epi_df` as `other_keys`, `arx_forecaster` will
+automatically group by those in addition to the required geographic key.
+For example, predicting the number of graduates in each of the categories in
+`grad_employ_subset` from above:

```{r multi_key_forecast, warning=FALSE}
# only fitting a subset, otherwise there are ~550 distinct pairs, which is bad for plotting
@@ -442,9 +445,9 @@ autoplot(
The 8 graphs represent all combinations of the `geo_values` (`"Quebec"` and `"British Columbia"`), `edu_quals` (`"Undergraduate degree"` and `"Professional degree"`), and age brackets (`"15 to 34 years"` and `"35 to 64 years"`).

-## Fitting a forecaster without geo-pooling
+## Estimating models without geo-pooling

-The methods shown so far fit a single model across all geographic regions, treating them as if they are independently and identically distributed (see [Mathematical description] for an explicit model example).
+The methods shown so far estimate a single model across all geographic regions, treating them as if they are independently and identically distributed (see [Mathematical description] for an explicit model example).
This is called "geo-pooling".
In the context of `{epipredict}`, the simplest way to avoid geo-pooling and use different parameters for each geography is to loop over the `geo_value`s:
@@ -475,7 +478,7 @@ all_fits |>
list_rbind()
```

-Fitting separate models for each geography is both 56 times slower[^7] than geo-pooling, and fits each model on far less data.
+Estimating separate models for each geography is both 56 times slower[^7] than geo-pooling, and uses far less data for each estimate.
If a dataset contains relatively few observations for each geography, fitting a geo-pooled model is likely to produce better, more stable results.
However, geo-pooling can only be used if values are comparable in meaning and scale across geographies or can be made comparable, for example by normalization.
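
As a minimal sketch of the normalization mentioned above, raw counts can be converted to rates per 100,000 people before pooling. The data frames, column names, and round-number populations below are hypothetical and chosen only for illustration; they are not taken from the vignette's datasets.

```r
# Hypothetical toy data: raw death counts for two regions of very different size.
counts <- data.frame(
  geo_value = c("ca", "ca", "vt", "vt"),
  time_value = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02")),
  deaths = c(400, 420, 6, 7)
)

# Assumed round-number populations, for illustration only.
pop <- data.frame(
  geo_value = c("ca", "vt"),
  population = c(4e7, 6e5)
)

# Attach each region's population, then rescale counts to a per-100,000 rate
# so that values are on a comparable scale across geographies.
rates <- merge(counts, pop, by = "geo_value")
rates$death_rate <- rates$deaths / rates$population * 1e5

rates[, c("geo_value", "time_value", "death_rate")]
```

On the raw scale the two regions differ by roughly a factor of 60, while the rates are of the same order, which is the kind of comparability a pooled model relies on.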
-For example, $a_1$ is `lag_0_death_rate` above, with a value of `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_0_death_rate"], 3)`,
-while $a_5$ is `r round(hardhat::extract_fit_engine(four_week_small$epi_workflow)$coefficients["lag_7_case_rate"], 4) `.
-Note that unlike `d_{t,j}` or `c_{t,j}`, these *don't* depend on either the time $t$ or the location $j$.
+For example, $a_1$ is `lag_0_death_rate` above, with a value of `r