Commit ef1fd58

move pkgdown-watch, better climate ex, some wording
1 parent 2056e0a commit ef1fd58

5 files changed

+38
-35
lines changed

DEVELOPMENT.md

Lines changed: 1 addition & 2 deletions
@@ -40,8 +40,7 @@ difficulties. To clear those, run `make`, with either `clean_knitr`,
 `clean_site`, or `clean` (which does both).
 
 If you work without R Studio and want to iterate on documentation, you might
-find [this
-script](https://gist.github.com/gadenbuie/d22e149e65591b91419e41ea5b2e0621)
+find `Rscript pkgdown-watch.R` useful.
 helpful. For updating references, you will need to manually call `pkgdown::build_reference()`.
 
 ## Versioning

README.Rmd

Lines changed: 1 addition & 1 deletion
@@ -334,7 +334,7 @@ four_week_ahead$predictions |>
 select(geo_value, forecast_date, target_date, quantile = .pred_distn_quantile_level, value = .pred_distn_value)
 ```
 
-The yellow dot gives the median prediction, while the blue intervals give the
+The orange dot gives the point prediction, while the blue intervals give the
 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^4].
 For this particular day and these locations, the forecasts are relatively
 accurate, with the true data being at least within the 10-90% interval.

README.md

Lines changed: 5 additions & 5 deletions
@@ -309,11 +309,11 @@ four_week_ahead$predictions |>
 #> # ℹ 14 more rows
 ```
 
-The yellow dot gives the median prediction, while the blue intervals
-give the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^3].
-For this particular day and these locations, the forecasts are
-relatively accurate, with the true data being at least within the 10-90%
-interval. A couple of things to note:
+The orange dot gives the point prediction, while the blue intervals give
+the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^3]. For
+this particular day and these locations, the forecasts are relatively
+accurate, with the true data being at least within the 10-90% interval.
+A couple of things to note:
 
 1. `epipredict` methods are primarily direct forecasters; this means we
 don’t need to predict 1, 2,…, 27 days ahead to then predict 28 days
File renamed without changes.

vignettes/epipredict.Rmd

Lines changed: 31 additions & 27 deletions
@@ -102,14 +102,9 @@ Let's look at an example `epi_df`:
 covid_case_death_rates
 ```
 
-This dataset uses a single key, `geo_value`, and two separate
-time series, `case_rate` and `death_rate`.
-The keys are represented in "long" format, with separate columns for the key and
-the value, while separate time series are represented in "wide" format with each
-time series stored in a separate column.
-
-`{epiprocess}` is designed to handle data that always has a geographic key, and
-potentially other key values, such as age, ethnicity, or other demographic
+An `epi_df` always has a `geo_value` and a `time_value` as keys, along with some number of value columns, in this case `case_rate` and `death_rate`.
+Each of these has an associated `geo_type` (state) and `time_type` (day), for which there are some utilities.
+While this `geo_value` and `time_value` are the minimal set of keys, the functions of `{epiprocess}` and `{epipredict}` are designed to accommodate other key values, such as age, ethnicity, or other demographic
 information.
 For example, `grad_employ_subset` from `{epidatasets}` also has both `age_group`
 and `edu_qual` as additional keys:
@@ -314,39 +309,45 @@ one-ahead uncertainty.
 The `climatological_forecaster()` is a different kind of baseline. It produces a
 point forecast and quantiles based on the historical values for a given time of
 year, rather than extrapolating from recent values.
-For example, on the same dataset as above:
+Among our forecasters, it is the only one well suited for forecasts at long time horizons.
+
+Since it requires multiple years of data and a roughly seasonal signal, the dataset we've been using for demonstrations so far is poor example for a climate forecast[^8].
+Instead, we'll use the fluview ILI dataset, which is weekly influenza like illness data for hhs regions, going back to 1997.
+
+
+We'll predict the 2023/24 season using all previous data, including 2020-2022, the two years where there was approximately no seasonal flu, forecasting from the start of the season, `2023-10-08`:
+
 ```{r make-climatological-forecast, warning=FALSE}
+fluview_hhs <- pub_fluview(regions = paste0("hhs", 1:10), epiweeks = epirange(100001,222201))
+fluview <- fluview_hhs %>% select(geo_value = region, time_value = epiweek, issue, ili) %>% as_epi_archive() %>% epix_as_of_current()
+
 all_climate <- climatological_forecaster(
-covid_case_death_rates_extended |>
-filter(time_value <= forecast_date, geo_value %in% used_locations),
-outcome = "death_rate",
+fluview %>% filter(time_value < "2023-10-08"),
+outcome = "ili",
 args_list = climate_args_list(
 forecast_horizon = seq(0, 28),
-window_size = 14,
-time_type = "day",
-forecast_date = forecast_date
+time_type = "week",
+quantile_by_key = "geo_value",
+forecast_date = as.Date("2023-10-08")
 )
 )
 workflow <- all_climate$epi_workflow
 results <- all_climate$predictions
 autoplot(
 object = workflow,
 predictions = results,
-observed_response = covid_case_death_rates_extended |> filter(geo_value %in% used_locations, time_value > "2021-07-01")
+observed_response = fluview %>% filter(time_value >= "2023-10-08", time_value < "2024-05-01") %>% mutate(geo_value = factor(geo_value, levels = paste0("hhs", 1:10)))
 )
 ```
 
-Note that to have enough training data for this method, we're using
-`covid_case_death_rates_extended`, which starts in March 2020, rather than
-`covid_case_death_rates`, which starts in December.
-Without at least a year's worth of historical data, it is impossible to do a
-climatological model.
-Even with one year of data, as we have here, the resulting forecasts are unreliable.
 
 One feature of the climatological baseline is that it forecasts multiple aheads
-simultaneously.
+simultaneously; here we do so for the entire season of 28 weeks.
 This is possible for `arx_forecaster()`, but only using `trainer =
-smooth_quantile_reg()`, which is built to handle multiple aheads simultaneously.
+smooth_quantile_reg()`, which is built to handle multiple aheads simultaneously[^9].
+
+A pure climatological forecast can be thought of as forecasting a typical year so far.
+The 2023/24 had some regions, such as `hhs10` which were quite close to the typical year, and some, such as `hhs2` that were frequently outside even the 90% prediction band (the lightest shown above).
 
 ### `arx_classifier()`
 
@@ -410,7 +411,6 @@ edu_quals <- c("Undergraduate degree", "Professional degree")
 geo_values <- c("Quebec", "British Columbia")
 
 grad_employ <- grad_employ_subset |>
-filter(time_value < 2017) |>
 filter(edu_qual %in% edu_quals, geo_value %in% geo_values)
 
 grad_employ
@@ -429,8 +429,8 @@ grad_forecast <- arx_forecaster(
 autoplot(
 grad_forecast$epi_workflow,
 grad_forecast$predictions,
-grad_employ,
-)
+observed_response = grad_employ,
+) + geom_vline(aes(xintercept = 2016))
 ```
 
 The 8 graphs represent all combinations of the `geo_values` (`"Quebec"` and `"British Columbia"`), `edu_quals` (`"Undergraduate degree"` and `"Professional degree"`), and age brackets (`"15 to 34 years"` and `"35 to 64 years"`).
@@ -590,3 +590,7 @@ Each row containing no `NA` values is used as a training observation to fit the
 `hardhat::extract_preprocessor(four_week_ahead$epi_workflow)`
 
 [^7]: the number of geographies
+
+[^8]: It has only a year of data, which is barely enough to run the method without errors, let alone get a meaningful prediction.
+
+[^9]: Though not 28 weeks into the future! Such a forecast will be an absurd extrapolation.
