vignettes/epipredict.Rmd
31 additions & 27 deletions
@@ -102,14 +102,9 @@ Let's look at an example `epi_df`:
102
102
covid_case_death_rates
103
103
```
104
104
105
-
This dataset uses a single key, `geo_value`, and two separate
106
-
time series, `case_rate` and `death_rate`.
107
-
The keys are represented in "long" format, with separate columns for the key and
108
-
the value, while separate time series are represented in "wide" format with each
109
-
time series stored in a separate column.
110
-
111
-
`{epiprocess}` is designed to handle data that always has a geographic key, and
112
-
potentially other key values, such as age, ethnicity, or other demographic
105
+
An `epi_df` always has a `geo_value` and a `time_value` as keys, along with some number of value columns, in this case `case_rate` and `death_rate`.
106
+
Each key has an associated type, here a `geo_type` (state) and a `time_type` (day), for which there are some utilities.
107
+
While `geo_value` and `time_value` are the minimal set of keys, the functions of `{epiprocess}` and `{epipredict}` are designed to accommodate other key values, such as age, ethnicity, or other demographic
113
108
information.
114
109
For example, `grad_employ_subset` from `{epidatasets}` also has both `age_group`
115
110
and `edu_qual` as additional keys:
@@ -314,39 +309,45 @@ one-ahead uncertainty.
314
309
The `climatological_forecaster()` is a different kind of baseline. It produces a
315
310
point forecast and quantiles based on the historical values for a given time of
316
311
year, rather than extrapolating from recent values.
317
-
For example, on the same dataset as above:
312
+
Among our forecasters, it is the only one well suited for forecasts at long time horizons.
313
+
314
+
Since it requires multiple years of data and a roughly seasonal signal, the dataset we've been using for demonstrations so far is a poor example for a climate forecast[^8].
315
+
Instead, we'll use the fluview ILI dataset, which contains weekly influenza-like illness data for HHS regions, going back to 1997.
316
+
317
+
318
+
We'll predict the 2023/24 season using all previous data, including 2020-2022, the two years with almost no seasonal flu, and forecast from the start of the season, `2023-10-08`:
Note that to have enough training data for this method, we're using
340
-
`covid_case_death_rates_extended`, which starts in March 2020, rather than
341
-
`covid_case_death_rates`, which starts in December.
342
-
Without at least a year's worth of historical data, it is impossible to do a
343
-
climatological model.
344
-
Even with one year of data, as we have here, the resulting forecasts are unreliable.
345
343
346
344
One feature of the climatological baseline is that it forecasts multiple aheads
347
-
simultaneously.
345
+
simultaneously; here we do so for the entire season of 28 weeks.
348
346
This is possible for `arx_forecaster()`, but only using `trainer =
349
-
smooth_quantile_reg()`, which is built to handle multiple aheads simultaneously.
347
+
smooth_quantile_reg()`, which is built to handle multiple aheads simultaneously[^9].
348
+
349
+
A pure climatological forecast can be thought of as forecasting a typical year, based on the years observed so far.
350
+
The 2023/24 season had some regions, such as `hhs10`, which were quite close to the typical year, and some, such as `hhs2`, that were frequently outside even the 90% prediction band (the lightest shown above).
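
To make the idea of a "typical year" concrete, here is a minimal base-R sketch of a climatological baseline. It is not the implementation behind `climatological_forecaster()`. The simulated `history` data, the helper `climate_baseline()`, and its `half_window` argument are all hypothetical and exist only for illustration. For each target week, it pools historical values from nearby weeks of the year and summarises them with a median and an empirical 90% interval.

```r
# Minimal base-R sketch of a climatological baseline (illustrative only;
# this is NOT the implementation used by climatological_forecaster()).
set.seed(1)

# Hypothetical weekly signal: 5 years x 52 weeks with a seasonal peak near week 4.
history <- expand.grid(week = 1:52, year = 2018:2022)
history$value <- 3 + 2 * cos(2 * pi * (history$week - 4) / 52) +
  rnorm(nrow(history), sd = 0.3)

# For each target week, pool historical values whose week of year lies within
# `half_window` weeks (wrapping around the year boundary), then take quantiles.
climate_baseline <- function(history, target_weeks, half_window = 1,
                             probs = c(0.05, 0.5, 0.95)) {
  rows <- lapply(target_weeks, function(w) {
    # circular distance between week-of-year values
    d <- abs(history$week - w)
    d <- pmin(d, 52 - d)
    q <- quantile(history$value[d <= half_window], probs = probs, names = FALSE)
    data.frame(week = w, lower = q[1], median = q[2], upper = q[3])
  })
  do.call(rbind, rows)
}

# Forecast a 28-week season starting at week 41 and wrapping into the next year.
target_weeks <- ((41:68 - 1) %% 52) + 1
forecast <- climate_baseline(history, target_weeks)
head(forecast)
```

Because the forecast for each week depends only on the time of year and not on recent values, all 28 horizons come out of one pass over the history. This is also why such a baseline cannot react to an unusual season.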
The 8 graphs represent all combinations of the `geo_values` (`"Quebec"` and `"British Columbia"`), `edu_quals` (`"Undergraduate degree"` and `"Professional degree"`), and age brackets (`"15 to 34 years"` and `"35 to 64 years"`).
@@ -590,3 +590,7 @@ Each row containing no `NA` values is used as a training observation to fit the