Commit c342680

committed "recommended edits" (1 parent: 9f0af0a)


48 files changed: +316 -315 lines

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -190,6 +190,7 @@ export(nested_quantiles)
 export(new_default_epi_recipe_blueprint)
 export(new_epi_recipe_blueprint)
 export(pivot_longer)
+export(pivot_quantiles)
 export(pivot_quantiles_longer)
 export(pivot_quantiles_wider)
 export(pivot_wider)

R/arx_classifier.R

Lines changed: 11 additions & 11 deletions
@@ -34,17 +34,17 @@
 #' ```
 #'
 #' The key takeaway from the predictions is that there are two prediction
-#' classes: `(-Inf, 0.25]` and `(0.25, Inf)`. This is because for our goal of
-#' classification the classes must be discrete. The discretization of the
-#' real-valued outcome is controlled by the `breaks` argument, which defaults
-#' to `0.25`. Such breaks will be automatically extended to cover the entire
-#' real line. For example, the default break of `0.25` is silently extended to
-#' `breaks = c(-Inf, .25, Inf)` and, therefore, results in two classes:
-#' `[-Inf, 0.25]` and `(0.25, Inf)`. These two classes are used to discretize
-#' the outcome. The conversion of the outcome to such classes is handled
-#' internally. So if discrete classes already exist for the outcome in the
-#' `epi_df`, then we recommend to code a classifier from scratch using the
-#' `epi_workflow` framework for more control.
+#' classes: `(-Inf, 0.25]` and `(0.25, Inf)`: the classes to predict must be
+#' discrete. The discretization of the real-valued outcome is controlled by
+#' the `breaks` argument, which defaults to `0.25`. Such breaks will be
+#' automatically extended to cover the entire real line. For example, the
+#' default break of `0.25` is silently extended to `breaks = c(-Inf, .25,
+#' Inf)` and, therefore, results in two classes: `[-Inf, 0.25]` and `(0.25,
+#' Inf)`. These two classes are used to discretize the outcome. The conversion
+#' of the outcome to such classes is handled internally. So if discrete
+#' classes already exist for the outcome in the `epi_df`, then we recommend to
+#' code a classifier from scratch using the `epi_workflow` framework for more
+#' control.
 #'
 #' The `trainer` is a `parsnip` model describing the type of estimation such
 #' that `mode = "classification"` is enforced. The two typical trainers that
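The `breaks` extension described in this hunk can be illustrated outside of R. Below is a minimal Python sketch, not the package's implementation: the helper names `extend_breaks` and `discretize` are invented for illustration, and the real work is done in R (by `arx_classifier()` and `cut()`-style binning).

```python
import bisect
import math

def extend_breaks(breaks):
    """Silently extend user-supplied breaks to cover the whole real line,
    mimicking the behavior described in the docs above."""
    return [-math.inf] + list(breaks) + [math.inf]

def discretize(value, breaks):
    """Assign a value to a right-closed class `(lo, hi]`."""
    full = extend_breaks(breaks)
    # first break >= value gives the upper edge of the right-closed bin
    i = bisect.bisect_left(full, value)
    return f"({full[i - 1]}, {full[i]}]"

print(discretize(0.1, [0.25]))   # (-inf, 0.25]
print(discretize(0.3, [0.25]))   # (0.25, inf]
```

With the default single break of `0.25`, every real-valued outcome falls into one of exactly two classes, which is what makes the problem a (binary) classification.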

R/arx_forecaster.R

Lines changed: 5 additions & 7 deletions
@@ -3,7 +3,7 @@
 #' This is an autoregressive forecasting model for
 #' [epiprocess::epi_df][epiprocess::as_epi_df] data. It does "direct"
 #' forecasting, meaning that it estimates a model for a particular target
-#' horizon of `outcome` based on the lags of the `predictors`. See the [Get
+#' horizon of the `outcome` based on the lags of the `predictors`. See the [Get
 #' started vignette](../articles/epipredict.html) for some worked examples and
 #' [Custom epi_workflows vignette](../articles/custom_epiworkflows.html) for a
 #' recreation using a custom `epi_workflow()`.
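"Direct" forecasting, as described in the hunk above, means fitting one model per target horizon, with lagged values as features and the outcome `ahead` steps later as the target. A rough, language-agnostic Python sketch (the function name `direct_forecast_frame` is invented; the package does this via recipes):

```python
def direct_forecast_frame(y, lags, ahead):
    """Build (features, target) rows for direct forecasting: the features
    are the lagged values y[t - lag] and the target is y[t + ahead]."""
    rows = []
    for t in range(max(lags), len(y) - ahead):
        features = [y[t - lag] for lag in lags]
        rows.append((features, y[t + ahead]))
    return rows

y = [1, 2, 3, 4, 5, 6]
rows = direct_forecast_frame(y, lags=[0, 1], ahead=2)
# first row: features [y[1], y[0]] = [2, 1], target y[3] = 4
```

A separate model fit to such a frame predicts the horizon-`ahead` value in one step, rather than iterating one-step-ahead predictions.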
@@ -13,16 +13,15 @@
 #' @param outcome A character (scalar) specifying the outcome (in the `epi_df`).
 #' @param predictors A character vector giving column(s) of predictor variables.
 #' This defaults to the `outcome`. However, if manually specified, only those
-#' variables specifically mentioned will be used. (The `outcome` will not be
-#' added.) By default, equals the outcome. If manually specified, does not
-#' add the outcome variable, so make sure to specify it.
+#' variables specifically mentioned will be used, and the `outcome` will not be
+#' added.
 #' @param trainer A `{parsnip}` model describing the type of estimation. For
 #' now, we enforce `mode = "regression"`.
 #' @param args_list A list of customization arguments to determine the type of
 #' forecasting model. See [arx_args_list()].
 #'
 #' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
-#' `predictions` is an `epi_df` of predicted values while `epi_workflow()` is
+#' `predictions` is a `tibble` of predicted values while `epi_workflow()` is
 #' the fit workflow used to make those predictions
 #' @export
 #' @seealso [arx_fcast_epi_workflow()], [arx_args_list()]
@@ -270,8 +269,7 @@ arx_fcast_epi_workflow <- function(
 #' training residuals. A `NULL` value will result in point forecasts only.
 #' @param symmetrize Logical. The default `TRUE` calculates symmetric prediction
 #' intervals. This argument only applies when residual quantiles are used. It
-#' is not applicable with `trainer = quantile_reg()`, for example. This is
-#' achieved by including both the residuals and their negation. Typically, one
+#' is not applicable with `trainer = quantile_reg()`, for example. Typically, one
 #' would only want non-symmetric quantiles when increasing trajectories are
 #' quite different from decreasing ones, such as a strictly postive variable
 #' near zero.
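The removed sentence explained the mechanism: symmetric intervals come from quantiles of the residuals pooled with their negation. A hedged Python sketch of that idea (the helper name `residual_quantiles` is invented; the package's actual residual-quantile layer may differ in details):

```python
import statistics

def residual_quantiles(residuals, levels, symmetrize=True):
    """Quantiles of the training residuals; `symmetrize=True` appends the
    negated residuals, forcing the quantiles to mirror about zero."""
    r = list(residuals)
    if symmetrize:
        r += [-x for x in r]
    qs = statistics.quantiles(r, n=100, method="inclusive")
    return {lev: qs[round(lev * 100) - 1] for lev in levels}

q = residual_quantiles([0.5, 1.0, -0.2, 0.3], [0.05, 0.95])
# with symmetrize=True the 5% and 95% quantiles are negatives of each other
```

Adding an interval of `point_forecast + q[level]` for each level then yields a prediction band symmetric about the point forecast.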

R/climatological_forecaster.R

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -134,12 +134,12 @@ climatological_forecaster <- function(epi_data,
134134
# get the distinct .idx for the target date(s)
135135
distinct_target_idx <- predictions$.idx %>% unique()
136136
# get all of the idx's within the window of the target .idxs
137-
entries <- map(distinct_target_idx, \(idx) within_window(idx, window_size, modulus)) %>%
137+
entries <- map(distinct_target_idx, function(idx) within_window(idx, window_size, modulus)) %>%
138138
do.call(c, .) %>%
139139
unique()
140140
# for the center, we need those within twice the window, since for each point
141141
# we're subtracting out the center to generate the quantiles
142-
entries_double_window <- map(entries, \(idx) within_window(idx, window_size, modulus)) %>%
142+
entries_double_window <- map(entries, function(idx) within_window(idx, window_size, modulus)) %>%
143143
do.call(c, .) %>%
144144
unique()
145145
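Judging by its arguments, `within_window(idx, window_size, modulus)` collects seasonal indices within a window of a target index, wrapping modulo the season length. A hypothetical Python re-sketch of that helper (the wrap-around behavior is inferred from the `modulus` argument, not confirmed from the R source):

```python
def within_window(idx, window_size, modulus):
    """All 0-based period indices within window_size of idx, wrapping
    modulo `modulus` (e.g. week-of-season, where week 0 neighbors week 51)."""
    return [(idx + offset) % modulus
            for offset in range(-window_size, window_size + 1)]

print(within_window(1, 2, 52))  # [51, 0, 1, 2, 3]
```

The diff itself only swaps R 4.1's `\(idx)` lambda shorthand for the spelled-out `function(idx)`, presumably to support older R versions; the two are equivalent.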

R/epi_recipe.R

Lines changed: 1 addition & 2 deletions
@@ -232,8 +232,7 @@ is_epi_recipe <- function(x) {
 
 
 
-#' Given an `epi_recipe`, add it to, remove it from, or update it in an
-#' `epi_workflow`
+#' Add/remove/update the `epi_recipe` of an `epi_workflow`
 #'
 #' @description
 #' - `add_recipe()` specifies the terms of the model and any preprocessing that

R/epi_workflow.R

Lines changed: 12 additions & 7 deletions
@@ -113,14 +113,19 @@ fit.epi_workflow <- function(object, data, ..., control = workflows::control_wor
 #'
 #' @description
 #' This is the `predict()` method for a fit epi_workflow object. The 3 steps that this implements are:
+#' - Preprocess `new_data` using the preprocessing method specified when the
+#' workflow was created and fit. This is accomplished using
+#' [hardhat::forge()], which will apply any formula preprocessing or call
+#' [recipes::bake()] if a recipe was supplied.
 #'
 #' - Preprocessing `new_data` using the preprocessing method specified when the
 #' epi_workflow was created and fit. This is accomplished using
-#' `recipes::bake()` if a recipe was supplied. Note that this is a slightly
-#' different `bake` operation than the one occuring during the fit. Any `step`
-#' that has `skip = TRUE` isn't applied during prediction; for example in
-#' `step_epi_naomit()`, `all_outcomes()` isn't `NA` omitted, since doing so
-#' would drop the exact `time_values` we are trying to predict.
+#' `hardhat::bake()` if a recipe was supplied (passing through
+#' [hardhat::forge()], which is used for non-recipe preprocessors). Note that
+#' this is a slightly different `bake` operation than the one occuring during
+#' the fit. Any `step` that has `skip = TRUE` isn't applied during prediction;
+#' for example in `step_epi_naomit()`, `all_outcomes()` isn't `NA` omitted,
+#' since doing so would drop the exact `time_values` we are trying to predict.
 #'
 #' - Calling `parsnip::predict.model_fit()` for you using the underlying fit
 #' parsnip model.
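The `skip = TRUE` behavior described above (a step runs during the fit-time bake but not the predict-time one, so an NA-omit step cannot drop the rows being predicted) can be sketched generically. This is a toy Python model of the idea, not the recipes machinery; `Step` and `bake` are invented names:

```python
class Step:
    """A preprocessing step; skip=True means fit-time only."""
    def __init__(self, fn, skip=False):
        self.fn, self.skip = fn, skip

def bake(steps, data, at_predict=False):
    """Apply steps in order, omitting skip=True steps at prediction time."""
    for step in steps:
        if at_predict and step.skip:
            continue
        data = step.fn(data)
    return data

# an NA-omit step, analogous in spirit to step_epi_naomit()
drop_na = Step(lambda rows: [r for r in rows if r is not None], skip=True)
rows = [1, None, 3]
print(bake([drop_na], rows))                    # fit-time: NA rows dropped
print(bake([drop_na], rows, at_predict=True))   # predict-time: rows kept
```

At prediction time the outcome is exactly what is missing, so skipping outcome-dependent steps is what keeps the target `time_values` in the data.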
@@ -137,7 +142,7 @@ fit.epi_workflow <- function(object, data, ..., control = workflows::control_wor
 #'
 #' @return
 #' A data frame of model predictions, with as many rows as `new_data` has.
-#' If `new_data` is an `epi_df()` or a data frame with `time_value` or
+#' If `new_data` is an `epiprocess::epi_df` or a data frame with `time_value` or
 #' `geo_value` columns, then the result will have those as well.
 #'
 #' @name predict-epi_workflow
@@ -234,7 +239,7 @@ print.epi_workflow <- function(x, ...) {
 }
 
 
-#' Produce a forecast from just an epi workflow
+#' Produce a forecast from an epi workflow and it's training data
 #'
 #' `forecast.epi_workflow` predicts by restricting the training data to the
 #' latest available data, and predicting on that. It binds together

R/extrapolate_quantiles.R

Lines changed: 6 additions & 5 deletions
@@ -3,14 +3,15 @@
 #' This both interpolates between quantile levels already defined in `x` and
 #' extrapolates quantiles outside their bounds. The interpolation method is
 #' determined by the `quantile` argument `middle`, which can be either `"cubic"`
-#' for a (hyman) cubic spline interpolation, or `"linear"` for simple linear
+#' for a (Hyman) cubic spline interpolation, or `"linear"` for simple linear
 #' interpolation.
 #'
 #' There is only one extrapolation method for values greater than the largest
-#' known quantile level or smaller than the smallest known quantile level. It
-#' assumes a roughly exponential tail, whose decay rate and offset is derived
-#' from the slope of the two most extreme quantile levels on a logistic scale.
-#' See the internal function `tail_extrapolate()` for the exact implementation.
+#' available quantile level or smaller than the smallest available quantile
+#' level. It assumes a roughly exponential tail, whose decay rate and offset is
+#' derived from the slope of the two most extreme quantile levels on a logistic
+#' scale. See the internal function `tail_extrapolate()` for the exact
+#' implementation.
 #'
 #' This function takes a `quantile_pred` vector and returns the same
 #' type of object, expanded to include
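The tail-extrapolation idea ("slope of the two most extreme quantile levels on a logistic scale") can be sketched loosely. The following Python sketch fits a line through the two most extreme `(logit(level), value)` pairs and extends it; the package's internal `tail_extrapolate()` assumes an exponential tail and may differ in its exact form:

```python
import math

def logit(p):
    """Map a quantile level in (0, 1) onto the logistic scale."""
    return math.log(p / (1 - p))

def tail_extrapolate(new_level, levels, values):
    """Extend the upper tail linearly in logit(level), using the two most
    extreme known (level, value) pairs to set slope and offset."""
    (p1, p2), (v1, v2) = levels[-2:], values[-2:]
    slope = (v2 - v1) / (logit(p2) - logit(p1))
    return v2 + slope * (logit(new_level) - logit(p2))
```

Because `logit` diverges as the level approaches 1, extrapolated values grow without bound rather than saturating, which is the qualitative behavior a heavy-ish (roughly exponential) tail requires.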

R/frosting.R

Lines changed: 1 addition & 2 deletions
@@ -1,5 +1,4 @@
-#' Given a `frosting()`, add it to, remove it from, or update it in an
-#' `epi_workflow`
+#' Add/remove/update the `frosting` of an `epi_workflow`
 #'
 #' @param x A workflow
 #' @param frosting A frosting object created using `frosting()`.

R/get_test_data.R

Lines changed: 7 additions & 4 deletions
@@ -1,9 +1,12 @@
 #' Get test data for prediction based on longest lag period
 #'
-#' If `predict()` is given the full training dataset, it will produce a forecast
-#' for every day which has enough data. For most cases, this is far more
-#' forecasts than is necessary. `get_test_data()` is designed to restrict the given dataset to the minimum amount needed to produce a forecast on the `forecast_date`.
-#' Primarily this is based on the longest lag period in the recipe.
+#' If `predict()` is given the full training dataset, it will produce a
+#' prediction for every `time_value` which has enough data. For most cases, this
+#' generates predictions for `time_values` where the `outcome` has already been
+#' observed. `get_test_data()` is designed to restrict the given dataset to the
+#' minimum amount needed to produce a forecast on the `forecast_date` for future
+#' data, rather than a prediction on past `time_value`s. Primarily this is
+#' based on the longest lag period in the recipe.
 #'
 #' The minimum required (recent) data to produce a forecast is equal to
 #' the maximum lag requested (on any predictor) plus the longest horizon
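The restriction `get_test_data()` performs — keep only the most recent rows a single forecast needs, governed by the longest lag — can be sketched in a few lines. A hedged Python sketch (the function name is invented; the real function also handles geo grouping, gaps, and recipe inspection):

```python
def get_test_data_sketch(rows, max_lag):
    """Keep only the latest time_value and its max_lag predecessors:
    the minimum recent data needed to produce one forecast."""
    cutoff = max(r["time_value"] for r in rows) - max_lag
    return [r for r in rows if r["time_value"] >= cutoff]

rows = [{"time_value": t, "value": t * 10} for t in range(1, 11)]
recent = get_test_data_sketch(rows, max_lag=3)
# keeps time_values 7, 8, 9, 10
```

Predicting on this restricted set yields one forecast per geography at the `forecast_date`, instead of one per historical `time_value`.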

R/layer_population_scaling.R

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
11
#' Convert per-capita predictions to raw scale
22
#'
33
#' `layer_population_scaling` creates a specification of a frosting layer that
4-
#' will "undo" per-capita scaling done in `step_population_scaling()`. Typical
5-
#' usage would set `df` to be a dataset that contains state-level population,
6-
#' and use it to convert predictions made from a raw scale model to rate-scale
7-
#' by dividing by the population.
4+
#' will "undo" per-capita scaling done in `step_population_scaling()`.
5+
#' Typical usage would set `df` to be a dataset that contains a list of
6+
#' population for the `geo_value`s, and use it to convert predictions made from
7+
#' a raw scale model to rate-scale by dividing by the population.
88
#' Although, it is worth noting that there is nothing special about
99
#' "population", and the function can be used to scale by any variable.
1010
#' Population is the standard use case in the epidemiology forecasting scenario.
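The effect of this layer — join a per-`geo_value` population table onto the predictions, then rescale — can be sketched generically. A hypothetical Python sketch (names invented; the real layer is configured via `df` and frosting, and as noted above the scaling column need not be population at all):

```python
def layer_population_scaling_sketch(predictions, pop_df, rate_to_count=True):
    """Join a population table on geo_value and rescale each prediction:
    multiply to undo per-capita scaling, or divide to produce rates."""
    pop = {row["geo_value"]: row["population"] for row in pop_df}
    out = []
    for p in predictions:
        scale = pop[p["geo_value"]]
        value = p[".pred"] * scale if rate_to_count else p[".pred"] / scale
        out.append({**p, ".pred": value})
    return out

preds = [{"geo_value": "ca", ".pred": 0.001}]   # per-capita prediction
pop_df = [{"geo_value": "ca", "population": 1000}]
scaled = layer_population_scaling_sketch(preds, pop_df)
# .pred becomes a count on the raw scale
```

Keeping the scaling as a post-processing (frosting) layer means the model itself can be trained entirely on the per-capita scale.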
