Commit 8d045b7

Merge pull request #657 from cmu-delphi/covidcast-0.5.2
Rebuild documentation for covidcast 0.5.2
2 parents 3f6548f + 351e2d1 commit 8d045b7

95 files changed: +2482 -714 lines changed

R-packages/covidcast/DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 Package: covidcast
 Type: Package
 Title: Client for Delphi's 'COVIDcast Epidata' API
-Version: 0.5.0
-Date: 2023-05-23
+Version: 0.5.2
+Date: 2023-07-11
 Authors@R:
     c(
       person(given = "Taylor",

R-packages/covidcast/NEWS.md

Lines changed: 28 additions & 0 deletions
@@ -1,3 +1,31 @@
+# covidcast 0.5.3
+
+To be released.
+
+- Package vignettes have been adjusted so they do not make requests to the
+  COVIDcast API during CRAN check runs. This change only affects the package
+  build and check process, and shouldn't affect end users.
+
+# covidcast 0.5.2
+
+- `covidcast_meta()` now caches the server's response for a length of time
+  specified by the COVIDcast API server, based on how frequently the metadata is
+  recomputed. Because `covidcast_meta()` is called by `covidcast_signal()`, this
+  saves one API call per call to `covidcast_signal()`. (@krivard, #645)
+
+- `covidcast_meta()` now more clearly reports errors when the API usage limit
+  has been reached.
+
+# covidcast 0.5.1
+
+- `covidcast_signals()` now supports the `time_type` argument, to match
+  `covidcast_signal()`. If you used optional arguments to `covidcast_signals()`
+  by position rather than by name, this may cause problems until you switch to
+  using named arguments.
+
+- Package vignettes have been altered to demonstrate more widely suitable
+  signals, and to consolidate on a smaller set of signals.
+
 # covidcast 0.5.0
 
 - The package now supports supplying API keys with requests to the COVIDcast
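
Reviewer note on the 0.5.1 entry: the warning about positional arguments can be illustrated with a small sketch. The two functions below are hypothetical stand-ins (their names and parameter order are assumptions for illustration, not the real `covidcast_signals()` signature). They only show how inserting a new parameter shifts positional matches while named arguments stay stable.

```r
# Hypothetical "before" and "after" signatures. The parameter order is an
# assumption made for illustration and is not taken from the package.
fetch_old <- function(data_source, signal, geo_type = "county",
                      geo_values = "*") {
  list(geo_type = geo_type, geo_values = geo_values)
}
fetch_new <- function(data_source, signal, time_type = "day",
                      geo_type = "county", geo_values = "*") {
  list(time_type = time_type, geo_type = geo_type, geo_values = geo_values)
}

# Positional call: the third argument used to mean geo_type, but after the
# new parameter is inserted it is matched to time_type instead.
by_position <- fetch_new("fb-survey", "smoothed_wcli", "state")

# Named call: unaffected by the new parameter.
by_name <- fetch_new("fb-survey", "smoothed_wcli", geo_type = "state")
```

In the positional call, `"state"` silently becomes the `time_type`, while `geo_type` falls back to its default. Naming the argument avoids the shift.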

R-packages/covidcast/R/covidcast.R

Lines changed: 6 additions & 1 deletion
@@ -1022,12 +1022,17 @@ covidcast <- function(data_source, signal, time_type, geo_type, time_values,
   return(paste0(unlist(lapply(values, .listitem)), collapse=','))
 }
 
+# from testthat::is_testing(), under MIT license
+.is_testing <- function() {
+  identical(Sys.getenv("TESTTHAT"), "true")
+}
+
 # Helper function to use cached metadata whenever possible
 .request_meta <- function() {
   # temporary check while we wait for rerequest support in httptest: always
   # request while testing. see
   # https://github.com/nealrichardson/httptest/issues/84
-  pkg_env$META_RESPONSE <- if(identical(pkg_env$META_RESPONSE, NA) || testthat::is_testing()) {
+  pkg_env$META_RESPONSE <- if(identical(pkg_env$META_RESPONSE, NA) || .is_testing()) {
     .request("covidcast_meta", list(format = "csv"), raw = TRUE)
   } else {
     httr::rerequest(pkg_env$META_RESPONSE)
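
Reviewer note: the cache-or-request pattern in this hunk can be sketched without network access. In the sketch below, `cache_env`, `fetch_meta()`, `request_meta()`, and `n_requests` are hypothetical stand-ins introduced for illustration. The package's helper calls `httr::rerequest()` on the stored response in the cached branch, whereas this simplified version just reuses the stored value.

```r
# Hypothetical stand-ins for illustration; none of these names are part of
# covidcast.
cache_env <- new.env()
cache_env$META <- NA      # NA marks "nothing cached yet", as in the hunk
n_requests <- 0           # counts simulated server requests

# Same environment-variable check as the .is_testing() helper in the hunk.
is_testing <- function() {
  identical(Sys.getenv("TESTTHAT"), "true")
}

# Simulated request: no network access, just bumps the counter.
fetch_meta <- function() {
  n_requests <<- n_requests + 1
  list(fetched_at = Sys.time())
}

# Request when nothing is cached or when running under tests; otherwise
# reuse the stored value (a simplification of the rerequest branch).
request_meta <- function() {
  if (identical(cache_env$META, NA) || is_testing()) {
    cache_env$META <- fetch_meta()
  }
  cache_env$META
}

old <- Sys.getenv("TESTTHAT", unset = NA)

Sys.unsetenv("TESTTHAT")
invisible(request_meta())
invisible(request_meta())
requests_outside_tests <- n_requests                     # second call hits the cache

Sys.setenv(TESTTHAT = "true")
invisible(request_meta())
invisible(request_meta())
requests_under_tests <- n_requests - requests_outside_tests  # every call requests

# Restore the original environment variable.
if (is.na(old)) Sys.unsetenv("TESTTHAT") else Sys.setenv(TESTTHAT = old)
```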

R-packages/covidcast/vignettes/correlation-utils.Rmd.orig

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ vignette: >
 ```{r, setup, include=FALSE}
 knitr::opts_chunk$set(
   comment = "", fig.width = 6, fig.height = 6, fig.path = "figures/corr-",
-  fig.cap = ""
+  fig.cap = "", dev = "ragg_png"
 )
 ```

R-packages/covidcast/vignettes/covidcast.Rmd

Lines changed: 180 additions & 14 deletions
Large diffs are not rendered by default.
Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
+---
+title: Get started with covidcast
+description: An introductory tutorial with examples.
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Get started with covidcast}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, setup, include=FALSE}
+knitr::opts_chunk$set(
+  comment = "", fig.width = 6, fig.height = 6, fig.path = "figures/covidcast-",
+  fig.cap = "", dev = "ragg_png"
+)
+```
+
+This package provides access to data frames of values from the [COVIDcast
+endpoint of the Epidata
+API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). Using the
+`covidcast_signal()` function, you can fetch any data you may be interested in
+analyzing, then use `plot.covidcast_signal()` to make plots and maps. Since the
+data is provided as a simple data frame, you can also wrangle it into whatever
+form you need to conduct your desired analyses using other packages and
+functions.
+
+## Installing
+
+This package is [available on
+CRAN](https://cran.r-project.org/package=covidcast), so the easiest way to
+install it is simply
+
+```r
+install.packages("covidcast")
+```
+
## Basic examples
38+
39+
To obtain smoothed estimates of COVID-like illness from our [COVID-19 Trends and
40+
Impact Survey](https://delphi.cmu.edu/covidcast/surveys/) for every county in
41+
the United States between 2020-05-01 and 2020-05-07, we can use
42+
`covidcast_signal()`:
43+
44+
```{r, message=FALSE}
45+
library(covidcast)
46+
library(dplyr)
47+
48+
cli <- covidcast_signal(data_source = "fb-survey", signal = "smoothed_wcli",
49+
start_day = "2020-05-01", end_day = "2020-05-07",
50+
geo_type = "county")
51+
knitr::kable(head(cli))
52+
```
53+
54+
`covidcast_signal()` returns a data frame. (Here we're using `knitr::kable()` to
55+
make it more readable.) Each row represents one observation in one county on one
56+
day. The county FIPS code is given in the `geo_value` column, the date in the
57+
`time_value` column. Here `value` is the requested signal---in this case, the
58+
smoothed estimate of the percentage of people with COVID-like illness, based on
59+
the symptom surveys, and `stderr` is its standard error. See the
60+
`covidcast_signal()` documentation for details on the returned data frame.
61+
62+
To get a basic summary of the returned data frame:
63+
64+
```{r}
65+
summary(cli)
66+
```
67+
68+
The COVIDcast API makes estimates available at several different geographic
69+
levels, and `covidcast_signal()` defaults to requesting county-level data. To
70+
request estimates for states instead of counties, we use the `geo_type`
71+
argument:
72+
73+
```{r, message=FALSE}
74+
cli <- covidcast_signal(data_source = "fb-survey", signal = "smoothed_wcli",
75+
start_day = "2020-05-01", end_day = "2020-05-07",
76+
geo_type = "state")
77+
knitr::kable(head(cli))
78+
```
79+
80+
One can also select a specific geographic region by its ID. For example, this is
81+
the FIPS code for Allegheny County, Pennsylvania:
82+
83+
```{r, message=FALSE}
84+
cli <- covidcast_signal(data_source = "fb-survey", signal = "smoothed_wcli",
85+
start_day = "2020-05-01", end_day = "2020-05-07",
86+
geo_type = "county", geo_value = "42003")
87+
knitr::kable(head(cli))
88+
```
89+
+### API keys
+
+By default, this package submits queries to the API anonymously. All the
+examples in the package documentation are compatible with anonymous use of the
+API, but [there are some limits on anonymous
+queries](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html),
+including rate limits on the number of queries that can be submitted per hour.
+To lift these limits, see the "API keys" section of the `covidcast_signal()`
+documentation for information on how to register for and use an API key.
+
+### Plotting and mapping
+
+This package provides convenient functions for plotting and mapping these
+signals. For example, simple line charts are easy to construct:
+
+```{r}
+plot(cli, plot_type = "line",
+     title = "Survey results in Allegheny County, PA")
+```
+
+For more details and examples, including choropleth and bubble maps, see
+`vignette("plotting-signals")`.
+
+
+### Finding signals of interest
+
+Above we used data from [Delphi's symptom
+surveys](https://delphi.cmu.edu/covid19/ctis/), but the COVIDcast API includes
+numerous data streams: medical claims data, cases and deaths, mobility, and many
+others; new signals are added regularly. This can make it a challenge to find
+the data stream that you are most interested in.
+
+The [COVIDcast Data Sources and Signals
+documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)
+lists all data sources and signals available through COVIDcast. When you find a
+signal of interest, get the data source name (such as `jhu-csse` or `fb-survey`)
+and the signal name (such as `confirmed_incidence_num` or `smoothed_wcli`).
+These are provided as arguments to `covidcast_signal()` to request the data you
+want.
+
+
+### Finding counties and metro areas
+
+The COVIDcast API identifies counties by their 5-digit FIPS code and
+metropolitan areas by their CBSA ID number. (See the [geographic coding
+documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html)
+for details.) This means that to query a specific county or metropolitan area,
+we must have some way to quickly find its identifier.
+
+This package includes several utilities intended to make the process easier. For
+example, if we look at `?county_census`, we find that the package provides
+census data (such as population) on every county in the United States, including
+its FIPS code. Similarly, by looking at `?msa_census` we can find data about
+metropolitan statistical areas, their corresponding CBSA IDs, and recent census
+data.
+
+(Note: the `msa_census` data includes types of area beyond metropolitan
+statistical areas, including micropolitan statistical areas. The `LSAD` column
+identifies the type of each area. The COVIDcast API only provides estimates for
+metropolitan statistical areas, not for their divisions or for micropolitan
+areas.)
+
+Building on these datasets, the convenience functions `name_to_fips()` and
+`name_to_cbsa()` conduct `grep()`-based searching of county or metropolitan area
+names to find FIPS or CBSA codes, respectively:
+
+```{r}
+name_to_fips("Allegheny")
+name_to_cbsa("Pittsburgh")
+```
+
+Since these functions return vectors of IDs, we can use them to construct the
+`geo_values` argument to `covidcast_signal()` to select specific regions to
+query.
+
+You may also want to convert FIPS codes or CBSA IDs back to well-known names,
+for instance to report in tables or graphics. The package provides inverse
+mappings `county_fips_to_name()` and `cbsa_to_name()` that work in an
+analogous way:
+
+```{r}
+county_fips_to_name("42003")
+cbsa_to_name("38300")
+```
+
+See their documentation for more details (for example, the options for handling
+matches when counties have the same name).
+
+## Signal metadata
+
+If we are interested in exploring the available signals and their metadata, we
+can use `covidcast_meta()` to fetch a data frame of the available signals:
+
+```{r}
+meta <- covidcast_meta()
+knitr::kable(head(meta))
+```
+
+The `covidcast_meta()` documentation describes the columns and their meanings.
+The metadata data frame can be filtered and sliced as desired to obtain
+information about signals of interest. To get a basic summary of the metadata:
+
+```{r, eval = FALSE}
+summary(meta)
+```
+
+(We silenced the evaluation because the output of `summary()` here is still
+quite long.)
+
+## Tracking issues and updates
+
+The COVIDcast API records not just each signal's estimate for a given location
+on a given day, but also *when* that estimate was made, and all updates to that
+estimate.
+
+For example, consider using our [doctor visits
+signal](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html),
+which estimates the percentage of outpatient doctor visits that are
+COVID-related, and consider a result row with `time_value` 2020-05-01 for
+`geo_values = "pa"`. This is an estimate for the percentage in Pennsylvania on
+May 1, 2020. That estimate was *issued* on May 5, 2020, the delay being due to
+the aggregation of data by our source and the time taken by the COVIDcast API to
+ingest the data provided. Later, the estimate for May 1st could be updated,
+perhaps because additional visit data from May 1st arrived at our source and was
+reported to us. This constitutes a new *issue* of the data.
+
+### Data known "as of" a specific date
+
+By default, `covidcast_signal()` fetches the most recent issue available. This
+is the best option for users who simply want to graph the latest data or
+construct dashboards. But if we are interested in knowing *when* data was
+reported, we can request specific data versions using the `as_of`, `issues`, or
+`lag` arguments. (Note these are mutually exclusive; only one can be specified
+at a time.)
+
+First, we can request the data that was available *as of* a specific date, using
+the `as_of` argument:
+
+```{r, message = FALSE}
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_adj_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-01",
+                 geo_type = "state", geo_values = "pa", as_of = "2020-05-07")
+```
+
+This shows that an estimate of about 2.3% was issued on May 7. If we don't
+specify `as_of`, we get the most recent estimate available:
+
+```{r, message = FALSE}
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_adj_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-01",
+                 geo_type = "state", geo_values = "pa")
+```
+
+Note the substantial change in the estimate, to over 5%, reflecting new data
+that became available *after* May 7 about visits occurring on May 1. This
+illustrates the importance of issue date tracking, particularly for forecasting
+tasks. To backtest a forecasting model on past data, it is important to use the
+data that would have been available *at the time*, not data that arrived much
+later.
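
Reviewer note: the `as_of` semantics described in this part of the vignette can be sketched offline. The issue history below is made up for illustration (the values are placeholders, not API output), and `as_of_view()` is a hypothetical helper, not a covidcast function. For a single observation, it keeps the most recent issue dated on or before the requested date.

```r
# Made-up issue history for one observation (one location, one time_value).
# The values are placeholders chosen to be easy to tell apart.
history <- data.frame(
  geo_value  = rep("pa", 3),
  time_value = rep(as.Date("2020-05-01"), 3),
  issue      = as.Date(c("2020-05-05", "2020-05-07", "2020-05-20")),
  value      = c(10, 20, 30)
)

# Hypothetical helper: the version of the observation known on `as_of`.
as_of_view <- function(history, as_of) {
  known <- history[history$issue <= as.Date(as_of), , drop = FALSE]
  known[which.max(as.numeric(known$issue)), , drop = FALSE]
}

snapshot <- as_of_view(history, "2020-05-07")  # what a forecaster saw on May 7
current  <- as_of_view(history, Sys.Date())    # the most recent issue
```

A backtest would use `snapshot`, since the later issue in `current` did not exist on May 7.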
+
+### Multiple issues of observations
+
+By using the `issues` argument, we can request all issues in a certain time
+period:
+
+```{r, message = FALSE}
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_adj_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-01",
+                 geo_type = "state", geo_values = "pa",
+                 issues = c("2020-05-01", "2020-05-15")) %>%
+  knitr::kable()
+```
+
+This estimate was clearly updated many times as new data for May 1st arrived.
+Note that these results include only data issued or updated between 2020-05-01
+and 2020-05-15. If a value was first reported on 2020-04-15, and never updated,
+a query for issues between 2020-05-01 and 2020-05-15 will not include that value
+among its results.
+
+After fetching multiple issues of data, we can use the `latest_issue()` or
+`earliest_issue()` functions to subset the data frame to view only the latest or
+earliest issue of each observation.
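
Reviewer note: the idea behind subsetting to the latest or earliest issue can be sketched in base R. This illustrates the concept only and is not the package's implementation of `latest_issue()` or `earliest_issue()`. The data frame is made up, and treating `geo_value` plus `time_value` as the key that identifies an observation is an assumption.

```r
# Made-up data: two observations, each issued twice.
issues_df <- data.frame(
  geo_value  = rep("pa", 4),
  time_value = as.Date(c("2020-05-01", "2020-05-01",
                         "2020-05-02", "2020-05-02")),
  issue      = as.Date(c("2020-05-05", "2020-05-09",
                         "2020-05-06", "2020-05-08")),
  value      = c(10, 11, 20, 21)
)

# Sort by issue date so the first and last row per observation are the
# earliest and latest issues.
sorted  <- issues_df[order(issues_df$issue), ]
obs_key <- paste(sorted$geo_value, sorted$time_value)

# Keep the last (latest) or first (earliest) row seen for each observation.
latest   <- sorted[!duplicated(obs_key, fromLast = TRUE), ]
earliest <- sorted[!duplicated(obs_key), ]
```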
+
+### Observations issued with a specific lag
+
+Finally, we can use the `lag` argument to request only data reported with a
+certain lag. For example, requesting a lag of 7 days requests only the
+issues made exactly 7 days after the corresponding `time_value`:
+
+```{r, message = FALSE}
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_adj_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-07",
+                 geo_type = "state", geo_values = "pa", lag = 7) %>%
+  knitr::kable()
+```
+
+Note that though this query requested all values between 2020-05-01 and
+2020-05-07, May 3rd and May 4th were *not* included in the result set. This is
+because the query will only include a result for May 3rd if a value was issued
+on May 10th (a 7-day lag), but in fact the value was not updated on that day:
+
+```{r, message = FALSE}
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_adj_cli",
+                 start_day = "2020-05-03", end_day = "2020-05-03",
+                 geo_type = "state", geo_values = "pa",
+                 issues = c("2020-05-09", "2020-05-15")) %>%
+  knitr::kable()
+```
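
Reviewer note: the lag behavior described in this last section can be reproduced on a toy table. The lag of a row is its issue date minus its `time_value`, and a `lag = 7` query keeps only rows where that difference is exactly 7 days. The data below is made up for illustration and does not come from the API.

```r
# Made-up data: May 1 has an issue exactly 7 days later; May 3 was issued
# 6 and 8 days later, but never exactly 7.
toy <- data.frame(
  time_value = as.Date(c("2020-05-01", "2020-05-01",
                         "2020-05-03", "2020-05-03")),
  issue      = as.Date(c("2020-05-05", "2020-05-08",
                         "2020-05-09", "2020-05-11")),
  value      = c(10, 11, 30, 31)
)

# Lag in whole days between the observation date and the issue date.
toy$lag <- as.integer(toy$issue - toy$time_value)

# Keep only rows issued exactly 7 days after their time_value.
lag7 <- toy[toy$lag == 7, ]
```

Here `lag7` contains a row for May 1 but none for May 3, mirroring how dates can drop out of a `lag` query when no issue falls on the exact day.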
