Merged (33 commits)
a00c125
add prefix wip
Jul 27, 2020
15bf81e
update test_run for wip signals
Jul 27, 2020
6dcd1bc
update signal names and export end date
Jul 30, 2020
df302b3
fixed errors in case:no new data
Aug 1, 2020
f8f0bbe
update signal names in DETAILS
Aug 3, 2020
d883c0c
update DETAILS
Aug 4, 2020
e0a6124
remove prefix wip_
Aug 4, 2020
156476f
update the export start date to generate reports from -45 days to -5days
Aug 4, 2020
c730742
deleted white spaces'
Aug 4, 2020
aefcfb5
Update Exceptions
jingjtang Aug 5, 2020
3457802
add wip test_per_device)
Aug 6, 2020
de7b82d
update unit tests
Aug 13, 2020
779c5d6
Added new constants.py file
vishakha1812 Aug 14, 2020
7c50aa1
Updated run.py: added new usecase for signal names
vishakha1812 Aug 14, 2020
e6d1ff6
Updated params.json.template
vishakha1812 Aug 14, 2020
4ffff29
Updated setup.py
vishakha1812 Aug 14, 2020
b3f8977
Updated tests/params.json.template
vishakha1812 Aug 14, 2020
cf2f725
Added new test case to check signal names
vishakha1812 Aug 14, 2020
bfb422e
Updated test_run.py to include new sensor names
vishakha1812 Aug 14, 2020
baac1ca
Added new file to handle signal naming
vishakha1812 Aug 14, 2020
566e7d7
Added missing files in /static
vishakha1812 Aug 14, 2020
e62f772
Update quidel_covidtest/params.json.template
vishakha1812 Aug 18, 2020
148fa7b
Update quidel_covidtest/tests/params.json.template
vishakha1812 Aug 18, 2020
e52e9b0
added a dry-run mode
Aug 18, 2020
71cb57d
Add files via upload
jingjtang Aug 19, 2020
16d6de4
Delete test_data_tools.py
jingjtang Aug 19, 2020
b0814ba
Add files via upload
jingjtang Aug 19, 2020
fd66b79
resolved a conflict caused by accident
Aug 19, 2020
3637c8e
Solved the problems in pylint test
Aug 22, 2020
9d22aa0
Added explainations to TestDate and StorageDate
Aug 26, 2020
a18184e
commented out test_per_device signals
Aug 26, 2020
d01ade2
Fixed the error in the documentation of se
jingjtang Aug 28, 2020
2167231
uploaded test_data for unit tests
Aug 28, 2020
28 changes: 23 additions & 5 deletions quidel_covidtest/DETAILS.md
@@ -4,8 +4,8 @@
Starting May 9, 2020, we began receiving Quidel COVID Test data, and we started reporting it on May 26, 2020 due to limitations in the data volume. The data contains a number of features for every test, including localization at the 5-digit zip code level, a TestDate and StorageDate, patient age, and several identifiers that uniquely identify the device on which the test was performed (SofiaSerNum), the individual test (FluTestNum), and the result (ResultID). Multiple tests are stored on each device. The present Quidel COVID Test sensor concerns the positive rate in the test results.

### Signal names
- raw_pct_positive: estimates of the percentage of positive tests in total tests
- smoothed_pct_positive: same as in the first one, but where the estimates are formed by pooling together the last 7 days of data
- covid_ag_raw_pct_positive: percent of tests returning positive that day
- covid_ag_smoothed_pct_positive: same as above, but for the moving average of the most recent 7 days

### Estimating percent positive test proportion
Let n be the number of total COVID tests taken over a given time period and a given location (the test result can be negative/positive/invalid). Let x be the number of tests taken with positive results in this location over the given time period. We are interested in estimating the percentage of positive tests which is defined as:
@@ -35,10 +35,28 @@ p = 100 * X / N

The estimated standard error is simply:
```
se = 1/100 * sqrt{ p*(1-p)/N }
se = 100 * sqrt{ p/100 *(1-p/100)/N }
```
where we assume for each time point, the estimates follow a binomial distribution.
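The corrected estimate and standard error can be sketched in numpy (a minimal illustration on the percentage scale; the daily counts below are made up):

```python
import numpy as np

# Hypothetical daily counts for one location (illustrative values only).
x = np.array([12.0, 30.0, 45.0])    # positive tests
n = np.array([60.0, 120.0, 150.0])  # total tests

p = 100 * x / n  # percent positive
# Binomial standard error, reported on the same percentage scale as p:
se = 100 * np.sqrt((p / 100) * (1 - p / 100) / n)
```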


### Temporal Pooling
Additionally, as with the Quidel COVID Test signal, we consider smoothed estimates formed by pooling data over time. That is, daily, for each location, we first pool all data available in that location over the last 7 days, and we then recompute everything described in the last two subsections. Pooling in this data makes estimates available in more geographic areas, as many areas report very few tests per day, but have enough data to report when 7 days are considered.
### Temporal and Spatial Pooling
We conduct temporal and spatial pooling for the smoothed signal. The spatial pooling is described in the previous section where we shrink the estimates to the state's mean if the total test number is smaller than 50 for a certain location on a certain day. Additionally, as with the Quidel COVID Test signal, we consider smoothed estimates formed by pooling data over time. That is, daily, for each location, we first pool all data available in that location over the last 7 days, and we then recompute everything described in the last two subsections. Pooling in this data makes estimates available in more geographic areas.
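The temporal pooling step amounts to a trailing-window sum over the last 7 days before the estimates are recomputed. A minimal sketch (this simple helper is illustrative; the package's `_slide_window_sum` may treat early, partial windows differently):

```python
import numpy as np

def trailing_window_sum(x, k):
    """Sum of the last k values (including today) at each day t."""
    out = np.empty(len(x))
    for t in range(len(x)):
        out[t] = x[max(0, t - k + 1):t + 1].sum()
    return out

daily_tests = np.array([2.0, 0.0, 5.0, 1.0, 3.0, 4.0, 0.0, 6.0])
pooled = trailing_window_sum(daily_tests, 7)  # 7-day pooled totals
```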

### Exceptions
There are 9 special zip codes that appear in the raw Quidel COVID data but are temporarily excluded from our reports, since we do not have enough mapping information for them.

|zip |State| Number of Tests|
|---|-------|------|
|78086 |TX|98|
|20174 | VA|17|
|48824 |MI|14|
|32313 |FL|37|
|29486 |SC|69|
|75033 |TX|2318|
|79430 |TX|43|
|44325 |OH|56|
|75072 |TX|63|

* Number of tests counted through 08-05-2020.
* Through 08-05-2020, these zip codes account for only 2,715 of 942,293 total tests.
33 changes: 33 additions & 0 deletions quidel_covidtest/delphi_quidel_covidtest/constants.py
@@ -0,0 +1,33 @@
"""Registry for constants"""
# global constants
MIN_OBS = 50 # minimum number of observations in order to compute a proportion.
POOL_DAYS = 7 # number of days in the past (including today) to pool over
END_FROM_TODAY_MINUS = 5 # report data up to this many days before today
EXPORT_DAY_RANGE = 40 # Number of dates to report
# Signal names
SMOOTHED_POSITIVE = "covid_ag_smoothed_pct_positive"
RAW_POSITIVE = "covid_ag_raw_pct_positive"
SMOOTHED_TEST_PER_DEVICE = "covid_ag_smoothed_test_per_device"
RAW_TEST_PER_DEVICE = "covid_ag_raw_test_per_device"
# Geo types
COUNTY = "county"
MSA = "msa"
HRR = "hrr"

GEO_RESOLUTIONS = [
    COUNTY,
    MSA,
    HRR
]
SENSORS = [
    SMOOTHED_POSITIVE,
    RAW_POSITIVE
    # SMOOTHED_TEST_PER_DEVICE,
    # RAW_TEST_PER_DEVICE
]
SMOOTHERS = {
    SMOOTHED_POSITIVE: (False, True),
    RAW_POSITIVE: (False, False)
    # SMOOTHED_TEST_PER_DEVICE: (True, True),
    # RAW_TEST_PER_DEVICE: (True, False)
}
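Reading the values against the sensor names, each `SMOOTHERS` entry appears to pair a signal with a `(device, smooth)` flag tuple that downstream code can unpack to pick an estimator. The tuple order is my inference from the values, not documented in this file:

```python
# Hypothetical dispatch sketch using the mapping above;
# the (device, smooth) tuple order is an assumption.
SMOOTHERS = {
    "covid_ag_smoothed_pct_positive": (False, True),
    "covid_ag_raw_pct_positive": (False, False),
}

labels = {}
for sensor, (device, smooth) in SMOOTHERS.items():
    kind = "test_per_device" if device else "pct_positive"
    labels[sensor] = ("smoothed " if smooth else "raw ") + kind
```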
123 changes: 123 additions & 0 deletions quidel_covidtest/delphi_quidel_covidtest/data_tools.py
@@ -250,3 +250,126 @@ def smoothed_positive_prop(positives, tests, min_obs, pool_days,
pooled_tests = tpooled_tests
## STEP 2: CALCULATE AS THOUGH THEY'RE RAW
return raw_positive_prop(pooled_positives, pooled_tests, min_obs)


def raw_tests_per_device(devices, tests, min_obs):
    '''
    Calculates the tests per device for a single geographic
    location, without any temporal smoothing.

    If on any day t, tests[t] < min_obs, then we report np.nan.
    The second and third returned np.ndarray are the standard errors
    (currently all np.nan) and the sample size.
    Args:
        devices: np.ndarray[float]
            Number of devices, ordered in time, where each array element
            represents a subsequent day. If there were no devices, this should
            be zero (never np.nan).
        tests: np.ndarray[float]
            Number of tests performed. If there were no tests performed, this
            should be zero (never np.nan).
        min_obs: int
            Minimum number of observations in order to compute a ratio
    Returns:
        np.ndarray
            Tests per device on each day, with the same length
            as devices and tests.
        np.ndarray
            Placeholder for standard errors
        np.ndarray
            Sample size used to compute estimates.
    '''
    devices = devices.astype(float)
    tests = tests.astype(float)
    if np.any(np.isnan(devices)) or np.any(np.isnan(tests)):
        raise ValueError('devices and tests should be non-negative '
                         'with no np.nan')
    if min_obs <= 0:
        raise ValueError('min_obs should be positive')
    tests[tests < min_obs] = np.nan
    tests_per_device = tests / devices
    se = np.repeat(np.nan, len(devices))
    sample_size = tests

    return tests_per_device, se, sample_size
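The masking rule can be exercised standalone. This is a simplified restatement of the function for illustration, not an import of the actual module:

```python
import numpy as np

def raw_tests_per_device_sketch(devices, tests, min_obs):
    # Days with fewer than min_obs tests are masked out as np.nan.
    devices = devices.astype(float)
    tests = tests.astype(float).copy()
    tests[tests < min_obs] = np.nan
    ratio = tests / devices
    se = np.repeat(np.nan, len(devices))  # placeholder standard errors
    return ratio, se, tests               # tests doubles as the sample size

ratio, se, n = raw_tests_per_device_sketch(
    np.array([2, 4]), np.array([30, 100]), min_obs=50)
# Day 0 has only 30 tests (< 50), so its ratio is masked out.
```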

def smoothed_tests_per_device(devices, tests, min_obs, pool_days,
                              parent_devices=None, parent_tests=None):
    """
    Calculates the ratio of tests per device for a single geographic
    location, with temporal smoothing.
    For a given day t, if sum(tests[(t-pool_days+1):(t+1)]) < min_obs, then we
    'borrow' min_obs - sum(tests[(t-pool_days+1):(t+1)]) observations from the
    parents over the same timespan. Importantly, it will make sure NOT to
    borrow observations that are _already_ in the current geographic partition
    being considered.
    If min_obs is specified but not satisfied over the pool_days, and
    parent arrays are not provided, then we report np.nan.
    The second and third returned np.ndarray are the standard errors
    (currently all placeholder np.nan) and the reported sample_size.
    Args:
        devices: np.ndarray[float]
            Number of devices, ordered in time, where each array element
            represents a subsequent day. If there were no devices, this should
            be zero (never np.nan).
        tests: np.ndarray[float]
            Number of tests performed. If there were no tests performed, this
            should be zero (never np.nan).
        min_obs: int
            Minimum number of observations in order to compute a ratio
        pool_days: int
            Number of days in the past (including today) over which to pool data.
        parent_devices: np.ndarray
            Like devices, but for the parent geographic partition (e.g., state).
            If None, the parent is treated as having 0 devices uniformly.
        parent_tests: np.ndarray
            Like tests, but for the parent geographic partition (e.g., state).
            If None, the parent is treated as having 0 tests uniformly.
    Returns:
        np.ndarray
            Tests per device after the pool_days pooling, with the same
            length as devices and tests.
        np.ndarray
            Standard errors, currently uniformly np.nan (placeholder).
        np.ndarray
            Effective sample size (after temporal and geographic pooling).
    """
    devices = devices.astype(float)
    tests = tests.astype(float)
    if (parent_devices is None) or (parent_tests is None):
        has_parent = False
    else:
        has_parent = True
        parent_devices = parent_devices.astype(float)
        parent_tests = parent_tests.astype(float)
    if np.any(np.isnan(devices)) or np.any(np.isnan(tests)):
        raise ValueError('devices and tests '
                         'should be non-negative with no np.nan')
    if has_parent:
        if (np.any(np.isnan(parent_devices))
                or np.any(np.isnan(parent_tests))):
            raise ValueError('parent devices and parent tests '
                             'should be non-negative with no np.nan')
    if min_obs <= 0:
        raise ValueError('min_obs should be positive')
    if (pool_days <= 0) or not isinstance(pool_days, int):
        raise ValueError('pool_days should be a positive int')
    # STEP 0: DO THE TEMPORAL POOLING
    tpooled_devices = _slide_window_sum(devices, pool_days)
    tpooled_tests = _slide_window_sum(tests, pool_days)
    if has_parent:
        tpooled_pdevices = _slide_window_sum(parent_devices, pool_days)
        tpooled_ptests = _slide_window_sum(parent_tests, pool_days)
        borrow_prop = _geographical_pooling(tpooled_tests, tpooled_ptests,
                                            min_obs)
        pooled_devices = (tpooled_devices
                          + borrow_prop * tpooled_pdevices)
        pooled_tests = (tpooled_tests
                        + borrow_prop * tpooled_ptests)
    else:
        pooled_devices = tpooled_devices
        pooled_tests = tpooled_tests
    ## STEP 2: CALCULATE AS THOUGH THEY'RE RAW
    return raw_tests_per_device(pooled_devices, pooled_tests, min_obs)
3 changes: 2 additions & 1 deletion quidel_covidtest/delphi_quidel_covidtest/export.py
@@ -32,7 +32,8 @@ def export_csv(df, geo_name, sensor, receiving_dir, start_date, end_date):
t = pd.to_datetime(str(date))
date_short = t.strftime('%Y%m%d')
export_fn = f"{date_short}_{geo_name}_{sensor}.csv"
result_df = df[df["timestamp"] == date][["geo_id", "val", "se", "sample_size"]].dropna()
result_df = df[df["timestamp"] == date][["geo_id", "val", "se", "sample_size"]]
result_df = result_df[result_df["sample_size"].notnull()]
result_df.to_csv(f"{receiving_dir}/{export_fn}",
index=False,
float_format="%.8f")
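This filter change matters for the new test-per-device signals: their `se` column is a uniform NaN placeholder, so the old `dropna()` (which drops rows with a NaN in *any* column) would discard every row, while filtering on `sample_size` alone keeps them. A small illustration with made-up values:

```python
import numpy as np
import pandas as pd

result_df = pd.DataFrame({
    "geo_id": ["ca", "ny"],
    "val": [3.2, np.nan],
    "se": [np.nan, np.nan],         # placeholder standard errors
    "sample_size": [120.0, np.nan],
})

old_rows = result_df.dropna()                             # drops every row
new_rows = result_df[result_df["sample_size"].notnull()]  # keeps "ca"
```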
120 changes: 90 additions & 30 deletions quidel_covidtest/delphi_quidel_covidtest/generate_sensor.py
@@ -3,85 +3,145 @@
Functions to help generate sensor for different geographical levels
"""
import pandas as pd
from .data_tools import fill_dates, raw_positive_prop, smoothed_positive_prop
from .data_tools import (fill_dates, raw_positive_prop,
smoothed_positive_prop,
smoothed_tests_per_device,
raw_tests_per_device)

MIN_OBS = 50 # minimum number of observations in order to compute a proportion.
POOL_DAYS = 7

def generate_sensor_for_states(state_data, smooth, first_date, last_date):
def generate_sensor_for_states(state_groups, smooth, device, first_date, last_date):
"""
fit over states
Args:
state_data: pd.DataFrame
state_groups: pd.groupby.generic.DataFrameGroupBy
state_key: "state_id"
smooth: bool
Consider raw or smooth
device: bool
Consider test_per_device or pct_positive
Returns:
df: pd.DataFrame
"""
state_df = pd.DataFrame(columns=["geo_id", "val", "se", "sample_size", "timestamp"])
state_groups = state_data.groupby("state_id")
state_list = list(state_groups.groups.keys())
for state in state_list:
state_group = state_groups.get_group(state)
state_group = state_group.drop(columns=["state_id"])
state_group.set_index("timestamp", inplace=True)
state_group = fill_dates(state_group, first_date, last_date)

if smooth:
stat, se, sample_size = smoothed_positive_prop(tests=state_group['totalTest'].values,
positives=state_group['positiveTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS)
# smoothed test per device
if device & smooth:
stat, se, sample_size = smoothed_tests_per_device(
devices=state_group["numUniqueDevices"].values,
tests=state_group['totalTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS)
# raw test per device
elif device & (not smooth):
stat, se, sample_size = raw_tests_per_device(
devices=state_group["numUniqueDevices"].values,
tests=state_group['totalTest'].values,
min_obs=MIN_OBS)
# smoothed pct positive
elif (not device) & smooth:
stat, se, sample_size = smoothed_positive_prop(
tests=state_group['totalTest'].values,
positives=state_group['positiveTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS)
stat = stat * 100
# raw pct positive
else:
stat, se, sample_size = raw_positive_prop(tests=state_group['totalTest'].values,
positives=state_group['positiveTest'].values,
min_obs=MIN_OBS)
stat = stat * 100
stat, se, sample_size = raw_positive_prop(
tests=state_group['totalTest'].values,
positives=state_group['positiveTest'].values,
min_obs=MIN_OBS)
stat = stat * 100

se = se * 100
state_df = state_df.append(pd.DataFrame({"geo_id": state,
"timestamp": state_group.index,
"val": stat,
"se": se,
"sample_size": sample_size}))
return state_df, state_groups
return state_df

def generate_sensor_for_other_geores(state_groups, data, res_key, smooth, first_date, last_date):
def generate_sensor_for_other_geores(state_groups, data, res_key, smooth,
device, first_date, last_date):
"""
fit over counties/HRRs/MSAs
Args:
data: pd.DataFrame
res_key: "fips", "cbsa_id" or "hrrnum"
smooth: bool
Consider raw or smooth
device: bool
Consider test_per_device or pct_positive
Returns:
df: pd.DataFrame
"""
has_parent = True
res_df = pd.DataFrame(columns=["geo_id", "val", "se", "sample_size"])
res_groups = data.groupby(res_key)
loc_list = list(res_groups.groups.keys())
for loc in loc_list:
res_group = res_groups.get_group(loc)
has_parent = True
parent_state = res_group['state_id'].values[0]
parent_group = state_groups.get_group(parent_state)
res_group = res_group.merge(parent_group, how="left",
on="timestamp", suffixes=('', '_parent'))
res_group = res_group.drop(columns=[res_key, "state_id", "state_id" + '_parent'])
try:
parent_group = state_groups.get_group(parent_state)
res_group = res_group.merge(parent_group, how="left",
on="timestamp", suffixes=('', '_parent'))
res_group = res_group.drop(columns=[res_key, "state_id", "state_id" + '_parent'])
except KeyError:
has_parent = False
res_group = res_group.drop(columns=[res_key, "state_id"])
res_group.set_index("timestamp", inplace=True)
res_group = fill_dates(res_group, first_date, last_date)

if smooth:
stat, se, sample_size = smoothed_positive_prop(
tests=res_group['totalTest'].values,
positives=res_group['positiveTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS,
parent_tests=res_group["totalTest_parent"].values,
parent_positives=res_group['positiveTest_parent'].values)
if has_parent:
if device:
stat, se, sample_size = smoothed_tests_per_device(
devices=res_group["numUniqueDevices"].values,
tests=res_group['totalTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS,
parent_devices=res_group["numUniqueDevices_parent"].values,
parent_tests=res_group["totalTest_parent"].values)
else:
stat, se, sample_size = smoothed_positive_prop(
tests=res_group['totalTest'].values,
positives=res_group['positiveTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS,
parent_tests=res_group["totalTest_parent"].values,
parent_positives=res_group['positiveTest_parent'].values)
stat = stat * 100
else:
if device:
stat, se, sample_size = smoothed_tests_per_device(
devices=res_group["numUniqueDevices"].values,
tests=res_group['totalTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS)
else:
stat, se, sample_size = smoothed_positive_prop(
tests=res_group['totalTest'].values,
positives=res_group['positiveTest'].values,
min_obs=MIN_OBS, pool_days=POOL_DAYS)
stat = stat * 100
else:
stat, se, sample_size = raw_positive_prop(
tests=res_group['totalTest'].values,
positives=res_group['positiveTest'].values,
min_obs=MIN_OBS)
stat = stat * 100
se = se * 100
if device:
stat, se, sample_size = raw_tests_per_device(
devices=res_group["numUniqueDevices"].values,
tests=res_group['totalTest'].values,
min_obs=MIN_OBS)
else:
stat, se, sample_size = raw_positive_prop(
tests=res_group['totalTest'].values,
positives=res_group['positiveTest'].values,
min_obs=MIN_OBS)
stat = stat * 100

se = se * 100
res_df = res_df.append(pd.DataFrame({"geo_id": loc,
"timestamp": res_group.index,
"val": stat,