From 5e0d8da88aa60edc48dc6e0ecb378b92087ccce0 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 10:05:07 -0400 Subject: [PATCH 01/11] Improved survey credits --- docs/symptom-survey/index.md | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/docs/symptom-survey/index.md b/docs/symptom-survey/index.md index fb5961d05..1689535ff 100644 --- a/docs/symptom-survey/index.md +++ b/docs/symptom-survey/index.md @@ -30,20 +30,28 @@ If you have questions about the survey or getting access to data, contact us at ## Credits The COVID-19 Trends and Impact Survey (CTIS) is a project of the [Delphi -Group](https://delphi.cmu.edu/) at Carnegie Mellon University. The Principal -Investigator is [Alex Reinhart](https://www.refsmmat.com/); Wichada La -Motte-Kerr is Survey Coordinator. The survey protocol is reviewed by the -Carnegie Mellon University Institutional Review Board. +Group](https://delphi.cmu.edu/) at Carnegie Mellon University. Team members +include: + +* [Alex Reinhart](https://www.refsmmat.com/), Principal Investigator +* Wichada La Motte-Kerr, Survey Coordinator +* Robin Mejia, survey advisor +* Nat DeFries, statistical developer and data engineer +* plus support from many members of the [Delphi + team](https://delphi.cmu.edu/about/team/) + +The survey protocol is reviewed by the Carnegie Mellon University Institutional +Review Board. The support of several institutions makes the survey possible. Facebook supports the survey through recruitment (participants are invited via their News Feed), survey sampling and weighting procedures, technical assistance in survey design and implementation, and coordination with researchers and public health -officials. The University of Maryland's Joint Program in Survey Methodology -conducts an [international version of the survey](https://covidmap.umd.edu/), -and we coordinate closely on survey design and implementation. 
Delphi collects, -aggregates, and distributes the US survey data, and retains ultimate -responsibility for the US survey instrument and data. +officials. The University of Maryland's Social Data Science Center conducts a +[global version of the survey](https://covidmap.umd.edu/), and we coordinate +closely on survey design and implementation. Delphi collects, aggregates, and +distributes the US survey data, and retains ultimate responsibility for the US +survey instrument and data. We develop the survey collaboratively with data users, public health officials, and others. If you are interested in getting involved, see our From dc29ca2a1b4a0fafb9933b3ecc75db1298d3ad0f Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 10:05:32 -0400 Subject: [PATCH 02/11] Link to the dashboard --- docs/symptom-survey/index.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/symptom-survey/index.md b/docs/symptom-survey/index.md index 1689535ff..87dc13e03 100644 --- a/docs/symptom-survey/index.md +++ b/docs/symptom-survey/index.md @@ -13,10 +13,11 @@ social distancing), mental health, and economic and health impacts they have experienced as a result of the pandemic. A high-level overview of the survey is posted [on the COVIDcast website](https://delphi.cmu.edu/covidcast/surveys/). -Geographically aggregated data from this survey is publicly available through -the [COVIDcast API](../api/covidcast.md) as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md). -Demographic breakdowns of survey data are publicly available as -[downloadable contingency tables](contingency-tables.md). +The [survey results dashboard](https://delphi.cmu.edu/covidcast/survey-results/) +provides a high-level summary of survey results. Geographically aggregated data +from this survey is publicly available through the [COVIDcast API](../api/covidcast.md) +as the [`fb-survey` data source](../api/covidcast-signals/fb-survey.md). 
Demographic breakdowns of survey +data are publicly available as [downloadable contingency tables](contingency-tables.md). This documentation describes the survey items, data coding, data distribution, and the survey weights computed by Facebook. It also documents the individual From 77104b89e2be1125cf49b4cebb4afa1bd4be549d Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 10:05:54 -0400 Subject: [PATCH 03/11] Add survey limitations page --- docs/symptom-survey/limitations.md | 204 +++++++++++++++++++++++++++++ docs/symptom-survey/problems.md | 4 +- 2 files changed, 206 insertions(+), 2 deletions(-) create mode 100644 docs/symptom-survey/limitations.md diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md new file mode 100644 index 000000000..73497a612 --- /dev/null +++ b/docs/symptom-survey/limitations.md @@ -0,0 +1,204 @@ +--- +title: Survey Limitations +parent: COVID-19 Trends and Impact Survey +nav_order: 9 +--- + +# Survey Limitations +{: .no_toc} + +The COVID-19 Trends and Impact Survey (CTIS) is large and provides exceptionally +detailed data; however, it is not perfect, and its design means it is subject to +several crucial limitations. Anyone using the data to make policy decisions or +answer research questions should be aware of these limitations. Given these +limitations, we recommend using the data to: + +- Track changes over time, such as to monitor sudden increases in reported + symptoms or changes in reported vaccination attitudes. +- Make comparisons across space, such as to identify regions with much higher or + lower values. 
+- Augment data collected from smaller, more rigorously controlled surveys, such + as those that use representative panels and extensive demographic weighting to + reduce bias in estimates + +We do **not** recommend using CTIS data to + +- Make point estimates of population quantities (such as the exact percentage of + people who meet a certain criterion) without reference to other data sources. + Because of sampling, weighting, and response biases, such estimates can be + biased, and standard confidence intervals and hypothesis tests will be + misleading. +- Analyze very small or localized demographic subgroups. Due to the [response + behavior issues](#response-behavior) discussed below, there is measurement + error in the demographic data. Very small demographic groups may + disproportionately include respondents who pick their demographics at random + or attempt to disrupt the survey in other ways, even if those respondents are + rare overall. + +The sections below explain these limitations in more detail. + +## Table of contents +{: .no_toc .text-delta} + +1. TOC +{:toc} + +## The Sample + +Facebook takes a random sample of active adult users every day and invites them +to complete the survey. Taking the survey is voluntary, and only 1-2% of those +users who are invited actually take the survey. This leaves opportunities for +sampling bias, if the sample is construed to represent the US adult population: + +1. **Sampling frame.** The sample is random and maintains similar user + characteristics each day, but it is drawn from adult Facebook active users + who use one of the languages the survey is translated into: English [American + and British], Spanish [Spain and Latin American], French, Brazilian + Portuguese, Vietnamese, and simplified Chinese. This is not the United States + population as a whole. 
While [most American adults use + Facebook](https://www.pewresearch.org/internet/2021/04/07/social-media-use-in-2021/) + and the available languages are more comprehensive than for many public + health surveys, "most" is not the same as "all", and some demographic groups + may be poorly represented in the Facebook sample. +2. **Non-response bias.** Only a small fraction of invited users choose to take + the survey when they are invited. If their decision on whether to take the + survey is random, this is not a problem. However, their decision to take the + survey may be correlated with other factors---such as their level of concern + about COVID-19 or their trust of academic researchers. If that is the case, + the sample will disproportionately contain people with certain attitudes and + beliefs. + +Facebook calculates [survey weights](weights.md) (see below) that are intended +to help correct for these issues. The weights adjust the age and gender +distribution of the respondents to match Census data, and adjust for +non-response by using a model for the probability of any user to click on the +survey link. However, if that non-response model is not perfect (for example, +non-response varies with respondent attributes not included in the model), or if +the Facebook population differs from the US population on more features than +just age and gender, the weights will not account for all sampling biases. For +example, analyses of weighted survey data show demographics relatively similar +to the US population, with slightly higher levels of education and a smaller +proportion of non-white respondents; however, comparisons of self-reported +vaccination rates of survey respondents with CDC US population benchmarks +indicate that CTIS respondents are more likely to be vaccinated than the general +population.
+ +We do, however, expect that any sampling biases will remain relatively +consistent over time, allowing us to make reliable comparisons over time (such +as noting an increase or decrease in vaccination rates or vaccine intent) even +if the point estimates are consistently biased. This is a common issue with +self-reported data; for example, surveys on illegal drug use expect +under-reporting (as they ask about an illegal activity) but are commonly used to +make comparisons between groups or over time. + +Also, Facebook's sampling process allows users to be invited to the survey +repeatedly. A user will only be reinvited at least thirty days after their +previous invitation. Because respondents are anonymous and we do not receive any +unique identifiers, responses from the same user are not linked in any way. +Analysts must be aware that when working with responses submitted more than a +month apart, some responses may be from the same users. + +## Weighting + +It is important to **read the [weights documentation](weights.md)** to +understand how Facebook calculates survey weights and what they account for. +There are some key limitations: + +1. Because we do not receive Facebook profile data and Facebook does not receive + survey response data, the weights are based only on attributes in Facebook + profiles, *not* on demographics reported in response to survey questions. For + example, if a respondent's Facebook profile says they are 35 years old and + live in Delaware, but on the survey they respond that they are 45 years old + and live in Maryland, the weight will be calculated based on the profile + information and reflect the Delaware location. This causes measurement error + in the weights. +2. Similarly, the non-response model used by Facebook only uses information + available to Facebook, such as profile information. 
As discussed above, if + this model is not perfect, for example if factors not included in the model + that affect non-response, the weights will not fully account for this + non-response bias. +3. Facebook only invites users who it believes reside in the 50 states or + Washington, DC. (Puerto Rico is sampled separately as part of the + [international version of the survey](https://covidmap.umd.edu/).) If + Facebook believes a user qualifies, but the user then replies that they live + in Puerto Rico or another US territory, their weight will be incorrect. + Starting in September 2021, these responses are not included in any + microdata. + +## Response Behavior + +Survey scientists have long known that humans do not always provide complete and +truthful responses to questions about their attributes, beliefs, and behaviors. +There are two primary reasons CTIS responses may be suspect. + +First is **social desirability bias.** As with all self-report measurements, +survey respondents may give responses consistent with what they believe is +socially desirable, because they feel pressured to fit social norms. For +example, if someone lives in an area where masks are widely used and seen as +essential, they may report that they wear their mask most or all of the time +when in public, even if they don't. While this effect is likely smaller on an +anonymous online survey than in an in-person interview, it could still be +present. + +The second problem is deliberate trolling. While intentional mis-reporting is +always a possibility when users provide self-report data, it is a particular +concern for a large, online survey on a controversial topic offered through a +large social media platform. 
It appears that the vast majority of CTIS +respondents complete the survey in good faith; however, we occasionally receive +emails from survey respondents gloating that they have deliberately provided +false responses to the survey, usually because they believe the COVID-19 +pandemic is a conspiracy or that scientists are suppressing key information. + +We have also observed problematic behavior in a specific subset of respondents. +While less than 1% of respondents opt to self-describe their own gender, a large +percentage of respondents who do choose that option provide a description that +is actually a protest against the question or the survey; for example, making +trans-phobic comments or [reporting their gender identification as “Attack +Helicopter”](https://knowyourmeme.com/memes/i-sexually-identify-as-an-attack-helicopter). +Additionally, these respondents disproportionately select specific demographic +groups, such as having a PhD, being over age 75, and being Hispanic, all at +rates far exceeding their overall presence in the US population, suggesting that +people who want to disrupt the survey also pick on specific groups to troll. + +(Note that if a respondent completes the survey multiple times, or shares their +unique link with friends to take it, only the first response is counted; this +limits the impact of deliberate trolling.) + +For overall estimates, trolling is not expected to impact results in a +meaningful way. However, given the concentration of trolls in small demographic +groups, users interested in comparisons of small demographic groups should +examine a sample of the raw data. For example, if you are interested in +responses from Hispanic adults over age 65, examine the other demographic +variables for this group of respondents to ensure they appear to match what you +would expect and do not appear influenced by respondents who give deliberately +strange answers. + +Importantly, weights cannot correct for trolling behavior. 
Users can either note +any concerns they have when reporting for small groups, or they may choose to +analyze the data without a suspect group. We are continuing to evaluate trolling +and will provide updates if new patterns appear. + +## Missing Data + +Some survey respondents do not complete the entire survey. This could be because +they get impatient with it, because they do not want to respond to questions +about specific topics, or simply because they are responding to the survey +during a quick break or while waiting in line at Starbucks. (Remember, Facebook +users see the invitation when they're browsing the Facebook news feed, which +could be any time someone might pull out their phone and check Facebook.) + +As a result, questions that appear later in the survey, including demographics, +can be blank in 10-20% of survey responses. Similar to overall non-response, +this is an issue when such behavior does not occur at random relative to the +questions you are analyzing; for example, if individuals who are particularly +concerned about COVID-19 are more likely to take the time to finish the survey. + +Also, the CTIS survey instrument is deliberately designed so that most items are +optional---Qualtrics will not attempt to force respondents to answer questions +that they leave blank. This allows respondents to leave an item blank if they +prefer not to answer it, rather than entering a nonsense answer. This can lead +to missingness in the middle of the survey, even among respondents who answer +later questions. As noted above, this missingness is almost certainly not at +random. Data users should examine and report the missingness in the questions +they use. Imputation methods are an option; users should consider whether the +assumptions of imputation models appear to be met for the data. 
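Editorial note on the Missing Data guidance added in `limitations.md` above: the advice to "examine and report the missingness in the questions they use" can be sketched as follows. This is a minimal illustration, not part of the patch and not Delphi's code. The field names (`vaccinated`, `age_group`, `weight`) and the sample records are hypothetical, and the weighted estimate is a plain complete-case calculation that does nothing to correct for missingness that is not at random.

```python
import math

def missingness_report(responses, items):
    """Return the share of blank (None) answers for each item."""
    n = len(responses)
    if n == 0:
        return {}
    return {
        item: sum(1 for r in responses if r.get(item) is None) / n
        for item in items
    }

def weighted_proportion(responses, item, weight_key="weight"):
    """Weighted share of True answers among respondents who answered the item."""
    answered = [r for r in responses if r.get(item) is not None]
    total = sum(r[weight_key] for r in answered)
    if total == 0:
        return math.nan
    return sum(r[weight_key] for r in answered if r[item]) / total

# Hypothetical records; real microdata uses different column names and codes.
responses = [
    {"vaccinated": True, "age_group": "25-34", "weight": 1.5},
    {"vaccinated": False, "age_group": None, "weight": 0.5},
    {"vaccinated": None, "age_group": None, "weight": 2.0},
    {"vaccinated": True, "age_group": "65-74", "weight": 1.0},
]

report = missingness_report(responses, ["vaccinated", "age_group"])
estimate = weighted_proportion(responses, "vaccinated")
```

Reporting the per-item missingness alongside any estimate lets readers judge how much the complete-case subset may differ from the full set of respondents.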
diff --git a/docs/symptom-survey/problems.md b/docs/symptom-survey/problems.md index 0ddbe5fc6..4b612c631 100644 --- a/docs/symptom-survey/problems.md +++ b/docs/symptom-survey/problems.md @@ -1,10 +1,10 @@ --- -title: Problems and Data Errors +title: Data and Sampling Errors parent: COVID-19 Trends and Impact Survey nav_order: 8 --- -# Problems and Data Errors +# Data and Sampling Errors {: .no_toc} Given the scale of the COVID-19 Trends and Impact Survey (CTIS), we occasionally From a4de80234e8a81b383a90ec4b2749499d4e3aed4 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 10:10:37 -0400 Subject: [PATCH 04/11] Fix page title --- docs/symptom-survey/modules.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/symptom-survey/modules.md b/docs/symptom-survey/modules.md index b80118f30..a13c550ed 100644 --- a/docs/symptom-survey/modules.md +++ b/docs/symptom-survey/modules.md @@ -4,7 +4,7 @@ parent: COVID-19 Trends and Impact Survey nav_order: 7 --- -# Questions and Coding +# Survey Modules & Randomization {: .no_toc} To reduce the overall length of the instrument and minimize response burden, From c7e133a0be7c0b574e179dc5109eb9811c007073 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 10:10:45 -0400 Subject: [PATCH 05/11] Link to weighting limitations section --- docs/symptom-survey/limitations.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md index 73497a612..a498a57cf 100644 --- a/docs/symptom-survey/limitations.md +++ b/docs/symptom-survey/limitations.md @@ -68,9 +68,9 @@ sampling bias, if the sample is construed to represent the US adult population: the sample will disproportionately contain people with certain attitudes and beliefs. -Facebook calculates [survey weights](weights.md) (see below) that are intended -to help correct for these issues. 
The weights adjust the age and gender -distribution of the respondents to match Census data, and adjust for +Facebook calculates [survey weights](weights.md) ([see below](#weighting)) that +are intended to help correct for these issues. The weights adjust the age and +gender distribution of the respondents to match Census data, and adjust for non-response by using a model for the probability of any user to click on the survey link. However, if that non-response model is not perfect (for example, non-response varies with respondent attributes not included in the model), or if From 48a6695fc13f0b7500c38a2210d03ce303792589 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 10:12:07 -0400 Subject: [PATCH 06/11] Link to the limitations page from other useful places --- docs/symptom-survey/data-access.md | 2 ++ docs/symptom-survey/weights.md | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/symptom-survey/data-access.md b/docs/symptom-survey/data-access.md index a0fa103e4..215f7ec3c 100644 --- a/docs/symptom-survey/data-access.md +++ b/docs/symptom-survey/data-access.md @@ -34,6 +34,8 @@ University Institutional Review Board with IRB ID STUDY2020_00000162. Some important notes about obtaining access to the individual survey responses: +* You should be familiar with the [survey's limitations](limitations.md) and + ensure the survey data is suitable for your research goals. * Your research purpose must be consistent with the consent language used in [Wave 1 of the survey](coding.md#wave-1), which states the responses may be used to create "a better public health understanding of where the coronavirus diff --git a/docs/symptom-survey/weights.md b/docs/symptom-survey/weights.md index a65252189..162c60c04 100644 --- a/docs/symptom-survey/weights.md +++ b/docs/symptom-survey/weights.md @@ -15,4 +15,7 @@ by Facebook. 
These weights are also used to produce our Facebook has provided documentation to describe the calculation and usage of these weights, [available here](symptom-survey-weights.pdf). This documentation explains the weight methodology, gives examples of how to use the weights when -calculating estimates, and states the known limitations of the weights. +calculating estimates, and states the known limitations of the weights. We also +have separate information about the [survey's limitations](limitations.md), +including limitations in the weights, that affect what conclusions can be drawn +from the survey data. From edb3a07f30827fb718bff1f901a63a09c3d4e683 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 14:42:14 -0400 Subject: [PATCH 07/11] Update docs/symptom-survey/limitations.md Co-authored-by: Katie Mazaitis --- docs/symptom-survey/limitations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md index a498a57cf..909f8a8b1 100644 --- a/docs/symptom-survey/limitations.md +++ b/docs/symptom-survey/limitations.md @@ -115,7 +115,7 @@ There are some key limitations: 2. Similarly, the non-response model used by Facebook only uses information available to Facebook, such as profile information. As discussed above, if this model is not perfect, for example if factors not included in the model - that affect non-response, the weights will not fully account for this + affect non-response, the weights will not fully account for this non-response bias. 3. Facebook only invites users who it believes reside in the 50 states or Washington, DC. 
(Puerto Rico is sampled separately as part of the From 7e6cdacd19ad251eed31326eabf3ee4362864b66 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 14:42:42 -0400 Subject: [PATCH 08/11] Suggested changes --- docs/symptom-survey/limitations.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md index 909f8a8b1..6bd817b6e 100644 --- a/docs/symptom-survey/limitations.md +++ b/docs/symptom-survey/limitations.md @@ -17,9 +17,11 @@ limitations, we recommend using the data to: symptoms or changes in reported vaccination attitudes. - Make comparisons across space, such as to identify regions with much higher or lower values. -- Augment data collected from smaller, more rigorously controlled surveys, such - as those that use representative panels and extensive demographic weighting to - reduce bias in estimates +- Make comparisons between groups, such as between occupational or age groups, + keeping in mind any [sample limitations](#the-sample) that might affect these + comparisons. +- Augment data collected from other sources, such as more rigorously controlled + surveys with high response rates. 
We do **not** recommend using CTIS data to From 4bcd1dcee0d2378c072f1fafe21164661c2809a1 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Thu, 9 Sep 2021 14:44:07 -0400 Subject: [PATCH 09/11] Copyedits from review --- docs/symptom-survey/limitations.md | 7 ++++--- docs/symptom-survey/weights.md | 5 ++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md index 6bd817b6e..b3790f543 100644 --- a/docs/symptom-survey/limitations.md +++ b/docs/symptom-survey/limitations.md @@ -162,9 +162,10 @@ groups, such as having a PhD, being over age 75, and being Hispanic, all at rates far exceeding their overall presence in the US population, suggesting that people who want to disrupt the survey also pick on specific groups to troll. -(Note that if a respondent completes the survey multiple times, or shares their -unique link with friends to take it, only the first response is counted; this -limits the impact of deliberate trolling.) +(Note that if a respondent is invited once but completes the survey multiple +times, or shares their unique link with friends to take it, only the first +response is counted; this limits the impact of deliberate trolling. If the +respondent is sampled and invited again later, they receive a new unique link.) For overall estimates, trolling is not expected to impact results in a meaningful way. However, given the concentration of trolls in small demographic diff --git a/docs/symptom-survey/weights.md b/docs/symptom-survey/weights.md index 162c60c04..ccaf144a7 100644 --- a/docs/symptom-survey/weights.md +++ b/docs/symptom-survey/weights.md @@ -16,6 +16,5 @@ Facebook has provided documentation to describe the calculation and usage of these weights, [available here](symptom-survey-weights.pdf). This documentation explains the weight methodology, gives examples of how to use the weights when calculating estimates, and states the known limitations of the weights. 
We also -have separate information about the [survey's limitations](limitations.md), -including limitations in the weights, that affect what conclusions can be drawn -from the survey data. +have separate information about the [survey's limitations](limitations.md) that +affect what conclusions can be drawn from the survey data. From 58e8dd79df27d819cea472cc6c7675e6762525a8 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Fri, 10 Sep 2021 10:00:27 -0400 Subject: [PATCH 10/11] Define "adult" --- docs/symptom-survey/limitations.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md index b3790f543..0d5be4d9f 100644 --- a/docs/symptom-survey/limitations.md +++ b/docs/symptom-survey/limitations.md @@ -48,7 +48,8 @@ The sections below explain these limitations in more detail. ## The Sample Facebook takes a random sample of active adult users every day and invites them -to complete the survey. Taking the survey is voluntary, and only 1-2% of those +to complete the survey. ("Adult" means the user has indicated they are at least 18 +years old in their profile.) Taking the survey is voluntary, and only 1-2% of those users who are invited actually take the survey.
This leaves opportunities for sampling bias, if the sample is construed to represent the US adult population: From d7a3c17bae077ba9c8c282b0be8bb9c2673eda31 Mon Sep 17 00:00:00 2001 From: Alex Reinhart Date: Fri, 10 Sep 2021 10:48:42 -0400 Subject: [PATCH 11/11] Additional copyedit --- docs/symptom-survey/limitations.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/symptom-survey/limitations.md b/docs/symptom-survey/limitations.md index 0d5be4d9f..42d995864 100644 --- a/docs/symptom-survey/limitations.md +++ b/docs/symptom-survey/limitations.md @@ -7,10 +7,10 @@ nav_order: 9 # Survey Limitations {: .no_toc} -The COVID-19 Trends and Impact Survey (CTIS) is large and provides exceptionally -detailed data; however, it is not perfect, and its design means it is subject to -several crucial limitations. Anyone using the data to make policy decisions or -answer research questions should be aware of these limitations. Given these +The COVID-19 Trends and Impact Survey (CTIS) gathers large amounts of detailed +data; however, it is not perfect, and its design means it is subject to several +crucial limitations. Anyone using the data to make policy decisions or answer +research questions should be aware of these limitations. Given these limitations, we recommend using the data to: - Track changes over time, such as to monitor sudden increases in reported