Commit 0b5faa6

Rebuild with changes
1 parent 7ed3cc9 commit 0b5faa6

2 files changed: 5 additions & 5 deletions

content/blog/2023-04-10-is-latest-patch.Rmd

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@ tags:
 - COVIDcast
 authors:
 - nolan
-heroImage: blog-is-latest-numbers-thumb.jpeg
-heroImageThumb: blog-is-latest-numbers.jpeg
+heroImage: blog-is-latest-numbers.jpeg
+heroImageThumb: blog-is-latest-numbers-thumb.jpeg
 summary: |
 In August 2022, the Delphi team discovered a fault in the data that we were sending through the COVIDcast API. The fault caused the API to return past versions of data as if they were actually the latest version of that requested data. The extent of this fault included all of the signals from Johns Hopkins University Center for Systems Science and Engineering (JHU-CSSE) for the months of February 2020 - October 2021. About 12 million data points were found to be faulty, which made up about 20% of the data available from JHU-CSSE as of the time of the fix. This was patched on September 28th, 2022 and the API is now returning the correct version of the data.

content/blog/2023-04-10-is-latest-patch.html

Lines changed: 3 additions & 3 deletions
@@ -33,7 +33,7 @@ <h2>What went wrong?</h2>
 </div>
 <div id="how-did-we-identify-this" class="section level2">
 <h2>How did we identify this?</h2>
-<p>Data faults like this are difficult to identify. In this case, this fault was found by accident while a member of the Delphi team was working on a new system to calculate metadata. During this, they found that <a href="https://github.com/cmu-delphi/covidcast-indicators/issues/1685">some of the JHU-CSSE data was not matching up and looked deeper into it</a>. The team’s analysis identified 11,987,335 rows that were labeled as the latest issue in which had more recent issues in the database; this constituted about 20% of our JHU-CSSE data at the time.</p>
+<p>Data faults like this are difficult to identify. In this case, this fault was found by accident while a member of the Delphi team was working on a new system to calculate metadata. During this, they found that <a href="https://github.com/cmu-delphi/covidcast-indicators/issues/1685">some of the JHU-CSSE data was not matching up and looked deeper into it</a>. The team’s analysis identified 11,987,335 rows that were labeled as the latest issue but which had more recent issues in the database; this constituted about 20% of our JHU-CSSE data at the time.</p>
 </div>
 <div id="how-did-we-fix-it" class="section level2">
 <h2>How did we fix it?</h2>
@@ -50,7 +50,7 @@ <h2>What did we learn?</h2>
 <div class="footnotes footnotes-end-of-document">
 <hr />
 <ol>
-<li id="fn1"><p>Much of the data that is used to create the COVIDCast API is not complete the first day that it is reported. For instance, COVID cases for a specific day will change for many days to weeks afterwards as the reporting source revises its data. Because of this, we store many different versions of the same reference day for each signal. Usually, our users are most interested in the most recent version of the data. In our previous version of Epidata, version 3, we kept a statically set flag in our table to delineate the latest version of a certain row of data. This flag was set when we ingested a new version of said data. This workflow was very prone to data faults when patching the database outside of the acquisition pipeline (that typically sets the flag). For more information, <a href="https://delphi.cmu.edu/blog/2022/12/14/introducing-epidata-v4/">check out our blog post on Epidata version 4</a>.<a href="#fnref1" class="footnote-back">↩︎</a></p></li>
-<li id="fn2"><p>A patch, in this context, is a set of data that matches to a database that contains incorrect information. The patch contains the keys to find these rows and update them with the correct information.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
+<li id="fn1"><p>Much of the data that is used to create the COVIDCast API is not complete the first day that it is reported. For instance, COVID cases for a specific day will change for many days to weeks afterwards as the reporting source revises its data. Because of this, we store many different versions of the same reference day for each signal. Usually, our users are most interested in the most recent version of the data. In our previous version of Epidata, version 3, we kept a statically set flag in our table to delineate the latest version of a certain row of data. This flag was set when we ingested a new version of said data. This workflow was very prone to data faults when patching the database outside of the daily acquisition pipeline that typically set the flag. For more information on how we eliminated this shortcoming in the new database, <a href="https://delphi.cmu.edu/blog/2022/12/14/introducing-epidata-v4/">check out our blog post on Epidata version 4</a>.<a href="#fnref1" class="footnote-back">↩︎</a></p></li>
+<li id="fn2"><p>A patch, in this context, is a way to fix a database that contains incorrect information. The patch contains the keys to find the faulty rows and update them with the correct information.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
 </ol>
 </div>
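
Note on the fault described in this diff: the check behind "rows that were labeled as the latest issue but which had more recent issues" can be sketched roughly as below. This is a minimal illustration only, not the Delphi team's actual analysis or schema. The function name and the row fields (`signal`, `geo_value`, `time_value`, `issue`, `is_latest`) are assumptions made for the example, and issues are assumed to be YYYYMMDD integers.

```python
from collections import defaultdict


def find_stale_latest_flags(rows):
    """Return rows flagged as latest even though a newer issue exists.

    Each row is a dict with hypothetical keys: signal, geo_value,
    time_value (the reference day), issue (the version date), and
    is_latest (the statically stored flag).
    """
    # First pass: record the newest issue seen for each reference key.
    newest_issue = defaultdict(int)
    for row in rows:
        key = (row["signal"], row["geo_value"], row["time_value"])
        newest_issue[key] = max(newest_issue[key], row["issue"])

    # Second pass: a flagged row is stale if a newer issue shares its key.
    stale = []
    for row in rows:
        key = (row["signal"], row["geo_value"], row["time_value"])
        if row["is_latest"] and row["issue"] < newest_issue[key]:
            stale.append(row)
    return stale
```

A patch in the sense of footnote 2 would then carry the keys of the returned rows so that their flags can be corrected. Computing the latest version at query time instead of storing a flag avoids this class of fault altogether, at the cost of extra work per query.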
