-
Notifications
You must be signed in to change notification settings - Fork 67
Feature/data versioning and optimization #149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/data versioning and optimization #149
Conversation
* Supporting infra in acquisition * Extend all data insertion test cases to include toy issue and lag * Test that we can insert a new issue for an extant (source,signal,date,geo)
* Retrieve selected issues or most recent issue in data queries * Retrieve most recent issue in meta calculation * Reduce floating-point instability in meta calculation * All four clients now accept paramters for issues and lag * Includes client integration tests to verify correct issues are returned Co-authored-by: Wael H. Al Saeed <[email protected]>
* Include as_of in subquery WHERE clause and API dispatch * Hookups in all clients * Include as_of in client integration tests * Describe as_of (and issues and lag) in API documentation All tests pass.
* New Database methods: - get_all_record_values_of_timeseries_with_potentially_stale_direction: retrieves all records from all timeseries with potentially stale direction, with an option to store result in temporary table. - batched_update_direction: updates the direction field of the `covidcast` table, given a single new value, and a list of row ids. - drop_temporary_table - update_timestamp2_from_temporary_table: takes tmp_table_name as input and updates the timestamp2 field of `covidcast` rows whose id is present in tmp_table_name. * New direction_updater.py functionality: - optimized_update_loop
|
@korlaxxalrok this is the PR we'll want to merge on Migration Day simultaneously with the data loading, since it will not insert a spurious issue for each direction recomputation. |
Memory and speed optimizations for covidcast script to produce issues from backups
korlaxxalrok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@krivard I don't think I should be the final word here, but this looks good to me 👍
eujing
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a very minor comment about commented-out code, but not sure if worth the git hassle.
| # is_eq_nan = ts_pot_changed.direction.isnull() & ts_pot_changed.new_direction.isnull() | ||
| # is_eq_num = ts_pot_changed.direction == ts_pot_changed.new_direction | ||
| # changed_mask = ~(is_eq_nan | is_eq_num) | ||
| # ts_changed = ts_pot_changed[changed_mask] | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove these if not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I considered updating only the rows where the value actually changed, it turned out that this increases overall runtime, so I didn't end up using it.
| timestamp2s = ts_rows.timestamp2.values.astype(np.int64) | ||
|
|
||
| # create a direction classifier for this signal | ||
| data_stdev = data_stdevs[source][signal][geo_type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| data_stdev = data_stdevs[source][signal][geo_type] | |
| data_stdev = data_stdevs[source][signal][geo_type] if (source in data_stdevs) and (signal in data_stdevs[source]) and (geo_type in data_stdevs[source][signal]) else 0 |
To avoid crashing in case some (source,signal,geo_type) tuple doesn't have data on or before Constants.SIGNAL_STDEV_MAX_DAY.
|
My only additional note is that |
CoffeeScript 2 produces ES6, which is not ingestible by uglify-js (part of the delphi-epidata deployment machinery). As a workaround, we've dropped back to ECMAScript 5 by passing the resulting javascript file through `babel delphi_epidata.js --presets=@babel/env` to transpile it for this commit.
This PR combines the changes from #133 and #143, rebased for legibility and to incorporate the changes in meta behavior on main.