
Conversation

@krivard (Contributor) commented Jul 9, 2020

This PR combines the changes from #133 and #143, rebased for legibility and to incorporate the changes in meta behavior on main.

krivard and others added 6 commits July 9, 2020 10:14
* Supporting infra in acquisition
* Extend all data insertion test cases to include toy issue and lag
* Test that we can insert a new issue for an extant (source,signal,date,geo)
* Retrieve selected issues or most recent issue in data queries
* Retrieve most recent issue in meta calculation
* Reduce floating-point instability in meta calculation
* All four clients now accept parameters for issues and lag
* Includes client integration tests to verify correct issues are returned

Co-authored-by: Wael H. Al Saeed <[email protected]>
* Include as_of in subquery WHERE clause and API dispatch
* Hookups in all clients
* Include as_of in client integration tests
* Describe as_of (and issues and lag) in API documentation (see the usage sketch below)

All tests pass.
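
For illustration, here is a hedged sketch of how the new issues, lag, and as_of parameters might be invoked from the Python client (the data source, signal, and geo values below are examples only; see the API documentation for exact semantics):

# Illustrative use of the new versioning parameters in the Python client.
# Source, signal, and geo values are examples only.
from delphi_epidata import Epidata

# The data as it would have appeared on June 1, 2020:
res = Epidata.covidcast('fb-survey', 'smoothed_cli', 'day', 'county',
                        20200501, '42003', as_of=20200601)

# Only the version (issue) of the data published on June 1, 2020:
res = Epidata.covidcast('fb-survey', 'smoothed_cli', 'day', 'county',
                        20200501, '42003', issues=20200601)

# Only values published at most 3 days after their time_value:
res = Epidata.covidcast('fb-survey', 'smoothed_cli', 'day', 'county',
                        20200501, '42003', lag=3)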
* New Database methods:
  - get_all_record_values_of_timeseries_with_potentially_stale_direction: retrieves all records from all timeseries with a potentially stale direction, with an option to store the result in a temporary table.
  - batched_update_direction: updates the direction field of the `covidcast` table, given a single new value and a list of row ids (see the sketch after this list).
  - drop_temporary_table
  - update_timestamp2_from_temporary_table: takes tmp_table_name as input and updates the timestamp2 field of `covidcast` rows whose id is present in tmp_table_name.
* New direction_updater.py functionality:
  - optimized_update_loop
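
As a rough illustration of the batched update (a hypothetical sketch; the cursor attribute, SQL, and default batch size are assumptions, not the code in this PR):

# Hypothetical sketch of batched_update_direction on the Database class:
# set `direction` to one new value for every row id in `id_list`, batched
# to keep each UPDATE statement to a bounded size.
# `self._cursor` and `batch_size` are assumptions, not code from this PR.
def batched_update_direction(self, new_direction, id_list, batch_size=1024):
  for start in range(0, len(id_list), batch_size):
    batch = list(id_list[start:start + batch_size])
    placeholders = ', '.join(['%s'] * len(batch))
    sql = f'UPDATE `covidcast` SET `direction` = %s WHERE `id` IN ({placeholders})'
    self._cursor.execute(sql, [new_direction] + batch)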
@krivard (Contributor, Author) commented Jul 9, 2020

@korlaxxalrok this is the PR we'll want to merge on Migration Day simultaneously with the data loading, since it will not insert a spurious issue for each direction recomputation.

eujing and others added 2 commits July 9, 2020 10:55
* Memory and speed optimizations for covidcast
* Script to produce issues from backups
@korlaxxalrok (Contributor) left a comment

@krivard I don't think I should be the final word here, but this looks good to me 👍

@eujing (Contributor) left a comment

Just a very minor comment about commented-out code, but I'm not sure it's worth the git hassle.

Comment on lines +252 to +256
# is_eq_nan = ts_pot_changed.direction.isnull() & ts_pot_changed.new_direction.isnull()
# is_eq_num = ts_pot_changed.direction == ts_pot_changed.new_direction
# changed_mask = ~(is_eq_nan | is_eq_num)
# ts_changed = ts_pot_changed[changed_mask]

Should we remove these if not needed anymore?

I agree. I considered updating only the rows where the value actually changed, but it turned out that this increases overall runtime, so I didn't end up using it.

timestamp2s = ts_rows.timestamp2.values.astype(np.int64)

# create a direction classifier for this signal
data_stdev = data_stdevs[source][signal][geo_type]

Suggested change:
- data_stdev = data_stdevs[source][signal][geo_type]
+ data_stdev = data_stdevs[source][signal][geo_type] if (source in data_stdevs) and (signal in data_stdevs[source]) and (geo_type in data_stdevs[source][signal]) else 0

To avoid crashing when some (source, signal, geo_type) tuple has no data on or before Constants.SIGNAL_STDEV_MAX_DAY.
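
An equivalent guard can be written more compactly by chaining dict.get with empty-dict defaults (an editorial sketch, not part of the suggested change):

# Editorial sketch: the same fallback-to-0 lookup via chained dict.get.
data_stdevs = {'src': {'sig': {'county': 1.5}}}  # toy nested mapping

def lookup_stdev(data_stdevs, source, signal, geo_type):
  return data_stdevs.get(source, {}).get(signal, {}).get(geo_type, 0)

assert lookup_stdev(data_stdevs, 'src', 'sig', 'county') == 1.5
assert lookup_stdev(data_stdevs, 'src', 'sig', 'state') == 0  # missing key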

@AlSaeed (Contributor) commented Jul 9, 2020

My only additional note is that get_data_stdev_across_locations in database.py is still computed regardless of issue dates. I am not sure whether this is the intended behavior, or whether it should also be reimplemented to compute the standard deviation over only the data with the latest issue per key.
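
For reference, a latest-issue restriction could look something like the sketch below; the table and column names follow the `covidcast` schema discussed in this PR, but the query itself is hypothetical and is not the code in database.py:

# Hypothetical sketch: compute the stdev over only the most recent issue
# of each (source, signal, geo_type, geo_value, time_value) key.
LATEST_ISSUE_STDEV_SQL = '''
  SELECT c.`source`, c.`signal`, c.`geo_type`, STDDEV(c.`value`) AS `stdev`
  FROM `covidcast` c
  JOIN (
    SELECT `source`, `signal`, `geo_type`, `geo_value`, `time_value`,
           MAX(`issue`) AS `max_issue`
    FROM `covidcast`
    GROUP BY `source`, `signal`, `geo_type`, `geo_value`, `time_value`
  ) latest
    ON c.`source` = latest.`source`
   AND c.`signal` = latest.`signal`
   AND c.`geo_type` = latest.`geo_type`
   AND c.`geo_value` = latest.`geo_value`
   AND c.`time_value` = latest.`time_value`
   AND c.`issue` = latest.`max_issue`
  GROUP BY c.`source`, c.`signal`, c.`geo_type`
'''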

krivard added 2 commits July 14, 2020 17:03
CoffeeScript 2 produces ES6, which uglify-js (part of the delphi-epidata deployment machinery) cannot ingest. As a workaround, we dropped back to ECMAScript 5 by passing the resulting JavaScript file through `babel delphi_epidata.js --presets=@babel/env` to transpile it for this commit.
@korlaxxalrok korlaxxalrok merged commit f55b7ec into cmu-delphi:main Jul 16, 2020
@krivard krivard deleted the feature/data-versioning-and-optimization branch June 5, 2023 17:58