[Refactor] Clarify a few `csv_importer` types and names #949

dshemetov · 2022-08-02T22:00:24Z

A few small refactors for clarity that I couldn't help myself in making while working on #947.

Most of the changes are purely for better readability. No functionality is altered, except for the interface of load_csv, but this should not affect actual usage.

Changes:

rename RowValues to CsvRowValue
make CsvRowValue into a dataclass
remove pandas optional argument from load_csv that was only used for test mocks
use unittest.patch to mock test instead
update other tests
update csv_to_database with a few relative imports

Further details:

I changed

# From
def load_csv(filepath, geo_type, pandas=pandas):

# To
def load_csv(filepath, geo_type):

where pandas was just an abstract argument to the Pandas library meant for mocking. Instead of complicating this function's interface, we can just use @patch("pandas.read_csv") in the tests, which automatically mocks any calls to that function with our desired mock object.

Prerequisites:

Unless it is a documentation hotfix it should be merged against the dev branch
Branch is up-to-date with the branch to be merged with, i.e. dev
Build is successful
Code is cleaned up and formatted

dshemetov · 2022-08-02T22:24:34Z

Problem

CI is failing. I can repro on my local machine by running the same command CI runs

docker run -i --rm --network delphi-net \
		--env "SQLALCHEMY_DATABASE_URI=mysql+mysqldb://user:pass@delphi_database_epidata:3306/epidata" \
		--env "FLASK_SECRET=abc" \
		delphi_web_python python -m pytest --import-mode importlib repos/delphi/delphi-epidata/tests/

wherein it complains about the relative imports I use

_ ERROR collecting repos/delphi/delphi-epidata/tests/acquisition/covidcast/test_csv_importer.py _
ImportError while importing test module '/usr/src/app/repos/delphi/delphi-epidata/tests/acquisition/covidcast/test_csv_importer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
repos/delphi/delphi-epidata/tests/acquisition/covidcast/test_csv_importer.py:16: in <module>
    from ....src.acquisition.covidcast.csv_importer import CsvImporter, CsvRowValue
E   ModuleNotFoundError: No module named 'repos.delphi.delphi-epidata.src'

But I didn't run into this issue locally when running the same command with a volume mount

docker run -i --rm --network delphi-net \
		--mount type=bind,source=$(CWD)repos/delphi/delphi-epidata,target=/usr/src/app/repos/delphi/delphi-epidata,readonly \
		--mount type=bind,source=$(CWD)repos/delphi/delphi-epidata/src,target=/usr/src/app/delphi/epidata,readonly \
		--env "SQLALCHEMY_DATABASE_URI=mysql+mysqldb://user:pass@delphi_database_epidata:3306/epidata" \
		--env "FLASK_SECRET=abc" \
		delphi_web_python python -m pytest --import-mode importlib repos/delphi/delphi-epidata/tests/

How does the volume mount remove this issue? 🤔 ~~It's not changing the paths at all, only updating the contents.~~

Answer

UPDATE: so it appears that we're adding a new source directory with the volume mount that is usually not present.

# CI Version
> docker run -i --rm -t delphi_web_python ls repos/delphi/delphi-epidata/
Jenkinsfile  deploy.json  integrations       package.json          scripts
LICENSE      dev          labels             pyproject.toml        tasks.py
README.md    devops       mypy.ini           requirements.dev.txt  testdata
build.js     docs         package-lock.json  requirements.txt      tests

# Mount Version
> docker run -i --rm --network delphi-net \
                --mount type=bind,source=/home/dskel/Documents/Code/Delphi/delphi-epidata-dev/driver/repos/delphi/delphi-epidata,target=/usr/src/app/repos/delphi/delphi-epidata,readonly \
                --mount type=bind,source=/home/dskel/Documents/Code/Delphi/delphi-epidata-dev/driver/repos/delphi/delphi-epidata/src,target=/usr/src/app/delphi/epidata,readonly \
                -t delphi_web_python ls repos/delphi/delphi-epidata/
Jenkinsfile  dev           mypy.ini              requirements.txt  tests
LICENSE      devops        package-lock.json     scripts
README.md    docs          package.json          src
build.js     integrations  pyproject.toml        tasks.py
deploy.json  labels        requirements.dev.txt  testdata

So in the CI version, the src directory is not there, which is what my relative imports rely on.

Future Work

Can we just add that directory to our Docker image instructions? Looking at the CI file, we can see that the setup.sh script here actually moves the src directories from all the repos over. We could fix this by changing the mv command to a cp command.

I'm going to leave this alone for now. There are many ways to "fix" our directory issues there and I'm not ready to do an exhaustive review of all our options for fixing that at this point (maybe relative imports, maybe directory restructuring, maybe renaming delphi-epidata to something without a hypen, etc.).

Resolution

So I'm just going to let this go, focus on the JIT meta computation priority, and revert all the relative directory imports.

* rename RowValues to CsvRowValue * make CsvRowValue into a dataclass * remove argument in load_csv only used for test mocks; correctly mock test * update tests

krivard

Gorgeous change, I'll accept the long line lengths as a necessary evil if I must

krivard · 2023-01-20T15:47:35Z

src/acquisition/covidcast/csv_importer.py

  MIN_YEAR = 2019
  MAX_YEAR = 2030

+  # The datatypes expected by pandas.read_csv. Int64 is like float in that it can handle both numbers and nans.


krivard · 2023-01-20T15:49:27Z

src/acquisition/covidcast/csv_importer.py

-      missing_value, missing_stderr, missing_sample_size
-    )
-    return (row_values, None)
+    return (CsvRowValue(geo_id, value, stderr, sample_size, missing_value, missing_stderr, missing_sample_size), None)


I find this harder to read but I will hold my nose and accept it as the way of the future

krivard · 2023-01-20T16:05:42Z

tests/acquisition/covidcast/test_csv_to_database.py


-from delphi.epidata.acquisition.covidcast.csv_to_database import get_argument_parser, main, \
-  collect_files, upload_archive, make_handlers
+from delphi.epidata.acquisition.covidcast.csv_to_database import get_argument_parser, main, collect_files, upload_archive, make_handlers


Ugh same, I don't like it but it's fine

dshemetov · 2023-01-20T22:47:44Z

I guess I have to admit it... I'm a long-line lover. There are dozens of us. Dozens!

dshemetov force-pushed the ds/refactor-csv-importer branch from b5384d7 to 7c111f5 Compare August 2, 2022 22:28

dshemetov mentioned this pull request Aug 2, 2022

[JIT]: just-in-time series computations #646

Draft

27 tasks

dshemetov changed the title ~~[Refactor] Refactor csv_importer~~ [Refactor] Clarify a few csv_importer types and names Aug 2, 2022

dshemetov mentioned this pull request Aug 3, 2022

Infra refactor: convert acquisition to an installable package #939

Open

dshemetov added refactor Substantial projects to make the code do the same thing, better. enhancement good first issue and removed enhancement labels Aug 23, 2022

This was referenced Aug 30, 2022

Get "secrets" from something other than the operations repo #965

Open

Refactor usage of mock in csv_to_database.py #966

Closed

dshemetov mentioned this pull request Dec 5, 2022

Add CovidcastRow testing util and a few other changes #1044

Merged

5 tasks

dshemetov requested a review from krivard January 19, 2023 21:57

Acquisition: update csv_importer

87220a0

* rename RowValues to CsvRowValue * make CsvRowValue into a dataclass * remove argument in load_csv only used for test mocks; correctly mock test * update tests

dshemetov force-pushed the ds/refactor-csv-importer branch from 9f60604 to 87220a0 Compare January 19, 2023 22:10

krivard approved these changes Jan 20, 2023

View reviewed changes

dshemetov merged commit 54722a4 into dev Jan 20, 2023

dshemetov deleted the ds/refactor-csv-importer branch January 20, 2023 22:48

dshemetov mentioned this pull request Jan 21, 2023

[Refactor] Further cleans of csv_importer #1072

Merged

4 tasks

krivard mentioned this pull request Feb 9, 2023

Release Delphi Epidata 0.4.6 #1081

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Refactor] Clarify a few `csv_importer` types and names #949

[Refactor] Clarify a few `csv_importer` types and names #949

Uh oh!

dshemetov commented Aug 2, 2022 •

edited

Loading

Uh oh!

dshemetov commented Aug 2, 2022 •

edited

Loading

Uh oh!

krivard left a comment

Uh oh!

krivard Jan 20, 2023

Uh oh!

krivard Jan 20, 2023

Uh oh!

krivard Jan 20, 2023

Uh oh!

dshemetov commented Jan 20, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Refactor] Clarify a few csv_importer types and names #949

[Refactor] Clarify a few csv_importer types and names #949

Uh oh!

Conversation

dshemetov commented Aug 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes:

Further details:

Uh oh!

dshemetov commented Aug 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Answer

Future Work

Resolution

Uh oh!

krivard left a comment

Choose a reason for hiding this comment

Uh oh!

krivard Jan 20, 2023

Choose a reason for hiding this comment

Uh oh!

krivard Jan 20, 2023

Choose a reason for hiding this comment

Uh oh!

krivard Jan 20, 2023

Choose a reason for hiding this comment

Uh oh!

dshemetov commented Jan 20, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Refactor] Clarify a few `csv_importer` types and names #949

[Refactor] Clarify a few `csv_importer` types and names #949

dshemetov commented Aug 2, 2022 •

edited

Loading

dshemetov commented Aug 2, 2022 •

edited

Loading