
dchigarev
Contributor

@dchigarev dchigarev commented Feb 4, 2022

Existing Sample Changes

Description

The notebook is meant to show the performance improvement from switching from stock Pandas to Modin by comparing their execution times in various scenarios. The previous version of the notebook did not mention that Modin could ever be slower than Pandas, which confused users when that happened.

The changes in this PR introduce a time verifier that compares Modin and Pandas execution times and, if Modin appears to be slower, prints a note listing possible reasons (poor CPU performance, an old Modin version, misconfigured environment variables).
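
For illustration only, a minimal sketch of what such a verifier could look like. This is not the code added to the notebook: the function name `print_time_comparison` and the exact wording of the messages are assumptions; only the three possible reasons come from the description above.

```python
def print_time_comparison(pandas_time: float, modin_time: float) -> None:
    """Print both timings and a note if Modin was slower (hypothetical helper)."""
    print(f"Pandas time (seconds): {pandas_time:.3f}")
    print(f"Modin time (seconds): {modin_time:.3f}")

    if modin_time > pandas_time:
        # Reasons taken from the PR description; the wording is illustrative.
        print("Note: Modin was slower than Pandas in this run. Possible reasons:")
        for reason in (
            "poor CPU performance",
            "an old Modin version",
            "misconfigured environment variables",
        ):
            print(f"  - {reason}")
    elif modin_time > 0:
        # Guard against division by zero for extremely short measurements.
        print(f"Modin was {pandas_time / modin_time:.2f}x faster than Pandas.")


if __name__ == "__main__":
    print_time_comparison(pandas_time=1.0, modin_time=2.0)
```

The sketch returns nothing and only prints, so it can be called directly after each timed cell.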

Fixes Jira MODIN6-135.

External Dependencies

The changes do not add any new dependencies.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Implement fixes for Jiras

How Has This Been Tested?

  • Command Line

As there are no automated test cases for Jupyter notebooks in the repo, the changes have been tested manually:

  1. Create an environment with Modin installed if you don't have one:
     conda create -n test_env "python>=3.8.1" -c conda-forge
     conda activate test_env
     pip install "modin[all]"
  2. Install requirements for the notebook:
     cd path/to/getting/started/notebook
     pip install -r requirements.txt
  3. Run the notebook:
     ipython IntelModin_GettingStarted.ipynb
  4. Verify that the last line of the output is "[CODE_SAMPLE_COMPLETED_SUCCESFULLY]"
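
The final check in the steps above could be scripted rather than done by eye. The following is a rough sketch, not part of the PR: the helper names `ends_with_marker` and `run_and_check` are hypothetical, and it assumes the notebook is launched with the same `ipython` command shown above.

```python
import subprocess
from typing import Sequence

# Marker string copied verbatim from the manual test steps.
MARKER = "[CODE_SAMPLE_COMPLETED_SUCCESFULLY]"


def ends_with_marker(output: str, marker: str = MARKER) -> bool:
    """Return True if the last non-empty line of `output` equals `marker`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == marker


def run_and_check(command: Sequence[str]) -> bool:
    """Run `command` and check both its exit code and its final output line."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.returncode == 0 and ends_with_marker(result.stdout)


if __name__ == "__main__":
    ok = run_and_check(["ipython", "IntelModin_GettingStarted.ipynb"])
    print("notebook completed" if ok else "notebook did not complete")
```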

Note: tests in CI are currently failing due to environment problems; a Jira ticket (REIE-1371) is supposed to resolve this. I suppose the PR is blocked until the CI problems are resolved.

Would also like to get a review from Modin folks: @Garra1980 @YarShev @vnlitvinov

@dchigarev dchigarev changed the title from "[MODIN]: Add time results verifier to getting started notebook" to "[MODIN]: Add time results verifier for getting started notebook" Feb 4, 2022
Contributor

@praveenkk123 praveenkk123 left a comment


Approve for CI

@dchigarev
Contributor Author

@praveenkk123, the sample's tests in CI are failing at the Modin import step. The failure seems unrelated to the changes in this PR, as it doesn't change any of the notebook's dependencies.

Moreover, I found that the previous PR to the Modin samples also had red CI. Is the CI for Modin samples broken? Is it a known problem?

" \"Current execution is:\"\n",
" )\n",
" try:\n",
" import modin.config as cfg\n",
Contributor

We can use get_current_execution to simplify the code.
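
A possible shape for that simplification, as a sketch only. It assumes `get_current_execution` is importable from `modin.utils` in the installed Modin version; the wrapper name `describe_execution` and the fallback message are hypothetical.

```python
def describe_execution() -> str:
    """Return Modin's current execution name, or a fallback if unavailable."""
    try:
        # Assumption: this helper lives in modin.utils in the installed version.
        from modin.utils import get_current_execution

        return get_current_execution()
    except ImportError:
        # Modin is missing, too old to provide the helper, or has no engine installed.
        return "unknown (could not query Modin)"


if __name__ == "__main__":
    print("Current execution is:", describe_execution())
```

This would replace reading the individual `modin.config` values by hand with a single call.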

@@ -612,6 +656,7 @@
"modin_time = time.time() - t1\n",
"\n",
"print(\"Pandas Time(seconds):\",pandas_time,\"\\nModin Time(seconds):\",modin_time)\n",
"verify_and_print_times(pandas_time, modin_time)\n",
Contributor

I don't see a return result from this function for read_csv.

Contributor Author

Where do you expect to see the return result? verify_and_print_times returns nothing (None); its only effect is to print the time results on the screen.

Contributor

I don't see any print from this function in the notebook (when opening ... -> View file).

Contributor Author

Oh, that's because I haven't saved the new output of the notebook's cells. Rerunning the notebook and saving the new output would probably change the execution times that were printed before, as I would probably be running the notebook on a different version of Modin and on a different machine. @praveenkk123, can I just rerun the notebook on an arbitrary machine and update the execution times? Would this cause any legal problems?

@dchigarev
Contributor Author

The PR is currently blocked by Jira REIE-1371, which is supposed to resolve the failing CI tests.

@dchigarev dchigarev marked this pull request as draft February 16, 2022 19:32
@dchigarev
Contributor Author

Converting PR to draft until #863 is merged into master
