-
Notifications
You must be signed in to change notification settings - Fork 727
[MODIN]: Add time results verifier for getting started notebook #843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Dmitry Chigarev <[email protected]>
Signed-off-by: Dmitry Chigarev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve for CI
@praveenkk123, The sample's tests in CI are failing at the importing Modin step. It seems that the failure doesn't correlate with the changes in this PR as it doesn't change any notebook's dependencies. Moreover, I found that the previous PR to the Modin's samples also had red CI. Do the CI for Modin samples is broken? Is it a known problem? |
" \"Current execution is:\"\n", | ||
" )\n", | ||
" try:\n", | ||
" import modin.config as cfg\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use get_current_execution
to simplify the code.
@@ -612,6 +656,7 @@ | |||
"modin_time = time.time() - t1\n", | |||
"\n", | |||
"print(\"Pandas Time(seconds):\",pandas_time,\"\\nModin Time(seconds):\",modin_time)\n", | |||
"verify_and_print_times(pandas_time, modin_time)\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a return result from this function for read_csv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you expect to see the return result? verify_and_print_times
returns void, the function's only effect is that it prints time results on the screen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any print from this function in the notebook (when opening ...
-> View file
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's because I haven't saved the new output of the notebook's cells. Rerunning the notebook with saving the new output would probably change execution times that were printed before, as I would probably be running the notebook on a different version of Modin and on a different machine. @praveenkk123, can I just rerun the notebook on an arbitrary machine and update the execution times? Wouldn't this cause any legal problems?
PR is currently blocked by Jira REIE-1371, which is supposed to resolve failing CI testing |
Converting PR to draft until #863 is merged into master |
Existing Sample Changes
Description
The notebook is supposed to show the performance improvement after switching from stock Pandas to Modin, it compares the execution times of Pandas and Modin in various scenarios. The previous version of the notebook didn't imply that the Modin could be slower than Pandas anyhow and so confused users if such happened.
The changes in this PR introduce a time verifier, that compares Modin and Pandas execution times and prints a note if Modin appears to be slower with the possible reasons for it (poor CPU performance, old Modin version, silly environment variables).
Fixes Jira MODIN6-135.
External Dependencies
The changes do not add any new dependencies.
Type of change
How Has This Been Tested?
As there are no automatic test cases for jupyter notebooks in the repo, the changes have been tested manually:
Note: currently, tests in CI are failing due to environment problems, there's a Jira ticket that's supposed to resolve this (REIE-1371). I suppose that the PR is blocked until the CI problems will be resolved.
Would also like to get a review from Modin folks: @Garra1980 @YarShev @vnlitvinov