-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
PERF: improved performance of CategoricalIndex.is_monotonic* #21025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b7f6e04 to
a775186
Compare
|
this hit the same code path; so check this |
a775186 to
3378b3a
Compare
|
Not sure I follow, but these two versions do not follow the same code path, as the old version required creating a new
|
Codecov Report
@@ Coverage Diff @@
## master #21025 +/- ##
=======================================
Coverage 91.83% 91.83%
=======================================
Files 153 153
Lines 49495 49495
=======================================
Hits 45454 45454
Misses 4041 4041
Continue to review full report at Codecov.
|
|
can you add additional tests using strings (and not just integers) in that same test. otherwise lgtm. |
|
do we have sufficient asv's for this? |
1ee1d93 to
6bdbb5d
Compare
|
There were no asv's for this. However, if you run my code snippet above, there is a huge spike in RAM usage, when run in the old version. I've even gotten a few So my ASV is done using only Also, Series.is_monotonic* wasn't added untill 0.19. should that be put in a try/except clause, to avoid failing on older versions of pandas? |
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment on the asv. its ok if it fails under 0.19, that's pretty far back now
asv_bench/benchmarks/categoricals.py
Outdated
| self.c = pd.CategoricalIndex(list('a'*N + 'b'*N + 'c'*N)) | ||
| self.s = pd.Series(self.c) | ||
|
|
||
| def time_categorical_index_is_monotonic(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these shouldn't be in the same asv, you can do this with params I think
6bdbb5d to
6d4aea9
Compare
|
Hello @topper-123! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 16, 2018 at 19:04 Hours UTC |
6d4aea9 to
2e34678
Compare
doc/source/whatsnew/v0.23.0.txt
Outdated
| - Improved performance of :func:`pandas.core.groupby.GroupBy.pct_change` (:issue:`19165`) | ||
| - Improved performance of :func:`Series.isin` in the case of categorical dtypes (:issue:`20003`) | ||
| - Improved performance of ``getattr(Series, attr)`` when the Series has certain index types. This manifiested in slow printing of large Series with a ``DatetimeIndex`` (:issue:`19764`) | ||
| - Improved performance of :meth:`CategoricalIndex.is_monotonic_increasing`, :meth:`CategoricalIndex.is_monotonic_decreasing` and :meth:`CategoricalIndex.is_monotonic` (:issue:`21025`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will need to be in 0.23.1 (not yet in repo, soon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to 0.23.1.
2e34678 to
c815d62
Compare
|
thanks @topper-123 |
) (cherry picked from commit 1ee5ecf)
(cherry picked from commit 1ee5ecf)
git diff upstream/master -u -- "*.py" | flake8 --diffThere seem to be a few more like this, where
CategoricalIndexshould useself._enginebut doesn't.@TomAugspurger?