PERF: faster groupby diff #45575
Changes from 3 commits
6edfb83
97d65ea
3badfad
1c730a0
760e13e
84a4f8b
3d5ca7a
c4459b4
c6668fc
90a7b20
8e990c0
@@ -417,6 +417,7 @@ class GroupByMethods:
             "cumprod",
             "cumsum",
             "describe",
+            "diff",
             "ffill",
             "first",
             "head",

@@ -3484,6 +3484,47 @@ def shift(self, periods=1, freq=None, axis=0, fill_value=None):
         )
         return res
 
+    @final
+    @Substitution(name="groupby")
+    @Appender(_common_see_also)
+    def diff(self, periods=1, axis=0):
+        """
+        First discrete difference of element.
+
+        Calculates the difference of each element compared with another
+        element in the group (default is element in previous row).
+
+        Parameters
+        ----------
+        periods : int, default 1
+            Periods to shift for calculating difference, accepts negative values.
+        axis : axis to shift, default 0
+            Take difference over rows (0) or columns (1).
+
+        Returns
+        -------
+        Series or DataFrame
+            First differences.
+        """
+        if axis != 0:
+            return self.apply(lambda x: x.diff(periods=periods, axis=axis))
+
+        obj = self._obj_with_exclusions
+        shifted = self.shift(periods=periods, axis=axis)
+
+        # GH45562 - to retain existing behavior and match behavior of Series.diff(),
+        # int8 and int16 are coerced to float32 rather than float64.
+        dtypes_to_f32 = ["int8", "int16"]
+        if obj.ndim == 1:
+            if obj.dtype in dtypes_to_f32:

Contributor
This will only work in the Series case. Why is this necessary? I think upcasting is fine here (likely what we do now).

+                shifted = shifted.astype("float32")
+        else:
+            mask = obj.dtypes.astype(str).isin(dtypes_to_f32).values
+            if mask.any():
+                shifted.loc[:, mask] = shifted.loc[:, mask].astype("float32")
+
+        return obj - shifted
+
     @final
     @Substitution(name="groupby")
     @Appender(_common_see_also)

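To make the upcasting question concrete, here is a small sketch of the dtype effect being discussed. It is not code from this PR: the data and the variable names (`values`, `keys`, `plain`, `coerced`) are made up for illustration, and the dtypes it prints may depend on the installed pandas version. It compares subtracting the group-wise shift as-is against subtracting it after the float32 cast shown in the diff above.

```python
import numpy as np
import pandas as pd

# Illustrative int8 column with two groups (assumed data, not from the PR).
values = pd.Series([1, 3, 6, 10, 15, 21], dtype="int8", name="val")
keys = pd.Series(["a", "a", "a", "b", "b", "b"], name="key")

# Group-wise shift; missing leading rows become NaN, so the result is floating point.
shifted = values.groupby(keys).shift(1)

# Plain subtraction: the result dtype follows whatever dtype shift() produced.
plain = values - shifted

# Coerced subtraction, mirroring the float32 cast in the diff above.
coerced = values - shifted.astype("float32")

# Print the dtypes rather than assuming them, since they can vary by version.
print("Series.diff dtype:   ", values.diff().dtype)
print("shifted dtype:       ", shifted.dtype)
print("plain result dtype:  ", plain.dtype)
print("coerced result dtype:", coerced.dtype)

# NumPy promotion explains the coerced case: int8 and int16 both fit in float32.
assert np.result_type(np.int8, np.float32) == np.float32
assert np.result_type(np.int16, np.float32) == np.float32

# The values agree either way; only the dtype can differ.
assert np.allclose(
    plain.to_numpy(dtype="float64"),
    coerced.to_numpy(dtype="float64"),
    equal_nan=True,
)
```

If the plain and coerced dtypes differ on your version, that gap is what the cast in the diff is meant to close; if plain upcasting is acceptable, the cast can be dropped without changing the values.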
Can you add type annotations for the `diff` args and return value?
Typing added.
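
For readers following the thread, the shift-based approach with typed arguments can be sketched outside of pandas internals roughly as follows. `group_diff` is a hypothetical standalone helper, not the method added in this PR, and its annotations are only one plausible way to type it. The sketch checks that a single group-wise shift plus one subtraction gives the same result as calling `Series.diff` once per group.

```python
from __future__ import annotations

import pandas as pd


def group_diff(values: pd.Series, keys: pd.Series, periods: int = 1) -> pd.Series:
    """Hypothetical helper: group-wise difference via one group-wise shift."""
    # One vectorized shift within groups, then one vectorized subtraction.
    shifted = values.groupby(keys).shift(periods)
    return values - shifted


# Illustrative data with interleaved groups (assumed, not from the PR).
values = pd.Series([1.0, 4.0, 9.0, 2.0, 5.0], name="val")
keys = pd.Series(["a", "b", "a", "b", "a"], name="key")

fast = group_diff(values, keys)

# Reference path: run Series.diff separately on each group.
slow = values.groupby(keys).transform(lambda s: s.diff())

# Both paths should agree element for element, NaNs included.
pd.testing.assert_series_equal(fast, slow, check_names=False)
print(fast.tolist())
```

The per-group path calls a Python function for every group, while the shift-based path stays vectorized, which is the motivation stated in the PR title.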