1 change: 1 addition & 0 deletions asv_bench/benchmarks/groupby.py
@@ -417,6 +417,7 @@ class GroupByMethods:
"cumprod",
"cumsum",
"describe",
"diff",
"ffill",
"first",
"head",
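This hunk only adds `"diff"` to the methods that `GroupByMethods` times. As a rough, hypothetical sketch of what such a parametrized timing looks like under asv's conventions (a `params` list, a `setup` method, and `time_*` methods that receive each parameter value), a trimmed-down standalone class might look like this. The class name, column names, and data sizes are assumptions, not the real benchmark.

```python
import numpy as np
import pandas as pd


class GroupByDiffSketch:
    """Hypothetical, trimmed-down benchmark; not the real GroupByMethods class."""

    # asv calls setup() and each time_* method once per entry in params.
    params = ["diff", "shift", "cumsum"]
    param_names = ["method"]

    def setup(self, method):
        # Assumed sizes: 100k rows spread over at most 1k groups.
        rng = np.random.default_rng(0)
        n = 100_000
        self.df = pd.DataFrame(
            {
                "key": rng.integers(0, 1_000, size=n),
                "values": rng.integers(0, 100, size=n),
            }
        )

    def time_method(self, method):
        # e.g. self.df.groupby("key")["values"].diff()
        return getattr(self.df.groupby("key")["values"], method)()
```

asv ignores the return value of `time_*` methods; returning the result here only makes the sketch easy to check by hand, e.g. `b = GroupByDiffSketch(); b.setup("diff"); b.time_method("diff")`.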
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -160,6 +160,7 @@ Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`)
- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`)
- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`)
-

.. ---------------------------------------------------------------------------
41 changes: 41 additions & 0 deletions pandas/core/groupby/groupby.py
@@ -3484,6 +3484,47 @@ def shift(self, periods=1, freq=None, axis=0, fill_value=None):
        )
        return res

    @final
    @Substitution(name="groupby")
    @Appender(_common_see_also)
    def diff(self, periods=1, axis=0):
Contributor

can you type these args & return?

Member Author

Typing added
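The method body computes the difference as `obj - self.shift(periods=periods)`: shifting within each group means the first row of every group (or the last rows, for negative `periods`) has nothing to subtract and becomes NaN. A minimal standalone sketch of that idea, with the kind of argument and return annotations the reviewer asks for, might look like this. `groupwise_diff` is a hypothetical helper, not pandas API, and it skips the `axis` fallback and the small-integer dtype handling discussed further down.

```python
import pandas as pd


def groupwise_diff(obj: pd.Series, by: pd.Series, periods: int = 1) -> pd.Series:
    """Hypothetical helper: per-group difference as obj minus its group-wise shift."""
    # shift() only moves values within each group, so rows with no earlier
    # (or later, for negative periods) row in their group get NaN.
    shifted = obj.groupby(by).shift(periods)
    return obj - shifted


values = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])
keys = pd.Series(["a", "a", "b", "b", "b"])

print(groupwise_diff(values, keys).tolist())  # [nan, 3.0, nan, 7.0, 9.0]
print(groupwise_diff(values, keys, periods=-1).tolist())  # [-3.0, nan, -7.0, -9.0, nan]
```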

        """
        First discrete difference of element.

        Calculates the difference of each element compared with another
        element in the group (default is element in previous row).

        Parameters
        ----------
        periods : int, default 1
            Periods to shift for calculating difference, accepts negative values.
        axis : axis to shift, default 0
            Take difference over rows (0) or columns (1).

        Returns
        -------
        Series or DataFrame
            First differences.
        """
        if axis != 0:
            return self.apply(lambda x: x.diff(periods=periods, axis=axis))

        obj = self._obj_with_exclusions
        shifted = self.shift(periods=periods, axis=axis)

        # GH45562 - to retain existing behavior and match behavior of Series.diff(),
        # int8 and int16 are coerced to float32 rather than float64.
        dtypes_to_f32 = ["int8", "int16"]
        if obj.ndim == 1:
            if obj.dtype in dtypes_to_f32:
Contributor

this will only work in the series case. why is this necessary? i think upcasting is fine here (likely what we do now)

                shifted = shifted.astype("float32")
        else:
            mask = obj.dtypes.astype(str).isin(dtypes_to_f32).values
Contributor

umm what is this for?

Member Author

the existing groupby diff behavior upcasts int8 and int16 to float32, not float64. However, groupby shift upcasts to float64. See #45562 for more discussion.
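To see the mismatch described here, one can compare result dtypes for a small `int8` Series. `Series.diff` casts `int8`/`int16` to `float32` (the reviewer later attributes this to a Cython issue), while the group-wise `shift` promotes to `float64` once it inserts NaN. The sample data is arbitrary, and the last check assumes a pandas version that includes this coercion.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 3, 6, 10], dtype="int8"))
keys = pd.Series(["x", "x", "y", "y"])

# Plain Series.diff: int8 becomes float32 so it can hold the leading NaN.
series_diff_dtype = s.diff().dtype
# Group-wise shift: the NaN fill promotes int8 all the way to float64.
shift_dtype = s.groupby(keys).shift(1).dtype
# Group-wise diff: the shifted values are coerced back to float32 (GH45562).
groupby_diff_dtype = s.groupby(keys).diff().dtype

print(series_diff_dtype, shift_dtype, groupby_diff_dtype)
```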

Contributor

it's fine to break it here (and change the tests); upcasting to float32 is weird

Member Author

Ok, will do. That means groupby.diff will upcast differently from Series.diff, but that can be discussed in #45562.

Contributor

no, you would fix that too

Contributor

oh i see, we did that because of some cython issue. ok, then let's preserve it (you will need an explicit test for this), and i don't think what you are doing is actually correct for frames

Member Author

thanks, fixed the type coercion and added an explicit test.
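A likely problem with the frame branch is that assigning through `shifted.loc[:, mask] = ...` writes values into the existing columns, which may keep their `float64` dtype instead of switching to `float32`. One frame-safe alternative, sketched as a hypothetical helper (`coerce_like_series_diff` is not pandas API), is to build a column-to-dtype mapping and pass it to `DataFrame.astype`, which returns a new frame with only those columns converted.

```python
import numpy as np
import pandas as pd

SMALL_INTS = {"int8", "int16"}


def coerce_like_series_diff(obj: pd.DataFrame, shifted: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast shifted columns to float32 where obj is int8/int16."""
    # Pick the columns whose *original* dtype is a small integer type.
    to_coerce = [col for col, dtype in obj.dtypes.items() if str(dtype) in SMALL_INTS]
    if not to_coerce:
        return shifted
    # astype with a mapping returns a new frame; only the listed columns change.
    return shifted.astype({col: "float32" for col in to_coerce})


df = pd.DataFrame(
    {
        "key": ["x", "x", "y", "y"],
        "a": np.array([1, 3, 5, 9], dtype="int8"),
        "b": [10, 20, 30, 50],
    }
)
obj = df.drop(columns="key")
shifted = df.groupby("key")[["a", "b"]].shift(1)
result = obj - coerce_like_series_diff(obj, shifted)
print(result.dtypes.to_dict())
```

Because `astype` builds a new object rather than setting values into `shifted`, the requested `float32` dtype is not subject to in-place setting rules.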

            if mask.any():
                shifted.loc[:, mask] = shifted.loc[:, mask].astype("float32")

        return obj - shifted

    @final
    @Substitution(name="groupby")
    @Appender(_common_see_also)