Skip to content

Conversation

@jbrockmendel
Copy link
Member

I think this is much closer to "the correct" abstraction here (i find the layered abstractions with similar names in core.groupby hard to follow). Contains \approx 0 groupby logic (takes ngroups as an argument in some methods) and is only about how to wrap/call/unwrap the libgroupby functions.

There are a few more parts of _cython_operation I'd like to move, but I'm holding off on that pending some non-refactor work that touches those pieces.

No specific plans, but I'm hopeful that something like this can be re-used to de-duplicate logic in nanops and maybe algorithms.

This uses classmethods/staticmethods (i like the explicit statelessness) and passes how+kind as arguments, but an alternative would be to instantiate with how+kind passed to __init__.

The only potentially-logical change is that when defining out_dtype, this returns "float64" instead of "float". I haven't checked if that makes a difference on 32bit builds, but after weeks of dealing with np.intp much prefer the explicitness.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this

out_dtype = "object"

codes, _, _ = self.group_info
out_shape = WrappedCythonOp.get_output_shape(how, kind, ngroups, values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would go one step further and use an instance of WrappedCythonOp here, then you can pass whatever you need into the construct and just ask the instance for properties (if you want to make this simple), you can use a dataclass for example.

I find this pattern better than using a class like this (meaning the class is just a collection of methods here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated + green

@jreback jreback added Groupby Refactor Internal refactoring of code labels Apr 1, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, some suggestions for possible followups

Dispatch logic for functions defined in _libs.groupby
"""

def __init__(self, kind: str, how: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob should be after the class variables / methods, but nbd

out_dtype = "object"

codes, _, _ = self.group_info
out_shape = cy_op.get_output_shape(ngroups, values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future, would just have a single method here where you pass ngroup, values, may be able to determine is_numeric inside the WrappedCythonOps (instead of here), and just return the items you need, though you can take it a step further and have a WrappedAggregateOp and WrappedTransformOp, something like

op = WrappedCythonOps(kind, how, values.....) # other args
# op is WrappedTransform / WrappedAggregate
result = op.get_result()

@jreback jreback added this to the 1.3 milestone Apr 2, 2021
@jreback jreback merged commit e69df38 into pandas-dev:master Apr 2, 2021
@jbrockmendel jbrockmendel deleted the ref-gb-cyfunc branch April 2, 2021 02:22
@jorisvandenbossche
Copy link
Member

@jbrockmendel can you hold off a bit with such refactoring changes to groupby/ops.py until #40672 (comment) is resolved?

@jbrockmendel
Copy link
Member Author

can you hold off a bit with such refactoring

Sure. The two things I have queued up in this area are 1) a refactor aimed specifically at ameliorating the linked concern and 2) a bugfix where we currently cast int64s to float64s in a lossy way. When I post them ill put them in "draft" mode and cc you on them.

vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Groupby Refactor Internal refactoring of code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants