PERF: avoid creating numpy array in groupby.first|last #34178

topper-123 · 2020-05-14T18:36:35Z

An unneeded numpy array is created for each group when calling groupby.first and groupby.last on ExtensionArrays. This PR avoids that.

>>> cat = pd.Categorical(["a"] * 1_000_000 + ["b"] * 1_000_000)
>>> ser = pd.Series(cat)
>>> %timeit ser.groupby(cat).first()
210 ms ± 3.03 ms per loop  # master
78.4 ms ± 766 µs per loop  # this PR

The same speedup is achieved for groupby.last. The above is about 2.7x faster than on master (210 ms vs. 78.4 ms) because there are two groups, so we save creating two arrays. If there were more groups or larger arrays, we'd get even bigger improvements.
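For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch of the same benchmark using the standard-library `timeit` module. The helper names `build_series` and `best_time` are hypothetical (they are not part of pandas or this PR), the array size is reduced to keep the run short, and `observed=True` is passed only to silence warnings that newer pandas versions emit for categorical groupers. Absolute timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def build_series(rows_per_group: int = 100_000):
    """Build a two-group categorical and a Series wrapping it (illustrative helper)."""
    cat = pd.Categorical(["a"] * rows_per_group + ["b"] * rows_per_group)
    return cat, pd.Series(cat)


def best_time(func, repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


cat, ser = build_series()

# Compute the results once so they can be inspected.
first = ser.groupby(cat, observed=True).first()
last = ser.groupby(cat, observed=True).last()

# Time both aggregations; groupby.last goes through the same code path.
t_first = best_time(lambda: ser.groupby(cat, observed=True).first())
t_last = best_time(lambda: ser.groupby(cat, observed=True).last())

print(f"groupby.first: {t_first * 1000:.1f} ms")
print(f"groupby.last:  {t_last * 1000:.1f} ms")
```

Running this script on a checkout of master and on this branch should show the difference; raising `rows_per_group` makes the per-group array creation that this PR removes more visible.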

Also adds some type hints to help clarify which parameters these functions accept.

jreback · 2020-05-15T12:54:47Z

thanks @topper-123 very nice

PERF: avoid creating numpy array in groupby.first|last

0389edc

topper-123 force-pushed the groupby_first_last branch from a457506 to 0389edc on May 14, 2020 18:37

jreback added the Groupby and Performance (Memory or execution speed performance) labels May 15, 2020

jreback added this to the 1.1 milestone May 15, 2020

jreback merged commit 1f1735e into pandas-dev:master May 15, 2020

topper-123 mentioned this pull request May 15, 2020

CLN/TYP: Groupby agg methods #34200

Merged

topper-123 deleted the groupby_first_last branch May 24, 2020 16:56
