Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

Master:

dtype = np.dtype(float)
dtype2 = pd.Int32Dtype()

In [2]: %timeit pd.api.types.is_extension_array_dtype(dtype)
645 ns ± 59.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit pd.api.types.is_extension_array_dtype(dtype2)
251 ns ± 9.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

with this branch, it keeps the speed for ExtensionDtype the same, and improves for numpy.dtype (avoiding a extension dtype registry lookup):

In [2]: %timeit pd.api.types.is_extension_array_dtype(dtype)
328 ns ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]:  %timeit pd.api.types.is_extension_array_dtype(dtype2)
240 ns ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@jorisvandenbossche jorisvandenbossche added Performance Memory or execution speed performance Dtype Conversions Unexpected or buggy dtype conversions labels Feb 8, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Feb 8, 2021
Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jreback
Copy link
Contributor

jreback commented Feb 8, 2021

sure (assume you found this from existing asvs), if not can you add the one from above.

@jreback
Copy link
Contributor

jreback commented Feb 8, 2021

also a whatsnew mention is prob worth it.

@jorisvandenbossche
Copy link
Member Author

assume you found this from existing asvs, if not can you add the one from above.

Yes, it came from profiling the concat benchmarks (#39612), gives a small improvement. Now, apparently we don't have specific benchmarks for our dtype checks, so added one for is_extension_array_dtype (that could be supplemented with benchmarks for some of the other is_ checkers, but going to leave that for another issue/PR).

also a whatsnew mention is prob worth it.

Not sure it is worth it, this only gives a tiny improvement anywhere it is used (it was just lowing hanging fruit to shave off a percent, and with ArrayManager it gets used more, so getting more worth fixing it)

@jreback jreback merged commit be68850 into pandas-dev:master Feb 10, 2021
@jreback
Copy link
Contributor

jreback commented Feb 10, 2021

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the perf-is_extension_array_dtype branch February 10, 2021 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Dtype Conversions Unexpected or buggy dtype conversions Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants