Ref #62449
The performance of array reductions in `nanops`/`bottleneck` can be significantly improved for large data by using Numba. The improvements are due to two factors:

Although the added code is fairly complex, it provides a central, unified piece of code, built from scratch, covering the different reductions across data types, array classes, the `skipna` toggle, masked arrays, etc., potentially replacing code that currently exists across multiple modules and, in the case of `bottleneck`, code that lives in a different repository.

It currently covers `nan(sum|mean|min|max|var|std|sem)` and should be easily extensible. I am seeking code review before bottoming it out completely, so as not to waste effort.

This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.
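To give a flavour of the kind of kernel involved, here is a minimal sketch of a Numba `nansum` for 1-D float arrays. This is illustrative only, not the PR's actual implementation; the function name and the import fallback are my own, and the fallback simply lets the sketch run without Numba installed:

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without numba: a no-op decorator
    # and a serial range. Real speedups require numba, of course.
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        def wrap(func):
            return func
        return wrap
    prange = range


@njit(parallel=True, cache=True)
def nansum_1d(values):
    # Parallel reduction over the array, skipping NaNs (skipna=True semantics).
    total = 0.0
    for i in prange(values.shape[0]):
        v = values[i]
        if not np.isnan(v):
            total += v
    return total
```

A real implementation would dispatch on dtype and handle 2-D blocks, masked arrays, and the `skipna=False` path as well.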
I am running the code on a feature branch, and all unit tests for the branch are passing locally:
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation

The hardware is a new MacBook Pro with 8 cores. The performance is still slightly better at 1 million rows, and the improvement is even greater at larger scales (8x at 100 million rows). The caveat is that all JIT compilation has already been completed before timing. I have carried out a more comprehensive performance comparison and these results hold up.
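For reference, a minimal timing harness in the spirit of these comparisons might look like the following. The frame is scaled down from the 10-million-row example for a quick check, and the warm-up call is what excludes JIT compilation from the measurement, per the caveat above:

```python
import numpy as np
import pandas as pd
from timeit import timeit

# Scaled-down stand-in for the 10-million-row frame in the screenshot.
df = pd.DataFrame(np.random.randn(100_000, 5))
df.iloc[::100, 0] = np.nan  # sprinkle in NaNs so skipna actually matters

# First call triggers any JIT compilation; it is excluded from the timing.
df.sum(skipna=True)

elapsed = timeit(lambda: df.sum(skipna=True), number=20)
print(f"20 runs of df.sum(skipna=True): {elapsed:.4f}s")
```

Comparing this against a run with the Numba codepaths toggled off gives the speedup figures quoted above.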
Similarly to `bottleneck`, these codepaths can be toggled on and off.
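As a sketch of what that toggle looks like: the `bottleneck` switch below is pandas' real, existing option; whether the Numba codepaths here reuse the existing `compute.use_numba` option or introduce a new one is my assumption, not something the PR states:

```python
import pandas as pd

# Real, existing option controlling the bottleneck codepaths in nanops:
pd.set_option("compute.use_bottleneck", False)
assert pd.get_option("compute.use_bottleneck") is False
pd.set_option("compute.use_bottleneck", True)

# Hypothetical: pandas already ships a "compute.use_numba" option (used by
# the groupby/rolling numba engines today), which this feature could
# plausibly reuse as its toggle:
# pd.set_option("compute.use_numba", True)
```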