
Conversation


@eoincondron eoincondron commented Sep 26, 2025

Ref #62449

The performance of array reductions in nanops/bottleneck can be significantly improved upon for large data using numba. The improvements are due to two factors:

  • single-pass algorithms that handle null values inline, avoiding any copies of the data.
  • multi-threading over chunks of the array, or over an axis in a single-axis reduction.
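To illustrate the two factors above, here is a minimal sketch (not the PR's actual code) of a single-pass, chunked `nansum`: nulls are skipped inline with no mask or copy, and `prange` threads over chunks when compiled with `parallel=True`. The chunk count and function name are illustrative assumptions.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:                      # pure-Python fallback so the sketch still runs
    def njit(**kwargs):
        return lambda func: func
    prange = range

@njit(parallel=True)
def nansum_single_pass(arr):
    n_chunks = 8                         # illustrative: e.g. one chunk per core
    chunk_size = (arr.size + n_chunks - 1) // n_chunks
    partials = np.zeros(n_chunks)
    for c in prange(n_chunks):           # each chunk reduced on its own thread
        start = c * chunk_size
        stop = min(start + chunk_size, arr.size)
        total = 0.0
        for i in range(start, stop):
            x = arr[i]
            if x == x:                   # single-pass null check; NaN != NaN
                total += x
        partials[c] = total
    return partials.sum()                # combine the per-chunk partials
```

The per-chunk partial results are combined at the end, which is why reductions like `var`/`std`/`sem` need a dedicated combining step rather than a simple sum.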

Although the added code is fairly complex, it provides a central, unified piece of code, built from scratch, covering the different reductions across data types, array classes, the skipna toggle, masked arrays, etc. It could potentially replace code currently spread across multiple modules and, in the case of bottleneck, code that lives in a different repository.
It currently covers nan(sum|mean|min|max|var|std|sem) and should be easily extensible. I am seeking code review before completing it fully, so as not to waste effort.

This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.

[screenshot: benchmark results]

I am running the code on a feature branch, and all unit tests for the feature branch are passing locally.
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation

The hardware is a new MacBook Pro with 8 cores.

Performance is still slightly better at 1 million rows, and the gains are even greater at larger magnitudes (8x at 100 million rows).
The caveat is that all JIT compilation has already completed, i.e. the timings exclude compile time.
I have carried out a more comprehensive performance comparison and these results hold up.

Similarly to bottleneck, these codepaths can be toggled on and off.
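For context, bottleneck's existing toggle works through the pandas options system; a numba-backed path could plausibly be toggled the same way. The `compute.use_bottleneck` option below is real; the numba option name is an assumption, not confirmed by this PR.

```python
import pandas as pd

# Existing toggle for the bottleneck-accelerated nanops path:
pd.set_option("compute.use_bottleneck", False)

# A numba-backed path could plausibly expose a similar switch, e.g.:
# pd.set_option("compute.use_numba", True)   # option name is an assumption
```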

eoincondron and others added 6 commits September 26, 2025 16:44
…s_numba

- Tests all 9 private methods prefixed with underscore
- 37 test cases organized in 8 test classes
- Comprehensive coverage of Numba-accelerated reduction operations
- Tests edge cases: NaN handling, empty arrays, masks, different dtypes
- Uses pytest fixtures and parameterization to avoid code duplication
- Tests NumbaList usage for parallel processing
- Uses pandas._testing for consistent assertion helpers
- All tests pass in pandas-dev environment

Functions tested:
- _get_initial_value: Finding first valid values in arrays
- _nb_reduce_single_arr: Single array reduction operations
- _nullify_below_mincount: Minimum count validation
- _reduce_empty_array: Empty array handling
- _chunk_arr_into_arr_list: Array chunking for parallel processing
- _nb_reduce_arr_list_in_parallel: Parallel reduction operations
- _reduce_chunked_results: Combining chunked results
- _cast_to_timelike: DateTime/timedelta type casting
- _nanvar_std_sem: Variance/standard deviation/standard error calculations
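As a rough illustration of the fixture/parametrize style the commit describes (the real tests exercise the private helpers above directly, which only exist on the feature branch), a test can compare each reduction against its NumPy reference across dtypes without duplicating code:

```python
import numpy as np
import pytest

# NumPy references for a few of the covered reductions.
REDUCTIONS = {
    "nansum": np.nansum,
    "nanmean": np.nanmean,
    "nanmin": np.nanmin,
    "nanmax": np.nanmax,
}

@pytest.fixture(params=[np.float64, np.float32])
def arr(request):
    a = np.arange(1, 21, dtype=request.param)
    a[::5] = np.nan                      # inject nulls to exercise skipna paths
    return a

@pytest.mark.parametrize("name", list(REDUCTIONS))
def test_matches_numpy_reference(arr, name):
    ref = REDUCTIONS[name](arr)
    # the numba-backed implementation would be called here in place of
    # the reference, then compared:
    assert np.isclose(REDUCTIONS[name](arr), ref)
```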

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@eoincondron eoincondron force-pushed the nanops-numba-implementation branch from ae60654 to a12aa7c Compare September 26, 2025 15:44