
Conversation


@eoincondron eoincondron commented Sep 26, 2025

Ref #62449

The performance of array reductions in nanops/bottleneck can be significantly improved upon for large data using numba. The improvements are due to two factors:

  • single-pass algorithms that handle null values inline, avoiding any copies of the data.
  • multi-threading over chunks of the array, or over an axis in a single-axis reduction.
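To illustrate the two factors above, here is a minimal sketch (not the PR's actual code) of a single-pass, chunked `nansum`: nulls are skipped inline with no mask or copy, and `prange` threads over chunks when compiled with `parallel=True`. The chunk count and function name are illustrative assumptions.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:                      # pure-Python fallback so the sketch still runs
    def njit(**kwargs):
        return lambda func: func
    prange = range

@njit(parallel=True)
def nansum_single_pass(arr):
    n_chunks = 8                         # illustrative: e.g. one chunk per core
    chunk_size = (arr.size + n_chunks - 1) // n_chunks
    partials = np.zeros(n_chunks)
    for c in prange(n_chunks):           # each chunk reduced on its own thread
        start = c * chunk_size
        stop = min(start + chunk_size, arr.size)
        total = 0.0
        for i in range(start, stop):
            x = arr[i]
            if x == x:                   # single-pass null check; NaN != NaN
                total += x
        partials[c] = total
    return partials.sum()                # combine the per-chunk partials
```

The per-chunk partial results are combined at the end, which is why reductions like `var`/`std`/`sem` need a dedicated combining step rather than a simple sum.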

Although the added code is fairly complex, it provides a central, unified piece of code, built from scratch, covering the different reductions across data types, array classes, the skipna toggle, masked arrays, etc. It could potentially replace code currently spread across multiple modules and, in the case of bottleneck, code that lives in a different repository.
It currently covers nan(sum|mean|min|max|var|std|sem) and should be easily extensible. I am seeking code review before completing it fully, so as not to waste effort.

This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.

[screenshot: benchmark results]

I am running the code on a feature branch, and all unit tests for the feature branch are passing locally.
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation

The hardware is a new MacBook Pro with 8 cores.

Performance is still slightly better at 1 million rows, and the gains are even greater at larger magnitudes (8x at 100 million rows).
The caveat is that all JIT compilation has already completed, i.e. the timings exclude compile time.
I have carried out a more comprehensive performance comparison and these results hold up.

Similarly to bottleneck, these codepaths can be toggled on and off.
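For context, bottleneck's existing toggle works through the pandas options system; a numba-backed path could plausibly be toggled the same way. The `compute.use_bottleneck` option below is real; the numba option name is an assumption, not confirmed by this PR.

```python
import pandas as pd

# Existing toggle for the bottleneck-accelerated nanops path:
pd.set_option("compute.use_bottleneck", False)

# A numba-backed path could plausibly expose a similar switch, e.g.:
# pd.set_option("compute.use_numba", True)   # option name is an assumption
```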

eoincondron and others added 6 commits September 26, 2025 16:44
…s_numba

- Tests all 9 private methods prefixed with underscore
- 37 test cases organized in 8 test classes
- Comprehensive coverage of Numba-accelerated reduction operations
- Tests edge cases: NaN handling, empty arrays, masks, different dtypes
- Uses pytest fixtures and parameterization to avoid code duplication
- Tests NumbaList usage for parallel processing
- Uses pandas._testing for consistent assertion helpers
- All tests pass in pandas-dev environment

Functions tested:
- _get_initial_value: Finding first valid values in arrays
- _nb_reduce_single_arr: Single array reduction operations
- _nullify_below_mincount: Minimum count validation
- _reduce_empty_array: Empty array handling
- _chunk_arr_into_arr_list: Array chunking for parallel processing
- _nb_reduce_arr_list_in_parallel: Parallel reduction operations
- _reduce_chunked_results: Combining chunked results
- _cast_to_timelike: DateTime/timedelta type casting
- _nanvar_std_sem: Variance/standard deviation/standard error calculations
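As a rough illustration of the fixture/parametrize style the commit describes (the real tests exercise the private helpers above directly, which only exist on the feature branch), a test can compare each reduction against its NumPy reference across dtypes without duplicating code:

```python
import numpy as np
import pytest

# NumPy references for a few of the covered reductions.
REDUCTIONS = {
    "nansum": np.nansum,
    "nanmean": np.nanmean,
    "nanmin": np.nanmin,
    "nanmax": np.nanmax,
}

@pytest.fixture(params=[np.float64, np.float32])
def arr(request):
    a = np.arange(1, 21, dtype=request.param)
    a[::5] = np.nan                      # inject nulls to exercise skipna paths
    return a

@pytest.mark.parametrize("name", list(REDUCTIONS))
def test_matches_numpy_reference(arr, name):
    ref = REDUCTIONS[name](arr)
    # the numba-backed implementation would be called here in place of
    # the reference, then compared:
    assert np.isclose(REDUCTIONS[name](arr), ref)
```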

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@eoincondron eoincondron force-pushed the nanops-numba-implementation branch from ae60654 to a12aa7c Compare September 26, 2025 15:44