Skip to content

Conversation

ShaharNaveh
Copy link
Contributor

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This PR was opened as @jbrockmendel suggested (ref #32177 (comment))


Benchmarks:

Master:

In [1]: import pandas._libs.internals as internals

In [2]: %timeit internals.BlockPlacement(slice(1_000_000)).as_array
1.55 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

PR:

In [1]: import pandas._libs.internals as internals

In [2]: %timeit internals.BlockPlacement(slice(1_000_000)).as_array
1.46 ms ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@jbrockmendel
Copy link
Member

Since the perf numbers are pretty similar, can you report a few rounds of each

@ShaharNaveh
Copy link
Contributor Author

Since the perf numbers are pretty similar, can you report a few rounds of each

Sure!


In [1]: import pandas._libs.internals as internals                                                            

In [2]: %timeit internals.BlockPlacement(slice(1)).as_array                                                   
1.46 µs ± 45.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # master
586 ns ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [3]: %timeit internals.BlockPlacement(slice(10)).as_array                                                  
1.65 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # master
832 ns ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [4]: %timeit internals.BlockPlacement(slice(100)).as_array                                                 
1.74 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # master
879 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [5]: %timeit internals.BlockPlacement(slice(1_000)).as_array                                               
2.94 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # master
2.1 µs ± 77.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # PR

In [6]: %timeit internals.BlockPlacement(slice(10_000)).as_array                                              
10.1 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # master
9.39 µs ± 170 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # PR

In [7]: %timeit internals.BlockPlacement(slice(100_000)).as_array                                             
81 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) # master
83 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) # PR

In [8]: %timeit internals.BlockPlacement(slice(1_000_000)).as_array                                           
1.72 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # master
1.56 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # PR

@jbrockmendel
Copy link
Member

thats a nice improvement for the small cases, thanks

start, stop, step, _ = slice_get_indices_ex(self._as_slice)
self._as_array = np.arange(start, stop, step, dtype=np.int64)
# NOTE: this is the C-optimized equivalent of
# np.arange(start, stop, step, dtype=np.int64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: i like to add an extra space at the beginning of multi-line comments specifically to avoid confusing this for commented-out code. quotation marks or backticks or ... are also good

@jbrockmendel
Copy link
Member

lgtm, thanks for accomodating my nitpicking

@ShaharNaveh
Copy link
Contributor Author

@jbrockmendel I have a list of your nitpicks and requests, I try to address them when I have the time :)

@WillAyd
Copy link
Member

WillAyd commented Mar 13, 2020

Why does this generate anything different from Cython? Shouldn't it generate the same result?

@jbrockmendel
Copy link
Member

Why does this generate anything different from Cython? Shouldn't it generate the same result?

The generated C code has to call np.arange in python-space (and lookup np.int64, and ...)

@WillAyd
Copy link
Member

WillAyd commented Mar 13, 2020 via email

@topper-123 topper-123 added Performance Memory or execution speed performance Internals Related to non-user accessible pandas implementation labels Mar 14, 2020
@topper-123 topper-123 added this to the 1.1 milestone Mar 14, 2020
@jreback jreback merged commit 5c7a901 into pandas-dev:master Mar 14, 2020
@jreback
Copy link
Contributor

jreback commented Mar 14, 2020

thanks I agree we should not just wholesale convert calls to use the c-api, but in specific cases where we are actually returning a numpy array (as opposed to iteration), ok

@ShaharNaveh ShaharNaveh deleted the PERF-arange branch March 18, 2020 12:47
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Internals Related to non-user accessible pandas implementation Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants