Skip to content

Conversation

jbrockmendel
Copy link
Member

Also avoid copy/cast in mixed-dtype case for _combine_match_index

Lets us use dispatch_to_series and get rid of DataFrame._compare_frame.

asv continuous -f 1.1 -E virtualenv master HEAD -b binary_ops
[...]
    before     after       ratio
  [03707400] [0cf9fa00]
-  105.09ms    94.39ms      0.90  binary_ops.Ops.time_frame_multi_and(True, 'default')
-   11.67ms    10.41ms      0.89  binary_ops.Ops.time_frame_mult(True, 1)
-   32.72ms    29.03ms      0.89  binary_ops.Timeseries.time_timestamp_ops_diff(None)
-   34.11ms    29.24ms      0.86  binary_ops.Ops.time_frame_mult(True, 'default')
-   20.06ms    15.78ms      0.79  binary_ops.Ops.time_frame_comparison(True, 'default')
-   37.93ms    28.27ms      0.75  binary_ops.AddOverflowArray.time_add_overflow_arr_mask_nan

asv continuous -f 1.1 -E virtualenv master HEAD -b binary_ops
[...]
    before     after       ratio
  [03707400] [0cf9fa00]
+   34.44ms    45.40ms      1.32  binary_ops.Ops2.time_frame_int_div_by_zero
+   32.34ms    42.38ms      1.31  binary_ops.Ops2.time_frame_float_mod
+   11.65ms    15.03ms      1.29  binary_ops.Ops.time_frame_mult(False, 'default')
+    8.01ms     9.76ms      1.22  binary_ops.Timeseries.time_series_timestamp_compare('US/Eastern')
+   48.31ms    58.72ms      1.22  binary_ops.Ops2.time_frame_int_mod
+    7.36ms     8.41ms      1.14  binary_ops.Timeseries.time_timestamp_series_compare('US/Eastern')
+    7.11ms     8.09ms      1.14  binary_ops.Timeseries.time_series_timestamp_compare(None)
+   67.32ms    76.35ms      1.13  binary_ops.Ops2.time_frame_float_div
+   30.83ms    34.92ms      1.13  binary_ops.Timeseries.time_timestamp_ops_diff_with_shift(None)
+    7.12ms     8.06ms      1.13  binary_ops.Timeseries.time_timestamp_series_compare(None)
+  104.54ms   116.93ms      1.12  binary_ops.Ops2.time_frame_float_floor_by_zero
+   32.93ms    36.62ms      1.11  binary_ops.Timeseries.time_timestamp_ops_diff(None)
+   40.88ms    45.45ms      1.11  binary_ops.Timeseries.time_timestamp_ops_diff_with_shift('US/Eastern')
-  122.79ms   111.19ms      0.91  binary_ops.Ops.time_frame_comparison(False, 1)
-   11.82ms    10.58ms      0.89  binary_ops.Ops.time_frame_add(True, 1)
-   25.92ms    22.90ms      0.88  binary_ops.AddOverflowArray.time_add_overflow_b_mask_nan
-   11.36ms     9.82ms      0.86  binary_ops.Ops.time_frame_add(False, 1)
-   10.63ms     8.88ms      0.84  binary_ops.AddOverflowScalar.time_add_overflow_scalar(0)
-   10.50ms     8.62ms      0.82  binary_ops.AddOverflowScalar.time_add_overflow_scalar(1)
-   11.55ms     9.43ms      0.82  binary_ops.Ops.time_frame_add(False, 'default')
-   10.47ms     8.41ms      0.80  binary_ops.AddOverflowScalar.time_add_overflow_scalar(-1)
-   28.92ms    22.54ms      0.78  binary_ops.AddOverflowArray.time_add_overflow_both_arg_nan
-   15.50ms    11.39ms      0.73  binary_ops.AddOverflowArray.time_add_overflow_arr_rev
-   43.70ms    25.47ms      0.58  binary_ops.AddOverflowArray.time_add_overflow_arr_mask_nan

asv continuous -f 1.1 -E virtualenv master HEAD -b binary_ops
[...]
    before     after       ratio
  [03707400] [0cf9fa00]
+    6.10ms     6.87ms      1.13  binary_ops.AddOverflowScalar.time_add_overflow_scalar(-1)
-    5.14ms     4.59ms      0.89  binary_ops.Ops.time_frame_comparison(True, 1)
-    4.80ms     4.21ms      0.88  binary_ops.Timeseries.time_series_timestamp_compare(None)
-    4.81ms     4.17ms      0.87  binary_ops.Timeseries.time_timestamp_series_compare('US/Eastern')
-   22.17ms    19.05ms      0.86  binary_ops.Timeseries.time_timestamp_ops_diff(None)
-   21.63ms    18.29ms      0.85  binary_ops.Timeseries.time_timestamp_ops_diff_with_shift(None)
-    5.23ms     4.42ms      0.85  binary_ops.Timeseries.time_timestamp_series_compare(None)
-   52.74ms    42.41ms      0.80  binary_ops.Ops.time_frame_multi_and(False, 1)

asv continuous -f 1.1 -E virtualenv master HEAD -b binary_ops
[...]
    before     after       ratio
  [03707400] [0cf9fa00]
-    4.85ms     4.40ms      0.91  binary_ops.Timeseries.time_series_timestamp_compare(None)
-   17.56ms    15.43ms      0.88  binary_ops.AddOverflowArray.time_add_overflow_arr_mask_nan
-   29.55ms    25.75ms      0.87  binary_ops.Timeseries.time_timestamp_ops_diff_with_shift('US/Eastern')
-    5.33ms     4.59ms      0.86  binary_ops.Timeseries.time_timestamp_series_compare(None)
-   68.42ms    58.38ms      0.85  binary_ops.Ops2.time_frame_float_floor_by_zero
-    5.05ms     4.23ms      0.84  binary_ops.Timeseries.time_series_timestamp_compare('US/Eastern')
-   22.39ms    18.35ms      0.82  binary_ops.Ops2.time_frame_float_mod
-   79.24ms    63.43ms      0.80  binary_ops.Ops.time_frame_comparison(False, 1)
-    5.38ms     4.18ms      0.78  binary_ops.Timeseries.time_timestamp_series_compare('US/Eastern')

@codecov
Copy link

codecov bot commented Aug 11, 2018

Codecov Report

Merging #22284 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22284      +/-   ##
==========================================
+ Coverage   92.05%   92.05%   +<.01%     
==========================================
  Files         169      169              
  Lines       50783    50787       +4     
==========================================
+ Hits        46749    46753       +4     
  Misses       4034     4034
Flag Coverage Δ
#multiple 90.46% <100%> (ø) ⬆️
#single 42.3% <26.31%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 96.94% <100%> (+0.03%) ⬆️
pandas/core/frame.py 97.19% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70c9003...776ea34. Read the comment docs.

@gfyoung gfyoung added Performance Memory or execution speed performance Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Aug 13, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a specific asv that tests this (e.g. ops on a big frame that is mixed and one that is single dtype). the perf numbers look about the same.

@jbrockmendel
Copy link
Member Author

the perf numbers look about the same.

Yah, this is actually a win though. A previous attempt to use dispatch_to_series instead of _compare_frame had a 15x slowdown. This gets us to a .8x speedup. But the real motivation is to avoid the redundant dispatch code in _compare_frame

BTW for triage purposes, getting #22163 through will make it easier to do a bunch of test cleanup/parametrization. (and, you know, fix a bunch of bugs)

@jbrockmendel
Copy link
Member Author

The perf part of this PR is really secondary. The main goal is to unify DataFrame ops going through dispatch_to_series. The numexpr part is necessary to avoid a perf hit in frame/frame comparisons.

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for updating the PR.

@jreback jreback added this to the 0.24.0 milestone Sep 8, 2018
@jreback jreback merged commit 16a6e9c into pandas-dev:master Sep 8, 2018
@jreback
Copy link
Contributor

jreback commented Sep 8, 2018

thanks!

@jbrockmendel jbrockmendel deleted the dispatch2 branch September 8, 2018 03:11
aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants