Skip to content

Conversation

TomAugspurger
Copy link
Contributor

xref #16584

Here are the timings vs. master

       before           after         ratio
     [ff0d1f4d]       [6b1552f1]
-      12.3±0.3ms      2.28±0.04ms     0.18  frame_methods.Interpolate.time_interpolate_some_good_infer
-      12.6±0.7ms      1.15±0.01ms     0.09  frame_methods.Interpolate.time_interpolate_some_good
-           1.30s         53.8±1ms     0.04  frame_methods.Interpolate.time_interpolate

I still need to time vs 0.20.1 (will post those timings later, have to run for a bit now). I may have one more spot to optimize.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 3, 2017

Also, the master and 0.20.1 versions scale poorly with limit, but that's not benchmarked anywhere. Here's some timings

limit / version 0.20.1 master interp-perf
None 42.2 ms 1.37 s 43.8 ms
50 1.45 s 1.33 s 66.2 ms
500 1.40 s 1.36 s 62.5 ms

So we're slightly slower than 0.20.1 for the default, but much faster with any limit (notice the units).

I'll add a benchmark for limit and direction. Anyone know if there's a way to run new benchmarks against old commits in asv?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 3, 2017

Just finished vs. 0.20.1, and they're essentially the same (I think the first one is my branch, 0.20.1 is second).

· Running 6 total benchmarks (2 commits * 1 environments * 3 benchmarks)
[  0.00%] · For pandas commit hash b2bbd1bf:
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...........................................................................................
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 16.67%] ··· Running frame_methods.Interpolate.time_interpolate                                                                                  44.8±0.3ms
[ 33.33%] ··· Running frame_methods.Interpolate.time_interpolate_some_good                                                                       1.09±0.03ms
[ 50.00%] ··· Running frame_methods.Interpolate.time_interpolate_some_good_infer                                                                 2.41±0.01ms
[ 50.00%] · For pandas commit hash 20baf972:
[ 50.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 66.67%] ··· Running frame_methods.Interpolate.time_interpolate                                                                                  46.7±0.9ms
[ 83.33%] ··· Running frame_methods.Interpolate.time_interpolate_some_good                                                                       1.08±0.02ms
[100.00%] ··· Running frame_methods.Interpolate.time_interpolate_some_good_infer                                                                 2.44±0.07ms
BENCHMARKS NOT SIGNIFICANTLY CHANGED.

def _interp_limit0(invalid, fw_limit, bw_limit):
"Get idx of values that won't be filled b/c they exceed the limits."
for x in np.where(invalid)[0]:
if invalid[max(0, x - fw_limit):x + bw_limit + 1].all():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should rewrite this as well

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 3, 2017 via email

@codecov
Copy link

codecov bot commented Jun 3, 2017

Codecov Report

Merging #16592 into master will increase coverage by 0.17%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16592      +/-   ##
==========================================
+ Coverage   90.75%   90.92%   +0.17%     
==========================================
  Files         161      161              
  Lines       51097    49265    -1832     
==========================================
- Hits        46372    44795    -1577     
+ Misses       4725     4470     -255
Flag Coverage Δ
#multiple 88.68% <90.9%> (+0.09%) ⬆️
#single 40.22% <9.09%> (+0.05%) ⬆️
Impacted Files Coverage Δ
pandas/core/missing.py 84.54% <90.9%> (+0.27%) ⬆️
pandas/core/computation/expressions.py 0% <0%> (-90.48%) ⬇️
pandas/io/msgpack/_version.py 44.65% <0%> (-0.22%) ⬇️
pandas/io/formats/style.py 100% <0%> (+4.07%) ⬆️
pandas/core/util/hashing.py 98.34% <0%> (+5.37%) ⬆️
pandas/io/json/json.py 100% <0%> (+9.63%) ⬆️
pandas/util/testing.py 100% <0%> (+19.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31e67d5...b2bbd1b. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 3, 2017

Codecov Report

Merging #16592 into master will increase coverage by 0.17%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16592      +/-   ##
==========================================
+ Coverage   90.75%   90.93%   +0.17%     
==========================================
  Files         161      161              
  Lines       51097    49261    -1836     
==========================================
- Hits        46372    44794    -1578     
+ Misses       4725     4467     -258
Flag Coverage Δ
#multiple 88.69% <100%> (+0.1%) ⬆️
#single 40.22% <6.89%> (+0.05%) ⬆️
Impacted Files Coverage Δ
pandas/core/missing.py 85.25% <100%> (+0.97%) ⬆️
pandas/core/computation/expressions.py 0% <0%> (-90.48%) ⬇️
pandas/io/msgpack/_version.py 44.65% <0%> (-0.22%) ⬇️
pandas/io/formats/style.py 100% <0%> (+4.07%) ⬆️
pandas/core/util/hashing.py 98.34% <0%> (+5.37%) ⬆️
pandas/io/json/json.py 100% <0%> (+9.63%) ⬆️
pandas/util/testing.py 100% <0%> (+19.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31e67d5...bf88fc7. Read the comment docs.

@TomAugspurger TomAugspurger added this to the 0.20.2 milestone Jun 3, 2017
@TomAugspurger TomAugspurger added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Performance Memory or execution speed performance labels Jun 3, 2017
return f_idx & b_idx


def rolling_window(a, window):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this inside _interp_limit to avoid confusion that this is public and/or used externally

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this all looks fine
i suspect a pure cython impl might be easier to read though (maybe even faster)

can tackle later if interest

@TomAugspurger
Copy link
Contributor Author

i suspect a pure cython impl might be easier to read though (maybe even faster)

Yeah, it's kind of unreadable, and I have no idea what that striding stuff is doing.

I'll leave this open if you want to take a look. I'll tag 0.20.2 early tomorrow morning.
We can also do the cython for 0.21

@jreback
Copy link
Contributor

jreback commented Jun 4, 2017

no this is all fine maybe make an issue for 0.21 to have a look

@TomAugspurger TomAugspurger merged commit 473615e into pandas-dev:master Jun 4, 2017
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Jun 4, 2017
* PERF: vectorize _interp_limit

* CLN: remove old implementation

* fixup! CLN: remove old implementation

(cherry picked from commit 473615e)
TomAugspurger added a commit that referenced this pull request Jun 4, 2017
* PERF: vectorize _interp_limit

* CLN: remove old implementation

* fixup! CLN: remove old implementation

(cherry picked from commit 473615e)
@TomAugspurger TomAugspurger deleted the interp-perf branch June 4, 2017 20:29
Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
* PERF: vectorize _interp_limit

* CLN: remove old implementation

* fixup! CLN: remove old implementation
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
* PERF: vectorize _interp_limit

* CLN: remove old implementation

* fixup! CLN: remove old implementation
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Jul 12, 2017
Version 0.20.2

* tag 'v0.20.2': (68 commits)
  RLS: v0.20.2
  DOC: Update release.rst
  DOC: Whatsnew fixups (pandas-dev#16596)
  ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460)
  BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444)
  PERF: vectorize _interp_limit (pandas-dev#16592)
  DOC: whatsnew 0.20.2 edits (pandas-dev#16587)
  API: Make is_strictly_monotonic_* private (pandas-dev#16576)
  BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565)
  Strictly monotonic (pandas-dev#16555)
  ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026)
  fix linting
  BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244)
  BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317)
  return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486)
  BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549)
  BUG: Fixed tput output on windows (pandas-dev#16496)
  Strictly monotonic (pandas-dev#16555)
  BUG: fixed wrong order of ordered labels in pd.cut()
  BUG: Fixed to_html ignoring index_names parameter
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants