
@unutbu (Contributor) commented Mar 11, 2014

Currently,

>>> import numpy as np
>>> from pandas import Float64Index
>>> import pandas.core.common as com
>>> com.array_equivalent(Float64Index([0, np.nan]), Float64Index([0, np.nan]))
False

Although nothing in the current pandas code base uses array_equivalent to compare Float64Index objects, leaving array_equivalent in this state is a bug waiting to happen: two indexes with NaN in the same positions should be reported as equivalent.
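One likely reason a naive comparison fails here is that NaN never compares equal to itself. The small NumPy illustration below shows this with plain float arrays. It is an assumption about the general cause, not a trace of the 2014 array_equivalent code path:

```python
import numpy as np

a = np.array([0.0, np.nan])
b = np.array([0.0, np.nan])

# Elementwise ==: NaN != NaN, so the second position compares False.
print(a == b)                  # [ True False]

# np.array_equal relies on the same elementwise test, so it reports
# the two arrays as different even though they look identical.
print(np.array_equal(a, b))    # False
```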

This PR attempts to fix the problem by using pd.isnull for all arrays of dtype object. I tried this approach in a previous PR and got terrible perf results. Since then I've discovered that my machine does not have enough memory to run the full perf test suite without page faults. If I rerun test_perf.sh for just a few benchmarks, I can avoid the page faults and get consistent results.
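A NaN-aware equivalence check along these lines can be sketched as follows. array_equivalent_sketch is a hypothetical, simplified function and not the pandas source. It assumes both inputs can be converted with np.asarray and only inspects the dtype of left. Object arrays use pd.isnull, which recognizes both None and NaN. Float and complex arrays use np.isnan, which is cheaper but would raise a TypeError on arbitrary Python objects:

```python
import numpy as np
import pandas as pd


def array_equivalent_sketch(left, right):
    """Hypothetical sketch: True if left and right have equal non-null
    elements and nulls in the same positions, False otherwise."""
    left, right = np.asarray(left), np.asarray(right)
    if left.shape != right.shape:
        return False
    if left.dtype == np.object_:
        # Object arrays may hold None or NaN; pd.isnull handles both.
        both_null = pd.isnull(left) & pd.isnull(right)
        return bool(((left == right) | both_null).all())
    if issubclass(left.dtype.type, (np.floating, np.complexfloating)):
        # Treat positions where both sides are NaN as equal.
        both_nan = np.isnan(left) & np.isnan(right)
        return bool(((left == right) | both_nan).all())
    # Integer, bool, etc. cannot hold NaN, so plain equality suffices.
    return bool(np.array_equal(left, right))
```

With a current pandas, pd.Index([0.0, np.nan]) can stand in for the Float64Index from the example above, because Float64Index was removed in pandas 2.0.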

Running /usr/bin/time -v ./test_perf.sh -b master -t fix-equivalent flagged two benchmarks with a ratio above 1.1:

reindex_fillna_pad                           |   0.5784 |   0.5034 |   1.1490 |
packers_write_pack                           |  15.2360 |   7.1851 |   2.1205 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

I believe these two slowdowns were due to page faults. When I reran perf on just these tests using
/usr/bin/time -v ./test_perf.sh -b master -t fix-equivalent -r "reindex_fillna_pad|packers_write_pack"

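I got the following results (the pattern also matches reindex_fillna_pad_float32), with every ratio below 1.02: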

Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
reindex_fillna_pad_float32                   |   0.4633 |   0.4590 |   1.0093 |
packers_write_pack                           |   7.9544 |   7.8390 |   1.0147 |
reindex_fillna_pad                           |   0.7290 |   0.7180 |   1.0154 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
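The ratio column appears to be head time divided by base time, and every row above is consistent with that. The minimal sketch below parses rows in this table format and flags benchmarks whose ratio exceeds the 1.1 threshold. parse_rows and regressions are hypothetical helpers and are not part of test_perf.sh:

```python
def parse_rows(table):
    """Return (name, head_ms, base_ms, ratio) tuples for the data rows of a
    results table; separator, header and blank lines are skipped."""
    rows = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Separator and blank lines contain no "|", so they yield one cell.
        if len(cells) < 4 or cells[0] == "Test name":
            continue
        head, base, ratio = (float(c) for c in cells[1:4])
        rows.append((cells[0], head, base, ratio))
    return rows


def regressions(rows, threshold=1.1):
    """Names of benchmarks whose head/base timing ratio exceeds threshold."""
    return [name for name, head, base, _ in rows if head / base > threshold]


FIRST_RUN = """\
reindex_fillna_pad                           |   0.5784 |   0.5034 |   1.1490 |
packers_write_pack                           |  15.2360 |   7.1851 |   2.1205 |
"""

SECOND_RUN = """\
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
reindex_fillna_pad_float32                   |   0.4633 |   0.4590 |   1.0093 |
packers_write_pack                           |   7.9544 |   7.8390 |   1.0147 |
reindex_fillna_pad                           |   0.7290 |   0.7180 |   1.0154 |
-------------------------------------------------------------------------------
"""

print(regressions(parse_rows(FIRST_RUN)))   # ['reindex_fillna_pad', 'packers_write_pack']
print(regressions(parse_rows(SECOND_RUN)))  # []
```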

jreback added a commit that referenced this pull request Mar 11, 2014
FIX: Bug whereby array_equivalent was not correctly comparing Float64Ind...
@jreback merged commit 45009f0 into pandas-dev:master Mar 11, 2014

@jreback (Contributor) commented Mar 11, 2014

thank you sir!

@jreback added this to the 0.14.0 milestone Mar 11, 2014
@unutbu deleted the fix-equivalent branch March 11, 2014 14:48

Labels: Dtype Conversions (Unexpected or buggy dtype conversions), Performance (Memory or execution speed performance)
