
@sinhrks (Member) commented Jul 22, 2016

Index.duplicated and Series.duplicated now use dtype-specific logic. The algorithm is also skipped entirely when Index.is_unique is True, since that property is expected to be cached in practical situations.
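For readers unfamiliar with the operation, here is a minimal pure-Python sketch of a hash-based duplicated mask. It is not the Cython implementation from this PR: the function name and signature are hypothetical, it accepts any hashable values instead of dispatching on dtype, and it ignores NaN handling. The keep options mirror the ones pandas documents for duplicated ('first', 'last', False).

```python
from typing import Hashable, List, Sequence, Union


def duplicated(values: Sequence[Hashable], keep: Union[str, bool] = "first") -> List[bool]:
    """Hypothetical hash-based duplicate mask.

    keep="first": mark every occurrence except the first as a duplicate.
    keep="last":  mark every occurrence except the last as a duplicate.
    keep=False:   mark every occurrence of a repeated value as a duplicate.
    """
    n = len(values)
    out = [False] * n
    if keep == "first" or keep == "last":
        # Walk forwards for "first", backwards for "last"; whichever
        # occurrence is seen first in that order is kept.
        order = range(n) if keep == "first" else range(n - 1, -1, -1)
        seen = set()
        for i in order:
            v = values[i]
            if v in seen:
                out[i] = True
            else:
                seen.add(v)
    elif keep is False:
        # Count occurrences, then flag every value that appears more than once.
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        out = [counts[v] > 1 for v in values]
    else:
        raise ValueError("keep must be 'first', 'last' or False")
    return out
```

A dtype-specific version would swap the generic Python set for a hash table keyed on the native type, avoiding the cost of boxing every element as a Python object.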

asv benchmark results:

   before     after       ratio
  [bb6b5e54] [16c6da4f]
-   34.41ms     4.69ms      0.14  algorithms.algorithm.time_int_duplicated
-   53.77ms     5.59ms      0.10  algorithms.algorithm.time_float_duplicated
-   60.22ms    40.20μs      0.00  algorithms.algorithm.time_int_unique_duplicated
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
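asv's ratio column is the new timing divided by the old one, so values below 1.0 mean the change made the benchmark faster. A short Python check, using the timings from the table converted to seconds (the helper name asv_ratio is made up for this illustration), reproduces the reported ratios and the corresponding speedups:

```python
def asv_ratio(before_s: float, after_s: float) -> float:
    """Ratio as reported by asv: new timing over old timing (below 1.0 means faster)."""
    return after_s / before_s


# (before, after) timings from the table, converted to seconds.
RESULTS = {
    "time_int_duplicated": (34.41e-3, 4.69e-3),
    "time_float_duplicated": (53.77e-3, 5.59e-3),
    "time_int_unique_duplicated": (60.22e-3, 40.20e-6),
}

if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        ratio = asv_ratio(before, after)
        print(f"{name}: ratio {ratio:.2f}, speedup {1 / ratio:.1f}x")
```

The unique-index case drops to about 0.07% of its original time because the duplicated pass is skipped altogether once is_unique is known to be True.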

NOTE: a template for the htable code can be created after #13716.

@sinhrks added the Indexing and Performance labels Jul 22, 2016
@sinhrks added this to the 0.19.0 milestone Jul 22, 2016
Review comment (Member):

int -> float

Reply (Member Author):

Ah, thanks. I'll re-submit the benchmark results. :)

@codecov-io commented Jul 22, 2016

Current coverage is 84.58% (diff: 100%)

Merging #13751 into master will increase coverage by <.01%

@@             master     #13751   diff @@
==========================================
  Files           141        141          
  Lines         51233      51258    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43331      43356    +25   
  Misses         7902       7902          
  Partials          0          0          

Powered by Codecov. Last update 2c047d4...12fb5ac
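The figures in this report are consistent with the usual line-coverage formula, hit lines divided by all tracked lines (hits + misses + partials). A quick check with the numbers above, where the helper name is hypothetical and the 25 newly tracked lines are treated as the diff:

```python
def coverage_pct(hits: int, misses: int, partials: int = 0) -> float:
    """Line coverage in percent: hit lines over all tracked lines."""
    return 100.0 * hits / (hits + misses + partials)


master = coverage_pct(hits=43331, misses=7902)
pull = coverage_pct(hits=43356, misses=7902)
# The PR adds 25 tracked lines (51233 -> 51258) and 25 hits (43331 -> 43356).
diff_pct = 100.0 * (43356 - 43331) / (51258 - 51233)

if __name__ == "__main__":
    print(f"master {master:.2f}%, PR {pull:.2f}%, "
          f"change {pull - master:.4f} points, diff {diff_pct:.0f}%")
```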

@sinhrks (Member Author) commented Jul 22, 2016

Updated the benchmark results. algorithms.algorithm.time_int_unique_duplicated now measures the case where is_unique has already been cached.
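The reason caching is_unique matters can be sketched with a toy class. ToyIndex and its hash_passes counter are hypothetical, and functools.cached_property (Python 3.8+) stands in for whatever caching pandas uses internally. The uniqueness check costs one hashing pass on first access; after that, every duplicated call on a unique index returns immediately. This mirrors the updated benchmark, where is_unique is already cached before duplicated is timed.

```python
from functools import cached_property
from typing import Hashable, Iterable, List


class ToyIndex:
    """Hypothetical immutable index that caches its uniqueness check."""

    def __init__(self, values: Iterable[Hashable]) -> None:
        self._values = tuple(values)
        # Counts full hashing passes over the data, for illustration only.
        self.hash_passes = 0

    @cached_property
    def is_unique(self) -> bool:
        # Runs on first access only; the result is then stored on the instance.
        self.hash_passes += 1
        return len(set(self._values)) == len(self._values)

    def duplicated(self) -> List[bool]:
        # Fast path: a unique index cannot contain duplicates, so skip hashing.
        if self.is_unique:
            return [False] * len(self._values)
        # Slow path: one hashing pass, flagging every repeat after the first occurrence.
        self.hash_passes += 1
        seen = set()
        out = []
        for v in self._values:
            out.append(v in seen)
            seen.add(v)
        return out


if __name__ == "__main__":
    idx = ToyIndex(range(100_000))
    idx.is_unique  # paid once, like the benchmark's setup step
    idx.duplicated()
    idx.duplicated()
    print(idx.hash_passes)  # still 1: both duplicated() calls took the fast path
```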

    cdef:
        Py_ssize_t i, n
        dict seen = dict()
        object row
Review comment (Contributor) on the code above:

Maybe we can use templates for these (let's open a separate issue for that).
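One way to read the templating suggestion: write the Cython function once and stamp out a copy per dtype. The sketch below uses Python's standard string.Template purely for illustration, not necessarily the templating tool pandas adopted. The function names, the dtype list, and the generated Cython body are hypothetical, and the generated code assumes the target .pyx file cimports int64_t and float64_t.

```python
from string import Template

# Hypothetical Cython source template; ${name} and ${c_type} are filled per dtype.
DUPLICATED_TEMPLATE = Template('''\
def duplicated_${name}(${c_type}[:] values):
    cdef:
        Py_ssize_t i, n = len(values)
        set seen = set()
        list out = []
    for i in range(n):
        out.append(values[i] in seen)
        seen.add(values[i])
    return out
''')

# (function suffix, Cython element type) pairs to specialize; an assumed list.
DTYPES = [("int64", "int64_t"), ("float64", "float64_t")]


def render(dtypes=DTYPES) -> str:
    """Concatenate one specialized function per dtype into a single source string."""
    return "\n\n".join(
        DUPLICATED_TEMPLATE.substitute(name=name, c_type=c_type)
        for name, c_type in dtypes
    )


if __name__ == "__main__":
    print(render())
```

A production template would more likely use a typed hash table than a Python set in the generated body; the set just keeps the sketch short.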

@jreback closed this in 2166ac1 Jul 25, 2016
@sinhrks deleted the perf_duplicated branch Jul 25, 2016 at 22:47


Linked issue (may be closed by merging this pull request):

PERF: core/base/IndexOpsMixin duplicated should be changed to use same impl as frame.duplicated

4 participants