Skip to content

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Jun 27, 2018

Title is self-explanatory.

Closes #16818.
Closes #18656.

@gfyoung gfyoung added Enhancement API Design Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jun 27, 2018
@gfyoung gfyoung added this to the 0.24.0 milestone Jun 27, 2018
@gfyoung gfyoung requested a review from jreback June 27, 2018 07:16
@codecov
Copy link

codecov bot commented Jun 27, 2018

Codecov Report

Merging #21650 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21650      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         154      154              
  Lines       49555    49557       +2     
==========================================
+ Hits        45542    45544       +2     
  Misses       4013     4013
Flag Coverage Δ
#multiple 90.27% <100%> (ø) ⬆️
#single 42.03% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.19% <ø> (ø) ⬆️
pandas/core/algorithms.py 94.85% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e0f978d...4998050. Read the comment docs.

s = Series([10, 9, 8, 7, 7, 7, 7, 6])
result = s.nlargest(4, keep='all')
expected = Series([10, 9, 8, 7, 7, 7, 7])
print(result, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print statement should be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -24,6 +24,7 @@ Other Enhancements
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__.
(:issue:`21627`)
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`)
- :func:`Series` / :func:`DataFrame` methods :func:`nlargest` / :func:`nsmallest` now accept the value 'all' for the `keep` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these will link properly if the class is separated from the method (I could be wrong though), and I think you'll need to explicitly write out the variations, e.g. :meth:`Series.nlargest`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung
Copy link
Member Author

gfyoung commented Jun 27, 2018

@jschendel : Thanks for comments! TBH, I didn't really check the PR diff before trying to apply it on master. 😄 I'll get to these shortly.

@gfyoung gfyoung force-pushed the nlargest-keep-last branch from 8d3b883 to 4b7f199 Compare June 27, 2018 16:47
@gfyoung
Copy link
Member Author

gfyoung commented Jun 27, 2018

@jreback @jschendel : Comments addressed, and all is passing. PTAL.

@jschendel
Copy link
Member

jschendel commented Jun 27, 2018

Can you add 'all' to the list of params for the Series and DataFrame asv's related to NSort:

class NSort(object):
goal_time = 0.2
params = ['last', 'first']

class NSort(object):
goal_time = 0.2
params = ['first', 'last']

I don't think this change will have any perf impact, so no need to run the asv's, but might be nice to have benchmarks in place for this.

LGTM otherwise.

@gfyoung gfyoung force-pushed the nlargest-keep-last branch from 4b7f199 to a07c686 Compare June 28, 2018 00:15
@gfyoung
Copy link
Member Author

gfyoung commented Jun 28, 2018

@jschendel : Yep, done.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. very minor comments. merge on green.

4 8 e 4.0

When using ``keep='all'``, all duplicate items are maintained
>>> df.nsmallest(3, 'a', keep='all')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, blank line here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

``TypeError``:

>>> df.nsmallest(3, 'b')
Traceback (most recent call last):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2461,6 +2461,22 @@ def test_n_duplicate_index(self, df_duplicates, n, order):
expected = df.sort_values(order, ascending=False).head(n)
tm.assert_frame_equal(result, expected)

def test_keep_all_ties(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you name: test_duplicate_keep_all_ties

side issue: I think make sense to split out duplicate tests to separate file (can you make an issue / PR if you feel!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2082,6 +2082,17 @@ def test_boundary_datetimelike(self, nselect_method, dtype):
vals = [min_val + 1, min_val + 2, max_val - 1, max_val, min_val]
assert_check_nselect_boundary(vals, dtype, nselect_method)

def test_keep_all_ties(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same naming here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -24,6 +24,7 @@ Other Enhancements
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__.
(:issue:`21627`)
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`)
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add both issue numbers here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second issue number #18656 is an actually the old PR. I mentioned it for completeness.

@gfyoung gfyoung force-pushed the nlargest-keep-last branch from a07c686 to 4998050 Compare June 28, 2018 17:01
@gfyoung
Copy link
Member Author

gfyoung commented Jun 28, 2018

All is green, so merging.

@gfyoung gfyoung merged commit 0801b8c into pandas-dev:master Jun 28, 2018
@gfyoung gfyoung deleted the nlargest-keep-last branch June 28, 2018 21:20
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API Design Enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants