ENH: Added DataFrame.nsorted to select top n rows according to column-dependent order #61457
Open
MartinBraquet wants to merge 20 commits into pandas-dev:main from MartinBraquet:nsorted
Commits
a3aed2f Rename tests to nsorted and rename order to columns (MartinBraquet)
e476e18 Add time_nsorted to benchmark (MartinBraquet)
92d53da Add nsorted method for dataframe and series (MartinBraquet)
d771487 Add test for nsorted method (MartinBraquet)
89a656c Add whatsnew for nsorted (MartinBraquet)
72f7f13 Fix index compute type (MartinBraquet)
0e79e2b Re-Trigger PR checks (MartinBraquet)
1790f6c Merge branch 'main' into nsorted (MartinBraquet)
24ac85f Merge branch 'main' into nsorted (MartinBraquet)
541d5f2 Merge branch 'main' into nsorted (MartinBraquet)
dc368b7 Fixes (MartinBraquet)
b6c25b3 Add nsorted to frame API docs (MartinBraquet)
b1e38b1 Add nsorted to series API docs and add docstring (MartinBraquet)
ad2975f Update pandas/tests/frame/methods/test_nlargest.py (MartinBraquet)
76715c1 Merge branch 'main' into nsorted (MartinBraquet)
550903f Fix test after updating error message (MartinBraquet)
aff4671 Merge remote-tracking branch 'origin/nsorted' into nsorted (MartinBraquet)
4bae30f Fix (MartinBraquet)
e0d6ab1 Apply suggestions from code review (MartinBraquet)
8f0fcec Merge branch 'main' into nsorted (MartinBraquet)
@@ -7447,6 +7447,160 @@ def value_counts(

 return counts

+def nsorted(
+self,
+n: int,
+columns: IndexLabel,
+ascending: bool | Sequence[bool],
+keep: NsmallestNlargestKeep = "first",
+) -> DataFrame:
+"""
+Return the first `n` rows ordered by `columns` in the order defined by
+`ascending`.
+
+The columns that are not specified are returned as
+well, but not used for ordering.
+
+This method is equivalent to
+``df.sort_values(columns, ascending=ascending).head(n)``, but more
+performant.
+
+Parameters
+----------
+n : int
+Number of rows to return.
+columns : label or list of labels
+Column label(s) to order by.
+ascending : bool or list of bools
+Whether to sort in ascending or descending order.
+If a list, must be the same length as `columns`.
+keep : {'first', 'last', 'all'}, default 'first'
+Where there are duplicate values:
+
+- ``first`` : prioritize the first occurrence(s)
+- ``last`` : prioritize the last occurrence(s)
+- ``all`` : keep all the ties of the last selected item, even if it means
+selecting more than ``n`` items.
+
+Returns
+-------
+DataFrame
+The first `n` rows ordered by the given columns in the order given
+in `ascending`.
+
+See Also
+--------
+DataFrame.nlargest : Return the first `n` rows ordered by `columns` in
+descending order.
+DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in
+ascending order.
+DataFrame.sort_values : Sort DataFrame by the values.
+DataFrame.head : Return the first `n` rows without re-ordering.
+
+Notes
+-----
+This function cannot be used with all column types. For example, when
+specifying columns with `object` or `category` dtypes, ``TypeError`` is
+raised.
+
+Examples
+--------
+>>> df = pd.DataFrame(
+... {
+... "population": [
+... 59000000,
+... 65000000,
+... 434000,
+... 434000,
+... 434000,
+... 337000,
+... 11300,
+... 11300,
+... 11300,
+... ],
+... "GDP": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],
+... "alpha-2": ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"],
+... },
+... index=[
+... "Italy",
+... "France",
+... "Malta",
+... "Maldives",
+... "Brunei",
+... "Iceland",
+... "Nauru",
+... "Tuvalu",
+... "Anguilla",
+... ],
+... )
+>>> df
+population GDP alpha-2
+Italy 59000000 1937894 IT
+France 65000000 2583560 FR
+Malta 434000 12011 MT
+Maldives 434000 4520 MV
+Brunei 434000 12128 BN
+Iceland 337000 17036 IS
+Nauru 11300 182 NR
+Tuvalu 11300 38 TV
+Anguilla 11300 311 AI
+
+In the following example, we will use ``nsorted`` to select the three
+rows having the largest values in column "population".
+
+>>> df.nsorted(3, "population", ascending=False)
+population GDP alpha-2
+France 65000000 2583560 FR
+Italy 59000000 1937894 IT
+Malta 434000 12011 MT
+
+When using ``keep='last'``, ties are resolved in reverse order:
+
+>>> df.nsorted(3, "population", ascending=False, keep="last")
+population GDP alpha-2
+France 65000000 2583560 FR
+Italy 59000000 1937894 IT
+Brunei 434000 12128 BN
+
+When using ``keep='all'``, the number of elements kept can go beyond ``n``
+if there are duplicate values for the last selected element. All the
+ties are kept:
+
+>>> df.nsorted(3, "population", ascending=False, keep="all")
+population GDP alpha-2
+France 65000000 2583560 FR
+Italy 59000000 1937894 IT
+Malta 434000 12011 MT
+Maldives 434000 4520 MV
+Brunei 434000 12128 BN
+
+However, ``nsorted`` does not keep ``n`` distinct largest elements:
+
+>>> df.nsorted(5, "population", ascending=False, keep="all")
+population GDP alpha-2
+France 65000000 2583560 FR
+Italy 59000000 1937894 IT
+Malta 434000 12011 MT
+Maldives 434000 4520 MV
+Brunei 434000 12128 BN
+
+To order by the largest values in column "population" and break ties
+according to the smallest values in column "GDP", we can specify
+multiple columns and ascending orders like in the next example.
+
+>>> df.nsorted(3, ["population", "GDP"], ascending=[False, True])
+population GDP alpha-2
+France 65000000 2583560 FR
+Italy 59000000 1937894 IT
+Maldives 434000 4520 MV
+"""
+return selectn.SelectNFrame(
+self,
+n=n,
+keep=keep,
+columns=columns,
+).nsorted(ascending=ascending)
+
 def nlargest(
 self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = "first"
 ) -> DataFrame:
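The `nsorted` docstring pins down the tie handling for `keep='first'`, `'last'` and `'all'`. As a way to check those semantics against the docstring examples, here is a minimal sort-based sketch. `nsorted_reference` is a hypothetical name used only for illustration; it is not part of pandas and not the `selectn.SelectNFrame` implementation in this PR. It assumes that a stable `sort_values` reproduces the `keep='first'` order, that reversing the rows first reproduces `keep='last'`, and that `keep='all'` means "the first `n` rows plus every row tied with the `n`-th row on `columns`". It runs a full sort, so it offers none of the performance benefit the PR is after, and it does not replicate the dtype check described in the Notes.

```python
import numpy as np
import pandas as pd


def nsorted_reference(df, n, columns, ascending, keep="first"):
    """Hypothetical sort-based stand-in for the proposed DataFrame.nsorted.

    Assumption: not the PR's implementation (which delegates to
    selectn.SelectNFrame); it performs a full stable sort and only serves
    to pin down the expected rows and their order.
    """
    # A single label becomes a one-element list of labels.
    columns = list(columns) if isinstance(columns, list) else [columns]
    # A single bool applies to every column.
    if isinstance(ascending, bool):
        ascending = [ascending] * len(columns)
    if len(ascending) != len(columns):
        raise ValueError("ascending must have the same length as columns")
    if keep not in ("first", "last", "all"):
        raise ValueError("keep must be 'first', 'last' or 'all'")

    # keep='last': reverse the rows so the stable sort prefers later occurrences.
    source = df.iloc[::-1] if keep == "last" else df
    ordered = source.sort_values(columns, ascending=list(ascending), kind="stable")

    if keep != "all" or n <= 0 or n >= len(ordered):
        return ordered.head(max(n, 0))

    # keep='all': also return every row tied with the n-th row on `columns`.
    boundary = ordered[columns].iloc[n - 1]
    tied = (ordered[columns] == boundary).all(axis=1).to_numpy()
    within_n = np.arange(len(ordered)) < n
    return ordered.iloc[within_n | tied]


df = pd.DataFrame(
    {
        "population": [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
        "GDP": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],
        "alpha-2": ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"],
    },
    index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"],
)

print(nsorted_reference(df, 3, "population", ascending=False, keep="last").index.tolist())
# ['France', 'Italy', 'Brunei']
print(nsorted_reference(df, 3, ["population", "GDP"], ascending=[False, True]).index.tolist())
# ['France', 'Italy', 'Maldives']
```

The multi-column call matches the last docstring example: ties on "population" are broken by ascending "GDP", so Maldives (4520) comes before Malta and Brunei.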
@@ -7457,6 +7611,9 @@ def nlargest(
 descending order. The columns that are not specified are returned as
 well, but not used for ordering.

+This method is equivalent to
+``df.nsorted(n, columns, ascending=False)``.
Review comment: I'd prefer if this line was removed; including
+
 This method is equivalent to
 ``df.sort_values(columns, ascending=False).head(n)``, but more
 performant.
@@ -7485,6 +7642,8 @@ def nlargest(
 --------
 DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in
 ascending order.
+DataFrame.nsorted : Return the first `n` rows ordered by `columns` in
+the order given in `ascending`.
 DataFrame.sort_values : Sort DataFrame by the values.
 DataFrame.head : Return the first `n` rows without re-ordering.
@@ -7553,7 +7712,7 @@ def nlargest(
 Italy 59000000 1937894 IT
 Brunei 434000 12128 BN

-When using ``keep='all'``, the number of element kept can go beyond ``n``
+When using ``keep='all'``, the number of elements kept can go beyond ``n``
 if there are duplicate values for the smallest element, all the
 ties are kept:
@@ -7584,7 +7743,7 @@ def nlargest(
 Italy 59000000 1937894 IT
 Brunei 434000 12128 BN
 """
-return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nlargest()
+return self.nsorted(n=n, columns=columns, ascending=False, keep=keep)

 def nsmallest(
 self, n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = "first"
@@ -7596,6 +7755,9 @@ def nsmallest(
 ascending order. The columns that are not specified are returned as
 well, but not used for ordering.

+This method is equivalent to
+``df.nsorted(n, columns, ascending=True)``.
Review comment: Ditto.
+
 This method is equivalent to
 ``df.sort_values(columns, ascending=True).head(n)``, but more
 performant.
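With this PR, `nlargest` and `nsmallest` become `nsorted` calls with a fixed `ascending` flag, so whatever tie order they produce today is what `nsorted` has to preserve. The docstrings state the equivalence with `sort_values(...).head(n)`; the check below is a sketch that compares released pandas `nlargest`/`nsmallest` (default `keep="first"`) against a stable sort followed by `head(n)`. It does not call `nsorted` itself, since the PR is still open. `matches_stable_sort` is a hypothetical helper, the frame reuses the docstring's numeric columns, and `n` is kept strictly between 0 and `len(df)` on the assumption that the boundary cases may take a different code path.

```python
import pandas as pd


def matches_stable_sort(df, column, n, method):
    """Hypothetical helper: compare df.nlargest/nsmallest with a stable sort + head(n)."""
    ascending = method == "nsmallest"
    selected = getattr(df, method)(n, column)  # keep="first" is the default
    by_sort = df.sort_values(column, ascending=ascending, kind="stable").head(n)
    return selected.index.equals(by_sort.index)


df = pd.DataFrame(
    {
        "population": [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
        "GDP": [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],
    },
    index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"],
)

# "population" contains ties, so this also exercises the keep="first" tie order.
results = [
    matches_stable_sort(df, column, n, method)
    for method in ("nlargest", "nsmallest")
    for column in ("population", "GDP")
    for n in range(1, len(df))
]
print(all(results))
```

A stable sort is used on purpose: with the default quicksort, the relative order of tied rows is not guaranteed, so an unstable `sort_values(...).head(n)` is only equivalent up to the order of ties.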
@@ -7623,6 +7785,8 @@ def nsmallest(
 --------
 DataFrame.nlargest : Return the first `n` rows ordered by `columns` in
 descending order.
+DataFrame.nsorted : Return the first `n` rows ordered by `columns` in
+the order given in `ascending`.
 DataFrame.sort_values : Sort DataFrame by the values.
 DataFrame.head : Return the first `n` rows without re-ordering.
@@ -7715,7 +7879,7 @@ def nsmallest(
 Anguilla 11300 311 AI
 Nauru 337000 182 NR
 """
-return selectn.SelectNFrame(self, n=n, keep=keep, columns=columns).nsmallest()
+return self.nsorted(n=n, columns=columns, ascending=True, keep=keep)

 def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:
 """
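The Notes section of the new `nsorted` docstring says that selecting on `object` or `category` columns raises ``TypeError``. Because the diff routes `nlargest` and `nsmallest` through `nsorted`, that restriction should line up with what released pandas already enforces for those two methods. The sketch below exercises only the released methods on a small illustrative subset of the docstring's data; `raises_type_error` and the "alpha-2 (category)" column are hypothetical additions for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": [1937894, 2583560, 12011], "alpha-2": ["IT", "FR", "MT"]},
    index=["Italy", "France", "Malta"],
)
# Hypothetical extra column holding the same codes as a categorical.
df["alpha-2 (category)"] = df["alpha-2"].astype("category")


def raises_type_error(method, column):
    """Hypothetical helper: True if df.<method>(2, column) raises TypeError."""
    try:
        getattr(df, method)(2, column)
    except TypeError:
        return True
    return False


print(df.nlargest(2, "GDP").index.tolist())  # ['France', 'Italy']
print(raises_type_error("nlargest", "alpha-2"))  # True: string/object column
print(raises_type_error("nsmallest", "alpha-2 (category)"))  # True: category column
```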