Skip to content

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 13, 2023
@rhshadrach rhshadrach requested a review from mroeschke June 13, 2023 14:57
@mroeschke mroeschke added this to the 2.1 milestone Jun 13, 2023
@mroeschke mroeschke merged commit 5edc2cc into pandas-dev:main Jun 13, 2023
@mroeschke
Copy link
Member

Nice! Thanks @rhshadrach

@rhshadrach rhshadrach deleted the stack_sort_bug branch June 13, 2023 18:10
Comment on lines +2040 to +2050
expected_index = MultiIndex(
levels=[[0, 1, 2, 3], [0, 1]],
codes=[[1, 1, 0, 0, 2, 2, 3, 3], [1, 0, 1, 0, 1, 0, 1, 0]],
)
expected = DataFrame(
{
0: [0, 1, 0, 1, 0, 1, 0, 1],
1: [2, 3, 2, 3, 2, 3, 2, 3],
},
index=expected_index,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke - I messed up the order of the codes here, it should be [0, 1, 0, 1...] for the 2nd level. I believe this PR didn't actually change any behavior.

In attempting to fix this, I'm finding myself questioning what sort=True should be doing in stack. Should it be equivalent to just calling result.sort_index(), or should it be only sorting the levels that are added to the index from the columns? The latter would be equivalent to:

result = result.sort_index(level=np.arange(self.index.nlevels, result.index.nlevels))

Either one of these breaks ~20 tests in tests/frame/test_stack_unstack.py. As far as I can tell, only some code paths through DataFrame.stack do sorting at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the most predictable result is for sort=True to be equivalent to result.sort_index instead of only sorting the added levels.

When adding the sort keyword I was more in the mindset of sort=False per #15105 of ensuring added level didn't sort the entire result so I'm pretty new to contemplating what sort=True should truly do.

Copy link
Member Author

@rhshadrach rhshadrach Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - digging through that issue helped. It seems like the issue wasn't necessarily the order of the values in the MultiIndex (which is a combination of the order of the levels and the associated codes), but rather the order of the levels themselves. I'm finding that sort=True/False only controls the order of the levels (and not the values). I'm going to think on this a bit more - it's becoming hard to introduce #53515 with this behavior (even before the introduction of sort=False).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understanding that sort=True isn't supposed to sort the values helped; I'm going to revert this PR.

rhshadrach added a commit that referenced this pull request Jun 21, 2023
mroeschke pushed a commit that referenced this pull request Jun 21, 2023
…evels" (#53760)

Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels (#53637)"

This reverts commit 5edc2cc.
punndcoder28 pushed a commit to punndcoder28/pandas that referenced this pull request Jun 21, 2023
…evels" (pandas-dev#53760)

Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels (pandas-dev#53637)"

This reverts commit 5edc2cc.
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…andas-dev#53637)

* BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels

* revert ignoring warnings
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…evels" (pandas-dev#53760)

Revert "BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels (pandas-dev#53637)"

This reverts commit 5edc2cc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.stack with sort=True and unsorted MultiIndex levels
2 participants