Skip to content

Conversation

johannes-mueller
Copy link
Contributor

@johannes-mueller johannes-mueller commented Nov 23, 2021

BaseMultiIndexCodesEngine._extract_level_codes relies on the assumption that
get_indexer() is always usable as any index level is_unique. However, that
is not the case for overlapping IntervalIndex levels. get_indexer() can
not be used on an IntervalIndex that is_overlapping even if it is_unique.

This patch uses get_indexer_for() instead.

This patch checks if the index level is an overlapping IntervalIndex and then
calls get_indexer_non_unique() instead.

We should not switch to get_indexer_non_unique() for other index levels than
overlapping IntervalIndex, even though that actually would work as well.
That is because get_indexer_non_unique() comes with a performance penalty of
roughly 30% compared to get_indexer(), measured by simple %timeit
measurements.

@johannes-mueller johannes-mueller force-pushed the bugfix/join_multiindex-overlapping-intervals-44096 branch from 5e3a141 to 8f1027f Compare November 23, 2021 13:22
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, otherwise lgtm

@johannes-mueller johannes-mueller force-pushed the bugfix/join_multiindex-overlapping-intervals-44096 branch from 8f1027f to 7939a68 Compare November 24, 2021 07:47
@jreback jreback added this to the 1.4 milestone Nov 25, 2021
@jreback jreback added Interval Interval data type MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 25, 2021
@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

@johannes-mueller can you merge master and ping on green

…#44096)

`BaseMultiIndexCodesEngine._extract_level_codes` relies on the assumption that
`get_indexer()` is always usable as any index level `is_unique`.  However, that
is not the case for overlapping `IntervalIndex` levels.  `get_indexer()` can
not be used on an `IntervalIndex` that `is_overlapping` even if it `is_unique`.

This patch uses `get_indexer_for()` instead.
@johannes-mueller johannes-mueller force-pushed the bugfix/join_multiindex-overlapping-intervals-44096 branch from 7939a68 to 83d9ab7 Compare November 25, 2021 17:33
@johannes-mueller
Copy link
Contributor Author

@jreback rebased

@jreback jreback merged commit f0d4d8d into pandas-dev:master Nov 25, 2021
@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

thanks @johannes-mueller

@johannes-mueller johannes-mueller deleted the bugfix/join_multiindex-overlapping-intervals-44096 branch November 26, 2021 07:29
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
johannes-mueller added a commit to boschresearch/pandas that referenced this pull request Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as
`IntervalIndex`es can be unique and overlapping.

Similar to pandas-dev#44588
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interval Interval data type MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Joining MultiIndex with IntervalIndex level fails when IntervalIndex level is overlapping
3 participants