Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -325,6 +325,7 @@ Indexing
 - Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`)
 - Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`)
 - Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtype :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`)
+- Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`)
 - Bug in :meth:`Series.mask` with ``inplace=True`` or setting values with a boolean mask with small integer dtypes incorrectly raising (:issue:`45750`)
 - Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`)
 - Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`)
5 changes: 4 additions & 1 deletion pandas/core/indexes/base.py
@@ -5226,7 +5226,10 @@ def __getitem__(self, key):
             # takes 166 µs + 2.1 ms and cuts the ndarray.__getitem__
             # time below from 3.8 ms to 496 µs
             # if we already have ndarray[bool], the overhead is 1.4 µs or .25%
-            key = np.asarray(key, dtype=bool)
+            if is_extension_array_dtype(getattr(key, "dtype", None)):
Member

Would it be clearer or more performant to check isinstance(key, ExtensionArray)?

Member Author

I thought about this too, but unfortunately it does not work, since key is a Series here.
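
The point in this exchange can be checked with a small sketch. It assumes the public import paths `pandas.api.extensions.ExtensionArray` and `pandas.api.types.is_extension_array_dtype`; the variable names are illustrative only and do not come from the PR.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray
from pandas.api.types import is_extension_array_dtype

# Nullable boolean Series, the same shape of key as in the PR's test.
ser = pd.Series([True, False, pd.NA], dtype="boolean")

# A Series wraps an ExtensionArray but is not one itself,
# so an isinstance check on the key would miss it.
series_is_ea = isinstance(ser, ExtensionArray)

# The array backing the Series is an ExtensionArray.
array_is_ea = isinstance(ser.array, ExtensionArray)

# The dtype-based check used in the diff covers both Series and arrays.
dtype_is_ea = is_extension_array_dtype(getattr(ser, "dtype", None))

# Keys without a dtype attribute (for example a plain list) fall through
# to the np.asarray branch.
list_is_ea = is_extension_array_dtype(getattr([True, False], "dtype", None))

print(series_is_ea, array_is_ea, dtype_is_ea, list_is_ea)
```

The dtype check is therefore the broader test: it does not depend on which container carries the extension dtype.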

+                key = key.to_numpy(dtype=bool, na_value=False)
+            else:
+                key = np.asarray(key, dtype=bool)

         result = getitem(key)
         # Because we ruled out integer above, we always get an arraylike here
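
As a rough illustration of what the two branches in this hunk do with a nullable boolean key, the sketch below compares `np.asarray` with `to_numpy(dtype=bool, na_value=False)`. It is a standalone approximation, not the code path inside `Index.__getitem__`; the names `asarray_raised`, `mask`, and `selected` are made up for the example.

```python
import numpy as np
import pandas as pd

ser = pd.Series([True, False, pd.NA], dtype="boolean")

# Converting directly to a NumPy bool array fails when NA is present,
# because there is no bool representation for a missing value.
try:
    np.asarray(ser, dtype=bool)
    asarray_raised = False
except (TypeError, ValueError):
    asarray_raised = True

# Supplying na_value tells pandas how to fill the missing slots,
# so NA is treated as "not selected".
mask = ser.to_numpy(dtype=bool, na_value=False)

# Indexing with the plain NumPy mask keeps only the positions that are True.
selected = ser.index[mask]

print(asarray_raised, mask, list(selected))
```

Treating `NA` as `False` is the design choice questioned later in the review: it selects only the positions known to be `True`.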
8 changes: 8 additions & 0 deletions pandas/tests/indexes/base_class/test_indexing.py
@@ -76,3 +76,11 @@ def test_get_loc_nan_object_dtype_nonmonotonic_nonunique(self):
         # we don't match at all on mismatched NA
         with pytest.raises(KeyError, match="NaT"):
             idx.get_loc(NaT)
+
+
+def test_getitem_boolean_ea_indexer():
+    # GH#45806
+    ser = pd.Series([True, False, pd.NA], dtype="boolean")
+    result = ser.index[ser]
+    expected = Index([0])
Member (@jbrockmendel, Feb 15, 2022)

Is there a discussion where it was decided that this is the correct behavior? It could plausibly raise instead.

Could the test also use ser._values or Index(ser) for the key?

Member Author

This works for a regular bool Series, so I thought this should work too. We can add those test cases as well, but if a Series key should work, we have to keep this one.

Member

@phofl just so I'm clear, your comment is responding to the "could also use..." part of mine? If so, that's fine. I'm still unclear on the other part: why are we not raising on pd.NA?

Member Author

Ah now I got you.

Why should we raise on pd.NA?

Member

> Why should we raise on pd.NA?

I think of `idx[mask]` as akin to `[idx[n] for n in range(len(mask)) if mask[n]]`, which would raise.

+    tm.assert_index_equal(result, expected)
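
The list-comprehension analogy from the review thread can be tried out directly. The sketch below assumes a pandas version that includes this change (1.5 or later) for the `idx[mask]` line; `idx`, `mask`, `picked`, and the other names are illustrative and not taken from the PR.

```python
import pandas as pd

idx = pd.Index([10, 20, 30])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# The element-wise analogy: evaluating `if mask[n]` on pd.NA raises,
# because the truth value of NA is ambiguous.
try:
    picked = [idx[n] for n in range(len(mask)) if mask[n]]
    comprehension_raised = False
except TypeError:
    comprehension_raised = True

# With the behavior added in this PR, the masked lookup treats NA as False
# and returns only the positions that are True.
selected = idx[mask]

print(comprehension_raised, list(selected))
```

The two results differ, which is the substance of the disagreement in the thread: the comprehension fails on `NA`, while the merged behavior silently drops it.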