Skip to content

Conversation

@dsaxton
Copy link
Contributor

@dsaxton dsaxton commented Feb 27, 2020

# In other series, the leads to False, so do that here too
ret[mask] = False
if opname == "__ne__":
ret[mask & (self._codes == other_codes)] = True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This redundancy was intentional and an attempt to "reuse" the mask calculation above since I think it may short-circuit when mask is False, but could just make it (self._codes == -1) & (other_codes == -1) as well. I also think __ne__ is the only situation that'd have to be special-cased here out of the comparison operators. Also worth pointing out that this is assuming we're dealing with NaN rather than NA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self._codes == -1) & (other_codes == -1)

I think this would be more clear. Also there might be a cached _isnan attribute for this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsaxton im working on cleaning up these comparisons, am confused by this line. can we make this use the more common pattern

fill_value = True if op is operator.ne else False
[...]

ret = op(self._codes, other_codes)
mask = (self._codes == -1) | (other_codes == -1)
ret[mask] = fill_value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel That is much simpler and seems correct to me

with pytest.raises(TypeError, match=msg):
op(data, other)

def test_not_equal_with_na(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we get this right in cases with e.g. datetime64 or datetime64tz categories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work, added some parameterization over other category types

@jreback jreback added Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 3, 2020
@jreback jreback added this to the 1.1 milestone Mar 3, 2020
@jreback jreback merged commit 821aa25 into pandas-dev:master Mar 3, 2020
@jreback
Copy link
Contributor

jreback commented Mar 3, 2020

thanks @dsaxton

@dsaxton dsaxton deleted the noteq-cat branch April 9, 2020 02:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Categorical NaN behaviour different from a str

3 participants