Skip to content

Conversation

@MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Jan 10, 2020

@MarcoGorelli MarcoGorelli changed the title 🐛 Series.diff was always setting the dtype to object [BUG] Series.diff was always setting the dtype to object Jan 10, 2020
@MarcoGorelli
Copy link
Member Author

@jreback @jbrockmendel @WillAyd thanks for your quick reviews, I've updated the PR accordingly

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the correct fix to to dispatch diff to the EA, rather than trying to fix the type after the fact? Or can we perhaps write a version of diff in terms of take and __sub__?

@jreback
Copy link
Contributor

jreback commented Jan 13, 2020

Isn't the correct fix to to dispatch diff to the EA, rather than trying to fix the type after the fact? Or can we perhaps write a version of diff in terms of take and __sub__?

this is actually a good idea

would be simple to write as a generic EA
method

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Jan 13, 2020

@TomAugspurger @jreback I don't understand.

Currently, we have

        result = algorithms.diff(com.values_from_object(self), periods)

In the case of

pd.Series([1, 2, 3], dtype="Int16").diff()

then

com.values_from_object(self)

is already converting the values to Object before even reaching algorithms.diff:

>>> com.values_from_object(self)
array([1, 2, 3], dtype=object)

So should com.values_from_object be changed?

Anyway, I realise that there's a somewhat tight deadline to get v1.0.0 out, and I'm happy to keep tackling this issue, but if you think it is (currently) beyond my capabilities, I'm happy for someone else to take over :)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 13, 2020

So should com.values_from_object be changed?

We would need to not call that, at least for EAs, but probably for all dtypes, before passing it off. I can probably take a look tomorrow.

We have a week or two for 1.0.0 final.

@MarcoGorelli MarcoGorelli force-pushed the nullable-diff branch 2 times, most recently from bdc1883 to 6ad19ca Compare January 13, 2020 15:55
@MarcoGorelli MarcoGorelli changed the title [BUG] Series.diff was always setting the dtype to object (wip) [BUG] Series.diff was always setting the dtype to object Jan 13, 2020
@MarcoGorelli MarcoGorelli force-pushed the nullable-diff branch 2 times, most recently from c4b5cb7 to ac7bb30 Compare January 13, 2020 16:06
@MarcoGorelli
Copy link
Member Author

@TomAugspurger OK thanks - I might have figured out what you were suggesting anyway, have pushed another commit

@MarcoGorelli MarcoGorelli changed the title (wip) [BUG] Series.diff was always setting the dtype to object [BUG] Series.diff was always setting the dtype to object Jan 13, 2020
@TomAugspurger
Copy link
Contributor

See #31025 for an alternative which allows EAs to define how the operation should be done.

@TomAugspurger
Copy link
Contributor

I think we'll proceed with #31025. Apologies for the duplicated effort @MarcoGorelli.

@TomAugspurger TomAugspurger added this to the No action milestone Jan 19, 2020
@MarcoGorelli MarcoGorelli deleted the nullable-diff branch January 19, 2020 17:11
@MarcoGorelli
Copy link
Member Author

It's OK, it was a good learning experience to attempt this and then read through your PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Series lose nullable integer dtype after call to .diff()

6 participants