Skip to content

Conversation

lithomas1
Copy link
Contributor

@lithomas1 lithomas1 added the IO Pickle read_pickle, to_pickle label Dec 3, 2023
@lithomas1 lithomas1 added this to the 2.1.4 milestone Dec 3, 2023

# Old pandas (<=1.3.0) will call this function with placements
# that aren't necessarily BlockPlacements when unpickling
if not isinstance(placement, BlockPlacement):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done in _setstate__ or something? a lot of effort has gone into optimizing this function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also maybe deprecate reading of such-old pickles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea on deprecating reading old pickles, I'll do that as a follow-up (1.3.5 is 2 years old at this point I think).

How would I change __setstate__? I'm not familiar at all with pickle stuff.

I thought __reduce__ calls new_block directly in 1.3.5.

cpdef __reduce__(self):
# We have to do some gymnastics b/c "ndim" is keyword-only
from functools import partial
from pandas.core.internals.blocks import new_block
args = (self.values, self.mgr_locs.indexer)
func = partial(new_block, ndim=self.ndim)
return func, args

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, __reduce__ sounds like the right place for this

Copy link
Contributor Author

@lithomas1 lithomas1 Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the old __reduce__ (from 1.3.5) is getting called though, which is leading to the problem, since it directly calls new_block.

The newer reduce goes through _unpickle_block, which does the conversion to BlockPlacement for us.

Are we OK just taking the perf hit for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need patching in compat.pickle_compat then. That is always a PITA.

i think id be against taking the perf hit for something so old since it would be hit a zillion times.

@lithomas1
Copy link
Contributor Author

Huh, the pickle_compat route is even simpler.

I just re-routed new_block to go through _unpickle_block.
Hopefully, that should be right.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mroeschke
Copy link
Member

Thanks so much for looking into this @lithomas1

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Dec 5, 2023
@lithomas1 lithomas1 deleted the fix-read-old-pkl branch December 5, 2023 00:40
lithomas1 added a commit that referenced this pull request Dec 5, 2023
…m pandas 1.3.5) (#56335)

Backport PR #56308: REGR: Fix reading old pickles from pandas 1.3.5

Co-authored-by: Thomas Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Pickle read_pickle, to_pickle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.read_pickle not backwards compatible from v2.1.0 to v1.3.4
4 participants