Skip to content

Conversation

@HyukjinKwon
Copy link
Member

What changes were proposed in this pull request?

TDB

Why are the changes needed?

TDB

Does this PR introduce any user-facing change?

TDB

How was this patch tested?

TDB

This is a followup of the discussion in apache#21157. See the PR and the
linked JIRA ticket for context and motivation.
@HyukjinKwon HyukjinKwon marked this pull request as draft September 23, 2020 11:22
@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Sep 23, 2020

Test build #129027 has finished for PR 29851 at commit db3eafa.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2020

Test build #129029 has finished for PR 29851 at commit d837158.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tvalentyn
Copy link

Hi, @HyukjinKwon - do you know what are the next steps on addressing this issue? Do you plan to work on it? If not, is there any chance you can help find an owner? Thanks!

@HyukjinKwon
Copy link
Member Author

I plan to remove this out at the end. Investigating.

@tvalentyn
Copy link

Thanks a lot. Some context: this monkeypatch makes it difficult for other libraries, like tensorflow-extended (tfx) to maintain interoperability for pyspark and requires rather inconvenient hacks to work around it. We would much rather prefer to fix this in pyspark instead.

Do we need help or input from other pyspark maintainers to move forward with the fix?

@rcrowe-google
Copy link

Is this fixed in Spark 3?

@tvalentyn
Copy link

tvalentyn commented Oct 8, 2020

According to #23008 (comment), it's not fixed in 3.0.

@Tagar
Copy link

Tagar commented Oct 8, 2020

@rcrowe-google @tvalentyn edit: I didn't read carefully; I do see that it affects TF too.

This was affecting PyTorch and was fixed from PyTorch side yesterday - pytorch/pytorch#45870

Yes it would be nice to be fixed from PySpark side too. From what I understand, a potential fix in PySpark would require Python 3.8. @HyukjinKwon knows these details better .

@HyukjinKwon HyukjinKwon deleted the remove-hacks branch December 7, 2020 02:07
@hauntsaninja
Copy link
Contributor

Hello! I recently got bitten by the monkeypatching of namedtuple. Anything I can do to help move this along? :-)

@HyukjinKwon
Copy link
Member Author

Hey, yeah we should remove this away since cloudpickle is now backed by C-pickle (Python 3.8+) which is as fast as the regular pickle. I think we can switch to use cloudpickle after removing the hack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants