[SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway #18339
Conversation
This is interesting, I've got a similar approach I've been working on in #17298, which has some issues inside of PyPI. Would that suit your needs if I extended it to allow you to enable it manually in addition to when the pipe was overloaded? Let me know. In the meantime, Jenkins ok to test.

Oh neat. #17298 looks similar to the approach we took in spylon-kernel to launch with stdout/stderr pipes redirected to the parent process and threads to read them (https://github.com/maxpoint/spylon-kernel/blob/master/spylon_kernel/scala_interpreter.py#L73). That project is based on Calysto/metakernel, which has an API for sending stdout/stderr back to kernel clients, so we use that instead. I still think it would be handy to give clients more control over how the py4j gateway is launched. For instance, if I want to use pyspark in an asyncio application, I might want to open pipes to the JVM process, but then switch them to non-blocking IO mode and hook them up to an async reader. If #17298 merges without making the threads optional and exposing the pipes for the caller to use, it's likely to be more harmful than helpful in the async situation.
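For reference, a minimal sketch of the pattern described above (the child's output redirected into pipes owned by the parent process, with a thread reading them); the command and the print-based forwarding are placeholders for illustration, not the spylon-kernel or pyspark code:

```python
import subprocess
from threading import Thread

def forward_output(pipe, emit):
    # Read the child's merged stdout/stderr line by line and hand each line
    # to whatever output channel the host application provides.
    for line in iter(pipe.readline, ""):
        emit(line.rstrip())
    pipe.close()

# Placeholder command; in the scenarios above this would be the JVM launch.
proc = subprocess.Popen(
    ["java", "-version"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,  # text mode so readline() returns str
)
reader = Thread(target=forward_output, args=(proc.stdout, print))
reader.start()
proc.wait()
reader.join()
```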
The approach taken in https://github.com/maxpoint/spylon-kernel/blob/master/spylon_kernel/scala_interpreter.py#L73 is interesting (and definitely not supported), so making it easier for kernels to get at the JVM logs as needed seems worthwhile. That being said, if the messages are piped through from the JVM to the existing stderr/stdout pipes, would that be sufficient?

Jenkins ok to test.
python/pyspark/java_gateway.py
Outdated
I'd make this _popen_kwargs to indicate its usage is possibly not super supported.
Would a comment in the docstring to that effect be better? I haven't seen _var_name used in Python projects to indicate a developer feature. (But of course, maybe I've just not seen it yet!)
python/pyspark/java_gateway.py
Outdated
Mention that this is a developer feature and may change in future versions.
And ... you already noted what I just commented above. Doh! I'll update the docstring at least.
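A sketch of the kind of docstring note discussed in this review thread; the wording and signature here are illustrative only, not the final patch:

```python
def launch_gateway(conf=None, popen_kwargs=None):
    """
    Launch a py4j gateway backed by a JVM subprocess.

    :param conf: spark configuration passed to spark-submit
    :param popen_kwargs: dictionary of kwargs to pass to subprocess.Popen
        when spawning the JVM. This is a developer feature intended for
        advanced use cases and may change in future versions.
    """
```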
Test build #81472 has finished for PR 18339 at commit

Let's get some extra eyes on this, maybe @davies or @HyukjinKwon want to take a quick look? I think it makes sense as an advanced developer API but I'm open to other ideas.

Test build #81570 has finished for PR 18339 at commit
Thanks for cc'ing me. I think I can follow the discussion and the motivation here, but I am neutral (rather -0) as

Jenkins OK to test.

I am okay with going ahead @holdenk if you think it's okay anyway.

retest this please

Test build #83998 has finished for PR 18339 at commit

Let's see what @BryanCutler thinks

ok to test

Test build #91604 has finished for PR 18339 at commit
@HyukjinKwon what re-triggered your interest in this PR?

Jenkins left a comment asking "Can one of the admins verify this patch?" again. I was thinking it was worth it given your comment above, so I just triggered the build again. I am not sure why, when, or for whom Jenkins leaves those comments on some particular PRs. I was thinking about asking on the dev mailing list if it happens one more time.

Since @HyukjinKwon's concerns for this PR have been addressed, if @parente can update this to master it would be lovely to get this in for 3+, since I'm working on some multi-language pipeline stuff which could benefit.

@holdenk Took a note to look at it this weekend.
Allow the caller to customize the py4j JVM subprocess pipes and buffers for programmatic capturing of its output.
Force-pushed 3ece21f to fa63ba7
Test build #98174 has finished for PR 18339 at commit

Test build #98175 has finished for PR 18339 at commit

@holdenk I rebased the PR and I think it's good to go if you'd like to give it another look.

Small bump in case this is still of interest for 3.x.
The longer this PR has been open, the more times I've seen the need for it; my bad on not coming back to this. Jenkins retest this please.

For clarification, I am okay. No objection.

Jenkins retest this please

@parente if you could merge in master that would trigger a Jenkins run.

Looks like Jenkins listened; everything passed, so will merge to master.

Test build #102407 has finished for PR 18339 at commit

Merged to master
Closes apache#18339 from parente/feature/SPARK-21094-popen-args.
Lead-authored-by: Peter Parente
Co-authored-by: Peter Parente
Signed-off-by: Holden Karau
What changes were proposed in this pull request?
Allow the caller to customize the py4j JVM subprocess pipes and buffers for programmatic capturing of its output.
https://issues.apache.org/jira/browse/SPARK-21094 has more detail about the use case.
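As an illustration of the intended use, a rough sketch of capturing the gateway JVM's output through the new parameter; the specific Popen keywords and the gateway.proc attribute used below are assumptions for this example, not a definitive description of the patch:

```python
import subprocess
from threading import Thread

from pyspark.java_gateway import launch_gateway

# Redirect the JVM's stdout/stderr into pipes owned by this process instead
# of letting the JVM inherit the parent's streams.
gateway = launch_gateway(popen_kwargs={
    "stdout": subprocess.PIPE,
    "stderr": subprocess.STDOUT,
    "universal_newlines": True,
})

def drain(pipe):
    # Forward each line of JVM output wherever the application wants it.
    for line in iter(pipe.readline, ""):
        print("[jvm]", line.rstrip())

# Assumes the returned gateway keeps a handle to its Popen object.
Thread(target=drain, args=(gateway.proc.stdout,), daemon=True).start()
```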
How was this patch tested?
Tested by running the pyspark unit tests locally.