@davies davies commented Feb 13, 2015

There is a chance of deadlock: the Python process waits for the ending mark from the JVM, but the mark has been eaten by a corrupted stream.

This PR checks for the ending mark from Python in a non-blocking way, so it will not be blocked by the Python process.

There is a small chance that the ending mark has been sent by the Python process but is not available yet; in that case the Python process will not be reused.

cc @JoshRosen @pwendell
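
For readers following along, the non-blocking check described above could look roughly like the sketch below. It is written in Java rather than the Scala of PythonRDD.scala, and the class name, method name, and the marker value `-4` are assumptions made for illustration, not code taken from this patch. The only library behavior it relies on is `InputStream.available()`, which reports how many bytes can be read without blocking.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the patch.
final class EndMarkCheck {
    // Assumed marker value, for illustration only.
    static final int END_OF_STREAM = -4;

    /**
     * Returns true only if a full 4-byte int is already buffered and it
     * equals the marker. Never waits for more data to arrive.
     */
    static boolean endedCleanly(DataInputStream in) {
        try {
            // available() does not block; fewer than 4 bytes means the
            // marker has not (yet) arrived, so we give up instead of waiting.
            if (in.available() < 4) {
                return false;
            }
            return in.readInt() == END_OF_STREAM;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is the one stated in the description: a marker that is sent slightly late is treated the same as a missing one, so the worker is discarded rather than reused.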

SparkQA commented Feb 13, 2015

Test build #27463 has started for PR 4601 at commit 05e1085.

  • This patch merges cleanly.

SparkQA commented Feb 14, 2015

Test build #27463 has finished for PR 4601 at commit 05e1085.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Params(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27463/

Contributor

Typo: "Tt"

@JoshRosen
Contributor

Can we add logging for the uncommon cases here? I'd add a log message for the case where the next integer is not available and a second case for when it's not END_OF_STREAM (this log message should contain the actual integer received).
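
One way to picture the three outcomes being discussed (marker not available, marker present, some other integer received) is the sketch below. It is an illustration in Java, not the Scala code under review; the class, enum, message texts, and the marker value `-4` are all assumptions, and the log level used for each case is a placeholder.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.Logger;

// Hypothetical helper, not part of the patch.
final class WorkerExitLogging {
    private static final Logger LOG = Logger.getLogger("WorkerExitLogging");
    // Assumed marker value, for illustration only.
    static final int END_OF_STREAM = -4;

    enum Outcome { ENDED_CLEANLY, MARK_NOT_AVAILABLE, UNEXPECTED_VALUE }

    /** Classifies the state of the stream without blocking, logging each case. */
    static Outcome checkAndLog(DataInputStream in) {
        try {
            if (in.available() < 4) {
                // Uncommon case 1: the next integer is not available.
                LOG.info("Ending mark not available; worker will not be reused");
                return Outcome.MARK_NOT_AVAILABLE;
            }
            int value = in.readInt();
            if (value == END_OF_STREAM) {
                LOG.info("Worker ended cleanly; it can be reused");
                return Outcome.ENDED_CLEANLY;
            }
            // Uncommon case 2: include the integer actually received.
            LOG.info("Expected END_OF_STREAM but received " + value
                    + "; worker will not be reused");
            return Outcome.UNEXPECTED_VALUE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning an enum keeps the decision about reuse separate from the decision about log level, so the level of each message can change without touching the control flow.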

SparkQA commented Feb 17, 2015

Test build #27598 has started for PR 4601 at commit 656d544.

  • This patch merges cleanly.

SparkQA commented Feb 17, 2015

Test build #27600 has started for PR 4601 at commit 890329c.

  • This patch merges cleanly.

Contributor

The "ended cleanly" case can probably stay at logInfo (or maybe logDebug), but I think we should make this case and the other error-case into warnings so that they aren't swallowed at lower log levels.

Contributor Author

It's a normal case that can happen when the user calls take(), so I think it should be INFO.

For the first one, INFO or DEBUG both work for me.

@JoshRosen
Contributor

Minor log-level nitpicking aside, this looks good to me.

Contributor Author

I'd like to use INFO here, so that we can expect one log message for each task, saying whether it was reused or not.

davies commented Feb 17, 2015

@JoshRosen updated

SparkQA commented Feb 17, 2015

Test build #27602 has started for PR 4601 at commit e15a8c3.

  • This patch merges cleanly.

SparkQA commented Feb 17, 2015

Test build #27602 has finished for PR 4601 at commit e15a8c3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27602/

SparkQA commented Feb 17, 2015

Test build #27600 has finished for PR 4601 at commit 890329c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27600/

SparkQA commented Feb 17, 2015

Test build #27598 has finished for PR 4601 at commit 656d544.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27598/

@JoshRosen
Contributor

LGTM. I'm going to merge this into master (1.4.0), branch-1.3 (1.3.0), and branch-1.2 (1.2.2).

asfgit pushed a commit that referenced this pull request Feb 17, 2015
There is a chance of deadlock: the Python process waits for the ending mark from the JVM, but the mark has been eaten by a corrupted stream.

This PR checks for the ending mark from Python in a non-blocking way, so it will not be blocked by the Python process.

There is a small chance that the ending mark has been sent by the Python process but is not available yet; in that case the Python process will not be reused.

cc JoshRosen pwendell

Author: Davies Liu <[email protected]>

Closes #4601 from davies/freeze and squashes the following commits:

e15a8c3 [Davies Liu] update logging
890329c [Davies Liu] Merge branch 'freeze' of github.com:davies/spark into freeze
2bd2228 [Davies Liu] add more logging
656d544 [Davies Liu] Update PythonRDD.scala
05e1085 [Davies Liu] check ending mark in non-block way

(cherry picked from commit ac6fe67)
Signed-off-by: Josh Rosen <[email protected]>
@asfgit asfgit closed this in ac6fe67 Feb 17, 2015
asfgit pushed a commit that referenced this pull request Feb 17, 2015
@JoshRosen
Contributor

I merged this into master (1.4.0), branch-1.3 (1.3.0), and branch-1.2 (1.2.2), but did so right before I noticed that there's a comment on JIRA suggesting that this didn't fix the freeze. I guess I was a bit too trigger-happy here since I wanted to try to squeeze a fix in for 1.3.0.

asfgit pushed a commit that referenced this pull request Feb 17, 2015
asfgit pushed a commit that referenced this pull request Feb 17, 2015
asfgit pushed a commit that referenced this pull request Feb 17, 2015
Contributor Author

@JoshRosen This does not work very well in practice: it's common to see workers that cannot be reused. I will try to find a better solution, or should we revert this? (It seems that it did not solve the freeze problem.)

Contributor

Yeah, let's revert and continue to investigate.

Contributor Author

Great, thanks!

@JoshRosen
Contributor

Reverted in master (1.4.0), branch-1.3 (1.3.0), and branch-1.2 (1.2.2).

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Feb 24, 2015
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Feb 24, 2015