Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Feb 4, 2015

testPackage in StreamingContextSuite often throws SparkException because its ssc is not shut down gracefully. Not affect the unit test but I think we can make it graceful.

@srowen
Copy link
Member

srowen commented Feb 4, 2015

All the other calls to stop() in tests are un-graceful; why is this one specially problematic? or should they all be graceful? Also why change the logic to not stop the SparkContext here?

@SparkQA
Copy link

SparkQA commented Feb 4, 2015

Test build #26746 timed out for PR 4364 at commit 682a52f after a configured wait of 120m.

@viirya
Copy link
Member Author

viirya commented Feb 4, 2015

I think that is because it uses TestReceiver to produce data. In this test suite, only testPackage and other two unit tests use TestReceiver or similar receiver. And the other two unit tests are both used to test stop ssc gracefully. I think whether stop SparkContext or not should not affect it.

@SparkQA
Copy link

SparkQA commented Feb 4, 2015

Test build #26750 timed out for PR 4364 at commit c232af1 after a configured wait of 120m.

@viirya
Copy link
Member Author

viirya commented Feb 4, 2015

I don't see any error but the test failed...

@srowen
Copy link
Member

srowen commented Feb 4, 2015

@viirya It timed out, which may actually be related to waiting for the test to gracefully complete!

@tdas
Copy link
Contributor

tdas commented Feb 4, 2015

Yes, I agree with @srowen . Might be worth investigating why stopGracefully does not stop at all.

@viirya viirya changed the title [Minor] Let the test stop gracefully and don't throw exception [SPARK-5615] Let the test stop gracefully and don't throw exception Feb 5, 2015
@viirya
Copy link
Member Author

viirya commented Feb 5, 2015

After investigating that, I found that sometimes the receiver will be registered into tracker after ssc.stop is called. So the receiver doesn't get the StopReceiver message from the tracker. And ssc.stop waits it forever in graceful mode. It is needed to call ssc.awaitTermination before stopping ssc.

@SparkQA
Copy link

SparkQA commented Feb 5, 2015

Test build #26836 has finished for PR 4364 at commit 685875d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Feb 5, 2015

@srowen @tdas I think I figure out the reason and solve it. Please take a look. Thanks!

@tdas
Copy link
Contributor

tdas commented Feb 5, 2015

Could you explain why adding the await termination did the trick?? Are there issues like if you stop() too soon after you start(), the thing may get stuck indefinitely?

@viirya
Copy link
Member Author

viirya commented Feb 6, 2015

Yes. That is because ReceiverTracker.ReceiverLauncher has been revised recently to introduce the graceful mode that waits the all receivers to quit before it stops. It has no timeout setting now.

So if you stop() soon after you start() and you stop() in graceful way, then you will possibly hit this problem and get stuck indefinitely.

Adding the await termination can do the trick. And we should add a timeout to ReceiverTracker.ReceiverLauncher too. Do you need me to add that in this pr?

@viirya
Copy link
Member Author

viirya commented Feb 8, 2015

@srowen @tdas Do you think is it worth fixing this? Or leave it untouched?

@tdas
Copy link
Contributor

tdas commented Feb 9, 2015

I think we should fix that problem first. stop() immediately after start()
should not cause problems. Lets fix that first, then we can see how to fix
this problem. Do you want to post a patch for that?

On Sun, Feb 8, 2015 at 8:36 AM, Liang-Chi Hsieh [email protected]
wrote:

@srowen https://github.com/srowen @tdas https://github.com/tdas Do
you think is it worth fixing this? Or leave it untouched?


Reply to this email directly or view it on GitHub
#4364 (comment).

@viirya
Copy link
Member Author

viirya commented Feb 9, 2015

Okay I would submit a patch for that later.

@srowen
Copy link
Member

srowen commented Feb 18, 2015

@viirya Are we closing this PR for the moment? what's the status of the JIRA, given the discussion?

@viirya
Copy link
Member Author

viirya commented Feb 18, 2015

Without this pr, running testPackage in StreamingContextSuite sometimes will still throw SparkException. As I said, it doesn't affect the unit test. But I think the Exception and error message are annoying. How do you think? Are we closing this?

@srowen
Copy link
Member

srowen commented Feb 18, 2015

@viirya it sounds like @tdas prefers to focus on fixing the more fundamental issue you identified above. If that's going to happen and also fixes this, then I don't think it's worth committing this, if it's just a band-aid for a cosmetic issue. But is that the case?

@viirya
Copy link
Member Author

viirya commented Feb 18, 2015

Yes, this pr is just used to avoid causing the SparkException message. I agreed that we can close it now and focus on the another issue.

@viirya viirya closed this Feb 18, 2015
@viirya viirya deleted the fix_test_exception branch December 27, 2023 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants