-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-25017][Core] Add test suite for ContextBarrierState #22165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @jiangxb1987 |
|
Test build #94994 has finished for PR 22165 at commit
|
|
retest this please. |
|
Test build #95007 has finished for PR 22165 at commit
|
|
@xuanyuanking thanks for helping the test coverage! |
|
I'll make one pass of this later today :) Thanks for taking this task! |
|
My pleasure, just find this during glance over jira in recent days. :) |
|
One general idea is that we don't need to rely on the RPC framework to test
|
|
@jiangxb1987 Great thanks for your comment! Actually I also want to implement like this at first also as you asked in jira, but Pretty cool for the list, the 5 in front scenarios are including in currently implement, I'll add the last checking work of |
gental ping @jiangxb1987, I still follow up this. :) |
|
I think it should be fine to make |
| barrierEpoch = 0), | ||
| timeout = new RpcTimeout(5 seconds, "rpcTimeOut")) | ||
| // sleep for waiting barrierEpoch value change | ||
| Thread.sleep(500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use explicit sleep. It basically means adding 0.5 seconds to total test time and flakyness. Use conditional wait, for example: bfb7439#diff-a90010f459c27926238d7a4ce5a0aca1R107
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for guidance, done in ecf12bd. I'll also pay attention in the future work.
| }.start() | ||
| } | ||
| // sleep for waiting barrierEpoch value change | ||
| Thread.sleep(500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
I didn't make a full pass over the tests. @jiangxb1987 let me know if you need me to take a pass. |
| } | ||
|
|
||
| // Check for clearing internal data, visible for test only. | ||
| private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add internal data clean check in Xingbo's comments: #22165 (comment).
|
Test build #95761 has finished for PR 22165 at commit
|
|
gental ping @jiangxb1987 |
| // to identify each barrier() call. It shall get increased when a barrier() call succeeds, or | ||
| // reset when a barrier() call fails due to timeout. | ||
| private var barrierEpoch: Int = 0 | ||
| private[spark] var barrierEpoch: Int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is it better to add a getter method? This is because to make var visible may cause unexpected update of the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, done in ec8466a。
|
Test build #96173 has finished for PR 22165 at commit
|
|
retest this please. |
|
Test build #96176 has finished for PR 22165 at commit
|
|
UT fixed by #22452. |
|
retest this please. |
|
Test build #96216 has finished for PR 22165 at commit
|
|
gental ping @jiangxb1987 @kiszk |
| private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null | ||
|
|
||
| // Get currently barrier epoch, visible for test only. | ||
| private[spark] def getBarrierEpoch(): Int = barrierEpoch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make barrierEpoch visible for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#22165 (comment) As the comment here, need revert back?
| } | ||
|
|
||
| // Check for clearing internal data, visible for test only. | ||
| private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cleanCheck() -> isInternalStateClear()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done in fd4d150.
core/src/test/scala/org/apache/spark/scheduler/BarrierCoordinatorSuite.scala
Outdated
Show resolved
Hide resolved
| if (epoch != barrierEpoch) { | ||
| requester.sendFailure(new SparkException(s"The request to sync of $barrierId with " + | ||
| s"barrier epoch $barrierEpoch has already finished. Maybe task $taskId is not " + | ||
| s"barrier epoch $epoch has already finished. Maybe task $taskId is not " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During write the UT for ContextBarrierState, I think this is a typo in message? @jiangxb1987 Please correct me if I'm wrong.
|
Test build #96702 has finished for PR 22165 at commit
|
|
Test build #96703 has finished for PR 22165 at commit
|
|
Actually my original thinking was like this: So you don't have to launch a SparkContext for the test. Could you please check whether this is feasible? |
| } | ||
|
|
||
| test("normal test for single task") { | ||
| val barrierCoordinator = new BarrierCoordinator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you don't have to launch a SparkContext for the test. Could you please check whether this is feasible?
Thanks for Xingbo's guidance and sorry for misunderstand at first. That's feasible. But maybe this is the last thing not clear cause we still need a real BarrierCoordinator. Because a mock one will cause the timer NPE. Thanks @jiangxb1987
| timer.schedule(timerTask, timeoutInSecs * 1000) |
|
Test build #97337 has finished for PR 22165 at commit
|
|
retest this please. |
|
Test build #98516 has finished for PR 22165 at commit
|
|
gental ping @jiangxb1987 |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Currently
ContextBarrierStateis only covered by end-to-end test inBarrierTaskContextSuite, add BarrierCoordinatorSuite to test both classes.How was this patch tested?
UT newly added in ContextBarrierStateSuite.