Skip to content

Conversation

@xuanyuanking
Copy link
Member

@xuanyuanking xuanyuanking commented Aug 21, 2018

What changes were proposed in this pull request?

Currently ContextBarrierState is only covered by end-to-end test in BarrierTaskContextSuite, add BarrierCoordinatorSuite to test both classes.

How was this patch tested?

UT newly added in ContextBarrierStateSuite.

@xuanyuanking
Copy link
Member Author

cc @jiangxb1987

@SparkQA
Copy link

SparkQA commented Aug 21, 2018

Test build #94994 has finished for PR 22165 at commit 21bd1c3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 21, 2018

Test build #95007 has finished for PR 22165 at commit 21bd1c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

cc @gatorsmile @cloud-fan

@gatorsmile
Copy link
Member

cc @jiangxb1987 @mengxr

@gatorsmile
Copy link
Member

@xuanyuanking thanks for helping the test coverage!

@jiangxb1987
Copy link
Contributor

I'll make one pass of this later today :) Thanks for taking this task!

@xuanyuanking
Copy link
Member Author

My pleasure, just find this during glance over jira in recent days. :)

@jiangxb1987
Copy link
Contributor

One general idea is that we don't need to rely on the RPC framework to test ContextBarrierState, just mock RpcCallContexts should be enough (haven't go into detail so correct me if I'm wrong).
We shall cover the following scenarios:

  • RequestToSync that carries different numTasks;
  • RequestToSync that carries different barrierEpoch;
  • Collect enough RequestToSync messages before timeout;
  • Don't collect enough RequestToSync messages before timeout;
  • Handle RequestToSync from different stage attempts concurrently;
  • Make sure we clear all the internal data under each case.

@xuanyuanking
Copy link
Member Author

@jiangxb1987 Great thanks for your comment!

One general idea is that we don't need to rely on the RPC framework to test ContextBarrierState, just mock RpcCallContexts should be enough.

Actually I also want to implement like this at first also as you asked in jira, but ContextBarrierState is the private inner class in BarrierCoordinator. Could I do the refactor of moving ContextBarrierState out of BarrierCoordinator? If that is permitted I think we can just mock RpcCallContext to reach this.

We shall cover the following scenarios:

Pretty cool for the list, the 5 in front scenarios are including in currently implement, I'll add the last checking work of Make sure we clear all the internal data under each case. after we reach an agreement.

@xuanyuanking
Copy link
Member Author

Could I do the refactor of moving ContextBarrierState out of BarrierCoordinator?

gental ping @jiangxb1987, I still follow up this. :)

@jiangxb1987
Copy link
Contributor

I think it should be fine to make ContextBarrierState private[spark] to test it, WDYT @mengxr ?

barrierEpoch = 0),
timeout = new RpcTimeout(5 seconds, "rpcTimeOut"))
// sleep for waiting barrierEpoch value change
Thread.sleep(500)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use explicit sleep. It basically means adding 0.5 seconds to total test time and flakyness. Use conditional wait, for example: bfb7439#diff-a90010f459c27926238d7a4ce5a0aca1R107

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for guidance, done in ecf12bd. I'll also pay attention in the future work.

}.start()
}
// sleep for waiting barrierEpoch value change
Thread.sleep(500)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@mengxr
Copy link
Contributor

mengxr commented Sep 5, 2018

I didn't make a full pass over the tests. @jiangxb1987 let me know if you need me to take a pass.

}

// Check for clearing internal data, visible for test only.
private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add internal data clean check in Xingbo's comments: #22165 (comment).

@SparkQA
Copy link

SparkQA commented Sep 6, 2018

Test build #95761 has finished for PR 22165 at commit ecf12bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

gental ping @jiangxb1987

// to identify each barrier() call. It shall get increased when a barrier() call succeeds, or
// reset when a barrier() call fails due to timeout.
private var barrierEpoch: Int = 0
private[spark] var barrierEpoch: Int = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it better to add a getter method? This is because to make var visible may cause unexpected update of the variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, done in ec8466a

@SparkQA
Copy link

SparkQA commented Sep 18, 2018

Test build #96173 has finished for PR 22165 at commit ec8466a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Sep 18, 2018

Test build #96176 has finished for PR 22165 at commit ec8466a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

xuanyuanking commented Sep 19, 2018

UT fixed by #22452.

@xuanyuanking
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Sep 19, 2018

Test build #96216 has finished for PR 22165 at commit ec8466a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

gental ping @jiangxb1987 @kiszk

private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null

// Get currently barrier epoch, visible for test only.
private[spark] def getBarrierEpoch(): Int = barrierEpoch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make barrierEpoch visible for testing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#22165 (comment) As the comment here, need revert back?

}

// Check for clearing internal data, visible for test only.
private[spark] def cleanCheck(): Boolean = requesters.isEmpty && timerTask == null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: cleanCheck() -> isInternalStateClear()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done in fd4d150.

if (epoch != barrierEpoch) {
requester.sendFailure(new SparkException(s"The request to sync of $barrierId with " +
s"barrier epoch $barrierEpoch has already finished. Maybe task $taskId is not " +
s"barrier epoch $epoch has already finished. Maybe task $taskId is not " +
Copy link
Member Author

@xuanyuanking xuanyuanking Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During write the UT for ContextBarrierState, I think this is a typo in message? @jiangxb1987 Please correct me if I'm wrong.

@SparkQA
Copy link

SparkQA commented Sep 27, 2018

Test build #96702 has finished for PR 22165 at commit 8cd78a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 27, 2018

Test build #96703 has finished for PR 22165 at commit fd4d150.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

Actually my original thinking was like this:

      val state = new ContextBarrierState(barrierId, numTasks)
      val requester = mockRequester()
      val request = forgeRequest(numTasks, stageId, stageAttemptId, taskAttemptId, barrierEpoch)
      state.handleRequest(requester, request)
      // Verify states
      ...
      // Verify cleanup
      ...

So you don't have to launch a SparkContext for the test. Could you please check whether this is feasible?

@xuanyuanking xuanyuanking changed the title [SPARK-25017][Core] Add test suite for BarrierCoordinator and ContextBarrierState [SPARK-25017][Core] Add test suite for ContextBarrierState Oct 13, 2018
}

test("normal test for single task") {
val barrierCoordinator = new BarrierCoordinator(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't have to launch a SparkContext for the test. Could you please check whether this is feasible?

Thanks for Xingbo's guidance and sorry for misunderstand at first. That's feasible. But maybe this is the last thing not clear cause we still need a real BarrierCoordinator. Because a mock one will cause the timer NPE. Thanks @jiangxb1987

timer.schedule(timerTask, timeoutInSecs * 1000)

@SparkQA
Copy link

SparkQA commented Oct 13, 2018

Test build #97337 has finished for PR 22165 at commit aea2fa0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #98516 has finished for PR 22165 at commit aea2fa0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Copy link
Member Author

gental ping @jiangxb1987

@github-actions
Copy link

github-actions bot commented Jan 8, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 8, 2020
@github-actions github-actions bot closed this Jan 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants