
Conversation

@BryanCutler
Member

Changed AppClient to be non-blocking in receiveAndReply by using a separate thread to wait for response and reply to the context. The threads are managed by a thread pool. Also added unit tests for the AppClient interface.
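The pattern described above can be sketched as follows. This is a simplified stand-in, not Spark's actual `RpcEndpoint` API: the `Context` class and pool name are made up for illustration. The key point is that `receiveAndReply` returns immediately and the blocking work runs on a pool thread.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Simplified sketch of the change described above (not Spark's actual API):
// instead of blocking inside receiveAndReply, the handler hands the slow
// work off to a thread pool and replies to the context from a worker thread.
object NonBlockingReplySketch {
  private val askAndReplyPool = Executors.newCachedThreadPool()

  // Stand-in for Spark's RpcCallContext
  final class Context {
    @volatile var replied: Option[String] = None
    def reply(r: String): Unit = { replied = Some(r) }
  }

  def receiveAndReply(msg: String, context: Context): Unit = {
    askAndReplyPool.execute(new Runnable {
      override def run(): Unit = {
        Thread.sleep(50) // simulate a blocking ask to the master
        context.reply("ack:" + msg)
      }
    })
    // receiveAndReply itself returns immediately
  }

  def shutdown(): Unit = {
    askAndReplyPool.shutdown()
    askAndReplyPool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```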

@BryanCutler
Member Author

@zsxwing would you mind checking this out? I noticed there were no existing unit tests for AppClient, so I added them. I couldn't think of a way to test that the calls are non-blocking, any ideas?

Member Author

Is there a way to tell if the client fails to connect to the master? It does log an error, but if I wanted to check to see if the AppClient endpoint was registered after calling start(), there doesn't seem to be a way.

Contributor

So the private volatile var registered lets us check if we succeeded at registering with a master. Is that what you were looking for?

Member Author

Well, I was trying to think of making a test for AppClient in the case of an unreachable Master. The rpc env logs an exception right away, but the only way to tell from outside the AppClient is to set a listener with a connected callback, then poll to see if it ever gets hit. Maybe this isn't really an issue in practice though.
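The listener-plus-poll approach described above can be sketched like this. The listener API here is made up for illustration (it is not Spark's actual AppClientListener); it just shows a connected callback setting a flag that a test polls with a deadline.

```scala
// Sketch of the listener-plus-poll testing approach described above.
// The Listener type is a stand-in, not Spark's AppClientListener.
object ConnectedPollSketch {
  final class Listener {
    @volatile var connected = false
    def onConnected(): Unit = { connected = true }
  }

  // Poll until the connected callback has fired, or give up at the deadline.
  def waitUntilConnected(listener: Listener, timeoutMs: Long): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (!listener.connected && System.currentTimeMillis() < deadline) {
      Thread.sleep(10) // poll interval
    }
    listener.connected
  }
}
```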

Member

the only way to tell from outside the AppClient is to set a listener with a connected callback, then poll to see if it ever gets hit. Maybe this isn't really an issue in practice though.

I think that's fine.

@SparkQA

SparkQA commented Oct 28, 2015

Test build #44495 has finished for PR 9317 at commit 5e155cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

retest this please

@SparkQA

SparkQA commented Oct 29, 2015

Test build #44639 has finished for PR 9317 at commit 5e155cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

@holdenk if you wouldn't mind taking a look at this, I'd appreciate it!

@holdenk
Contributor

holdenk commented Oct 30, 2015

Sure thing :)

Contributor

this code (the new runnable sending reply) seems to be duplicated a few times, maybe factor it out into a helper function?

Member Author

At first I was trying to put these two calls into something like Utils.tryWithSafeFinally with blocks as arguments, but the call ended up looking a little confusing as to what was happening.

Maybe a regular function like this would be better, although a little less flexible:

private def receiveAndReplyAsync[T](masterRef: RpcEndpointRef, context: RpcCallContext,
                                     msg: T): Unit = {
  // execute ask and reply in thread pool
  // (body sketched for illustration; the pool name and Boolean ask are assumptions)
  askAndReplyThreadPool.execute(new Runnable {
    override def run(): Unit = context.reply(masterRef.askWithRetry[Boolean](msg))
  })
}

@BryanCutler
Member Author

Thanks for the feedback @holdenk !

@BryanCutler
Member Author

retest this please

@SparkQA

SparkQA commented Oct 30, 2015

Test build #44705 has finished for PR 9317 at commit 5e155cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

retest this please

@BryanCutler
Member Author

bad luck today I guess :(
jenkins retest this please

@BryanCutler
Member Author

retest this please

@SparkQA

SparkQA commented Oct 31, 2015

Test build #44722 has finished for PR 9317 at commit ab3e929.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

retest this please

@SparkQA

SparkQA commented Nov 2, 2015

Test build #44824 has finished for PR 9317 at commit 95d2499.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class RegisteredExecutor(hostname: String) extends CoarseGrainedClusterMessage

@BryanCutler
Member Author

retest this please

@SparkQA

SparkQA commented Nov 3, 2015

Test build #44939 has finished for PR 9317 at commit 95d2499.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

Hi @zsxwing, could you take a look at this? There have been some superficial test failures; the last one, in CoarseMesosSchedulerBackendSuite, looks like it could be real, but as far as I can tell it does not use the AppClient.

cc @rxin @vanzin

Member

You can use org.apache.spark.util.ThreadUtils.newDaemonCachedThreadPool instead.
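For reference, `ThreadUtils.newDaemonCachedThreadPool` is essentially a cached thread pool whose worker threads are daemons with a common name prefix. A rough JDK-only approximation (this is a sketch, not Spark's actual implementation):

```scala
import java.util.concurrent.{Executors, ThreadFactory, ThreadPoolExecutor}
import java.util.concurrent.atomic.AtomicInteger

object DaemonPoolSketch {
  // Rough JDK-only approximation of ThreadUtils.newDaemonCachedThreadPool:
  // a cached pool whose worker threads are daemons named with a common prefix.
  def newDaemonCachedThreadPool(prefix: String): ThreadPoolExecutor = {
    val count = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, prefix + "-" + count.incrementAndGet())
        t.setDaemon(true)
        t
      }
    }
    Executors.newCachedThreadPool(factory).asInstanceOf[ThreadPoolExecutor]
  }
}
```

Using daemon threads means the pool will not keep the JVM alive on shutdown, which is the behavior the Spark utility provides.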

@zsxwing
Member

zsxwing commented Nov 3, 2015

@BryanCutler Could you also make the following variables volatile? They are accessed in multiple threads.

  private var endpoint: RpcEndpointRef = null
  private var appId: String = null
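The point of the suggestion: without `@volatile`, a write made on one thread is not guaranteed to be visible to reads on another. A minimal stand-in illustration (String replaces the real field types, which this sketch does not reproduce):

```scala
// Stand-in illustration of the suggested change: @volatile ensures a write
// made on one thread is visible to reads on other threads. String stands in
// for the real RpcEndpointRef/appId field types.
final class SharedAppState {
  @volatile var endpoint: String = null
  @volatile var appId: String = null
}
```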

Member

nit: indentation. The correct indentation is:

    private def receiveAndReplyAsync[T](
        masterRef: RpcEndpointRef, context: RpcCallContext, msg: T): Unit = {

@BryanCutler
Member Author

Thanks for the feedback @zsxwing !

@BryanCutler Could you also make the following variables volatile? They are accessed in multiple threads.

private var endpoint: RpcEndpointRef = null
private var appId: String = null

I made this change, but would you mind clarifying a little where these are shared? From what I can tell, I think the RpcEndpointRef gets wrapped in the NettyRpcCallContext but appId is just copied to the RequestExecutors and KillExecutors messages.

@zsxwing
Member

zsxwing commented Nov 4, 2015

I made this change, but would you mind clarifying a little where these are shared? From what I can tell, I think the RpcEndpointRef gets wrapped in the NettyRpcCallContext but appId is just copied to the RequestExecutors and KillExecutors messages.

E.g., requestTotalExecutors is called from SparkDeploySchedulerBackend, and that's not the thread that sets endpoint and appId.

Member

nit: must be notified.

@SparkQA

SparkQA commented Nov 4, 2015

Test build #45044 has finished for PR 9317 at commit cd5329c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member Author

E.g., requestTotalExecutors is called from SparkDeploySchedulerBackend, and that's not the thread that sets endpoint and appId.

Got it, thanks!

@SparkQA

SparkQA commented Nov 4, 2015

Test build #45048 has finished for PR 9317 at commit 4f46cfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Nov 10, 2015

LGTM.

CC @rxin to take a final look

Contributor

It's better to use an atomic reference to be more explicit here. I'm going to merge this and update it.
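A sketch of the atomic-reference style (field types simplified to String; this is an illustration, not the actual followup patch):

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

// Sketch: atomic holders make the cross-thread sharing explicit at every
// use site, instead of relying on readers knowing a field is volatile.
// String stands in for the real field types.
final class AtomicAppState {
  val registered = new AtomicBoolean(false)
  val appId = new AtomicReference[String](null)
}
```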

@asfgit asfgit closed this in a398905 Nov 11, 2015
asfgit pushed a commit that referenced this pull request Nov 11, 2015
…veAndReply`

Changed AppClient to be non-blocking in `receiveAndReply` by using a separate thread to wait for response and reply to the context.  The threads are managed by a thread pool.  Also added unit tests for the AppClient interface.

Author: Bryan Cutler <[email protected]>

Closes #9317 from BryanCutler/appClient-receiveAndReply-SPARK-10827.

(cherry picked from commit a398905)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 11, 2015
This is a followup for #9317 to replace volatile fields with AtomicBoolean and AtomicReference.

Author: Reynold Xin <[email protected]>

Closes #9611 from rxin/SPARK-10827.

(cherry picked from commit e1bcf6a)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 11, 2015
This is a followup for #9317 to replace volatile fields with AtomicBoolean and AtomicReference.

Author: Reynold Xin <[email protected]>

Closes #9611 from rxin/SPARK-10827.
dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
This is a followup for apache#9317 to replace volatile fields with AtomicBoolean and AtomicReference.

Author: Reynold Xin <[email protected]>

Closes apache#9611 from rxin/SPARK-10827.
@BryanCutler BryanCutler deleted the appClient-receiveAndReply-SPARK-10827 branch November 18, 2015 21:37