Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Mar 30, 2015

This PR replaced the following Actors to RpcEndpoint:

  1. HeartbeatReceiver
  2. ExecutorActor
  3. BlockManagerMasterActor
  4. BlockManagerSlaveActor
  5. CoarseGrainedExecutorBackend and subclasses
  6. CoarseGrainedSchedulerBackend.DriverActor

This is the first PR. I will split the work of SPARK-6602 to several PRs for code review.

@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29406 has finished for PR 5268 at commit 705245d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to send some reply here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ic it is done in doAsync. Is this ok with the RPC contract that we are replying from a different thread?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. RpcCallContext should be thread-safe and can be called in another thread. It's a very common usage. I will add docs about it.

@rxin
Copy link
Contributor

rxin commented Mar 31, 2015

I took a very quick look at this. @zsxwing If there are any tricky parts, can you point them out on this pull request to make it easier to review?

@SparkQA
Copy link

SparkQA commented Mar 31, 2015

Test build #29481 has finished for PR 5268 at commit 30a9036.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@zsxwing
Copy link
Member Author

zsxwing commented Mar 31, 2015

There are two things that's worth to review carefully:

Calling setupEndpointRef or setupThreadSafeEndpointRef to create RpcEndpoint. When I was doing this PR, I found that it's inconvenient to check if using setupEndpointRef or setupThreadSafeEndpointRef correctly. The problem is I need to read the implementation of RpcEndpoint in one file at first, search and switch to other places that creating RpcEndpoint. I'm wondering if the current design of setupEndpointRef and setupThreadSafeEndpointRef can be improved.

One possible improvement is using a special trait, such as ThreadSafeRpcEndpoint to mark the RpcEndpoint that needs to be thread-safe, and only have one method, setupEndpointRef, to setup RpcEndpoint. We can use isInstanceOf to distinguish them.

If a message is passed to send, the receiver side should handle it in receive. If a message is passed to sendWithReply or askWithReply, the receiver side should handle it in receiveAndReply. If using a wrong method, the message will be lost (of cause, will be logged).

@zsxwing
Copy link
Member Author

zsxwing commented Mar 31, 2015

does send ever throw an exception? what happens to that thread?

self.send may throw an IllegalArgumentException, because both self.send and onStop may called at the same time.

I updated the contract of self. Now if a RpcEndpointRef is stopped, self will return null. I also added Utils.tryLogNonFatalError, which will catch all non-fatal errors, and use it to wrap the codes in Runnables.

@rxin
Copy link
Contributor

rxin commented Apr 3, 2015

I like the idea of ThreadSafeRpcEndpoint.

I need to think a little bit more about send/sendWithReply. I do find it error prone as is.

@rxin
Copy link
Contributor

rxin commented Apr 3, 2015

The changes mostly lgtm. Let's add the ThreadSafeRpcEndpoint interface for differentiation.

zsxwing added 2 commits April 4, 2015 21:29
Conflicts:
	core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/util/Utils.scala
	yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29710/
Test FAILed.

@rxin
Copy link
Contributor

rxin commented Apr 4, 2015

Merging this in master. Thanks.

@asfgit asfgit closed this in f15806a Apr 4, 2015
@zsxwing zsxwing deleted the rpc-rewrite branch December 23, 2015 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants