Conversation

@zsxwing (Member) commented Jul 8, 2015

@SparkQA commented Jul 8, 2015

Test build #36742 has finished for PR 7276 at commit d9a3e72.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing changed the title from [SPARK-8882][Streaming][WIP]Add a new Receiver scheduling mechanism to [SPARK-8882][Streaming]Add a new Receiver scheduling mechanism Jul 8, 2015
@SparkQA commented Jul 8, 2015

Test build #36797 has finished for PR 7276 at commit 27acd45.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 8, 2015

Test build #36798 timed out for PR 7276 at commit ca6fe35 after a configured wait of 175m.

@zsxwing (Member, Author) commented Jul 9, 2015

retest this please

@SparkQA commented Jul 9, 2015

Test build #36868 timed out for PR 7276 at commit ca6fe35 after a configured wait of 175m.

@SparkQA commented Jul 9, 2015

Test build #36919 has finished for PR 7276 at commit 2c86a9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor commented:

@JoshRosen Pinging you to review this part of the PR that touches SparkContext.

Contributor commented:

One high-level question, which perhaps has been addressed elsewhere: why can't we use the existing submitJob method that returns a SimpleFutureAction?

Member Author commented:

BTW, I used the name submitAsyncJob because with an overloaded submitJob, type inference does not work well and many existing call sites of submitJob would no longer compile.
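
To illustrate the inference problem, here is a minimal sketch (hypothetical method names, not the real SparkContext API): once a second same-arity overload exists, Scala can no longer infer the lambda's parameter type at existing call sites.

  // Hypothetical simplified signatures, for illustration only.
  object Single {
    def submit[T](f: Int => T): T = f(1)
  }
  object Overloaded {
    def submit[T](f: Int => T): T = f(1)
    def submit[T](f: String => T): T = f("a")
  }

  Single.submit(x => x + 1)        // compiles: x is inferred as Int
  // Overloaded.submit(x => x + 1) // error: missing parameter type -- with two
  // same-arity overloads, the compiler cannot pick an expected function type
  // for the lambda, so callers must annotate x explicitly.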

@SparkQA commented Jul 11, 2015

Test build #37090 timed out for PR 7276 at commit 28d1bee after a configured wait of 175m.

Contributor commented:

Doesn't it return a list of executors?

@JoshRosen (Contributor) commented:

My high-level comment / question: why not use the existing submitJob that returns a SimpleFutureAction rather than defining your own asynchronous version of that method?

Contributor commented:

This TODO needs to be removed.

@zsxwing (Member, Author) commented Jul 14, 2015

> My high-level comment / question: why not use the existing submitJob that returns a SimpleFutureAction rather than defining your own asynchronous version of that method?

The problem with SimpleFutureAction is that its onComplete calls awaitResult. So with SimpleFutureAction, each job blocks one thread, which means each running Receiver needs one thread on the driver. That may exhaust the driver's threads.

  // From SimpleFutureAction: the onComplete callback simply blocks a pooled
  // thread inside awaitResult() until the job finishes.
  override def onComplete[U](func: (Try[T]) => U)(implicit executor: ExecutionContext) {
    executor.execute(new Runnable {
      override def run() {
        func(awaitResult())
      }
    })
  }
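
For contrast, a minimal sketch (hypothetical helper names, not the actual Spark API) of the blocking pattern above next to a callback-based one that parks no thread while the job runs:

  import scala.concurrent.{ExecutionContext, Promise}
  import scala.util.Try

  // Blocking style: one pooled thread per running job sits in awaitResult().
  def blockingOnComplete[T](awaitResult: () => Try[T])(func: Try[T] => Unit)(
      implicit executor: ExecutionContext): Unit = {
    executor.execute(new Runnable {
      override def run(): Unit = func(awaitResult()) // blocks this thread
    })
  }

  // Non-blocking style: the scheduler completes a Promise when the job ends;
  // the callback fires on completion and no thread is parked in the meantime.
  def asyncOnComplete[T](result: Promise[T])(func: Try[T] => Unit)(
      implicit executor: ExecutionContext): Unit = {
    result.future.onComplete(func)
  }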

Contributor commented:

I think this messaging should be done differently, to align with the style of the other messages. It does not make sense for the receiver to fetch all the locations where it is allowed to start. Rather, it should be ShouldStartReceiver, which returns true or false, and accordingly the receiver is started (or not) by the ReceiverSupervisor.

Contributor commented:

Or, if we want to maintain the naming scheme (which uses Register instead of Start), it may be better to name the new one ValidateLocation(streamId, host).

Contributor commented:

Actually, we can merge in the change you made in #6294, where ReceiverSupervisor.onReceiverStart() (that is, the RegisterReceiver message) is called before the receiver is started, and the receiver is only started if RegisterReceiver returns true. The ReceiverTracker can use this mechanism to prevent the receiver from starting on unwanted executors: when the tracker receives RegisterReceiver, it returns !stopping && scheduleReceiver(receiverId).contains(host).

This would solve the race condition as well as check the location without introducing additional messaging, wouldn't it?
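
Concretely, the check might look like this rough sketch (simplified, hypothetical names; the real ReceiverTracker and ReceiverSupervisor communicate over RPC endpoints):

  case class RegisterReceiver(streamId: Int, host: String)

  class TrackerSketch(scheduleReceiver: Int => Seq[String]) {
    @volatile private var stopping = false

    // Reply to RegisterReceiver: the supervisor starts the receiver only if
    // this returns true, i.e. the tracker is not stopping and the host is
    // still among the scheduled candidate executors.
    def handleRegisterReceiver(msg: RegisterReceiver): Boolean =
      !stopping && scheduleReceiver(msg.streamId).contains(msg.host)
  }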

Member Author commented:

Sounds great. One potential issue: if we reject a receiver via !scheduleReceiver(receiverId).contains(host) when receiving RegisterReceiver, we may keep restarting the receiver if the scheduled executors stay busy, e.g., running long-running jobs.

My current implementation rejects a receiver for a mismatched location only when restarting it.

Contributor commented:

Good point. How about this: the scheduler returns the following.

  • If there are >= 3 zero-weight options, return all of them.
  • Otherwise, return the 3 best options.

Before submitting the job, run the scheduler and use those options to launch the job. When RegisterReceiver is called, run the scheduler again to get updated options and verify that the host where the task was actually launched is still among the best options. That should greatly reduce the likelihood of the condition above.
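
In code, the proposed selection rule might look like this minimal sketch (weightOf is a hypothetical load measure; lower weight means a less loaded executor):

  def candidates(executors: Seq[String], weightOf: String => Double): Seq[String] = {
    val idle = executors.filter(e => weightOf(e) == 0.0)
    if (idle.size >= 3) idle                 // >= 3 zero-weight options: return them all
    else executors.sortBy(weightOf).take(3)  // otherwise: the 3 best options
  }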

How does that sound?

Member Author commented:

Sounds good. I will update it.

@tdas (Contributor) commented Jul 14, 2015

Two major high-level comments that need significant code changes before we proceed further:

  • The messaging can be optimized; see the other comments.
  • The ReceiverLauncher is becoming quite complex; would it make sense to move it to a separate file?

@SparkQA commented Jul 23, 2015

Test build #38144 has finished for PR 7276 at commit 05daf9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor commented:

Is there any drawback to implementing it the current way? Is it that with 100 receivers, there will be 100 blocked threads?

Member Author commented:

> Is it that with 100 receivers, there will be 100 blocked threads?

Right. That's why we need #7385.

Contributor commented:

It might be more consistent with scheduleReceivers to pass the whole Receiver object rather than receiverId and preferredLocations.

Member Author commented:

We use receiverId and preferredLocations because we don't currently store the Receiver object; it is only available when the receiver is launched.

@SparkQA commented Jul 23, 2015

Test build #38202 has finished for PR 7276 at commit 137b257.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor commented:

Would it be possible to express these tests in the nicer style used for ReceiverTracker, e.g. testScheduler(numReceivers = 5, preferredLocation = false, allocation = "0|1|2|3|0")?

Member Author commented:

Sorry, I forgot to reply to this one. I don't want to use an exact assertion because the scheduling policy should be free to assign receivers to any executors as long as the final result is even.

E.g., if we have 2 executors and 2 receivers, the scheduling result could be (receiver 0 -> executor 0, receiver 1 -> executor 1), or (receiver 0 -> executor 1, receiver 1 -> executor 0).

It looks hard to express that as something like allocation = "0|1|2|3|0".
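
For illustration, the looser check being described might look like this sketch (the Map[Int, String] shape of assignment, receiverId -> executor, is an assumption):

  def assertBalanced(assignment: Map[Int, String], executors: Seq[String]): Unit = {
    val counts = executors.map(e => assignment.values.count(_ == e))
    // Even means the busiest and idlest executors differ by at most one receiver.
    assert(counts.max - counts.min <= 1, s"unbalanced: ${executors.zip(counts)}")
  }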

Contributor commented:

Got it, makes sense.

@tdas (Contributor) commented Jul 24, 2015

I think this PR is close to LGTM. I will wait for #7385 to get merged, then this PR can be updated accordingly.

@tdas (Contributor) commented Jul 27, 2015

@zsxwing I chatted with @pwendell for his opinion on this, and it's fine to merge this as is, without #7385. Once that is merged, we can always improve the performance later. Not that it's really going to hurt performance in the current state: it will just be a bunch of sleeping blocked threads, with no constant thread switching. What do you think?

@zsxwing (Member, Author) commented Jul 28, 2015

@tdas Agree. We can do it later.

@tdas (Contributor) commented Jul 28, 2015

Alright, then I am merging this!

@asfgit closed this in daa1964 Jul 28, 2015
@zsxwing deleted the receiver-scheduling branch Jul 28, 2015 02:49