
Conversation

@cloud-fan
Contributor

@cloud-fan cloud-fan commented Mar 22, 2016

What changes were proposed in this pull request?

Track executor information like host and port, cache size, running tasks.

How was this patch tested?

manual test

@cloud-fan
Contributor Author

cc @rxin

@SparkQA

SparkQA commented Mar 22, 2016

Test build #53775 has finished for PR 11888 at commit f76de44.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [SPARK-14069][SQL][WIP] Improve SparkStatusTracker to also track executor information [SPARK-14069][SQL] Improve SparkStatusTracker to also track executor information Mar 23, 2016
@cloud-fan
Contributor Author

I found it's difficult to write tests for this. Since it just collects information that is already exposed by the system, is it worth testing again? cc @rxin

@rxin
Contributor

rxin commented Mar 23, 2016

It's probably ok.

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53884 has finished for PR 11888 at commit fe80390.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please.

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53904 has finished for PR 11888 at commit fe80390.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please.

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53921 has finished for PR 11888 at commit fe80390.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53925 has finished for PR 11888 at commit fe80390.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
 * Returns a list of all known executors, represented as strings in the format "host:port".
 */
def getExecutors(): Array[String] = {
Contributor
It seems kind of arbitrary that getExecutors returns host:port but not IDs. I think it's better to make a SparkExecutorInfo or something and expose the host:port there, along with other things like cache size, numRunningTasks, etc. Then in the future we can add more things we want to expose without tying ourselves to the host:port identifier.

Contributor
If I just want a list of executors, why shouldn't I be able to get them? I think it makes sense to have a more detailed version (maybe replacing the following 2), but having a simple one that returns just the list of executors seems to make sense too.

Contributor
The other thing is I don't know if we want to query the scheduler every time we want a list of executors.

Contributor
we can expose both, but I'd rather call it something more explicit like getExecutorHostPort or something. Elsewhere in Spark I would think getExecutors: Array[String] returns the executor IDs.

Contributor
getExecutorList

Contributor Author
getExecutorList LGTM, I'll rename it to that.

Contributor
wait, how is getExecutorList different from getExecutors? Why not just be more specific what the strings are?

Contributor
actually let me move this to the main thread so it doesn't get collapsed.

@SparkQA

SparkQA commented Mar 30, 2016

Test build #54526 has finished for PR 11888 at commit 0b9400e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

@rxin I don't understand the distinction between getExecutors and getExecutorList. Why not just be more explicit about what the strings represent?

The other problem is the keys to the other maps are also expected to be host:port but I would normally expect them to be executor IDs. Also host can be confusing because it can be in one of many different formats (e.g. IP addr, all the hostname formats in EC2). We already expose the executor IDs in a few places (e.g. UI, autoscaling API) so the user already knows what they are.

@andrewor14
Contributor

My proposal:

def getExecutorIds(): Array[String]
def getExecutorInfo(executorId: String): Option[SparkExecutorInfo]
// If you want we can also expose the following:
def getExecutorHostPorts(): Array[String]

private class SparkExecutorInfoImpl(
    val id: String,
    val hostport: String, // host:port
    ...
    val cacheSize: Long,
    val numRunningTasks: Int)
 extends SparkExecutorInfo

This is more consistent with the existing status API, where we have things like

def getActiveJobIds(): Array[Int]
def getJobInfo(jobId: Int): Option[SparkJobInfo]

private class SparkJobInfoImpl(
    val jobId: Int,
    ...)
  extends SparkJobInfo

@rxin
Contributor

rxin commented Mar 30, 2016

What are executor ids? is that even an external concept?

@rxin
Contributor

rxin commented Mar 30, 2016

Can you paste me what an executor looks like? If you just tell me "executor id" as an end user, I have no clue what you are talking about.

@andrewor14
Contributor

[three screenshots of the Spark UI, taken 2016-03-30]

scala> df.count()
16/03/30 15:21:58 INFO SparkContext: Starting job: count at <console>:27
...
16/03/30 15:21:58 INFO SparkDeploySchedulerBackend: Launching task 7 on executor id: 3 hostname: 192.168.0.209.
16/03/30 15:21:58 INFO SparkDeploySchedulerBackend: Launching task 8 on executor id: 0 hostname: 192.168.0.209.
16/03/30 15:21:58 INFO SparkDeploySchedulerBackend: Launching task 9 on executor id: 5 hostname: 192.168.0.209.
16/03/30 15:21:58 INFO SparkDeploySchedulerBackend: Launching task 10 on executor id: 1 hostname: 192.168.0.209.
16/03/30 15:21:58 INFO SparkDeploySchedulerBackend: Launching task 11 on executor id: 4 hostname: 192.168.0.209.
16/03/30 15:21:58 INFO SparkDeploySchedulerBackend: Launching task 12 on executor id: 2 hostname: 192.168.0.209.

@rxin
Contributor

rxin commented Mar 30, 2016

Yea that integer id is completely useless to users who want to figure out what to do with their clusters.

@andrewor14
Contributor

OK, @rxin and I discussed this more offline. Our proposal is:

def getExecutorInfos: Seq[SparkExecutorInfo]

private class SparkExecutorInfoImpl(
    val host: String,
    val port: Int,
    ...
    val cacheSize: Long,
    val numRunningTasks: Int)
 extends SparkExecutorInfo

Then we don't tie ourselves down to the very specific host:port format, and we don't have to expose the executor IDs, which are just integers that don't mean much.
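A minimal, self-contained sketch of how this proposal could look from a caller's perspective (the trait, the stand-in tracker, and the sample data below are illustrations mirroring the proposal above, not Spark's actual implementation):

```scala
// Public, read-only view of an executor, as in the proposal above.
trait SparkExecutorInfo {
  def host: String
  def port: Int
  def cacheSize: Long
  def numRunningTasks: Int
}

// Implementation class mirroring SparkExecutorInfoImpl from the proposal.
class SparkExecutorInfoImpl(
    val host: String,
    val port: Int,
    val cacheSize: Long,
    val numRunningTasks: Int)
  extends SparkExecutorInfo

// Stand-in status tracker exposing getExecutorInfos.
class StatusTracker(infos: Seq[SparkExecutorInfo]) {
  def getExecutorInfos: Seq[SparkExecutorInfo] = infos
}

object Demo {
  def main(args: Array[String]): Unit = {
    val tracker = new StatusTracker(Seq(
      new SparkExecutorInfoImpl("192.168.0.209", 51234, 1024L, 3)))
    // Callers work with structured fields instead of parsing "host:port" strings.
    for (e <- tracker.getExecutorInfos) {
      println(s"${e.host}:${e.port} cache=${e.cacheSize} tasks=${e.numRunningTasks}")
    }
  }
}
```

Because the host and port are separate fields, adding cacheSize or numRunningTasks later does not change any existing caller's code.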

@SparkQA

SparkQA commented Mar 31, 2016

Test build #54577 has finished for PR 11888 at commit 39dd0ee.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

LGTM retest this please

@SparkQA

SparkQA commented Mar 31, 2016

Test build #54588 has finished for PR 11888 at commit 39dd0ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Merged into master, thanks guys.

@asfgit asfgit closed this in 0abee53 Mar 31, 2016
@JoshRosen
Contributor

This looks somewhat dodgy to me from a thread-safety perspective: executorIdToTaskCount isn't thread-safe, so we may iterate over it in SparkStatusTracker while it is concurrently updated during task-update processing. I'm going to fix this in a followup, but wanted to point it out here since ideally this should not have slipped past code review.

// Number of tasks running on each executor
private val executorIdToTaskCount = new HashMap[String, Int]

def runningTasksByExecutors(): Map[String, Int] = executorIdToTaskCount.toMap
Contributor
Adding a synchronized here would resolve the thread-safety issue, I think. I'll do this as part of a patch fixing another bug and also touching this line.
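A minimal, self-contained sketch of the synchronized fix described above (plain Scala; the class and method names here are stand-ins mirroring the quoted snippet, not Spark's actual scheduler code):

```scala
import scala.collection.mutable.HashMap

// Stand-in for the scheduler state in the snippet under review.
class TaskCounts {
  // Number of tasks running on each executor (mutable HashMap is not
  // thread-safe on its own).
  private val executorIdToTaskCount = new HashMap[String, Int]

  // Writers (task updates) synchronize on the map...
  def taskStarted(execId: String): Unit = executorIdToTaskCount.synchronized {
    executorIdToTaskCount(execId) = executorIdToTaskCount.getOrElse(execId, 0) + 1
  }

  // ...and so does the reader, which takes an immutable snapshot under the
  // lock, so iteration never races with a concurrent update.
  def runningTasksByExecutors(): Map[String, Int] = executorIdToTaskCount.synchronized {
    executorIdToTaskCount.toMap
  }
}
```

Returning the snapshot as an immutable Map means callers can iterate freely without holding any lock.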

@rxin
Contributor

rxin commented Nov 23, 2016

Why did we merge this when the description says "N/A"?

@markhamstra
Contributor

@rxin Do you mean the N/A in "How was this patch tested?" Some guy said that the lack of tests was ok. #11888 (comment)

@rxin
Contributor

rxin commented Nov 23, 2016

Yea but "TODO: tests" and tests: N/A ...

We needed to update the description.
