Support HDFS rack locality #349
This is likely the last sub-item of HDFS locality umbrella issue #206.
When using HDFS, the Spark driver looks up which rack a given datanode or executor belongs to, so that it can send tasks to executors that can read their input data from datanodes on the same rack. (This happens as a fallback when node locality fails.) To support rack locality, the Spark driver loads a configurable topology plugin into its JVM.
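For reference, the topology plugin is an implementation of Hadoop's DNSToSwitchMapping interface, selected by the net.topology.node.switch.mapping.impl configuration key (the default is a script-based mapping). Below is a minimal sketch of such a plugin; the class name and the IP-prefix-to-rack rule are made up for illustration:

import java.util.{List => JList}
import scala.collection.JavaConverters._
import org.apache.hadoop.net.DNSToSwitchMapping

class MyTopologyPlugin extends DNSToSwitchMapping {
  // Map each host to a network location such as "/rack1";
  // "/default-rack" is the conventional fallback.
  override def resolve(names: JList[String]): JList[String] =
    names.asScala.map { host =>
      if (host.startsWith("10.0.1.")) "/rack1" else "/default-rack"
    }.asJava

  // Invalidate any cached mappings; this sketch caches nothing.
  override def reloadCachedMappings(): Unit = ()

  override def reloadCachedMappings(names: JList[String]): Unit = ()
}

The driver would pick this up via something like spark.hadoop.net.topology.node.switch.mapping.impl=com.example.MyTopologyPlugin, since Spark copies spark.hadoop.* properties into the driver's Hadoop configuration.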
TaskSchedulerImpl has a getRackForHost method that is meant to be overridden by a subclass to call the topology plugin. TaskSetManager then calls getRackForHost to populate the pendingTasksForRack map; it gets called for each datanode host associated with the input data blocks of pending tasks:
private def addPendingTask(index: Int) {
  ...
  for (rack <- sched.getRackForHost(loc.host)) {
    pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer) += index
  }
}
getRackForHost is also called with executor addresses when the driver is about to send tasks to executors:
private def dequeueTask(execId: String, host: String, maxLocality: TaskLocality.Value)
    : Option[(Int, TaskLocality.Value, Boolean)] = {
  ...
  if (TaskLocality.isAllowed(maxLocality, TaskLocality.RACK_LOCAL)) {
    for {
      rack <- sched.getRackForHost(host)
      index <- dequeueTaskFromList(execId, host, getPendingTasksForRack(rack))
    } {
      return Some((index, TaskLocality.RACK_LOCAL, false))
    }
  }
YARN mode implements getRackForHost in YarnScheduler:
override def getRackForHost(hostPort: String): Option[String] = {
  val host = Utils.parseHostPort(hostPort)._1
  Option(RackResolver.resolve(sc.hadoopConfiguration, host).getNetworkLocation)
}
We can add similar code in KubernetesTaskSchedulerImpl. The datanode handling will be exactly like the above. For executors, we would map pod IP addresses to the name/IP of the cluster node running the pod, and then call the topology plugin with the cluster node address, as in the sketch below. I'll send a PR soon.
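A rough sketch of what the override could look like; the podIpToNodeAddress helper is hypothetical, not the final design from the PR:

import org.apache.hadoop.yarn.util.RackResolver
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.TaskSchedulerImpl
import org.apache.spark.util.Utils

class KubernetesTaskSchedulerImpl(sc: SparkContext) extends TaskSchedulerImpl(sc) {

  // Hypothetical lookup from an executor pod IP to the address of the cluster
  // node running the pod; a real implementation would consult the Kubernetes
  // API server (the pod's status.hostIP) and cache the result.
  private def podIpToNodeAddress(podIp: String): Option[String] = None

  override def getRackForHost(hostPort: String): Option[String] = {
    val host = Utils.parseHostPort(hostPort)._1
    // Datanode hosts resolve directly, exactly like YarnScheduler above;
    // executor pod IPs are first translated to their cluster node's address.
    val nodeAddress = podIpToNodeAddress(host).getOrElse(host)
    Option(RackResolver.resolve(sc.hadoopConfiguration, nodeAddress).getNetworkLocation)
  }
}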