HDFS data locality has a bug when cluster node names are not full host names #290

With #216, the Spark driver sends tasks to the executors whose nodes hold the tasks' HDFS data on local disks. This is done in two steps:
1. We map executor pod IPs to the names of the cluster nodes that the executor pods run on.
2. We compare those cluster node names with the host names of the nodes that the data node daemons run on.
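
To make the two steps concrete, here is a minimal sketch. It is not the code from #216; `LocalityMatcher`, `isDataLocal`, and the map and set arguments are hypothetical names chosen for illustration, and it assumes the comparison is an exact string match.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class LocalityMatcher {
    // Step 1: look up the cluster node name for an executor pod IP.
    // Step 2: check that node name against the host names of the
    //         data nodes that hold the task's HDFS data.
    static boolean isDataLocal(String executorPodIp,
                               Map<String, String> podIpToNodeName,
                               Set<String> dataNodeHosts) {
        String nodeName = podIpToNodeName.get(executorPodIp);
        // Exact string comparison: "myhost" and "myhost.mydomain" do not match.
        return nodeName != null && dataNodeHosts.contains(nodeName);
    }
}
```

Under this assumption, an executor on node `myhost` is treated as local only if the data node host list also contains exactly `myhost`.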

Step (2) has a minor bug when cluster node names are not fully qualified host names. For example, the cluster node name may be just `myhost` whereas the full name is `myhost.mydomain`. We observed this bug in an HDFS [experiment](https://github.com/apache-spark-on-k8s/spark/issues/206#issuecomment-297776635) on Google Cloud GKE.

The fix is simple. If the comparison with the short name fails, get the full host name using `InetAddress.getCanonicalHostName` and use the result for the comparison. I'll send a PR shortly.
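
A rough sketch of that fallback, not the actual PR: `HostNameFallback`, `canonicalName`, and `matches` are hypothetical names, and the resolver is passed in as a parameter only so the logic can be exercised without DNS. The one real API used is `InetAddress.getByName(...).getCanonicalHostName()`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, for illustration only.
final class HostNameFallback {
    // Resolves a node name to its canonical host name via the JVM's resolver.
    // Returns the input unchanged if the name cannot be resolved. Note that
    // getCanonicalHostName may return a textual IP address when no fully
    // qualified name is available.
    static String canonicalName(String nodeName) {
        try {
            return InetAddress.getByName(nodeName).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return nodeName;
        }
    }

    // Tries the node name as-is first, and resolves it only if that fails.
    static boolean matches(String nodeName,
                           Set<String> dataNodeHosts,
                           UnaryOperator<String> resolver) {
        if (dataNodeHosts.contains(nodeName)) {
            return true;
        }
        return dataNodeHosts.contains(resolver.apply(nodeName));
    }

    // Convenience overload that uses the DNS-backed resolver above.
    static boolean matches(String nodeName, Set<String> dataNodeHosts) {
        return matches(nodeName, dataNodeHosts, HostNameFallback::canonicalName);
    }
}
```

Trying the short name first keeps the common case free of DNS lookups. The lookup happens only when the direct comparison fails.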
