This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Description
With #216, the Spark driver sends tasks to the right executors, i.e. the ones whose local disks hold the tasks' HDFS data. This is done in two steps:
1. Map executor pod IPs to the cluster node names that the executor pods run on.
2. Compare those cluster node names with the host names that the datanode daemons run on.
Step (2) has a minor bug when cluster node names are not fully qualified host names, e.g. the cluster node name is just myhost while the full name is myhost.mydomain. We observed this bug in an HDFS experiment on Google Cloud GKE.
The fix is simple: when the comparison with the short name fails, resolve the full host name using InetAddress.getCanonicalHostName and compare against that instead. I'll send a PR shortly.
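A minimal sketch of the fallback comparison described above. The helper name `matchesDataNodeHost` is hypothetical (not the actual PR code); it first tries a direct string comparison and only falls back to DNS resolution via `InetAddress.getCanonicalHostName` when the short name does not match:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NodeNameMatcher {

    // Hypothetical helper illustrating the fix: returns true when the
    // executor's cluster node name matches the datanode's host name,
    // falling back to the fully qualified name when the short name differs.
    static boolean matchesDataNodeHost(String clusterNodeName, String dataNodeHost) {
        // Fast path: names already match (both short or both fully qualified).
        if (clusterNodeName.equals(dataNodeHost)) {
            return true;
        }
        try {
            // Fallback: resolve e.g. "myhost" to "myhost.mydomain" and retry.
            String fqdn = InetAddress.getByName(clusterNodeName).getCanonicalHostName();
            return fqdn.equals(dataNodeHost);
        } catch (UnknownHostException e) {
            // Name cannot be resolved; treat as no locality match.
            return false;
        }
    }

    public static void main(String[] args) {
        // Exact match succeeds without any DNS lookup.
        System.out.println(matchesDataNodeHost("myhost.mydomain", "myhost.mydomain"));
    }
}
```

Note that `getCanonicalHostName` performs a DNS lookup, so in practice the result would likely be cached rather than resolved once per comparison.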