This repository was archived by the owner on Jan 9, 2020. It is now read-only.

HDFS data locality has a bug in case cluster node names are not full host names #290

@kimoonkim

With #216, the Spark driver sends tasks to the executors that have the tasks' HDFS data on local disks. This is done in two steps:

  1. Map executor pod IPs to the names of the cluster nodes that the executor pods run on.
  2. Compare those cluster node names with the host names that the data node daemons run on.

Step (2) has a minor bug when cluster node names are not fully qualified host names. For example, the cluster node name may be just myhost whereas the full name is myhost.mydomain, so the comparison fails even though both refer to the same machine. We observed this bug in an HDFS experiment on Google Cloud GKE.

The fix is simple. If the comparison with the short name fails, get the full host name using InetAddress.getCanonicalHostName and use that output for the comparison. I'll send a PR shortly.
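
A rough sketch of the fallback described above, not the actual patch. The class name `HostNameMatcher`, the method `sameHost`, and the pluggable resolver parameter are all hypothetical and exist only for illustration; the only real API used is `InetAddress.getByName(...).getCanonicalHostName()`. The resolver is passed in so the logic can be exercised without real DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.UnaryOperator;

// Hypothetical helper; names here are illustrative, not from the Spark code base.
class HostNameMatcher {

    // Resolve a possibly short node name to its canonical (fully qualified) form.
    // If the lookup fails, fall back to the name as given.
    static String canonicalName(String name) {
        try {
            return InetAddress.getByName(name).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return name;
        }
    }

    // Compare the names directly first; only if that fails, retry with the
    // resolved full host name of the cluster node.
    static boolean sameHost(String clusterNodeName, String dataNodeHost,
                            UnaryOperator<String> resolver) {
        if (clusterNodeName.equalsIgnoreCase(dataNodeHost)) {
            return true;
        }
        return resolver.apply(clusterNodeName).equalsIgnoreCase(dataNodeHost);
    }

    // Convenience overload that uses the real DNS-backed resolver.
    static boolean sameHost(String clusterNodeName, String dataNodeHost) {
        return sameHost(clusterNodeName, dataNodeHost, HostNameMatcher::canonicalName);
    }
}
```

Trying the direct comparison first keeps the common case free of a DNS lookup; the canonical-name lookup only runs when the short name does not match.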
