This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Change driver pod's restart policy from OnFailure to Never #144

@ash211

Description


After #138 the exit code of the rest server (and thus the driver pod) is the exit code of the Spark driver. This means the driver pod is exiting with non-zero exit codes more frequently.

As @kimoonkim notes in #135 (comment) this means that k8s is now restarting the driver pod on failure since we have the OnFailure restart policy set on the driver pod. But the restarted driver pod never gets sent a submission from the launcher because we don't have that logic built in yet.

So we need to either:

  • build re-launch logic into the launcher for driver pod failure and restart
  • turn off driver pod restart so the launcher shuts down cleanly (though we'll want an error message indicating that the Spark job failed)

I'd prefer option 2 now, with option 1 pursued as a follow-up. A richer version of option 1 could also create a k8s Job resource for the driver, so we'd be interacting with k8s at a higher level than directly on pods.
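For option 2, the change would be in the driver pod spec the launcher creates: switch `restartPolicy` from `OnFailure` to `Never`, so a failed driver stays in the `Failed` phase instead of being restarted by the kubelet. A minimal sketch of the resulting spec (pod and container names and the image are illustrative, not from the actual launcher code):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver        # illustrative name
spec:
  restartPolicy: Never      # was OnFailure; Never leaves a failed driver pod in Failed
  containers:
    - name: spark-driver
      image: example/spark-driver:latest   # illustrative image
```

With `Never`, the launcher can observe the terminal pod phase, surface the failure message, and shut down cleanly without the restarted-pod-never-receives-a-submission problem described above.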
