This repository was archived by the owner on Jan 9, 2020. It is now read-only.

mccheah commented Jul 28, 2017

Closes #400.

Initial full documentation for the submission client. Templates for the external shuffle service and the scheduler backend.

---


Similarly to YARN and Standalone mode, it is common for Spark applications to be deployed on Kubernetes through the
Author

On the whole this document seems verbose. If we could use bulleted lists or diagrams in place of some of these longer passages, that would improve readability, but I'm not sure how best to represent the information that way.

@ifilonenko
Member

Thank you for this

@ifilonenko ifilonenko self-requested a review July 28, 2017 04:51
@erikerlandson
Member

LGTM


## Init-Containers

The submission client and the scheduler backend both use init-containers to localize resources before the driver and
Member

"The driver and executor Pods both contain an init-container to ..." might be more accurate.

pod spec in a YML file: https://github.com/apache-spark-on-k8s/spark/issues/38
- The resource staging server can be backed by a distributed file store like HDFS to improve robustness and scalability
- Additional driver bootstrap steps need to be added to support communication with Kerberized HDFS clusters:
https://github.com/apache-spark-on-k8s/spark/pull/391
Member

Missing periods at the end of each bullet point.



Similarly to YARN and Standalone mode, it is common for Spark applications to be deployed on Kubernetes through the
`spark-submit` process. Applications are deployed on Kubernetes via sending YML files to the Kubernetes API server.
Member

s/YML/YAML. Also this is probably not accurate. Applications are deployed on Kubernetes by creating Kubernetes API objects via the API server. Such Kubernetes API objects are typically declared in YAML files.

# Future Work

- The driver's pod specification should be highly customizable, to the point where users may want to specify a template
pod spec in a YML file: https://github.com/apache-spark-on-k8s/spark/issues/38
Member

s/YML/YAML.

Author

I've seen both used interchangeably; is there a standard for using YAML in the Kubernetes community?

Member

The k8s documentation seems to use YAML consistently: https://kubernetes.io/docs/search/?q=YAML

/**
* Represents a step in preparing the Kubernetes driver.
*/
private[spark] trait DriverConfigurationStep {

My two cents on putting interface code in arch docs: it's probably easier for the maintainers if we document at this level of detail once we move out of beta. Otherwise I suspect it will be a constantly moving target, since we're not really maintaining this interface as a general public API.

Author

I think it's fine if this has to be a constantly moving target; this is an architecture document, so it should reflect the semantics for those who are contributing to the project.
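For readers following the thread above, here is a minimal sketch of the step-based idea behind `DriverConfigurationStep`: each step takes the driver specification built so far and returns an updated copy, and the submission client applies the steps in order. This is not the code from the pull request. `DriverSpec`, `configureDriver`, `AppLabelStep`, `DriverPodNameStep`, `Orchestrator`, and the `spark-app-id` label key are hypothetical simplifications, and `private[spark]` is dropped so the snippet runs as a standalone Scala script.

```scala
// Hypothetical, simplified stand-in for the driver specification that each step
// transforms. A real implementation would carry Kubernetes Pod/Container objects.
final case class DriverSpec(
    podName: String,
    labels: Map[String, String],
    sparkConf: Map[String, String])

/** Represents a step in preparing the Kubernetes driver (hypothetical signature). */
trait DriverConfigurationStep {
  // Receives the spec built so far and returns an updated copy; never mutates.
  def configureDriver(spec: DriverSpec): DriverSpec
}

// Hypothetical step: attach an application-id label to the driver pod.
final class AppLabelStep(appId: String) extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(labels = spec.labels + ("spark-app-id" -> appId))
}

// Hypothetical step: record the driver pod name in the Spark configuration.
object DriverPodNameStep extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(sparkConf =
      spec.sparkConf + ("spark.kubernetes.driver.pod.name" -> spec.podName))
}

object Orchestrator {
  // Applies the steps left to right, threading the spec through each one.
  def build(initial: DriverSpec, steps: Seq[DriverConfigurationStep]): DriverSpec =
    steps.foldLeft(initial)((spec, step) => step.configureDriver(spec))
}

// Usage: start from an empty spec and run two steps in sequence.
val result = Orchestrator.build(
  DriverSpec("spark-pi-driver", Map.empty, Map.empty),
  Seq(new AppLabelStep("spark-pi-123"), DriverPodNameStep))
```

Folding over a sequence keeps each step small and independently testable, at the cost of making step order significant: a step that reads a field must run after the step that sets it.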

@liyinan926
Member

LGTM.

@foxish foxish mentioned this pull request Aug 3, 2017
@foxish
Member

foxish commented Aug 3, 2017

rerun integration tests please

@mccheah
Author

mccheah commented Aug 4, 2017

Note that this is still missing documentation on the external shuffle service and the scheduler backend itself.

mccheah and others added 3 commits August 8, 2017 10:00
Initial full documentation for the submission client. Templates for the
external shuffle service and the scheduler backend.
@erikerlandson
Member

resync w/ head of branch-2.2-kubernetes

@erikerlandson erikerlandson merged commit 24cd9ee into branch-2.2-kubernetes Aug 8, 2017
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
The generated file is correct but the expected file in the test was not.
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
* Initial architecture documentation.

Initial full documentation for the submission client. Templates for the
external shuffle service and the scheduler backend.

* Add title to scheduler backend doc.

* edits for PR review feedback
7 participants