Initial architecture documentation. #401
| Similarly to YARN and Standalone mode, it is common for Spark applications to be deployed on Kubernetes through the |
On the whole this document seems verbose - if we could use bulleted lists or diagrams in place of some of these discourses that would improve readability, but I'm not sure how to best represent the information in that way.
Thank you for this
LGTM
| ## Init-Containers | ||
| The submission client and the scheduler backend both use init-containers to localize resources before the driver and |
"The driver and executor Pods both contain an init-container to ..." might be more accurate.
| pod spec in a YML file: https://github.com/apache-spark-on-k8s/spark/issues/38 | ||
| - The resource staging server can be backed by a distributed file store like HDFS to improve robustness and scalability | ||
| - Additional driver bootstrap steps need to be added to support communication with Kerberized HDFS clusters: | ||
| https://github.com/apache-spark-on-k8s/spark/pull/391 |
Missing periods at the end of each bullet point.
| Similarly to YARN and Standalone mode, it is common for Spark applications to be deployed on Kubernetes through the | ||
| `spark-submit` process. Applications are deployed on Kubernetes via sending YML files to the Kubernetes API server. |
s/YML/YAML. Also this is probably not accurate. Applications are deployed on Kubernetes by creating Kubernetes API objects via the API server. Such Kubernetes API objects are typically declared in YAML files.
| # Future Work | ||
| - The driver's pod specification should be highly customizable, to the point where users may want to specify a template | ||
| pod spec in a YML file: https://github.com/apache-spark-on-k8s/spark/issues/38 |
s/YML/YAML.
I've seen both used interchangeably - is there a standard to use YAML in the Kubernetes community?
The k8s documentation seems to use YAML consistently: https://kubernetes.io/docs/search/?q=YAML
| /** | ||
| * Represents a step in preparing the Kubernetes driver. | ||
| */ | ||
| private[spark] trait DriverConfigurationStep { |
My 2 cents on putting interface code in arch docs: it's probably easier for the maintainers if we document to this level of detail once we move out of beta. Otherwise I suspect it will be a constantly moving target, since we're not really maintaining this interface as a general public API.
I think it's fine if this has to be a constantly moving target - this is an architecture document so it should reflect the semantics for those who are contributing to the project.
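For readers following this thread, the step pattern under discussion can be sketched roughly as below. This is a minimal illustration, not the code in the PR: the quoted diff shows only the trait's opening line, so the method name `configureDriver`, the `DriverSpec` case class, the two example steps, and the `DriverSteps.runAll` helper are all assumptions made up for this sketch.

```scala
// Hypothetical stand-in for the driver specification; the field names are
// illustrative and are not taken from the PR.
final case class DriverSpec(
    podLabels: Map[String, String],
    sparkConf: Map[String, String])

// Simplified analogue of the trait quoted above (method name assumed):
// each step takes the spec produced so far and returns an updated copy.
trait DriverConfigurationStep {
  def configureDriver(spec: DriverSpec): DriverSpec
}

// Example step (assumed): attach an application label to the driver pod.
final class LabelStep(appName: String) extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(podLabels = spec.podLabels + ("spark-app-name" -> appName))
}

// Example step (assumed): record a configuration entry.
final class ConfStep(key: String, value: String) extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(sparkConf = spec.sparkConf + (key -> value))
}

object DriverSteps {
  // Apply the steps in order, threading the spec through each one.
  def runAll(steps: Seq[DriverConfigurationStep], initial: DriverSpec): DriverSpec =
    steps.foldLeft(initial)((spec, step) => step.configureDriver(spec))
}
```

Because every step returns a new immutable spec in this sketch, steps can be added, removed, or reordered without touching each other, which is one reason an interface like this may keep changing while the project is in beta.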
LGTM.
rerun integration tests please
Note that this is still missing documentation on the external shuffle service and the scheduler backend itself.
Initial full documentation for the submission client. Templates for the external shuffle service and the scheduler backend.
27c00f1 to 83d6f55
resync w/ head of branch-2.2-kubernetes
The generated file is correct but the expected file in the test was not.
* Initial architecture documentation. Initial full documentation for the submission client. Templates for the external shuffle service and the scheduler backend.
* Add title to scheduler backend doc.
* edits for PR review feedback
Closes #400.