This repository was archived by the owner on Jan 9, 2020. It is now read-only.
forked from apache/spark
[SPARK-18278][NOSUBMIT] Ongoing diff for Spark on Kubernetes (branch-2.1) #200
Closed
Conversation
- Don't hold the raw secret bytes
- Add CPU limits and requests (see the sketch below)
The build process fails ScalaStyle checks otherwise.
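A minimal sketch of the CPU limits and requests noted above, assuming the fabric8 Kubernetes client; the container name, image, and amount are illustrative:

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, Quantity, QuantityBuilder}

// Illustrative only: build an executor container spec with an explicit CPU
// request and limit (the amount "1" and the names here are placeholders).
val executorCpuQuantity: Quantity = new QuantityBuilder(false).withAmount("1").build()
val executorContainer = new ContainerBuilder()
  .withName("executor")
  .withImage("spark-executor:latest") // hypothetical image name
  .withNewResources()
    .addToRequests("cpu", executorCpuQuantity)
    .addToLimits("cpu", executorCpuQuantity)
  .endResources()
  .build()
```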
* Use tar and gzip to archive shipped jars. * Address comments * Move files to resolve merge
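A hedged sketch of the tar-and-gzip approach (not the fork's actual CompressionUtils implementation), using Apache Commons Compress:

```scala
import java.io.{BufferedOutputStream, File, FileInputStream, FileOutputStream}
import java.util.zip.GZIPOutputStream
import org.apache.commons.compress.archivers.tar.{TarArchiveEntry, TarArchiveOutputStream}
import org.apache.commons.compress.utils.IOUtils

// Illustrative only: write the given jars into a single .tar.gz archive.
def tarGzipJars(jars: Seq[File], output: File): Unit = {
  val tarOut = new TarArchiveOutputStream(
    new GZIPOutputStream(new BufferedOutputStream(new FileOutputStream(output))))
  try {
    jars.foreach { jar =>
      tarOut.putArchiveEntry(new TarArchiveEntry(jar, jar.getName))
      val in = new FileInputStream(jar)
      try IOUtils.copy(in, tarOut) finally in.close()
      tarOut.closeArchiveEntry()
    }
  } finally {
    tarOut.close()
  }
}
```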
* Use alpine and java 8 for docker images. * Remove installation of vim and redundant comment
* Error messages when the driver container fails to start. * Fix messages a bit * Use timeout constant * Delete the pod if it fails for any reason (not just timeout) * Actually set submit succeeded * Fix typo
* Documentation for the current state of the world. * Adding navigation links from other pages * Address comments, add TODO for things that should be fixed * Address comments, mostly making images section clearer * Virtual runtime -> container runtime
#20) * Development workflow documentation for the current state of the world. * Address comments. * Clarified code change and added ticket link
* Added service name as prefix to executor pods to be able to tell them apart from kubectl output * Addressed comments
* Add kubernetes profile to travis yml file * Fix long lines in CompressionUtils.scala
* Improved the example commands in running-on-k8s document. * Fixed more example commands. * Fixed typo.
* Support custom labels on the driver pod. * Add integration test and fix logic. * Fix tests * Fix minor formatting mistake * Reduce unnecessary diff
* A number of small tweaks to the MVP. - Master protocol defaults to https if not specified - Removed upload driver extra classpath functionality - Added ability to specify main app resource with container:// URI - Updated docs to reflect all of the above - Add examples to Docker images, mostly for integration testing but could be useful for easily getting started without shipping anything * Add example to documentation.
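For the https default mentioned above, a hedged sketch (the function name is hypothetical) of normalizing a master address that omits the protocol:

```scala
// Illustrative only: if the Kubernetes API server address carries no protocol,
// assume https rather than http.
def resolveMasterUrl(rawMaster: String): String = {
  val address = rawMaster.stripPrefix("k8s://")
  if (address.startsWith("http://") || address.startsWith("https://")) address
  else s"https://$address"
}
```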
* Support setting the driver pod launching timeout. And increase the default value from 30s to 60s. The current value of 30s is kind of short for pulling the image from public docker registry plus the container/JVM start time. * Use a better name for the default timeout.
* Use "extraTestArgLine" to pass extra options to scalatest. Because the "argLine" option of scalatest is set in pom.xml and we can't overwrite it from the command line. Ref #37 * Added a default value for extraTestArgLine * Use a better name. * Added a tip for this in the dev docs.
…from data locality (#316) * Use node affinity to launch executors on data local nodes * Fix comment style * Use JSON object mapper * Address review comments * Fix a style issue * Clean up and add a TODO * Fix style issue * Address review comments
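A hedged sketch of the node-affinity idea above, using the Jackson object mapper mentioned in the commit; the structure and hostnames are illustrative and not the fork's exact payload:

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Illustrative only: serialize a preferred node-affinity term for the hosts
// that hold the task's input data; where the fork attaches this JSON
// (e.g. to the executor pod spec or an annotation) is not shown here.
val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
val affinityJson = mapper.writeValueAsString(Map(
  "nodeAffinity" -> Map(
    "preferredDuringSchedulingIgnoredDuringExecution" -> Seq(Map(
      "weight" -> 100,
      "preference" -> Map("matchExpressions" -> Seq(Map(
        "key" -> "kubernetes.io/hostname",
        "operator" -> "In",
        "values" -> Seq("datanode-1", "datanode-2")))) // hypothetical hosts
    ))
  )
))
```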
* Fix sbt build. - Remove extraneous Feign dependency that we no longer use in submission v2. - Exclude Jackson from various modules to ensure every Jackson module is forced to 2.6.5. - Fix a linter error only caught by sbt. - Add Kubernetes modules to various parts of the SBT infrastructure * Actually remove feign * Actually exclude Jackson from kubernetes client.
* New API for custom labels and annotations. This API allows these labels and annotations to contain = and , characters, which is hard to accomplish in the old scheme. * Compare correct values in requirements * Use helper method * Address comments. * Fix scalastyle * Use variable * Remove unused import
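A hedged sketch of the per-key scheme described above; the "spark.kubernetes.driver.label." prefix is an assumption for illustration, not a verified configuration name:

```scala
import org.apache.spark.SparkConf

// Illustrative only: read labels from one property per label so values may
// contain '=' and ',' characters, instead of parsing one comma-separated string.
def driverLabels(conf: SparkConf): Map[String, String] =
  conf.getAllWithPrefix("spark.kubernetes.driver.label.").toMap
```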
The conf property spark.kubernetes.shuffle.namespace is used to
specify the namespace of the shuffle pods.
In normal cases, only one "shuffle daemonset" is deployed and
shared by all Spark pods.
The Spark driver should be able to list and watch shuffle pods
in the namespace specified by the user.
Note: by default, the Spark driver pod does not have permission to
list and watch shuffle pods in another namespace. Some action
is needed to grant it that permission. For example, the ABAC
policy below works.
```
{
  "apiVersion": "abac.authorization.kubernetes.io/v1beta1",
  "kind": "Policy",
  "spec": {
    "group": "system:serviceaccounts",
    "namespace": "SHUFFLE_NAMESPACE",
    "resource": "pods",
    "readonly": true
  }
}
```
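As a minimal usage sketch (the namespace name here is illustrative), the driver is pointed at the shuffle pods' namespace through the property named above:

```scala
import org.apache.spark.SparkConf

// Point the driver at the namespace where the shuffle daemonset runs so it
// can list and watch the shuffle pods there.
val conf = new SparkConf()
  .set("spark.kubernetes.shuffle.namespace", "spark-shuffle") // illustrative namespace
```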
This commit tries to solve issue #359 by allowing the `spark.executor.cores` configuration key to take fractional values, e.g., 0.5 or 1.5. The value is used to specify the cpu request when creating the executor pods, which is allowed to be fractional by Kubernetes. When the value is passed to the executor process through the environment variable `SPARK_EXECUTOR_CORES`, the value is rounded up to the closest integer as required by the `CoarseGrainedExecutorBackend`. Signed-off-by: Yinan Li <[email protected]>
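A minimal sketch of the rounding behavior described above (values are illustrative):

```scala
// spark.executor.cores may now be fractional; the raw value is used as the
// pod's CPU request, while SPARK_EXECUTOR_CORES is rounded up to the nearest
// integer as required by CoarseGrainedExecutorBackend.
val executorCoresConf = "0.5"              // illustrative value of spark.executor.cores
val executorCpuRequest = executorCoresConf // pod CPU request: "0.5"
val sparkExecutorCoresEnv =
  math.ceil(executorCoresConf.toDouble).toInt.toString // env value: "1"
```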
* Adding PySpark Submit functionality. Launching Python from JVM
* Addressing Scala idioms related to PR 351
* Removing extends Logging which was necessary for LogInfo
* Refactored code to leverage the ContainerLocalizedFileResolver
* Modified unit tests so that they would pass
* Modified unit test input to pass unit tests
* Set up working environment for integration tests for PySpark
* Comment out Python thread logic until Jenkins has Python
* Modifying PythonExec to pass on Jenkins
* Modifying python exec
* Added unit tests to ClientV2 and refactored to include PySpark submission resources
* Modified unit test check
* Scalastyle
* PR 348 file conflicts
* Refactored unit tests and styles
* Further Scala styling and logic
* Modified unit tests to be more specific towards the class in question
* Removed space delimiting for methods
* Submission client redesign to use a step-based builder pattern. This change overhauls the underlying architecture of the submission client, but it is intended to entirely preserve existing behavior of Spark applications. Therefore users will find this to be an invisible change. The philosophy behind this design is to reconsider the breakdown of the submission process. It operates off the abstraction of "submission steps", which are transformation functions that take the previous state of the driver and return the new state of the driver. The driver's state includes its Spark configurations and the Kubernetes resources that will be used to deploy it. Such a refactor moves away from a features-first API design, which considers different containers to serve a set of features. The previous design, for example, had a container files resolver API object that returned different resolutions of the dependencies added by the user. However, it was up to the main Client to know how to intelligently invoke all of those APIs. Therefore the API surface area of the file resolver became untenably large and it was not intuitive how it was to be used or extended. This design changes the encapsulation layout; every module is now responsible for changing the driver specification directly. An orchestrator builds the correct chain of steps and hands it to the client, which then calls it verbatim. The main client then makes any final modifications that put the different pieces of the driver together, particularly to attach the driver container itself to the pod and to apply the Spark configuration as command-line arguments.
* Don't add the init-container step if all URIs are local.
* Python arguments patch + tests + docs
* Revert "Python arguments patch + tests + docs". This reverts commit 4533df2.
* Revert "Don't add the init-container step if all URIs are local." This reverts commit e103225.
* Revert "Submission client redesign to use a step-based builder pattern." This reverts commit 5499f6d.
* Style changes
* Space for styling
* Submission client redesign to use a step-based builder pattern. This change overhauls the underlying architecture of the submission client, but it is intended to entirely preserve existing behavior of Spark applications. Therefore users will find this to be an invisible change. The philosophy behind this design is to reconsider the breakdown of the submission process. It operates off the abstraction of "submission steps", which are transformation functions that take the previous state of the driver and return the new state of the driver. The driver's state includes its Spark configurations and the Kubernetes resources that will be used to deploy it. Such a refactor moves away from a features-first API design, which considers different containers to serve a set of features. The previous design, for example, had a container files resolver API object that returned different resolutions of the dependencies added by the user. However, it was up to the main Client to know how to intelligently invoke all of those APIs. Therefore the API surface area of the file resolver became untenably large and it was not intuitive how it was to be used or extended. This design changes the encapsulation layout; every module is now responsible for changing the driver specification directly. An orchestrator builds the correct chain of steps and hands it to the client, which then calls it verbatim. The main client then makes any final modifications that put the different pieces of the driver together, particularly to attach the driver container itself to the pod and to apply the Spark configuration as command-line arguments.
* Add a unit test for BaseSubmissionStep.
* Add unit test for kubernetes credentials mounting.
* Add unit test for InitContainerBootstrapStep.
* Unit tests for initContainer
* Add a unit test for DependencyResolutionStep.
* Further modifications to InitContainer unit tests
* Use of resolver in PythonStep and unit tests for PythonStep
* Refactoring of init unit tests and PythonStep resolver logic
* Add unit test for KubernetesSubmissionStepsOrchestrator.
* Refactoring and addition of secret trustStore+Cert checks in a SubmissionStepSuite
* Added SparkPodInitContainerBootstrapSuite
* Added InitContainerResourceStagingServerSecretPluginSuite
* Style in unit tests
* Extremely minor style fix in variable naming
* Address comments.
* Rename class for consistency.
* Attempt to make spacing consistent. Multi-line methods should have four-space indentation for arguments that aren't on the same line as the method call itself... but this is difficult to do consistently given how IDEs handle Scala multi-line indentation in most cases.
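A hedged sketch of the step abstraction described above; the type and method names are illustrative, not the fork's actual classes:

```scala
import io.fabric8.kubernetes.api.model.{Container, HasMetadata, Pod}
import org.apache.spark.SparkConf

// Illustrative only: the driver's state carries its Spark configuration and
// the Kubernetes resources used to deploy it.
case class KubernetesDriverSpec(
    driverPod: Pod,
    driverContainer: Container,
    otherResources: Seq[HasMetadata],
    sparkConf: SparkConf)

// Each "submission step" transforms the previous driver spec into a new one.
trait DriverConfigurationStep {
  def configureDriver(spec: KubernetesDriverSpec): KubernetesDriverSpec
}

// The orchestrator builds an ordered chain of steps; the client folds over it
// verbatim and then attaches the driver container to the pod at the end.
def runSteps(
    initialSpec: KubernetesDriverSpec,
    steps: Seq[DriverConfigurationStep]): KubernetesDriverSpec =
  steps.foldLeft(initialSpec)((spec, step) => step.configureDriver(spec))
```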
Otherwise we can get a Scalastyle error when building from SBT.
Test with ./dev/scalastyle
…st. (#378) * Retry binding server to random port in the resource staging server test. * Break if successful start * Start server in try block. * Fix scalastyle * More rigorous cleanup logic. Increment port numbers. * Move around more exception logic. * More exception refactoring. * Remove whitespace * Fix test * Rename variable
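A generic sketch of the retry-bind idea behind this test fix (not the test's actual code):

```scala
import java.net.BindException
import scala.util.Random

// Illustrative only: try a handful of random high ports until the server binds.
def startOnRandomPort(start: Int => Unit, attempts: Int = 10): Int = {
  var lastError: Throwable = null
  for (_ <- 1 to attempts) {
    val port = 1024 + Random.nextInt(65535 - 1024)
    try {
      start(port) // caller binds its server to this port
      return port
    } catch {
      case e: BindException => lastError = e // port taken, try another one
    }
  }
  throw lastError
}
```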
* set RestartPolicy=Never for executor
In the current implementation the RestartPolicy of the executor pod is
not set, so the default value "OnFailure" takes effect. But this
causes a problem.
If an executor is terminated unexpectedly, for example by exiting on a
java.lang.OutOfMemoryError, it will be restarted by Kubernetes with the
same executor ID. When the new executor tries to fetch a block held by
the previous executor, ShuffleBlockFetcherIterator.splitLocalRemoteBlocks()
thinks it is a **local** block and tries to read it from its local dir.
But the executor's local dir has changed, because a randomly generated ID is
part of the local dir path. A FetchFailedException is raised and the stage
fails. (A sketch of setting the restart policy on the executor pod follows
the stack trace below.)
The recurring error message:
17/06/29 01:54:56 WARN KubernetesTaskSetManager: Lost task 0.1 in stage
2.0 (TID 7, 172.16.75.92, executor 1): FetchFailed(BlockManagerId(1,
172.16.75.92, 40539, None), shuffleId=2, mapId=0, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException:
/data2/spark/blockmgr-0e228d3c-8727-422e-aa97-2841a877c42a/32/shuffle_2_0_0.index
(No such file or directory)
at
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:357)
at
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
at
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
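As referenced above, a hedged sketch (assuming the fabric8 Kubernetes client; pod and image names are illustrative) of building the executor pod with RestartPolicy=Never, so a crashed executor is replaced by a fresh pod rather than restarted in place with stale local dirs:

```scala
import io.fabric8.kubernetes.api.model.PodBuilder

// Illustrative only: an executor pod whose containers are never restarted in place.
val executorPod = new PodBuilder()
  .withNewMetadata()
    .withName("spark-executor-1") // hypothetical pod name
  .endMetadata()
  .withNewSpec()
    .withRestartPolicy("Never")
    .addNewContainer()
      .withName("executor")
      .withImage("spark-executor:latest") // hypothetical image
    .endContainer()
  .endSpec()
  .build()
```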
* Update KubernetesClusterSchedulerBackend.scala
This makes executors consistent with the driver. Note that SPARK_EXTRA_CLASSPATH isn't set anywhere by Spark itself, but it's primarily meant to be set by images that inherit from the base driver/executor images.
…n. (#244) * Changes to support executor recovery behavior during static allocation. * Addressed review comments * Style changes and removed incorrectly merged code * Addressed latest review comments * Changed import order * Minor changes to avoid exceptions when exit code is missing * Fixed style check * Addressed review comments from Yinan Li. * Addressed comments and got rid of an explicit lock object. * Fixed imports order. * Addressed review comments from Matt * Couple of style fixes
This ongoing diff should be ported to 2.2, right?
@mccheah yes -- I think it's just a matter of changing the PR's base branch. Might require opening a new PR.
Update pom to v0.3.1 for the new 2.1 release
This was the branch-2.1 PR. I just opened a replacement ongoing diff PR for branch-2.2 at #200 now that we've moved development to that branch.
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request on Feb 26, 2019.
This pull request can serve as a live diff of what our work has amounted to so far.
Upstream ticket: https://issues.apache.org/jira/browse/SPARK-18278