TPR Support #284
Changes from all commits
3946444
75057f3
e21f3b5
706de99
f762f85
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| # | ||
| # Licensed to the Apache Software Foundation (ASF) under one or more | ||
| # contributor license agreements. See the NOTICE file distributed with | ||
| # this work for additional information regarding copyright ownership. | ||
| # The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| # (the "License"); you may not use this file except in compliance with | ||
| # the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| # | ||
|
|
||
| metadata: | ||
| name: spark-job.apache.org | ||
| labels: | ||
| resource: spark-job | ||
| object: spark | ||
| apiVersion: extensions/v1beta1 | ||
| kind: ThirdPartyResource | ||
| description: "A resource that reports status of a spark job" | ||
| versions: | ||
| - name: v1 |
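The TPR name above determines how the resource appears in the API. Under the TPR naming convention, the first dot-separated segment (`spark-job`) is converted to CamelCase to form the kind (`SparkJob`), and the remainder (`apache.org`) becomes the API group. The following is a minimal Scala sketch of that mapping, not code from this PR: `TprNaming`, `kindAndGroup`, and `resourcePath` are hypothetical names used only for illustration, and the plural is formed naively by appending `s`, an assumption that happens to match `sparkjobs`.

```scala
object TprNaming {
  // Splits a TPR name such as "spark-job.apache.org" into (kind, apiGroup).
  // Hypothetical helper for illustration only.
  def kindAndGroup(tprName: String): (String, String) = {
    val dot = tprName.indexOf('.')
    require(dot > 0 && dot < tprName.length - 1,
      s"expected <name>.<group>, got: $tprName")
    val resource = tprName.substring(0, dot)
    val group = tprName.substring(dot + 1)
    // "spark-job" -> "SparkJob": capitalize each hyphen-separated part.
    val kind = resource.split('-').map(_.capitalize).mkString
    (kind, group)
  }

  // Builds the namespaced REST path for the resource.
  // Assumption: the plural is the lower-cased kind plus "s".
  def resourcePath(tprName: String, version: String, namespace: String): String = {
    val (kind, group) = kindAndGroup(tprName)
    s"/apis/$group/$version/namespaces/$namespace/${kind.toLowerCase}s"
  }

  def main(args: Array[String]): Unit = {
    println(kindAndGroup("spark-job.apache.org"))
    println(resourcePath("spark-job.apache.org", "v1", "default"))
  }
}
```

Keeping the derivation in one place makes it easier to see why a TPR named `spark-job.apache.org` is queried as `sparkjobs`.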
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -210,6 +210,45 @@ the command may then look like the following: | |
| --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \ | ||
| local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10 400000 2 | ||
|
|
||
| ## ThirdPartyResources for visibility into the state of deployed Spark jobs | ||
|
|
||
| To expose the state of a deployed Spark job to a Kubernetes administrator or user, via `kubectl` or the | ||
| Kubernetes dashboard, we have added a Kubernetes resource (of kind: SparkJob) that stores pertinent information | ||
| about a specific Spark job. | ||
|
|
||
| With this resource, current and past (if not already cleaned up) Spark apps deployed in the | ||
| current namespace can be listed using `kubectl` like so: | ||
|
|
||
| kubectl get sparkjobs | ||
|
|
||
| Or via the Kubernetes dashboard, using the link provided by: | ||
|
|
||
| kubectl cluster-info | ||
|
|
||
|
|
||
| ### Prerequisites | ||
|
|
||
| Note that this resource depends on extending the Kubernetes API using a | ||
| [ThirdPartyResource (TPR)](https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-third-party-resource/). | ||
|
|
||
| TPRs are available in the Kubernetes API as of v1.5. | ||
|
|
||
| See conf/kubernetes-custom-resource.yaml for the recommended YAML file. From the Spark base directory, | ||
| the recommended TPR can be created like so: | ||
|
|
||
| kubectl create -f conf/kubernetes-custom-resource.yaml | ||
|
|
||
| ### Important things to note | ||
|
|
||
| TPRs are an alpha feature that might not be available in every cluster. | ||
| TPRs need to be manually cleaned up because garbage collection support does not exist for them yet. | ||
|
|
||
| ### Future work | ||
|
|
||
| Kube administrators or users would be able to stop a spark app running in their cluster by simply | ||
|
Kubernetes cluster administrators or users should be able to stop...
||
| deleting the attached resource. | ||
|
|
||
|
|
||
| ## Advanced | ||
|
|
||
| ### Securing the Resource Staging Server with TLS | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.deploy.kubernetes.tpr | ||
|
|
||
| private[spark] object JobState extends Enumeration { | ||
| type JobState = Value | ||
|
|
||
| /* | ||
| * QUEUED - Spark Job has been queued to run | ||
| * SUBMITTED - Driver Pod deployed but tasks are not yet scheduled on worker pod(s) | ||
| * RUNNING - Task(s) have been allocated to worker pod(s) to run and Spark Job is now running | ||
| * FINISHED - Spark Job ran and exited cleanly, i.e., worker pod(s) and driver pod were | ||
| * gracefully deleted | ||
| * FAILED - Spark Job failed due to an error | ||
| * KILLED - A user manually killed this Spark Job | ||
|
||
| */ | ||
| val QUEUED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value | ||
| } | ||
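To illustrate how the states documented above might be consumed, here is a minimal sketch, not code from this PR. It uses a standalone copy of the enumeration (without `private[spark]`, so it compiles outside the `org.apache.spark` package) and a hypothetical `isTerminal` helper. Treating FINISHED, FAILED, and KILLED as terminal is an assumption inferred from the comment text, which describes each of them as a job that has stopped.

```scala
// Standalone copy of the enumeration from the diff, without the private[spark]
// modifier so that it compiles outside the org.apache.spark package.
object JobState extends Enumeration {
  type JobState = Value
  val QUEUED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value
}

object JobStateExample {
  import JobState._

  // Hypothetical helper (not part of the PR): true for states after which
  // no further transition is expected. The choice of states is an assumption.
  def isTerminal(state: JobState): Boolean =
    state == FINISHED || state == FAILED || state == KILLED

  def main(args: Array[String]): Unit = {
    // Enumeration values are ordered by declaration, so `values` iterates
    // from QUEUED through KILLED.
    val (done, active) = JobState.values.partition(isTerminal)
    println(s"terminal states: ${done.mkString(", ")}")
    println(s"active states:   ${active.mkString(", ")}")
  }
}
```

A helper like this could let a watcher decide when a SparkJob resource no longer needs status updates, though the PR itself does not define such a rule.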
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.deploy.kubernetes.tpr | ||
|
|
||
| import org.json4s.{CustomSerializer, JString} | ||
| import org.json4s.JsonAST.JNull | ||
|
|
||
| import org.apache.spark.deploy.kubernetes.tpr.JobState.JobState | ||
|
|
||
| /** | ||
| * JobState Serializer and Deserializer | ||
| */ | ||
| private[spark] object JobStateSerDe extends CustomSerializer[JobState](_ => | ||
|
||
| ({ | ||
| case JString("SUBMITTED") => JobState.SUBMITTED | ||
| case JString("QUEUED") => JobState.QUEUED | ||
| case JString("RUNNING") => JobState.RUNNING | ||
| case JString("FINISHED") => JobState.FINISHED | ||
| case JString("KILLED") => JobState.KILLED | ||
| case JString("FAILED") => JobState.FAILED | ||
| case JNull => | ||
| throw new UnsupportedOperationException("No JobState Specified") | ||
| }, { | ||
| case JobState.FAILED => JString("FAILED") | ||
| case JobState.SUBMITTED => JString("SUBMITTED") | ||
| case JobState.KILLED => JString("KILLED") | ||
| case JobState.FINISHED => JString("FINISHED") | ||
| case JobState.QUEUED => JString("QUEUED") | ||
| case JobState.RUNNING => JString("RUNNING") | ||
| }) | ||
| ) | ||
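The twelve cases above map each state to its own name and back. As a dependency-free sketch of that round trip (it is not the PR's json4s serializer and does not use json4s), the same mapping can be expressed with `Enumeration`'s built-in `toString` and `withName`. `JobStateNames`, `toName`, and `fromName` are hypothetical names, and the enumeration is a standalone copy without `private[spark]`.

```scala
import scala.util.Try

// Standalone copy of the enumeration from the diff.
object JobState extends Enumeration {
  type JobState = Value
  val QUEUED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value
}

object JobStateNames {
  import JobState.JobState

  // Hypothetical helper: a state's wire name is simply its declared name.
  def toName(state: JobState): String = state.toString

  // Hypothetical helper: withName throws NoSuchElementException for unknown
  // names, so the failure is converted into a Left with a readable message.
  def fromName(name: String): Either[String, JobState] =
    Try(JobState.withName(name)).toOption.toRight(s"Unknown JobState: $name")

  def main(args: Array[String]): Unit = {
    // Every declared state should survive a name round trip.
    val roundTrips = JobState.values.forall(s => fromName(toName(s)) == Right(s))
    println(s"all states round-trip: $roundTrips")
    println(fromName("PAUSED"))
  }
}
```

Deriving the names from the enumeration means a newly added state cannot be forgotten in one direction of the mapping; whether that trade-off suits the serializer in this PR is left to the authors.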
Put the version requirement information here. Starting with which version of Kubernetes will the provided YAML file work?