This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@iyanuobidele

This will eventually close #111.

Initial implementation of the CRUD + Watch calls for the SparkJob resource. This will be the base I'll be making patches against.

/cc @foxish @mccheah @ash211 @erikerlandson @ssuchter

import org.apache.spark.SparkException
import org.apache.spark.util.ThreadUtils

/**
Author


I'll take this comment out.

spec:
  image: "driver-image"
  state: "completed"
  num-executors: 10
\ No newline at end of file
Author


Add a trailing newline ("\n").

kind: ThirdPartyResource
description: "A resource that manages a spark job"
versions:
  - name: v1
\ No newline at end of file
Author


Add a trailing newline ("\n").

protected val kubeToken: Option[String] = None

implicit val formats: Formats = DefaultFormats + JobStateSerDe

Author


One extra "\n" to take out.

case JobState.QUEUED => JString("QUEUED")
case JobState.RUNNING => JString("RUNNING")
})
)
\ No newline at end of file
Author


Add a trailing newline ("\n").

* KILLED - A user manually killed this Spark Job
*/
val QUEUED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value
}
\ No newline at end of file
Author


ditto

}

def buildOkhttpClientFromWithinPod(client: BaseClient): OkHttpClient = {
  val field = classOf[BaseClient].getDeclaredField("httpClient")

Why do we need this reflection to grab the httpClient out of the BaseClient? Is it because the fabric8 client doesn't support TPR yet?

Author


Yes, you're exactly right.


Can we move forward the Kubernetes client to support third party resources?

case JString("RUNNING") => JobState.RUNNING
case JString("FINISHED") => JobState.FINISHED
case JString("KILLED") => JobState.KILLED
case JString("FAILED") => JobState.FAILED

There's probably a nicer way to do this than the manual mapping. I think Scala case classes have an apply method that creates the class you expect?
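
One possible way to drop the manual mapping, assuming JobState is a scala.Enumeration (as the `= Value` declaration elsewhere in this PR suggests): `Enumeration.withName` looks a value up by its name, and `toString` gives the name back. The `parse` helper below is hypothetical and not part of the PR.

```scala
// Sketch only: JobState mirrors the enumeration shown in this PR;
// `parse` is a hypothetical helper added for illustration.
object JobState extends Enumeration {
  val QUEUED, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value

  // withName throws NoSuchElementException for unknown names,
  // so wrap it to return an Option instead.
  def parse(name: String): Option[Value] =
    scala.util.Try(withName(name)).toOption
}

object JobStateDemo extends App {
  println(JobState.parse("RUNNING"))  // Some(RUNNING)
  println(JobState.parse("bogus"))    // None
  println(JobState.KILLED.toString)   // KILLED
}
```

With this, the serializer could emit `JString(state.toString)` and the deserializer could call `parse` on the string, so adding a new state would not require touching either mapping.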

def getJobObject(name: String): SparkJobState = {
  val request = completeRequest(new Request.Builder()
    .get()
    .url(s"$kubeMaster/${TPR_API_ENDPOINT.format(TPR_API_VERSION, namespace)}/$name"))

Pull these URLs out to statics. Maybe use Feign with the JAX-RS annotations instead?
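
One way to read "pull these URLs out to statics" is to keep the path template in a single object and build every URL through small helpers. This is a sketch under assumptions: the excerpt does not show the actual values of TPR_API_ENDPOINT or TPR_API_VERSION, so the template, API group, and resource name below are placeholders, and `TprUrls` is a hypothetical name.

```scala
// Sketch only: the endpoint template and version are placeholder values,
// not the ones defined in the PR.
object TprUrls {
  val TPR_API_VERSION = "v1"
  // First %s is the API version, second %s is the namespace.
  val TPR_API_ENDPOINT = "apis/apache.org/%s/namespaces/%s/sparkjobs"

  // URL of the resource collection in a namespace.
  def collection(master: String, namespace: String): String = {
    val root = master.stripSuffix("/")
    val path = TPR_API_ENDPOINT.format(TPR_API_VERSION, namespace)
    s"$root/$path"
  }

  // URL of a single named resource.
  def job(master: String, namespace: String, name: String): String =
    s"${collection(master, namespace)}/$name"
}
```

Call sites such as getJobObject would then pass `TprUrls.job(kubeMaster, namespace, name)` to the request builder instead of formatting the string inline.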

Author


Sounds good. Thanks for the quick review. I'll look into your comments.

* This method relies on the assumption of one sparkjob per namespace
*/
def watchJobObject(): Future[WatchObject] = {
  val watchClient = httpClient.newBuilder().readTimeout(0, TimeUnit.MILLISECONDS).build()
Author


This should probably be a clone, not newBuilder.

@foxish
Member

foxish commented Feb 17, 2017

Thanks @iyanuobidele!
A quick comment on the schema we want.
We want to follow the conventions of the other Kubernetes API objects, which means nesting all the items we discussed in #111 under a status field.
The way I imagine it is:

apiVersion: "apache.org/v1" # apache.org, conforming with the project itself.
kind: "SparkJob"
metadata:
  name: "spark-job-1"
status:
  image: "driver-image"
  state: "completed"
  numExecutors: 10 # camelCase, instead of num-executors
  ... 
  ... 
  # other status items

I think it was previously nested under spec when we thought we'd want those fields to be mutable. The current thinking is to first use the TPR for reporting status via kubectl and the dashboard, and later extend it to other items. This still leaves us room to add spec and expose other fields there at a later stage.


@mccheah mccheah left a comment


This is a first-glance review. Much of this code assumes the Kubernetes client cannot support third party resources directly. We should look into contributing to the Kubernetes client project. If that doesn't make sense for our use case, then we can move forward with this implementation.

import org.apache.spark.deploy.kubernetes.constants._

- private[spark] object KubernetesClientBuilder {
+ private[spark] object ClientBuilder {

Any particular reason for the name change?

private val protocol: String = "https://"

// we can also get the host from the environment variable
private val kubeHost: String = {

Use what we already have in constants.scala, which provides a hostname that can be DNS resolved within a pod.

  case Failure(_) => None
}
host.map(h => h).getOrElse {
  // Log a warning just in case, but this should almost certainly never happen

Throwing an exception here would be best, but this shouldn't be necessary once we use the proper hostname.
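
A sketch of the fail-fast alternative suggested here, assuming the host is read from an environment variable. The helper name `requiredEnv` is hypothetical; it takes the environment as a parameter so it can be exercised without touching the real process environment. The PR imports SparkException, but a plain IllegalStateException is used below to keep the example dependency-free.

```scala
// Sketch only: `EnvLookup` and `requiredEnv` are hypothetical names.
object EnvLookup {
  // Returns the variable's value, or throws instead of logging a warning
  // and continuing with a fallback.
  def requiredEnv(name: String, env: Map[String, String] = sys.env): String =
    env.getOrElse(
      name,
      throw new IllegalStateException(s"Environment variable $name is not set"))
}
```

Failing at lookup time surfaces a misconfigured pod immediately, rather than at the first request against a wrong or empty host.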


// the port from the environment variable
private val kubeHostPort: String = {
  val port = Try(sys.env("KUBERNETES_PORT_443_TCP_PORT")) match {

I believe using the hostname with DNS resolution removes the need to specify the port.
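
For illustration, a sketch of the DNS-based approach. Inside a pod the API server is normally reachable at the cluster-internal name kubernetes.default.svc, and an https URL with no explicit port defaults to 443. The constant name below is an assumption, since the actual definition in constants.scala is not shown in this thread, and `effectivePort` is a hypothetical helper.

```scala
import java.net.URI

// Sketch only: the constant stands in for whatever constants.scala defines.
object MasterUrl {
  val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc"

  // java.net.URI reports -1 when no port is written; fall back to the
  // scheme's default so callers never need the *_PORT environment variables.
  def effectivePort(url: String): Int = {
    val uri = new URI(url)
    if (uri.getPort != -1) uri.getPort
    else if (uri.getScheme == "https") 443
    else 80
  }
}
```

With a URL of this form, both the kubeHost and kubeHostPort lookups could be replaced by a single constant.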

protected implicit val ec: ThreadPoolExecutor = ThreadUtils
  .newDaemonCachedThreadPool("tpr-watcher-pool")

private def executeBlocking(cb: => WatchObject): Future[WatchObject] = {

This once again seems like something the Kubernetes client should provide for us, although it's unclear how the Kubernetes client would be able to support completely arbitrary resource types.

Author


Thanks, Matt, for taking a look at this. This is mostly a port of work we did a couple of months ago. At the time, we considered making a contribution upstream, but we decided to go the CRUD way to save some cycles.

I'll spend some time looking into what it'll take to get a contribution upstream, and maybe we could also talk about this at the next SIG.

@iyanuobidele
Author

Sure @foxish, I'll take note of that. I'll be making patches to this shortly.

@ash211

ash211 commented Mar 16, 2017

Please rebase onto branch-2.1-kubernetes and send this PR into that branch instead of k8s-support-alternate-incremental which is now deprecated.

@foxish
Member

foxish commented Apr 25, 2017

@iyanuobidele, will you have time to pick this PR up again?

@iyanuobidele iyanuobidele force-pushed the tpr-support branch 2 times, most recently from 05d8885 to c198d9f Compare May 1, 2017 21:08
@iyanuobidele
Author

Closing this and opening a new PR to cover it.

@iyanuobidele iyanuobidele mentioned this pull request May 19, 2017
5 participants