This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@foxish (Member) commented Jan 27, 2017

Fixes #29
The executors, service, and secret all point back to the driver pod in the tree.
When we have the TPR here, that'll likely be the root of the tree.

cc @ash211 @mccheah

.withUid(t.getMetadata.getUid)
.build()

secret.getMetadata().setOwnerReferences(ownerRefs.asJava)

@foxish (Member, Author):

Bugs in the library prevented me from abstracting this out to a common function.

        def addOwnerReference(owner: HasMetadata, resource: HasMetadata) = {
          val ownerRefs = new ArrayBuffer[OwnerReference]
          ownerRefs += new OwnerReferenceBuilder()
            .withApiVersion(owner.getApiVersion)
            .withController(true)
            .withKind(owner.getKind)
            .withName(owner.getMetadata.getName)
            .withUid(owner.getMetadata.getUid)
            .build()
          resource.getMetadata().setOwnerReferences(ownerRefs.asJava)
          kubernetesClient.resource(resource).createOrReplace()
        }

I couldn't get the above to work for service, although it works for secrets, pods, etc.
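
One edit-based alternative that might sidestep the replace path, sketched purely as a guess: go through the typed services() DSL and mutate only the live object's metadata. This assumes the fabric8 client version on this branch exposes edit()/editMetadata()/addToOwnerReferences and that a namespace value is in scope; none of that is verified here.

    // Guess at a workaround: edit the live Service in place instead of
    // replacing it wholesale. Assumes this fabric8 client version exposes
    // edit()/editMetadata()/addToOwnerReferences; `namespace` is assumed
    // to be in scope.
    val driverOwnerRef = new OwnerReferenceBuilder()
      .withApiVersion(t.getApiVersion)
      .withController(true)
      .withKind(t.getKind)
      .withName(t.getMetadata.getName)
      .withUid(t.getMetadata.getUid)
      .build()

    kubernetesClient.services()
      .inNamespace(namespace)
      .withName(service.getMetadata.getName)
      .edit()
        .editMetadata()
          .addToOwnerReferences(driverOwnerRef)
        .endMetadata()
      .done()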

A reviewer replied:

This seems to compile for me; does it not run properly?

            if (action == Action.ADDED) {
              def addOwnerReference(owner: HasMetadata, resource: HasMetadata) = {
                val ownerRefs = new ArrayBuffer[OwnerReference]
                ownerRefs += new OwnerReferenceBuilder()
                  .withApiVersion(owner.getApiVersion)
                  .withController(true)
                  .withKind(owner.getKind)
                  .withName(owner.getMetadata.getName)
                  .withUid(owner.getMetadata.getUid)
                  .build()
                resource.getMetadata().setOwnerReferences(ownerRefs.asJava)
                kubernetesClient.resource(resource).createOrReplace()
              }

              addOwnerReference(t, secret)
              addOwnerReference(t, service)
            }

@foxish (Member, Author):

Yes, it doesn't work as expected for the service: it fails silently without adding any owner references. Ideally we'd want to PATCH rather than PUT, but that doesn't work either.
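
For what the PATCH would carry: the body is just a merge patch touching metadata.ownerReferences. A sketch of building that body follows; actually sending it would need a raw HTTP PATCH against the service URL, and whether the client used here exposes a patch helper is not assumed.

    // Sketch only: the JSON merge-patch body for adding the owner reference.
    // Values mirror the OwnerReference built from the driver pod `t`; the
    // interpolation assumes the name/UID need no JSON escaping.
    val ownerRefPatch =
      s"""{
         |  "metadata": {
         |    "ownerReferences": [{
         |      "apiVersion": "${t.getApiVersion}",
         |      "kind": "${t.getKind}",
         |      "name": "${t.getMetadata.getName}",
         |      "uid": "${t.getMetadata.getUid}",
         |      "controller": true
         |    }]
         |  }
         |}""".stripMargin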

The reviewer:

Thanks for the link.

@ash211 left a comment:

I like where this is going.

One question as well: if we have the owner references all set up, do we no longer need to proactively destroy resources on shutdown? Is it OK to rely on these owner references to clean everything up, every time?

@foxish (Member, Author) commented Jan 27, 2017

Garbage collection has been enabled by default since Kubernetes 1.4. However, one issue is that GC kicks in only after the driver pod is actually deleted. So if the container exits and the pod goes to either "Completed" or "Error", the dependent resources stick around, which is not desirable IMO. I am not even sure this is the intended behavior (GC is still in beta, so I will verify whether we can add a way to trigger it on pod completion). For now, GC is just an additional safeguard to ensure that all resources do get cleaned up when the driver pod is eventually deleted.

The OwnerRefs may also be useful as a grouping mechanism for aggregating the resources that belong to one "SparkJob" in kubectl and the dashboard. I'm currently proposing this in another issue and will see how that goes.
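
To make the cascade fire at completion rather than waiting for an external delete, one option (not implemented in this PR, and `driverPod` below is a hypothetical handle on the completed driver pod) would be to delete the driver pod itself once it reaches a terminal phase:

    // Sketch: the GC controller only collects dependents once their owner is
    // gone, so deleting the driver pod (the owner) is what triggers cleanup.
    // `driverPod` is a hypothetical reference to the completed driver pod.
    kubernetesClient.pods()
      .withName(driverPod.getMetadata.getName)
      .delete()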

@ash211 commented Jan 27, 2017

Thanks for the context. Sounds like we should use it as a backup means of deleting resources (in error cases) but still proactively clean resources up when we're done with them anyway.
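
Concretely, that would keep the explicit shutdown deletes roughly as they are, with the owner references only as a backstop. A rough sketch, assuming the secret and service handles created earlier are still in scope:

    // Sketch: proactive cleanup on shutdown; owner references remain as a
    // safety net for the paths where this code never runs (driver killed,
    // JVM crash, etc.).
    kubernetesClient.secrets()
      .withName(secret.getMetadata.getName)
      .delete()
    kubernetesClient.services()
      .withName(service.getMetadata.getName)
      .delete()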

@ash211 commented Jan 27, 2017

@mccheah good to merge?

@iyanuobidele commented Jan 27, 2017

In addition to @foxish's comments, another thing worth noting (IIRC) is that the dependents are deleted asynchronously; I don't know if that is the behavior we want to go for.

However, I believe there is an ongoing upstream issue about supporting synchronous deletes, along with workarounds; perhaps that would be more appropriate for our use case.

LGTM
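
For reference, the synchronous-style behavior being tracked upstream is foreground cascading deletion, where the owner is only removed after its dependents are gone; it is expressed through deleteOptions rather than anything in this PR. A sketch of the relevant field only, with other deleteOptions fields omitted and no client call shown, since it is unclear whether the client version here can send it:

    // Sketch only: the deleteOptions fragment for foreground cascading
    // deletion. The API server deletes dependents first and removes the
    // owner last, which approximates a synchronous cascade.
    val foregroundDeleteOptions =
      """{
        |  "propagationPolicy": "Foreground"
        |}""".stripMargin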

@foxish (Member, Author) commented Jan 27, 2017

We don't have naming conflicts between SparkJobs, and we are not using any finalizers that might delay deletion of those resources, so I think asynchronous GC should work just as well for us. The GC controller watches for changes to any deletable resources, and the time delay wouldn't be significant in practice. In tests so far, cleanup has always completed for me within 10 or so seconds.

@iyanuobidele:
Awesome, thanks for the note.
Sounds good then.

@ash211 commented Jan 27, 2017

That sounds like a +1 from Iyanu and from me, so merging.

@ash211 merged commit c3428f7 into k8s-support-alternate-incremental on Jan 27, 2017
@ash211 deleted the add-gc branch on January 27, 2017 at 19:15
ash211 pushed a commit that referenced this pull request Feb 8, 2017
ash211 pushed a commit that referenced this pull request Mar 8, 2017
foxish added a commit that referenced this pull request Jul 24, 2017
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019