Skip to content
This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@sahilprasad
Copy link

…e staging server URI

Helpful validation to inhibit users from submitting local dependencies without specifying a URI for the RSS. Closes #339.

How was this patch tested?

Ran provided test suite and manually tested with local and non-local application dependency submissions.

@mccheah
Copy link

mccheah commented Aug 22, 2017

This is not necessarily the case with #437

@ash211
Copy link

ash211 commented Aug 22, 2017

Hi @sahilprasad , thanks for the contribution to this project!

Just earlier today we merged a PR that supports sending "small files" from the submitter to drivers/executors via k8s secrets, for a definition of small. If the files aren't small enough and no RSS is specified, then it throws this exception: https://github.com/apache-spark-on-k8s/spark/pull/437/files#diff-5fd183129559d8c0a34135c347be647bR40

For jars there's no small jar vs large jar distinction, so any time there's a local jar there must be an RSS specified.

Were you looking at this primarily because of files or because of jars?

@sahilprasad
Copy link
Author

@ash211 I didn't realize that PR was there! I'm more concerned with jars here.
@mccheah Not necessarily as in small files submitted without an RSS specification will be sent through secrets, but not jars?

@mccheah
Copy link

mccheah commented Aug 22, 2017

Yeah sorry, to clarify, this change is still valid for jars, but we need to be more careful with local files.

@sahilprasad
Copy link
Author

@mccheah Got it. If I were to change this just to accommodate jars, is there anything else in the way of jar-to-RSS validation that would be good to have?

@mccheah
Copy link

mccheah commented Aug 22, 2017

jars probably don't need any special case. I think local files is actually already handled by the small files bootstrap given that we do a best effort to mount as a secret but when the files are too large we print an informative error message.

@sahilprasad
Copy link
Author

Right, is this change applicable, if modified, to just jars then?

@mccheah
Copy link

mccheah commented Aug 22, 2017

I think so - should be able to write a unit and/or integration test that covers this.

@sahilprasad
Copy link
Author

@mccheah I can't get this test to pass when I build locally: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/kubernetes/submit/submitsteps/initcontainer/InitContainerConfigurationStepsOrchestratorSuite.scala#L72

Since the SPARK_JARS seq declared at the top of this class includes a submitter-local dependency (file:///app/jars/jar2.txt), that test fails with the validation error I am introducing with this PR. Since my test essentially constructs the orchestrator in the same way, and catches the thrown IllegalArgumentException, would that justify deleting this test?

@mccheah
Copy link

mccheah commented Aug 22, 2017

This indicates to me that the validation is being added at the incorrect location, because the init container configuration steps orchestrator should always be being built with the arguments already having been validated. In other words, the orchestrator itself shouldn't be doing the validation, but some component above it.

@mccheah
Copy link

mccheah commented Aug 22, 2017

Either that or the test should be patched to make it such that the orchestrator is created with "compatible" arguments - that is, if we're giving the orchestrator local files, it should also be given a resource staging server URI.

@sahilprasad
Copy link
Author

@mccheah Where would the validation take place if not InitContainerConfigurationStepsOrchestrator? DriverConfigurationStepsOrchestrator? The former class already has some validation around the RSS, so I figured I'd put my code there.

One way that I see to patch the test in question would be to include only the first element of SPARK_JARS (the jar prepended with hdfs://). This leads to a successful build. Do you suggest an alternative to this, or does my patch suffice?

@mccheah
Copy link

mccheah commented Aug 23, 2017

The orchestrator can do the validation, but we should then change the test such that the arguments provided to the orchestrator line up with what the expectations are. We should be testing that

  • When you use local jars, you must provide a resource staging server URI
  • When you do not use local jars, it is not strictly necessary either way to provide this URI.

@sahilprasad
Copy link
Author

@mccheah can you review?

.getOrElse(false)

OptionRequirements.requireSecondIfFirstIsDefined(
KubernetesFileUtils.getOnlySubmitterLocalFiles(sparkJars).nonEmpty match {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't match on true or false here - use the functional APIs like map, filter, getOrElse, etc.


assert(sparkConf.get(RESOURCE_STAGING_SERVER_URI).isEmpty)

intercept[IllegalArgumentException] {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we inspect the correctness of the error message as well? We wouldn't want an IllegalArgumentException to come from some other part of the constructor.

@mccheah
Copy link

mccheah commented Aug 24, 2017

Ok to merge once the build passes.

@sahilprasad
Copy link
Author

@mccheah should I squash my commits, or is that done automatically?

@foxish
Copy link
Member

foxish commented Sep 15, 2017

@mccheah this good to merge?

NAMESPACE,
APP_RESOURCE_PREFIX,
SPARK_JARS,
SPARK_JARS.take(1),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit brittle in case SPARK_JARS changes in the future -- we should create a new SPARK_JARS_REMOTE that has only hdfs:// path

@ash211
Copy link

ash211 commented Sep 15, 2017

Good to merge though, and Matt +1'd several weeks ago

@foxish foxish merged commit 8a0f485 into apache-spark-on-k8s:branch-2.2-kubernetes Sep 16, 2017
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
apache-spark-on-k8s#447)

* Fail submission if submitter-local files are provided without resource staging server URI

* Modified logic to validate only submitted jars; added orchestrator tests

* Incorporated feedback

* Fix failing test case
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
apache-spark-on-k8s#447)

* Fail submission if submitter-local files are provided without resource staging server URI

* Modified logic to validate only submitted jars; added orchestrator tests

* Incorporated feedback

* Fix failing test case
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants