@kishorvpatil
Contributor

What changes were proposed in this pull request?

During spark-submit, if the YARN distributed cache is instructed to add the same file under both --files and --archives, this change ensures that the existing Spark YARN distributed cache behaviour is retained, i.e. it warns and fails if the same file is listed in both --files and --archives.

How was this patch tested?

Manually tested:

  1. If the same jar is mentioned in both --jars and --files, the job is still submitted.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

… under archives and files

@SparkQA

SparkQA commented Oct 25, 2016

Test build #67525 has finished for PR 15627 at commit 9bb1623.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

@kishorvpatil please look at the test failure

@SparkQA

SparkQA commented Oct 26, 2016

Test build #67600 has finished for PR 15627 at commit a3eb6d4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

Jenkins, test this please

@tgravescs
Contributor

Jenkins, add to whitelist

cachedSecondaryJarLinks += localizedPath
}
} else {
require(localizedPath !=null)
Contributor

add space after !=

Contributor

Let's change the error to an IllegalArgumentException.
Also, let's add a comment here indicating that duplicate jars are OK because of the Spark 2.0 jar install; everything else shouldn't contain multiple copies of the same jar/file/archive.
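
For reference, a minimal Scala sketch of the two failure styles being discussed. `Predef.require` already throws an `IllegalArgumentException`, but only with the generic message "requirement failed", whereas an explicit throw can name the offending file. The object and method names here are hypothetical, and treating a null `localizedPath` as "already distributed" is an assumption made for illustration, not something taken from the Spark source.

```scala
object RequireVersusThrow {
  // Style 1: Predef.require. On failure it throws
  // IllegalArgumentException("requirement failed") with no further detail.
  def viaRequire(localizedPath: String): Unit =
    require(localizedPath != null)

  // Style 2: an explicit throw, so the message can name the resource.
  // ASSUMPTION: a null localizedPath means the resource was already added.
  def viaThrow(file: String, localizedPath: String): Unit = {
    if (localizedPath == null) {
      throw new IllegalArgumentException(s"Attempt to add ($file) multiple times.")
    }
  }
}
```

Both styles raise the same exception type; the explicit form mainly buys a message that tells the user which resource was repeated.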

@SparkQA

SparkQA commented Oct 27, 2016

Test build #67648 has finished for PR 15627 at commit 2c55fc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 27, 2016

Test build #67649 has finished for PR 15627 at commit 2c55fc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
if (localizedPath != null) {
throw new IllegalArgumentException(s"Attempt to add ($file) multiple times. " +
"Please check the values of 'spark.yarn.dist.files' and/or " +
Contributor

Can you remove the part about checking the values of those specific configs? There are multiple ways for these to be specified (configs, --files, --jars, etc.). Perhaps just say: please check the values you specified for uploading files, jars, and archives to make sure one isn't specified multiple times.

@SparkQA

SparkQA commented Oct 28, 2016

Test build #67719 has finished for PR 15627 at commit a1dc858.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 28, 2016

Test build #67720 has finished for PR 15627 at commit 33f95ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val (_, localizedPath) = distribute(file, resType = resType)
if (addToClasspath && localizedPath != null) {
cachedSecondaryJarLinks += localizedPath
if (addToClasspath) {
Contributor

Can you add a comment here explaining exactly what this is doing, to help explain it and keep it from breaking in the future?

Also, can you add another unit test to cover this case?

@SparkQA

SparkQA commented Oct 31, 2016

Test build #67826 has finished for PR 15627 at commit f797481.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val userLib1 = Utils.createTempDir()
val userLib2 = Utils.createTempDir()

val jar1 = TestUtils.createJarWithFiles(Map(), jarsDir)
Contributor

not used anywhere

test("distribute archive multiple times") {
val libs = Utils.createTempDir()
val jarsDir = new File(libs, "jars")
assert(jarsDir.mkdir())
Contributor

I don't see jarsDir being used anywhere either

val output = new FileOutputStream(target)
Utils.copyStream(input, output, closeStreams = true)
target.toURI.toURL
}
Contributor

If we can clean up the variable names above, I think it would help a lot; the test is confusing. I know you just copied and pasted, but it would be nice to clean up.
Also, can we have 3 tests or 3 asserts:

  • one for same file in --files
  • one for same file in --archives
  • one for same file in --files and --archives
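
A rough sketch of what the three cases could look like, written with plain `assert` so it stands alone. `submit` is a hypothetical stand-in for the client code under test, not the actual Spark API, and the file names are made up.

```scala
import scala.collection.mutable
import scala.util.{Failure, Try}

object DuplicateCasesSketch {
  // HYPOTHETICAL stand-in for the code under test: rejects any resource
  // that appears more than once across --files and --archives.
  def submit(files: Seq[String], archives: Seq[String]): Unit = {
    val seen = mutable.Set.empty[String]
    (files ++ archives).foreach { resource =>
      // mutable.Set#add returns false when the element was already present.
      if (!seen.add(resource)) {
        throw new IllegalArgumentException(s"Attempt to add ($resource) multiple times.")
      }
    }
  }

  // True only when submission fails with IllegalArgumentException.
  def rejected(files: Seq[String], archives: Seq[String]): Boolean =
    Try(submit(files, archives)) match {
      case Failure(_: IllegalArgumentException) => true
      case _ => false
    }

  def runAll(): Unit = {
    // Same file twice in --files.
    assert(rejected(files = Seq("a.txt", "a.txt"), archives = Nil))
    // Same file twice in --archives.
    assert(rejected(files = Nil, archives = Seq("a.zip", "a.zip")))
    // Same file in both --files and --archives.
    assert(rejected(files = Seq("a.zip"), archives = Seq("a.zip")))
    // Control: unique resources must still be accepted.
    assert(!rejected(files = Seq("a.txt"), archives = Seq("a.zip")))
  }
}
```

The control case is worth keeping next to the three failure cases, since a check that rejects everything would otherwise pass all of them.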

@SparkQA

SparkQA commented Nov 3, 2016

Test build #68083 has finished for PR 15627 at commit 51eefa5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

+1

asfgit pushed a commit that referenced this pull request Nov 3, 2016
… --files and --archives

## What changes were proposed in this pull request?

During spark-submit, if the YARN distributed cache is instructed to add the same file under both --files and --archives, this change ensures that the existing Spark YARN distributed cache behaviour is retained, i.e. it warns and fails if the same file is listed in both --files and --archives.

## How was this patch tested?

Manually tested:
1. If the same jar is mentioned in both --jars and --files, the job is still submitted.
   - Basically, the functionality of [SPARK-14423] #12203 is unchanged.
2. If the same file is mentioned in both --files and --archives, the job fails to submit.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

… under archives and files

Author: Kishor Patil <[email protected]>

Closes #15627 from kishorvpatil/spark18099.

(cherry picked from commit 098e4ca)
Signed-off-by: Tom Graves <[email protected]>
@asfgit asfgit closed this in 098e4ca Nov 3, 2016
@ueshin
Member

ueshin commented Nov 8, 2016

@kishorvpatil @tgravescs It seems this PR breaks the functionality of --files and --archives.
Using --files or --archives with files that are not also included in --jars doesn't work.

}
} else {
require(localizedPath !=null)
if (localizedPath != null) {
Member

I guess this should be localizedPath == null?
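
To make the effect of the inverted check concrete, here is a small self-contained Scala sketch. `ResourceCache` and its `distribute` method are hypothetical stand-ins; in particular, returning null for a resource that was already added is an assumption about how the real `distribute` signals a repeat.

```scala
import scala.collection.mutable

object NullCheckSketch {
  final class ResourceCache {
    private val distributed = mutable.Set.empty[String]

    // HYPOTHETICAL: returns a localized path for a new resource,
    // or null when the same resource was already distributed (assumption).
    def distribute(file: String): String =
      if (distributed.add(file)) file else null

    // Check as it appears in the diff: throws when the path is non-null,
    // so every first-time file is rejected and repeats pass silently.
    def addInverted(file: String): Unit = {
      val localizedPath = distribute(file)
      if (localizedPath != null) {
        throw new IllegalArgumentException(s"Attempt to add ($file) multiple times.")
      }
    }

    // Check as suggested in the comment: throws only for a repeat.
    def addFixed(file: String): Unit = {
      val localizedPath = distribute(file)
      if (localizedPath == null) {
        throw new IllegalArgumentException(s"Attempt to add ($file) multiple times.")
      }
    }
  }
}
```

Under that assumption, `addInverted` fails on the very first `--files` or `--archives` entry, which matches the reported symptom, while `addFixed` accepts a unique file and rejects only the second copy.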

@tgravescs
Contributor

thanks for pointing this out

@tgravescs
Contributor

SPARK-18357 filed to fix

@kishorvpatil
Contributor Author

@ueshin Sorry about this. The patch is available in #15810.

asfgit pushed a commit that referenced this pull request Nov 8, 2016
## What changes were proposed in this pull request?

#15627 broke functionality: on YARN, --files and --archives no longer accept any files.
This patch ensures that --files and --archives accept unique files.

## How was this patch tested?

A. I added unit tests.
B. Also manually tested --files with --archives: an exception is thrown if duplicate files are specified, and submission continues if unique files are specified.

Author: Kishor Patil <[email protected]>

Closes #15810 from kishorvpatil/SPARK18357.

(cherry picked from commit 245e5a2)
Signed-off-by: Tom Graves <[email protected]>
ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 8, 2016
@ueshin
Member

ueshin commented Nov 9, 2016

@kishorvpatil Thank you for fixing this!

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017