Conversation

@felixcheung
Member

and add tests.
Spark submit expects a comma-separated list.

@SparkQA

SparkQA commented Nov 30, 2015

Test build #46860 has finished for PR 10034 at commit 11d4ddc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Hmm I think its better to just support both forms ? Isn't c("jar1", "jar2") more natural ?

@felixcheung
Member Author

It is. I could add that.

@sun-rui
Contributor

sun-rui commented Nov 30, 2015

The example is correct. jars is expected to be a character vector.

The bug is at https://github.com/apache/spark/blob/master/R/pkg/R/client.R#L47.
The code can be changed as follows:

  jars <- paste(jars, collapse = ",")
  if (jars != "") {
    jars <- paste0("--jars ", jars)
  }

It is more natural to use a character vector in R. I don't think it is necessary to support both forms, as it complicates the logic.
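As a quick illustration (a sketch, not part of the PR), paste() with collapse already behaves well for all three inputs of interest: a character vector, a single string, and an empty vector, which is what the `jars != ""` guard above relies on.

```r
# Join a character vector into one comma-separated string.
joined <- paste(c("jar1.jar", "jar2.jar"), collapse = ",")
# joined is "jar1.jar,jar2.jar"

# A length-1 input (even one already containing commas) passes through.
single <- paste("jar1.jar,jar2.jar", collapse = ",")
# single is "jar1.jar,jar2.jar"

# An empty vector collapses to "", so the guard skips the --jars flag.
empty <- paste(character(0), collapse = ",")
# empty is ""
```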

@felixcheung
Member Author

I considered changing it that way, but thought it would break everyone who has gotten it working with a comma-separated string.
Also, spark-submit (the pyspark, spark-shell, spark-submit, and sparkR CLIs) expects a comma-separated string, so that form might make more sense to someone more familiar with those tools.

@felixcheung
Member Author

I could make it work either way - let me know which one we should support.

@sun-rui
Contributor

sun-rui commented Dec 1, 2015

My personal opinion is that comma-separated jars are for command-line options, while it is natural to use a vector/array in an API. jars is a Seq[String] in the Scala SparkContext API, for example.
If @shivaram insists on supporting both ways, I am OK with that. It will take time to support them, and more test cases are needed :)

@shivaram
Contributor

shivaram commented Dec 1, 2015

Well, the only reason to support both use cases is to have some backward compatibility for users who might already be using this. Can we see how much more complex the code becomes? It's good to not break compatibility if we can avoid it.

@felixcheung
Member Author

It shouldn't be hard. The jars <- paste(jars, collapse = ",") that sun-rui suggested should gracefully handle the case where length(jars) == 1 and the value is a comma-separated string.

Assuming we don't plan to do very in-depth checks (e.g. not removing leading/trailing spaces around ,, not checking for c("abc,dbc", "efg")), it should be OK.

@sun-rui
Contributor

sun-rui commented Dec 1, 2015

@shivaram, from the documentation's point of view, this is not a backward compatibility issue:

@param sparkJars Character string vector of jar files to pass to the worker nodes.

The comma-separated string happens to work because there is a redundant piece of code logic that passes the string to the SparkSubmit --jars option, which I think is unnecessary, since SparkContext.addJar will be called.

So I propose removing the redundant code logic.

@felixcheung felixcheung changed the title [SPARK-12019][SPARKR] Check param and fix doc for sparkR.init() [SPARK-12019][SPARKR] Support character vector for sparkR.init(), check param and fix doc Dec 3, 2015
@felixcheung
Member Author

Updated to support both "abc,def" and c("abc", "def") - we would need to remove empty strings anyway ("", " ", etc.), so it is not much more work to split by ','.

Also refactored to allow better testing and reuse.
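A minimal sketch of what normalizing both forms might look like (hypothetical code for illustration; normalizeJars is not the actual helper name in the PR, and the exact trimming behavior here is an assumption):

```r
# Hypothetical sketch: accept either "abc,def" or c("abc", "def"),
# drop empty entries like "" or " ", and return one
# comma-separated string.
normalizeJars <- function(jars) {
  parts <- unlist(strsplit(jars, ","))  # split any comma-separated entries
  parts <- trimws(parts)                # trim surrounding whitespace
  parts <- parts[parts != ""]           # drop empty strings
  paste(parts, collapse = ",")
}

normalizeJars(c("abc", "def"))    # "abc,def"
normalizeJars("abc,def")          # "abc,def"
normalizeJars(c("", " ", "abc"))  # "abc"
```

Either input form collapses to the same comma-separated string that spark-submit expects.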

@SparkQA

SparkQA commented Dec 3, 2015

Test build #47112 has finished for PR 10034 at commit bc98d5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

@shivaram we might want this PR in Spark 1.6 ...

R/pkg/R/client.R Outdated
Contributor


This seems pretty subtle -- can you add a comment just above saying "Leave a space after --jars"?

@shivaram
Contributor

shivaram commented Dec 3, 2015

Thanks @felixcheung -- this is very well done. I just had a couple of minor comments. You are right that it will be good to have in 1.6 -- we'll merge this into branch-1.6, and if there is an RC2 we might catch 1.6.0. Let's see.

cc @marmbrus

@SparkQA

SparkQA commented Dec 3, 2015

Test build #47166 has finished for PR 10034 at commit c820cc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Dec 3, 2015

LGTM. Merging this

asfgit pushed a commit that referenced this pull request Dec 3, 2015
…ck param and fix doc

and add tests.
Spark submit expects comma-separated list

Author: felixcheung <[email protected]>

Closes #10034 from felixcheung/sparkrinitdoc.

(cherry picked from commit 2213441)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in 2213441 Dec 3, 2015