
Commit c21713e: "Comment and doc updates"
1 parent: ceeca81

File tree: 2 files changed, +25 -11 lines

R/pkg/R/sparkR.R

Lines changed: 8 additions & 4 deletions
@@ -93,7 +93,7 @@ sparkR.stop <- function() {
 #' sc <- sparkR.init("local[2]", "SparkR", "/home/spark",
 #' list(spark.executor.memory="1g"))
 #' sc <- sparkR.init("yarn-client", "SparkR", "/home/spark",
-#' list(spark.executor.memory="1g"),
+#' list(spark.executor.memory="4g", spark.driver.memory="2g"),
 #' list(LD_LIBRARY_PATH="/directory of JVM libraries (libjvm.so) on workers/"),
 #' c("jarfile1.jar","jarfile2.jar"))
 #'}
@@ -130,9 +130,6 @@ sparkR.init <- function(
 backendPort <- existingPort
 } else {
 path <- tempfile(pattern = "backend_port")
-# A few Spark config cannot be set in env:
-# http://spark.apache.org/docs/latest/configuration.html#application-properties
-# Add them to spark-submit commandline if not already set in SPARKR_SUBMIT_ARGS
 submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
@@ -334,6 +331,13 @@ sparkConfToSubmitOps[["spark.driver.extraJavaOptions"]] <- "--driver-java-option
 sparkConfToSubmitOps[["spark.driver.extraLibraryPath"]] <- "--driver-library-path"

 # Utility function that returns Spark Submit arguments as a string
+#
+# A few Spark Application and Runtime Environment properties cannot take effect after the
+# driver JVM has started, as documented in:
+# http://spark.apache.org/docs/latest/configuration.html#application-properties
+# When starting SparkR without spark-submit, for example in RStudio, add them to the
+# spark-submit command line if not already set in SPARKR_SUBMIT_ARGS so that they
+# take effect.
 getClientModeSparkSubmitOpts <- function(submitOps, sparkEnvirMap) {
 envirToOps <- lapply(ls(sparkConfToSubmitOps), function(conf) {
 opsValue <- sparkEnvirMap[[conf]]
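
For illustration (an editor's sketch, not part of the diff; it assumes `sparkConfToSubmitOps` also carries a `spark.driver.memory` to `--driver-memory` mapping, consistent with the docs change below), this is how a driver property supplied from an R session would reach the spark-submit command line:

    # Starting SparkR from R directly (e.g. RStudio), before any JVM exists:
    sc <- sparkR.init(master = "local[*]", appName = "SparkR",
                      sparkEnvir = list(spark.driver.memory = "2g"))
    # getClientModeSparkSubmitOpts() folds the property into the submit options,
    # yielding something like: --driver-memory "2g" sparkr-shell
    # so the setting takes effect before the driver JVM starts.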

docs/sparkr.md

Lines changed: 17 additions & 7 deletions
@@ -37,17 +37,27 @@ sc <- sparkR.init()
 sqlContext <- sparkRSQL.init(sc)
 {% endhighlight %}

+If you are creating the `SparkContext` yourself instead of using the `sparkR` shell or `spark-submit`, you
+can also specify certain Spark driver properties. Normally these
+[Application properties](configuration.html#application-properties) and [Runtime Environment](configuration.html#runtime-environment) properties cannot be set programmatically, because the
+driver JVM process has already started; in this case, SparkR takes care of this for you. To set
+them, pass them as you would other configuration properties in the `sparkEnvir` argument.
+
+{% highlight r %}
+sc <- sparkR.init("local[*]", "SparkR", "/home/spark", list(spark.driver.memory="2g"))
+{% endhighlight %}
+
 </div>

 ## Creating DataFrames
 With a `SQLContext`, applications can create `DataFrame`s from a local R data frame, from a [Hive table](sql-programming-guide.html#hive-tables), or from other [data sources](sql-programming-guide.html#data-sources).

 ### From local data frames
-The simplest way to create a data frame is to convert a local R data frame into a SparkR DataFrame. Specifically we can use `createDataFrame` and pass in the local R data frame to create a SparkR DataFrame. As an example, the following creates a `DataFrame` based using the `faithful` dataset from R.
+The simplest way to create a data frame is to convert a local R data frame into a SparkR DataFrame. Specifically, we can use `createDataFrame` and pass in the local R data frame to create a SparkR DataFrame. As an example, the following creates a `DataFrame` using the `faithful` dataset from R.

 <div data-lang="r" markdown="1">
 {% highlight r %}
-df <- createDataFrame(sqlContext, faithful)
+df <- createDataFrame(sqlContext, faithful)

 # Displays the content of the DataFrame to stdout
 head(df)
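
A possible usage note (illustrative sketch, not part of the commit): driver and executor properties can be combined in the same `sparkEnvir` list, mirroring the `yarn-client` example in the roxygen comments above; SparkR routes the driver-side ones onto the spark-submit command line for you.

    # Mixing driver and executor properties in one sparkEnvir list
    sc <- sparkR.init("local[*]", "SparkR", "/home/spark",
                      list(spark.driver.memory = "2g",
                           spark.executor.memory = "1g"))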
@@ -96,7 +106,7 @@ printSchema(people)
 </div>

 The data sources API can also be used to save out DataFrames into multiple file formats. For example we can save the DataFrame from the previous example
-to a Parquet file using `write.df`
+to a Parquet file using `write.df`.

 <div data-lang="r" markdown="1">
 {% highlight r %}
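
For context (illustrative sketch, not part of the commit; argument names follow SparkR's `write.df` signature of this era), the save described in the surrounding text would look roughly like:

    # Save the DataFrame as a Parquet file, overwriting any existing output
    write.df(df, path = "faithful.parquet", source = "parquet", mode = "overwrite")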
@@ -139,7 +149,7 @@ Here we include some basic examples and a complete list can be found in the [API
 <div data-lang="r" markdown="1">
 {% highlight r %}
 # Create the DataFrame
-df <- createDataFrame(sqlContext, faithful)
+df <- createDataFrame(sqlContext, faithful)

 # Get basic information about the DataFrame
 df
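
A brief aside (illustrative sketch, not part of the commit): besides printing `df`, other basic inspection calls available in SparkR include:

    printSchema(df)  # faithful has two double columns: eruptions, waiting
    count(df)        # number of rows; 272 for the faithful dataset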
@@ -152,7 +162,7 @@ head(select(df, df$eruptions))
 ##2 1.800
 ##3 3.333

-# You can also pass in column name as strings
+# You can also pass in column names as strings
 head(select(df, "eruptions"))

 # Filter the DataFrame to only retain rows with wait times shorter than 50 mins
@@ -166,7 +176,7 @@ head(filter(df, df$waiting < 50))

 </div>

-### Grouping, Aggregation
+### Grouping, Aggregation

 SparkR data frames support a number of commonly used functions to aggregate data after grouping. For example we can compute a histogram of the `waiting` time in the `faithful` dataset as shown below

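The histogram computation that sentence refers to would use SparkR's `groupBy`/`summarize`/`n` (illustrative sketch, not part of the commit; `waiting_counts` matches the name in the next hunk's context):

    # Count occurrences of each waiting time
    waiting_counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))
    head(waiting_counts)
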
@@ -194,7 +204,7 @@ head(arrange(waiting_counts, desc(waiting_counts$count)))

 ### Operating on Columns

-SparkR also provides a number of functions that can directly applied to columns for data processing and during aggregation. The example below shows the use of basic arithmetic functions.
+SparkR also provides a number of functions that can be directly applied to columns for data processing and during aggregation. The example below shows the use of basic arithmetic functions.

 <div data-lang="r" markdown="1">
 {% highlight r %}
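
The column arithmetic that sentence refers to looks like this (illustrative sketch, not part of the commit):

    # Operate directly on a column: convert waiting time from minutes to seconds
    df$waiting_secs <- df$waiting * 60
    head(df)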
