
Commit c21713e: "Comment and doc updates"
1 parent: ceeca81

File tree: 2 files changed, +25 -11 lines

R/pkg/R/sparkR.R

Lines changed: 8 additions & 4 deletions
@@ -93,7 +93,7 @@ sparkR.stop <- function() {
 #' sc <- sparkR.init("local[2]", "SparkR", "/home/spark",
 #' list(spark.executor.memory="1g"))
 #' sc <- sparkR.init("yarn-client", "SparkR", "/home/spark",
-#' list(spark.executor.memory="1g"),
+#' list(spark.executor.memory="4g", spark.driver.memory="2g"),
 #' list(LD_LIBRARY_PATH="/directory of JVM libraries (libjvm.so) on workers/"),
 #' c("jarfile1.jar","jarfile2.jar"))
 #'}
@@ -130,9 +130,6 @@ sparkR.init <- function(
 backendPort <- existingPort
 } else {
 path <- tempfile(pattern = "backend_port")
-# A few Spark config cannot be set in env:
-# http://spark.apache.org/docs/latest/configuration.html#application-properties
-# Add them to spark-submit commandline if not already set in SPARKR_SUBMIT_ARGS
 submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
@@ -334,6 +331,13 @@ sparkConfToSubmitOps[["spark.driver.extraJavaOptions"]] <- "--driver-java-option
 sparkConfToSubmitOps[["spark.driver.extraLibraryPath"]] <- "--driver-library-path"

 # Utility function that returns Spark Submit arguments as a string
+#
+# A few Spark Application and Runtime Environment properties cannot take effect after the
+# driver JVM has started, as documented in:
+# http://spark.apache.org/docs/latest/configuration.html#application-properties
+# When starting SparkR without spark-submit, for example in RStudio, add them to the
+# spark-submit command line if not already set in SPARKR_SUBMIT_ARGS so that they
+# take effect.
 getClientModeSparkSubmitOpts <- function(submitOps, sparkEnvirMap) {
 envirToOps <- lapply(ls(sparkConfToSubmitOps), function(conf) {
 opsValue <- sparkEnvirMap[[conf]]
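
For illustration (an editor's sketch, not part of the diff; it assumes `sparkConfToSubmitOps` also carries a `spark.driver.memory` to `--driver-memory` mapping, consistent with the docs change below), this is how a driver property supplied from an R session would reach the spark-submit command line:

    # Starting SparkR from R directly (e.g. RStudio), before any JVM exists:
    sc <- sparkR.init(master = "local[*]", appName = "SparkR",
                      sparkEnvir = list(spark.driver.memory = "2g"))
    # getClientModeSparkSubmitOpts() folds the property into the submit options,
    # yielding something like: --driver-memory "2g" sparkr-shell
    # so the setting takes effect before the driver JVM starts.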

docs/sparkr.md

Lines changed: 17 additions & 7 deletions
@@ -37,17 +37,27 @@ sc <- sparkR.init()
 sqlContext <- sparkRSQL.init(sc)
 {% endhighlight %}

+If you are creating the `SparkContext` yourself instead of using the `sparkR` shell or `spark-submit`, you
+can also specify certain Spark driver properties. Normally these
+[Application properties](configuration.html#application-properties) and [Runtime Environment](configuration.html#runtime-environment) properties cannot be set programmatically, because the
+driver JVM process has already started; in this case, SparkR takes care of this for you. To set
+them, pass them as you would other configuration properties in the `sparkEnvir` argument.
+
+{% highlight r %}
+sc <- sparkR.init("local[*]", "SparkR", "/home/spark", list(spark.driver.memory="2g"))
+{% endhighlight %}
+
 </div>

 ## Creating DataFrames
 With a `SQLContext`, applications can create `DataFrame`s from a local R data frame, from a [Hive table](sql-programming-guide.html#hive-tables), or from other [data sources](sql-programming-guide.html#data-sources).

 ### From local data frames
-The simplest way to create a data frame is to convert a local R data frame into a SparkR DataFrame. Specifically we can use `createDataFrame` and pass in the local R data frame to create a SparkR DataFrame. As an example, the following creates a `DataFrame` based using the `faithful` dataset from R.
+The simplest way to create a data frame is to convert a local R data frame into a SparkR DataFrame. Specifically, we can use `createDataFrame` and pass in the local R data frame to create a SparkR DataFrame. As an example, the following creates a `DataFrame` using the `faithful` dataset from R.

 <div data-lang="r" markdown="1">
 {% highlight r %}
-df <- createDataFrame(sqlContext, faithful)
+df <- createDataFrame(sqlContext, faithful)

 # Displays the content of the DataFrame to stdout
 head(df)
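
A possible usage note (illustrative sketch, not part of the commit): driver and executor properties can be combined in the same `sparkEnvir` list, mirroring the `yarn-client` example in the roxygen comments above; SparkR routes the driver-side ones onto the spark-submit command line for you.

    # Mixing driver and executor properties in one sparkEnvir list
    sc <- sparkR.init("local[*]", "SparkR", "/home/spark",
                      list(spark.driver.memory = "2g",
                           spark.executor.memory = "1g"))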
@@ -96,7 +106,7 @@ printSchema(people)
 </div>

 The data sources API can also be used to save out DataFrames into multiple file formats. For example we can save the DataFrame from the previous example
-to a Parquet file using `write.df`
+to a Parquet file using `write.df`.

 <div data-lang="r" markdown="1">
 {% highlight r %}
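
For context (illustrative sketch, not part of the commit; argument names follow SparkR's `write.df` signature of this era), the save described in the surrounding text would look roughly like:

    # Save the DataFrame as a Parquet file, overwriting any existing output
    write.df(df, path = "faithful.parquet", source = "parquet", mode = "overwrite")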
@@ -139,7 +149,7 @@ Here we include some basic examples and a complete list can be found in the [API
 <div data-lang="r" markdown="1">
 {% highlight r %}
 # Create the DataFrame
-df <- createDataFrame(sqlContext, faithful)
+df <- createDataFrame(sqlContext, faithful)

 # Get basic information about the DataFrame
 df
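
A brief aside (illustrative sketch, not part of the commit): besides printing `df`, other basic inspection calls available in SparkR include:

    printSchema(df)  # faithful has two double columns: eruptions, waiting
    count(df)        # number of rows; 272 for the faithful dataset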
@@ -152,7 +162,7 @@ head(select(df, df$eruptions))
 ##2 1.800
 ##3 3.333

-# You can also pass in column name as strings
+# You can also pass in column names as strings
 head(select(df, "eruptions"))

 # Filter the DataFrame to only retain rows with wait times shorter than 50 mins
@@ -166,7 +176,7 @@ head(filter(df, df$waiting < 50))

 </div>

-### Grouping, Aggregation
+### Grouping, Aggregation

 SparkR data frames support a number of commonly used functions to aggregate data after grouping. For example we can compute a histogram of the `waiting` time in the `faithful` dataset as shown below

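The histogram computation that sentence refers to would use SparkR's `groupBy`/`summarize`/`n` (illustrative sketch, not part of the commit; `waiting_counts` matches the name in the next hunk's context):

    # Count occurrences of each waiting time
    waiting_counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))
    head(waiting_counts)
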
@@ -194,7 +204,7 @@ head(arrange(waiting_counts, desc(waiting_counts$count)))

 ### Operating on Columns

-SparkR also provides a number of functions that can directly applied to columns for data processing and during aggregation. The example below shows the use of basic arithmetic functions.
+SparkR also provides a number of functions that can be directly applied to columns for data processing and during aggregation. The example below shows the use of basic arithmetic functions.

 <div data-lang="r" markdown="1">
 {% highlight r %}
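
The column arithmetic that sentence refers to looks like this (illustrative sketch, not part of the commit):

    # Operate directly on a column: convert waiting time from minutes to seconds
    df$waiting_secs <- df$waiting * 60
    head(df)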
