[SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include sparkSession in R
## What changes were proposed in this pull request?
Update doc as per discussion in PR #13592
## How was this patch tested?
manual
shivaram liancheng
Author: Felix Cheung <[email protected]>
Closes #13799 from felixcheung/rsqlprogrammingguide.
- You can also create SparkDataFrames from Hive tables. To do this we will need to create a SparkSession with Hive support which can access tables in the Hive MetaStore. Note that Spark should have been built with [Hive support](building-spark.html#building-with-hive-and-jdbc-support) and more details can be found in the [SQL programming guide](sql-programming-guide.html#starting-point-sqlcontext). In SparkR, by default it will attempt to create a SparkSession with Hive support enabled (`enableHiveSupport = TRUE`).
+ You can also create SparkDataFrames from Hive tables. To do this we will need to create a SparkSession with Hive support which can access tables in the Hive MetaStore. Note that Spark should have been built with [Hive support](building-spark.html#building-with-hive-and-jdbc-support) and more details can be found in the [SQL programming guide](sql-programming-guide.html#starting-point-sparksession). In SparkR, by default it will attempt to create a SparkSession with Hive support enabled (`enableHiveSupport = TRUE`).
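For reference, a minimal sketch of the workflow the updated paragraph describes (the table name `src` is illustrative and not part of this diff):

{% highlight r %}
# A minimal sketch, assuming a Hive table named "src" already exists
# in the metastore. enableHiveSupport = TRUE is the default; shown for clarity.
sparkR.session(enableHiveSupport = TRUE)
results <- sql("SELECT key, value FROM src")
head(results)
{% endhighlight %}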
- Unlike Scala, Java, and Python API, we haven't finished migrating `SQLContext` to `SparkSession` for SparkR yet, so
- the entry point into all relational functionality in SparkR is still the
- `SQLContext` class in Spark 2.0. To create a basic `SQLContext`, all you need is a `SparkContext`.
+ The entry point into all functionality in Spark is the [`SparkSession`](api/R/sparkR.session.html) class. To initialize a basic `SparkSession`, just call `sparkR.session()`:
{% highlight r %}
- spark <- sparkRSQL.init(sc)
+ sparkR.session()
{% endhighlight %}
- Note that when invoked for the first time, `sparkRSQL.init()` initializes a global `SQLContext` singleton instance, and always returns a reference to this instance for successive invocations. In this way, users only need to initialize the `SQLContext` once, then SparkR functions like `read.df` will be able to access this global instance implicitly, and users don't need to pass the `SQLContext` instance around.
+ Note that when invoked for the first time, `sparkR.session()` initializes a global `SparkSession` singleton instance, and always returns a reference to this instance for successive invocations. In this way, users only need to initialize the `SparkSession` once, then SparkR functions like `read.df` will be able to access this global instance implicitly, and users don't need to pass the `SparkSession` instance around.
</div>
</div>
- `SparkSession`(or `SQLContext` for SparkR) in Spark 2.0 provides builtin support for Hive features including the ability to
+ `SparkSession` in Spark 2.0 provides builtin support for Hive features including the ability to
write queries using HiveQL, access to Hive UDFs, and the ability to read data from Hive tables.
To use these features, you do not need to have an existing Hive setup.
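To illustrate the singleton behavior the new paragraph describes, a short sketch using R's built-in `faithful` data.frame (not part of this diff):

{% highlight r %}
# sparkR.session() returns the same global session on repeated calls,
# and SparkR functions such as createDataFrame and head use it implicitly;
# no session object is passed around.
sparkR.session()
df <- createDataFrame(faithful)  # faithful is a built-in R data.frame
head(df)
{% endhighlight %}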
@@ -175,15 +173,15 @@ df.show()
</div>
<div data-lang="r" markdown="1">
- With a `SQLContext`, applications can create DataFrames from an [existing `RDD`](#interoperating-with-rdds),
+ With a `SparkSession`, applications can create DataFrames from a local R data.frame,
from a Hive table, or from [Spark data sources](#data-sources).
As an example, the following creates a DataFrame based on the content of a JSON file:
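The JSON snippet itself is cut off in this view; a sketch consistent with the SparkR 2.0 API, using the `people.json` file that ships with the Spark examples:

{% highlight r %}
# A sketch of reading a JSON file into a SparkDataFrame;
# people.json is part of the standard Spark example resources.
df <- read.df("examples/src/main/resources/people.json", "json")
showDF(df)
{% endhighlight %}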
For a complete list of the types of operations that can be performed on a DataFrame refer to the [API Documentation](api/R/index.html).
- In addition to simple column references and expressions, DataFrames also have a rich library of functions including string manipulation, date arithmetic, common math operations and more. The complete list is available in the [DataFrame Function Reference](api/R/index.html).
+ In addition to simple column references and expressions, DataFrames also have a rich library of functions including string manipulation, date arithmetic, common math operations and more. The complete list is available in the [DataFrame Function Reference](api/R/SparkDataFrame.html).
</div>
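As a quick illustration of that function library, a hedged sketch assuming `df` has `name` and `age` columns, as in the people example:

{% highlight r %}
# A few built-in column functions: string manipulation via upper(),
# arithmetic on columns, and row filtering.
head(select(df, upper(df$name), df$age + 1))
head(filter(df, df$age > 21))
{% endhighlight %}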
@@ -452,7 +450,7 @@ df = spark.sql("SELECT * FROM table")
</div>
<div data-lang="r" markdown="1">
- The `sql` function enables applications to run SQL queries programmatically and returns the result as a `DataFrame`.
+ The `sql` function enables applications to run SQL queries programmatically and returns the result as a `SparkDataFrame`.
{% highlight r %}
df <- sql("SELECT * FROM table")
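A short usage sketch to go with that snippet; the view name `people` is illustrative, not part of this diff:

{% highlight r %}
# Registering a SparkDataFrame as a temporary view makes it
# queryable via sql().
createOrReplaceTempView(df, "people")
teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
head(teenagers)
{% endhighlight %}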
@@ -1159,11 +1157,10 @@ for teenName in teenNames.collect():
<div data-lang="r" markdown="1">
{% highlight r %}
- # spark from the previous example is used in this example.
- schemaPeople # The DataFrame from the previous example.
+ schemaPeople # The SparkDataFrame from the previous example.
- # DataFrames can be saved as Parquet files, maintaining the schema information.
+ # SparkDataFrame can be saved as Parquet files, maintaining the schema information.
write.parquet(schemaPeople, "people.parquet")
# Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
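For context, a sketch of the complete Parquet round trip this hunk belongs to (`schemaPeople` is assumed to exist, as in the surrounding guide text; the view name `parquetPeople` is illustrative):

{% highlight r %}
# Write the SparkDataFrame out as Parquet, read it back, and query it.
write.parquet(schemaPeople, "people.parquet")
people <- read.parquet("people.parquet")
createOrReplaceTempView(people, "parquetPeople")
teenagers <- sql("SELECT name FROM parquetPeople WHERE age >= 13 AND age <= 19")
head(teenagers)
{% endhighlight %}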
@@ -1342,7 +1339,6 @@ df3.printSchema()
<div data-lang="r" markdown="1">
{% highlight r %}
- # spark from the previous example is used in this example.
# Create a simple DataFrame, stored into a partition directory
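The rest of this hunk is truncated above; a hedged sketch of the partitioned-write and schema-merging pattern the comment introduces (paths and column names are illustrative, not part of this diff):

{% highlight r %}
# Write two SparkDataFrames into separate partition directories, then
# read the parent directory back with schema merging enabled so the
# columns from both partitions appear in one schema.
df1 <- createDataFrame(data.frame(single = 1:5, double = (1:5) * 2))
write.df(df1, "data/test_table/key=1", "parquet", "overwrite")
df2 <- createDataFrame(data.frame(single = 6:10, triple = (6:10) * 3))
write.df(df2, "data/test_table/key=2", "parquet", "overwrite")
df3 <- read.df("data/test_table", "parquet", mergeSchema = "true")
printSchema(df3)
{% endhighlight %}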