[SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0 #13592
docs/sql-programming-guide.md
- displayTitle: Spark SQL, DataFrames and Datasets Guide
- title: Spark SQL and DataFrames
+ displayTitle: Spark SQL and Datasets Guide
+ title: Spark SQL and Datasets
I'd keep DataFrame in the title since Python is using it.
Test build #60277 has finished for PR 13592 at commit
10b3618 to 819e109
Test build #60309 has finished for PR 13592 at commit
docs/sql-programming-guide.md
  {% highlight r %}
  # sc is an existing SparkContext.
- sqlContext <- sparkRSQL.init(sc)
+ spark <- sparkRSQL.init(sc)
Currently, sparkRSQL.init calls org.apache.spark.sql.api.r.SQLUtils.createSQLContext, which returns a SQLContext object, not a SparkSession object. So it seems we need to update the R API here?
The R API is still experimental, and we haven't introduced SparkSession to SparkR yet.
docs/sql-programming-guide.md
  by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally,
  Spark SQL uses this extra information to perform extra optimizations. There are several ways to
- interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result
+ interact with Spark SQL including SQL and the Datasets API. When computing a result
How about "the DataFrame API (Python/R) and the Dataset API (Scala/Java)"?
"Dataset API" instead of "Datasets API"?
Test build #60637 has finished for PR 13592 at commit
Test build #60664 has finished for PR 13592 at commit
Test build #60698 has finished for PR 13592 at commit
Thanks to everyone for the review!
@liancheng Is it worth adding two parameters
Could you please also update docs/sparkr.md, which cross-links to sections such as sql-programming-guide.html#starting-point-sqlcontext?
- `SQLContext` class, or one of its decedents. To create a basic `SQLContext`, all you need is a SparkContext.
+ Unlike Scala, Java, and Python API, we haven't finished migrating `SQLContext` to `SparkSession` for SparkR yet, so
+ the entry point into all relational functionality in SparkR is still the
+ `SQLContext` class in Spark 2.0. To create a basic `SQLContext`, all you need is a `SparkContext`.
SparkSession support has been merged. The entry point is now sparkR.session().
Also, use of SQLContext is deprecated. Please see PR #13751.
Use of both SparkContext and SQLContext in SparkR has been deprecated.
Ah, just saw it. Great!
Thanks! Let's get it in first and then we can revise it.
…k 2.0
## What changes were proposed in this pull request?
Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete. We may also want to add more examples for Scala/Java Dataset typed transformations.
## How was this patch tested?
N/A
Author: Cheng Lian
Closes #13592 from liancheng/sql-programming-guide-2.0.
(cherry picked from commit 6df8e38)
Signed-off-by: Yin Huai
@felixcheung I merged this one since I think it is better to make changes in parallel using this version as the foundation. Can you help revise the R-related docs? Thanks!
  # Getting Started
- ## Starting Point: SQLContext
+ ## Starting Point: SparkSession
The doc looks like this. I am not sure if there is a better way to improve this section (making it clear that SparkSession is not available in SparkR). @felixcheung @shivaram maybe you have better ideas?
@felixcheung Thanks for the review and your work on PR #13751! I was traveling over the weekend. Let's address these comments in follow-up PRs.
@maropu Sorry for the late reply. Yeah, adding descriptions for these two options makes sense. Would you like to open a PR for this? Thanks!
@liancheng Okay, I'll do that.
…ude sparkSession in R
## What changes were proposed in this pull request?
Update doc as per discussion in PR #13592
## How was this patch tested?
manual
shivaram liancheng
Author: Felix Cheung
Closes #13799 from felixcheung/rsqlprogrammingguide.
(cherry picked from commit 58f6e27)
Signed-off-by: Cheng Lian

…ude sparkSession in R
## What changes were proposed in this pull request?
Update doc as per discussion in PR #13592
## How was this patch tested?
manual
shivaram liancheng
Author: Felix Cheung
Closes #13799 from felixcheung/rsqlprogrammingguide.

What changes were proposed in this pull request?
Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.
We may also want to add more examples for Scala/Java Dataset typed transformations.
How was this patch tested?
N/A