@liancheng
Contributor

@liancheng liancheng commented Jun 10, 2016

What changes were proposed in this pull request?

Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.

We may also want to add more examples for Scala/Java Dataset typed transformations.

How was this patch tested?

N/A

-displayTitle: Spark SQL, DataFrames and Datasets Guide
-title: Spark SQL and DataFrames
+displayTitle: Spark SQL and Datasets Guide
+title: Spark SQL and Datasets
Contributor

I'd keep DataFrame in the title since Python is using it.

@SparkQA

SparkQA commented Jun 10, 2016

Test build #60277 has finished for PR 13592 at commit 92f3f11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng liancheng force-pushed the sql-programming-guide-2.0 branch from 10b3618 to 819e109 Compare June 10, 2016 18:50
@SparkQA

SparkQA commented Jun 10, 2016

Test build #60309 has finished for PR 13592 at commit 819e109.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

{% highlight r %}
# sc is an existing SparkContext.
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)
Contributor

@WeichenXu123 WeichenXu123 Jun 11, 2016

Currently, sparkRSQL.init calls org.apache.spark.sql.api.r.SQLUtils.createSQLContext, which returns a SQLContext object, not a SparkSession object. So it seems we need to update the R API here?

Contributor Author

The R API is still experimental, and we haven't introduced SparkSession to SparkR yet.
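
For comparison, here is a minimal sketch of the `SparkSession` entry point in the Python API, which is what the updated guide moves to for Scala, Java, and Python. The helper name `get_spark` and the default app name are illustrative assumptions, not something taken from this PR. The PySpark import is deferred into the function so the snippet can be loaded even where Spark is not installed.

```python
def get_spark(app_name="sql-guide-example"):
    """Return a SparkSession, creating one if none exists (hypothetical helper)."""
    # Imported lazily: this requires the pyspark package (Spark 2.0 or later).
    from pyspark.sql import SparkSession

    # In the Python API, `builder` is an attribute rather than a method call.
    return SparkSession.builder.appName(app_name).getOrCreate()


# Usage (needs a working Spark installation):
#   spark = get_spark()
#   df = spark.read.json("examples/src/main/resources/people.json")
#   df.show()
```

In SparkR at the time of this comment, the equivalent entry point was still `sparkRSQL.init(sc)`, as shown in the diff above.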

by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally,
Spark SQL uses this extra information to perform extra optimizations. There are several ways to
-interact with Spark SQL including SQL, the DataFrames API and the Datasets API. When computing a result
+interact with Spark SQL including SQL and the Datasets API. When computing a result
Contributor

@cloud-fan cloud-fan Jun 13, 2016

How about: the DataFrame API (Python/R) and the Dataset API (Scala/Java)?

Contributor

"Dataset API" instead of "Datasets API"?

@SparkQA

SparkQA commented Jun 16, 2016

Test build #60637 has finished for PR 13592 at commit 200a68c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 16, 2016

Test build #60664 has finished for PR 13592 at commit 4b3c4d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 17, 2016

Test build #60698 has finished for PR 13592 at commit f413cbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

liancheng commented Jun 17, 2016

Thanks to everyone for the review!

@maropu
Member

maropu commented Jun 18, 2016

@liancheng Is it worth adding the two parameters spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes to Other Configuration Options? They are somewhat internal parameters, but they seem useful for users who would like to control the number of partitions. https://issues.apache.org/jira/browse/SPARK-15894
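
To illustrate why these two options affect the partition count, below is a rough sketch in plain Python of how the file-source planner in Spark 2.0 appears to choose its maximum split size. The formula reflects my reading of the planning code and should be treated as an approximation. The function name `max_split_bytes` and its arguments are hypothetical. The defaults used (128 MB and 4 MB) are the documented defaults of the two options.

```python
MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB,   # spark.sql.files.maxPartitionBytes
                    open_cost_in_bytes=4 * MB):     # spark.sql.files.openCostInBytes
    """Approximate the largest number of bytes packed into one read partition."""
    # Each file is padded with the estimated cost of opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Never exceed maxPartitionBytes; never go below the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


# Ten 64 MB files on 8 cores: splits are capped by the per-core share.
ten_files = max_split_bytes([64 * MB] * 10, default_parallelism=8)
# A single 10 GB file on 2 cores: splits are capped by maxPartitionBytes.
one_big_file = max_split_bytes([10 * 1024 * MB], default_parallelism=2)
```

Under this sketch, lowering `spark.sql.files.maxPartitionBytes` yields more, smaller partitions for large inputs, while raising `spark.sql.files.openCostInBytes` packs fewer small files into each partition. Both can be set with `--conf` on `spark-submit` or with `spark.conf.set(...)` at runtime.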

@felixcheung
Member

Could you please also update docs/sparkr.md, which cross-links to, e.g., sql-programming-guide.html#starting-point-sqlcontext?

-`SQLContext` class, or one of its decedents. To create a basic `SQLContext`, all you need is a SparkContext.
+Unlike Scala, Java, and Python API, we haven't finished migrating `SQLContext` to `SparkSession` for SparkR yet, so
+the entry point into all relational functionality in SparkR is still the
+`SQLContext` class in Spark 2.0. To create a basic `SQLContext`, all you need is a `SparkContext`.
Member

@felixcheung felixcheung Jun 20, 2016

SparkSession support has been merged. The entry point is now sparkR.session().

Member

@felixcheung felixcheung Jun 20, 2016

Also, use of SQLContext is deprecated. Please see PR #13751.
Use of both SparkContext and SQLContext in SparkR has been deprecated.

Contributor

ah, just saw it. Great!

@yhuai
Contributor

yhuai commented Jun 20, 2016

Thanks! Let's get it in first and then we can revise it.

asfgit pushed a commit that referenced this pull request Jun 20, 2016
…k 2.0

## What changes were proposed in this pull request?

Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.

We may also want to add more examples for Scala/Java Dataset typed transformations.

## How was this patch tested?

N/A

Author: Cheng Lian <[email protected]>

Closes #13592 from liancheng/sql-programming-guide-2.0.

(cherry picked from commit 6df8e38)
Signed-off-by: Yin Huai <[email protected]>
@asfgit asfgit closed this in 6df8e38 Jun 20, 2016
@yhuai
Contributor

yhuai commented Jun 20, 2016

@felixcheung I merged this one since I think it is better to make changes in parallel using this version as the foundation. Can you help on revising the R related doc? Thanks!

# Getting Started

-## Starting Point: SQLContext
+## Starting Point: SparkSession
Contributor

image

The doc looks like this. I am not sure if there is a better way to improve this section (making it clear that SparkSession is not available in SparkR). @felixcheung @shivaram maybe you have better ideas?

@liancheng
Contributor Author

@felixcheung Thanks for the review and your work on PR #13751! Was traveling during the weekend. Let's address these comments in follow-up PRs.

@liancheng liancheng deleted the sql-programming-guide-2.0 branch June 20, 2016 23:50
@liancheng
Contributor Author

@maropu Sorry for the late reply. Yeah, adding descriptions for these two options makes sense. Would you like to open a PR for this? Thanks!

@maropu
Member

maropu commented Jun 21, 2016

@liancheng okay, I'll do that.

asfgit pushed a commit that referenced this pull request Jun 21, 2016
…ude sparkSession in R

## What changes were proposed in this pull request?

Update doc as per discussion in PR #13592

## How was this patch tested?

manual

shivaram liancheng

Author: Felix Cheung <[email protected]>

Closes #13799 from felixcheung/rsqlprogrammingguide.

(cherry picked from commit 58f6e27)
Signed-off-by: Cheng Lian <[email protected]>