
Conversation

@cloud-fan
Contributor

@cloud-fan cloud-fan commented May 4, 2016

What changes were proposed in this pull request?

see #12873 (comment). The problem is that if we create `SparkContext` first and then call `SparkSession.builder.enableHiveSupport().getOrCreate()`, we reuse the existing `SparkContext` and the hive flag never gets set.
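To make the failure mode concrete, here is a minimal sketch of the sequence described above (the local master and app name are illustrative assumptions, not the REPL's actual settings):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// The REPL effectively does this first: a SparkContext already exists...
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("repro"))

// ...so enableHiveSupport() only records spark.sql.catalogImplementation=hive
// as a builder option; getOrCreate() then reuses the live SparkContext, whose
// conf never sees the flag, and the session comes up without hive support.
val spark = SparkSession.builder.enableHiveSupport().getOrCreate()
```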

How was this patch tested?

Verified it locally.

@cloud-fan
Contributor Author

cc @andrewor14 @rxin

@rxin
Contributor

rxin commented May 4, 2016

Are these not called only once?

@SparkQA

SparkQA commented May 4, 2016

Test build #57732 has finished for PR 12890 at commit 49d653b.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

cloud-fan commented May 4, 2016

yea, but look at the logic of Builder.getOrCreate:

def getOrCreate(): SparkSession = synchronized {
  // Step 1. Create a SparkConf
  // Step 2. Get a SparkContext
  // Step 3. Get a SparkSession
  val sparkConf = new SparkConf()
  options.foreach { case (k, v) => sparkConf.set(k, v) }
  val sparkContext = SparkContext.getOrCreate(sparkConf)

  SQLContext.getOrCreate(sparkContext).sparkSession
}

In the REPL we create the SparkContext first and the SparkSession afterwards, so SparkContext.getOrCreate reuses the existing context, the conf set through the builder (including the hive flag) never reaches it, and hive support is not enabled.
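For contrast, a sketch of the ordering this PR moves the REPL to: create the SparkSession first and derive the SparkContext from it. The master and app name are assumptions, and the real REPL code additionally guards on the package-private SparkSession.hiveClassesArePresent, omitted here:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setMaster("local").setAppName("spark-shell")

// enableHiveSupport() now runs before any SparkContext exists, so the hive
// flag is part of the conf that actually creates the context.
val sparkSession = SparkSession.builder.config(conf).enableHiveSupport().getOrCreate()

// The SparkContext is obtained from the session, not the other way around.
val sparkContext = sparkSession.sparkContext
```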

@cloud-fan
Contributor Author

retest this please

@rxin
Contributor

rxin commented May 4, 2016

Sorry, I still don't understand what's going on by looking at your comment. One thing is that we are going to remove withHiveSupport, so adding it back isn't a "fix" per se.

@rxin
Contributor

rxin commented May 4, 2016

It seems like there is an underlying problem here. Can we fix that?

@cloud-fan cloud-fan changed the title from "[SQL] revert 2 REPL changes in SPARK-15073" to "[SPARK-15116] In REPL we should create SparkSession first and get SparkContext from it" on May 4, 2016
@SparkQA

SparkQA commented May 4, 2016

Test build #57733 has finished for PR 12890 at commit 49d653b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 4, 2016

Test build #57742 has finished for PR 12890 at commit 2681e79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented May 4, 2016

Is the problem that `val sparkContext = SparkContext.getOrCreate(sparkConf)` will give us a `sparkContext` that was already created by the REPL, whose conf does not set spark.sql.catalogImplementation to hive?

  val builder = SparkSession.builder.config(conf)
  if (SparkSession.hiveClassesArePresent) {
-   sparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
+   sparkSession = builder.enableHiveSupport().getOrCreate()
Contributor


I guess we want to use builder.config("spark.sql.catalogImplementation", "hive")?

Contributor


that's what enableHiveSupport does?

Contributor


oh, right. We still have this method.
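For reference, the two spellings discussed in this exchange request the same catalog implementation; enableHiveSupport is shorthand for the explicit config call (a minimal sketch of the equivalence):

```scala
import org.apache.spark.sql.SparkSession

// Both builders set spark.sql.catalogImplementation to "hive":
val explicit = SparkSession.builder.config("spark.sql.catalogImplementation", "hive")
val sugar = SparkSession.builder.enableHiveSupport()
```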

@andrewor14
Contributor

LGTM

@andrewor14
Contributor

andrewor14 commented May 4, 2016

By the way, I think the comment #12873 (comment) captures the point but is a little misleading. The issue here is that if there's already a SparkContext, we just throw away all the conf set through the builder, including spark.sql.catalogImplementation! Note, however, that we always create a new SparkSession, even before this patch.
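Concretely, the discard happens at the SparkContext.getOrCreate call in the builder logic quoted earlier. An annotated sketch, where `options` stands in for the builder's internal option map:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val options = Map("spark.sql.catalogImplementation" -> "hive")
val sparkConf = new SparkConf()
options.foreach { case (k, v) => sparkConf.set(k, v) } // builder options land here

// If a SparkContext already exists, getOrCreate returns the live context and
// ignores sparkConf entirely, so the catalog flag set above is silently lost.
val sparkContext = SparkContext.getOrCreate(sparkConf)
```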

@rxin
Contributor

rxin commented May 4, 2016

It seems like we might need a little bit more design here to figure out what the right behavior should be. Let's talk more offline.

@andrewor14
Contributor

andrewor14 commented May 4, 2016

This patch itself has little to do with the design, so let's just merge it first.
Merging into master and 2.0.

asfgit pushed a commit that referenced this pull request May 4, 2016
[SPARK-15116] In REPL we should create SparkSession first and get SparkContext from it

## What changes were proposed in this pull request?

see #12873 (comment). The problem is that if we create `SparkContext` first and then call `SparkSession.builder.enableHiveSupport().getOrCreate()`, we reuse the existing `SparkContext` and the hive flag never gets set.

## How was this patch tested?

Verified it locally.

Author: Wenchen Fan <[email protected]>

Closes #12890 from cloud-fan/repl.
@asfgit asfgit closed this in a432a2b May 4, 2016
