-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-15644] [MLlib] [SQL] Replace SQLContext with SparkSession in MLlib #13380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #59578 has finished for PR 13380 at commit
|
|
cc @jkbradley / @mengxr are we ok with changing the API? |
| @Since("1.6.0") | ||
| def context(sqlContext: SQLContext): this.type = { | ||
| optionSQLContext = Option(sqlContext) | ||
| def context(sparkSession: SparkSession): this.type = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to maintain compatibility. Can you create a new method which accepts a SparkSession?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Let me fix it now. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we deprecate these existing methods? Perhaps we can call the new method session (for example)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! @MLnick A good point!
@rxin @jkbradley What is your opinion? Should we rename it?
|
Those are the only items I'd modify. Basically, I like switching to SparkSession but also maintaining compatibility for public APIs. Changing the private APIs is fine. |
| */ | ||
| @transient | ||
| private[sql] val sqlContext: SQLContext = new SQLContext(this) | ||
| private[spark] val sqlContext: SQLContext = new SQLContext(this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin Is that OK? Here, the reason why I changed is for the call in https://github.com/gatorsmile/spark/blob/7dcaaa4539324d596c290ca189adc2e65d77bb0c/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L76
|
Test build #59800 has finished for PR 13380 at commit
|
|
retest this please |
|
Test build #60468 has finished for PR 13380 at commit
|
|
FWIW, I think since |
|
+1 for deprecating Sorry for the slow response! |
|
this looks good to me other than deprecating context() and calling the new method session() |
|
@jkbradley Let me change it now. Thanks! |
|
Test build #60720 has finished for PR 13380 at commit
|
|
Test build #60721 has finished for PR 13380 at commit
|
| // override for Java compatibility | ||
| override def session(sparkSession: SparkSession): this.type = super.session(sparkSession) | ||
|
|
||
| // override for Java compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized: Can you please change this to call super.session(sqlContext.sparkSession) to avoid a deprecation warning? Same for the other call to context() below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do it. Thanks!
|
Other than that, this looks ready to me. |
|
LGTM pending tests! |
|
Test build #60994 has finished for PR 13380 at commit
|
|
Merging with master, branch-2.0 |
#### What changes were proposed in this pull request? This PR is to use the latest `SparkSession` to replace the existing `SQLContext` in `MLlib`. `SQLContext` is removed from `MLlib`. Also fix a test case issue in `BroadcastJoinSuite`. BTW, `SQLContext` is not being used in the `MLlib` test suites. #### How was this patch tested? Existing test cases. Author: gatorsmile <[email protected]> Author: xiaoli <[email protected]> Author: Xiao Li <[email protected]> Closes #13380 from gatorsmile/sqlContextML. (cherry picked from commit 0e3ce75) Signed-off-by: Joseph K. Bradley <[email protected]>
What changes were proposed in this pull request?
This PR is to use the latest
SparkSessionto replace the existingSQLContextinMLlib.SQLContextis removed fromMLlib.Also fix a test case issue in
BroadcastJoinSuite.BTW,
SQLContextis not being used in theMLlibtest suites.How was this patch tested?
Existing test cases.