[SPARK-31088][SQL] Add back HiveContext and createExternalTable #27815
Conversation
Test build #119423 has finished for PR 27815 at commit
dongjoon-hyun left a comment:
Unfortunately, HiveContext is more than alias-level maintenance. Technically, this falls under migration from the 1.6.x era.
cc @marmbrus and @gatorsmile
Retest this please.
Test build #119546 has finished for PR 27815 at commit
Test build #119555 has finished for PR 27815 at commit
:param jhiveContext: An optional JVM Scala HiveContext. If set, we do not instantiate a new
    :class:`HiveContext` in the JVM, instead we make all calls to this object.
.. note:: Deprecated in 2.0.0. Use SparkSession.builder.enableHiveSupport().getOrCreate().
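For reference, a minimal sketch of the migration the deprecation note points to; the table name `src` is illustrative and assumes a Hive metastore is available:

```python
from pyspark.sql import SparkSession

# Deprecated (Spark 1.x style):
#   from pyspark.sql import HiveContext
#   hc = HiveContext(sc)
#   df = hc.sql("SELECT key, value FROM src")

# Recommended since 2.0.0:
spark = (SparkSession.builder
         .appName("hive-example")
         .enableHiveSupport()   # wires in the Hive metastore, SerDes, and Hive UDFs
         .getOrCreate())
df = spark.sql("SELECT key, value FROM src")
```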
Please rethink this. Although the other removed APIs are being recovered, I believe we should not recover this one.
cc @marmbrus
Can you justify this position using the rubric we agreed upon?
I believe @gatorsmile should justify, using the new rubric, why the maintenance cost here is relatively small. We are reviewing this PR.
Also, cc @rxin since he is the release manager for 3.0.0.
Look at the code in this PR. I see a bunch of alias methods that call other methods and a few minimal tests. What part of this is hard to maintain?
class HiveContext(SQLContext) itself is not a simple alias. You know that. This is a whole layer carried over from the Spark 1.6.x era.
I see 63 lines of code in that file? That class itself looks like an alias... not a whole separate implementation. What am I missing?
It seems we have different ideas about what counts as an alias. You are saying that all wrappers are aliases, aren't you?
If you are just delegating to the "correct"/new implementation, then that sounds like an alias to me. If we were maintaining a large parallel, duplicated implementation, that would be a different story.
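To make the distinction concrete, here is a minimal sketch (not the actual pyspark source) of the pure-delegation style being described; the class and method names are hypothetical:

```python
from pyspark.sql import SparkSession

class LegacyFacade:
    """Hypothetical alias-style wrapper: every method forwards to
    SparkSession, so the wrapper carries almost no logic of its own."""

    def __init__(self, spark: SparkSession):
        self._spark = spark

    def sql(self, query):
        # Pure delegation to the new implementation.
        return self._spark.sql(query)

    def refreshTable(self, tableName):
        # Pure delegation; no parallel catalog logic is maintained here.
        self._spark.catalog.refreshTable(tableName)
```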
Test build #120078 has finished for PR 27815 at commit
Test build #4996 has finished for PR 27815 at commit
CC @dongjoon-hyun @marmbrus @rxin @srowen This PR is ready for review.
Test build #120242 has finished for PR 27815 at commit
srowen left a comment:
Same comment as elsewhere: keeping aliases is easy, so it's hard to feel strongly about it either way. I see the value in not changing APIs, even across major versions. So why were these deprecated in the first place, if we never intended to remove them, and why has that only come up now? Going forward, I assume we are simply not going to deprecate anything, since nothing will realistically be removed.
Test build #4997 has finished for PR 27815 at commit
I also still strongly believe that we should let this go away in 3.0.0.
Adoption of Spark 3.0 is much more important than the removal of these deprecated methods. I strongly disagree with having removed these APIs in Spark 3.0 in the first place.
Retest this please.
To eventually remove these deprecated APIs, we need to at least build migration tools that help end users find usages of the deprecated APIs. To my knowledge, no such tooling exists so far. In general, I believe we need a systematic way to reduce the migration pain and make it easier for end users to upgrade to the latest version of Apache Spark.
Test build #120288 has finished for PR 27815 at commit
@gatorsmile where did you disagree with this removal originally?
Let me correct the typo and rewrite my original reply to make my point clear. Based on the latest discussion in the community, I strongly disagree with the initial removal of these APIs in Spark 3.0. I think we need to add them back. If we want to remove them, we need more public discussion in the community. This PR just adds the removed APIs back and defers the related discussions to future releases.
Hi, all.
There are three major reasons why I think we should keep HiveContext:
@gatorsmile these changes happened almost a year ago. I'd like to understand why you have advanced these arguments only now. Has something changed?
@srowen As in previous releases, we are auditing the APIs and reviewing all the changes made for the upcoming 3.0 release. During the audit, we found this change, and multiple Spark committers (including @marmbrus, @zsxwing, and @yhuai) challenged it in our offline discussion. Thus, I submitted this PR to add the APIs back.
cloud-fan left a comment:
LGTM. Since we keep SQLContext as well, it is consistent to also keep HiveContext.
### What changes were proposed in this pull request?
Based on the discussion on the mailing list thread [[Proposal] Modification to Spark's Semantic Versioning Policy](http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-Modification-to-Spark-s-Semantic-Versioning-Policy-td28938.html), this PR adds back the following APIs, whose maintenance cost is relatively small:
- HiveContext
- createExternalTable APIs
### Why are the changes needed?
To avoid breaking APIs that are commonly used.
### Does this PR introduce any user-facing change?
No. These APIs were removed in the 3.0 branch, but since Spark 3.0 has not been released yet, adding them back does not introduce a user-facing change.
### How was this patch tested?
Added a new test suite for the createExternalTable APIs.
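For context, a minimal sketch of the restored createExternalTable API alongside its newer catalog.createTable equivalent; the table names and the path `/data/events` are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Restored API: registers an external (unmanaged) table over existing files.
df1 = spark.catalog.createExternalTable(
    "events_ext", path="/data/events", source="parquet")

# Newer equivalent since 2.2: createTable with a path also creates an
# unmanaged table backed by the same files.
df2 = spark.catalog.createTable(
    "events_new", path="/data/events", source="parquet")
```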