[SPARK-24252][SQL] Move Table and TableCapabilities to catalyst module #24410
Conversation
@cloud-fan and @mccheah, here is the PR that moves the classes needed for #24246.
Test build #104723 has finished for PR 24410 at commit
Force-pushed from 924bcdf to 8a0ab62.
Test build #104726 has finished for PR 24410 at commit
Force-pushed from 8a0ab62 to 44346ac.
Test build #104728 has finished for PR 24410 at commit
This is in preparation for apache#24246 that adds TableCatalog to catalog.v2 and requires Table in the catalyst module.
Force-pushed from 44346ac to f249ddb.
The review comments below refer to this package change in the diff:

-package org.apache.spark.sql.sources.v2;
+package org.apache.spark.sql.catalog.v2;
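For context, a rough sketch of the two interfaces this PR relocates, assuming the DataSource V2 shape they had around this time; the member names and capability values are illustrative rather than an exact copy of the files.

```java
// Sketch only: simplified, and kept package-private so it fits in one file.
// The real types are public and live in separate source files.
import java.util.Set;
import org.apache.spark.sql.types.StructType;

// Capabilities a table can declare; Spark uses these to validate plans.
// Only two example values are shown.
enum TableCapability {
  BATCH_READ,
  BATCH_WRITE
}

// A logical table handed back to Spark by a catalog: metadata only,
// with no read/write implementation attached.
interface Table {
  String name();
  StructType schema();
  Set<TableCapability> capabilities();
}
```

Because these are pure metadata interfaces, a move-only change like this PR does not alter behavior; only the package and module location shift.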
I think we only want to move to the catalyst module but not the catalog package?
I think moving to the catalog package immediately makes sense. If we're going to have a move PR, we might as well do as many of the moves as we can in one shot.
I agree with Matt. Why would we only do half of the move in this PR?
well, if we want to move the package, shall we consider the proposal from @rxin?
i.e.
TableCatalog and its friends -> org.apache.spark.sql.connectors.catalog
Table and its friends -> org.apache.spark.sql.connectors.table
@cloud-fan, we want to get #24246 in as soon as possible so that we can get all of this work done in time for the release.
The point of this PR is to make reviewing #24246 easier, not to consider all of the other moves that we may need to make. We need these changes to add the TableCatalog API, so let's get just this done.
While I agree we want to get this in sooner, I think it's cumbersome to have two package move PRs if we don't need to. I.e. we should do as few move-only PRs as possible.
What I'd like to avoid is a case in the future where we decide that we should move packages again before we merge another feature. I.e. we end up repeating the situation we're in now, again in some future feature.
So I think it's at least worth considering if there's any reason to think we want the package hierarchy to be different from what we have proposed here. But I wouldn't belabor the point too long - we can always still change our mind later if necessary.
That being said I don't feel strongly either way. @rxin and @cloud-fan, what's the specific rationale for having a separate table package?
I think it will take time to reach a consensus on what the package hierarchy should be. To get the table catalog PR in ASAP, I'd suggest not moving packages at this stage.
It would be quite wasteful if we decide to add a table package later and have to undo most of the changes in this PR.
This is my fault. I should have insisted from the start that Table should be in the catalog package, but I didn't want a needless argument about it at the time.
Table is part of the catalog API because it is what table catalogs pass back to Spark. That's why it was part of the original PR, #21306. This makes the catalog API self-contained, and the storage API is dependent on it. The two should not be inter-dependent. We should be able to update either one without affecting the other.
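To illustrate the dependency direction described here, a minimal sketch with simplified placeholder types; these are not the actual Spark interfaces, and the method signatures are pared down.

```java
// Catalog side: self-contained. A catalog hands Table instances back to Spark.
interface Table {
  String name();
}

interface TableCatalog {
  // The real proposal in #24246 uses a multi-part identifier type; a plain
  // String is used here only to keep the sketch self-contained.
  Table loadTable(String identifier);
}

// Storage (read/write) side: depends on the catalog-side Table, not the
// other way around, so either side can evolve without touching the other.
interface ScanBuilder { }                // stand-in for the real reader builders

interface SupportsRead extends Table {
  ScanBuilder newScanBuilder();          // the real method also takes options
}
```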
But the purpose of this PR is not to block the table catalog API and other work while we debate organization. I'll close this PR and revert the package move in #24246. Then we can get that in without further code churn.
Test build #104730 has finished for PR 24410 at commit
What changes were proposed in this pull request?
This moves Table and TableCapabilities to the catalyst module. This is in preparation for #24246, which adds TableCatalog. The table catalog interface returns Table instances, so Table must be in catalyst or an upstream module (see the sketch at the end of this description).

How was this patch tested?
Existing tests for regressions.
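To make the module constraint from the summary concrete, a minimal sketch assuming Spark's usual build layering, where sql/core depends on sql/catalyst but not the reverse; the file path and package below are illustrative, since the final package name was still being discussed in this thread.

```java
// sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/Table.java
//
// Once this file lives under sql/catalyst, code in the catalyst module
// (such as the TableCatalog interface added by #24246) can compile against
// it. If Table stayed under sql/core, catalyst could not reference it,
// because catalyst does not depend on core.
package org.apache.spark.sql.sources.v2;

public interface Table {
  String name();   // simplified; see the real interface for the full contract
}
```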