Contributor

rdblue commented Apr 18, 2019

What changes were proposed in this pull request?

This moves Table and TableCapabilities to the catalyst module, in preparation for #24246, which adds TableCatalog. The table catalog interface returns Table instances, so Table must live in catalyst or in a module upstream of it.

How was this patch tested?

Existing tests for regressions.

Contributor Author

rdblue commented Apr 18, 2019

@cloud-fan and @mccheah, here is the PR that moves the classes needed for #24246.

SparkQA commented Apr 18, 2019

Test build #104723 has finished for PR 24410 at commit 924bcdf.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rdblue force-pushed the SPARK-24252-move-v2-table-to-catalyst branch from 924bcdf to 8a0ab62 April 18, 2019 22:11

SparkQA commented Apr 18, 2019

Test build #104726 has finished for PR 24410 at commit 8a0ab62.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun changed the title from "SPARK-24252: Move Table and TableCapabilities to catalyst module." to "[SPARK-24252][SQL] Move Table and TableCapabilities to catalyst module." Apr 18, 2019
rdblue force-pushed the SPARK-24252-move-v2-table-to-catalyst branch from 8a0ab62 to 44346ac April 18, 2019 22:21
rdblue changed the title from "[SPARK-24252][SQL] Move Table and TableCapabilities to catalyst module." to "[SPARK-24252][SQL] Move Table and TableCapabilities to catalyst module" Apr 18, 2019

SparkQA commented Apr 18, 2019

Test build #104728 has finished for PR 24410 at commit 44346ac.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This is in preparation for apache#24246 that adds TableCatalog to catalog.v2
and requires Table in the catalyst module.
rdblue force-pushed the SPARK-24252-move-v2-table-to-catalyst branch from 44346ac to f249ddb April 18, 2019 23:04
 */

-package org.apache.spark.sql.sources.v2;
+package org.apache.spark.sql.catalog.v2;
Contributor

I think we only want to move to the catalyst module but not the catalog package?

Contributor

I think moving to the catalog package immediately makes sense. If we're going to have a move PR, might as well do as much of the moves as we can in one shot.

Contributor Author

I agree with Matt. Why would we only do half of the move in this PR?

Contributor

Well, if we want to move the package, shall we consider the proposal from @rxin?

i.e.
TableCatalog and its friends -> org.apache.spark.sql.connectors.catalog
Table and its friends -> org.apache.spark.sql.connectors.table

Contributor Author

@cloud-fan, we want to get #24246 in as soon as possible so that we can get all of this work done in time for the release.

The point of this PR is to make reviewing #24246 easier, not to consider all of the other moves that we may need to make. We need these changes to add the TableCatalog API, so let's get just this done.

Contributor

While I agree we want to get this in sooner, I think it's cumbersome to have two package move PRs if we don't need to. I.e. we should do as few move-only PRs as possible.

What I'd like to avoid is a case in the future where we decide that we should move packages again before we merge another feature. I.e. we end up repeating the situation we're in now, again in some future feature.

So I think it's at least worth considering if there's any reason to think we want the package hierarchy to be different from what we have proposed here. But I wouldn't belabor the point too long - we can always still change our mind later if necessary.

That being said I don't feel strongly either way. @rxin and @cloud-fan, what's the specific rationale for having a separate table package?

Contributor

I think it takes time to reach a consensus about what the package hierarchy should be. To get the table catalog PR in ASAP, I'd suggest not moving packages at this stage.

It will be quite wasteful if we decide to have a table package later and undo most of the changes in this PR.

Contributor Author

This is my fault. I should have insisted from the start that Table should be in the catalog package, but I didn't want a needless argument about it at the time.

Table is part of the catalog API because it is what table catalogs pass back to Spark. That's why it was part of the original PR, #21306. This makes the catalog API self-contained, and the storage API is dependent on it. The two should not be inter-dependent. We should be able to update either one without affecting the other.
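
As a rough illustration of that dependency direction, here is a minimal sketch. Every name in it (CatalogSketch, ReadableTable, InMemoryTable, InMemoryCatalog) is a hypothetical, simplified stand-in and not Spark's actual interfaces. The assumption is only that a catalog method returns Table and that the storage side extends Table rather than the reverse.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; not Spark's real API.
public class CatalogSketch {

    // Catalog API: self-contained, refers to nothing on the storage side.
    interface Table {
        String name();
    }

    interface TableCatalog {
        // The return type is what ties Table to the catalog API: this
        // interface cannot compile in a module that cannot see Table.
        Table loadTable(String ident);
    }

    // Storage API: depends on the catalog API, never the other way around.
    interface ReadableTable extends Table {
        String newScanDescription();
    }

    static final class InMemoryTable implements ReadableTable {
        private final String name;

        InMemoryTable(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public String newScanDescription() {
            return "scan of " + name;
        }
    }

    static final class InMemoryCatalog implements TableCatalog {
        private final Map<String, Table> tables = new HashMap<>();

        void register(Table table) {
            tables.put(table.name(), table);
        }

        @Override
        public Table loadTable(String ident) {
            Table table = tables.get(ident);
            if (table == null) {
                throw new IllegalArgumentException("No such table: " + ident);
            }
            return table;
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.register(new InMemoryTable("events"));

        // The caller only sees Table; read support is discovered afterwards.
        Table table = catalog.loadTable("events");
        if (table instanceof ReadableTable) {
            System.out.println(((ReadableTable) table).newScanDescription());
        }
    }
}
```

In this sketch the catalog types never mention ReadableTable, so the storage side could change without touching them.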

But the purpose of this PR is not to block the table catalog API and other work while we debate organization. I'll close this PR and revert the package move in #24246. Then we can get that in without further code churn.

SparkQA commented Apr 19, 2019

Test build #104730 has finished for PR 24410 at commit f249ddb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue closed this Apr 19, 2019