[SPARK-46423][PYTHON][SQL] Make the Python Data Source instance at DataSource.lookupDataSourceV2 #44374

HyukjinKwon · 2023-12-15T20:13:46Z

What changes were proposed in this pull request?

This PR is a follow-up of #44305 that proposes creating the Python Data Source instance at DataSource.lookupDataSourceV2.

Why are the changes needed?

Semantically, the instance should already be created at the DataSource.lookupDataSourceV2 level rather than afterwards. This is also more consistent.
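To illustrate what "the instance is ready at lookupDataSourceV2" means, here is a minimal sketch under stated assumptions. It uses hypothetical stand-in types (`TableProvider`, `PythonTableProvider`, `JvmTableProvider`, `LookupSketch`) instead of Spark's real classes, hypothetical source names, and omits the `SQLConf` parameter of the real method. For a registered Python data source name, the lookup returns a constructed provider instance directly instead of leaving instantiation to the caller.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's real classes.
trait TableProvider { def shortName: String }

// Stand-in for a provider that wraps a registered Python data source.
final case class PythonTableProvider(shortName: String) extends TableProvider

// Stand-in for a V2 provider implemented on the JVM.
final case class JvmTableProvider(shortName: String) extends TableProvider

object LookupSketch {
  // Assumed set of names registered as Python data sources in the session.
  val userDefinedSources: Set[String] = Set("my_python_source")

  // Assumed registry of JVM V2 providers, instantiated on demand.
  val jvmV2Sources: Map[String, () => TableProvider] =
    Map("my_jvm_v2_source" -> (() => JvmTableProvider("my_jvm_v2_source")))

  // Returns a ready-to-use provider instance, or None so that the caller
  // falls back to the V1 code path. (The real method also takes a SQLConf.)
  def lookupDataSourceV2(provider: String): Option[TableProvider] =
    if (userDefinedSources.contains(provider)) Some(PythonTableProvider(provider))
    else jvmV2Sources.get(provider).map(create => create())
}
```

In this model, `LookupSketch.lookupDataSourceV2("my_python_source")` yields `Some(PythonTableProvider("my_python_source"))`, and an unknown name yields `None`, so no caller needs a separate step to construct the Python provider.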

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests should cover this change.

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon · 2023-12-15T20:14:00Z

cc @cloud-fan and @allisonwang-db

HyukjinKwon · 2023-12-15T23:18:05Z

The documentation build will be fixed in #44376.

Merged to master.

allisonwang-db

Nice! This approach is cleaner.

allisonwang-db · 2023-12-18T02:39:10Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

```scala
   * there is no corresponding Data Source V2 implementation, or the provider is configured to
   * fallback to Data Source V1 code path.
   */
  def lookupDataSourceV2(provider: String, conf: SQLConf): Option[TableProvider] = {
```

Should we remove the changes in the lookupDataSource method above, since a Python data source can only be a V2 source?

```scala
else if (isUserDefinedDataSource) { classOf[PythonTableProvider] }
```

and

```scala
case head :: Nil =>
  // there is exactly one registered alias
  // TODO(SPARK-45600): should be session-based.
  val isUserDefinedDataSource = SparkSession.getActiveSession.exists(
    _.sessionState.dataSourceManager.dataSourceExists(provider))
  // The source can be successfully loaded as either a V1 or a V2 data source.
  // Check if it is also a user-defined data source.
  if (isUserDefinedDataSource) {
    throw QueryCompilationErrors.foundMultipleDataSources(provider)
  }
  head.getClass
```

cloud-fan · Dec 18, 2023

It depends on whether we will call lookupDataSource to detect Python data sources. It seems not?

HyukjinKwon · Dec 19, 2023

lookupDataSource detects both V1 and V2 classes, whereas lookupDataSourceV2 only creates the instance from V2 classes.

cloud-fan · Dec 19, 2023

Yes, but technically a Python data source is neither V1 nor V2, so it seems OK if lookupDataSource doesn't return Python data sources.

HyukjinKwon · Dec 19, 2023

But when we extend the Python Data Source features, we will use the V2 code path, for example for ExternalCommandRunner, the same as the DataStreamReader.loadInternal code path.
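The alias check quoted in the review thread above can be modeled in isolation. The sketch below is a simplified model under stated assumptions, not Spark code: `AliasCheckSketch`, `resolveJvmClass` and `MultipleDataSourcesException` are hypothetical names standing in for the real lookup and `QueryCompilationErrors.foundMultipleDataSources`. It only shows why a name that resolves to exactly one JVM data source class and is also a registered Python data source is rejected as ambiguous.

```scala
// Hypothetical, simplified model of the alias check; not Spark code.
final class MultipleDataSourcesException(provider: String)
  extends Exception(s"Found multiple data sources with the name '$provider'")

object AliasCheckSketch {
  // registeredClasses: JVM data source classes registered under `provider`.
  // userDefinedSources: names registered as Python data sources (assumed input).
  def resolveJvmClass(
      provider: String,
      registeredClasses: List[Class[_]],
      userDefinedSources: Set[String]): Option[Class[_]] =
    registeredClasses match {
      case Nil =>
        None // no JVM data source uses this alias
      case head :: Nil =>
        // Exactly one JVM data source uses this alias; if the same name is
        // also a Python data source, the lookup is ambiguous.
        if (userDefinedSources.contains(provider)) {
          throw new MultipleDataSourcesException(provider)
        }
        Some(head)
      case _ =>
        // Simplification: Spark handles several JVM matches in its own way.
        throw new IllegalStateException(s"several JVM sources named '$provider'")
    }
}
```

Keeping this check in the class-based lookup means a name collision is reported even though the Python data source itself is only ever instantiated through the V2 path.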

Make the Python Data Source instance at DataSource.lookupDataSourceV2

5f6576b

github-actions bot added SQL STRUCTURED STREAMING PYTHON labels Dec 15, 2023

cloud-fan approved these changes Dec 15, 2023


Fix a mistake

042e2dd

HyukjinKwon closed this in 62fc27d Dec 15, 2023

allisonwang-db reviewed Dec 18, 2023


HyukjinKwon deleted the SPARK-46423 branch January 15, 2024 00:47

3 participants
