Skip to content

Conversation

@peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Apr 13, 2023

What changes were proposed in this pull request?

DataSet.show() currently triggers a job for a simple show tables command. This is because the command output contains an isTemporary boolean column that needs to be casted to string when we use show() on the dataset.
This PR converts CommandResult to LocalRelation and let ConvertToLocalRelation to do the casting locally to avoid triggering job execution.

Why are the changes needed?

A simple show tables shouldn not require an executor.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new UT.

@github-actions github-actions bot added the SQL label Apr 13, 2023
@peter-toth
Copy link
Contributor Author

cc @cloud-fan, please let me know if you have a better idea.

@cloud-fan
Copy link
Contributor

shall we update ConvertToLocalRelation to support CommandResult as well?

@cloud-fan
Copy link
Contributor

cloud-fan commented Apr 13, 2023

or a more surgical way is to update Dataset.getRows, convert CommandResult to LocalRelation in the logical plan, and then execute the new logical plan.

@peter-toth
Copy link
Contributor Author

or a more surgical way is to update Dataset.getRows, convert CommandResult to LocalRelation in the logical plan, and then execute the new logical plan.

Thanks, that's a good idea!

@peter-toth
Copy link
Contributor Author

Updated to LocalRelation conversion in Dataset.getRows() in 98bf071

} else {
Column(col).cast(StringType)
}
val data = newDf.logicalPlan match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to deduplicate code, how about

val newDf = logicalPlan match {
  case c: CommandResult =>
    Dataset.ofRows(sparkSession, LocalRelation(c.output, c.rows)
  case _ => toDf()
}
same code as before...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we would rely on running ConvertToLocalRelation that can be excluded. Isn't that an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if people exclude ConvertToLocalRelation, then we shouldn't do local evaluation here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, then d8210ff contains the deduplication. Thanks for the idea.

…124-dataset-show-projects-commandresults-locally

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@HyukjinKwon
Copy link
Member

Merged to master.

@peter-toth
Copy link
Contributor Author

Thanks all for the review!

cloud-fan pushed a commit that referenced this pull request Mar 11, 2024
### What changes were proposed in this pull request?

Similar to #40779, `Dataset.isEmpty` should also not trigger job execution on CommandResults.

This PR converts `CommandResult` to `LocalRelation` in `Dataset.isEmpty` method.

### Why are the changes needed?

A simple `spark.sql("show tables").isEmpty` shouldn not require an executor.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new UT.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45373 from wForget/SPARK-47270.

Authored-by: Zhen Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
### What changes were proposed in this pull request?

Similar to apache#40779, `Dataset.isEmpty` should also not trigger job execution on CommandResults.

This PR converts `CommandResult` to `LocalRelation` in `Dataset.isEmpty` method.

### Why are the changes needed?

A simple `spark.sql("show tables").isEmpty` shouldn not require an executor.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new UT.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45373 from wForget/SPARK-47270.

Authored-by: Zhen Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
peter-toth added a commit to peter-toth/spark that referenced this pull request Nov 26, 2024
…ocally

`DataSet.show()` currently triggers a job for a simple `show tables` command. This is because the command output contains an `isTemporary` boolean column that needs to be casted to string when we use `show()` on the dataset.
This PR converts `CommandResult` to `LocalRelation` and let `ConvertToLocalRelation` to do the casting locally to avoid triggering job execution.

A simple `show tables` shouldn not require an executor.

No.

Added new UT.

Closes apache#40779 from peter-toth/SPARK-43124-dataset-show-projects-commandresults-locally.

Change-Id: I9971d7ee1f5385bcaf018dee0dd81b3ae3ac33ae
Authored-by: Peter Toth <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
turboFei added a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…s locally (apache#600)

[SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally

### What changes were proposed in this pull request?

Similar to apache#40779, `Dataset.isEmpty` should also not trigger job execution on CommandResults.

This PR converts `CommandResult` to `LocalRelation` in `Dataset.isEmpty` method.

### Why are the changes needed?

A simple `spark.sql("show tables").isEmpty` shouldn not require an executor.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new UT.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45373 from wForget/SPARK-47270.

Authored-by: Zhen Wang <[email protected]>

Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: Zhen Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants