[SPARK-43124][SQL] Dataset.show projects CommandResults locally #40779
cc @cloud-fan, please let me know if you have a better idea.
shall we update
or a more surgical way is to update
Thanks, that's a good idea!
Updated to
    } else {
      Column(col).cast(StringType)
    }
    val data = newDf.logicalPlan match {
to deduplicate code, how about
    val newDf = logicalPlan match {
      case c: CommandResult =>
        Dataset.ofRows(sparkSession, LocalRelation(c.output, c.rows))
      case _ => toDF()
    }
    same code as before...
But then we would rely on running ConvertToLocalRelation, which can be excluded. Isn't that an issue?
If people exclude ConvertToLocalRelation, then we shouldn't do local evaluation here either.
All right, then d8210ff contains the deduplication. Thanks for the idea.
…124-dataset-show-projects-commandresults-locally # Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
Merged to master.
Thanks all for the review!
### What changes were proposed in this pull request?
Similar to #40779, `Dataset.isEmpty` should also not trigger job execution on CommandResults. This PR converts `CommandResult` to `LocalRelation` in the `Dataset.isEmpty` method.
### Why are the changes needed?
A simple `spark.sql("show tables").isEmpty` should not require an executor.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added new UT.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #45373 from wForget/SPARK-47270.
Authored-by: Zhen Wang
Signed-off-by: Wenchen Fan
…ocally
`Dataset.show()` currently triggers a job for a simple `show tables` command. This is because the command output contains an `isTemporary` boolean column that needs to be cast to string when we use `show()` on the dataset.
This PR converts `CommandResult` to `LocalRelation` and lets `ConvertToLocalRelation` do the casting locally to avoid triggering job execution.
A simple `show tables` should not require an executor.
No.
Added new UT.
Closes apache#40779 from peter-toth/SPARK-43124-dataset-show-projects-commandresults-locally.
Change-Id: I9971d7ee1f5385bcaf018dee0dd81b3ae3ac33ae
Authored-by: Peter Toth
Signed-off-by: Hyukjin Kwon
What changes were proposed in this pull request?
`Dataset.show()` currently triggers a job for a simple `show tables` command. This is because the command output contains an `isTemporary` boolean column that needs to be cast to string when we use `show()` on the dataset.
This PR converts `CommandResult` to `LocalRelation` and lets `ConvertToLocalRelation` do the casting locally to avoid triggering job execution.
Why are the changes needed?
A simple `show tables` should not require an executor.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added new UT.
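For readers unfamiliar with the mechanism, here is a minimal sketch of the idea in plain Scala. It is a toy model and not Spark code: the `Plan` types, `optimize`, and `needsJob` below are hypothetical stand-ins that only borrow the names `CommandResult`, `LocalRelation`, and `ConvertToLocalRelation` from the PR. It assumes that a cast over a local relation can be folded on the driver, and that excluding the rule (the concern raised in review) disables that folding.

```scala
// Toy model only: these types mimic the shape of the plans discussed in the PR.
// They are NOT Spark's classes, and the "optimizer" is a single hand-written rule.
object LocalProjectionSketch {
  sealed trait Plan
  // Rows already materialized on the driver, like the output of `show tables`.
  final case class CommandResult(columns: Seq[String], rows: Seq[Seq[Any]]) extends Plan
  // Driver-side data that the rule below can evaluate without running a job.
  final case class LocalRelation(columns: Seq[String], rows: Seq[Seq[Any]]) extends Plan
  // Casts every cell to a string, which is what show() needs before rendering.
  final case class CastToString(child: Plan) extends Plan

  // Hypothetical rule name used as the key for exclusion in this model.
  val ConvertToLocalRelation = "ConvertToLocalRelation"

  // Mirrors the shape of the reviewed suggestion: only CommandResult is rewritten.
  def toLocal(plan: Plan): Plan = plan match {
    case CommandResult(cols, rows) => LocalRelation(cols, rows)
    case other => other
  }

  // Folds the cast into the local rows unless the rule has been excluded.
  def optimize(plan: Plan, excludedRules: Set[String]): Plan = plan match {
    case CastToString(LocalRelation(cols, rows)) if !excludedRules(ConvertToLocalRelation) =>
      LocalRelation(cols, rows.map(_.map(_.toString)))
    case other => other
  }

  // In this model, only bare driver-side data can be shown without a job.
  def needsJob(plan: Plan): Boolean = plan match {
    case _: LocalRelation | _: CommandResult => false
    case _ => true
  }
}
```

In this model, `CastToString` over a `CommandResult` is left untouched by `optimize` and so still needs a job, while the same cast over `toLocal(...)` collapses into a `LocalRelation` of strings. Passing `Set(ConvertToLocalRelation)` as the excluded rules keeps the cast in the plan, which matches the position taken in the review: if the rule is excluded, no local evaluation happens.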