@wForget
Member

@wForget wForget commented Mar 4, 2024

What changes were proposed in this pull request?

Similar to #40779, Dataset.isEmpty should also not trigger job execution on CommandResults.

This PR converts CommandResult to LocalRelation in the Dataset.isEmpty method.
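
To illustrate the idea, here is a minimal sketch in plain Scala. It does not use Spark's real Catalyst classes: `Plan`, `CommandResult`, `LocalRelation`, `Scan`, and `isEmptyLocally` are simplified stand-ins invented for this example, and the actual change in the PR may be structured differently. The point is only that a command's rows already exist on the driver, so wrapping them in a local relation lets an emptiness check be answered without distributed work.

```scala
object IsEmptySketch {
  // Toy stand-ins for logical plan nodes; NOT Spark's real classes.
  sealed trait Plan
  // Rows already computed eagerly on the driver by a command such as SHOW TABLES.
  final case class CommandResult(rows: Seq[Seq[Any]]) extends Plan
  // Plain in-memory rows that can be inspected without launching a job.
  final case class LocalRelation(rows: Seq[Seq[Any]]) extends Plan
  // Any other plan, e.g. a table scan, which would need a real job.
  final case class Scan(table: String) extends Plan

  // The conversion described above: swap a CommandResult for a LocalRelation
  // holding the same rows, and leave every other plan untouched.
  def commandResultOptimized(plan: Plan): Plan = plan match {
    case CommandResult(rows) => LocalRelation(rows)
    case other               => other
  }

  // Hypothetical helper: Some(answer) if emptiness can be decided on the
  // driver, None if the plan would have to be executed as a job.
  def isEmptyLocally(plan: Plan): Option[Boolean] =
    commandResultOptimized(plan) match {
      case LocalRelation(rows) => Some(rows.isEmpty)
      case _                   => None
    }
}
```

In this toy model, `IsEmptySketch.isEmptyLocally(IsEmptySketch.CommandResult(Seq.empty))` is answered on the driver, while a `Scan` falls through to the "needs a job" case.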

Why are the changes needed?

A simple spark.sql("show tables").isEmpty should not require an executor.
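
One way to observe this from user code is to count the jobs started while `isEmpty` runs. The sketch below is not the unit test from this PR. It assumes Spark is on the classpath and a local master, and the `IsEmptyJobProbe` object and `countJobs` helper are hypothetical names. Listener events are delivered asynchronously, so the short sleep is only a crude way to let them arrive.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

object IsEmptyJobProbe {
  // Hypothetical helper: runs `body` and returns how many Spark jobs
  // were reported as started while it ran.
  def countJobs(spark: SparkSession)(body: => Unit): Int = {
    val jobsStarted = new AtomicInteger(0)
    val listener = new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit =
        jobsStarted.incrementAndGet()
    }
    spark.sparkContext.addSparkListener(listener)
    try {
      body
      // Listener events arrive asynchronously; give them time to drain.
      Thread.sleep(1000)
    } finally {
      spark.sparkContext.removeSparkListener(listener)
    }
    jobsStarted.get()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isEmpty-job-probe")
      .getOrCreate()

    // The call discussed in this PR.
    val jobs = countJobs(spark) {
      spark.sql("show tables").isEmpty
    }
    println(s"jobs started by isEmpty: $jobs")

    spark.stop()
  }
}
```

Running the same probe on a Spark build without and with this change should show whether the job is still being launched.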

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Mar 4, 2024
@wForget
Member Author

wForget commented Mar 5, 2024

@peter-toth @HyukjinKwon @cloud-fan could you please take a look?

@peter-toth
Contributor

peter-toth commented Mar 5, 2024

I don't fully get this issue. In #40779 the isTemporary column had to be cast to string, so a job was triggered. But why does isEmpty trigger a job? Also, do other APIs (like head()) trigger jobs on CommandResults when they shouldn't?

@wForget
Member Author

wForget commented Mar 5, 2024

> I don't fully get this issue. In #40779 the isTemporary column had to be cast to string, so a job was triggered. But why does isEmpty trigger a job? Also, do other APIs (like head()) trigger jobs on CommandResults when they shouldn't?

isEmpty adds an empty projection, and that also triggers a job:

select().limit(1).queryExecution

@wForget
Member Author

wForget commented Mar 5, 2024

> Also, do other APIs (like head()) trigger jobs on CommandResults that shouldn't?

The head() method does not trigger a job, because CollectLimitExec.executeCollect() calls child.executeTake (here CommandResultExec.executeTake), which does not trigger a job.

Contributor

@peter-toth peter-toth left a comment

Looks good, thanks for clarifying my questions.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f350b76 Mar 11, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
Closes apache#45373 from wForget/SPARK-47270.

Authored-by: Zhen Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
turboFei added a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…s locally (apache#600)

[SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally

Closes apache#45373 from wForget/SPARK-47270.

Authored-by: Zhen Wang <[email protected]>

Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: Zhen Wang <[email protected]>