
Conversation

@turboFei (Member) commented Nov 1, 2024

🔍 Description

Issue References 🔗

I found that `spark.sql("show databases").isEmpty` triggers a Spark job with community Spark 3.5.2 (Spark 3.1 does not).


Describe Your Solution 🔧

Use `dataFrame.take(1).isEmpty` instead of `dataFrame.isEmpty`, as sketched below.
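
A minimal sketch of the difference, assuming an active Spark session `spark`; the job-triggering behavior is as observed in this PR on Spark 3.5.2:

```scala
val df = spark.sql("show databases")

// Dataset.isEmpty re-plans the query (roughly limit(1) followed by a
// count aggregation), which defeats the driver-side fast path for
// command results and, on Spark 3.5.2, submits a job even for this
// pure metadata statement.
df.isEmpty

// take(1) on a command result is served from the rows the driver
// already holds (the CommandResult node), so no job is submitted.
df.take(1).isEmpty
```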

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests

No job is triggered for `take(1).isEmpty`.


Checklist 📝

Be nice. Be informative.

@turboFei turboFei closed this Nov 1, 2024
@turboFei turboFei changed the title from "do not trigger job" to "Prevent trigger spark job when executing the initialization sql" Nov 1, 2024
@turboFei turboFei changed the title from "Prevent trigger spark job when executing the initialization sql" to "Prevent triggering spark job when executing the initialization sql" Nov 1, 2024
@turboFei turboFei reopened this Nov 1, 2024
@bowenliang123 (Contributor) left a comment:

LGTM. Nice catch.

```diff
- protected val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "4.0") 0 else 1
+ // KYUUBI #6789 makes it avoid triggering job
+ // protected val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "4.0") 0 else 1
+ protected val initJobId: Int = 0
```
@turboFei (Member, Author) commented:

What do you think, @pan3793?

@turboFei (Member, Author) commented:
In our use case, I set the initial/minimum executor count to 0 for notebook connections.

If the initialization SQL triggers a job, at least one executor has to be allocated before the connection is ready.
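
For example, with standard Spark dynamic allocation settings (illustrative values, not quoted from this PR):

```properties
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=0
spark.dynamicAllocation.minExecutors=0
```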

@turboFei (Member, Author) commented Nov 1, 2024:

And for notebook connections, we initialize the Spark driver in a temporary queue and eventually move it to the user's queue, so the connection should become ready quickly even when the user's queue has no available resources.

But if a job is triggered during initialization, connection setup can be slow when the user's queue has no available resources.

```diff
  debug(s"Execute initialization sql: $sql")
  try {
-   spark.sql(sql).isEmpty
+   spark.sql(sql).take(1).isEmpty
```
A contributor commented:

`spark.sql(sql).take(1)` has two issues:

  • it does a row conversion from `InternalRow` to `Row`
  • it does not do column pruning

How about using the same approach as apache/spark#45373 to improve it?

@turboFei (Member, Author) commented:

Not sure whether it would be easy to maintain this workaround on the Kyuubi side.

```scala
}

/** SPARK-47270: Returns an optimized plan for CommandResult, converted to `LocalRelation`. */
def commandResultOptimized[T](dataset: Dataset[T]): Dataset[T] = {
```
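
The excerpt above is truncated in the diff view. A sketch of how the body could look, following apache/spark#45373 (SPARK-47270); this is an illustration, not necessarily the PR's final code, and it relies on Spark `private[sql]` APIs (`CommandResult`, `LocalRelation`, `Dataset.apply`, `exprEnc`), so it would have to live under an `org.apache.spark.sql` sub-package, as Kyuubi's `SparkDatasetHelper` does:

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.catalyst.plans.logical.{CommandResult, LocalRelation}

/** SPARK-47270: Returns an optimized plan for CommandResult, converted to `LocalRelation`. */
def commandResultOptimized[T](dataset: Dataset[T]): Dataset[T] = {
  dataset.logicalPlan match {
    case c: CommandResult =>
      // The command already ran eagerly on the driver; rebuild the dataset
      // on a LocalRelation over the collected rows so that later actions
      // (take, isEmpty, ...) never submit a job.
      Dataset(dataset.sparkSession, LocalRelation(c.output, c.rows))(dataset.exprEnc)
    case _ => dataset
  }
}
```

With such a helper, the engine could call `commandResultOptimized(spark.sql(sql)).take(1).isEmpty`, keeping the no-job behavior while avoiding the row-conversion and column-pruning issues noted above.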
A member commented:

I'm -0 on this change; I would consider it a Spark-side issue. Spark's master branch is undergoing a major refactoring, and I'm worried about accessing Spark's non-public API on the engine's core code path.

Users who want to avoid triggering an executor launch can either patch their Spark or set the initialization SQL to something like `SET spark.app.id`, as shown below.
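
For instance, assuming Kyuubi's `kyuubi.engine.initialize.sql` property (whose default is `SHOW DATABASES`):

```properties
# kyuubi-defaults.conf
# SET runs entirely on the driver, so no executors are needed.
kyuubi.engine.initialize.sql=SET spark.app.id
```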

@turboFei (Member, Author) commented:

OK, I will cherry-pick the PR into our managed Spark 3.5.

@turboFei (Member, Author) commented Nov 5, 2024:

I prefer to backport the Spark PR to our managed Spark.

@turboFei turboFei closed this Nov 5, 2024
@turboFei turboFei deleted the do_not_trigger_job branch November 5, 2024 03:02