Prevent triggering spark job when executing the initialization sql #6789
```diff
@@ -21,7 +21,6 @@ import org.apache.spark.sql.SparkSession
 
 import org.apache.kyuubi.{KyuubiFunSuite, Utils}
 import org.apache.kyuubi.config.KyuubiConf
-import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.SPARK_ENGINE_RUNTIME_VERSION
 
 trait WithSparkSQLEngine extends KyuubiFunSuite {
   protected var spark: SparkSession = _
@@ -35,7 +34,9 @@ trait WithSparkSQLEngine extends KyuubiFunSuite {
   // Behavior is affected by the initialization SQL: 'SHOW DATABASES'
   // SPARK-35378 (3.2.0) makes it triggers job
   // SPARK-43124 (4.0.0) makes it avoid triggering job
-  protected val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "4.0") 0 else 1
+  // KYUUBI #6789 makes it avoid triggering job
+  // protected val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "4.0") 0 else 1
+  protected val initJobId: Int = 0
```
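For context: the 'SHOW DATABASES' statement referenced in the comments above is Kyuubi's default engine initialization SQL. A minimal sketch of the relevant setting (the property name and default follow Kyuubi's documented configuration; shown here only for illustration):

```properties
# kyuubi-defaults.conf
# SQL statements executed once when the Spark engine bootstraps.
# The default 'SHOW DATABASES' is the statement whose job-triggering
# behavior the test constant initJobId tracks.
kyuubi.engine.initialize.sql=SHOW DATABASES
```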
Member, Author: What do you think? @pan3793

Member, Author: Fixed by @wForget in apache/spark#45397

Member, Author: In our use case, I set the initial/minimum executors to 0 for notebook connections. If a job is triggered by the initialization SQL, the connection needs at least one executor before it becomes ready.

Member, Author: Also, for notebook connections we initialize the Spark driver in a temporary queue and eventually move it to the user queue, so the connection should become ready quickly even when the user queue has no available resources. But if a job is triggered during initialization, readiness may be slow when the user queue has no available resources.
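A hedged sketch of the setup described in the comments above (these are standard Spark dynamic-allocation properties; the exact values are illustrative, not taken from the PR):

```properties
# spark-defaults.conf (illustrative)
spark.dynamicAllocation.enabled=true
# Start notebook connections with no executors at all.
spark.dynamicAllocation.initialExecutors=0
spark.dynamicAllocation.minExecutors=0
# If the initialization SQL triggers a job, the engine must first
# acquire an executor before the connection becomes ready.
```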
The hunk continues with unchanged context:

```diff
 
   override def beforeAll(): Unit = {
     startSparkEngine()
```
pan3793 (Member): I'm -0 on this change; I would consider it a Spark-side issue. Spark's master branch is undergoing a major refactoring, and I'm worried about accessing Spark's non-public API in the engine's "core code path".

For users who want to avoid triggering an executor launch, they can either patch their Spark or set the init SQL to something like:

    SET spark.app.id
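As a sketch, that workaround could be applied by overriding the initialization SQL (assuming the kyuubi.engine.initialize.sql property shown earlier; `SET spark.app.id` runs on the driver and schedules no job):

```properties
# kyuubi-defaults.conf
# Replace the default 'SHOW DATABASES' with a statement that does not
# trigger a Spark job, so no executor launch is needed at connect time.
kyuubi.engine.initialize.sql=SET spark.app.id
```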
Member, Author: OK, I will cherry-pick the PR into our managed Spark 3.5.