Prevent triggering spark job when executing the initialization sql #6789
Conversation
bowenliang123 left a comment:
LGTM. Nice catch.
Force-pushed from 2497389 to c35b608.
```diff
-  protected val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "4.0") 0 else 1
+  // KYUUBI #6789 makes it avoid triggering job
+  // protected val initJobId: Int = if (SPARK_ENGINE_RUNTIME_VERSION >= "4.0") 0 else 1
+  protected val initJobId: Int = 0
```
What do you think about this? @pan3793
Fixed by @wForget apache/spark#45397
In our use case, I set initial/min executors to 0 for notebook connections.
If a job is triggered by the initialization SQL, it will need at least 1 executor before the connection is ready.
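For reference, the notebook setup described above would correspond to dynamic allocation settings like these (illustrative values, not taken from this PR):

```properties
# Let the app start with no executors and only request them on demand,
# so an idle notebook connection holds no executor resources.
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=0
spark.dynamicAllocation.minExecutors=0
```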
And for notebook connections, we initialize the Spark driver in a temporary queue and then eventually move it to the user queue, so the connection should become ready quickly even if the user queue has no available resources.
But if a job is triggered during initialization, it might be slow when the user queue has no available resources.
Force-pushed from c35b608 to 7b67d5d.
```diff
   debug(s"Execute initialization sql: $sql")
   try {
-    spark.sql(sql).isEmpty
+    spark.sql(sql).take(1).isEmpty
```
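A minimal sketch of the behavior difference, assuming a local SparkSession (the job-triggering behavior was reported against community Spark 3.5.2):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("init-sql-sketch")
  .getOrCreate()

val df = spark.sql("SHOW DATABASES")
// On community Spark 3.5.2, isEmpty on this command result reportedly
// triggers a Spark job:
// df.isEmpty
// The PR's change reads the already-materialized command result instead:
val noDatabases = df.take(1).isEmpty
```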
`spark.sql(sql).take(1)` has two issues:
- it does a row conversion from `InternalRow` to `Row`
- it does not do column pruning

How about using the same approach as the PR apache/spark#45373 to improve it?
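As a hedged sketch of that direction (assuming Spark internals stay stable across versions: `executeTake` returns `InternalRow`s without converting them to `Row`, and an empty `select()` projects zero columns), the emptiness check could look like:

```scala
import org.apache.spark.sql.DataFrame

// Sketch only: check emptiness without InternalRow-to-Row conversion
// and with all columns pruned via a zero-column projection.
def isEmptyNoConversion(df: DataFrame): Boolean =
  df.select().limit(1).queryExecution.executedPlan.executeTake(1).isEmpty
```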
Not sure whether it is easy to maintain this workaround on the Kyuubi end.
Force-pushed from 1d2fd7c to 0ab3c2a, then to da4df40.
```scala
  }

  /** SPARK-47270: Returns an optimized plan for CommandResult, converted to `LocalRelation`. */
  def commandResultOptimized[T](dataset: Dataset[T]): Dataset[T] = {
```
I'm -0 on this change; I would consider it a Spark-side issue. Spark's master branch is undergoing a major refactoring, and I'm worried about accessing Spark's non-public API in the engine's "core code path".
Users who want to avoid triggering an executor launch can either patch their Spark or set the init SQL to something like: `SET spark.app.id`
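For example, the initialization SQL could be pointed at a statement that scans no data. The property name below is an assumption based on Kyuubi's session-level init-SQL configuration; verify it against your Kyuubi version's documentation:

```properties
# Use an init statement that does not scan data, so initialization
# does not trigger an executor launch.
# (Property name may vary by Kyuubi version.)
kyuubi.engine.session.initialize.sql=SET spark.app.id
```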
OK. I will cherry-pick the PR into our managed Spark 3.5.
Prefer to backport the Spark PR to managed Spark.
Description

Issue References
I found that `spark.sql("show databases").isEmpty` will trigger a job with community Spark 3.5.2 (3.1 does not).

Describe Your Solution

Using `dataFrame.take(1).isEmpty` instead of `dataFrame.isEmpty`.

Types of changes
Test Plan

Behavior Without This Pull Request

Behavior With This Pull Request

Related Unit Tests
No job triggered for `take(1).isEmpty`:



Checklist
Be nice. Be informative.