[SPARK-26103][SQL] Limit the length of debug strings for query plans #23169
Conversation
@MaxGekk and @hvanhovell, this is an alternative solution for #23076. It limits overall plan length when generating the full string in memory, but not if a specific writer is passed in.
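To make the approach concrete, here is a minimal, hypothetical sketch of the idea (not the PR's actual SizeLimitedWriter code): a java.io.Writer wrapper that stops forwarding characters once a budget is exhausted, so an in-memory plan string stays bounded while a caller-supplied Writer is untouched. The class name BoundedWriter and the drop-on-overflow behavior are assumptions for illustration.

```scala
import java.io.{StringWriter, Writer}

// Hypothetical sketch of a size-limiting Writer wrapper; names and the
// silent-drop behavior are illustrative, not the PR's actual implementation.
class BoundedWriter(underlying: Writer, maxChars: Long) extends Writer {
  private var written: Long = 0L

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val remaining = maxChars - written
    if (remaining > 0) {
      val toWrite = math.min(remaining, len.toLong).toInt
      underlying.write(cbuf, off, toWrite)
      written += toWrite
    }
    // Anything beyond the budget is silently dropped in this sketch.
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}

object BoundedWriterDemo {
  def main(args: Array[String]): Unit = {
    // Generating a tree string into the bounded wrapper caps the in-memory copy;
    // passing a plain file Writer directly would keep file output complete.
    val sw = new StringWriter()
    val bounded = new BoundedWriter(sw, maxChars = 1000L)
    bounded.write("x" * 5000) // only the first 1000 characters reach sw
    bounded.flush()
    println(sw.toString.length) // 1000
  }
}
```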
ok to test

Test build #99410 has finished for PR 23169 at commit

Test build #99414 has finished for PR 23169 at commit
HeartSaVioR
left a comment
Thanks for your effort on addressing this!
Would this patch address the issue on the UI side too, or will it be addressed in another PR?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
Test build #99418 has finished for PR 23169 at commit

Test build #99426 has finished for PR 23169 at commit

Added the limit in for query execution

Test build #99427 has finished for PR 23169 at commit

I added changes to QueryExecution in the latest commit to address the UI issue.

Test build #99430 has finished for PR 23169 at commit
felixcheung
left a comment
I didn't follow all of the long discussion, but I'm worried that having a max length by default and blindly truncating the plan string will break some of our important use cases that require the full plan string.
If you have an idea of what those use cases are, I could take a look and see if there is an impact. If not, we could turn it off by default (set the max length to Long.Max).
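A tiny sketch of that "off by default" idea, with hypothetical names: if the default limit is Long.MaxValue, the length check can never trigger, so behavior only changes when a user explicitly lowers it.

```scala
// Hypothetical default: effectively unlimited, so truncation never kicks in
// unless a user configures a smaller value.
val defaultMaxPlanLength: Long = Long.MaxValue

def shouldTruncate(planLength: Long, maxLength: Long = defaultMaxPlanLength): Boolean =
  planLength > maxLength // always false at the default
```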
…t is specifically configured.
…an-size
# Conflicts:
# sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Ok @felixcheung, I've updated this PR so that the default behavior does not change - full plan strings are always printed.

Test build #99632 has finished for PR 23169 at commit

Test build #99631 has finished for PR 23169 at commit

retest this, please

You might have missed rolling back a change in a test. I also think you need to add a new test that sets the configuration to some value and checks whether it works properly.
vanzin
left a comment
@hvanhovell @MaxGekk if you don't comment here I'll assume you're ok with the changes.
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
I think we should indicate to users that a plan was cut, otherwise the truncated plan can confuse them. For example,

I've removed the check that only prints the warning once, and added an indicator at the end of the truncated string saying how much has been removed. I decided to enforce that the plan string would always be within the limit, even counting the message at the end saying it was truncated. This was a bit of extra code, but I think it is more the behavior people would expect.
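As a rough sketch of the behavior described in this comment (hypothetical helper name, not the PR's actual StringUtils code), the truncation marker is counted against the limit so the result never exceeds it; the character count reported here is approximate:

```scala
// Hypothetical illustration of truncation where the "... N more characters"
// marker fits inside the limit; the reported count is approximate in this sketch.
def truncatePlan(plan: String, maxLength: Int): String = {
  if (plan.length <= maxLength) {
    plan
  } else {
    val marker = s"... ${plan.length - maxLength} more characters"
    if (marker.length >= maxLength) {
      marker.take(maxLength) // limit too small to even hold the marker
    } else {
      plan.take(maxLength - marker.length) + marker
    }
  }
}

// e.g. truncatePlan("x" * 10000, 100) returns a 100-character string ending
// in "... 9900 more characters".
```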
Test build #103222 has finished for PR 23169 at commit

Test build #103223 has finished for PR 23169 at commit
MaxGekk
left a comment
LGTM
Test build #103231 has finished for PR 23169 at commit

Test build #103268 has finished for PR 23169 at commit
vanzin
left a comment
Just style issues. We generally prefer using === and !== in tests.
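For reference, a small illustration of the style point (assuming ScalaTest 3.x; the suite and values are hypothetical, not from this PR):

```scala
import org.scalatest.funsuite.AnyFunSuite

// Illustrative only: shows the preferred === / !== assertions in a test body.
class TruncationStyleSuite extends AnyFunSuite {
  test("truncated plan string respects the limit") {
    val truncated = "x" * 97 + "..." // stand-in for a truncated plan string
    assert(truncated.length === 100) // preferred over ==
    assert(truncated !== "")         // preferred over !=
  }
}
```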
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
Test build #103346 has finished for PR 23169 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Test build #103402 has finished for PR 23169 at commit

Merging to master.
…nfig key.
## What changes were proposed in this pull request?
This is a follow-up of #23169. We should've used string interpolation to show the config key in the warning message.
## How was this patch tested?
Existing tests.
Closes #24217 from ueshin/issues/SPARK-26103/s.
Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@DaveDeCaprio @vanzin The PR says merged but the jira is open?

@tooptoop4 SPARK-26103 has been marked as resolved. It looks like you're referring to SPARK-25380 - SPARK-26103 would help with SPARK-25380, but the issues are not identical (SPARK-25380 is about the large number of generated plans stored in memory).
The PR puts a limit on the size of the debug string generated for a tree node. This helps to fix out-of-memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query. This limit applies only to calls to treeString that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.
- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that it isn't a full plan.
- A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.
Unit tests were created for the new SizeLimitedWriter. Also a unit test for TreeNode was created that checks that a long plan is correctly truncated.
Closes apache#23169 from DaveDeCaprio/text-plan-size.
Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?
The PR puts a limit on the size of the debug string generated for a tree node. This helps to fix out-of-memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in #23076, but that solution doesn't address all the cases that can cause a large query. This limit applies only to calls to treeString that don't pass a Writer, which makes it play nicely with #22429, #23018 and #23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.
How was this patch tested?
Unit tests were created for the new SizeLimitedWriter. Also a unit test for TreeNode was created that checks that a long plan is correctly truncated.
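As a hedged usage sketch based on the description above: the configuration key and the truncation marker are the ones named in this PR description and may differ in a released Spark version. Lowering the limit and printing a large plan should yield a capped in-memory tree string:

```scala
import org.apache.spark.sql.SparkSession

object PlanTruncationDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("plan-truncation-demo")
      .getOrCreate()

    // Config key taken from the PR description; it may be named differently
    // in the version of Spark you are running.
    spark.conf.set("spark.sql.debug.maxPlanLength", "1000")

    // A wide projection so the query plan's debug string is long.
    val df = spark.range(10).selectExpr((0 until 200).map(i => s"id + $i as c$i"): _*)

    // The in-memory plan string (e.g. what the UI would show) is capped at the
    // limit and ends with a truncation marker; writing the plan out through a
    // Writer (e.g. to a file) is not limited.
    println(df.queryExecution.toString.length)

    spark.stop()
  }
}
```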