@DaveDeCaprio
Contributor

What changes were proposed in this pull request?

This PR limits the size of the debug string generated for a tree node, which helps fix out-of-memory errors when large plans produce huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in #23076, but that solution doesn't address all the cases that can cause a large query. The limit applies only to treeString calls that don't pass a Writer, which makes it play nicely with #22429, #23018 and #23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

  • A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
  • When plans are truncated, "..." is printed to indicate that it isn't a full plan
  • A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

How was this patch tested?

Unit tests were created for the new SizeLimitedWriter. Also a unit test for TreeNode was created that checks that a long plan is correctly truncated.
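
For readers unfamiliar with the idea, here is a minimal sketch of what a size-limited writer could look like. It is not the code from this PR: the class bodies, the exception message, and the `SizeLimitedWriterDemo` object are assumptions made for illustration. Only the names `SizeLimitedWriter` and `WriterSizeException(extraChars, charLimit)` appear in this conversation.

```scala
import java.io.{StringWriter, Writer}

// Hypothetical exception: reports how many characters did not fit.
// The message text is an assumption, not taken from the PR.
class WriterSizeException(val extraChars: Long, val charLimit: Long)
  extends Exception(s"Writer reached limit of $charLimit characters. $extraChars extra characters ignored.")

// Hypothetical wrapper: forwards writes to `underlying` until `charLimit`
// characters have been written, then throws WriterSizeException.
class SizeLimitedWriter(underlying: Writer, charLimit: Long) extends Writer {
  private var charsWritten: Long = 0L

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val newLength = charsWritten + len
    if (newLength > charLimit) {
      // Write only the part that still fits, then signal the overflow.
      val allowed = (charLimit - charsWritten).toInt
      underlying.write(cbuf, off, allowed)
      charsWritten = charLimit
      throw new WriterSizeException(newLength - charLimit, charLimit)
    } else {
      underlying.write(cbuf, off, len)
      charsWritten = newLength
    }
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}

// Hypothetical demo: returns what was kept and how many chars were dropped.
object SizeLimitedWriterDemo {
  def run(): (String, Long) = {
    val out = new StringWriter()
    val limited = new SizeLimitedWriter(out, 10)
    val dropped =
      try { limited.write("0123456789abcdef"); 0L }
      catch { case e: WriterSizeException => e.extraChars }
    (out.toString, dropped)
  }
}
```

A caller that wants a truncated string would catch the exception and use whatever the underlying writer has accumulated so far.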

@DaveDeCaprio
Contributor Author

@MaxGekk and @hvanhovell, this is an alternative solution for #23076. It limits overall plan length when generating the full string in memory, but not if a specific writer is passed in.

@gatorsmile
Member

ok to test

@SparkQA

SparkQA commented Nov 28, 2018

Test build #99410 has finished for PR 23169 at commit 5528ca1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 28, 2018

Test build #99414 has finished for PR 23169 at commit 3171cf3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@HeartSaVioR HeartSaVioR left a comment

Thanks for your effort on addressing this!

Would this patch address the issue on the UI side too, or will that be addressed in another PR?

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99418 has finished for PR 23169 at commit 3ffdc6a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99426 has finished for PR 23169 at commit 45a60fc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class WriterSizeException(val extraChars: Long, val charLimit: Long) extends Exception(

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99427 has finished for PR 23169 at commit 9678799.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DaveDeCaprio
Contributor Author

I added changes to QueryExecution in the latest commit to address the UI issue.

@SparkQA

SparkQA commented Nov 29, 2018

Test build #99430 has finished for PR 23169 at commit a5af842.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung felixcheung left a comment

I didn't follow all of the long discussion, but I'm worried that having a max length by default and blindly truncating the plan string will break some of our important use cases that require the full plan string.

@DaveDeCaprio
Contributor Author

If you have an idea of what those use cases are, I could take a look and see if there is an impact. If not, we could turn it off by default (set the max length to Long.MaxValue).

…an-size

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@DaveDeCaprio
Contributor Author

DaveDeCaprio commented Dec 3, 2018

OK @felixcheung, I've updated this PR so that the default behavior does not change: full plan strings are always printed.
This should be fully backwards compatible. Plan strings will only be truncated if you specifically configure them to be.

@SparkQA

SparkQA commented Dec 4, 2018

Test build #99632 has finished for PR 23169 at commit f0f75c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 4, 2018

Test build #99631 has finished for PR 23169 at commit a4be985.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor

retest this, please

@HeartSaVioR
Contributor

@DaveDeCaprio

You might have missed rolling back the change in the test.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99632/testReport/org.apache.spark.sql.catalyst.trees/TreeNodeSuite/treeString_limits_plan_length/

I also think you need to add a new test that sets the configuration to some value and checks whether it works properly.

Contributor

@vanzin vanzin left a comment

@hvanhovell @MaxGekk if you don't comment here I'll assume you're ok with the changes.

@MaxGekk
Member

MaxGekk commented Mar 6, 2019

I think we should indicate to users that a plan was cut; otherwise the truncated plan can confuse them. For example, truncatedString outputs "... N more fields". It would be nice if PlanStringConcat printed something like this.

@DaveDeCaprio
Contributor Author

I've removed the check that only prints the warning once, and added an indicator to the end of the truncated string saying how much has been removed.

I decided to enforce that the plan string always stays within the limit, even counting the message at the end saying it was truncated. This took a bit of extra code, but I think it is closer to the behavior people would expect.
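
To illustrate why keeping the result within the limit takes extra code, here is a small sketch. It is not the PR's implementation: the `PlanTruncation` object, the `truncate` helper, and the exact suffix wording are assumptions. The complication is that the suffix reports how many characters were removed, and that count changes as room is made for the suffix itself.

```scala
// Hypothetical helper, for illustration only.
object PlanTruncation {
  // Assumed wording for the truncation indicator.
  private def suffixFor(removed: Int): String = s"... $removed more characters"

  def truncate(plan: String, maxLength: Int): String = {
    if (plan.length <= maxLength) {
      plan
    } else {
      var kept = maxLength
      var suffix = suffixFor(plan.length - kept)
      // Shrinking `kept` increases the removed count, which can add a digit
      // to the suffix, so repeat until prefix plus suffix fit the limit.
      while (kept > 0 && kept + suffix.length > maxLength) {
        kept = math.max(0, maxLength - suffix.length)
        suffix = suffixFor(plan.length - kept)
      }
      // Final guard for limits too small to hold even the suffix.
      (plan.take(kept) + suffix).take(maxLength)
    }
  }
}
```

With a 100-character plan and a limit of 30, this sketch keeps the first 8 characters and appends a 22-character suffix, so the result is exactly 30 characters long.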

@SparkQA

SparkQA commented Mar 8, 2019

Test build #103222 has finished for PR 23169 at commit 4f56e48.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 8, 2019

Test build #103223 has finished for PR 23169 at commit a090fbb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@MaxGekk MaxGekk left a comment

LGTM

@SparkQA

SparkQA commented Mar 9, 2019

Test build #103231 has finished for PR 23169 at commit dcb4eb0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 9, 2019

Test build #103268 has finished for PR 23169 at commit b4cb7bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@vanzin vanzin left a comment

Just style issues. We generally prefer using === and !== in tests.

@SparkQA

SparkQA commented Mar 11, 2019

Test build #103346 has finished for PR 23169 at commit db0db18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103402 has finished for PR 23169 at commit e4afa26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Mar 13, 2019

Merging to master.

@vanzin vanzin closed this in 812ad55 Mar 13, 2019
HyukjinKwon pushed a commit that referenced this pull request Mar 26, 2019
…nfig key.

## What changes were proposed in this pull request?

This is a follow-up of #23169.
We should've used string-interpolation to show the config key in the warn message.
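
As a sketch of the kind of mistake this follow-up describes (the message text and the `InterpolationDemo` object are hypothetical, and the key is the one named in the original PR description), a string without the `s` prefix emits the placeholder literally:

```scala
object InterpolationDemo {
  // Config key as named in the PR description; used here only as sample data.
  val maxPlanLengthKey = "spark.sql.debug.maxPlanLength"

  // Missing `s` prefix: the placeholder text is printed as-is.
  val plain = "Adjust the limit by setting '${maxPlanLengthKey}'."

  // With the `s` interpolator the key's value is substituted.
  val interpolated = s"Adjust the limit by setting '${maxPlanLengthKey}'."
}
```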

## How was this patch tested?

Existing tests.

Closes #24217 from ueshin/issues/SPARK-26103/s.

Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@tooptoop4
Contributor

@DaveDeCaprio @vanzin PR says merged but jira is open?

@HeartSaVioR
Contributor

@tooptoop4 SPARK-26103 has been marked as resolved. It looks like you're referring to SPARK-25380. SPARK-26103 helps with SPARK-25380, but the issues are not identical (SPARK-25380 is about too many generated plans being stored in memory).

dvallejo pushed a commit to Telefonica/spark that referenced this pull request Aug 31, 2021
This PR limits the size of the debug string generated for a tree node, which helps fix out-of-memory errors when large plans produce huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query. The limit applies only to treeString calls that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
- When plans are truncated, "..." is printed to indicate that it isn't a full plan
- A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter.  Also a unit test for TreeNode was created that checks that a long plan is correctly truncated.

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>