Conversation

@MaxGekk (Member) commented Sep 15, 2018

What changes were proposed in this pull request?

In the PR, I propose a new method for debugging queries by dumping info about their execution to a file. It saves the logical, optimized and physical plans, similar to the explain() method, plus generated code. One advantage of the method over explain is that it doesn't truncate output and doesn't materialize the full output as one string in memory, which can cause OOMs.
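For illustration, a usage sketch of the proposed API (assuming the debug.toFile entry point discussed in this PR; the exact signature may differ):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("plan-dump").master("local[*]").getOrCreate()
val df = spark.range(0, 1000).selectExpr("id % 10 AS key").groupBy("key").count()
// Streams the parsed/analyzed/optimized/physical plans plus generated code to a
// file instead of materializing them as one huge string in memory.
df.queryExecution.debug.toFile("/tmp/spark-query-debug.txt")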

How was this patch tested?

Added a test which checks that the new method dumps correct info about a query.

@SparkQA commented Sep 15, 2018

Test build #96093 has finished for PR 22429 at commit 2ee75bc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk changed the title [SPARK-25440][SQL] Dump query execution info to a file → [SPARK-25440][SQL] Dumping query execution info to a file on Sep 15, 2018
@MaxGekk (Member, Author) commented Sep 15, 2018

@rednaxelafx Please take a look at the PR.

@SparkQA commented Sep 16, 2018

Test build #96097 has finished for PR 22429 at commit 9b2a3e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 16, 2018

Test build #96109 has finished for PR 22429 at commit ce2c086.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rednaxelafx (Contributor) left a comment

Thank you so much for the proposal, Max!

While I really, really like the feature you're proposing (yes, please get it in!), as it's really useful for debugging / diagnosis, there are a few things in the current implementation that might be deal breakers for me:

  1. Similar to what @hvanhovell mentioned, if we want to have a clean feature at the end, it would be better to pass the max-fields argument down the call chain instead of changing the Spark conf.
    Setting a Spark conf is a process-wide global change that affects everybody, including code running in other threads. A save-use-restore sequence would only work in a single-threaded scenario and would be problematic in a multi-threaded one.

It would definitely be a lot more code change to pass that argument down, but I think it's worth it (the existing code before this change is too rigid to begin with anyway). See the sketch of the contrast below.
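For illustration, here is a minimal sketch of the contrast (the conf key and helper names are assumptions, not the PR's code):

import org.apache.spark.sql.SparkSession

// Save-use-restore on a shared conf: the change is visible to every thread
// while the body runs, so it is only safe in a single-threaded scenario.
def withMaxToStringFields[T](spark: SparkSession, n: Int)(body: => T): T = {
  val key = "spark.debug.maxToStringFields"  // assumed key, for illustration
  val saved = spark.conf.getOption(key)
  spark.conf.set(key, n.toString)
  try body
  finally saved.fold(spark.conf.unset(key))(v => spark.conf.set(key, v))
}

// Passing the limit as an explicit argument keeps it local to one call chain:
def truncatedFields(fields: Seq[String], maxFields: Int): String =
  if (fields.length > maxFields) fields.take(maxFields).mkString("[", ", ", ", ...]")
  else fields.mkString("[", ", ", "]")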

  2. The use of java.io.StringWriter. I totally understand the motivation to use StringWriter for the regular path so that both the regular and file-dumping paths can share the same implementation code via the Writer interface.

But if you take a look at how StringWriter is implemented in OpenJDK8u: http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/2660b127b407/src/share/classes/java/io/StringWriter.java#l43

public class StringWriter extends Writer {

    private StringBuffer buf;

It's actually backed by a StringBuffer, which is worse than a StringBuilder in terms of the extra synchronization overhead implied.
(Yes, modern JVMs do try to optimize StringBuffers via various techniques, including biased locking, lock elimination, etc., but good performance is just less guaranteed.)
The best-performing way to do this is probably to implement a custom Writer that uses some sort of "rope" implementation as the backing store, so that premature string concatenation is kept to a minimum. Whether or not we want to go that way can be the subject of further discussion, though.
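As a concrete starting point, a minimal StringBuilder-backed Writer (a sketch of the cheaper drop-in option, not the rope design suggested above) could look like this:

import java.io.Writer

// Backed by an unsynchronized StringBuilder instead of StringWriter's StringBuffer.
class StringBuilderWriter(initialCapacity: Int = 256) extends Writer {
  private val builder = new java.lang.StringBuilder(initialCapacity)
  override def write(cbuf: Array[Char], off: Int, len: Int): Unit =
    builder.append(cbuf, off, len)
  override def write(str: String): Unit = builder.append(str)  // skips a char[] copy
  override def flush(): Unit = ()   // nothing buffered outside the builder
  override def close(): Unit = ()   // no underlying resource to release
  override def toString: String = builder.toString
}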

@SparkQA commented Sep 18, 2018

Test build #96154 has finished for PR 22429 at commit 71ff7d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Nov 1, 2018

May I ask you, @hvanhovell @zsxwing, to review the PR one more time?

@MaxGekk (Member, Author) commented Nov 2, 2018

@gatorsmile @HyukjinKwon @viirya @rednaxelafx Are you ok with the proposed changes, or is there something which blocks the PR for now?

@hiboyang commented Nov 2, 2018

> @MaxGekk I sent an email to the Spark dev list about structured plan logging, but did not get any response.
>
> @boy-uber I guess it is better to speak about the feature to @bogdanrdc @hvanhovell @larturus

Thanks @MaxGekk for the contact list! I will ping them to gather more thoughts.

@MaxGekk (Member, Author) commented Nov 5, 2018

jenkins, retest this, please

@SparkQA commented Nov 5, 2018

Test build #98461 has finished for PR 22429 at commit 76f4248.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 5, 2018

Test build #98465 has finished for PR 22429 at commit f7de26d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Nov 5, 2018

Test build #98475 has finished for PR 22429 at commit bda6ac2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk (Member, Author) commented Nov 9, 2018

@hvanhovell Could you look at the PR, please?

@MaxGekk (Member, Author) commented Nov 11, 2018

@cloud-fan @gatorsmile May I ask you to look at the PR? It has been stuck for a while for unclear reasons, but I believe the proposed method toFile could be pretty useful in troubleshooting different issues, and I just don't want to close it so easily.

@cloud-fan (Contributor) commented

This is hard to review. Do you mean we should add maxFields: Option[Int] to all the string-related methods?

@MaxGekk (Member, Author) commented Nov 12, 2018

@cloud-fan

> This is hard to review. Do you mean we should add maxFields: Option[Int] to all the string-related methods?

Not to all, but only to the methods involved in producing textual representations of different plans. Those are simpleString and verboseString in children of TreeNode, plus simpleString of StructType. The changes are trivial - I just made them by fixing compilation errors. Maybe I missed something, but only because it wasn't touched by the compiler.

The main changes are concentrated in QueryExecution.scala, where everything begins from the toFile method. Other changes are consequences of that. The main idea is to write to an OutputStream instead of materializing strings in memory, which should avoid the limits on string size and the OOMs we sometimes see in corner cases.
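Roughly, the streaming shape is like this (a simplified sketch with assumed helper names; the real changes live in QueryExecution.scala):

import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Each section (parsed/analyzed/optimized plan, physical plan, codegen) is
// written as soon as it is produced, so the full dump never has to exist as
// a single in-memory string.
def dumpSections(path: String, sections: Iterator[String]): Unit = {
  val out = new BufferedWriter(new OutputStreamWriter(
    Files.newOutputStream(Paths.get(path)), StandardCharsets.UTF_8))
  try sections.foreach { s => out.write(s); out.newLine() }
  finally out.close()
}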

@HyukjinKwon (Member) commented Nov 12, 2018

@MaxGekk, I think the implementation looks complicated here: it introduces a None concept to indicate no limit, which makes this PR hard to read. I think it's okay to expose that number to toFile. This is an internal API, right? People could just set whatever number they want, I guess.
Fix me if I misread. If we get rid of it, I think the PR size will be drastically smaller.

-override def simpleString: String = {
+override def simpleString(maxFields: Option[Int]): String = {
   val orderByString = Utils.truncatedString(sortOrder, "[", ",", "]")
   val outputString = Utils.truncatedString(output, "[", ",", "]")

Can we just get rid of the maxFields? I think this makes the PR hard to read.

@HyukjinKwon (Member) commented

I took a super quick pass - the change actually looks quite okay in general to me.

@MaxGekk (Member, Author) commented Nov 12, 2018

@HyukjinKwon @cloud-fan Thank you for looking at the PR.

So, if I split the PR into 2 PRs:

  1. Writing truncated plans to a file
  2. Controlling the number of fields in truncated strings

it would be better for review, right?

> I think it's okay to expose that number to toFile. I think this is an internal API, right?

Yes, it is.

> People could just set whatever number they want, I guess. Fix me if I misread.

I just had a concern that there are many more places (~10) besides the toFile method where I have to pass some value for maxFields. In all such places I would have to read a SQL config.

@HyukjinKwon (Member) commented

@boy-uber, for structured streaming, let's do it out of this PR. I think the actual change of this PR can be small (1.). We can change this API for structured streaming later if needed since this is just an internal private API. Change (2.) can be shared I guess even if we go for structured streaming stuff.

@hiboyang commented

> @boy-uber, for structured streaming, let's do it out of this PR. I think the actual change of this PR can be small (1.). We can change this API for structured streaming later if needed since this is just an internal private API. Change (2.) can be shared I guess even if we go for structured streaming stuff.

Hi @HyukjinKwon, thanks for the comment! My original suggestion was not related to structured streaming. It was to support structured logging for the Spark plan, for example, getting the Spark plan as a JSON blob and sending it to a Spark Listener. This would make it easy for Spark users to write a Spark Listener plugin that gets the plan and processes the JSON in their code. Then people could parse and analyze a large number of Spark applications based on their plans. One use case for this is using machine learning to predict resource usage based on the Spark plans. This may not be related to this PR. If you or other people are interested, we could talk offline.
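As a rough sketch of that idea (not part of this PR; it relies only on Spark's existing QueryExecutionListener and TreeNode.toJSON, and the sink is a placeholder):

import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Captures the analyzed plan as JSON after each successful action, so an
// external pipeline can collect plans across many applications.
class PlanJsonListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    val planJson = qe.analyzed.toJSON
    println(planJson)  // placeholder sink; ship to your logging system instead
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

// Registered via: spark.listenerManager.register(new PlanJsonListener)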

@HyukjinKwon (Member) commented

Oops, I rushed to read. Yeah, it still sounds related but orthogonal. Let's move it to the mailing list. That should be the best place to discuss further.

@MaxGekk (Member, Author) commented Nov 14, 2018

I am closing the PR since a part of it has already been merged in #23018, and the rest is coming soon.

@MaxGekk (Member, Author) commented Nov 27, 2018

Here is a PR that introduces the maxFields parameter to all functions involved in the creation of truncated strings of Spark plans: #23159

@MaxGekk deleted the plan-to-file branch on August 17, 2019
dvallejo pushed a commit to Telefonica/spark that referenced this pull request Aug 31, 2021
The PR puts a limit on the size of a debug string generated for a tree node. This helps to fix out-of-memory errors when large plans have huge debug strings. In addition to SPARK-26103, this should also address SPARK-23904 and SPARK-25380. An alternative solution was proposed in apache#23076, but that solution doesn't address all the cases that can cause a large query. This limit applies only to calls to treeString that don't pass a Writer, which makes it play nicely with apache#22429, apache#23018 and apache#23039. Full plans can be written to files, but truncated plans will be used when strings are held in memory, such as for the UI.

- A new configuration parameter called spark.sql.debug.maxPlanLength was added to control the length of the plans.
  • When plans are truncated, "..." is printed to indicate that it isn't a full plan.
- A warning is printed out the first time a truncated plan is displayed. The warning explains what happened and how to adjust the limit.

Unit tests were created for the new SizeLimitedWriter. Also, a unit test for TreeNode was created that checks that a long plan is correctly truncated.
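A rough sketch of what a size-limited Writer can look like (an illustration under assumed behavior; the actual SizeLimitedWriter in apache#23169 may handle the limit differently):

import java.io.Writer

// Forwards characters to an underlying Writer until a budget is exhausted,
// then appends "..." once and drops the rest.
class TruncatingWriter(underlying: Writer, maxChars: Long) extends Writer {
  private var written = 0L
  private var truncated = false
  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val toWrite = (maxChars - written).max(0L).min(len.toLong).toInt
    if (toWrite > 0) {
      underlying.write(cbuf, off, toWrite)
      written += toWrite
    }
    if (toWrite < len && !truncated) {
      underlying.write("...")
      truncated = true
    }
  }
  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}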

Closes apache#23169 from DaveDeCaprio/text-plan-size.

Lead-authored-by: Dave DeCaprio <[email protected]>
Co-authored-by: David DeCaprio <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>