Skip to content

Conversation

@pjfanning
Copy link
Member

@pjfanning pjfanning commented Jul 17, 2017

What changes were proposed in this pull request?

When the code that is generated is greater than 64k, then Janino compile will fail and CodeGenerator.scala will log the entire code at Error level.
SPARK-20871 suggests only logging the code at Debug level.
Since, the code is already logged at debug level, this Pull Request proposes not including the formatted code in the Error logging and exception message at all.
When an exception occurs, the code will be logged at Info level but truncated if it is more than 1000 lines long.

How was this patch tested?

Existing tests were run.
An extra test test case was added to CodeFormatterSuite to test the new maxLines parameter,

@ash211
Copy link
Contributor

ash211 commented Jul 17, 2017

Jenkins this is ok to test

@pjfanning pjfanning changed the title [SPARK-20871] only log Janino code at debug level [SPARK-20871][SQL] only log Janino code at debug level Jul 17, 2017
@marmbrus
Copy link
Contributor

I don't have super strong opinions here, but in my experience its not always easy to get users to rerun a failed query with a different logging level. Have we considered truncating or special casing the 64k limitation instead?

@pjfanning
Copy link
Member Author

pjfanning commented Jul 17, 2017

@marmbrus How about something like?:

        val msg = s"failed to compile: $e"
        logError(msg, e)
        logInfo(formatted)
        throw new JaninoRuntimeException(msg, e)

Users with info logging enabled would get the code output when an issue arises but users who don't want to log the code could apply a log level of Warn or Error to the class.
If we are going to output the code, then it feels like we should output it all for completeness. I don't have a strong opinion on this latter point though. There is an alternate argument that for large code, then outputting a truncated version might be enough.

@marmbrus
Copy link
Contributor

That seems reasonable. I'm kind of pro-truncation for very very large code. Even though its not great to have something truncated, outputting GBs of logs is also pretty bad for downstream consumers.

@SparkQA
Copy link

SparkQA commented Jul 17, 2017

Test build #79677 has finished for PR 18658 at commit 803166c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Only add extra debugging info to byte code when we are going to print the source code.
evaluator.setDebuggingInformation(true, true, false)
s"\n$formatted"
s"\n${CodeFormatter.format(code)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest dropping the string concatenation with \n here -- it requires an additional copy of the code to be held in-memory and for errors where the code is too long, this causes unnecessary additional pressure on the heap

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ash211 the logging will be less user friendly without the \n and this is debug logging - I am working on an enhancement to the CodeFormatter class for the info level logging discussed earlier with @marmbrus that would allow a max number of lines to be specified

@ash211
Copy link
Contributor

ash211 commented Jul 17, 2017

FYI for future reviewers as well, we've been running an extremely similar patch to PJ's on our distribution of Spark for the past several months and had no problems.

Without this change, a code compilation failure often escalates into a heap OOM as the too-large source code is replicated in memory through the log statement.

@pjfanning pjfanning changed the title [SPARK-20871][SQL] only log Janino code at debug level [SPARK-20871][SQL] limit logging of Janino code Jul 17, 2017
@SparkQA
Copy link

SparkQA commented Jul 18, 2017

Test build #79683 has finished for PR 18658 at commit 52b20f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pjfanning
Copy link
Member Author

@ash211 , @marmbrus does this Pull Request look ok or are there other changes needed?

val msg = s"failed to compile: $e\n$formatted"
val msg = s"failed to compile: $e"
logError(msg, e)
logInfo(s"\n${CodeFormatter.format(code, 1000)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1000 -> maxLines = 1000

val msg = s"failed to compile: $e\n$formatted"
val msg = s"failed to compile: $e"
logError(msg, e)
logInfo(s"\n${CodeFormatter.format(code, 1000)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here

@gatorsmile
Copy link
Member

When the size is too large, which exception will be thrown?

Truncate for all the cases? or just when the size is too large?

@pjfanning
Copy link
Member Author

@gatorsmile I made the change you requested.
The idea is that when we log the code when the exceptions happen, that we don't want to print the entire code for very large examples. There is still debug level logging that will print the entire code.
I'm pretty neutral about the limit and if it is necessary at all. My main aim is have the code logging done at a different level from the error message so that I can configure my log settings to not log the code at all. I work on a SaaS solution and we don't our user's code to appear in our logs.

def format(code: CodeAndComment): String = {
def format(code: CodeAndComment): String = format(code, -1)

def format(code: CodeAndComment, maxLines: Int): String = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can maxLines just be an optional argument rather than declare two methods? this is an internal method so its signature is OK to change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen I can make that change. Is there any need to scaladoc the maxLines param? Or is the function definition easy enough to follow?

def format(code: CodeAndComment, maxLines: Int = -1): String

@SparkQA
Copy link

SparkQA commented Jul 21, 2017

Test build #79830 has finished for PR 18658 at commit 2997ded.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 21, 2017

Test build #79834 has finished for PR 18658 at commit 252f8ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val msg = s"failed to compile: $e\n$formatted"
val msg = s"failed to compile: $e"
logError(msg, e)
logInfo(s"\n${CodeFormatter.format(code, maxLines = 1000)}")
Copy link
Member

@gatorsmile gatorsmile Jul 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern whether we should add an internal SQLConf for this hard-coded value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile I can add a config item to org.apache.spark.sql.internal.SQLConf.scala if you think this is appropriate.
Maybe something like:

  val CODEGEN_LOGGING_MAX_LINES = buildConf("spark.sql.codegen.logging.maxLines")
    .internal()
    .doc("The maximum number of codegen lines to log when errors occur.")
    .intConf
    .createWithDefault(1000)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me

@SparkQA
Copy link

SparkQA commented Jul 23, 2017

Test build #79871 has finished for PR 18658 at commit c42dc46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val msg = s"failed to compile: $e\n$formatted"
val msg = s"failed to compile: $e"
logError(msg, e)
val maxLines = new SQLConf().loggingMaxLinesForCodegen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> SQLConf.get.loggingMaxLinesForCodegen

val msg = s"failed to compile: $e\n$formatted"
val msg = s"failed to compile: $e"
logError(msg, e)
val maxLines = new SQLConf().loggingMaxLinesForCodegen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> SQLConf.get.loggingMaxLinesForCodegen

val CODEGEN_LOGGING_MAX_LINES = buildConf("spark.sql.codegen.logging.maxLines")
.internal()
.doc("The maximum number of codegen lines to log when errors occur.")
.intConf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add checkValue

@pjfanning
Copy link
Member Author

@gatorsmile I've made the changes you requested.

@SparkQA
Copy link

SparkQA commented Jul 23, 2017

Test build #79882 has finished for PR 18658 at commit a795db2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merging to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants