[SPARK-1470][SPARK-1842] Use the scala-logging wrapper instead of the directly sfl4j api #332
Conversation
Can one of the admins verify this patch?
@witgo why do you want to remove this? I agree it would be good to consolidate the logging infrastructure (e.g. not use different logging for catalyst)... but is there a specific issue with scala logging?
(For what my $0.02 may be worth, it seems worth it just for the consolidation.)
/cc @marmbrus - do you have strong attachments to scala logging? For the short term I do agree consolidating on Spark's existing logging would be nice. In particular, downstream consumers of Spark's logging subsystem do crazy stuff like routing to different slf4j backends or inserting other log bridges and interceptors (e.g. http://apache-spark-developers-list.1001551.n3.nabble.com/0-9-0-forces-log4j-usage-td532.html). Adding another logger into the mix seems like it could be risky. Or at least, it might be nice to merge this now and then separately decide if we want Spark's overall logging system to migrate to scala-logging.
We should use a unified logging interface; migrating the overall logging system to scala-logging is a good idea.
Scala logging is not a new logger. It is just a set of macro functions that call slf4j, wrapping the logging calls in checks to see if they are enabled. This lets you perform logging without the performance overhead of doing string concatenation when the log doesn't fire, allocating a closure (like current Spark logging does), or having to manually write out the if checks yourself.
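For illustration, a minimal sketch of the usage pattern being described here; the import path assumes a recent scala-logging release, and Worker/process are made-up names:

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

object Worker {
  // Logger wraps an slf4j logger; its info/debug/... methods are macros.
  private val logger = Logger(LoggerFactory.getLogger(getClass))

  def process(records: Seq[String]): Unit = {
    // The macro guards the call with isDebugEnabled, so the interpolated
    // string is never built when DEBUG logging is off.
    logger.debug(s"about to process ${records.size} records")
  }
}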
I think Spark's current approach is to just add checks around logging calls. It seems like it's only a matter of saving us from having to write the if statements ourselves; I'm guessing these if statements get optimized at runtime to the point where they are very cheap.
It's not a matter of the if statements inside of Spark's logging code. Scala logging rewrites your log statements to put the if checks inline with every logging statement, whereas Spark's code wraps the message in a closure and does the check inside a helper method (see the sketch below). The macros allow you to avoid both the function call and the possible GC overhead of allocating the closure in cases where the log does not fire. While not huge, this does make a performance difference in practice (and that was only a micro benchmark with no GC pressure). Given that this library is a thin layer, endorsed by Typesafe, that integrates nicely with our current logging framework (and is easily pluggable if we ever decide to change), and improves performance, I really think this PR is a step backwards. If we want consistency, we should probably instead get rid of Spark's home-grown logging code.
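To make the comparison concrete, a rough self-contained sketch of the shapes being discussed (the macro expansion shown is paraphrased, and LoggingShapes/blockId are made-up names):

import org.slf4j.{Logger => Slf4jLogger, LoggerFactory}

object LoggingShapes extends App {
  val blockId = "rdd_0_1"
  val underlying: Slf4jLogger = LoggerFactory.getLogger("LoggingShapes")

  // With scala-logging you write a plain call such as
  //   logger.info("Block " + blockId + " stored")
  // and the macro rewrites it at compile time into roughly this:
  if (underlying.isInfoEnabled) underlying.info("Block " + blockId + " stored")

  // Spark's Logging trait instead takes the message by name, so a Function0
  // closure is allocated at every call site before the enabled-check runs:
  def logInfo(msg: => String): Unit = {
    if (underlying.isInfoEnabled) underlying.info(msg)
  }
  logInfo("Block " + blockId + " stored")
}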
Very similar to what I was thinking @marmbrus -- but you wrote it up more completely than I would have. This is also why I've been moving some performance-critical proprietary code over to using Scala logging. It isn't huge, but it does make a difference, and it's generally no more difficult to use Scala logging than older methods, so I'd also like to see Spark moving to use Typesafe's library.
The thing is - adding a dependency is not free. It means that everyone who packages Spark downstream needs to deal with a bunch of annoying stuff. For instance, this introduces a new way for us to get an unintentional version bump of slf4j in the Spark build. It also means that if users are upgrading slf4j on their own, they might have an issue where scala-logging doesn't work against newer versions of slf4j. It means that OS packagers will have to create a new RPM/Debian package for scala-logging if one doesn't exist. Basically, my point is that we should have a high bar for any dependency. If we go with the default plan of "add it if it provides nonzero benefit", that's way too low of a bar and would result in an unwieldy project. Anyway, on this one, maybe the best answer is to just rewrite the Spark logging to use scala-logging. Seems like it would be very straightforward. From my end, the main concern is just consolidating it with the normal Spark logging.
@witgo could you take a crack at doing that? I think it would be a fairly small change to
Jenkins, test this please.
Merged build triggered.
Merged build started.
It might be cleaner to do this:
protected def logName = this.getClass.getName.stripSuffix("$")
Then let subclasses override that function. Then below you could just have:
log_ = Logger(LoggerFactory.getLogger(logName))
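A sketch of how that suggestion could fit into a Logging-style trait, assuming scala-logging's Logger (Spark's real trait caches the logger in a nullable field rather than a lazy val, and BlockManagerStub is just an example subclass):

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

trait Logging {
  // Subclasses can override this to pick the logger name; stripSuffix("$")
  // drops the trailing "$" that Scala appends to object class names.
  protected def logName: String = this.getClass.getName.stripSuffix("$")

  // Created lazily so logName resolves against the concrete subclass.
  @transient protected lazy val log: Logger = Logger(LoggerFactory.getLogger(logName))
}

class BlockManagerStub extends Logging {
  def report(): Unit = log.info("heartbeat")  // guarded by isInfoEnabled via the macro
}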
Yes, I agree with your idea. This method is better.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13835/
@marmbrus As it stands, this patch seems to give us both worlds (unnecessarily): we wrap the logging statements in a closure and then use the scala-logging macro to determine if we should unwrap. However, if we were to simply make the logInfo() arguments into plain Strings, wouldn't we also lose the benefits of the Scala macro, since the string has already been materialized as the argument of logInfo? Would adding an
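For illustration, a sketch of the double wrapping being pointed out; DoubleWrapping and both logInfo variants are made-up names:

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

object DoubleWrapping {
  private val logger = Logger(LoggerFactory.getLogger("DoubleWrapping"))

  // Keeping the by-name signature and delegating to the macro still allocates
  // a Function0 at every call site; the macro's isInfoEnabled check only
  // prevents msg from being evaluated, not the closure from being created.
  def logInfo(msg: => String): Unit = logger.info(msg)

  // Switching to a plain String avoids the closure, but then the caller builds
  // the message eagerly, so the macro's guard no longer saves that work either.
  def logInfoEager(msg: String): Unit = logger.info(msg)
}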
Hm, good call @aarondav, I'm not sure if that would work. If it doesn't, then would there be any other way to do this than to just directly use the scala logger?
One solution is to keep these older methods for compatibility (with deprecation) but then also expose the underlying log directly to users and convert Spark's code to use the new version. It's too bad though - the whole reason
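A rough sketch of that compatibility approach, assuming scala-logging's Logger; CompatLogging, the method names, and the deprecation version string are placeholders:

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

trait CompatLogging {
  // New style: expose the scala-logging Logger so call sites use the macros directly.
  protected lazy val log: Logger =
    Logger(LoggerFactory.getLogger(this.getClass.getName.stripSuffix("$")))

  // Old style helpers kept only for source compatibility.
  @deprecated("use log.info(...) directly", "x.y.z")
  protected def logInfo(msg: => String): Unit = log.info(msg)

  @deprecated("use log.warn(...) directly", "x.y.z")
  protected def logWarning(msg: => String): Unit = log.warn(msg)
}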
Is performance really a problem here? I don't think we log stuff in the critical path (we probably shouldn't). If we do, maybe we can just get rid of that logging.
I did not find any logging calls that affect performance.
@pwendell
We should wait until questions like these are answered before we move to scala-logging: lightbend-labs/scala-logging#4. At the very least, when we decide to move, we should make sure we can switch off of it easily in case we find (blocker-level) problems with it.
I don't think macros are faster than an inline method:

import scala.compat.Platform

class Log(val flag: Boolean) {
  // By-name parameter: a Function0 closure is allocated at every call site.
  def logFunction0(msg: => String) {
    if (flag)
      println(msg)
  }
  // Plain String parameter: the message is always built by the caller.
  def logString(msg: String) {
    if (flag)
      println(msg)
  }
  // By-name parameter with an inlining request the optimizer may honor.
  @inline
  def logInline(msg: => String) {
    if (flag)
      println(msg)
  }
}

val log = new Log(false)

var startTime = Platform.currentTime
var i = 0
while (i < 1000000) {
  log.logInline("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\\n"))
  i += 1
}
var stopTime = Platform.currentTime
Platform.collectGarbage
println("logInline: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logFunction0("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logFunction0: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logString("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logString: " + (stopTime - startTime) + " ms")
@marmbrus the test code
@pwendell @marmbrus @rxin
Why was this change made? It's almost the same thing as Utils.getSparkClassLoader, just not using that abstraction?
Uh, I'm sorry, I think this is a problem introduced when merging the code.
This can't be tested automatically. I have submitted a new PR: #1369.
Jenkins, test this please.
QA tests have started for PR 332. This patch merges cleanly.
QA results for PR 332: