[SPARK-1470][SPARK-1842] Use the scala-logging wrapper instead of the directly sfl4j api #332
Conversation
Can one of the admins verify this patch?
@witgo why do you want to remove this? I agree it would be good to consolidate the logging infrastructure (e.g. not use different logging for catalyst)... but is there a specific issue with scala logging?
(For what my $0.02 may be worth, it seems worth it just for the consolidation.)
/cc @marmbrus - do you have strong attachments to scala logging? For the short term I do agree consolidating on Spark's existing logging would be nice. In particular, downstream consumers of Spark's logging subsystem do crazy stuff like routing to different slf4j backends or inserting other log bridges and interceptors (e.g. http://apache-spark-developers-list.1001551.n3.nabble.com/0-9-0-forces-log4j-usage-td532.html). Adding another logger into the mix seems like it could be risky. Or at least, it might be nice to merge this now and then separately decide if we want Spark's overall logging system to migrate to scala-logging.
We should use a unified logging interface; migrating the overall logging system to scala-logging is a good idea.
Scala logging is not a new logger. It is just a set of macro functions that call slf4j, wrapping the logging calls in checks to see if they are enabled. This lets you perform logging without the performance overhead of doing string concatenation when the log doesn't fire, allocating a closure (like current Spark logging does), or having to manually write out the if checks yourself.
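For illustration, a minimal sketch of the usage pattern being described here; the import path assumes a recent scala-logging release, and Worker/process are made-up names:

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

object Worker {
  // Logger wraps an slf4j logger; its info/debug/... methods are macros.
  private val logger = Logger(LoggerFactory.getLogger(getClass))

  def process(records: Seq[String]): Unit = {
    // The macro guards the call with isDebugEnabled, so the interpolated
    // string is never built when DEBUG logging is off.
    logger.debug(s"about to process ${records.size} records")
  }
}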
I think Spark's current approach is to just add checks around logging calls. It seems like it's only a matter of saving us from having to write the if statements ourselves; I'm guessing these if statements get optimized at runtime to the point where they are very cheap.
It's not a matter of the if statements inside of Spark's logging code. Scala logging rewrites your log statements to put the if checks inline with every logging statement, whereas Spark's code wraps the message in a closure and does the check inside a helper method (see the sketch below). The macros allow you to avoid both the function call and the possible GC overhead of allocating the closure in cases where the log does not fire. While not huge, this does make a performance difference in practice (and that was only a micro benchmark with no GC pressure). Given that this library is a thin layer, endorsed by Typesafe, that integrates nicely with our current logging framework (and is easily pluggable if we ever decide to change), and improves performance, I really think this PR is a step backwards. If we want consistency, we should probably instead get rid of Spark's home-grown logging code.
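To make the comparison concrete, a rough self-contained sketch of the shapes being discussed (the macro expansion shown is paraphrased, and LoggingShapes/blockId are made-up names):

import org.slf4j.{Logger => Slf4jLogger, LoggerFactory}

object LoggingShapes extends App {
  val blockId = "rdd_0_1"
  val underlying: Slf4jLogger = LoggerFactory.getLogger("LoggingShapes")

  // With scala-logging you write a plain call such as
  //   logger.info("Block " + blockId + " stored")
  // and the macro rewrites it at compile time into roughly this:
  if (underlying.isInfoEnabled) underlying.info("Block " + blockId + " stored")

  // Spark's Logging trait instead takes the message by name, so a Function0
  // closure is allocated at every call site before the enabled-check runs:
  def logInfo(msg: => String): Unit = {
    if (underlying.isInfoEnabled) underlying.info(msg)
  }
  logInfo("Block " + blockId + " stored")
}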
Very similar to what I was thinking @marmbrus -- but you wrote it up more completely than I would have. This is also why I've been moving some performance-critical proprietary code over to using Scala logging. It isn't huge, but it does make a difference, and it's generally no more difficult to use Scala logging than older methods, so I'd also like to see Spark moving to use Typesafe's library.
The thing is - adding a dependency is not free. It means that everyone who packages Spark downstream needs to deal with a bunch of annoying stuff. For instance, this introduces a new way for us to get an unintentional version bump of slf4j in the Spark build. It also means that if users are upgrading slf4j on their own, they might have an issue where scala-logging doesn't work against newer versions of slf4j. It means that OS packagers will have to create a new RPM/Debian package for scala-logging if one doesn't exist. Basically, my point is that we should have a high bar for any dependency. If we go with the default plan of "add it if it provides nonzero benefit", that's way too low of a bar and would result in an unwieldy project. Anyway, on this one, maybe the best answer is to just rewrite the Spark logging to use scala-logging. Seems like it would be very straightforward. From my end, the main concern is just consolidating it with the normal Spark logging.
@witgo could you take a crack at doing that? I think it would be a fairly small change to
Jenkins, test this please.
Merged build triggered.
Merged build started.
It might be cleaner to do this:
protected def logName = this.getClass.getName.stripSuffix("$")
Then let subclasses override that function. Then below you could just have:
log_ = Logger(LoggerFactory.getLogger(logName))
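A sketch of how that suggestion could fit into a Logging-style trait, assuming scala-logging's Logger (Spark's real trait caches the logger in a nullable field rather than a lazy val, and BlockManagerStub is just an example subclass):

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

trait Logging {
  // Subclasses can override this to pick the logger name; stripSuffix("$")
  // drops the trailing "$" that Scala appends to object class names.
  protected def logName: String = this.getClass.getName.stripSuffix("$")

  // Created lazily so logName resolves against the concrete subclass.
  @transient protected lazy val log: Logger = Logger(LoggerFactory.getLogger(logName))
}

class BlockManagerStub extends Logging {
  def report(): Unit = log.info("heartbeat")  // guarded by isInfoEnabled via the macro
}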
Yes, I agree with your idea. This method is better.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13835/
@marmbrus As it stands, this patch seems to give us both worlds (unnecessarily): we wrap the logging statements in a closure and then use the scala-logging macro to determine if we should unwrap. However, if we were to simply make the logInfo() arguments into plain Strings, wouldn't we also lose the benefits of the Scala macro, since the string has already been materialized as the argument of logInfo? Would adding an
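For illustration, a sketch of the double wrapping being pointed out; DoubleWrapping and both logInfo variants are made-up names:

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

object DoubleWrapping {
  private val logger = Logger(LoggerFactory.getLogger("DoubleWrapping"))

  // Keeping the by-name signature and delegating to the macro still allocates
  // a Function0 at every call site; the macro's isInfoEnabled check only
  // prevents msg from being evaluated, not the closure from being created.
  def logInfo(msg: => String): Unit = logger.info(msg)

  // Switching to a plain String avoids the closure, but then the caller builds
  // the message eagerly, so the macro's guard no longer saves that work either.
  def logInfoEager(msg: String): Unit = logger.info(msg)
}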
Hm, good call @aarondav, I'm not sure if that would work. If it doesn't, then would there be any other way to do this than to just directly use the scala logger?
One solution is to keep these older methods for compatibility (with deprecation) but then also expose the underlying log directly to users and convert Spark's code to use the new version. It's too bad though - the whole reason
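A rough sketch of that compatibility approach, assuming scala-logging's Logger; CompatLogging, the method names, and the deprecation version string are placeholders:

import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

trait CompatLogging {
  // New style: expose the scala-logging Logger so call sites use the macros directly.
  protected lazy val log: Logger =
    Logger(LoggerFactory.getLogger(this.getClass.getName.stripSuffix("$")))

  // Old style helpers kept only for source compatibility.
  @deprecated("use log.info(...) directly", "x.y.z")
  protected def logInfo(msg: => String): Unit = log.info(msg)

  @deprecated("use log.warn(...) directly", "x.y.z")
  protected def logWarning(msg: => String): Unit = log.warn(msg)
}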
Is performance really a problem here? I don't think we log stuff in the critical path (we probably shouldn't). If we do, maybe we can just get rid of that logging.
I did not find any logging calls that affect performance.
@pwendell
We should wait until questions like these are answered before we move to scala-logging: lightbend-labs/scala-logging#4. At the very least, when we decide to move, we should make sure we can switch off of it easily in case we find (blocker-level) problems with it.
I don't think macros are faster than an inline method:

import scala.compat.Platform

class Log(val flag: Boolean) {
  // By-name parameter: a Function0 closure is allocated at every call site.
  def logFunction0(msg: => String) {
    if (flag)
      println(msg)
  }
  // Plain String parameter: the message is always built by the caller.
  def logString(msg: String) {
    if (flag)
      println(msg)
  }
  // By-name parameter with an inlining request the optimizer may honor.
  @inline
  def logInline(msg: => String) {
    if (flag)
      println(msg)
  }
}

val log = new Log(false)

var startTime = Platform.currentTime
var i = 0
while (i < 1000000) {
  log.logInline("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\\n"))
  i += 1
}
var stopTime = Platform.currentTime
Platform.collectGarbage
println("logInline: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logFunction0("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logFunction0: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logString("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logString: " + (stopTime - startTime) + " ms")
@marmbrus the test code
@pwendell @marmbrus @rxin
Why was this change made? It's almost the same thing as Utils.getSparkClassLoader, just not using that abstraction?
Uh, I'm sorry, I think this is a problem introduced when merging the code.
This can't be tested automatically. I have submitted a new PR: #1369.
Jenkins, test this please.
QA tests have started for PR 332. This patch merges cleanly.
QA results for PR 332: