
Conversation

@witgo
Contributor

@witgo witgo commented Apr 5, 2014

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

pwendell commented Apr 5, 2014

@witgo why do you want to remove this? I agree it would be good to consolidate the logging infrastructure (e.g. not use different logging for catalyst)... but is there a specific issue with scala logging?

@srowen
Member

srowen commented Apr 5, 2014

(For what my $0.02 may be worth it seems worth it just for the consolidation.)

@pwendell
Contributor

pwendell commented Apr 5, 2014

/cc @marmbrus - do you have strong attachments to scala logging? For the short term I do agree consolidating on Spark's existing logging would be nice. In particular, downstream consumers of Spark's logging subsystem do crazy stuff like routing to different slf4j backends or inserting other log bridges and interceptors (e.g. http://apache-spark-developers-list.1001551.n3.nabble.com/0-9-0-forces-log4j-usage-td532.html)

Adding another logger into the mix seems like it could be risky. Or at least, it might be nice to merge this now and then separately decide if we want Spark's overall logging system to migrate to scalalogging.

@witgo
Contributor Author

witgo commented Apr 6, 2014

We should use a unified logging interface, and migrating the overall logging system to scala-logging is a good idea.

@marmbrus
Contributor

marmbrus commented Apr 6, 2014

Scala logging is not a new logger. It is just a set of macro functions that call slf4j, wrapping the logging calls in checks to see if they are enabled. This lets you perform logging without the performance overhead of doing string concatenation when the log doesn't fire, allocating a closure (like current Spark logging does), or having to manually write out the if checks yourself.
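For illustration, a minimal sketch of what a call site looks like with the scalalogging-slf4j artifact discussed in this PR (the class name is hypothetical, and the package path may differ between scalalogging versions):

import com.typesafe.scalalogging.slf4j.Logging

// hypothetical class, only to show the call pattern
class QueryPlanner extends Logging {
  def plan(n: Int) {
    // the macro expands this call into an inline `if (logger.isDebugEnabled) ...` guard,
    // so the interpolated string is never built when DEBUG is disabled
    logger.debug(s"planning query with $n candidate strategies")
  }
}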

@pwendell
Contributor

pwendell commented Apr 6, 2014

I think Spark's current approach is to just add checks around logging calls. If it's only a matter of saving us from having to write 10 if statements (which I don't think have been modified in several years), it might not be worth having an extra dependency to track.

I'm guessing these if statements get optimized at runtime to the point where they are very cheap.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L54
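For reference, the pattern in the linked Logging.scala is roughly the following (a paraphrase, not the exact source): each log helper takes a by-name String and guards it with an is*Enabled check.

import org.slf4j.{Logger, LoggerFactory}

trait Logging {
  @transient private var log_ : Logger = null

  // slf4j logger named after the concrete class (minus the trailing "$" for objects)
  protected def log: Logger = {
    if (log_ == null) {
      log_ = LoggerFactory.getLogger(this.getClass.getName.stripSuffix("$"))
    }
    log_
  }

  // by-name parameter: the message is only evaluated if DEBUG is enabled,
  // at the cost of allocating a Function0 closure at every call site
  protected def logDebug(msg: => String) {
    if (log.isDebugEnabled) log.debug(msg)
  }
}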

@marmbrus
Contributor

marmbrus commented Apr 6, 2014

It's not a matter of the if statements inside of Spark's logging code. Scala logging rewrites your log statements to put the if checks inline with every logging statement.

You write: logger.debug(s"Log message $someExpensiveComputation")
You get:

if (logger.isDebugEnabled) {
  val logMsg = "Log message " + someExpensiveComputation()
  logger.debug(logMsg)
}

Spark's code does something like this (roughly what the compiler generates for the by-name argument):

val logMessage = new Function0[String] { def apply() = "Log message " + someExpensiveComputation() }
log.debug(logMessage)

The macros allow you to avoid both the function call and the possible GC overhead of allocating the closure in cases where the log does not fire.

While not huge, this does make a performance difference in practice:

std logging: 19885.48 ms
spark logging: 914.408 ms
scala logging: 729.779 ms

(And this was only a micro benchmark with no GC pressure).

Given that this library is a thin layer, endorsed by Typesafe, that integrates nicely with our current logging framework (and is easily pluggable if we ever decide to change), and improves performance, I really think this PR is a step backwards. If we want consistency, we should probably instead get rid of Spark's home-grown logging code.

@markhamstra
Contributor

Very similar to what I was thinking @marmbrus -- but you wrote it up more completely than I would have. This is also why I've been moving some performance-critical proprietary code over to using Scala logging. It isn't huge, but it does make a difference, and it's generally no more difficult to use Scala logging than older methods, so I'd also like to see Spark moving to use Typesafe's library.

@pwendell
Contributor

pwendell commented Apr 6, 2014

The thing is - adding a dependency is not free. It means that everyone who packages spark downstream needs to deal with a bunch of annoying stuff. For instance, this introduces a new way for us to get an unintentional version bump of slf4j in the Spark build. It also means that if users are upgrading slf4j on their own, they might have an issue where scala-logging doesn't work against newer versions of slf4j. It means that OS packagers will have to create a new RPM/Debian package for scala-logging if one doesn't exist. Basically, my point is we should have a high bar for any dependency.

If we go with the default plan of "add it if it provides nonzero benefit" - that's way too low of a bar and would result in an unwieldy project.

Anyways on this one, maybe the best answer is to just re-write the spark logging to use scala-logging. Seems like it would be very straightforward. From my end, the main concern is just consolidating it with the normal spark logging.

@pwendell
Contributor

pwendell commented Apr 6, 2014

@witgo could you take a crack at doing that? I think it would be a fairly small change to Logging.scala where we just use the scala-logging wrapper instead of the slf4j API directly.

@witgo witgo changed the title remove scalalogging-slf4j dependency Spark logger moving to use scala-logging Apr 7, 2014
@pwendell
Contributor

pwendell commented Apr 7, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

Contributor (inline review comment)

It might be cleaner to do this:

protected def logName = this.getClass.getName.stripSuffix("$")

Then let subclasses override that function. Then below you could just have:

log_ = Logger(LoggerFactory.getLogger(logName))
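Put together, the suggested shape would look roughly like this (a sketch only; the exact scalalogging package path depends on the artifact version):

import com.typesafe.scalalogging.slf4j.Logger
import org.slf4j.LoggerFactory

trait Logging {
  // subclasses override this to pick a custom logger name
  protected def logName = this.getClass.getName.stripSuffix("$")

  @transient private var log_ : Logger = null

  protected def log: Logger = {
    if (log_ == null) {
      // wrap the slf4j logger in scala-logging's macro-based Logger
      log_ = Logger(LoggerFactory.getLogger(logName))
    }
    log_
  }
}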

Contributor Author (reply)

Yes, I agree with your idea. This method is better.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13835/

@aarondav
Contributor

aarondav commented Apr 7, 2014

@marmbrus As it stands, this patch seems to give us both worlds (unnecessarily): we wrap the logging statements in a closure and then use the scala logging macro to determine if we should unwrap. However, if we were to simply make the logInfo() arguments into plain Strings, wouldn't we also lose the benefits of the Scala macro, since the string has already been materialized as the argument of logInfo? Would adding an @inline annotation avoid this or would we have to make our log* functions into macros themselves to avoid this overhead while keeping the name?
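To make the concern concrete, a hedged sketch of the two options (paraphrased, not quoted from the patch; the trait names are hypothetical):

import com.typesafe.scalalogging.slf4j.Logger
import org.slf4j.LoggerFactory

trait ByNameLogging {
  protected lazy val log = Logger(LoggerFactory.getLogger(getClass.getName))

  // what the patch's helper effectively does: the by-name parameter means every caller
  // allocates a Function0 closure before the macro's own isInfoEnabled check ever runs
  protected def logInfo(msg: => String) { log.info(msg) }
}

trait EagerStringLogging {
  protected lazy val log = Logger(LoggerFactory.getLogger(getClass.getName))

  // the alternative described above: a plain String avoids the closure, but the
  // message is built eagerly at the call site, before any level check can skip it
  protected def logInfo(msg: String) { log.info(msg) }
}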

@pwendell
Contributor

pwendell commented Apr 7, 2014

Hm, good call @aarondav. I'm not sure if that would work. If it doesn't, would there be any other way to do this than to just use the scala logger directly?

@pwendell
Contributor

pwendell commented Apr 7, 2014

One solution is to keep these older methods for compatibility (with deprecation) but then also expose the underlying log directly to users and convert Spark's code to use the new version. The Logging trait in Spark is public, so it's likely downstream applications are using it.

It's too bad though: the whole reason Logging exists is to provide an indirection layer so we don't have to depend on specific implementations, but that's exactly what will cause performance overhead in this case.
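A minimal sketch of that idea, assuming scalalogging-slf4j stays in place (the deprecation message and version string are hypothetical placeholders):

import com.typesafe.scalalogging.slf4j.Logger
import org.slf4j.LoggerFactory

trait Logging {
  // exposed directly so Spark-internal call sites get the macro's inline level checks
  protected lazy val log = Logger(LoggerFactory.getLogger(this.getClass.getName.stripSuffix("$")))

  // kept only for source compatibility with downstream users of the public trait
  @deprecated("call log.info(...) directly", "Spark 1.x")
  protected def logInfo(msg: => String) { log.info(msg) }
}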

@rxin
Contributor

rxin commented Apr 7, 2014

Is performance really a problem here? I don't think we log stuff on the critical path (we probably shouldn't). If we do, maybe we can just get rid of that logging.

@witgo
Contributor Author

witgo commented Apr 7, 2014

I did not find a call that affects performance. It is possibly here: in Spark Catalyst, logger.debug is called many times, for example in
RuleExecutor.scala#L64

@witgo witgo changed the title Spark logger moving to use scala-logging SPARK-1470: Spark logger moving to use scala-logging Apr 11, 2014
@witgo
Contributor Author

witgo commented Apr 11, 2014

@pwendell
How do I get Jenkins to run the tests? By submitting a "Jenkins, test this please." comment?

@rxin
Contributor

rxin commented Apr 13, 2014

We should wait until questions like these are answered before we move to scala-logging.

lightbend-labs/scala-logging#4

typesafehub/scalalogging#42

At the very least, when we decide to move, we should make sure we can switch it off easily in case we find (blocker-level) problems with it.

@witgo
Contributor Author

witgo commented Apr 16, 2014

I don't think macros are faster than an inline method:

import scala.compat.Platform

// flag = false, so none of the messages below are ever printed; the benchmark
// measures only the cost of the call pattern (by-name closure vs. plain String).
class Log(val flag: Boolean) {

  def logFunction0(msg: => String) {
    if (flag)
      println(msg)
  }

  def logString(msg: String) {
    if (flag)
      println(msg)
  }

  @inline
  def logInline(msg: => String) {
    if (flag)
      println(msg)
  }
}

val log = new Log(false)

// the message deliberately does expensive work (capturing and formatting a stack trace)
var startTime = Platform.currentTime
var i = 0
while (i < 1000000) {
  log.logInline("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
var stopTime = Platform.currentTime
Platform.collectGarbage
println("logInline: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logFunction0("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logFunction0: " + (stopTime - startTime) + " ms")

startTime = Platform.currentTime
i = 0
while (i < 1000000) {
  log.logString("" + Thread.currentThread.getStackTrace().
    toSeq.take(10).mkString("\n"))
  i += 1
}
stopTime = Platform.currentTime
Platform.collectGarbage
println("logString: " + (stopTime - startTime) + " ms")

Results (three runs):

logInline: 9 ms
logFunction0: 14 ms
logString: 29114 ms
logInline: 10 ms
logFunction0: 13 ms
logString: 28876 ms
logInline: 9 ms
logFunction0: 13 ms
logString: 28521 ms

@witgo
Contributor Author

witgo commented Apr 19, 2014

@marmbrus
You are right, macros are faster than the inline method, though the gap is small.

The test code (linked) produces these results:

Run 1:
logString: 38987.745 ms
logInline: 14.198 ms
logFunction0: 16.22 ms
logMacros: 12.312 ms

Run 2:
logString: 39037.412 ms
logInline: 14.349 ms
logFunction0: 16.2 ms
logMacros: 12.249 ms

Run 3:
logString: 39169.036 ms
logInline: 14.324 ms
logFunction0: 16.871 ms
logMacros: 11.324 ms
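For readers without the linked test code, a macro-guarded variant in Scala 2.10 would look roughly like the following sketch (not the benchmarked code itself; the names are hypothetical, and macros must be compiled separately from their call sites):

import scala.language.experimental.macros
import scala.reflect.macros.Context

object LogMacros {
  def logMacro(flag: Boolean, msg: String): Unit = macro logMacroImpl

  def logMacroImpl(c: Context)(flag: c.Expr[Boolean], msg: c.Expr[String]): c.Expr[Unit] = {
    import c.universe._
    // inline the guard at the call site, so `msg` is never evaluated when `flag` is false
    reify { if (flag.splice) println(msg.splice) }
  }
}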

@witgo witgo changed the title SPARK-1470: Spark logger moving to use scala-logging [SPARK-1470] Spark logger moving to use scala-logging May 11, 2014
@witgo
Contributor Author

witgo commented May 11, 2014

@pwendell @marmbrus @rxin
lightbend-labs/scala-logging#4 has been resolved by lightbend-labs/scala-logging#15.
This PR should now be ready to merge into master.

Contributor (inline review comment)

Why was this change made? It's almost the same thing as Utils.getSparkClassLoader, just not using that abstraction?

Contributor Author (reply)

Sorry, I think this is a problem introduced while merging the code.

@witgo witgo changed the title [SPARK-1470] Spark logger moving to use scala-logging [SPARK-1470] Use the scala-logging wrapper instead of the directly sfl4j api Jun 22, 2014
@witgo witgo closed this Jun 25, 2014
@witgo witgo deleted the remove_scalalogging branch June 25, 2014 03:49
@witgo
Contributor Author

witgo commented Jul 11, 2014

This can't be tested automatically. I have submitted a new PR, #1369.

@witgo witgo changed the title [SPARK-1470] Use the scala-logging wrapper instead of the directly sfl4j api [SPARK-1470] remove scalalogging-slf4j dependency Aug 2, 2014
@witgo witgo restored the remove_scalalogging branch August 2, 2014 10:27
@witgo witgo reopened this Aug 2, 2014
@witgo witgo changed the title [SPARK-1470] remove scalalogging-slf4j dependency [SPARK-1470][SPARK-1842] Use the scala-logging wrapper instead of the directly sfl4j api Aug 2, 2014
@witgo
Contributor Author

witgo commented Aug 2, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 2, 2014

QA tests have started for PR 332. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17767/consoleFull

@SparkQA

SparkQA commented Aug 2, 2014

QA results for PR 332:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17767/consoleFull

@witgo witgo closed this Aug 3, 2014
@witgo witgo deleted the remove_scalalogging branch August 3, 2014 01:25