
Conversation

@fireflyc
Contributor

Our program needs to receive a large amount of data and run for a long time.
We set the log level to WARN, but messages such as "Storing iterator" and
"received single" were still written to the log file (running on YARN).
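A minimal, self-contained sketch of the pattern involved (illustrative names and slf4j logging; the actual Spark code is Scala and goes through its Logging trait): once these per-item and per-iterator messages go to a logger at the debug level instead of stdout, setting the log level to WARN suppresses them.

```
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReceiverLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ReceiverLoggingSketch.class);

  // Was effectively println("received single"), which reaches stdout
  // regardless of the configured log level.
  static void storeSingle(Object item) {
    LOG.debug("received single");
  }

  // Was effectively println("Storing iterator").
  static void storeIterator(java.util.Iterator<?> items) {
    LOG.debug("Storing iterator");
    while (items.hasNext()) {
      storeSingle(items.next());
    }
  }
}
```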

@AmplabJenkins

Can one of the admins verify this patch?

@fireflyc
Contributor Author

I have verified it; the log level is set to INFO, right?

@mateiz
Contributor

mateiz commented Jul 11, 2014

Jenkins, test this please

@SparkQA

SparkQA commented Jul 11, 2014

QA tests have started for PR 1372. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16574/consoleFull

@SparkQA

SparkQA commented Jul 11, 2014

QA results for PR 1372:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait ActorHelper extends Logging{

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16574/consoleFull

@pwendell
Contributor

LGTM - I could see us maybe moving this to logDebug in the future... it could get pretty chatty if you had an active stream. But it seems reasonable to start with this at info.

@mateiz
Contributor

mateiz commented Jul 12, 2014

Yeah I agree they might need to be debug. @tdas what do you think?

@tdas
Contributor

tdas commented Jul 12, 2014

Yikes, that's an oversight on my part. The ones related to storing a single item should be removed entirely, and the other ones related to storing an iterator should be logDebug.

Contributor


+1 to prior comments about logDebug/removing -- as an additional nit, please put a space after Logging.

@pwendell
Contributor

@fireflyc will you have a chance to address the comments so we can merge this?

@fireflyc
Contributor Author

I changed 'info' to the 'debug' level.

@tdas
Contributor

tdas commented Jul 16, 2014

There have been further comments regarding this, from me and @aarondav. It would be great if you could address them as well.

@mateiz
Contributor

mateiz commented Jul 25, 2014

I've merged this, thanks. Will fix the style issue later.

@mateiz
Contributor

mateiz commented Jul 25, 2014

BTW @fireflyc please create an account on JIRA (https://issues.apache.org/jira/browse/SPARK) and let me know its name so I can assign this issue to you.

@fireflyc
Contributor Author

My account is fireflyc, please assign the issue to me.

@critikaled

Hey, this change has not been included in the 1.0.2 release. Any heads-up on the version in which it will land?

@mateiz
Contributor

mateiz commented Aug 10, 2014

It will be in 1.1. I guess we can also backport it to branch-1.0 -- how bad is the issue? Does it cause real problems, or is it just annoying?

@critikaled

It's just annoying; it's OK, I have built Spark from source and am using it as an external lib. BTW, what is the approximate release date for 1.1? I was reading in some forum about you discussing Scala 2.11 -- will it be compatible with Scala 2.11.x?
Thanks.

@mateiz
Contributor

mateiz commented Aug 13, 2014

Alright, I'll cherry-pick this into branch 1.0 as well.

Spark 1.1 is targeted to be released at the end of this month, and it won't have Scala 2.11 support. However, there are some open patches against master that will hopefully let us add it in 1.2 (three months from now).

asfgit pushed a commit that referenced this pull request Aug 13, 2014
Our program needs to receive a large amount of data and run for a long time.
We set the log level to WARN, but messages such as "Storing iterator" and
"received single" were still written to the log file (running on YARN).

Author: fireflyc <[email protected]>

Closes #1372 from fireflyc/fix-replace-stdout-log and squashes the following commits:

e684140 [fireflyc] 'info' modified into the 'debug'
fa22a38 [fireflyc] replace println to log4j
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…f `G1GC` and `ON_HEAP` are used (apache#1372)

### What changes were proposed in this pull request?
Spark's Tungsten memory model usually allocates memory one `page` at a time, each page backed by a `long[pageSizeBytes/8]` array in `HeapMemoryAllocator.allocate`.

Remember that a Java long array needs an extra object header (usually 16 bytes on a 64-bit system), so the actual number of bytes allocated is `pageSize+16`.

Assume that `G1HeapRegionSize` is 4M and `pageSizeBytes` is 4M as well. Since each allocation needs 4M+16 bytes of memory, two regions are used, with one region holding only 16 bytes; that is about **50%** memory waste.
This can happen under various combinations of `G1HeapRegionSize` (from 1M to 32M) and `pageSizeBytes` (from 1M to 64M).
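As a back-of-the-envelope check (a sketch under the 16-byte-header assumption, not code from this PR), the wasted fraction for any region/page combination can be computed directly:

```
public class RegionWaste {
  // Fraction of the occupied G1 regions holding no useful data, assuming a
  // humongous allocation takes whole contiguous regions and a 16-byte header.
  static double wastedFraction(long regionBytes, long pageBytes) {
    long allocBytes = pageBytes + 16;                                // header
    long regionsUsed = (allocBytes + regionBytes - 1) / regionBytes; // ceil
    return 1.0 - (double) allocBytes / (regionsUsed * regionBytes);
  }

  public static void main(String[] args) {
    // 4M regions and 4M pages -> about 0.5, i.e. the ~50% waste described above
    System.out.println(wastedFraction(4L << 20, 4L << 20));
  }
}
```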

We can demo it using the following piece of code.

```
import java.util.ArrayList;
import java.util.List;

public class BufferSizeTest {
  // Allocate 4M long[] blocks until the heap is exhausted, printing the
  // running total so we can see how far we get before OOM.
  public static void bufferSizeTest(boolean optimize) {
    long totalAllocatedSize = 0L;
    int blockSize = 1024 * 1024 * 4; // 4m
    if (optimize) {
      blockSize -= 16; // leave room for the ~16-byte array object header
    }
    List<long[]> buffers = new ArrayList<>();
    while (true) {
      long[] arr = new long[blockSize / 8];
      buffers.add(arr);
      totalAllocatedSize += blockSize;
      System.out.println("Total allocated size: " + totalAllocatedSize);
    }
  }

  public static void main(String[] args) {
    bufferSizeTest(false); // run with true for the optimized variant
  }
}
```

Run it using the following JVM params:
```
java -Xmx100m -XX:+UseG1GC -XX:G1HeapRegionSize=4m -XX:-UseGCOverheadLimit -verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xss4m -XX:+ExitOnOutOfMemoryError -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4
```

With `optimize = false`:
```
Total allocated size: 46137344
[GC pause (G1 Humongous Allocation) (young) 44M->44M(100M), 0.0007091 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark)-- 48M->48M(100M), 0.0021528 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000021 secs]
[GC concurrent-mark-start]
[GC pause (G1 Evacuation Pause) (young) 48M->48M(100M), 0.0011289 secs]
[Full GC (Allocation Failure)  48M->48M(100M), 0.0017284 secs]
[Full GC (Allocation Failure)  48M->48M(100M), 0.0013437 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
```

With `optimize = true`:
```
Total allocated size: 96468624
[GC pause (G1 Humongous Allocation) (young)-- 92M->92M(100M), 0.0024416 secs]
[Full GC (Allocation Failure)  92M->92M(100M), 0.0019883 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark) 96M->96M(100M), 0.0004282 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000040 secs]
[GC concurrent-mark-start]
[GC pause (G1 Evacuation Pause) (young) 96M->96M(100M), 0.0003269 secs]
[Full GC (Allocation Failure)  96M->96M(100M), 0.0012409 secs]
[Full GC (Allocation Failure)  96M->96M(100M), 0.0012607 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
```

This PR tries to optimize the page size to avoid memory waste.
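A hypothetical sketch of that direction (illustrative names only; the actual change lives in Spark's `HeapMemoryAllocator`): size the backing array so that header plus data stay within the requested page budget, keeping a 4M page inside a single 4M region.

```
public class PageSizing {
  static final long ARRAY_HEADER_BYTES = 16; // assumption: 64-bit JVM

  // Allocate slightly fewer longs so the backing array, header included,
  // fits within pageSizeBytes and therefore within one G1 region.
  static int wordsForPage(long pageSizeBytes) {
    return (int) ((pageSizeBytes - ARRAY_HEADER_BYTES) / 8);
  }

  public static void main(String[] args) {
    long page = 4L << 20; // 4M, matching G1HeapRegionSize in the demo above
    long[] data = new long[wordsForPage(page)];
    System.out.println("words allocated: " + data.length);
  }
}
```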

This case exists not only in `MemoryManagement`, but also in other places such as `TorrentBroadcast.blockSize`. I would like to submit a follow-up PR if this modification is reasonable.

### Why are the changes needed?
To avoid memory waste in G1 GC

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes apache#34846 from WangGuangxin/g1_humongous_optimize.

Authored-by: wangguangxin.cn <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e81333c)
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 92fd5bb)
Signed-off-by: Dongjoon Hyun <[email protected]>

Co-authored-by: wangguangxin.cn <[email protected]>