replace println to log4j #1372
Conversation
Our program needs to receive a large amount of data and run for a long time. We set the log level to WARN, but messages such as "Storing iterator" and "received single" are still written to the log file (running over YARN).
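For context, a minimal sketch of the kind of change this PR makes (the receiver class and method names here are hypothetical, not the actual Spark code): bare `println` calls are replaced with calls through Spark's `Logging` trait, so the output goes through log4j and respects the configured level.

```scala
import org.apache.spark.Logging

// Hypothetical receiver illustrating the change: println is replaced
// with the Logging trait, so output is governed by the log4j level.
class DataReceiver extends Logging {
  def receiveSingle(item: Array[Byte]): Unit = {
    // Before: println("received single " + item) -- always printed
    logInfo("received single " + item)
  }

  def receiveIterator(items: Iterator[Array[Byte]]): Unit = {
    // Before: println("Storing iterator") -- bypassed log4j entirely
    logInfo("Storing iterator")
  }
}
```

Once routed through log4j, setting the level to WARN suppresses these messages.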
Can one of the admins verify this patch?
I have verified; the log level is set to INFO, right?
Jenkins, test this please
QA tests have started for PR 1372. This patch merges cleanly.
QA results for PR 1372:
LGTM - I could see us maybe moving this to logDebug.
Yeah I agree they might need to be debug. @tdas what do you think?
Yikes, that's an oversight on my part. The ones related to storing a single item should be removed entirely, and the ones related to storing an iterator should be logDebug.
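A sketch of the end state these comments ask for (again with hypothetical names, not the actual diff): the single-item message is removed entirely, and the iterator message is downgraded to logDebug so it only appears when the level is set to DEBUG.

```scala
import org.apache.spark.Logging

// Hypothetical receiver after addressing the review comments above.
class DataReceiver extends Logging {
  def receiveSingle(item: Array[Byte]): Unit = {
    // "received single ..." message removed entirely
  }

  def receiveIterator(items: Iterator[Array[Byte]]): Unit = {
    // Downgraded from logInfo: only visible when log4j is set to DEBUG.
    logDebug("Storing iterator")
  }
}
```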
+1 to prior comments about logDebug/removing -- as an additional nit, please put a space after Logging.
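Presumably the nit is about brace spacing; a guess at the before/after (hypothetical class names, not the actual diff):

```scala
import org.apache.spark.Logging

// Before (the nit): no space between `Logging` and the opening brace.
class ReceiverBefore extends Logging{
}

// After: a space after `Logging`, per Spark's style.
class ReceiverAfter extends Logging {
}
```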
@fireflyc will you have a chance to address the comments so we can merge this?
I modified 'info' to the 'debug' level.
There have been further comments regarding this, from me and @aarondav. It would be great if you could address them as well.
I've merged this, thanks. Will fix the style issue later.
BTW @fireflyc please create an account on JIRA (https://issues.apache.org/jira/browse/SPARK) and let me know its name so I can assign this issue to you.
My account is fireflyc; please assign the issue to me.
Hey, this change has not been included in the 1.0.2 release. Any heads-up on the version in which this will be reflected?
It will be in 1.1. I guess we can also backport it to branch-1.0 -- how bad is the issue? Does it cause real problems, or is it just annoying?
It's just annoying; it's OK, I have built Spark from source and am using it as an external lib. BTW, what would be the approximate release date for 1.1? I was reading in some forum about you talking about Scala 2.11; will it be compatible with Scala 2.11.x?
Alright, I'll cherry-pick this into branch-1.0 as well. Spark 1.1 is targeted for release at the end of this month, and it won't have Scala 2.11 support. However, there are some open patches for that against master that will hopefully let us add it in 1.2 (three months from now).
Our program needs to receive a large amount of data and run for a long time. We set the log level to WARN but "Storing iterator" "received single" as such message written to the log file. (over yarn)

Author: fireflyc <[email protected]>

Closes #1372 from fireflyc/fix-replace-stdout-log and squashes the following commits:

e684140 [fireflyc] 'info' modified into the 'debug'
fa22a38 [fireflyc] replace println to log4j
…f `G1GC` and `ON_HEAP` are used (apache#1372)

### What changes were proposed in this pull request?

Spark's tungsten memory model usually tries to allocate memory one `page` at a time, allocated as `long[pageSizeBytes/8]` in `HeapMemoryAllocator.allocate`. Remember that a Java long array needs an extra object header (usually 16 bytes on a 64-bit system), so the actual allocation is `pageSize + 16` bytes.

Assume that `G1HeapRegionSize` is 4M and `pageSizeBytes` is 4M as well. Since every allocation needs 4M + 16 bytes of memory, two regions are used, with one region occupied by only 16 bytes. That is about **50%** memory waste. This can happen under various combinations of G1HeapRegionSize (from 1M to 32M) and pageSizeBytes (from 1M to 64M). We can demonstrate it using the following piece of code:

```java
public static void bufferSizeTest(boolean optimize) {
  long totalAllocatedSize = 0L;
  int blockSize = 1024 * 1024 * 4; // 4m
  if (optimize) {
    blockSize -= 16;
  }
  List<long[]> buffers = new ArrayList<>();
  while (true) {
    long[] arr = new long[blockSize / 8];
    buffers.add(arr);
    totalAllocatedSize += blockSize;
    System.out.println("Total allocated size: " + totalAllocatedSize);
  }
}
```

Run it using the following JVM parameters:

```
java -Xmx100m -XX:+UseG1GC -XX:G1HeapRegionSize=4m -XX:-UseGCOverheadLimit -verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xss4m -XX:+ExitOnOutOfMemoryError -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4
```

With optimize = false:

```
Total allocated size: 46137344
[GC pause (G1 Humongous Allocation) (young) 44M->44M(100M), 0.0007091 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark)-- 48M->48M(100M), 0.0021528 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000021 secs]
[GC concurrent-mark-start]
[GC pause (G1 Evacuation Pause) (young) 48M->48M(100M), 0.0011289 secs]
[Full GC (Allocation Failure) 48M->48M(100M), 0.0017284 secs]
[Full GC (Allocation Failure) 48M->48M(100M), 0.0013437 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
```

With optimize = true:

```
Total allocated size: 96468624
[GC pause (G1 Humongous Allocation) (young)-- 92M->92M(100M), 0.0024416 secs]
[Full GC (Allocation Failure) 92M->92M(100M), 0.0019883 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark) 96M->96M(100M), 0.0004282 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000040 secs]
[GC concurrent-mark-start]
[GC pause (G1 Evacuation Pause) (young) 96M->96M(100M), 0.0003269 secs]
[Full GC (Allocation Failure) 96M->96M(100M), 0.0012409 secs]
[Full GC (Allocation Failure) 96M->96M(100M), 0.0012607 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
```

This PR tries to optimize the page size to avoid memory waste. This case exists not only in `MemoryManagement` but also in other places such as `TorrentBroadcast.blockSize`. I would like to submit a follow-up PR if this modification is reasonable.

### Why are the changes needed?

To avoid memory waste under the G1 GC.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes apache#34846 from WangGuangxin/g1_humongous_optimize.

Authored-by: wangguangxin.cn <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e81333c)
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 92fd5bb)
Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: wangguangxin.cn <[email protected]>
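For illustration, a minimal sketch of the idea behind that fix (the 16-byte header constant and all names here are assumptions, not Spark's actual allocator code): shrink the requested page by the array object header so the backing `long[]`, header included, fits within the G1 region budget instead of spilling into a second region.

```scala
// Sketch only: the header size is JVM-dependent and the names are
// hypothetical; Spark's real logic lives in its memory allocator.
object PageSizing {
  // Approximate long[] object header (mark word + class pointer + length)
  // on a 64-bit JVM with compressed oops.
  val ArrayHeaderBytes: Long = 16L

  // Allocate a page whose total on-heap footprint, header included,
  // stays at or under pageSizeBytes, avoiding a humongous allocation.
  def allocatePage(pageSizeBytes: Long): Array[Long] = {
    val usableBytes = pageSizeBytes - ArrayHeaderBytes
    new Array[Long]((usableBytes / 8L).toInt)
  }
}
```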