[SPARK-16919] Configurable update interval for console progress bar #14507
Conversation
|
Is this worth making another knob to configure? Just disable it if it's at all annoying? |
|
@srowen : Our daily jobs run remotely, and users read the stdout / stderr logs when they want to dig into what's going on. With the current update frequency, the logs of longer jobs tend to be flooded with console progress bar updates. Turning it off was an option, but then one would have to click through the UI and correlate its timestamps with the ones in the log (in case there is a need to debug something). Setting the update interval to a higher value would give the best of both worlds. |
|
Hm, would a higher value make sense? 200ms is pretty low. 3 seconds? |
|
Test build #63271 has finished for PR 14507 at commit
|
|
For batch jobs running for, say, ~10 hours, a 3 sec interval would still produce about 12k lines from the progress bar (36,000 s / 3 s). That sounds like a lot. In Hadoop land they used to have 3 sec, but it was made configurable [0] for the same reason. See MAPREDUCE-6242 [1]. For ad hoc use cases, people would want a faster response (e.g. Presto updates every half a sec [2]). Coming up with a static value that works for both worlds is hard. To be precise, we don't rely on it for debugging issues. It's more for users, so that they can see that things are progressing and, while eyeballing the output, spot anything obvious that went wrong. [0] : https://github.com/apache/hadoop/blob/2e1d0ff4e901b8313c8d71869735b94ed8bc40a0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java#L784 |
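To make the log-volume argument concrete, here is a rough sketch of the arithmetic: the number of progress bar updates is the job duration divided by the update interval. It assumes each update ends up as one line in the captured log; the object and method names are made up for illustration.

```scala
// Rough estimate only: assumes one captured log line per progress bar update.
// `ProgressLineEstimate` and `updates` are hypothetical names for this sketch.
object ProgressLineEstimate {
  def updates(jobDurationMs: Long, updateIntervalMs: Long): Long = {
    require(updateIntervalMs > 0, "update interval must be positive")
    jobDurationMs / updateIntervalMs
  }

  def main(args: Array[String]): Unit = {
    val tenHoursMs = 10L * 60 * 60 * 1000 // 36,000,000 ms
    // 200 ms is the hardcoded value, 500 ms and 3 s are the values mentioned
    // in the discussion, 60 s is an arbitrary "batch-friendly" example.
    for (intervalMs <- Seq(200L, 500L, 3000L, 60000L)) {
      println(s"interval=${intervalMs}ms -> ${updates(tenHoursMs, intervalMs)} updates")
    }
  }
}
```

For a 10 hour job this gives 180,000 updates at 200 ms, 12,000 at 3 s, and 600 at one minute, which is why a single static value is hard to pick for both batch and ad hoc use.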
| val CR = '\r' | ||
| // Update period of progress bar, in milliseconds | ||
| val UPDATE_PERIOD = 200L | ||
| val UPDATE_PERIOD_MSEC = if (SparkEnv.get == null) { |
Can this be null, and, can't you use the context's conf here instead?
These can be private perhaps?
These constants really should have been in an object, but now that this sorta needs to be a field anyway, consider making its name more like a field name "updatePeriodMSec"
Made the suggested changes.
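As a sketch of what the review comment asks for (reading the interval once from a configuration object into a field-style value, instead of going through `SparkEnv.get`), the following uses a plain `Map` as a stand-in for the context's configuration so that it runs without Spark. The class name `ProgressBarSettings`, the key `spark.ui.consoleProgress.update.interval`, and the parsing logic are assumptions for illustration and are not the merged code; only the 200 ms default and the `updatePeriodMSec` name come from this thread.

```scala
// Illustrative only: `conf` is a stand-in for the context's configuration.
// The key name below is an assumption, not something stated in this thread.
class ProgressBarSettings(conf: Map[String, String]) {
  // The previously hardcoded update period, kept as the default.
  private val DefaultUpdatePeriodMs = 200L

  // Field-style name, as suggested in the review. Falls back to the default
  // when the key is absent or the value is not positive; a non-numeric value
  // throws NumberFormatException.
  val updatePeriodMSec: Long =
    conf.get("spark.ui.consoleProgress.update.interval")
      .map(_.trim.toLong)
      .filter(_ > 0)
      .getOrElse(DefaultUpdatePeriodMs)
}

object ProgressBarSettingsDemo {
  def main(args: Array[String]): Unit = {
    val defaults = new ProgressBarSettings(Map.empty)
    val slower = new ProgressBarSettings(
      Map("spark.ui.consoleProgress.update.interval" -> "60000"))
    println(defaults.updatePeriodMSec) // falls back to 200
    println(slower.updatePeriodMSec)   // uses the configured 60000
  }
}
```

Reading the value from a configuration passed in at construction time avoids the null check on a global, which is the concern raised in the review.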
|
Hive has |
|
OK, seems reasonable to me. |
|
Test build #63278 has finished for PR 14507 at commit
|
|
Merged to master |
What changes were proposed in this pull request?
Currently the update interval for the console progress bar is hardcoded. This PR makes it configurable for users.
How was this patch tested?
Ran a long-running job with a high update interval and confirmed that the progress bar updates were shown less frequently.