[SPARK-22471][SQL] SQLListener consumes much memory causing OutOfMemoryError #19700
Conversation
@vanzin would you like to review?
    private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000)
    private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
BTW, the name should be spark.sql.ui.retainedStages instead of spark.ui.retainedStages.
@dongjoon-hyun, it is already documented in configuration.md:
How many stages the Spark UI and status APIs remember before garbage collecting.
This is a target maximum, and fewer elements may be retained in some circumstances.
I did not introduce a new parameter; I just used an existing one.
Regarding renaming it to spark.sql.ui.retainedStages: I believe that should be done in a separate pull request, if at all. This parameter is also used in other parts of the Spark code, not only in SQL.
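For context, a minimal sketch of setting this existing parameter when building a session; the snippet is illustrative only and not part of the patch, and the app name and value are arbitrary:

```scala
// Illustrative only, not part of this PR: spark.ui.retainedStages is an
// existing Spark configuration and is set like any other setting.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("retained-stages-example")        // arbitrary app name
  .config("spark.ui.retainedStages", "500")  // lower the retained-stage limit
  .getOrCreate()
```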
Ah. My bad. Forget about that. Thanks.
@tashoyan, since you are not introducing a new one, could you use the existing default value?

    - private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
    + private val retainedStages =
    +   conf.getInt("spark.ui.retainedStages", SparkUI.DEFAULT_RETAINED_STAGES)
Done for branch-2.2
ok to test

I'm a little on the fence about considering this for master given #19681 is probably going to be merged some time soon, but it should be ok for 2.2.

+1 for @vanzin's advice.

Well, it would be good to have this quick fix in a 2.2-compatible bugfix release, without waiting for 2.3.0.

You can open the PR directly against 2.2.

Test build #83653 has finished for PR 19700 at commit

Done for branch-2.2: #19711

@tashoyan you can close out this one.
What changes were proposed in this pull request?
This PR addresses the issue SPARK-22471. The modified version of `SQLListener` respects the setting `spark.ui.retainedStages` and keeps the number of tracked stages within the specified limit. The hash map `_stageIdToStageMetrics` does not outgrow that limit, so overall memory consumption no longer grows over time.
How was this patch tested?
A new unit test covers this fix; see `SQLListenerMemorySuite.scala`.
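For readers skimming the change, the sketch below illustrates the general idea described above: bounding a stage-metrics map by a configured limit so it cannot grow without bound. The class and member names here are hypothetical and simplified; the actual `SQLListener` code in the PR differs.

```scala
// Hypothetical, simplified sketch of bounding tracked stages;
// not the actual SQLListener implementation.
import scala.collection.mutable

class BoundedStageTracker(retainedStages: Int) {
  // Insertion-ordered map, so the oldest stages can be evicted first.
  private val stageIdToMetrics = mutable.LinkedHashMap[Int, Long]()

  def recordStage(stageId: Int, metricValue: Long): Unit = synchronized {
    stageIdToMetrics(stageId) = metricValue
    // Keep the map within the configured limit so memory use stays bounded.
    while (stageIdToMetrics.size > retainedStages) {
      stageIdToMetrics.remove(stageIdToMetrics.head._1)
    }
  }

  def trackedStages: Int = synchronized(stageIdToMetrics.size)
}
```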