Commit 16186cd
[SPARK-20955][CORE] Intern "executorId" to reduce the memory usage
## What changes were proposed in this pull request?
In [this line](https://github.com/apache/spark/blob/f7cf2096fdecb8edab61c8973c07c6fc877ee32d/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L128), the `executorId` string received from executors is used and eventually ends up in `TaskUIData`. Since deserializing the `executorId` string always creates a new instance, we end up with many duplicate string instances.
This PR interns the strings stored in `TaskUIData` to reduce memory usage.
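For context, a minimal sketch of the interning approach, assuming Guava's weak interner (already on Spark's classpath); the object and method names (`ExecutorIdInterner`, `weakIntern`) are illustrative and not necessarily the names used in the patch:
```
import com.google.common.collect.Interners

// Weak interner: equal strings collapse to one canonical instance, and the
// canonical instance can still be garbage-collected once nothing references it.
object ExecutorIdInterner {
  private lazy val stringInterner = Interners.newWeakInterner[String]()

  // Return a canonical copy of `s`. Deserialized duplicates such as "3" or
  // "host-1.example.com" all end up pointing to the same String object.
  def weakIntern(s: String): String =
    if (s == null) null else stringInterner.intern(s)
}

// Usage: intern the repetitive fields before storing them in UI data structures,
// e.g. ExecutorIdInterner.weakIntern(taskInfo.executorId)
```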
## How was this patch tested?
Manually tested using `bin/spark-shell --master local-cluster[6,1,1024]`. Test code:
```
for (_ <- 1 to 10) { sc.makeRDD(1 to 1000, 1000).count() }
Thread.sleep(2000)
val l = sc.getClass.getMethod("jobProgressListener").invoke(sc).asInstanceOf[org.apache.spark.ui.jobs.JobProgressListener]
org.apache.spark.util.SizeEstimator.estimate(l.stageIdToData)
```
This PR reduces the size of `stageIdToData` from 3,487,280 bytes to 3,009,744 bytes in the above case, i.e. to about 86.3% of its original size.
Author: Shixiong Zhu <[email protected]>
Closes #18177 from zsxwing/SPARK-20955.
1 file changed: 12 additions and 2 deletions