Commit 9b20027
[SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
The batch size during external sort will grow up to max 10000, then shrink down to zero, causing infinite loop.
Given the assumption that the items usually have similar size, so we don't need to adjust the batch size after first spill.
cc JoshRosen rxin angelini
Author: Davies Liu <[email protected]>
Closes apache#6714 from davies/batch_size and squashes the following commits:
b170dfb [Davies Liu] update test
b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
6ade745 [Davies Liu] update test
5c21777 [Davies Liu] Update shuffle.py
e746aec [Davies Liu] fix batch size during sort1 parent 3164112 commit 9b20027
2 files changed
+5
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
486 | 486 | | |
487 | 487 | | |
488 | 488 | | |
489 | | - | |
| 489 | + | |
490 | 490 | | |
491 | 491 | | |
492 | 492 | | |
| |||
512 | 512 | | |
513 | 513 | | |
514 | 514 | | |
515 | | - | |
516 | | - | |
517 | | - | |
518 | 515 | | |
519 | 516 | | |
520 | 517 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
179 | 179 | | |
180 | 180 | | |
181 | 181 | | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
182 | 185 | | |
183 | 186 | | |
184 | | - | |
| 187 | + | |
185 | 188 | | |
186 | 189 | | |
187 | 190 | | |
| |||
0 commit comments