Commit b824828
committed
Sort-merge join operator spilling performance improvements
What changes were proposed in this pull request?
The following list of changes will improve SQL execution performance when data is spilled on the disk:
1) Implement lazy initialization of UnsafeSorterSpillReader - iterator on top of spilled rows:
... During SortMergeJoin (Left Semi Join) execution, the iterator on the spill data is created but no iteration over the data is done.
... Having lazy initialization of UnsafeSorterSpillReader will enable efficient processing of SortMergeJoin even if data is spilled onto disk. Unnecessary I/O will be avoided.
2) Decrease initial memory read buffer size in UnsafeSorterSpillReader from 1MB to 1KB:
... UnsafeSorterSpillReader constructor takes lot of time due to size of default 1MB memory read buffer.
... The code already has logic to increase the memory read buffer if it cannot fit the data, so decreasing the size to 1K is safe and has positive performance impact.
3) Improve memory utilization when spilling is enabled in ExternalAppendOnlyUnsafeRowArrey
... In the current implementation, when spilling is enabled, UnsafeExternalSorter object is created and then data moved from ExternalAppendOnlyUnsafeRowArrey object into UnsafeExternalSorter and then ExternalAppendOnlyUnsafeRowArrey object is emptied. Just before ExternalAppendOnlyUnsafeRowArrey object is emptied there are both objects in the memory with the same data. That require double memory and there is duplication of data. This can be avoided.
... In the proposed solution, when spark.sql.sortMergeJoinExec.buffer.in.memory.threshold is reached adding new rows into ExternalAppendOnlyUnsafeRowArray object stops. UnsafeExternalSorter object is created and new rows are added into this object. ExternalAppendOnlyUnsafeRowArray object retains all rows already added into this object. This approach will enable better memory utilization and avoid unnecessary movement of data from one object into another.
Why are the changes needed?
Testing with TPC-DS 100 TB benchmark data set showed that some of SQLs (example query 14) are not able to run even with extremely large Spark executor memory.
Spark spilling feature has to be enabled, in order to be able to process these SQLs. Processing of SQLs becomes extremely slow when spilling is enabled.
The test of this solution with query 14 and enabled spilling on the disk, showed 500X performance improvements and it didn�t degrade performance of the other SQLs from TPC-DS benchmark.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By running TPC-DS SQLs with different data sets 10 TB and 100 TB
By running all Spark tests.1 parent 1c6dff7 commit b824828
File tree
12 files changed
+175
-77
lines changed- core/src
- main/java/org/apache/spark
- unsafe/map
- util/collection/unsafe/sort
- test/java/org/apache/spark/util/collection/unsafe/sort
- sql/core
- benchmarks
- src
- main/scala/org/apache/spark/sql/execution
- test/scala/org/apache/spark/sql
- execution
12 files changed
+175
-77
lines changedLines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
322 | 322 | | |
323 | 323 | | |
324 | 324 | | |
325 | | - | |
326 | | - | |
327 | | - | |
328 | 325 | | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
329 | 329 | | |
330 | 330 | | |
331 | 331 | | |
| |||
Lines changed: 22 additions & 13 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
506 | 506 | | |
507 | 507 | | |
508 | 508 | | |
509 | | - | |
| 509 | + | |
510 | 510 | | |
511 | 511 | | |
512 | 512 | | |
| |||
681 | 681 | | |
682 | 682 | | |
683 | 683 | | |
684 | | - | |
685 | | - | |
686 | | - | |
687 | 684 | | |
688 | | - | |
689 | 685 | | |
690 | 686 | | |
691 | 687 | | |
692 | | - | |
| 688 | + | |
| 689 | + | |
693 | 690 | | |
694 | 691 | | |
695 | 692 | | |
696 | 693 | | |
697 | | - | |
698 | | - | |
699 | | - | |
700 | | - | |
| 694 | + | |
| 695 | + | |
701 | 696 | | |
702 | 697 | | |
703 | 698 | | |
704 | 699 | | |
705 | 700 | | |
706 | | - | |
707 | | - | |
708 | | - | |
| 701 | + | |
709 | 702 | | |
710 | 703 | | |
711 | 704 | | |
| |||
720 | 713 | | |
721 | 714 | | |
722 | 715 | | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
| 724 | + | |
| 725 | + | |
| 726 | + | |
| 727 | + | |
| 728 | + | |
| 729 | + | |
| 730 | + | |
| 731 | + | |
723 | 732 | | |
724 | 733 | | |
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
| 24 | + | |
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | | - | |
| 36 | + | |
37 | 37 | | |
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
74 | | - | |
| 74 | + | |
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
| |||
Lines changed: 53 additions & 29 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
47 | 47 | | |
48 | 48 | | |
49 | 49 | | |
50 | | - | |
| 50 | + | |
51 | 51 | | |
52 | 52 | | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
53 | 57 | | |
54 | 58 | | |
55 | 59 | | |
56 | 60 | | |
57 | 61 | | |
58 | 62 | | |
59 | | - | |
60 | | - | |
61 | | - | |
62 | | - | |
63 | | - | |
64 | | - | |
65 | | - | |
66 | | - | |
67 | | - | |
68 | | - | |
69 | | - | |
70 | | - | |
71 | | - | |
72 | | - | |
73 | | - | |
74 | | - | |
75 | | - | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
85 | 67 | | |
86 | 68 | | |
87 | 69 | | |
88 | | - | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
89 | 75 | | |
90 | 76 | | |
91 | 77 | | |
92 | 78 | | |
93 | | - | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
94 | 84 | | |
95 | 85 | | |
96 | 86 | | |
97 | 87 | | |
98 | 88 | | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
99 | 93 | | |
100 | 94 | | |
101 | 95 | | |
| |||
148 | 142 | | |
149 | 143 | | |
150 | 144 | | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
151 | 175 | | |
Lines changed: 2 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
| |||
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
54 | | - | |
| 55 | + | |
55 | 56 | | |
56 | 57 | | |
57 | 58 | | |
| |||
Lines changed: 5 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
Lines changed: 5 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
Lines changed: 35 additions & 21 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
95 | 95 | | |
96 | 96 | | |
97 | 97 | | |
98 | | - | |
| 98 | + | |
| 99 | + | |
99 | 100 | | |
100 | 101 | | |
101 | 102 | | |
| |||
124 | 125 | | |
125 | 126 | | |
126 | 127 | | |
127 | | - | |
128 | | - | |
129 | | - | |
130 | | - | |
131 | | - | |
132 | | - | |
133 | | - | |
134 | | - | |
135 | | - | |
136 | | - | |
137 | | - | |
138 | | - | |
139 | 128 | | |
140 | 129 | | |
141 | 130 | | |
| |||
168 | 157 | | |
169 | 158 | | |
170 | 159 | | |
171 | | - | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
172 | 169 | | |
173 | 170 | | |
174 | 171 | | |
| |||
204 | 201 | | |
205 | 202 | | |
206 | 203 | | |
207 | | - | |
| 204 | + | |
208 | 205 | | |
209 | | - | |
| 206 | + | |
| 207 | + | |
210 | 208 | | |
211 | 209 | | |
212 | | - | |
| 210 | + | |
213 | 211 | | |
214 | | - | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
215 | 221 | | |
216 | 222 | | |
217 | 223 | | |
218 | | - | |
219 | | - | |
220 | | - | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
221 | 235 | | |
222 | 236 | | |
223 | 237 | | |
| |||
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
614 | 614 | | |
615 | 615 | | |
616 | 616 | | |
617 | | - | |
| 617 | + | |
618 | 618 | | |
619 | 619 | | |
620 | 620 | | |
| |||
628 | 628 | | |
629 | 629 | | |
630 | 630 | | |
631 | | - | |
| 631 | + | |
632 | 632 | | |
633 | 633 | | |
634 | 634 | | |
| |||
0 commit comments