Commit c4da534
[SPARK-10990] [SPARK-11018] [SQL] improve unrolling of complex types
This PR improves the unrolling and reading of complex types in the columnar cache:
1) Use UnsafeProjection to serialize complex types, so they are no longer serialized three times (two of those passes were only to compute actualSize).
2) Copy the bytes from UnsafeRow/UnsafeArrayData directly into the ByteBuffer, avoiding the intermediate byte[].
3) Use the array underlying the ByteBuffer to create UTF8String/UnsafeRow/UnsafeArrayData without copying.
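Points 1 and 3 can be illustrated in miniature with plain Python. This is a hypothetical analogy, not Spark code: a `bytearray` stands in for the ByteBuffer, the made-up `append_struct` stands in for UnsafeProjection serializing a `struct<b: long, c: string>`, and `memoryview` slices play the role of UTF8String values that point into the buffer instead of copying out of it.

```python
import struct

# Hypothetical fixed header for struct<b: long, c: string>:
# 8-byte little-endian long, then the 4-byte length of c's UTF-8 bytes
HEADER = struct.Struct("<qi")

def append_struct(buf, b, c):
    """Serialize one value once, at the end of buf, and return its size."""
    data = c.encode("utf-8")
    start = len(buf)
    buf += HEADER.pack(b, len(data))
    buf += data
    # The size is read off what was written; nothing is serialized a second time
    return len(buf) - start

def read_struct(view, pos):
    """Decode the value at pos; c comes back as a memoryview into the buffer (no copy)."""
    b, n = HEADER.unpack_from(view, pos)
    start = pos + HEADER.size
    return b, view[start:start + n], start + n

buf = bytearray()
sizes = [append_struct(buf, b, c) for b, c in [(1, "x"), (2, "spark")]]
view = memoryview(buf)
b1, c1, pos = read_struct(view, 0)
b2, c2, _ = read_struct(view, pos)
```

`c2.obj is buf` confirms that the decoded string is a view over the shared buffer; calling `bytes(c2)` is the point where a copy would finally be made.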
Combined, these optimizations reduce the unrolling time from 25s to 21s (16% less) and the scanning time from 3.5s to 2.5s (about 28% less).
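The relative reductions follow directly from the reported timings as (before - after) / before. A quick check in Python, with a hypothetical helper name:

```python
def reduction_pct(before, after):
    """Share of the original time saved, in percent."""
    return 100.0 * (before - after) / before

unroll = reduction_pct(25, 21)   # 16.0
scan = reduction_pct(3.5, 2.5)   # about 28.6
```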
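Benchmark script, run in a PySpark shell where `sqlContext` is defined and `path` points to the Parquet data:
```
import time

df = sqlContext.read.parquet(path)

# Unrolling: cache() is lazy, so count() is what builds the columnar cache
t = time.time()
df.cache()
df.count()
print('unrolling', time.time() - t)

# Scanning: count the internal JVM RDD of the cached data ten times,
# which avoids converting rows to Python objects
for i in range(10):
    t = time.time()
    print(df.select("*")._jdf.queryExecution().toRdd().count())
    print(time.time() - t)
```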
The schema of the test data is:
```
root
 |-- a: struct (nullable = true)
 |    |-- b: long (nullable = true)
 |    |-- c: string (nullable = true)
 |-- d: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- e: map (nullable = true)
 |    |-- key: long
 |    |-- value: string (valueContainsNull = true)
```
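To give a concrete picture of the kind of bytes UnsafeProjection produces for field `a` of this schema, here is a simplified Python sketch of an UnsafeRow-style layout for `struct<b: long, c: string>`. The layout details (bitset width, 8-byte slots, offset/size packing, little-endian order) reflect our understanding of Spark's UnsafeRow format and should be treated as assumptions; `encode_struct_bc` and `decode_struct_bc` are hypothetical names.

```python
import struct

def encode_struct_bc(b, c):
    """Hypothetical UnsafeRow-style encoding of struct<b: long, c: string>.

    Assumed layout: a null bitset (one 8-byte word per 64 fields), one
    8-byte slot per field, then a variable-length region padded to 8 bytes.
    The slot of a variable-length field holds (offset << 32) | size, with
    offset measured from the start of the row.
    """
    num_fields = 2
    bitset_bytes = ((num_fields + 63) // 64) * 8
    var_start = bitset_bytes + 8 * num_fields
    null_bits, slot_b, slot_c, var = 0, 0, 0, b""
    if b is None:
        null_bits |= 1 << 0
    else:
        slot_b = b
    if c is None:
        null_bits |= 1 << 1
    else:
        data = c.encode("utf-8")
        var = data + bytes(-len(data) % 8)  # zero-pad to a word boundary
        slot_c = (var_start << 32) | len(data)
    # Assumes little-endian byte order, as on x86
    return struct.pack("<Qqq", null_bits, slot_b, slot_c) + var

def decode_struct_bc(row):
    """Read the row back; the string is located via its offset/size slot."""
    null_bits, b, offset_and_size = struct.unpack_from("<Qqq", row, 0)
    if null_bits & 1:
        b = None
    if null_bits & 2:
        return b, None
    offset, size = offset_and_size >> 32, offset_and_size & 0xFFFFFFFF
    return b, row[offset:offset + size].decode("utf-8")
```

Because the encoded row carries its own size (`len(row)`), no separate size-estimation pass over the value is needed.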
Because the columnar cache now depends on UnsafeProjection supporting all data types (including UDTs), this PR also adds the missing support to UnsafeProjection.
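One way a projection can cover UDTs is to resolve each UDT to its underlying SQL type before choosing how to write it, since a UDT value is stored internally in the form of that SQL type. The following toy Python sketch shows only that resolution step, under this assumption; the type classes and `physical_type` are hypothetical stand-ins, not Spark or PySpark classes, and the sketch does not claim to mirror how the PR implements it.

```python
from dataclasses import dataclass
from typing import Tuple

# Toy stand-ins for SQL types (hypothetical; not pyspark.sql.types)
@dataclass(frozen=True)
class Long:
    pass

@dataclass(frozen=True)
class Str:
    pass

@dataclass(frozen=True)
class Array:
    element: object

@dataclass(frozen=True)
class Map:
    key: object
    value: object

@dataclass(frozen=True)
class Struct:
    fields: Tuple[object, ...]

@dataclass(frozen=True)
class UDT:
    """A user-defined type whose values are stored as `sql_type`."""
    name: str
    sql_type: object

def physical_type(dtype):
    """Replace every UDT, at any nesting depth, by its underlying SQL type."""
    if isinstance(dtype, UDT):
        return physical_type(dtype.sql_type)
    if isinstance(dtype, Array):
        return Array(physical_type(dtype.element))
    if isinstance(dtype, Map):
        return Map(physical_type(dtype.key), physical_type(dtype.value))
    if isinstance(dtype, Struct):
        return Struct(tuple(physical_type(f) for f in dtype.fields))
    return dtype  # primitive types are written as-is
```

For example, a hypothetical `UDT("PointUDT", Array(Long()))` nested inside a struct or array resolves to plain `Array(Long())` wherever it appears.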
Author: Davies Liu
Closes #9016 from davies/complex2.
Parent commit: f97e932
12 files changed, 188 insertions(+), 140 deletions(-), under:
- sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen
- sql/core/src/main/scala/org/apache/spark/sql/columnar
- sql/core/src/test/scala/org/apache/spark/sql/columnar
- unsafe/src/main/java/org/apache/spark/unsafe/types