Commit b90c020
[SPARK-13922][SQL] Filter rows with null attributes in vectorized parquet reader
## What changes were proposed in this pull request?
Many SQL operators do not need to read `null` values to produce correct results. Currently, this is achieved by performing `isNotNull` checks (for all relevant columns) on a per-row basis. Pushing these null filters into the vectorized Parquet reader should bring considerable benefits (especially when the underlying data contains no nulls or only nulls).
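The idea of filtering nulls per column rather than per row can be sketched as follows. This is a minimal illustration, not the code in this patch: the `Batch` class, its `filtered` array, and the `filterNullsInColumn` and `liveRows` methods are hypothetical stand-ins, and a plain `String[][]` takes the place of real column vectors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a columnar batch; not Spark's actual class. */
public class NullFilteringSketch {
    static final class Batch {
        private final String[][] columns; // columns[c][r]; null marks a missing value
        private final boolean[] filtered; // filtered[r] == true means row r is skipped
        private final int numRows;

        Batch(String[][] columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
            this.filtered = new boolean[numRows];
        }

        /** Marks every row whose value in the given column is null. */
        void filterNullsInColumn(int ordinal) {
            String[] column = columns[ordinal];
            for (int r = 0; r < numRows; r++) {
                if (column[r] == null) {
                    filtered[r] = true;
                }
            }
        }

        /** Returns the indices of the rows that survive filtering. */
        List<Integer> liveRows() {
            List<Integer> rows = new ArrayList<>();
            for (int r = 0; r < numRows; r++) {
                if (!filtered[r]) {
                    rows.add(r);
                }
            }
            return rows;
        }
    }

    /** Builds a two-column batch with one null per column and filters both columns. */
    static List<Integer> demo() {
        String[][] columns = {
            {"a", null, "c", "d"},
            {"w", "x", null, "z"},
        };
        Batch batch = new Batch(columns, 4);
        batch.filterNullsInColumn(0);
        batch.filterNullsInColumn(1);
        return batch.liveRows();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0, 3]
    }
}
```

In this sketch the null check runs once per column over the whole batch, so downstream operators iterate only over `liveRows()` and need no per-row `isNotNull` test.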
## How was this patch tested?
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz

| String with Nulls Scan (0%) | Best/Avg Time (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|
| SQL Parquet Vectorized | 1229 / 1648 | 8.5 | 117.2 | 1.0X |
| PR Vectorized | 833 / 846 | 12.6 | 79.4 | 1.5X |
| PR Vectorized (Null Filtering) | 732 / 782 | 14.3 | 69.8 | 1.7X |

| String with Nulls Scan (50%) | Best/Avg Time (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|
| SQL Parquet Vectorized | 995 / 1053 | 10.5 | 94.9 | 1.0X |
| PR Vectorized | 732 / 772 | 14.3 | 69.8 | 1.4X |
| PR Vectorized (Null Filtering) | 725 / 790 | 14.5 | 69.1 | 1.4X |

| String with Nulls Scan (95%) | Best/Avg Time (ms) | Rate (M/s) | Per Row (ns) | Relative |
|---|---|---|---|---|
| SQL Parquet Vectorized | 326 / 333 | 32.2 | 31.1 | 1.0X |
| PR Vectorized | 190 / 200 | 55.1 | 18.2 | 1.7X |
| PR Vectorized (Null Filtering) | 168 / 172 | 62.2 | 16.1 | 1.9X |
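As a reading aid for the benchmark figures, the Relative column appears to be the baseline's best time divided by each row's best time. That interpretation is an assumption, although it reproduces every reported value. The class and method names below are hypothetical; only the millisecond figures come from the benchmark output.

```java
import java.util.Locale;

/** Hypothetical helper that recomputes the Relative column from best times. */
public class RelativeSpeedup {
    /** Assumed definition: baseline best time divided by candidate best time. */
    static double relative(double baselineBestMs, double candidateBestMs) {
        return baselineBestMs / candidateBestMs;
    }

    /** Formats a ratio with one decimal place, in the style of the benchmark output. */
    static String format(double ratio) {
        return String.format(Locale.ROOT, "%.1fX", ratio);
    }

    public static void main(String[] args) {
        // Best times (ms) taken from the 0% nulls benchmark.
        System.out.println(format(relative(1229, 833))); // prints 1.5X
        System.out.println(format(relative(1229, 732))); // prints 1.7X
        // Best times (ms) taken from the 95% nulls benchmark.
        System.out.println(format(relative(326, 168)));  // prints 1.9X
    }
}
```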
Author: Sameer Agarwal <[email protected]>
Closes #11749 from sameeragarwal/perf-testing.
1 parent 4ce2d24 commit b90c020
File tree
3 files changed: +146 -5 lines changed
- sql/core/src
  - main/java/org/apache/spark/sql/execution/vectorized
  - test/scala/org/apache/spark/sql/execution
    - datasources/parquet
    - vectorized

Lines changed: 27 additions & 5 deletions
Lines changed: 90 additions & 0 deletions
Lines changed: 29 additions & 0 deletions