Commit b09b7f7
committed
[SPARK-36034][SQL] Rebase datetime in pushed down filters to parquet
### What changes were proposed in this pull request?
In the PR, I propose to propagate either the SQL config `spark.sql.parquet.datetimeRebaseModeInRead` or/and Parquet option `datetimeRebaseMode` to `ParquetFilters`. The `ParquetFilters` class uses the settings in conversions of dates/timestamps instances from datasource filters to values pushed via `FilterApi` to the `parquet-column` lib.
Before the changes, date/timestamp values expressed as days/microseconds/milliseconds are interpreted as offsets in Proleptic Gregorian calendar, and pushed to the parquet library as is. That works fine if timestamp/dates values in parquet files were saved in the `CORRECTED` mode but in the `LEGACY` mode, filter's values could not match to actual values.
After the changes, timestamp/dates values of filters pushed down to parquet libs such as `FilterApi.eq(col1, -719162)` are rebased according the rebase settings. For the example, if the rebase mode is `CORRECTED`, **-719162** is pushed down as is but if the current rebase mode is `LEGACY`, the number of days is rebased to **-719164**. For more context, the PR description #28067 shows the diffs between two calendars.
### Why are the changes needed?
The changes fix the bug portrayed by the following example from SPARK-36034:
```scala
In [27]: spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
>>> spark.sql("SELECT DATE '0001-01-01' AS date").write.mode("overwrite").parquet("date_written_by_spark3_legacy")
>>> spark.read.parquet("date_written_by_spark3_legacy").where("date = '0001-01-01'").show()
+----+
|date|
+----+
+----+
```
The result must have the date value `0001-01-01`.
### Does this PR introduce _any_ user-facing change?
In some sense, yes. Query results can be different in some cases. For the example above:
```scala
scala> spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
scala> spark.sql("SELECT DATE '0001-01-01' AS date").write.mode("overwrite").parquet("date_written_by_spark3_legacy")
scala> spark.read.parquet("date_written_by_spark3_legacy").where("date = '0001-01-01'").show(false)
+----------+
|date |
+----------+
|0001-01-01|
+----------+
```
### How was this patch tested?
By running the modified test suite `ParquetFilterSuite`:
```
$ build/sbt "test:testOnly *ParquetV1FilterSuite"
$ build/sbt "test:testOnly *ParquetV2FilterSuite"
```
Closes #33347 from MaxGekk/fix-parquet-ts-filter-pushdown.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>1 parent c8a3c22 commit b09b7f7
File tree
5 files changed
+145
-96
lines changed- sql/core/src
- main/scala/org/apache/spark/sql/execution/datasources
- parquet
- v2/parquet
- test/scala/org/apache/spark/sql/execution/datasources/parquet
5 files changed
+145
-96
lines changedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
Lines changed: 12 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
266 | 266 | | |
267 | 267 | | |
268 | 268 | | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
269 | 272 | | |
270 | 273 | | |
271 | 274 | | |
272 | | - | |
273 | | - | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
274 | 284 | | |
275 | 285 | | |
276 | 286 | | |
| |||
296 | 306 | | |
297 | 307 | | |
298 | 308 | | |
299 | | - | |
300 | | - | |
301 | | - | |
302 | 309 | | |
303 | 310 | | |
304 | 311 | | |
| |||
Lines changed: 22 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
| 38 | + | |
| 39 | + | |
38 | 40 | | |
39 | 41 | | |
40 | 42 | | |
| |||
48 | 50 | | |
49 | 51 | | |
50 | 52 | | |
51 | | - | |
| 53 | + | |
| 54 | + | |
52 | 55 | | |
53 | 56 | | |
54 | 57 | | |
| |||
129 | 132 | | |
130 | 133 | | |
131 | 134 | | |
132 | | - | |
133 | | - | |
134 | | - | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
135 | 144 | | |
136 | 145 | | |
137 | | - | |
138 | | - | |
139 | | - | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
140 | 155 | | |
141 | 156 | | |
142 | 157 | | |
| |||
Lines changed: 12 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
133 | 133 | | |
134 | 134 | | |
135 | 135 | | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
136 | 139 | | |
137 | 140 | | |
138 | 141 | | |
139 | | - | |
140 | | - | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
141 | 151 | | |
142 | 152 | | |
143 | 153 | | |
| |||
170 | 180 | | |
171 | 181 | | |
172 | 182 | | |
173 | | - | |
174 | | - | |
175 | | - | |
176 | 183 | | |
177 | 184 | | |
178 | 185 | | |
| |||
Lines changed: 12 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| |||
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
54 | | - | |
55 | | - | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
56 | 66 | | |
57 | 67 | | |
58 | 68 | | |
| |||
Lines changed: 87 additions & 77 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
48 | | - | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
49 | 51 | | |
50 | 52 | | |
51 | 53 | | |
| |||
73 | 75 | | |
74 | 76 | | |
75 | 77 | | |
76 | | - | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
77 | 81 | | |
78 | 82 | | |
79 | 83 | | |
80 | | - | |
| 84 | + | |
| 85 | + | |
81 | 86 | | |
82 | 87 | | |
83 | 88 | | |
| |||
587 | 592 | | |
588 | 593 | | |
589 | 594 | | |
590 | | - | |
| 595 | + | |
591 | 596 | | |
592 | 597 | | |
593 | 598 | | |
594 | | - | |
595 | | - | |
596 | | - | |
597 | | - | |
598 | | - | |
599 | | - | |
600 | | - | |
601 | | - | |
602 | | - | |
603 | | - | |
604 | | - | |
605 | | - | |
606 | | - | |
607 | | - | |
608 | | - | |
609 | | - | |
610 | | - | |
611 | | - | |
612 | | - | |
613 | | - | |
614 | | - | |
615 | | - | |
616 | | - | |
617 | | - | |
618 | | - | |
619 | | - | |
620 | | - | |
621 | | - | |
622 | | - | |
623 | | - | |
624 | | - | |
625 | | - | |
626 | | - | |
627 | | - | |
628 | | - | |
629 | | - | |
630 | | - | |
631 | | - | |
632 | | - | |
633 | | - | |
634 | | - | |
635 | | - | |
636 | | - | |
637 | | - | |
638 | | - | |
639 | | - | |
640 | | - | |
641 | | - | |
642 | | - | |
643 | | - | |
644 | | - | |
645 | | - | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
646 | 611 | | |
647 | | - | |
648 | | - | |
649 | | - | |
650 | | - | |
651 | | - | |
652 | | - | |
653 | | - | |
654 | | - | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
655 | 664 | | |
656 | 665 | | |
657 | 666 | | |
| |||
661 | 670 | | |
662 | 671 | | |
663 | 672 | | |
664 | | - | |
665 | | - | |
666 | | - | |
667 | | - | |
668 | | - | |
| 673 | + | |
669 | 674 | | |
670 | 675 | | |
671 | 676 | | |
672 | 677 | | |
673 | 678 | | |
674 | | - | |
675 | | - | |
| 679 | + | |
| 680 | + | |
| 681 | + | |
| 682 | + | |
676 | 683 | | |
677 | 684 | | |
678 | 685 | | |
679 | | - | |
680 | 686 | | |
681 | 687 | | |
682 | 688 | | |
683 | 689 | | |
684 | 690 | | |
685 | | - | |
686 | | - | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
687 | 695 | | |
688 | 696 | | |
689 | 697 | | |
690 | | - | |
691 | | - | |
692 | | - | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
693 | 703 | | |
694 | 704 | | |
695 | 705 | | |
| |||
0 commit comments