Commit e41ecef
[SPARK-36034][SQL][3.0] Rebase datetime in pushed down filters to parquet
### What changes were proposed in this pull request?
In the PR, I propose to propagate either the SQL config `spark.sql.parquet.datetimeRebaseModeInRead` or/and Parquet option `datetimeRebaseMode` to `ParquetFilters`. The `ParquetFilters` class uses the settings in conversions of dates/timestamps instances from datasource filters to values pushed via `FilterApi` to the `parquet-column` lib.
Before the changes, date/timestamp values expressed as days/microseconds/milliseconds are interpreted as offsets in Proleptic Gregorian calendar, and pushed to the parquet library as is. That works fine if timestamp/dates values in parquet files were saved in the `CORRECTED` mode but in the `LEGACY` mode, filter's values could not match to actual values.
After the changes, timestamp/dates values of filters pushed down to parquet libs such as `FilterApi.eq(col1, -719162)` are rebased according the rebase settings. For the example, if the rebase mode is `CORRECTED`, **-719162** is pushed down as is but if the current rebase mode is `LEGACY`, the number of days is rebased to **-719164**. For more context, the PR description #28067 shows the diffs between two calendars.
### Why are the changes needed?
The changes fix the bug portrayed by the following example from SPARK-36034:
```scala
In [27]: spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
>>> spark.sql("SELECT DATE '0001-01-01' AS date").write.mode("overwrite").parquet("date_written_by_spark3_legacy")
>>> spark.read.parquet("date_written_by_spark3_legacy").where("date = '0001-01-01'").show()
+----+
|date|
+----+
+----+
```
The result must have the date value `0001-01-01`.
### Does this PR introduce _any_ user-facing change?
In some sense, yes. Query results can be different in some cases. For the example above:
```scala
scala> spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
scala> spark.sql("SELECT DATE '0001-01-01' AS date").write.mode("overwrite").parquet("date_written_by_spark3_legacy")
scala> spark.read.parquet("date_written_by_spark3_legacy").where("date = '0001-01-01'").show(false)
+----------+
|date |
+----------+
|0001-01-01|
+----------+
```
### How was this patch tested?
By running the modified test suite `ParquetFilterSuite`:
```
$ build/sbt "test:testOnly *ParquetV1FilterSuite"
$ build/sbt "test:testOnly *ParquetV2FilterSuite"
```
Authored-by: Max Gekk <max.gekkgmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223apache.org>
(cherry picked from commit b09b7f7)
(cherry picked from commit ba71172)
Closes #33387 from MaxGekk/fix-parquet-ts-filter-pushdown-3.0.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>1 parent 86a9059 commit e41ecef
File tree
5 files changed
+134
-86
lines changed- sql/core/src
- main/scala/org/apache/spark/sql/execution/datasources
- parquet
- v2/parquet
- test/scala/org/apache/spark/sql/execution/datasources/parquet
5 files changed
+134
-86
lines changedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
Lines changed: 12 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
270 | 270 | | |
271 | 271 | | |
272 | 272 | | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
273 | 276 | | |
274 | 277 | | |
275 | 278 | | |
276 | | - | |
277 | | - | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
278 | 288 | | |
279 | 289 | | |
280 | 290 | | |
| |||
300 | 310 | | |
301 | 311 | | |
302 | 312 | | |
303 | | - | |
304 | | - | |
305 | | - | |
306 | | - | |
307 | 313 | | |
308 | 314 | | |
309 | 315 | | |
| |||
Lines changed: 22 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
| 38 | + | |
| 39 | + | |
38 | 40 | | |
39 | 41 | | |
40 | 42 | | |
| |||
48 | 50 | | |
49 | 51 | | |
50 | 52 | | |
51 | | - | |
| 53 | + | |
| 54 | + | |
52 | 55 | | |
53 | 56 | | |
54 | 57 | | |
| |||
124 | 127 | | |
125 | 128 | | |
126 | 129 | | |
127 | | - | |
128 | | - | |
129 | | - | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
130 | 139 | | |
131 | 140 | | |
132 | | - | |
133 | | - | |
134 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
135 | 150 | | |
136 | 151 | | |
137 | 152 | | |
| |||
Lines changed: 12 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
134 | 134 | | |
135 | 135 | | |
136 | 136 | | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
137 | 140 | | |
138 | 141 | | |
139 | 142 | | |
140 | | - | |
141 | | - | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
142 | 152 | | |
143 | 153 | | |
144 | 154 | | |
| |||
171 | 181 | | |
172 | 182 | | |
173 | 183 | | |
174 | | - | |
175 | | - | |
176 | | - | |
177 | 184 | | |
178 | 185 | | |
179 | 186 | | |
| |||
Lines changed: 12 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| |||
51 | 52 | | |
52 | 53 | | |
53 | 54 | | |
54 | | - | |
55 | | - | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
56 | 66 | | |
57 | 67 | | |
58 | 68 | | |
| |||
Lines changed: 76 additions & 66 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
46 | 48 | | |
47 | 49 | | |
48 | 50 | | |
| |||
70 | 72 | | |
71 | 73 | | |
72 | 74 | | |
73 | | - | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
74 | 78 | | |
75 | 79 | | |
76 | 80 | | |
77 | | - | |
| 81 | + | |
| 82 | + | |
78 | 83 | | |
79 | 84 | | |
80 | 85 | | |
| |||
548 | 553 | | |
549 | 554 | | |
550 | 555 | | |
551 | | - | |
| 556 | + | |
552 | 557 | | |
553 | 558 | | |
554 | 559 | | |
555 | | - | |
556 | | - | |
557 | | - | |
558 | | - | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
559 | 572 | | |
560 | | - | |
561 | | - | |
562 | | - | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
563 | 615 | | |
564 | | - | |
565 | | - | |
566 | | - | |
567 | | - | |
568 | | - | |
569 | | - | |
570 | | - | |
571 | | - | |
572 | | - | |
573 | | - | |
574 | | - | |
575 | | - | |
576 | | - | |
577 | | - | |
578 | | - | |
579 | | - | |
580 | | - | |
581 | | - | |
582 | | - | |
583 | | - | |
584 | | - | |
585 | | - | |
586 | | - | |
587 | | - | |
588 | | - | |
589 | | - | |
590 | | - | |
591 | | - | |
592 | | - | |
593 | | - | |
594 | | - | |
595 | | - | |
596 | | - | |
597 | | - | |
598 | | - | |
599 | | - | |
600 | | - | |
601 | | - | |
602 | | - | |
603 | | - | |
604 | | - | |
605 | | - | |
606 | | - | |
607 | 616 | | |
608 | 617 | | |
609 | 618 | | |
610 | 619 | | |
611 | 620 | | |
612 | 621 | | |
613 | 622 | | |
614 | | - | |
615 | | - | |
616 | | - | |
617 | | - | |
| 623 | + | |
618 | 624 | | |
619 | 625 | | |
620 | 626 | | |
621 | 627 | | |
622 | 628 | | |
623 | | - | |
624 | | - | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
625 | 633 | | |
626 | 634 | | |
627 | 635 | | |
628 | | - | |
629 | 636 | | |
630 | 637 | | |
631 | 638 | | |
632 | 639 | | |
633 | 640 | | |
634 | | - | |
635 | | - | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
636 | 645 | | |
637 | 646 | | |
638 | 647 | | |
639 | | - | |
640 | | - | |
641 | | - | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
642 | 652 | | |
643 | 653 | | |
644 | 654 | | |
| |||
0 commit comments