[SPARK-46466][SQL][3.5] Vectorized parquet reader should never do rebase for timestamp ntz
backport #44428
### What changes were proposed in this pull request?
This fixes a correctness bug. TIMESTAMP_NTZ is a new data type in Spark, so there are no legacy files that need calendar rebase. However, the vectorized parquet reader treats it the same as LTZ and may rebase it if the parquet file was written with the legacy rebase mode. This PR fixes the reader to never rebase NTZ values.
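For context, the legacy rebase exists because Spark 2.x wrote dates and timestamps using the hybrid Julian/Gregorian calendar, while Spark 3.x uses the proleptic Gregorian calendar, so LTZ values written in legacy mode need a day shift on read. The sketch below (plain Python, all names hypothetical, not Spark's actual implementation) shows what that shift is and why it is non-zero only before the 1582 cutover:

```python
# Minimal sketch of the calendar "rebase" applied to legacy parquet values:
# day counts recorded under the hybrid Julian/Gregorian calendar must be
# shifted to the proleptic Gregorian calendar. All names are hypothetical.

def _jdn_gregorian(y: int, m: int, d: int) -> int:
    """Julian Day Number of a proleptic Gregorian calendar date."""
    a = (14 - m) // 12
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    return d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - yy // 100 + yy // 400 - 32045

def _jdn_julian(y: int, m: int, d: int) -> int:
    """Julian Day Number of a (proleptic) Julian calendar date."""
    a = (14 - m) // 12
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    return d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - 32083

def _jdn_hybrid(y: int, m: int, d: int) -> int:
    """Hybrid calendar: Julian before the 1582-10-15 cutover, Gregorian after."""
    if (y, m, d) >= (1582, 10, 15):
        return _jdn_gregorian(y, m, d)
    return _jdn_julian(y, m, d)

def rebase_shift_days(y: int, m: int, d: int) -> int:
    """Days to add to a legacy day count so the same local date reads back
    correctly under the proleptic Gregorian calendar."""
    return _jdn_gregorian(y, m, d) - _jdn_hybrid(y, m, d)

print(rebase_shift_days(1500, 1, 1))    # -9: pre-cutover dates need a shift
print(rebase_shift_days(1582, 10, 4))   # -10: last Julian day before the cutover
print(rebase_shift_days(2020, 6, 1))    # 0: modern dates need no rebase
```

Since NTZ was introduced after Spark switched to the proleptic Gregorian calendar, no NTZ file can carry hybrid-calendar day counts, which is why applying this shift to NTZ corrupts pre-1582 values rather than fixing them.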
### Why are the changes needed?
Bug fix.
### Does this PR introduce _any_ user-facing change?
Yes, NTZ values are now correctly written and read back even when the date falls before 1582 (the Gregorian calendar cutover).
### How was this patch tested?
A new test.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44446 from cloud-fan/ntz2.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java (+17 −14)
@@ -109,24 +109,32 @@ public ParquetVectorUpdater getUpdater(ColumnDescriptor descriptor, DataType spa
// For unsigned int64, it stores as plain signed int64 in Parquet when dictionary