[SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader #50315

aldenlau-db · 2025-03-19T01:43:58Z

What changes were proposed in this pull request?

This change adds support for widening type promotions from Date to TimestampNTZ in `AvroDeserializer`. This PR is a follow-up to #47582, which added support for other widening type promotions.

Why are the changes needed?

When reading Avro files with a mix of Date and TimestampNTZ for a given column, the reader should be able to read all files and promote Date to TimestampNTZ instead of throwing an error when reading files with Date.

Although SPARK-49082 was resolved by #47582, that PR did not include Date -> TimestampNTZ widening. The change in this PR is very similar to #44368, which added support for Date -> TimestampNTZ widening in the Parquet reader.

Does this PR introduce any user-facing change?

Yes, users will no longer see an error when attempting to read a file containing Date when the read schema contains TimestampNTZ. The time component will be set to 00:00, as in #44368.
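For illustration only, the promoted value is the same calendar date with a midnight time attached. A minimal Python sketch of that semantics follows; `widen_date` is a hypothetical helper, not part of `AvroDeserializer` or any Spark API.

```python
from datetime import date, datetime, time


def widen_date(d: date) -> datetime:
    """Hypothetical helper: promote a date to a zone-less timestamp at 00:00."""
    # time.min is 00:00:00 with no tzinfo, so the result is a naive datetime.
    return datetime.combine(d, time.min)


widened = widen_date(date(2025, 3, 19))
print(widened.isoformat())
```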

How was this patch tested?

New test in AvroSuite.

Was this patch authored or co-authored using generative AI tooling?

No

aldenlau-db · 2025-03-19T22:59:40Z

cc @johanl-db @sandip-db

aldenlau-db · 2025-03-19T23:07:21Z

connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala

+          Row(LocalDateTime.of(-5877641, 6, 23, 0, 0)),
+          Row(LocalDateTime.of(5881580, 7, 11, 0, 0)))


@johanl-db These test cases fail with an ArithmeticException. However, this implementation is based on the Parquet implementation. I noticed that the Parquet reader also fails when attempting to upcast any date earlier than -290308-12-22 BCE or later than +294247-01-10 CE, with the same ArithmeticException due to long overflow.

I think this is because the Date Spark type supports dates from June 23 -5877641 CE to July 11 +5881580 CE, but TimestampNTZ supports -290308-12-21 BCE 19:59:06 to +294247-01-10 CE 04:00:54, which is a smaller range. In the context of type widening, shouldn't this widening be unsupported by all readers since Date stores a larger range of values than TimestampNTZ (even though it has less precision)?
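A back-of-the-envelope check of the two ranges, as a Python sketch. It assumes Date is stored as a signed 32-bit count of days since 1970-01-01 and TimestampNTZ as a signed 64-bit count of microseconds since the same epoch. The helper names are made up for illustration and are not Spark APIs, and Python's `OverflowError` stands in for the JVM's `ArithmeticException`.

```python
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000
INT32_MAX = 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def days_to_micros_exact(days: int) -> int:
    """Convert days since epoch to microseconds, failing like a checked 64-bit multiply."""
    micros = days * MICROS_PER_DAY
    if not INT64_MIN <= micros <= INT64_MAX:
        # Assumption: mirrors the "long overflow" failure described above.
        raise OverflowError(f"long overflow converting {days} days to microseconds")
    return micros


# Largest day offset (in either direction) whose microsecond value fits in 64 bits.
MAX_SAFE_DAYS = INT64_MAX // MICROS_PER_DAY

print(MAX_SAFE_DAYS)              # 106751991 days, roughly 292,277 years from 1970
print(INT32_MAX > MAX_SAFE_DAYS)  # True: the day range is wider than the microsecond range
```

Any day offset with an absolute value above `MAX_SAFE_DAYS` cannot be represented in microseconds, which is consistent with the failures seen for the extreme test values.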

johanl-db · 2025-03-20

That's not great.

I think for now we can document that limitation. Iceberg allows widening date -> timestamp without timezone and also calls this out: https://iceberg.apache.org/spec/#schema-evolution
Hopefully very few users have date values outside the range -290308-12-21 BCE to +294247-01-10 CE.

Moving forward, we'll probably want to address this by moving the error from read time to write time, i.e. check that all values in the table fit the timestamp range before applying the type change (which we should be able to tell from stats only in most cases).
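A rough sketch of what such a stats-based check could look like, in Python. It assumes per-file min/max statistics for the date column are available as day offsets from 1970-01-01, and that the target type stores signed 64-bit microseconds. `can_widen_date_column` and its parameters are hypothetical, not a Delta or Spark API.

```python
from typing import Iterable, Tuple

MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000
INT64_MAX = 2**63 - 1
# Largest absolute day offset that still fits in 64-bit microseconds.
MAX_SAFE_DAYS = INT64_MAX // MICROS_PER_DAY


def can_widen_date_column(file_stats: Iterable[Tuple[int, int]]) -> bool:
    """Return True if every (min_days, max_days) pair fits the timestamp range.

    Hypothetical check: it relies only on min/max statistics, so no data
    files need to be scanned when stats are present for every file.
    """
    return all(
        -MAX_SAFE_DAYS <= lo and hi <= MAX_SAFE_DAYS
        for lo, hi in file_stats
    )


# Example: two files with ordinary dates, then one holding an extreme value.
print(can_widen_date_column([(0, 20_000), (-5_000, 19_000)]))
print(can_widen_date_column([(0, 20_000), (0, 2**31 - 1)]))
```

Files without statistics would still need a scan or a read-time fallback, which this sketch does not cover.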

Note: this is really a Delta issue, so it is not an issue for this Spark PR in itself.

aldenlau-db · 2025-03-20

Makes sense, thanks. I have updated the test cases in this PR and added a comment documenting this behavior.

HyukjinKwon · 2025-03-24T03:37:15Z

Merged to master.

### What changes were proposed in this pull request?

This change adds support for widening type promotions from `Date` to `TimestampNTZ` in `AvroDeserializer`. This PR is a follow-up to apache#47582 which adds support for other widening type promotions.

### Why are the changes needed?

When reading Avro files with a mix of Date and TimestampNTZ for a given column, the reader should be able to read all files and promote Date to TimestampNTZ instead of throwing an error when reading files with Date.

Although [SPARK-49082](https://issues.apache.org/jira/browse/SPARK-49082) was resolved by apache#47582, that PR did not include Date -> TimestampNTZ widening. The change in this PR is very similar to apache#44368 which adds support for Date -> TimestampNTZ widening for the Parquet reader.

### Does this PR introduce _any_ user-facing change?

Yes, users will no longer see an error when attempting to read a file containing Date when the read schema contains TimestampNTZ. The time will be set to 00:00, as has been done in apache#44368.

### How was this patch tested?

New test in `AvroSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50315 from aldenlau-db/SPARK-49082.

Authored-by: Alden Lau <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

Initial commit

3ffab4b

github-actions bot added SQL AVRO labels Mar 19, 2025

aldenlau-db changed the title ~~Initial commit~~ [SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader Mar 19, 2025

aldenlau-db closed this Mar 19, 2025

aldenlau-db reopened this Mar 19, 2025

aldenlau-db changed the title ~~[SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader~~ [WIP][SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader Mar 19, 2025

aldenlau-db changed the title ~~[WIP][SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader~~ [SPARK-49082][SQL] Support widening Date to TimestampNTZ in Avro reader Mar 19, 2025

aldenlau-db commented Mar 19, 2025


sandip-db approved these changes Mar 20, 2025


aldenlau-db added 2 commits March 20, 2025 12:03

change test cases

082d67e

add comment

de45800

aldenlau-db requested a review from johanl-db March 20, 2025 19:13

johanl-db approved these changes Mar 21, 2025


HyukjinKwon approved these changes Mar 24, 2025


HyukjinKwon closed this in 39ad69a Mar 24, 2025
