[SPARK-40215][SQL] Add SQL configs to control CSV/JSON date and timestamp parsing behavior #37653
Conversation
@HyukjinKwon @MaxGekk Could you review this PR? Thank you.
LuciferYang left a comment:
+1, LGTM
MaxGekk left a comment:
I wonder what is the supposed lifetime of the SQL configs `spark.sql.*.enableDateTimeParsingFallback`? Should we place them in the `spark.sql.legacy` namespace, similar to `spark.sql.legacy.timeParserPolicy`?
I just wanted to keep the option name short (I have a custom build already with those configs), but I can move them under legacy, e.g.
```scala
def avroFilterPushDown: Boolean = getConf(AVRO_FILTER_PUSHDOWN_ENABLED)
```

```scala
def jsonEnableDateTimeParsingFallback: Option[Boolean] =
```
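For context, a minimal sketch of how such an optional per-format config could be declared and read back via `SQLConf`'s `buildConf` pattern; the constant name, doc string, and version below are illustrative assumptions, not necessarily what the PR merged:

```scala
// Hypothetical declaration; the constant name and doc text are assumptions.
val JSON_ENABLE_DATETIME_PARSING_FALLBACK =
  buildConf("spark.sql.legacy.json.enableDateTimeParsingFallback")
    .doc("Whether to fall back to the legacy (pre Spark 3.0) parser " +
      "when parsing dates and timestamps in JSON data sources.")
    .version("3.4.0") // assumed target version
    .booleanConf
    .createOptional // yields Option[Boolean], matching the accessor below

def jsonEnableDateTimeParsingFallback: Option[Boolean] =
  getConf(JSON_ENABLE_DATETIME_PARSING_FALLBACK)
```

Using `createOptional` keeps the config unset by default, so the absence of a value can fall through to the next level of the precedence chain rather than forcing a default.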
I decided not to add the "legacy" prefix in the method name, as it would make the method name very long 🙂.
@MaxGekk I addressed your comment. Would you be able to review again? Thanks.
+1, LGTM. Merging to master.
Thank you, @MaxGekk!
What changes were proposed in this pull request?
This is a follow-up for SPARK-39731 and PR #37147.
I found that it could be problematic to change `spark.sql.legacy.timeParserPolicy` to LEGACY when inferring dates and timestamps in CSV and JSON. Sometimes it is beneficial to keep the time parser policy as CORRECTED but still use a more lenient date and timestamp inference, for example when migrating to a newer Spark version. I added two separate configs that control this behavior:

- `spark.sql.legacy.csv.enableDateTimeParsingFallback`
- `spark.sql.legacy.json.enableDateTimeParsingFallback`

When the configs are set to `true`, the legacy (pre Spark 3.0) time parsing behaviour is enabled.

With this PR, the precedence order is as follows for CSV (similar for JSON):
1. the `enableDateTimeParsingFallback` data source option
2. `spark.sql.legacy.{csv,json}.enableDateTimeParsingFallback`
3. `spark.sql.legacy.timeParserPolicy` and whether or not a custom format is used
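As an illustration of the precedence above, a minimal sketch of how the new knobs might be used; the option and config names come from this PR, while the session setup and file path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]") // hypothetical local session for the example
  .appName("csv-datetime-fallback")
  .getOrCreate()

// Enable the pre-Spark-3.0 fallback parser for CSV only; the global
// spark.sql.legacy.timeParserPolicy stays CORRECTED.
spark.conf.set("spark.sql.legacy.csv.enableDateTimeParsingFallback", "true")

// The per-read data source option has the highest precedence and can
// override the SQL config for a single read.
val df = spark.read
  .option("inferSchema", "true")
  .option("enableDateTimeParsingFallback", "false") // overrides the config above
  .csv("/path/to/data.csv") // hypothetical path
```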
Why are the changes needed?
The change makes it easier for users to migrate to a newer Spark version without changing the global config `spark.sql.legacy.timeParserPolicy`. It also allows enabling legacy parsing for CSV and JSON separately, without changing application code or the global time parser config.
Does this PR introduce any user-facing change?
No; it simply adds the ability to change the behaviour specifically for CSV or JSON.
How was this patch tested?
I added a unit test for CSV and JSON to verify the flag.
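For illustration, a sketch of the kind of check such a unit test could perform; the sample value and deliberately mismatched pattern are assumptions, and the actual tests in the PR may differ:

```scala
import spark.implicits._ // for .toDS() on a local Seq

spark.conf.set("spark.sql.legacy.csv.enableDateTimeParsingFallback", "true")

// The value deliberately does not match the supplied timestampFormat, so the
// strict CORRECTED parser alone would yield null; with the fallback enabled,
// the legacy parser should still recognize the ISO-like string.
val input = Seq("2020-01-23T17:34:59.999").toDS()
val parsed = spark.read
  .schema("ts TIMESTAMP")
  .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
  .csv(input)

assert(parsed.collect().head.getTimestamp(0) != null)
```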