@gengliangwang
Member

What changes were proposed in this pull request?

When spark.sql.legacy.timeParserPolicy=LEGACY is set, Spark uses the LegacyFastTimestampFormatter to infer potential timestamp columns. The inference should not throw an exception.

However, when the input is 23012150952, a NumberFormatException is thrown:


For input string: "23012150952"
java.lang.NumberFormatException: For input string: "23012150952"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
    at java.base/java.lang.Integer.parseInt(Integer.java:668)
    at java.base/java.lang.Integer.parseInt(Integer.java:786)
    at org.apache.commons.lang3.time.FastDateParser$NumberStrategy.parse(FastDateParser.java:304)
    at org.apache.commons.lang3.time.FastDateParser.parse(FastDateParser.java:1045)
    at org.apache.commons.lang3.time.FastDateFormat.parse(FastDateFormat.java:651)
    at org.apache.spark.sql.catalyst.util.LegacyFastTimestampFormatter.parseOptional(TimestampFormatter.scala:418)

This PR fixes the issue.
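The failure can be reproduced without Spark. `Integer.parseInt`, which appears in the stack trace, rejects any digit string whose value exceeds `Integer.MAX_VALUE` (2147483647), and 23012150952 is larger than that. Below is a minimal Java sketch of turning that exception into an empty result, which is the behavior one would expect from an "optional" parse used during inference. The class `SafeParseSketch` and the helper `parseIntOptional` are hypothetical names chosen for illustration. This is not Spark's code, and the actual patch may be structured differently.

```java
import java.util.Optional;

class SafeParseSketch {
    // Hypothetical helper: returns an empty Optional instead of letting
    // NumberFormatException escape to the caller.
    static Optional<Integer> parseIntOptional(String s) {
        try {
            return Optional.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            // Covers non-numeric input and values outside the int range.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Fits in an int, so a value is present.
        System.out.println(parseIntOptional("2301").isPresent());        // true
        // 23012150952 > Integer.MAX_VALUE, so parseInt throws and the result is empty.
        System.out.println(parseIntOptional("23012150952").isPresent()); // false
    }
}
```

With this shape, a caller doing type inference can treat an empty result as "not a timestamp" and fall back to another type rather than failing the whole job.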

Why are the changes needed?

Bug fix: timestamp inference should not throw an exception.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test case + existing tests

Was this patch authored or co-authored using generative AI tooling?

No

@gengliangwang
Member Author

cc @Hisoka-X as well

Contributor

@cloud-fan cloud-fan left a comment

good catch!

Member

@Hisoka-X Hisoka-X left a comment

LGTM

@gengliangwang
Member Author

Merging to master/3.5

gengliangwang added a commit that referenced this pull request Dec 14, 2023
Closes #44338 from gengliangwang/fixParseOptional.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 4a79ae9)
Signed-off-by: Gengliang Wang <[email protected]>
Contributor

@beliefer beliefer left a comment

Late LGTM.

turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…ache#363)

Closes apache#44338 from gengliangwang/fixParseOptional.

Authored-by: Gengliang Wang <[email protected]>

(cherry picked from commit 4a79ae9)

Signed-off-by: Gengliang Wang <[email protected]>
Co-authored-by: Gengliang Wang <[email protected]>