Conversation

@MaxGekk (Member) commented Feb 15, 2024

What changes were proposed in this pull request?

In the PR, I propose to add one more field to the keys of `supportedFormat` in `IntervalUtils`, because the current implementation has duplicate keys that overwrite each other. For instance, the following keys are the same:

```
(YM.YEAR, YM.MONTH)
...
(DT.DAY, DT.HOUR)
```
because `YM.YEAR = DT.DAY = 0` and `YM.MONTH = DT.HOUR = 1`.
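Spark's `supportedFormat` is Scala code, but the key collision can be sketched in Python (all names here are hypothetical stand-ins, not Spark's actual identifiers): when the map key carries only the two field ordinals, the year-month and day-time entries produce the same key, so whichever entry is defined last silently wins.

```python
from enum import IntEnum

# Hypothetical stand-ins for the year-month and day-time interval field enums.
class YM(IntEnum):
    YEAR = 0
    MONTH = 1

class DT(IntEnum):
    DAY = 0
    HOUR = 1

# Keying only on ordinals: (YM.YEAR, YM.MONTH) == (0, 1) == (DT.DAY, DT.HOUR)
# because IntEnum members compare and hash by value, so the day-time entry
# overwrites the year-month one.
broken = {
    (YM.YEAR, YM.MONTH): "[+|-]y-m",
    (DT.DAY, DT.HOUR): "[+|-]d h",
}
print(len(broken))  # 1 — one surviving key for two intended entries

# The fix in spirit: add a field distinguishing the interval style to the key.
fixed = {
    ("YM", YM.YEAR, YM.MONTH): "[+|-]y-m",
    ("DT", DT.DAY, DT.HOUR): "[+|-]d h",
}
print(len(fixed))  # 2 — the keys no longer collide
```

With the extra discriminating field in the key, the year-month lookup returns the year-month format strings instead of the day-time ones.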

Why are the changes needed?

To fix the incorrect error message produced when Spark cannot parse an ANSI interval string. For example, the expected format should be a year-month one, but Spark reports a day-time format:

```sql
spark-sql (default)> select interval '-\t2-2\t' year to month;

Interval string does not match year-month format of `[+|-]d h`, `INTERVAL [+|-]'[+|-]d h' DAY TO HOUR` when cast to interval year to month: -	2-2	. (line 1, pos 16)

== SQL ==
select interval '-\t2-2\t' year to month
----------------^^^
```

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

By running the existing test suite:

```
$ build/sbt "test:testOnly *IntervalUtilsSuite"
```

and regenerating the golden files:

```
$ SPARK_GENERATE_GOLDEN_FILES=1 PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
```

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Feb 15, 2024
@MaxGekk MaxGekk changed the title [WIP][SQL] Fix supported interval formats in error messages [SPARK-47072][SQL] Fix supported interval formats in error messages Feb 16, 2024
@MaxGekk MaxGekk marked this pull request as ready for review February 16, 2024 07:55
@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 16, 2024

@AngersZhuuuu @cloud-fan @HyukjinKwon Could you review this bug fix, please?

@MaxGekk
Copy link
Member Author

MaxGekk commented Feb 16, 2024

Merging to master. Thank you, @cloud-fan, for the review.

@MaxGekk MaxGekk closed this in 074fcf2 Feb 16, 2024
MaxGekk added a commit to MaxGekk/spark that referenced this pull request Feb 16, 2024
Closes apache#45127 from MaxGekk/fix-supportedFormat.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit 074fcf2)
Signed-off-by: Max Gekk <[email protected]>
MaxGekk added a commit to MaxGekk/spark that referenced this pull request Feb 16, 2024
Closes apache#45127 from MaxGekk/fix-supportedFormat.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit 074fcf2)
Signed-off-by: Max Gekk <[email protected]>
MaxGekk added a commit that referenced this pull request Feb 19, 2024
This is a backport of #45127.

Authored-by: Max Gekk <[email protected]>
(cherry picked from commit 074fcf2)

Closes #45140 from MaxGekk/fix-supportedFormat-3.4.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
MaxGekk added a commit that referenced this pull request Feb 19, 2024
This is a backport of #45127.

Authored-by: Max Gekk <[email protected]>
(cherry picked from commit 074fcf2)

Closes #45139 from MaxGekk/fix-supportedFormat-3.5.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Mar 26, 2024
This is a backport of apache#45127.

Authored-by: Max Gekk <[email protected]>
(cherry picked from commit 074fcf2)

Closes apache#45140 from MaxGekk/fix-supportedFormat-3.4.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…ges (apache#368)

This is a backport of apache#45127.

Authored-by: Max Gekk <[email protected]>
(cherry picked from commit 074fcf2)

Closes apache#45139 from MaxGekk/fix-supportedFormat-3.5.

Authored-by: Max Gekk <[email protected]>

Signed-off-by: Max Gekk <[email protected]>
Co-authored-by: Max Gekk <[email protected]>