Conversation

@JoshRosen (Contributor):
branch-3.4 pick of PR #46333, fixing a test issue caused by differences in expected error message parameter formatting across branches; original description follows below:


What changes were proposed in this pull request?

While migrating the `NTile` expression's type check failures to the new error class framework, PR #38457 removed a pair of `return` statements that were still necessary, causing certain branches' values to be discarded rather than returned.

As a result, invalid usages like

```
select ntile(99.9) OVER (order by id) from range(10)
```

trigger internal errors like

```
java.lang.ClassCastException: class org.apache.spark.sql.types.Decimal cannot be cast to class java.lang.Integer (org.apache.spark.sql.types.Decimal is in unnamed module of loader 'app'; java.lang.Integer is in module java.base of loader 'bootstrap')
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)
  at org.apache.spark.sql.catalyst.expressions.NTile.checkInputDataTypes(windowExpressions.scala:877)
```

instead of clear error-framework errors like

```
org.apache.spark.sql.catalyst.ExtendedAnalysisException: [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "ntile(99.9)" due to data type mismatch: The first parameter requires the "INT" type, however "99.9" has the type "DECIMAL(3,1)". SQLSTATE: 42K09; line 1 pos 7;
'Project [unresolvedalias(ntile(99.9) windowspecdefinition(id#0L ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())))]
+- Range (0, 10, step=1, splits=None)

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.dataTypeMismatch(package.scala:73)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$7(CheckAnalysis.scala:315)
```
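
The root cause is a Scala pitfall: an `if` without an `else` is a statement whose value is discarded, so once the early `return` was removed, the mismatch result computed on the failure branch was silently dropped. A minimal sketch of the pattern, using hypothetical names (`checkBuggy`, `checkFixed`, a simplified `TypeCheckResult`) rather than the actual Spark code:

```scala
// Simplified stand-ins for Spark's TypeCheckResult hierarchy (illustrative only).
sealed trait TypeCheckResult
case object TypeCheckSuccess extends TypeCheckResult
final case class DataTypeMismatch(msg: String) extends TypeCheckResult

object ReturnPitfall {
  // Buggy shape: without `return`, the mismatch value is computed and discarded,
  // and execution falls through to code that assumes a valid argument.
  def checkBuggy(foldable: Boolean): TypeCheckResult = {
    if (!foldable) {
      DataTypeMismatch("argument must be a foldable INT") // value discarded!
    }
    TypeCheckSuccess // reached even on the failure path
  }

  // Fixed shape: the early `return` propagates the failure to the caller.
  def checkFixed(foldable: Boolean): TypeCheckResult = {
    if (!foldable) {
      return DataTypeMismatch("argument must be a foldable INT")
    }
    TypeCheckSuccess
  }
}
```

Restoring the `return`s (or restructuring each check into an `if`/`else` expression) makes the failure branch produce the `DATATYPE_MISMATCH` result instead of falling through to the cast that throws `ClassCastException`.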

Why are the changes needed?

Improve error messages.

Does this PR introduce any user-facing change?

Yes, it improves an error message.

How was this patch tested?

Added a new test case to AnalysisErrorSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

JoshRosen added 2 commits May 2, 2024 09:36
…hen argument is non-foldable or of wrong type

Closes apache#46333 from JoshRosen/SPARK-48081.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member) left a comment:

+1, LGTM (Pending CIs).
Thank you!

dongjoon-hyun pushed a commit that referenced this pull request May 2, 2024
…aTypes() when argument is non-foldable or of wrong type


Closes #46337 from JoshRosen/SPARK-48081-branch-3.4.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member):

Merged to branch-3.4.

szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
…aTypes() when argument is non-foldable or of wrong type


Closes apache#46337 from JoshRosen/SPARK-48081-branch-3.4.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 681a1de)