Skip to content

Conversation

@ericm-db
Copy link
Collaborator

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@ericm-db ericm-db changed the base branch from master to state-ttl March 27, 2024 17:42
@sahnib sahnib force-pushed the state-ttl branch 2 times, most recently from ab4000f to 69ad1e2 Compare April 1, 2024 15:48
@ericm-db ericm-db closed this Apr 1, 2024
sahnib pushed a commit that referenced this pull request Aug 13, 2024
… throw internal error

### What changes were proposed in this pull request?

This PR fixes the error messages and classes when Python UDFs are used in higher order functions.

### Why are the changes needed?

To show the proper user-facing exceptions with error classes.

### Does this PR introduce _any_ user-facing change?

Yes, previously it threw internal error such as:

```python
from pyspark.sql.functions import transform, udf, col, array
spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: y)(x))).collect()
```

Before:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o74.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15) (ip-192-168-123-103.ap-northeast-2.compute.internal executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: <lambda>(lambda x_0#3L)#2 SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

After:

```
pyspark.errors.exceptions.captured.AnalysisException: [INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE] Invalid lambda function call. Python UDFs should be used in a lambda function at a higher order function. However, "<lambda>(lambda x_0#3L)" was a Python UDF. SQLSTATE: 42K0D;
Project [transform(array(id#0L), lambdafunction(<lambda>(lambda x_0#3L)#2, lambda x_0#3L, false)) AS transform(array(id), lambdafunction(<lambda>(lambda x_0#3L), namedlambdavariable()))#4]
+- Range (0, 1, step=1, splits=Some(16))
```

### How was this patch tested?

Unittest was added

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47079 from HyukjinKwon/SPARK-48706.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants