[SPARK-38977][SQL] Fix schema pruning with correlated subqueries
### What changes were proposed in this pull request?
This PR fixes schema pruning for queries with multiple correlated subqueries. Previously, Spark threw an exception while determining root fields in `SchemaPruning$identifyRootFields`. This happened because predicate expressions that referenced attributes from subqueries were not ignored. As a result, attributes from different subqueries could conflict with each other (e.g. same name but incompatible types), even though they should have been ignored.
For instance, the following query would throw a runtime exception.
```
SELECT name FROM contacts c
WHERE
EXISTS (SELECT 1 FROM ids i WHERE i.value = c.id)
AND
EXISTS (SELECT 1 FROM first_names n WHERE c.name.first = n.value)
```
```
[info] org.apache.spark.SparkException: Failed to merge fields 'value' and 'value'. Failed to merge incompatible data types int and string
[info] at org.apache.spark.sql.errors.QueryExecutionErrors$.failedMergingFieldsError(QueryExecutionErrors.scala:936)
```
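The failure mode can be illustrated with a small toy model in Python. This is only a sketch of the idea, not Spark's actual implementation: `Attr`, `merge_root_fields`, and `root_fields` are hypothetical names, and the types assumed for `contacts.id` and `contacts.name` are illustrative. Only the `int` and `string` types of the two `value` columns come from the error message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    relation: str  # relation that owns the attribute
    name: str      # column name
    dtype: str     # data type, kept as a plain string in this toy model


def merge_root_fields(attrs):
    """Merge attributes by name, failing when two share a name but not a type."""
    merged = {}
    for attr in attrs:
        seen = merged.get(attr.name)
        if seen is not None and seen != attr.dtype:
            raise TypeError(
                f"Failed to merge fields '{attr.name}' and '{attr.name}'. "
                f"Failed to merge incompatible data types {seen} and {attr.dtype}"
            )
        merged[attr.name] = attr.dtype
    return merged


def root_fields(attrs, relation, ignore_foreign):
    """Collect root fields for `relation` from the attributes used in predicates.

    With ignore_foreign=False every referenced attribute is merged, which
    mimics the old behavior. With ignore_foreign=True attributes that belong
    to other relations (the subqueries) are skipped first.
    """
    if ignore_foreign:
        attrs = [a for a in attrs if a.relation == relation]
    return merge_root_fields(attrs)


# Attributes referenced by the two EXISTS predicates in the query above.
# The types of contacts.id and contacts.name are assumptions.
predicate_attrs = [
    Attr("ids", "value", "int"),
    Attr("contacts", "id", "int"),
    Attr("contacts", "name", "struct<first:string>"),
    Attr("first_names", "value", "string"),
]

# Skipping subquery attributes leaves only the fields of `contacts`.
print(root_fields(predicate_attrs, "contacts", ignore_foreign=True))
```

Calling `root_fields(predicate_attrs, "contacts", ignore_foreign=False)` raises in this model, because `ids.value` and `first_names.value` share a name but differ in type.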
### Why are the changes needed?
These changes are needed to avoid runtime exceptions for queries with multiple correlated subqueries, such as those whose subquery attributes share a name but have incompatible types.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This PR comes with tests.
Closes #36303 from aokolnychyi/spark-38977.
Authored-by: Anton Okolnychyi <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>