
Conversation

wForget (Member) commented Mar 19, 2024

What changes were proposed in this pull request?

Use V2Predicate to wrap If expr when building v2 expressions.

Why are the changes needed?

The PushFoldableIntoBranches optimizer may fold predicate into (if / case) branches and V2ExpressionBuilder wraps If as GeneralScalarExpression, which causes the assertion in PushablePredicate.unapply to fail.
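Concretely, here is a simplified sketch of the failing path (close to, but not necessarily identical to, the code in `DataSourceV2Strategy`; treat it as illustrative):

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.util.V2ExpressionBuilder
import org.apache.spark.sql.connector.expressions.filter.Predicate

// PushFoldableIntoBranches rewrites IF(c, a, b) > 0 into IF(c, a > 0, b > 0),
// so a boolean-typed If ends up at the top of the pushed-down filter.
// PushablePredicate then asserts that the built v2 expression is a Predicate:
object PushablePredicate {
  def unapply(e: Expression): Option[Predicate] =
    new V2ExpressionBuilder(e, true).build().map { v =>
      // Before this PR, If was built as GeneralScalarExpression("CASE_WHEN", ...),
      // which is not a v2 Predicate, so this assertion failed.
      assert(v.isInstanceOf[Predicate])
      v.asInstanceOf[Predicate]
    }
}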

Does this PR introduce any user-facing change?

No

How was this patch tested?

added unit test

Was this patch authored or co-authored using generative AI tooling?

No

The github-actions bot added the SQL label on Mar 19, 2024.
wForget (Member, Author) commented Mar 19, 2024

@beliefer @cloud-fan could you please take a look?

wForget (Member, Author) commented Apr 9, 2024

@beliefer @cloud-fan sorry to bother you again, could you please take a look if you have time?

wForget (Member, Author) commented Apr 11, 2024

The `org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison` optimizer may also fold predicates.
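For context, a sketch of that rewrite (simplified; the exact shape of the optimizer rule here is an assumption):

// For a deterministic expression a compared with itself, SimplifyBinaryComparison
// folds the comparison, e.g. for nullable a:
//   a = a   ==>  If(IsNull(a), Literal(null, BooleanType), TrueLiteral)
// The result is again a boolean-typed If at predicate position, which hits the
// same assertion as the PushFoldableIntoBranches case above.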

cloud-fan (Contributor) commented:

With hindsight, we shouldn't have created the v2 Predicate API in the first place, and should have just used the v2 Expression API. The Predicate trait in catalyst is not useful either. Some expressions can return different data types, including boolean, which makes them hard to classify.

It's too late to change the API now. One suggestion is to follow how we handle CASE WHEN: always use v2 Predicate if the expression may return boolean type. Since v2 Predicate extends GeneralScalarExpression, this is fine.
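A minimal sketch of that suggestion (the helper name and shape are illustrative, not the merged code):

import org.apache.spark.sql.connector.expressions.{Expression => V2Expression, GeneralScalarExpression}
import org.apache.spark.sql.connector.expressions.filter.{Predicate => V2Predicate}

// If the expression is boolean-typed and used in predicate position, build a v2
// Predicate; otherwise fall back to GeneralScalarExpression. Since V2Predicate
// extends GeneralScalarExpression, callers expecting a scalar expression still work.
def wrapV2(name: String, children: Array[V2Expression], isBooleanPredicate: Boolean): V2Expression =
  if (isBooleanPredicate) new V2Predicate(name, children)
  else new GeneralScalarExpression(name, children)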

None
}
case iff: If => generateExpressionWithName("CASE_WHEN", iff.children)
case iff: If =>
Contributor:

can you check other expressions matched here and see if they may return boolean type?

Member Author:

> can you check other expressions matched here and see if they may return boolean type?

There is also `Coalesce`.
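For example (a hypothetical query in the style of the tests in this PR, assuming a table t1 with an int column i):

// COALESCE can also end up as the top-level boolean filter after optimization:
val df = sql(
  """
    |SELECT * FROM
    |(SELECT COALESCE(i > 1, FALSE) AS c FROM t1) t
    |WHERE t.c
    |""".stripMargin)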

}
}

private def generatePredicateWithName(
Contributor:

one idea: can we update `generateExpressionWithName` to take the target expression instead of the children? Then we can simply do

val children = e.children
...
if (e.dataType == BooleanType && isPredicate) {
  new V2Predicate ...
} else {
  new GeneralScalarExpression
}

Member Author:

sounds good, I will try that.
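A sketch of what that refactor could look like inside V2ExpressionBuilder (simplified; assumes the file's existing `V2Expression`/`V2Predicate` aliases; the merged code may differ in naming and details):

private def generateExpressionWithName(
    v2ExpressionName: String,
    expr: Expression,
    isPredicate: Boolean): Option[V2Expression] = {
  val childrenExpressions = expr.children.flatMap(generateExpression(_))
  if (childrenExpressions.length == expr.children.length) {
    // Wrap as a v2 Predicate only when the expression is boolean-typed and is
    // actually used in predicate position; otherwise keep GeneralScalarExpression.
    if (expr.dataType == BooleanType && isPredicate) {
      Some(new V2Predicate(v2ExpressionName, childrenExpressions.toArray[V2Expression]))
    } else {
      Some(new GeneralScalarExpression(v2ExpressionName, childrenExpressions.toArray[V2Expression]))
    }
  } else {
    None
  }
}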

case unaryMinus @ UnaryMinus(_, true) =>
generateExpressionWithName("-", unaryMinus, isPredicate)
case _: BitwiseNot => generateExpressionWithName("~", expr, isPredicate)
case CaseWhen(branches, elseValue) =>
Member Author:

CaseWhen seems to always return V2Predicate; is this correct?

Contributor:

I don't think so, but this probably doesn't hurt, as V2Predicate extends GeneralScalarExpression. We should still fix it to make the code clearer.

Member Author:

Thanks, I made the fix.

wForget changed the title from "[SPARK-47463][SQL] Use V2Predicate to wrap If expr when building v2 expressions" to "[SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean" on Apr 11, 2024.
generateExpressionWithName("-", unaryMinus, isPredicate)
case _: BitwiseNot => generateExpressionWithName("~", expr, isPredicate)
case caseWhen @ CaseWhen(branches, elseValue) =>
val conditions = branches.map(_._1).flatMap(generateExpression(_, true))
Member Author:

I kept `isPredicate = true` for the conditions of `CaseWhen`.

val df1 = sql(
s"""
|select * from
|(select if(i = 1, i, 0) as c from t1) t
Contributor:

I think this test is sufficient. `If` can be a predicate, and before this PR we didn't return a V2Predicate, which caused errors.

Member Author:

> I think this test is sufficient. `If` can be a predicate, and before this PR we didn't return a V2Predicate, which caused errors.

makes sense to me

wForget requested a review from cloud-fan on April 15, 2024.
s"""
|select * from
|(select if(i = 1, i, 0) as c from t1) t
|where t.c > 0
Contributor:

I think `where if(i = 1, i, 0) > 0` is valid SQL? BTW, please uppercase the SQL keywords in the SQL statement.

Member Author:

> I think `where if(i = 1, i, 0) > 0` is valid SQL? BTW, please uppercase the SQL keywords in the SQL statement.

Indeed, thank you. I have changed it.
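For reference, the simplified test could then read along these lines (a sketch; the merged test may differ):

val df1 = sql(
  """
    |SELECT * FROM t1 WHERE IF(i = 1, i, 0) > 0
    |""".stripMargin)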

cloud-fan (Contributor) commented:

thanks, merging to master!

cloud-fan closed this in fa3ef03 on Apr 16, 2024.
cloud-fan (Contributor) commented:

@wForget can you help to create a 3.5 backport PR? thanks!

wForget (Member, Author) commented Apr 16, 2024

> @wForget can you help to create a 3.5 backport PR? thanks!

Sure, I will create it as soon as possible, and thanks for your review.

cloud-fan pushed a commit that referenced this pull request Apr 18, 2024
[SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean

Backports #45589 to 3.5

### What changes were proposed in this pull request?

Use V2Predicate to wrap If expr when building v2 expressions.

### Why are the changes needed?

The `PushFoldableIntoBranches` optimizer may fold predicate into (if / case) branches and `V2ExpressionBuilder` wraps `If` as `GeneralScalarExpression`, which causes the assertion in `PushablePredicate.unapply` to fail.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46074 from wForget/SPARK-47463_3.5.

Authored-by: Zhen Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
rami-lv commented Dec 3, 2024

I encountered a similar issue. I am using Spark 3.5.3, which already includes this commit.
I think this is caused by the Coalesce expression, but I am not sure why.
I would appreciate any pointers to make debugging easier.

2024/12/03 05:21:58 [stderr] : org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase optimization failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution$.toInternalError(QueryExecution.scala:536)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:548)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:148)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:144)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:162)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:182)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:179)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:238)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:284)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:252)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:117)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4321)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.Dataset.checkpoint(Dataset.scala:727)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.Dataset.checkpoint(Dataset.scala:690)
2024/12/03 05:21:58 [stderr] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024/12/03 05:21:58 [stderr] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2024/12/03 05:21:58 [stderr] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024/12/03 05:21:58 [stderr] 	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
2024/12/03 05:21:58 [stderr] 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
2024/12/03 05:21:58 [stderr] 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
2024/12/03 05:21:58 [stderr] 	at py4j.Gateway.invoke(Gateway.java:282)
2024/12/03 05:21:58 [stderr] 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
2024/12/03 05:21:58 [stderr] 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
2024/12/03 05:21:58 [stderr] 	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
2024/12/03 05:21:58 [stderr] 	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
2024/12/03 05:21:58 [stderr] 	at java.base/java.lang.Thread.run(Thread.java:840)
2024/12/03 05:21:58 [stderr] Caused by: java.lang.AssertionError: assertion failed
2024/12/03 05:21:58 [stderr] 	at scala.Predef$.assert(Predef.scala:208)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.util.V2ExpressionBuilder.generateExpression(V2ExpressionBuilder.scala:143)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.util.V2ExpressionBuilder.$anonfun$generateExpressionWithNameByChildren$1(V2ExpressionBuilder.scala:359)
2024/12/03 05:21:58 [stderr] 	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
2024/12/03 05:21:58 [stderr] 	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
2024/12/03 05:21:58 [stderr] 	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
2024/12/03 05:21:58 [stderr] 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
2024/12/03 05:21:58 [stderr] 	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
2024/12/03 05:21:58 [stderr] 	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
2024/12/03 05:21:58 [stderr] 	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.util.V2ExpressionBuilder.generateExpressionWithNameByChildren(V2ExpressionBuilder.scala:359)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.util.V2ExpressionBuilder.generateExpressionWithName(V2ExpressionBuilder.scala:351)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.util.V2ExpressionBuilder.generateExpression(V2ExpressionBuilder.scala:100)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.util.V2ExpressionBuilder.build(V2ExpressionBuilder.scala:33)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.PushablePredicate$.unapply(DataSourceV2Strategy.scala:663)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy$.translateLeafNodeFilterV2(DataSourceV2Strategy.scala:557)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy$.translateFilterV2WithMapping(DataSourceV2Strategy.scala:610)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy$.translateFilterV2WithMapping(DataSourceV2Strategy.scala:607)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.$anonfun$pushFilters$3(PushDownUtils.scala:88)
2024/12/03 05:21:58 [stderr] 	at scala.collection.immutable.List.foreach(List.scala:431)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:85)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:74)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:61)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1216)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1215)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:71)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1242)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1241)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.Join.mapChildren(basicLogicalOperators.scala:543)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1216)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1215)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.Project.mapChildren(basicLogicalOperators.scala:71)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:405)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.pushDownFilters(V2ScanRelationPushDown.scala:61)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$3(V2ScanRelationPushDown.scala:45)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.$anonfun$apply$8(V2ScanRelationPushDown.scala:52)
2024/12/03 05:21:58 [stderr] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2024/12/03 05:21:58 [stderr] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2024/12/03 05:21:58 [stderr] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:51)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$.apply(V2ScanRelationPushDown.scala:38)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)
2024/12/03 05:21:58 [stderr] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
2024/12/03 05:21:58 [stderr] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
2024/12/03 05:21:58 [stderr] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)
2024/12/03 05:21:58 [stderr] 	at scala.collection.immutable.List.foreach(List.scala:431)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:89)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:152)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
2024/12/03 05:21:58 [stderr] 	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)

rami-lv commented Dec 5, 2024

@wForget could you please take a look?

wForget (Member, Author) commented Dec 5, 2024

> @wForget could you please take a look?

Can you provide a SQL query to reproduce this?

cloud-fan (Contributor) commented Dec 6, 2024

#48621 should fix this. @rami-lv can you take a look? We can backport it to 3.5.

rami-lv commented Dec 6, 2024

I have finally been able to reproduce it:

# Write a Spark DataFrame as an Iceberg table
spark.range(10).write.saveAsTable(
    "iceberg_catalog.temp",
    format="iceberg",
    mode="overwrite"
)

# Query the Iceberg table with a SQL query
spark.sql("""
    SELECT *
    FROM iceberg_catalog.temp
    WHERE coalesce(
        (
            id = 43536 AND
            NULL - 10 > 86400000
        ),
        false
    )
""").show()

Note that the exception is only thrown when the source is an Iceberg table, i.e. a DataSource V2 source; that matches the v2 filter pushdown path (`DataSourceV2Strategy` / `V2ScanRelationPushDown`) in the stack trace above.

rami-lv commented Dec 6, 2024

> #48621 should fix this. @rami-lv can you take a look? We can backport it to 3.5.

I am not sure! The issue with this query is the `Coalesce` expression having an `AND` expression as a parameter. Also, the query passes when `NULL` is changed to something else.

#48621 seems to solve an issue with the `IIF` expression.

beliefer (Contributor) commented:

late LGTM.

turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
[SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean (apache#381)

Backports apache#45589 to 3.5

### What changes were proposed in this pull request?

Use V2Predicate to wrap If expr when building v2 expressions.

### Why are the changes needed?

The `PushFoldableIntoBranches` optimizer may fold predicate into (if / case) branches and `V2ExpressionBuilder` wraps `If` as `GeneralScalarExpression`, which causes the assertion in `PushablePredicate.unapply` to fail.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46074 from wForget/SPARK-47463_3.5.

Authored-by: Zhen Wang <[email protected]>

Signed-off-by: Wenchen Fan <[email protected]>
Co-authored-by: Zhen Wang <[email protected]>