-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18609][SQL]Fix when CTE with Join between two table with same column name #16255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,7 +26,7 @@ import org.apache.spark.api.java.function.FilterFunction | |
| import org.apache.spark.sql.AnalysisException | ||
| import org.apache.spark.sql.catalyst.{CatalystConf, SimpleCatalystConf} | ||
| import org.apache.spark.sql.catalyst.analysis._ | ||
| import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog} | ||
| import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, InMemoryCatalog, SessionCatalog} | ||
| import org.apache.spark.sql.catalyst.expressions._ | ||
| import org.apache.spark.sql.catalyst.expressions.aggregate._ | ||
| import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLiteral} | ||
|
|
@@ -200,6 +200,8 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] { | |
| case plan: Project if plan eq proj => plan.child | ||
| case plan => plan transformExpressions { | ||
| case a: Attribute if attrMap.contains(a) => attrMap(a) | ||
| case b: Alias if attrMap.exists(_._1.exprId == b.exprId) | ||
| && b.child.isInstanceOf[NamedExpression] => b.child | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do you reason about this? why we treat
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As you said, if we find an alias-only project, e.g. alias a#1 to a#2, it's safe to here, the logic shows that we process the situation.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't get it, for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. RemoveAliasOnlyProject will remove
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know how the failure happens and this can fix it, but I think it's too hacky and does not catch the root cause. https://github.com/apache/spark/pull/16255/files#r92348878 easily explains why the failure happens and how to fix it, can you make other people understand your fix easily? |
||
| } | ||
| } | ||
| }.getOrElse(plan) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is: this rule assumes that, if we find an alias-only project, e.g. alias a#1 to a#2, it's safe to remove this project and replace all a#2 with a#1 in this plan. However, this is not true for complex cases like https://github.com/apache/spark/pull/16255/files#diff-1ea02a6fab84e938582f7f87cc4d9ea1R2023 .
Let's see if there is a way to fix this problem entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a naive way to do this is, make sure we only replace attributes in the ancestor nodes of the alias-only project:
It's very inefficient, maybe we can improve
TreeNodeto maintain the parent-child relationship between nodes.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is safe to only replace attributes in the ancestor nodes.
Alias with the same exprId but not the same object, replace the alias with it's child. it is not safe ,right?
Project [col#9 AS col#6] -> Project [col#9]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, describe it clearly,this is not safe? https://github.com/windpiger/spark/blob/0413f9dad4ad1294e3400dc0f42f66529b1b055b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L203-L204