[SPARK-17425][SQL] Override sameResult in HiveTableScanExec to make ReuseExchange work in text format table #14988
Why doesn't the default one work?
In the default sameResult, left.cleanArgs == right.cleanArgs returns false, because equals in MetastoreRelation compares the output (AttributeReference) and the exprIds differ. We need to erase the exprId.
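To illustrate the point about exprIds, here is a minimal, self-contained sketch. It does not use Spark: Attr, TableRelation, and ExprIdDemo are hypothetical stand-ins for AttributeReference and MetastoreRelation, and the sketch only assumes that every resolution of a table produces fresh expression IDs.

```scala
// Hypothetical stand-ins for Spark's AttributeReference and MetastoreRelation.
final case class Attr(name: String, dataType: String, exprId: Long)
final case class TableRelation(table: String, output: Seq[Attr])

object ExprIdDemo {
  private var nextId = 0L

  // Assumption: each resolution of the same table hands out new expression IDs.
  def resolve(table: String, columns: Seq[(String, String)]): TableRelation = {
    val attrs = columns.map { case (name, dataType) =>
      nextId += 1
      Attr(name, dataType, nextId)
    }
    TableRelation(table, attrs)
  }

  // Erase the IDs so that only the table, column names, and types are compared.
  def normalize(r: TableRelation): TableRelation =
    r.copy(output = r.output.map(_.copy(exprId = 0L)))
}
```

In this model, two calls to ExprIdDemo.resolve for the same table compare unequal under case-class equality, while their normalized forms compare equal. That is the effect an overridden sameResult is meant to achieve.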
Ah, I think all leaf nodes suffer from this problem. Can you follow the way they fix it? For example:
override def sameResult(plan: LogicalPlan): Boolean = {
  plan.canonicalized match {
    case LocalRelation(otherOutput, otherData) =>
      otherOutput.map(_.dataType) == output.map(_.dataType) && otherData == data
    case _ => false
  }
}
ReuseExchange works for parquet/orc formats because FileSourceScanExec already overrides sameResult.
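As a rough illustration of why reuse hinges on sameResult, the sketch below models deduplication over a list of scan-like nodes. It is a simplified model and not Spark's ReuseExchange rule: ScanNode, its id field (standing in for per-instance state such as exprIds), and ReuseSketch.dedupe are all hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical scan node: `id` plays the role of per-instance state such as exprIds.
final case class ScanNode(id: Int, table: String, columns: Seq[String]) {
  // Semantic comparison that ignores the per-instance id.
  def sameResult(other: ScanNode): Boolean =
    table == other.table && columns == other.columns
}

object ReuseSketch {
  // For each node, return the first earlier node that produces the same result,
  // or the node itself when no such node has been seen yet.
  def dedupe(nodes: Seq[ScanNode]): Seq[ScanNode] = {
    val seen = mutable.ArrayBuffer.empty[ScanNode]
    nodes.map { n =>
      seen.find(_.sameResult(n)) match {
        case Some(existing) => existing
        case None =>
          seen += n
          n
      }
    }
  }
}
```

If sameResult fell back to plain equality, the differing id values would prevent any node from being reused in this model.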
Force-pushed from 8e537a1 to d9ba28d
Test build #65017 has finished for PR 14988 at commit
Test build #65018 has finished for PR 14988 at commit
this comment doesn't match the code. Can you explain more about why the default cleanExpression doesn't work?
Sorry for the delay. See the picture below: in the default sameResult, left.cleanArgs == right.cleanArgs returns false, because equals in MetastoreRelation compares the output (AttributeReference) and the exprIds differ. cleanExpression can't clean the exprId in an AttributeReference, so I think we need to override sameResult in HiveTableScanExec, as FileSourceScanExec does. Let me know if my explanation is unclear.

Test build #65441 has finished for PR 14988 at commit
can you follow the existing workaround?
override def sameResult(plan: LogicalPlan): Boolean = {
  plan.canonicalized match {
    case LocalRelation(otherOutput, otherData) =>
      otherOutput.map(_.dataType) == output.map(_.dataType) && otherData == data
    case _ => false
  }
}
then we don't need to override cleanExpression
I only override sameResult, as FileSourceScanExec does. I think HiveTableScanExec is used for text format tables and FileSourceScanExec for parquet/orc formats.
Force-pushed from d9ba28d to e410d14
val result = relation.sameResult(other.relation) &&
  output.length == other.output.length &&
  output.zip(other.output)
    .forall(p => p._1.name == p._2.name && p._1.dataType == p._2.dataType) &&
does the name matter? I'm not quite sure, but LogicalRelation only checks data type
For example, the full output of table src is (key: Int, value: Int), output1 is (key: Int), and output2 is (value: Int). The data types of output1 and output2 are the same, but the outputs differ and can't be reused.
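The example above can be sketched in plain Scala without Spark. Column and OutputCompare are hypothetical names used only for illustration: sameTypes mirrors a check on data types alone, and sameNamesAndTypes mirrors the name-and-type check from the diff.

```scala
// Hypothetical stand-in for an output attribute: just a name and a data type.
final case class Column(name: String, dataType: String)

object OutputCompare {
  // Compares data types only, position by position.
  def sameTypes(a: Seq[Column], b: Seq[Column]): Boolean =
    a.map(_.dataType) == b.map(_.dataType)

  // Compares both names and data types, position by position.
  def sameNamesAndTypes(a: Seq[Column], b: Seq[Column]): Boolean =
    a.length == b.length &&
      a.zip(b).forall { case (x, y) => x.name == y.name && x.dataType == y.dataType }
}
```

With output1 = Seq(Column("key", "int")) and output2 = Seq(Column("value", "int")), the type-only check treats the two outputs as interchangeable, while the name-and-type check tells them apart.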
Test build #65646 has finished for PR 14988 at commit
LGTM, merging to master!
What changes were proposed in this pull request?
This PR overrides sameResult in HiveTableScanExec to make ReuseExchange work for text format tables.
How was this patch tested?
SQL
Before
After
cc: @davies @cloud-fan