Skip to content

Conversation

@kazuyukitanimura
Copy link
Contributor

What changes were proposed in this pull request?

This is a follow-up PR to fix the bug introduced by SPARK-36665. With this fix, NotPropagation optimization does not apply to InSubquery cases.

Why are the changes needed?

NotPropagation optimization previously broke RewritePredicateSubquery so that it does not properly rewrite the predicate to a NULL-aware left anti join anymore.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test added

@github-actions github-actions bot added the SQL label Feb 5, 2022
@kazuyukitanimura
Copy link
Contributor Author

@aokolnychyi Thank you for finding the issue. cc @viirya

@viirya
Copy link
Member

viirya commented Feb 5, 2022

cc @cloud-fan

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add an end-to-end test? We can make sure if the query result is correct.

@HyukjinKwon
Copy link
Member

cc @allisonwang-db FYI

@cloud-fan
Copy link
Contributor

Are there any real-world examples to demonstrate the effectiveness of the rule NotPropagation? I'm a bit worried about making this rule more complicated while this rule has nearly no visible benefit...

@kazuyukitanimura
Copy link
Contributor Author

Thanks @cloud-fan My goal here is to unblock #35395. As you mentioned your concern about the complexity of this logic, I opened #35428 to remove NotPropagation for now.

For the effectiveness, I put many examples at

test("Using (Not(a) === b) == (a === Not(b)), (Not(a) <=> b) == (a <=> Not(b)) rules") {
checkCondition(Not('e) === Literal(true), 'e === Literal(false))
checkCondition(Not('e) === Literal(false), 'e === Literal(true))
checkCondition(Not('e) === Literal(null, BooleanType), testRelation)
checkCondition(Literal(true) === Not('e), Literal(false) === 'e)
checkCondition(Literal(false) === Not('e), Literal(true) === 'e)
checkCondition(Literal(null, BooleanType) === Not('e), testRelation)
checkCondition(Not('e) <=> Literal(true), 'e <=> Literal(false))
checkCondition(Not('e) <=> Literal(false), 'e <=> Literal(true))
checkCondition(Not('e) <=> Literal(null, BooleanType), IsNull('e))
checkCondition(Literal(true) <=> Not('e), Literal(false) <=> 'e)
checkCondition(Literal(false) <=> Not('e), Literal(true) <=> 'e)
checkCondition(Literal(null, BooleanType) <=> Not('e), IsNull('e))
checkCondition(Not('e) === Not('f), 'e === 'f)
checkCondition(Not('e) <=> Not('f), 'e <=> 'f)
checkCondition(IsNull('e) === Not('f), IsNotNull('e) === 'f)
checkCondition(Not('e) === IsNull('f), 'e === IsNotNull('f))
checkCondition(IsNull('e) <=> Not('f), IsNotNull('e) <=> 'f)
checkCondition(Not('e) <=> IsNull('f), 'e <=> IsNotNull('f))
checkCondition(IsNotNull('e) === Not('f), IsNull('e) === 'f)
checkCondition(Not('e) === IsNotNull('f), 'e === IsNull('f))
checkCondition(IsNotNull('e) <=> Not('f), IsNull('e) <=> 'f)
checkCondition(Not('e) <=> IsNotNull('f), 'e <=> IsNull('f))
checkCondition(Not('e) === Not(And('f, 'g)), 'e === And('f, 'g))
checkCondition(Not(And('e, 'f)) === Not('g), And('e, 'f) === 'g)
checkCondition(Not('e) <=> Not(And('f, 'g)), 'e <=> And('f, 'g))
checkCondition(Not(And('e, 'f)) <=> Not('g), And('e, 'f) <=> 'g)
checkCondition(Not('e) === Not(Or('f, 'g)), 'e === Or('f, 'g))
checkCondition(Not(Or('e, 'f)) === Not('g), Or('e, 'f) === 'g)
checkCondition(Not('e) <=> Not(Or('f, 'g)), 'e <=> Or('f, 'g))
checkCondition(Not(Or('e, 'f)) <=> Not('g), Or('e, 'f) <=> 'g)
checkCondition(('a > 'b) === Not('f), ('a <= 'b) === 'f)
checkCondition(Not('e) === ('a > 'b), 'e === ('a <= 'b))
checkCondition(('a > 'b) <=> Not('f), ('a <= 'b) <=> 'f)
checkCondition(Not('e) <=> ('a > 'b), 'e <=> ('a <= 'b))
checkCondition(('a >= 'b) === Not('f), ('a < 'b) === 'f)
checkCondition(Not('e) === ('a >= 'b), 'e === ('a < 'b))
checkCondition(('a >= 'b) <=> Not('f), ('a < 'b) <=> 'f)
checkCondition(Not('e) <=> ('a >= 'b), 'e <=> ('a < 'b))
checkCondition(('a < 'b) === Not('f), ('a >= 'b) === 'f)
checkCondition(Not('e) === ('a < 'b), 'e === ('a >= 'b))
checkCondition(('a < 'b) <=> Not('f), ('a >= 'b) <=> 'f)
checkCondition(Not('e) <=> ('a < 'b), 'e <=> ('a >= 'b))
checkCondition(('a <= 'b) === Not('f), ('a > 'b) === 'f)
checkCondition(Not('e) === ('a <= 'b), 'e === ('a > 'b))
checkCondition(('a <= 'b) <=> Not('f), ('a > 'b) <=> 'f)
checkCondition(Not('e) <=> ('a <= 'b), 'e <=> ('a > 'b))
}
test("Using (a =!= b) == (a === Not(b)), Not(a <=> b) == (a <=> Not(b)) rules") {
checkCondition('e =!= Literal(true), 'e === Literal(false))
checkCondition('e =!= Literal(false), 'e === Literal(true))
checkCondition('e =!= Literal(null, BooleanType), testRelation)
checkCondition(Literal(true) =!= 'e, Literal(false) === 'e)
checkCondition(Literal(false) =!= 'e, Literal(true) === 'e)
checkCondition(Literal(null, BooleanType) =!= 'e, testRelation)
checkCondition(Not(('a <=> 'b) <=> Literal(true)), ('a <=> 'b) <=> Literal(false))
checkCondition(Not(('a <=> 'b) <=> Literal(false)), ('a <=> 'b) <=> Literal(true))
checkCondition(Not(('a <=> 'b) <=> Literal(null, BooleanType)), testRelationWithData)
checkCondition(Not(Literal(true) <=> ('a <=> 'b)), Literal(false) <=> ('a <=> 'b))
checkCondition(Not(Literal(false) <=> ('a <=> 'b)), Literal(true) <=> ('a <=> 'b))
checkCondition(Not(Literal(null, BooleanType) <=> IsNull('e)), testRelationWithData)
checkCondition('e =!= Not('f), 'e === 'f)
checkCondition(Not('e) =!= 'f, 'e === 'f)
checkCondition(Not(('a <=> 'b) <=> Not(('b <=> 'c))), ('a <=> 'b) <=> ('b <=> 'c))
checkCondition(Not(Not(('a <=> 'b)) <=> ('b <=> 'c)), ('a <=> 'b) <=> ('b <=> 'c))
checkCondition('e =!= IsNull('f), 'e === IsNotNull('f))
checkCondition(IsNull('e) =!= 'f, IsNotNull('e) === 'f)
checkCondition(Not(('a <=> 'b) <=> IsNull('f)), ('a <=> 'b) <=> IsNotNull('f))
checkCondition(Not(IsNull('e) <=> ('b <=> 'c)), IsNotNull('e) <=> ('b <=> 'c))
checkCondition('e =!= IsNotNull('f), 'e === IsNull('f))
checkCondition(IsNotNull('e) =!= 'f, IsNull('e) === 'f)
checkCondition(Not(('a <=> 'b) <=> IsNotNull('f)), ('a <=> 'b) <=> IsNull('f))
checkCondition(Not(IsNotNull('e) <=> ('b <=> 'c)), IsNull('e) <=> ('b <=> 'c))
checkCondition('e =!= Not(And('f, 'g)), 'e === And('f, 'g))
checkCondition(Not(And('e, 'f)) =!= 'g, And('e, 'f) === 'g)
checkCondition('e =!= Not(Or('f, 'g)), 'e === Or('f, 'g))
checkCondition(Not(Or('e, 'f)) =!= 'g, Or('e, 'f) === 'g)
checkCondition(('a > 'b) =!= 'f, ('a <= 'b) === 'f)
checkCondition('e =!= ('a > 'b), 'e === ('a <= 'b))
checkCondition(('a >= 'b) =!= 'f, ('a < 'b) === 'f)
checkCondition('e =!= ('a >= 'b), 'e === ('a < 'b))
checkCondition(('a < 'b) =!= 'f, ('a >= 'b) === 'f)
checkCondition('e =!= ('a < 'b), 'e === ('a >= 'b))
checkCondition(('a <= 'b) =!= 'f, ('a > 'b) === 'f)
checkCondition('e =!= ('a <= 'b), 'e === ('a > 'b))
checkCondition('e =!= ('f === ('g === Not('h))), 'e === ('f === ('g === 'h)))
}
that helps simplifying users' queries.

@kazuyukitanimura
Copy link
Contributor Author

Closing by preferring #35428

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants