[SPARK-12395] [SQL] fix resulting columns of outer join #10353
cc @rxin @liancheng
Nit: Here we can use an AttributeSet for convenience:
val joinRefs = AttributeSet(condition.toSeq.flatMap(_.references))
val resultCols = joinedCols ++ joined.output.filterNot(joinRefs.contains)
Maybe not, as the comment said, we can't compare Attribute here
Yes, but that is exactly what an AttributeSet is for. It is a set that only compares the id.
@liancheng @marmbrus Oh, I see, missed that, thanks!
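To illustrate the point about id-only comparison, here is a minimal sketch in plain Scala. `Attr` and `AttrSet` are hypothetical stand-ins, not Catalyst's real `Attribute` and `AttributeSet`, and the assumption that the two copies of the attribute differ only in nullability is mine, made for the sake of the example.

```scala
object AttrSetSketch {
  // Hypothetical stand-in for an attribute: equality covers every field.
  final case class Attr(name: String, exprId: Long, nullable: Boolean)

  // Hypothetical stand-in for an id-based set: membership looks at exprId only.
  final class AttrSet(attrs: Seq[Attr]) {
    private val ids: Set[Long] = attrs.map(_.exprId).toSet
    def contains(a: Attr): Boolean = ids.contains(a.exprId)
  }

  val key = Attr("key", 1L, nullable = false)
  // Same attribute id, but nullable (assumed here to happen after an outer join).
  val keyAfterJoin = key.copy(nullable = true)
  val value = Attr("value", 2L, nullable = true)
  val output = Seq(keyAfterJoin, value)

  // Plain equality does not recognise keyAfterJoin as the join key, so it is kept.
  val byEquality: Seq[String] =
    output.filterNot(a => Seq(key).contains(a)).map(_.name)

  // Id-based membership recognises it, so only the non-key column remains.
  val joinRefs = new AttrSet(Seq(key))
  val byId: Seq[String] = output.filterNot(joinRefs.contains).map(_.name)

  def main(args: Array[String]): Unit = {
    println(byEquality)
    println(byId)
  }
}
```

In this sketch `byEquality` still contains `key` while `byId` does not, which is the behaviour the suggestion relies on.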
LGTM except for one minor issue.
Test build #47914 has finished for PR 10353 at commit
Does 1.5 have this problem?
@yhuai No, this is a new feature in 1.6.
For API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (will be null).

The order of columns has been changed to match that of MySQL and PostgreSQL [1].

This PR also fixes the nullability of the output for outer joins.

[1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html

Author: Davies Liu <[email protected]>

Closes #10353 from davies/fix_join.

(cherry picked from commit a170d34)
Signed-off-by: Davies Liu <[email protected]>
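The null join columns and the column order described above can be illustrated without Spark. The sketch below is a plain-Scala model, not the code from this PR: `JoinedRow`, `outputKey`, and `outputColumns` are hypothetical names, and the key selection follows SQL's `USING` semantics (the left key for inner and left outer joins, the right key for right outer joins, and a coalesce of both for full outer joins).

```scala
object UsingJoinSketch {
  // Hypothetical model of one joined row: the key as seen from each side.
  // None means that side had no matching row.
  final case class JoinedRow(leftKey: Option[Int], rightKey: Option[Int])

  // Pick the value for the single output join column.
  def outputKey(row: JoinedRow, joinType: String): Option[Int] = joinType match {
    case "right_outer" => row.rightKey                        // left side may be null
    case "full_outer"  => row.leftKey.orElse(row.rightKey)    // coalesce(left, right)
    case _             => row.leftKey                         // inner, left_outer
  }

  // Column order as in PostgreSQL's JOIN ... USING: join columns first,
  // then the remaining left columns, then the remaining right columns.
  def outputColumns(usingCols: Seq[String],
                    left: Seq[String],
                    right: Seq[String]): Seq[String] =
    usingCols ++
      left.filterNot(c => usingCols.contains(c)) ++
      right.filterNot(c => usingCols.contains(c))

  def main(args: Array[String]): Unit = {
    val unmatchedOnLeft = JoinedRow(leftKey = None, rightKey = Some(3))
    // Always taking the left key would yield None for this row.
    println(outputKey(unmatchedOnLeft, "right_outer"))
    println(outputKey(unmatchedOnLeft, "full_outer"))
    println(outputColumns(Seq("id"), Seq("a", "id"), Seq("id", "b")))
  }
}
```

A row that exists only on the right side shows the problem: always projecting the left key gives a null join column, whereas selecting by join type keeps the key value.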