@davies
Contributor

@davies davies commented Dec 17, 2015

For the API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the join columns in the result could be wrong (they come out as null for rows that have no match on the left side).

The column order has been changed to match that of MySQL and PostgreSQL [1].

This PR also fixes the nullability of the output for outer joins.

[1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
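
To make the description above concrete, here is a minimal sketch in plain Scala (no Spark dependency) of a full outer join on a single USING column. It is a toy model, not the code from this PR: the names UsingJoinSketch, Row, fullOuter and coalesceKey are hypothetical, and keys are assumed to be unique on each side. With coalesceKey = false the join column is taken from the left side only, so right-only rows get a null (None) key; with coalesceKey = true the key falls back to the right side, which is the behavior one would expect from PostgreSQL-style USING semantics.

```scala
// Toy model of a USING join on a single key column. This is NOT Spark code;
// all names here are hypothetical and keys are assumed unique on each side.
object UsingJoinSketch {
  // Output row: join column first, then the left value, then the right value.
  final case class Row(key: Option[Int], left: Option[String], right: Option[String])

  // Full outer join. `coalesceKey = false` mimics taking the join column from
  // the left side only, which yields None (null) for right-only rows.
  def fullOuter(
      l: Map[Int, String],
      r: Map[Int, String],
      coalesceKey: Boolean): Seq[Row] = {
    val keys = (l.keySet ++ r.keySet).toSeq.sorted
    keys.map { k =>
      val leftKey = if (l.contains(k)) Some(k) else None
      val rightKey = if (r.contains(k)) Some(k) else None
      // Coalesce: prefer the left key, fall back to the right key.
      val outKey = if (coalesceKey) leftKey.orElse(rightKey) else leftKey
      Row(outKey, l.get(k), r.get(k))
    }
  }
}
```

Calling UsingJoinSketch.fullOuter(Map(1 -> "a", 2 -> "b"), Map(2 -> "x", 3 -> "y"), coalesceKey = false) leaves the key of the row for 3 empty, while coalesceKey = true fills it in.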

@davies
Contributor Author

davies commented Dec 17, 2015

cc @rxin @liancheng

Contributor

Nit: Here we can use an AttributeSet for convenience:

val joinRefs = AttributeSet(condition.toSeq.flatMap(_.references))
val resultCols = joinedCols ++ joined.output.filterNot(joinRefs.contains)

Contributor Author

Maybe not. As the comment says, we can't compare Attributes here.

Contributor

Yes, but that is exactly what an AttributeSet is for. It is a set that compares attributes only by their expression id.

Contributor Author

@liancheng @marmbrus Oh, I see, missed that, thanks!
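
The point made in the thread above, that an id-only set sidesteps full attribute equality, can be sketched in plain Scala. This is a simplified stand-in, not Catalyst's AttributeSet: Attr, IdSet and resultCols are hypothetical names, and the only assumption carried over from the discussion is that membership is decided by the id alone. It shows why a plain equality check misses a column whose nullability changed after an outer join, while the id-based check still finds it.

```scala
// Simplified stand-in for the idea discussed above; not Catalyst code.
object AttributeSetSketch {
  // Hypothetical attribute: case-class equality compares every field,
  // so changing `nullable` produces a value that is no longer equal.
  final case class Attr(name: String, nullable: Boolean, exprId: Long)

  // Hypothetical id-only set: membership looks at exprId and nothing else.
  final class IdSet(attrs: Seq[Attr]) {
    private val ids: Set[Long] = attrs.map(_.exprId).toSet
    def contains(a: Attr): Boolean = ids.contains(a.exprId)
  }

  // Join columns first, then every output column not referenced by the join keys.
  def resultCols(joinedCols: Seq[Attr], output: Seq[Attr], joinRefs: Seq[Attr]): Seq[Attr] = {
    val refs = new IdSet(joinRefs)
    joinedCols ++ output.filterNot(refs.contains)
  }
}
```

With a key column k and a copy of it marked nullable, Seq(k).contains(copy) is false, but new IdSet(Seq(k)).contains(copy) is true, so the filter drops the duplicate key column as intended.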

@liancheng
Contributor

LGTM except for one minor issue.

@SparkQA

SparkQA commented Dec 17, 2015

Test build #47914 has finished for PR 10353 at commit f5ab9cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Dec 17, 2015

Does 1.5 have this problem?

@davies
Contributor Author

davies commented Dec 17, 2015

@yhuai No, this is a new feature in 1.6

asfgit pushed a commit that referenced this pull request Dec 17, 2015
Author: Davies Liu <[email protected]>

Closes #10353 from davies/fix_join.

(cherry picked from commit a170d34)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in a170d34 Dec 17, 2015