[SPARK-6865][SQL] DataFrame column names should be treated as string literals #5505
…literals. For example, "a.b" should match a column named `a.b`.
Note @marmbrus: our new resolver semantics break this test. Not sure how important it is.
The more I think about this, the more I am worried that we can't make a change this large. There is no way to express self-join queries if we don't handle `.` in column names. We are also going to break lots of existing user code...
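To make the concern concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's resolver: the function names `resolve_literal` and `resolve_qualified`, and the representation of columns as `(alias, name)` pairs, are assumptions made purely for illustration.

```python
# Toy model only: this is NOT Spark's actual resolver.
# A column is modelled as an (alias, name) pair.

def resolve_literal(name, columns):
    """Treat the whole string as a column name, dots included."""
    return [c for c in columns if c[1] == name]

def resolve_qualified(name, columns):
    """Treat 'q.col' as column `col` of the relation aliased `q`."""
    if "." in name:
        qualifier, col = name.split(".", 1)
        return [c for c in columns if c == (qualifier, col)]
    return [c for c in columns if c[1] == name]

# Output columns of a self join: one table under two aliases.
joined = [("x", "key"), ("x", "value"), ("y", "key"), ("y", "value")]

# Literal semantics: no column is literally named "x.key" ...
literal_hits = resolve_literal("x.key", joined)
# ... and the bare name matches both sides of the join.
ambiguous_hits = resolve_literal("key", joined)
# Qualified semantics can pick exactly one side.
qualified_hits = resolve_qualified("x.key", joined)

# The flip side: a column whose name really contains a dot.
dotted = [("t", "a.b")]
dotted_literal = resolve_literal("a.b", dotted)      # found
dotted_qualified = resolve_qualified("a.b", dotted)  # not found
```

Under these assumptions, purely literal names leave no way to disambiguate the two sides of a self join, while purely qualified names cannot reach a column actually called `a.b`.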
Test build #30221 has finished for PR 5505 at commit
and @liancheng I had to disable this test as well since it used "tablename.columnname".
I guess it should be OK to disable or even remove this test now, since we now check for invalid field names explicitly and suggest that users add aliases. See #5263.
Test build #30227 has finished for PR 5505 at commit
I discussed this with Michael offline. Given that this would break self joins, we've decided to treat the dot as a special case.
As #5638 handled self join correctly, should we reopen this PR?
That one actually doesn't handle most self join cases, since very often in self joins you join on different keys.