-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8782] [SQL] Fix code generation for ORDER BY NULL #7179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This should block on #7176 being merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is total overkill, but it's a neat example of how randomized data generation plus a list of types can be used for exploratory testing.
|
Merged build triggered. |
|
Merged build started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on this, I discovered that RowOrdering and GenerateOrdering disagree for inputs containing NaN. This isn't a bug per-se, since many systems have undefined behavior when sorting on NaN. For this reason, I think that some databases prohibit NaN and Infinity from being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we might use sorting for clustering as part of a sort-based distinct operator, I wonder whether this has any bad implications for performing distinct on columns that contain NaN. Should we warn about this undefined behavior somewhere in our documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that it's actually possible to crash the Sort operator with "Comparison method violates its general contract!" errors if NaNs are present in the column being sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
damn
|
Test build #36352 has started for PR 7179 at commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nesting of the loops here is slightly misleading, because we'll always report a passed test for types where we don't have a data generator. We at least test that we're able to generate code for the ordering even if we don't actually execute that code. Maybe this is an okay trade-off, but it's a concern to watch out for.
|
Test build #36352 has finished for PR 7179 at commit
|
|
Merged build finished. Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @davies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Can you take the testing stuff out of this pr and merge this first? |
|
Merged build triggered. |
|
Merged build started. |
21325e2 to
6ef49a6
Compare
|
@rxin, done. |
|
LGTM/ |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #36441 has started for PR 7179 at commit |
|
Merged build finished. Test FAILed. |
|
Test build #36441 has finished for PR 7179 at commit
|
|
Merged build finished. Test FAILed. |
|
Jenkins, retest this please. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #36445 has started for PR 7179 at commit |
|
Test build #36445 has finished for PR 7179 at commit
|
|
Merged build finished. Test PASSed. |
|
Thanks - merged. |
This fixes code generation for queries containing
ORDER BY NULL. Previously, the generated code would fail to compile.