[SPARK-17045] [SQL] Build/move Join-related test cases in SQLQueryTestSuite #14625
Conversation
|
cc @rxin @petermaxlee @cloud-fan @hvanhovell Before moving all the test cases of auto joins, I want to confirm whether this is the right direction. Thanks! |
| @@ -0,0 +1,9 @@ | |||
| select sum(hash(a.k1,a.v1,a.k2, a.v2)) | |||
we don't guarantee hash consistency across versions so I'm not sure if it is a good idea to use hash here.
|
What exactly is the data set testing? I'm not a big fan of just pulling a non-trivial (say beyond 5 records) dataset in. It makes it very difficult to understand what exactly the test case is testing, and what the correct output should be. |
|
The original purpose of the Hive test sets is to verify the query results and behaviors of Auto Join Conversion. This is not applicable to Spark SQL. I assume the current purpose in Spark is just to check whether the query results match the outputs of Hive. Do you want me to completely remove them? Or reduce the data set to around 5 records and keep the queries untouched? |
|
I think we should create a comprehensive test suite for joins, and that should use a small dataset. The Hive thing I'm OK with keeping it for a while, but eventually we should remove them since we are not even sure what they are testing ... I don't remember a case in which these join tests in Hive caught an issue that was not caught by our regular join suites. |
|
Test build #63703 has finished for PR 14625 at commit
|
|
I completely agree. Now, if we still want to temporarily keep these test cases, what should we do next? Based on my understanding, we want to get rid of To remove these TestHive-specific classes, should we just use the existing |
|
I think
Now we have
|
|
@cloud-fan I see. Will try to rewrite the test cases with very few records. Thank you! |
|
Test build #63716 has finished for PR 14625 at commit
|
|
Test build #63717 has finished for PR 14625 at commit
|
|
Test build #63730 has finished for PR 14625 at commit
|
|
@gatorsmile the comment should apply not only to data, but also query (e.g. what case we are testing ...) |
|
@rxin Sure, will do it. |
|
FYI, found the original JIRA that delivered the first 25 auto_join test cases to Hive: https://issues.apache.org/jira/browse/HIVE-1642 |
|
Below is the output of Hive for the same queries. They are the same. |
|
Can we repurpose this ticket to just create test cases for joins in general? |
|
Sure. The scope is a little bit large, but let me try to go over the existing join-related test cases in the test suites. We might not be able to cover all of them in a single ticket. Will try my best. |
|
Test build #63737 has finished for PR 14625 at commit
|
|
Test build #63945 has finished for PR 14625 at commit
|
|
Test build #63947 has finished for PR 14625 at commit
|
|
Test build #63950 has finished for PR 14625 at commit
|
|
Test build #63957 has finished for PR 14625 at commit
|
|
@rxin @cloud-fan The code is ready for review. Thanks! |
|
Test build #64120 has finished for PR 14625 at commit
|
| @@ -0,0 +1,225 @@ | |||
| -- join nested table expressions (auto_join0.q) | |||
Do we need to reference the Hive .q file? I think the Hive golden file tests will be removed eventually.
: ) That is for helping reviewers know the origins of the queries. If you think we do not care, we can remove it.
|
Test build #64174 has finished for PR 14625 at commit
|
|
Test build #64181 has finished for PR 14625 at commit
|
|
retest this please |
|
Test build #64226 has finished for PR 14625 at commit
|
|
hmmm, can we split this PR into multiple PRs? We are not copying tests from |
|
Sure, will split it into multiple PRs. Thanks! |
What changes were proposed in this pull request?
#14498 plans to remove Hive Built-in Hash Functions. 10+ test cases are broken because the results are different from the Hive golden answer files. These broken test cases are not Hive specific. Thus, it makes more sense to move them to SQLQueryTestSuite.
Based on the file-based SQL end-to-end testing framework in SQLQueryTestSuite, this PR is to create test cases for joins in general. We will try to move most Join-specific SQL test cases into the new joins.sql file.
How was this patch tested?
This PR is just for improving test cases.
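For readers unfamiliar with the approach discussed in the review, the following is a minimal sketch of what a small-dataset, golden-answer join test can look like. It is not Spark's SQLQueryTestSuite and does not use any Spark API: it assumes Python's built-in sqlite3 module as a stand-in SQL engine, and the table names (t1, t2), rows, and case descriptions are hypothetical. Each case states what it is testing and compares full, ordered rows instead of a hash checksum, which reflects the two review concerns raised above.

```python
import sqlite3

# Hypothetical tiny tables (3 rows each); names and data are illustrative only
# and do not come from the PR.
SETUP = """
CREATE TABLE t1 (k INTEGER, v TEXT);
CREATE TABLE t2 (k INTEGER, v TEXT);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO t2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
"""

# Each case is (what is being tested, query, expected rows).
# The expected rows play the role of a golden answer file.
CASES = [
    ("inner join keeps only matching keys",
     "SELECT t1.k, t1.v, t2.v FROM t1 JOIN t2 ON t1.k = t2.k ORDER BY t1.k",
     [(2, "b", "x"), (3, "c", "y")]),
    ("left outer join pads unmatched left rows with NULL",
     "SELECT t1.k, t1.v, t2.v FROM t1 LEFT JOIN t2 ON t1.k = t2.k ORDER BY t1.k",
     [(1, "a", None), (2, "b", "x"), (3, "c", "y")]),
]


def run_cases():
    """Run every case against an in-memory database and report matches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    results = []
    for description, query, expected in CASES:
        actual = conn.execute(query).fetchall()
        results.append((description, actual == expected, actual))
    conn.close()
    return results


if __name__ == "__main__":
    for description, ok, actual in run_cases():
        print("PASS" if ok else "FAIL", description, actual)
```

With only a handful of rows, a reviewer can verify each expected result by hand, and the ORDER BY clause keeps the comparison deterministic without relying on a hash function.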