[SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results #32454
Kubernetes integration test starting

Kubernetes integration test status failure

dongjoon-hyun left a comment
Sorry, but I'm not sure why we need this merging, @maropu. Are we dropping TPCDS v1.4 gradually?
Test build #138212 has finished for PR 32454 at commit

Ah, on second thought, it is okay just to filter out these queries in

Test build #138222 has finished for PR 32454 at commit

Kubernetes integration test unable to build dist. exiting with code: 1
Force-pushed from 18c6875 to 386d666
Kubernetes integration test starting

Kubernetes integration test status failure

Test build #138226 has finished for PR 32454 at commit

cc: @HyukjinKwon
dongjoon-hyun left a comment
I agree with the purpose of this PR. +1.
cc @gatorsmile and @cloud-fan too since this has been here for a long time.
Thank you, @dongjoon-hyun ~ Merged to master.
…se flaky test results

This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from that in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the output rows of the `tpcds/q6.sql` query were sorted only by `cnt`: https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20

Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`: https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22

So, I think it's okay to test only `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).

For stable testing.

No, dev-only.

GA passed.

Closes apache#32454 from maropu/CleanUpTpcdsQueries.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
```diff
 // The TPCDS queries below are based on v1.4
-val tpcdsQueries = Seq(
+def tpcdsQueries: Seq[String] = Seq(
   "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",
```
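Turning the shared query list into a `def` is what lets an individual suite narrow it. The following is a minimal sketch, assuming simplified, hypothetical stand-ins `TpcdsQueriesBase` and `FilteredQuerySuite` (not the actual Spark classes) and a shortened query list; it only illustrates the override-and-filter pattern, not the real suite.

```scala
// Hypothetical stand-in for the base trait that declares the shared query list.
trait TpcdsQueriesBase {
  // Declared as a `def` so that subclasses can override it and refer to `super`.
  def tpcdsQueries: Seq[String] = Seq("q1", "q2", "q3", "q4", "q5", "q6", "q7", "q75")
}

// Hypothetical stand-in for a suite that skips queries whose ORDER BY is not total.
class FilteredQuerySuite extends TpcdsQueriesBase {
  // A Set[String] is also a String => Boolean, so it can be passed to filterNot directly.
  override def tpcdsQueries: Seq[String] =
    super.tpcdsQueries.filterNot(Set("q6", "q75"))
}

object FilterDemo {
  def main(args: Array[String]): Unit = {
    // Prints the remaining queries: q1, q2, q3, q4, q5, q7
    println(new FilteredQuerySuite().tpcdsQueries.mkString(", "))
  }
}
```

One reason to prefer `def` here: Scala 2 rejects `super.tpcdsQueries` when the parent member is a `val`, so the filtered override would not compile against the original declaration.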
Shall we remove q6 from here for all the tests, if the only difference is an extra ORDER BY column?
Okay, I'll check it and make a PR to fix it.
See #32520
What changes were proposed in this pull request?
This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from that in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the output rows of the `tpcds/q6.sql` query were sorted only by `cnt`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
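Why sorting only by `cnt` is fragile can be illustrated with plain Scala collections. This is a sketch, not Spark code: the `(state, cnt)` rows are made up, the object name `TieOrderDemo` is hypothetical, and the two input orders stand in for the different orders in which rows may reach the final sort. Because `sortBy` is stable, rows that tie on `cnt` keep their input order, so the result depends on that order; adding the state as a tie-breaker, as `tpcds-v2.7.0/q6.sql` does with `a.ca_state`, makes the ordering total.

```scala
object TieOrderDemo {
  // Made-up (state, cnt) rows; the values are illustrative only.
  val rowsA: Seq[(String, Int)] = Seq(("CA", 12), ("TX", 10), ("NY", 10))
  // The same rows, arriving in a different order.
  val rowsB: Seq[(String, Int)] = Seq(("NY", 10), ("CA", 12), ("TX", 10))

  // Analogous to ORDER BY cnt: rows tied on cnt keep whatever order they arrived in.
  def byCnt(rows: Seq[(String, Int)]): Seq[(String, Int)] =
    rows.sortBy(_._2)

  // Analogous to ORDER BY cnt, a.ca_state: the state breaks ties, so the order is total.
  def byCntAndState(rows: Seq[(String, Int)]): Seq[(String, Int)] =
    rows.sortBy { case (state, cnt) => (cnt, state) }

  def main(args: Array[String]): Unit = {
    println(byCnt(rowsA) == byCnt(rowsB))                 // false: TX and NY swap places
    println(byCntAndState(rowsA) == byCntAndState(rowsB)) // true: same output either way
  }
}
```

A golden-file comparison of the `byCnt` output would therefore pass or fail depending on input order, while the `byCntAndState` output is stable.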
So, I think it's okay to test only `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).

Why are the changes needed?
For stable testing: to avoid flaky failures caused by a nondeterministic output row order.
Does this PR introduce any user-facing change?
No, dev-only.
How was this patch tested?
GA passed.