[SPARK-24562][TESTS] Support different configs for same test in SQLQueryTestSuite #21568
|
cc @cloud-fan |
|
Test build #91859 has finished for PR 21568 at commit
|
| @@ -0,0 +1,193 @@ | |||
| -- Automatically generated by SQLQueryTestSuite | |||
This new feature looks good to me. By the way, if we set multiple configurations, won't the filenames get too long? How about putting the configuration info at the head of each file instead? Then the filenames would be:
./decimalArithmeticOperations.sql.out.1
./decimalArithmeticOperations.sql.out.2
...
Since we have two cases:
1 - Different configs produce different results (so different files) and in this case your suggestion is fine;
2 - Different configs produce the same results (so we have one golden file for all of them), how would you address this case?
I was thinking that in the second case we would also output one result file per config, all with the same content.
// these files have the same result
subquery/in-subquery/in-joins.sql.out.1 <- sort-merge joins
subquery/in-subquery/in-joins.sql.out.2 <- hash joins
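The numbered-file proposal above can be sketched as follows. This is only an illustration of the naming idea under discussion, not code from the PR: `NumberedGoldenFiles`, `resultFiles`, and `header` are hypothetical names, and the header layout is an assumption.

```scala
object NumberedGoldenFiles {
  // One config set is a list of key -> value pairs applied together.
  type ConfigSet = Seq[(String, String)]

  // Hypothetical helper: pairs each config set with "<sqlFile>.out.<n>", n starting at 1.
  def resultFiles(sqlFile: String, configSets: Seq[ConfigSet]): Seq[(String, ConfigSet)] =
    configSets.zipWithIndex.map { case (configs, i) =>
      s"$sqlFile.out.${i + 1}" -> configs
    }

  // Hypothetical header that records the configs as comments at the top of a golden file.
  def header(configs: ConfigSet): String = {
    val lines = configs.map { case (k, v) => s"-- $k=$v" }
    ("-- Automatically generated by SQLQueryTestSuite" +: lines).mkString("\n")
  }
}
```

With this scheme the file name stays short and the configs live in the header, at the cost of duplicating identical results when several config sets produce the same output.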
| "typeCoercion/native/decimalArithmeticOperations.sql" -> | ||
| (Seq(Seq(SQLConf.DECIMAL_OPERATIONS_ALLOW_PREC_LOSS.key -> "true"), | ||
| Seq(SQLConf.DECIMAL_OPERATIONS_ALLOW_PREC_LOSS.key -> "false")) -> false), | ||
| "subquery/in-subquery/in-joins.sql" -> configsAllJoinTypes, |
This entry and the others below don't change the existing test result output?
Exactly, changing these configs should not change the output.
|
I'm confused by the description. What does this PR actually do? |
|
IIUC this pr modified code to run a single test file (in |
|
@rxin the PR does what was suggested in these 2 comments #20023 (comment) and #21529 (comment). Basically we want to run the same SQL test file with different configs. We have two cases:
The goals are to avoid copying and pasting the same queries after setting different configurations (as it was done in |
|
Do we need to handle the second case strictly, too? If we accepted identical output files for that case, we could use the simpler output file naming rules I described here. |
|
Sorry, @maropu, I didn't get what you mean by
Could you please explain? Anyway, the problem with that proposal is not really about the filenames; it is about adding the configs inside the files as comments. If several configs share the same output file, we cannot include them in the header... |
| // We should ignore this file from processing. | ||
| ) | ||
|
|
||
| private val configsAllJoinTypes = Seq(Seq(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> |
Could you add a comment describing what each config set is for?
Sorry, I am not sure what you mean exactly. Every config has its own description, which explains what it is intended for. Did you mean something different?
Oh, I meant that here you clearly want to test against different join types, so it would be good to describe which join type each config set corresponds to.
I see now, got it, thanks. I am not sure as we do not usually do so in other test suites. @cloud-fan @maropu what do you think?
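A labelled version of the join config sets, as requested in this thread, might look like the sketch below. The two keys are real Spark SQL settings, written as plain strings so the example is self-contained; the values and the labels are assumptions, since the diff line above is cut off. The labels describe the strategy each set is meant to favor, and the planner may still choose differently for a given query.

```scala
object JoinConfigs {
  // Real Spark SQL config keys, written as plain strings to keep the sketch self-contained.
  val autoBroadcastJoinThreshold = "spark.sql.autoBroadcastJoinThreshold"
  val preferSortMergeJoin = "spark.sql.join.preferSortMergeJoin"

  // Each entry: (join strategy the set is meant to favor, config set).
  // Values are illustrative assumptions, not copied from the PR.
  val configsAllJoinTypes: Seq[(String, Seq[(String, String)])] = Seq(
    // A positive threshold lets small tables be broadcast.
    "broadcast hash join" -> Seq(autoBroadcastJoinThreshold -> "10485760"),
    // Broadcast disabled, sort-merge preferred.
    "sort-merge join" -> Seq(autoBroadcastJoinThreshold -> "-1", preferSortMergeJoin -> "true"),
    // Broadcast disabled, sort-merge not preferred.
    "shuffled hash join" -> Seq(autoBroadcastJoinThreshold -> "-1", preferSortMergeJoin -> "false"))
}
```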
| private def testCases(inputPath: String): Seq[TestCase] = { | ||
| val baseResultFileName = inputPath.replace(inputFilePath, goldenFilePath) | ||
| val testCaseName = inputPath.stripPrefix(inputFilePath).stripPrefix(File.separator) | ||
| testConfigs.get(testCaseName) match { |
Once a test is renamed, it might silently fall back to the default test config. To avoid that, maybe we should explicitly define which test cases need to run against custom configs.
Mmmh... Well, actually it is not really silent, as you would see in the test runs that it executes only once and not many times with different suffixes. But of course it won't fail. @cloud-fan @maropu what do you think?
If we need to take care of this issue, how about listing all the test files inside SQLQueryTestSuite.scala? Then, when developers add, delete, or rename test files, they would need to update the list.
Honestly, I do not really like this idea @maropu; it requires extra effort which is not really needed...
|
In the same result case, I'm worried that we cannot easily understand which SQL configs cause failures? In this test case (common patterns?), we cannot understand which case (true or false) causes the failure at first glance. For example, can we use |
|
@maropu I think we can. Since the configs are appended as a suffix to the test case name (this also happens in the same-result case), you know which configs failed. |
|
kindly ping @cloud-fan |
|
I think it's super confusing to have the config names encoded in file names. It makes the names super long and difficult to read, makes it hard to verify what was set, and makes it difficult to specify multiple configs. |
|
Can you just define a config matrix in the beginning of the file, and each file is run with the config matrix? |
|
Sorry, what do you mean by a config matrix? And how would we distinguish whether the configs should all produce the same result or different ones? |
|
If they produce different results why do you need any infrastructure for them? They are just part of the normal test flow. If they produce the same result, and you don't want to define the same test queries twice, we can create an infra for that. I thought that's what this is about? |
|
The goal here was to address both cases. The need came up in previous PRs, to avoid copying and pasting the same queries in order to test them with different configs. So the idea here was to have an infra which covers both cases and avoids this copy-and-paste (see decimalOperations.sql for instance). |
|
What are the use cases other than decimal? I am not sure if we need to build a lot of infrastructure just for one or two use cases. |
|
The other main case here was to also improve the coverage for join operations, because with the default values we are testing only the broadcast join. |
|
It's good to cover both of the cases in one design, but I'd like to prioritize the join one. I feel it's common to try different optimization/runtime configs and make sure we get the correct result. It's more important than the decimal one, which just saves some typing. It seems hard to reach a consensus on a good design that covers both cases, so how about we just do the join one? I.e., a SQL test file can specify a config matrix (we need to design a syntax for it), and the test framework should run this test file with the specified configs and their values to make sure the results all match the golden file. |
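As a rough sketch of what "a SQL test file specifies its own configs" could mean, the snippet below reads config sets from comment lines at the top of a SQL file. The `--SET k1=v1,k2=v2` marker is an assumed syntax for illustration (this comment leaves the syntax open), and `ConfigMatrix.parse` is a hypothetical helper.

```scala
object ConfigMatrix {
  // Assumed marker for this sketch; the syntax is still undecided at this point in the thread.
  private val Marker = "--SET"

  // Parses every "--SET k1=v1,k2=v2" line into one config set (a list of key -> value pairs).
  def parse(sqlText: String): Seq[Seq[(String, String)]] =
    sqlText.split("\n").toSeq
      .map(_.trim)
      .filter(_.startsWith(Marker))
      .map { line =>
        line.stripPrefix(Marker).split(",").toSeq.map { pair =>
          pair.split("=", 2) match {
            case Array(key, value) => key.trim -> value.trim
            case _ => throw new IllegalArgumentException(s"Malformed config entry: $pair")
          }
        }
      }
}
```

A test framework could then run the file's queries once per parsed config set and compare every run against the same golden file; a file with no such lines yields an empty list and would run once with the defaults.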
|
I am not sure it is a great idea to do only one of the two scenarios if we plan to include both of them later, as we might have to redo the same work twice. But if you all agree on this plan, I'll stick to it. |
|
We can deal with the decimal test file specially if that's the only use case. For now I'd say the join test is more important and let's finish it first. |
|
To me it is actually confusing to have the decimal one in there at all, by defining a list of queries that are reused for different functional testing. It is very easy to just ignore the subtle differences.
We also risk over-engineering this with only one use case.
…On Tue, Jul 10, 2018 at 8:20 AM Wenchen Fan ***@***.***> wrote:
We can deal with the decimal test file specially if that's the only use
case. For now I'd say the join test is more important and let's finish it
first.
|
|
@cloud-fan @rxin I updated the PR in order to handle only the case in which we have the same result for different configs and the configs are specified in the SQL files. Do you all agree with this approach? |
|
Test build #92857 has finished for PR 21568 at commit
|
| runQueries(queries, testCase.resultFile, None) | ||
| } else { | ||
| configSets.foreach { configSet => | ||
| try { |
I think it's better to do the try-catch inside runQueries, so that we can know which config causes a failure.
Mmh, but we already know which configs cause a failure here too (and we are logging them). Am I missing something from your comment?
ah sorry I misread the code.
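The point settled in this thread, that a failure should identify the config set that triggered it, can be illustrated with a small standalone sketch. It is not the PR's code: `PerConfigRunner` and `runWithEachConfig` are hypothetical names, and this version wraps the error with the configs in its message, whereas the discussion above says the PR logs them.

```scala
import scala.util.control.NonFatal

object PerConfigRunner {
  type ConfigSet = Seq[(String, String)]

  // Runs `body` once per config set. On failure, rethrows with the offending
  // config set in the message and the original error kept as the cause.
  def runWithEachConfig(configSets: Seq[ConfigSet])(body: ConfigSet => Unit): Unit =
    configSets.foreach { configSet =>
      try {
        body(configSet)
      } catch {
        case NonFatal(e) =>
          val configs = configSet.map { case (k, v) => s"$k=$v" }.mkString(",")
          throw new RuntimeException(s"Failed with configs: $configs", e)
      }
    }
}
```

Because the try-catch sits around each iteration, the first failing config set stops the loop and is named in the resulting error.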
|
LGTM |
|
thanks, merging to master! |
What changes were proposed in this pull request?
The PR proposes to add support for running the same SQL test input files against different configs leading to the same result.
How was this patch tested?
Involved UTs