[SPARK-24562][TESTS] Support different configs for same test in SQLQueryTestSuite #21568

mgaido91 · 2018-06-14T17:07:38Z

What changes were proposed in this pull request?

The PR proposes to add support for running the same SQL test input files against different configs leading to the same result.

How was this patch tested?

Involved UTs

mgaido91 · 2018-06-14T17:07:50Z

cc @cloud-fan

SparkQA · 2018-06-14T20:33:05Z

Test build #91859 has finished for PR 21568 at commit ed01ff0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-06-16T03:54:35Z

...ive/decimalArithmeticOperations.sql_spark.sql.decimalOperations.allowPrecisionLoss-false.out

@@ -0,0 +1,193 @@
+-- Automatically generated by SQLQueryTestSuite


This new feature looks good to me. By the way, if we set multiple configurations, won't the filenames get too long? How about adding the configuration info to the header of each file instead? Then the filenames would be:

./decimalArithmeticOperations.sql.out.1
./decimalArithmeticOperations.sql.out.2
...

Since we have two cases:
1 - Different configs produce different results (so different files) and in this case your suggestion is fine;
2 - Different configs produce the same results (so we have one golden file for all of them), how would you address this case?

I was thinking that, in the second case, we would also output multiple identical result files, one per config:

// these files have the same result
subquery/in-subquery/in-joins.sql.out.1  <- sort-merge joins
subquery/in-subquery/in-joins.sql.out.2  <- hash joins
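A minimal Scala sketch of the numeric-suffix scheme proposed above, assuming each config set is recorded as comment lines at the top of its golden file. `NumericSuffixNaming`, `resultFileName`, and the `-- key=value` header format are hypothetical names and formats for illustration, not what this PR implements.

```scala
// Hypothetical sketch: numbered golden files with the configs recorded in a header.
object NumericSuffixNaming {
  type ConfigSet = Seq[(String, String)]

  // "decimalArithmeticOperations.sql.out" with index 1 -> "decimalArithmeticOperations.sql.out.1"
  def resultFileName(baseName: String, index: Int): String = s"$baseName.$index"

  // Header lines naming the configs that produced the file (the "-- key=value" format is assumed).
  def header(configs: ConfigSet): Seq[String] =
    "-- Automatically generated by SQLQueryTestSuite" +: configs.map { case (k, v) => s"-- $k=$v" }

  // Pairs each config set with its 1-based numbered result file.
  def resultFiles(baseName: String, configSets: Seq[ConfigSet]): Seq[(String, ConfigSet)] =
    configSets.zipWithIndex.map { case (configs, i) => resultFileName(baseName, i + 1) -> configs }
}
```

The file names stay short no matter how many configs are set, at the cost of having to open a file to see which configs it covers.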

maropu · 2018-06-16T03:57:35Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

+    "typeCoercion/native/decimalArithmeticOperations.sql" ->
+      (Seq(Seq(SQLConf.DECIMAL_OPERATIONS_ALLOW_PREC_LOSS.key -> "true"),
+        Seq(SQLConf.DECIMAL_OPERATIONS_ALLOW_PREC_LOSS.key -> "false")) -> false),
+    "subquery/in-subquery/in-joins.sql" -> configsAllJoinTypes,


This entry and the others below don't change the existing test result output?

Exactly, changing these configs should not change the output.

rxin · 2018-06-16T03:57:53Z

I'm confused by the description. What does this PR actually do?

maropu · 2018-06-16T04:02:15Z

IIUC, this PR modifies the code to run a single test file (in SQLQueryTestSuite) multiple times with different configurations.

mgaido91 · 2018-06-16T06:00:50Z

@rxin the PR does what was suggested in these 2 comments #20023 (comment) and #21529 (comment). Basically we want to run the same SQL test file with different configs.

We have two cases:

Running with different configs produces different output (so different golden files);
Running with different configs produces the same output (so we have only one golden file) but the tests are run against different configs.

The goals are to avoid copying and pasting the same queries after setting different configurations (as was done in decimalArithmeticOperations) and to improve test coverage for joins (because with the default configs we basically always execute broadcast joins).
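The two cases can be modeled roughly as below. This is a hypothetical sketch, not the PR's code: `TestConfigs`, its `sameOutput` flag, and `goldenFiles` are invented names. The per-config file name follows the `<input>_<key>-<value>.out` pattern visible in the golden file added by this PR; joining several key/value pairs with `_` is an assumption.

```scala
// Hypothetical model of the two cases; not the PR's actual data structures.
object ConfigSuffixedGoldenFiles {
  type ConfigSet = Seq[(String, String)]

  // configSets: the config sets to run; sameOutput: whether they must all match one golden file.
  final case class TestConfigs(configSets: Seq[ConfigSet], sameOutput: Boolean)

  // "<key>-<value>" per pair, as in this PR's golden file name; "_" between pairs is assumed.
  def suffix(configs: ConfigSet): String =
    configs.map { case (k, v) => s"$k-$v" }.mkString("_")

  // Returns the (golden file, config set) pairs to check for one input file.
  def goldenFiles(inputName: String, tc: TestConfigs): Seq[(String, ConfigSet)] =
    tc.configSets.map { configs =>
      if (tc.sameOutput) s"$inputName.out" -> configs // case 2: one shared golden file
      else s"${inputName}_${suffix(configs)}.out" -> configs // case 1: one file per config set
    }
}
```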

maropu · 2018-06-16T10:04:52Z

Do we need to strictly handle the second case, too? If we accepted duplicate output files for that case, we could have the simpler output file naming rule I described here.

mgaido91 · 2018-06-16T10:40:12Z

Sorry, @maropu, I didn't get what you mean by

If we accepted the same output files for that case

Could you please explain?

Anyway, the problem with that proposal is not really the filenames, but adding the configs inside the files as comments: if several configs share the same output file, we cannot put them in its header...

viirya · 2018-06-17T06:54:06Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

                      // We should ignore this file from processing.
  )

+  private val configsAllJoinTypes = Seq(Seq(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key ->


Could you add a comment describing what each config set is for?

Sorry, I am not sure what you mean exactly. Every config has its own description explaining what it is intended for. Did you mean something different?

Oh, I meant that since you specifically want to test different join types here, it would be good to describe which join type each config set targets.

I see now, got it, thanks. I am not sure as we do not usually do so in other test suites. @cloud-fan @maropu what do you think?
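One way to address the request for per-join-type comments, sketched in Scala. The keys `spark.sql.autoBroadcastJoinThreshold` and `spark.sql.join.preferSortMergeJoin` are real Spark SQL settings, but the values below are an assumption, since the PR's `configsAllJoinTypes` definition is cut off above; this is not the PR's actual definition.

```scala
// Hypothetical reconstruction; the PR's actual values are not shown in this thread.
object JoinConfigSets {
  val BroadcastThreshold = "spark.sql.autoBroadcastJoinThreshold"
  val PreferSortMerge = "spark.sql.join.preferSortMergeJoin"

  val configsAllJoinTypes: Seq[Seq[(String, String)]] = Seq(
    // Broadcast hash join: a 10 MB threshold (the default) lets small test tables be broadcast.
    Seq(BroadcastThreshold -> "10485760"),
    // Sort-merge join: disable broadcasting and prefer sort-merge joins.
    Seq(BroadcastThreshold -> "-1", PreferSortMerge -> "true"),
    // Shuffled hash join: disable broadcasting and stop preferring sort-merge joins;
    // Spark picks a shuffled hash join only when its size heuristics hold.
    Seq(BroadcastThreshold -> "-1", PreferSortMerge -> "false")
  )
}
```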

viirya · 2018-06-17T07:30:41Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

+  private def testCases(inputPath: String): Seq[TestCase] = {
+    val baseResultFileName = inputPath.replace(inputFilePath, goldenFilePath)
+    val testCaseName = inputPath.stripPrefix(inputFilePath).stripPrefix(File.separator)
+    testConfigs.get(testCaseName) match {


Once a test is renamed, it might silently fall back to the default test config. To avoid that, maybe we should explicitly define which test cases need to run against custom configs.

Mmmh... Well, actually it is not completely silent: when the test runs, you would see it executed only once rather than several times with different suffixes. But of course it won't fail. @cloud-fan @maropu what do you think?

If we need to take care of the issue, how about listing all the test files inside SQLQueryTestSuite.scala? Then, if developers add, delete, or rename test files, they would need to update the list.

Honestly, I do not really like this idea @maropu; it adds extra effort that is not really needed...
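A lighter-weight alternative to maintaining a full list of test files, sketched under the assumption that the suite knows both the test names it has custom configs for and the names of the files it discovered: fail fast when a config entry no longer matches any file, so a rename cannot silently drop the custom configs. `ConfigMapValidation` is a hypothetical helper.

```scala
// Hypothetical helper: detects config entries whose test file no longer exists.
object ConfigMapValidation {
  // Config-map keys that match no discovered test file, e.g. after a rename.
  def staleEntries(configuredTests: Set[String], existingTests: Set[String]): Set[String] =
    configuredTests.diff(existingTests)

  // Throws IllegalArgumentException listing the stale entries, if any.
  def validate(configuredTests: Set[String], existingTests: Set[String]): Unit = {
    val stale = staleEntries(configuredTests, existingTests)
    require(stale.isEmpty, s"Config entries refer to missing test files: ${stale.toSeq.sorted.mkString(", ")}")
  }
}
```

Only files that actually have custom configs need attention, so adding or deleting unrelated test files requires no extra bookkeeping.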

maropu · 2018-06-18T08:00:12Z

In the same-result case, I'm worried that we cannot easily tell which SQL configs cause a failure.
IMHO withSQLConf has the same issue:

Seq(true, false).foreach { flag =>
  withSQLConf("spark.sql.anyFlag" -> flag.toString) {
    // test body that should pass for both values of the flag
  }
}

With this (fairly common?) pattern, we cannot tell at first glance which case (true or false) caused the failure. Could we use withClue to solve this, for example?
master...maropu:AddConfigInfoInException
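The idea behind withClue can be sketched without a ScalaTest dependency: run the body, and if it throws, rethrow with the active configs in the message. `withConfigClue` is a hypothetical helper; inside a ScalaTest suite, `withClue(clue) { ... }` from `Assertions` plays a similar role.

```scala
import scala.util.control.NonFatal

// Hypothetical helper, not part of Spark or ScalaTest.
object ConfigClue {
  // Runs `body`; if it throws, rethrows an AssertionError whose message names the configs.
  def withConfigClue[T](configs: Seq[(String, String)])(body: => T): T = {
    val clue = configs.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]")
    try body
    catch {
      case NonFatal(e) =>
        throw new AssertionError(s"Failed with configs $clue: ${e.getMessage}", e)
    }
  }
}
```

In the loop above, wrapping the withSQLConf body in `withConfigClue(Seq("spark.sql.anyFlag" -> flag.toString)) { ... }` would make the failure message name the flag value.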

mgaido91 · 2018-06-18T09:57:36Z

@maropu I think we can tell, actually. Since the configs are appended as a suffix to the test case name (this also happens in the same-result case), you know which configs failed.

mgaido91 · 2018-07-09T10:08:46Z

kindly ping @cloud-fan

rxin · 2018-07-09T18:19:25Z

I think it's super confusing to have the config names encoded in file names. Makes the names super long and difficult to read, and also hard to verify what was set, and difficult to get multiple configs.

mgaido91 · 2018-07-09T20:38:28Z

@rxin then we can do what @maropu suggested, i.e. add a numeric suffix and maybe log the configs used, so that even though they are not in the test name we can still tell which of them is failing. What do you think? Do you have a better proposal?

rxin · 2018-07-09T20:41:50Z

Can you just define a config matrix in the beginning of the file, and each file is run with the config matrix?

mgaido91 · 2018-07-09T20:46:40Z

Sorry, what do you mean by a config matrix? And how would we distinguish whether the configs should produce the same result or different ones?

rxin · 2018-07-09T20:49:54Z

If they produce different results why do you need any infrastructure for them? They are just part of the normal test flow.

If they produce the same result, and you don't want to define the same test queries twice, we can create an infra for that. I thought that's what this is about?

mgaido91 · 2018-07-09T20:55:07Z

The goal here was to address both cases. The need came up in previous PRs, to avoid copying and pasting the same queries just to test them with different configs. So the idea was to have infrastructure that covers both cases and avoids this copy-and-paste (see decimalOperations.sql, for instance).

rxin · 2018-07-09T21:00:05Z

What are the use cases other than decimal? I am not sure if we need to build a lot of infrastructure just for one or two use cases.

mgaido91 · 2018-07-09T21:05:27Z

The other main case was to improve coverage of join operations too, because with the default values we only test broadcast joins.

cloud-fan · 2018-07-10T08:58:42Z

It's good to cover both of the cases in one design, but I'd like to prioritize the join one.

I feel it's common to try different optimization/runtime configs and make sure we get correct results. That's more important than the decimal case, which just saves some typing.

Since it seems hard to reach consensus on a good design covering both cases, how about we just do the join one? I.e. a SQL test file can specify a config matrix (we need to design a syntax for it), and the test framework runs the file with each of the specified configs and values to make sure the results all match the golden file.
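A sketch of what such a config-matrix syntax could look like, assuming each `--SET key1=value1,key2=value2` comment line in a `.sql` test file declares one config set. The syntax and the `SetDirectiveParser` name are assumptions for illustration; ordinary comments and queries are ignored by the parser.

```scala
// Hypothetical parser for an assumed "--SET k1=v1,k2=v2" directive in .sql test files.
object SetDirectiveParser {
  type ConfigSet = Seq[(String, String)]

  private val Prefix = "--SET"

  // Each matching line becomes one config set; malformed entries fail with IllegalArgumentException.
  def configSets(sqlFileLines: Seq[String]): Seq[ConfigSet] =
    sqlFileLines
      .map(_.trim)
      .filter(_.startsWith(Prefix))
      .map { line =>
        line.stripPrefix(Prefix).split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { pair =>
          val idx = pair.indexOf('=')
          require(idx > 0, s"Malformed config entry: '$pair'")
          pair.substring(0, idx).trim -> pair.substring(idx + 1).trim
        }
      }
}
```

Keeping the matrix inside the SQL file means the file alone says how it is run, and a rename carries the configs along with it.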

mgaido91 · 2018-07-10T13:54:57Z

I am not sure it is a great idea to handle only one of the two scenarios if we plan to include both later, as we might have to redo the same work twice. But if you all agree on this plan, I'll stick to it.

cloud-fan · 2018-07-10T15:18:13Z

We can deal with the decimal test file specially if that's the only use case. For now I'd say the join test is more important and let's finish it first.

rxin · 2018-07-10T16:09:43Z

To me it is actually confusing to have the decimal one in there at all, by defining a list of queries that are reused for different functional testing. It is very easy to overlook the subtle differences. We also risk over-engineering this with only one use case.


This reverts commit ed01ff0.

mgaido91 · 2018-07-11T11:15:03Z

@cloud-fan @rxin I updated the PR to handle only the case in which different configs produce the same result, with the configs specified in the SQL files.
To make clear which config caused a job error, I added an ERROR message to the logs.
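The flow described here (run the file once per config set, log which set failed, and let the failure propagate) can be sketched as below. `runQueries` is modeled as a hypothetical `ConfigSet => String` function returning the output to compare with the golden text; the real suite's signature differs, and this sketch logs to `System.err` rather than through Spark's logging.

```scala
import scala.util.control.NonFatal

// Hypothetical sketch of running one test file against each declared config set.
object RunWithConfigSets {
  type ConfigSet = Seq[(String, String)]

  def runAll(configSets: Seq[ConfigSet], golden: String)(runQueries: ConfigSet => String): Unit =
    if (configSets.isEmpty) {
      // No matrix declared: run once with the default configs.
      check(runQueries(Nil), golden)
    } else {
      configSets.foreach { configSet =>
        try check(runQueries(configSet), golden)
        catch {
          case NonFatal(e) =>
            // Make the failing config set visible before propagating the failure.
            val described = configSet.map { case (k, v) => s"$k=$v" }.mkString(", ")
            System.err.println(s"ERROR: queries failed with configs: $described")
            throw e
        }
      }
    }

  // Every config set must reproduce the same golden output.
  private def check(actual: String, expected: String): Unit =
    if (actual != expected) throw new AssertionError(s"Expected:\n$expected\nActual:\n$actual")
}
```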

Do you all agree with this approach?

SparkQA · 2018-07-11T14:44:35Z

Test build #92857 has finished for PR 21568 at commit 6f90e63.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-07-11T15:30:51Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

+      runQueries(queries, testCase.resultFile, None)
+    } else {
+      configSets.foreach { configSet =>
+        try {


I think it's better to do the try-catch inside runQueries, so that we know which config causes a failure.

Mmh, but we also know here which configs cause a failure (and we are logging them). Am I missing something from your comment?

ah sorry I misread the code.

cloud-fan · 2018-07-11T15:31:25Z

LGTM

cloud-fan · 2018-07-11T15:43:40Z

thanks, merging to master!

Different config for same test in SQLQueryTestSuite

ed01ff0

maropu reviewed Jun 16, 2018


viirya reviewed Jun 17, 2018


mgaido91 added 2 commits July 11, 2018 11:20

Revert "Different config for same test in SQLQueryTestSuite"

94cbe24

This reverts commit ed01ff0.

use matrix of configs in the files

6f90e63

cloud-fan reviewed Jul 11, 2018


asfgit closed this in 592cc84 Jul 11, 2018
