Conversation

@rdblue rdblue commented Aug 8, 2018

What changes were proposed in this pull request?

This is a follow-up to #21305 that adds a test suite for AppendData analysis.

This also fixes the following problems uncovered by these tests:

  • Incorrect order of data types passed to `canWrite` is fixed
  • The field check calls `canWrite` first to ensure all errors are found
  • `AppendData#resolved` must check resolution of the query's attributes
  • Column names are quoted to show empty names
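The `canWrite` argument-order fix matters because the check is asymmetric: a safe widening cast is allowed when writing but not when reading back. A minimal sketch of that asymmetry, using hypothetical type names rather than Spark's actual `DataType.canWrite`:

```scala
// Hypothetical sketch: safe up-casts are allowed when writing, so the
// check is asymmetric and swapping the arguments inverts its meaning.
def canWrite(writeType: String, readType: String): Boolean =
  (writeType, readType) match {
    case (w, r) if w == r => true
    case ("int", "long")  => true  // widening an int into a long column is safe
    case _                => false
  }
```

With the arguments reversed, the check would accept `long` data for an `int` column while rejecting the safe direction, which is exactly the kind of bug the new tests caught.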

How was this patch tested?

This PR adds a test suite for AppendData analysis.

@rdblue
Contributor Author

rdblue commented Aug 8, 2018

@cloud-fan, here are tests to validate the analysis of AppendData logical plans.

@SparkQA

SparkQA commented Aug 8, 2018

Test build #94449 has finished for PR 22043 at commit e58d4fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case (inAttr, outAttr) =>
// names and types must match, nullability must be compatible
inAttr.name == outAttr.name &&
inAttr.resolved && outAttr.resolved &&
Contributor

I think it's clearer to write `table.resolved && query.resolved && query.output.size == table.output.size && ...`

Contributor Author

Agreed.
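For reference, the suggested ordering of the condition can be sketched with plain Scala stand-ins for the plan classes (`Attr` and `Relation` here are hypothetical models, not Spark's actual types):

```scala
// Minimal model of the suggested `resolved` condition: both sides must be
// resolved, the column counts must line up, and each attribute pair must
// match by name and type.
case class Attr(name: String, dataType: String, resolved: Boolean = true)
case class Relation(output: Seq[Attr]) {
  def resolved: Boolean = output.forall(_.resolved)
}

def appendResolved(table: Relation, query: Relation): Boolean =
  table.resolved && query.resolved &&
    query.output.size == table.output.size &&
    query.output.zip(table.output).forall { case (inAttr, outAttr) =>
      inAttr.name == outAttr.name && inAttr.dataType == outAttr.dataType
    }
```

Checking `query.resolved` up front is the important part: without it, an unresolved attribute on the query side could slip past the pairwise comparison.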

class DataSourceV2AnalysisSuite extends AnalysisTest {
val table = TestRelation(StructType(Seq(
StructField("x", FloatType),
StructField("y", FloatType))).toAttributes)
Contributor

nit:

import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

val table = TestRelation(Seq('x.float, 'y.float))
val requiredTable = TestRelation(Seq('x.float.notNull, 'y.float.notNull))
...

Contributor Author

Symbol literals are rarely used in Scala, so I think it is better to use StructType and convert. That more closely matches what users actually do.

assertResolved(parsedPlan)
}

test("Append.byName: does not match by position") {
Contributor

this test is by name.

Contributor Author

Yes, this is testing that a query that would succeed when matching by position fails when matching by name.
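That distinction can be sketched with plain collections (`Col`, `matchByPosition`, and `matchByName` are illustrative stand-ins, not Spark code):

```scala
case class Col(name: String, value: Float)

// By position: the i-th query column feeds the i-th table column; names are ignored.
def matchByPosition(tableCols: Seq[String], queryCols: Seq[Col]): Option[Seq[(String, Float)]] =
  if (tableCols.size == queryCols.size) Some(tableCols.zip(queryCols.map(_.value)))
  else None

// By name: every table column must find a query column with the same name.
def matchByName(tableCols: Seq[String], queryCols: Seq[Col]): Option[Seq[(String, Float)]] = {
  val byName = queryCols.map(c => c.name -> c.value).toMap
  val resolved = tableCols.flatMap(n => byName.get(n).map(v => n -> v).toList)
  if (resolved.size == tableCols.size) Some(resolved) else None
}
```

A query whose columns have the right arity but the wrong names passes the positional check and fails the by-name check, which is exactly what the test asserts.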

val parsedPlan = AppendData.byName(table, query)

assertNotResolved(parsedPlan)
assertAnalysisError(parsedPlan, Seq(
Contributor

It's clearer to specify the `caseSensitive` parameter of `assertAnalysisError`.

Contributor Author

Updated.

StructField("y", FloatType))).toAttributes)

val X = query.output.toIndexedSeq(0)
val y = query.output.toIndexedSeq(1)
Contributor

query.output.last

Contributor Author

Fixed.

"Cannot find data for output column", "'x'"))
}

test("Append.byName: missing required columns cause failure and are identified by name") {
Contributor

is there really a difference between missing required columns and missing optional columns?

Contributor Author

Missing optional columns may be allowed in the future. We've already had a team request this feature (enabled by a flag) to support schema evolution. The use case is that you don't want to fail existing jobs when you add a column to the table. Iceberg supports this, so Spark should too.
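A rough sketch of that flag-gated behavior, using hypothetical names rather than Spark's implementation: with the flag on, a missing column is tolerated only if the table column is optional (nullable); a missing required column always fails.

```scala
case class TableCol(name: String, nullable: Boolean)

// Hypothetical check: report an error for each table column the query does
// not provide, unless the column is nullable and the evolution flag is set.
def missingColumnErrors(
    tableCols: Seq[TableCol],
    queryCols: Set[String],
    allowMissingOptional: Boolean): Seq[String] =
  tableCols.collect {
    case col if !queryCols.contains(col.name) && !(col.nullable && allowMissingOptional) =>
      s"Cannot find data for output column '${col.name}'"
  }
```

Under this model, adding a nullable column to a table would not break existing append jobs once the flag is enabled, which is the schema-evolution use case described above.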

"Cannot find data for output column", "'x'"))
}

test("Append.byName: missing optional columns cause failure and are identified by name") {
Contributor

this test is identical to "Append.byName: missing columns are identified by name"

Contributor Author

I probably intended to update it for byPosition. I'll fix it.

Contributor Author

Removed. Looks like it was from when I split out the tests for required/optional.

assertResolved(expectedPlan)
}

test("Append.byPosition: case does not fail column resolution") {
Contributor

Do we need this test? In "Append.byPosition: basic behavior" we proved that we can append even when the column names are different.

Contributor Author

I can remove it. I was including most test cases for both byName and byPosition to validate the different behaviors.

Contributor Author

Removed.

StructField("X", FloatType), // doesn't match case!
StructField("y", FloatType))).toAttributes)

val X = query.output.toIndexedSeq(0)
Contributor

can't we just call query.output.head?

@SparkQA

SparkQA commented Aug 9, 2018

Test build #94516 has finished for PR 22043 at commit 765c5b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in bdd2796 Aug 10, 2018
@rdblue
Contributor Author

rdblue commented Aug 13, 2018

Thanks for reviewing, @cloud-fan!

jzhuge pushed a commit to jzhuge/spark that referenced this pull request Mar 7, 2019
## What changes were proposed in this pull request?

This is a follow-up to apache#21305 that adds a test suite for AppendData analysis.

This also fixes the following problems uncovered by these tests:
* Incorrect order of data types passed to `canWrite` is fixed
* The field check calls `canWrite` first to ensure all errors are found
* `AppendData#resolved` must check resolution of the query's attributes
* Column names are quoted to show empty names

## How was this patch tested?

This PR adds a test suite for AppendData analysis.

Closes apache#22043 from rdblue/SPARK-24251-add-append-data-analysis-tests.

Authored-by: Ryan Blue <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit bdd2796)
rdblue added a commit to rdblue/spark that referenced this pull request Apr 3, 2019
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Oct 15, 2019