[SPARK-17905] [SQL] [TEST] Added test cases for InMemoryRelation #15462


Closed

kiszk wants to merge 2 commits into apache:master from kiszk:columnartestsuites

Member

kiszk commented Oct 13, 2016

What changes were proposed in this pull request?

This pull request adds test cases for the following scenarios:

- keeping all data types, with and without nulls
- accessing `CachedBatch` with whole-stage codegen disabled
- accessing only some of the columns in `CachedBatch`

This PR is part of #15219. The motivation for adding these tests is as follows: when #15219 is enabled, the first two cases are handled by specialized (generated) code, and the third one is a pitfall.

In general, even now, it is helpful to increase test coverage.
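
As a rough illustration of the first scenario, the sketch below caches a small DataFrame with and without nulls and reads it back. It is not the code from this PR: the object name `NullableCacheSketch`, the two-column schema, and the row count are assumptions chosen for brevity, and it needs Spark SQL on the classpath.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object NullableCacheSketch {
  // Caches a two-column DataFrame and returns the rows read back from the cache.
  // The schema and data are illustrative, not taken from the PR.
  def roundTrip(spark: SparkSession, nullable: Boolean): Seq[Row] = {
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable),
      StructField("name", StringType, nullable)))
    // With nullable = true, every even-numbered row carries nulls in both columns.
    val rows = (0 until 10).map { i =>
      if (nullable && i % 2 == 0) Row(null, null) else Row(i, s"name$i")
    }
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema).cache()
    df.count() // action that forces the in-memory columnar batches to be built
    val result = df.collect().toSeq
    df.unpersist()
    result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("nullable-cache-sketch")
      .getOrCreate()
    try {
      Seq(true, false).foreach { nullable =>
        val rows = roundTrip(spark, nullable)
        assert(rows.size == 10)
        // Nulls must survive the cache exactly when the input contained them.
        assert(rows.exists(_.isNullAt(0)) == nullable)
      }
    } finally {
      spark.stop()
    }
  }
}
```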

How was this patch tested?

By the added test suites themselves.


          added test suites

b5e9866

SparkQA commented Oct 13, 2016

Test build #66884 has finished for PR 15462 at commit b5e9866.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

Member Author

kiszk commented Oct 13, 2016

Jenkins, retest this please

SparkQA commented Oct 13, 2016

Test build #66889 has finished for PR 15462 at commit b5e9866.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk mentioned this pull request

[WIP][SPARK-14098][SQL] Generate Java code to build CachedColumnarBatch and get values from CachedColumnarBatch when DataFrame.cache() is called #15219

Closed

Member Author

kiszk commented Oct 17, 2016

@davies, could you please review this one first if #15219 is too big?
cc: @vanzin

andrewor14 suggested changes


Contributor

andrewor14 left a comment

@kiszk This looks good; I have a few minor code organization comments.

...core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala Outdated

    
    assert(inMemoryRelation.cachedColumnBuffers.getStorageLevel == storageLevel)
    inMemoryRelation.cachedColumnBuffers.collect().head match {
      case _: CachedBatch => assert(true)

Contributor

andrewor14 Nov 22, 2016

no need to do assert true here

Member Author

kiszk Nov 24, 2016

i see
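
One way to act on this comment is to leave the matching case empty and fail only in the fallback case. The sketch below uses hypothetical stand-in types (`Buffer`, `CachedBatch`, `OtherBuffer`) rather than Spark's real classes, so it runs without Spark; in a ScalaTest suite the fallback would more likely call `fail(...)`.

```scala
object TypeCheckSketch {
  // Hypothetical stand-ins for the cached element types; not Spark's real classes.
  sealed trait Buffer
  final case class CachedBatch(numRows: Int) extends Buffer
  final case class OtherBuffer(description: String) extends Buffer

  // Succeeds silently for the expected type and throws for anything else,
  // so no `assert(true)` is needed in the matching case.
  def requireCachedBatch(buffer: Buffer): Unit = buffer match {
    case _: CachedBatch => // expected type, nothing to assert
    case other => throw new AssertionError(s"Expected CachedBatch but got $other")
  }

  // An equivalent check expressed as a single boolean, usable inside assert(...).
  def isCachedBatch(buffer: Buffer): Boolean = buffer.isInstanceOf[CachedBatch]
}
```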

...core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala Outdated

    
    test("all data type w && w/o nullability") {
      // all primitives
      Seq(true, false).map { nullability =>

Contributor

andrewor14 Nov 22, 2016

can you split these into 2 separate tests? It's more debuggable that way. I would use a private helper method to abstract the logic

Member Author

kiszk Nov 24, 2016

OK. I split them into multiple smaller tests.
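
The suggested shape, one named test per nullability variant with the shared logic in a private helper, can be sketched as follows. Everything here is an assumption for illustration: `test` is a tiny stand-in for a test framework's registration method, and `checkRoundTrip` replaces the real DataFrame caching with a trivial in-memory check so that the example runs without Spark.

```scala
import scala.collection.mutable

object SplitTestsSketch {
  // Tiny stand-in for a test framework's `test(name) { body }` registration.
  private val registered = mutable.LinkedHashMap.empty[String, () => Unit]

  def test(name: String)(body: => Unit): Unit =
    registered.update(name, () => body)

  // Hypothetical helper holding the shared logic; a real suite would cache a DataFrame here.
  private def checkRoundTrip(nullable: Boolean): Unit = {
    val data: Seq[Option[Int]] =
      (0 until 4).map(i => if (nullable && i % 2 == 0) None else Some(i))
    assert(data.contains(None) == nullable)
  }

  // One named test per variant, so a failure report points at the exact case.
  test("all data types with nullability")(checkRoundTrip(nullable = true))
  test("all data types without nullability")(checkRoundTrip(nullable = false))

  // Runs every registered test and returns the names of those that passed.
  def runAll(): Seq[String] = registered.toSeq.collect {
    case (name, body) if scala.util.Try(body()).isSuccess => name
  }
}
```

Compared with looping over `Seq(true, false)` inside a single test, each variant now fails or passes under its own name.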

...core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala Outdated

    
    test("access only some column of the all of columns") {
      val df = spark.range(1, 100).map(i => (i, (i + 1).toFloat)).toDF("i", "f").cache
      df.count

Contributor

andrewor14 Nov 22, 2016

please be explicit with the actions here: count(), cache()

Member Author

kiszk Nov 24, 2016

Done
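
A minimal sketch of the partial-column scenario with the parentheses written out is shown below. It mirrors the shape of the snippet under review but is not the merged test: the object name `PartialColumnAccessSketch` and the summation check are assumptions, and local data replaces `spark.range` to keep the types simple. It needs Spark SQL on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object PartialColumnAccessSketch {
  // Caches a two-column DataFrame, then reads back only the second column.
  def sumOfF(spark: SparkSession): Double = {
    import spark.implicits._
    val df = (1 until 100)
      .map(i => (i.toLong, (i + 1).toFloat))
      .toDF("i", "f")
      .cache() // explicit parentheses: cache() marks the plan for caching
    df.count() // explicit action: materializes the cached batches
    // Only column "f" is requested, so column "i" in each cached batch is not needed.
    val total = df.select("f").collect().map(_.getFloat(0).toDouble).sum
    df.unpersist()
    total
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partial-column-sketch")
      .getOrCreate()
    try {
      // f takes the values 2.0 through 100.0, which sum to 5049.
      assert(sumOfF(spark) == 5049.0)
    } finally {
      spark.stop()
    }
  }
}
```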

...core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala

    
        checkAnswer(df, row)
      }
      withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "2") {

Contributor

andrewor14 Nov 22, 2016

similarly can you split these into multiple smaller unit tests?

Member Author

kiszk Nov 24, 2016

I see
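
For readers outside Spark's test harness, the effect of the `withSQLConf` line above can be approximated as in the sketch below. It assumes that `SQLConf.WHOLESTAGE_MAX_NUM_FIELDS` corresponds to the configuration key `spark.sql.codegen.maxFields`; `withConf` is a hand-rolled stand-in for the test utility, and the three-column data is invented for illustration. With the limit set to 2, a three-field scan is expected to bypass whole-stage codegen, though the sketch only checks the query result, not the chosen execution path.

```scala
import org.apache.spark.sql.SparkSession

object MaxFieldsSketch {
  // Assumed key behind SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.
  private val MaxFieldsKey = "spark.sql.codegen.maxFields"

  // Runs `body` with the given SQL conf set, then restores the previous value.
  // Hand-rolled stand-in for the withSQLConf helper used in Spark's own tests.
  def withConf[T](spark: SparkSession, key: String, value: String)(body: => T): T = {
    val previous = spark.conf.getOption(key)
    spark.conf.set(key, value)
    try {
      body
    } finally {
      previous match {
        case Some(v) => spark.conf.set(key, v)
        case None => spark.conf.unset(key)
      }
    }
  }

  // Caches a three-column DataFrame and filters it while the field limit is 2.
  def filteredRowCount(spark: SparkSession): Long = {
    import spark.implicits._
    val df = (1 to 50).map(i => (i, i.toLong, i.toString)).toDF("a", "b", "c").cache()
    df.count() // materialize the cache before changing the conf
    val n = withConf(spark, MaxFieldsKey, "2") {
      df.filter($"a" > 10).collect().length.toLong
    }
    df.unpersist()
    n
  }
}
```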

...core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala Outdated

    
    setupTestData()
    def cachePrimitiveTest(data: DataFrame, dataType: String) {

Contributor

andrewor14 Nov 22, 2016

minor: private def

Member Author

kiszk Nov 24, 2016

done

Contributor

andrewor14 commented Nov 22, 2016

retest this please

SparkQA commented Nov 22, 2016

Test build #69025 has finished for PR 15462 at commit b5e9866.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.


          address review comments

42eade8

SparkQA commented Nov 24, 2016

Test build #69138 has finished for PR 15462 at commit 42eade8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Member Author

kiszk commented Nov 24, 2016

@andrewor14 thank you for your review. Could you please see it again?

Contributor

andrewor14 commented Nov 28, 2016

LGTM, merging into master and 2.1, thanks.

asfgit pushed a commit that referenced this pull request


          [SPARK-17680][SQL][TEST] Added test cases for InMemoryRelation

b386943

## What changes were proposed in this pull request?

This pull request adds test cases for the following cases:
- keep all data types with null or without null
- access `CachedBatch` disabling whole stage codegen
- access only some columns in `CachedBatch`

This PR is a part of #15219. Here are motivations to add these tests. When #15219 is enabled, the first two cases are handled by specialized (generated) code. The third one is a pitfall.

In general, even for now, it would be helpful to increase test coverage.
## How was this patch tested?

added test suites itself

Author: Kazuaki Ishizaki <[email protected]>

Closes #15462 from kiszk/columnartestsuites.

Contributor

andrewor14 commented Nov 28, 2016

@kiszk is there a JIRA associated specifically with adding tests for InMemoryRelation?

asfgit closed this in

ad67993

kiszk changed the title ~~[SPARK-17680] [SQL] [TEST] Added test cases for InMemoryRelation~~ [SPARK-17905] [SQL] [TEST] Added test cases for InMemoryRelation

Member Author

kiszk commented Nov 29, 2016

Thank you for pointing out the JIRA issue. I have corrected the JIRA entry.

Member

gatorsmile commented Nov 29, 2016

Done. Corrected the JIRA. Thanks!

robert3005 pushed a commit to palantir/spark that referenced this pull request


          [SPARK-17680][SQL][TEST] Added test cases for InMemoryRelation

5c0cb93


uzadude pushed a commit to uzadude/spark that referenced this pull request


          [SPARK-17680][SQL][TEST] Added test cases for InMemoryRelation

b94b735

