Skip to content

Conversation

@kiszk
Copy link
Member

@kiszk kiszk commented Oct 13, 2016

What changes were proposed in this pull request?

This pull request adds test cases for the following cases:

  • keep all data types with null or without null
  • access CachedBatch disabling whole stage codegen
  • access only some columns in CachedBatch

This PR is a part of #15219. Here are motivations to add these tests. When #15219 is enabled, the first two cases are handled by specialized (generated) code. The third one is a pitfall.

In general, even for now, it would be helpful to increase test coverage.

How was this patch tested?

added test suites itself

@SparkQA
Copy link

SparkQA commented Oct 13, 2016

Test build #66884 has finished for PR 15462 at commit b5e9866.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Oct 13, 2016

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Oct 13, 2016

Test build #66889 has finished for PR 15462 at commit b5e9866.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Oct 17, 2016

@davies, could you please review this at first if #15219 is too big?
cc: @vanzin

Copy link
Contributor

@andrewor14 andrewor14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiszk This looks good, a few minor code organization comments.


assert(inMemoryRelation.cachedColumnBuffers.getStorageLevel == storageLevel)
inMemoryRelation.cachedColumnBuffers.collect().head match {
case _: CachedBatch => assert(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to do assert true here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see


test("all data type w && w/o nullability") {
// all primitives
Seq(true, false).map { nullability =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you split these into 2 separate tests? It's more debuggable that way. I would use a private helper method to abstract the logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I created smaller multiple tests


test("access only some column of the all of columns") {
val df = spark.range(1, 100).map(i => (i, (i + 1).toFloat)).toDF("i", "f").cache
df.count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please be explicit with the actions here count(), cache()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

checkAnswer(df, row)
}

withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "2") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly can you split these into multiple smaller unit tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see


setupTestData()

def cachePrimitiveTest(data: DataFrame, dataType: String) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: private def

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Nov 22, 2016

Test build #69025 has finished for PR 15462 at commit b5e9866.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 24, 2016

Test build #69138 has finished for PR 15462 at commit 42eade8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Nov 24, 2016

@andrewor14 thank you for your review. Could you please see it again?

@andrewor14
Copy link
Contributor

LGTM, merging into master 2.1 thanks.

asfgit pushed a commit that referenced this pull request Nov 28, 2016
## What changes were proposed in this pull request?

This pull request adds test cases for the following cases:
- keep all data types with null or without null
- access `CachedBatch` disabling whole stage codegen
- access only some columns in `CachedBatch`

This PR is a part of #15219. Here are motivations to add these tests. When #15219 is enabled, the first two cases are handled by specialized (generated) code. The third one is a pitfall.

In general, even for now, it would be helpful to increase test coverage.
## How was this patch tested?

added test suites itself

Author: Kazuaki Ishizaki <[email protected]>

Closes #15462 from kiszk/columnartestsuites.
@andrewor14
Copy link
Contributor

@kiszk is there a JIRA associated specifically with adding tests for InMemoryRelation?

@asfgit asfgit closed this in ad67993 Nov 28, 2016
@kiszk kiszk changed the title [SPARK-17680] [SQL] [TEST] Added test cases for InMemoryRelation [SPARK-17905] [SQL] [TEST] Added test cases for InMemoryRelation Nov 29, 2016
@kiszk
Copy link
Member Author

kiszk commented Nov 29, 2016

Thank you for pointing out an JIRA issue. I made a correction to a JIRA entry.

@gatorsmile
Copy link
Member

Done. Corrected the JIRA. Thanks!

robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
## What changes were proposed in this pull request?

This pull request adds test cases for the following cases:
- keep all data types with null or without null
- access `CachedBatch` disabling whole stage codegen
- access only some columns in `CachedBatch`

This PR is a part of apache#15219. Here are motivations to add these tests. When apache#15219 is enabled, the first two cases are handled by specialized (generated) code. The third one is a pitfall.

In general, even for now, it would be helpful to increase test coverage.
## How was this patch tested?

added test suites itself

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#15462 from kiszk/columnartestsuites.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

This pull request adds test cases for the following cases:
- keep all data types with null or without null
- access `CachedBatch` disabling whole stage codegen
- access only some columns in `CachedBatch`

This PR is a part of apache#15219. Here are motivations to add these tests. When apache#15219 is enabled, the first two cases are handled by specialized (generated) code. The third one is a pitfall.

In general, even for now, it would be helpful to increase test coverage.
## How was this patch tested?

added test suites itself

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#15462 from kiszk/columnartestsuites.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants