Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Aug 4, 2020

What changes were proposed in this pull request?

This PR aims to clean up InMemoryRelation.ser.

Why are the changes needed?

SPARK-32274 makes SQL cache serialization pluggable.

[SPARK-32274][SQL] Make SQL cache serialization pluggable

This causes UT failures.

$ build/sbt "sql/testOnly *.CachedBatchSerializerSuite *.CachedTableSuite"
...
[info]   Cause: java.lang.IllegalStateException: This does not work. This is only for testing
[info]   at org.apache.spark.sql.execution.columnar.TestSingleIntColumnarCachedBatchSerializer.convertInternalRowToCachedBatch(CachedBatchSerializerSuite.scala:49)
...
[info] *** 30 TESTS FAILED ***
[error] Failed: Total 51, Failed 30, Errors 0, Passed 21
[error] Failed tests:
[error] 	org.apache.spark.sql.CachedTableSuite
[error] (sql/test:testOnly) sbt.TestsFailedException: Tests unsuccessful

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually.

$ build/sbt "sql/testOnly *.CachedBatchSerializerSuite *.CachedTableSuite"
[info] Tests: succeeded 51, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 51, Failed 0, Errors 0, Passed 51

@dongjoon-hyun
Copy link
Member Author

Could you review this PR, @revans2 , @tgravescs , @cloud-fan ?

}

/* Visible for testing */
private[spark] def clearSerializer(): Unit = synchronized { ser = None }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, @dongjoon-hyun. What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase? For example, like ExecutorSideSQLConfSuite or SparkSessionExtensionSuite. I think that would be simpler and a more isolated fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya. I thought that way first, but this is more general way because SPARK-32274 make SQL cache serialization pluggable. We may have another test suite in the future.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the test resource clean-up had better be centralized at SharedSparkSession in order to not to forget.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I'm still open to your idea. Let's see the original author and committer opinion. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really help? InMemoryRelation.ser doesn't belong to any session and is global.

I think a simpler fix is to clear it in CachedBatchSerializerSuite.beforeAll and afterAll.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this question, yes. The root cause is that InMemoryRelation.ser is a kind of singleton. Since the new configuration is static conf, this will match with the semantic of InMemoryRelation.ser. So, the problem is the testing.

Does it really help? InMemoryRelation.ser doesn't belong to any session and is global.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me create another PR for @HyukjinKwon or @cloud-fan idea to compare with this.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, @HyukjinKwon 's idea is unable to remove the failure because InMemoryRelation.ser is a singleton.

What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase?

I'm moving to @cloud-fan 's proposal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dongjoon-hyun!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @cloud-fan proposal as it will make testing easier as well.

@dongjoon-hyun
Copy link
Member Author

@dongjoon-hyun
Copy link
Member Author

This PR is closed in favor of the alternative.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127025 has finished for PR 29344 at commit b3bed27.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127026 has finished for PR 29344 at commit 8cfe79b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants