[SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser #29344
Could you review this PR, @revans2, @tgravescs, @cloud-fan?
  }

  /* Visible for testing */
  private[spark] def clearSerializer(): Unit = synchronized { ser = None }
Nice, @dongjoon-hyun. What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase? For example, like ExecutorSideSQLConfSuite or SparkSessionExtensionSuite. I think that would be simpler and a more isolated fix.
Yes. I thought of that first, but this approach is more general because SPARK-32274 makes SQL cache serialization pluggable. We may have other test suites like this in the future.
Also, test resource clean-up is better centralized in SharedSparkSession so that it is not forgotten.
BTW, I'm still open to your idea. Let's see what the original author and the committers think. Thanks.
Does it really help? InMemoryRelation.ser doesn't belong to any session and is global.
I think a simpler fix is to clear it in CachedBatchSerializerSuite.beforeAll and afterAll.
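A minimal sketch of the suggestion above, written in plain Scala so it runs without Spark or ScalaTest. `GlobalSer`, `SuiteLike`, and `CustomSerializerSuite` are hypothetical stand-ins (for the global `InMemoryRelation.ser` holder, a test-suite lifecycle, and a suite such as CachedBatchSerializerSuite); they are not Spark's actual classes.

```scala
// Hypothetical stand-in for a global, lazily initialized serializer holder.
object GlobalSer {
  private var ser: Option[String] = None

  // Caches `name` on first call; later calls return the cached value.
  def getOrInit(name: String): String = synchronized {
    if (ser.isEmpty) ser = Some(name)
    ser.get
  }

  def clear(): Unit = synchronized { ser = None }
  def isSet: Boolean = synchronized { ser.isDefined }
}

// Minimal suite lifecycle; a real suite would rely on its test framework's hooks.
trait SuiteLike {
  def beforeAll(): Unit = ()
  def afterAll(): Unit = ()
  def body(): Unit
  final def run(): Unit = {
    beforeAll()
    try body() finally afterAll()
  }
}

class CustomSerializerSuite extends SuiteLike {
  var seen: String = ""

  // Drop whatever an earlier suite cached before this suite starts.
  override def beforeAll(): Unit = { GlobalSer.clear(); super.beforeAll() }

  // Do not leak this suite's serializer into later suites.
  override def afterAll(): Unit = { try super.afterAll() finally GlobalSer.clear() }

  override def body(): Unit = { seen = GlobalSer.getOrInit("custom") }
}

object CleanupDemo {
  def main(args: Array[String]): Unit = {
    GlobalSer.getOrInit("default") // state left behind by an earlier suite
    val suite = new CustomSerializerSuite
    suite.run()
    println(suite.seen)      // the suite saw its own serializer
    println(GlobalSer.isSet) // nothing is left cached afterwards
  }
}
```

Clearing on both sides keeps the fix local to the one suite that depends on a non-default serializer, at the cost of having to repeat it in any future suite with the same need.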
On the question "Does it really help? InMemoryRelation.ser doesn't belong to any session and is global.": yes. The root cause is that InMemoryRelation.ser is effectively a singleton. Since the new configuration is a static conf, this matches the semantics of InMemoryRelation.ser. So the problem is in the testing.
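To illustrate why a singleton serializer trips up tests, here is a toy model in plain Scala. `SerializerRegistry` and the `"cache.serializer"` key are hypothetical and only mimic the cache-once behavior described in this thread; `clearSerializer()` mirrors the hook shown in the diff.

```scala
// Toy model, not Spark's actual code: a process-wide value that is resolved once.
object SerializerRegistry {
  private var ser: Option[String] = None

  // Reads the serializer name from `conf` only on first use, then caches it.
  def getSerializer(conf: Map[String, String]): String = synchronized {
    if (ser.isEmpty) {
      ser = Some(conf.getOrElse("cache.serializer", "default"))
    }
    ser.get
  }

  // Test-only reset, analogous to the clearSerializer() hook in the diff.
  def clearSerializer(): Unit = synchronized { ser = None }
}

object SingletonLeakDemo {
  def main(args: Array[String]): Unit = {
    // "Suite A" runs first with the default configuration.
    val a = SerializerRegistry.getSerializer(Map.empty)
    // "Suite B" configures a custom serializer but still receives the cached one.
    val b = SerializerRegistry.getSerializer(Map("cache.serializer" -> "custom"))
    println(s"$a $b") // prints "default default"

    // After an explicit reset, the new configuration takes effect.
    SerializerRegistry.clearSerializer()
    val c = SerializerRegistry.getSerializer(Map("cache.serializer" -> "custom"))
    println(c) // prints "custom"
  }
}
```

In production a static conf never changes within a process, so caching once is consistent; only tests that start multiple sessions with different settings in one JVM observe the stale value.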
Let me create another PR based on @HyukjinKwon's or @cloud-fan's idea to compare with this one.
First of all, @HyukjinKwon's idea ("What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase?") cannot remove the failure because InMemoryRelation.ser is a singleton.
I'm moving to @cloud-fan's proposal.
Thanks @dongjoon-hyun!
I like @cloud-fan's proposal, as it will make testing easier as well.
The alternative PR is ready.
This PR is closed in favor of the alternative.
Test build #127025 has finished for PR 29344 at commit
Test build #127026 has finished for PR 29344 at commit
What changes were proposed in this pull request?
This PR aims to clean up InMemoryRelation.ser.
Why are the changes needed?
SPARK-32274 makes SQL cache serialization pluggable.
This causes UT failures.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manually.