Conversation

@dongjoon-hyun (Member) commented Aug 31, 2017

What changes were proposed in this pull request?

This PR aims to fix a StackOverflowError in branch-2.2. It happens when OptimizeMetadataOnlyQuery returns a LocalRelation carrying partition information that has not been materialized, e.g. for data source tables (Parquet/ORC) or Hive tables stored as Parquet with convertMetastore enabled.
The master branch has the same logic, but it doesn't throw a StackOverflowError due to other differences.

scala> spark.version
res0: String = 2.2.0   // 2.2.1-SNAPSHOT is the same.

scala> sql("CREATE TABLE t_1000 (a INT, p INT) USING PARQUET PARTITIONED BY (p)")
res1: org.apache.spark.sql.DataFrame = []

scala> (1 to 1000).foreach(p => sql(s"ALTER TABLE t_1000 ADD PARTITION (p=$p)"))

scala> sql("SELECT COUNT(DISTINCT p) FROM t_1000").collect
java.lang.StackOverflowError
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522)
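The frame in the trace above, `ObjectOutputStream.defaultWriteFields`, is the telling symptom: default Java serialization walks the object graph recursively, spending a handful of stack frames per referenced object, so a sufficiently deep structure (such as a long chain of non-materialized partition rows) exhausts the thread stack. A minimal, self-contained Java sketch of this failure mode; the `Node` class and the depths used are illustrative, not Spark code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeepGraphDemo {
    // Each chained reference costs a few serialization stack frames.
    static final class Node implements Serializable {
        final Node next;
        Node(Node next) { this.next = next; }
    }

    // Build a linked chain of the given depth iteratively (heap, not stack).
    static Node chain(int depth) {
        Node head = null;
        for (int i = 0; i < depth; i++) head = new Node(head);
        return head;
    }

    // Returns true if default Java serialization of a chain of this
    // depth overflows the stack.
    static boolean overflows(int depth) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(chain(depth));  // recurses once per node
            return false;
        } catch (StackOverflowError e) {
            return true;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 10: overflows = " + overflows(10));
        System.out.println("depth 1000000: overflows = " + overflows(1_000_000));
    }
}
```

Shallow graphs serialize fine; past some depth (dependent on `-Xss`, typically in the low thousands of chained references with default settings) the recursion inside `writeObject` blows the stack, which is why 1,000 partitions in the repro above was enough to trigger it once the partition rows were carried unserialized-friendly no longer.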

How was this patch tested?

Pass the Jenkins with a new test case.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-21884][SQL] Fix StackOverflowError on MetadataOnlyQuery [SPARK-21884][SQL][BRANCH-2.2] Fix StackOverflowError on MetadataOnlyQuery Aug 31, 2017
SparkQA commented Aug 31, 2017

Test build #81278 has finished for PR 19094 at commit 07126f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author) commented:
Hi, @lianhuiwang and @hvanhovell.
Could you review this PR? When this logic was introduced in 2.1.0, there was no problem.
The error appeared in 2.2.0, when the underlying classes returned by fsRelation.location.listFiles changed.

@gatorsmile (Member) commented:

#18686

My fix resolves your issue, right?

@dongjoon-hyun (Member, Author) commented:

Thank you!

@dongjoon-hyun (Member, Author) commented:

I'm closing this issue. Thank you again.

asfgit pushed a commit that referenced this pull request Sep 1, 2017
…'s input data transient

This PR is to backport #18686 for resolving the issue in #19094

---

## What changes were proposed in this pull request?
This PR is to mark the parameter `rows` and `unsafeRow` of LocalTableScanExec transient. It can avoid serializing the unneeded objects.

## How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes #19101 from gatorsmile/backport-21477.
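The backported fix works because `ObjectOutputStream` simply skips `transient` fields, so whatever `rows` holds never enters the serialization recursion at all; it comes back as `null` after deserialization, which is acceptable for data the remote side recomputes or never reads. A hedged sketch of the mechanism; the `Scan` class below is a hypothetical stand-in for LocalTableScanExec, not Spark code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class TransientDemo {
    // Hypothetical stand-in for LocalTableScanExec: the transient field
    // is skipped entirely by Java serialization, so a huge (or deeply
    // nested) rows object can never overflow the serialization stack.
    static final class Scan implements Serializable {
        final String name;
        transient List<int[]> rows;  // not serialized

        Scan(String name, List<int[]> rows) {
            this.name = name;
            this.rows = rows;
        }
    }

    // Serialize and deserialize a Scan through an in-memory buffer.
    static Scan roundTrip(Scan in) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(in);
        }
        try (ObjectInputStream read =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Scan) read.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<int[]> rows = new ArrayList<>();
        for (int p = 1; p <= 1000; p++) rows.add(new int[]{p});
        Scan copy = roundTrip(new Scan("t_1000", rows));
        System.out.println(copy.name);          // non-transient field survives
        System.out.println(copy.rows == null);  // transient field dropped
    }
}
```

The trade-off is that any code touching a deserialized `Scan` must tolerate `rows == null`, which is why marking a field transient only suits data the deserializing side does not need.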
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…'s input data transient

(Same backport commit message as above.)
@dongjoon-hyun dongjoon-hyun deleted the SPARK-21884 branch January 7, 2019 07:04