@liancheng
Contributor

ReadContext.init calls InitContext.getMergedKeyValueMetadata, which doesn't know how to merge conflicting user-defined key-value metadata and throws an exception. In our case, when dealing with different but compatible schemas, different Parquet part-files contain different Spark SQL schema JSON strings, which causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue.

In this PR, we manually merge the schemas before passing them to ReadContext to avoid the exception.
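
The idea can be illustrated with a minimal, self-contained sketch. It does not use the Parquet or Spark classes: `MetadataMergeSketch`, `SCHEMA_KEY`, `strictMerge`, `preMergeSchemaKey`, and the comma-separated "toy" schema format are all hypothetical names invented for illustration. `strictMerge` only mimics the behavior described above (fail when a key has more than one distinct value), and `preMergeSchemaKey` collapses the conflicting schema values into one before that strict merge runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; not the parquet-mr or Spark code.
final class MetadataMergeSketch {
    // Assumed key name, chosen for this example only.
    static final String SCHEMA_KEY = "example.schema";

    // Mimics a strict merge: each key maps to the set of values seen across
    // part-files, and more than one distinct value is treated as a conflict.
    static Map<String, String> strictMerge(Map<String, Set<String>> perKeyValues) {
        Map<String, String> merged = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : perKeyValues.entrySet()) {
            Set<String> values = e.getValue();
            if (values.isEmpty()) {
                continue; // nothing recorded for this key
            }
            if (values.size() > 1) {
                throw new IllegalStateException(
                    "conflicting values for key " + e.getKey() + ": " + values);
            }
            merged.put(e.getKey(), values.iterator().next());
        }
        return merged;
    }

    // Toy schema merge: a schema is a comma-separated list of field names,
    // and compatible schemas are merged by taking the ordered union of fields.
    static String mergeToySchemas(Set<String> schemas) {
        Set<String> fields = new LinkedHashSet<>();
        for (String schema : schemas) {
            for (String field : schema.split(",")) {
                if (!field.isEmpty()) {
                    fields.add(field);
                }
            }
        }
        return String.join(",", fields);
    }

    // Replaces the conflicting schema values with a single pre-merged value,
    // so that a later strict merge no longer sees a conflict for that key.
    static Map<String, Set<String>> preMergeSchemaKey(Map<String, Set<String>> perKeyValues) {
        Map<String, Set<String>> copy = new HashMap<>(perKeyValues);
        Set<String> schemas = copy.get(SCHEMA_KEY);
        if (schemas != null && schemas.size() > 1) {
            Set<String> single = new HashSet<>();
            single.add(mergeToySchemas(schemas));
            copy.put(SCHEMA_KEY, single);
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> metadata = new HashMap<>();
        Set<String> schemas = new LinkedHashSet<>();
        schemas.add("a,b");   // schema recorded in one part-file
        schemas.add("a,b,c"); // compatible schema recorded in another
        metadata.put(SCHEMA_KEY, schemas);

        // strictMerge(metadata) would throw here; pre-merging avoids that.
        System.out.println(strictMerge(preMergeSchemaKey(metadata)));
    }
}
```

In this sketch, only the schema key is pre-merged; any other key with conflicting values would still fail the strict merge, which seems the safer default when the conflict cannot be resolved meaningfully.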

@SparkQA

SparkQA commented Feb 25, 2015

Test build #27951 has started for PR 4768 at commit 9002f0a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 25, 2015

Test build #27951 has finished for PR 4768 at commit 9002f0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27951/

asfgit pushed a commit that referenced this pull request Feb 25, 2015
…g splits

`ReadContext.init` calls `InitContext.getMergedKeyValueMetadata`, which doesn't know how to merge conflicting user-defined key-value metadata and throws an exception. In our case, when dealing with different but compatible schemas, different Parquet part-files contain different Spark SQL schema JSON strings, which causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue.

In this PR, we manually merge the schemas before passing them to `ReadContext` to avoid the exception.

Author: Cheng Lian

Closes #4768 from liancheng/spark-6010 and squashes the following commits:

9002f0a [Cheng Lian] Fixes SPARK-6010

(cherry picked from commit e0fdd46)
Signed-off-by: Michael Armbrust
@asfgit asfgit closed this in e0fdd46 Feb 25, 2015
@liancheng liancheng deleted the spark-6010 branch February 26, 2015 02:30