@viirya
Member

@viirya viirya commented Feb 26, 2015

FilteringParquetRowInputFormat manually merges Parquet schemas before computing splits. However, this is redundant because the schemas are already merged in ParquetRelation2. We don't need to re-merge them in the InputFormat.

@SparkQA

SparkQA commented Feb 26, 2015

Test build #27996 has finished for PR 4786 at commit ef78a5a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor

The reason we needed a separate schema merge in FilteringParquetRowInputFormat was explained in #4768. I'm not sure why removing it doesn't break the tests right now. I'll investigate this tomorrow. I guess #4775 made the difference.

@viirya
Member Author

viirya commented Feb 27, 2015

@liancheng #4768 only explains why the merging is needed. The point is that, before the read task is launched, the different schemas have already been merged in ParquetRelation2. FilteringParquetRowInputFormat simply performs the same merge a second time. We only need to get the already-merged schema from the configuration and use it.
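
To illustrate the idea, here is a simplified sketch in Python rather than the actual Spark code. The configuration key `example.parquet.merged_schema`, the dict standing in for the job configuration, and the helpers `merge_schemas`, `plan_scan`, and `schema_for_splits` are all hypothetical names made up for this example. The merge happens once on the relation side, and the input-format side only reads the result back:

```python
import json

# Hypothetical configuration key, chosen for this sketch only.
MERGED_SCHEMA_KEY = "example.parquet.merged_schema"


def merge_schemas(schemas):
    """Union the fields of several part-file schemas, keeping first-seen order."""
    merged = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                raise ValueError(f"conflicting types for field {name!r}")
            merged.setdefault(name, dtype)
    return merged


def plan_scan(part_file_schemas, conf):
    """Stand-in for the relation: merge once and publish the result in the conf."""
    conf[MERGED_SCHEMA_KEY] = json.dumps(merge_schemas(part_file_schemas))


def schema_for_splits(conf):
    """Stand-in for the input format: reuse the published schema, no re-merge."""
    return json.loads(conf[MERGED_SCHEMA_KEY])


# A plain dict stands in for the job configuration.
conf = {}
plan_scan([{"a": "int", "b": "string"}, {"b": "string", "c": "double"}], conf)
print(schema_for_splits(conf))
```

Under these assumptions the input-format side never touches the individual part-file schemas, which is the duplication this change removes.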

@liancheng
Contributor

Oh, I see: you didn't revert the change but reused the merged schema. Makes sense, thanks! Merging to master and branch-1.3.

asfgit pushed a commit that referenced this pull request Feb 27, 2015
`FilteringParquetRowInputFormat` manually merges Parquet schemas before computing splits. However, this is redundant because the schemas are already merged in `ParquetRelation2`. We don't need to re-merge them in the `InputFormat`.

Author: Liang-Chi Hsieh <[email protected]>

Closes #4786 from viirya/dup_parquet_schemas_merge and squashes the following commits:

ef78a5a [Liang-Chi Hsieh] Avoiding duplicate Parquet schema merging.

(cherry picked from commit 4ad5153)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit asfgit closed this in 4ad5153 Feb 27, 2015
@viirya viirya deleted the dup_parquet_schemas_merge branch December 27, 2023 18:17