
Conversation

@yhuai (Contributor) commented on Feb 25, 2015

Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.
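For reference, a minimal repro sketch of the failure mode described there. This is hedged: it uses the later DataFrame read/write API for brevity rather than the Spark 1.3-era calls in the original report, and the path and data are illustrative. The gist is that with spark.sql.parquet.cacheMetadata=true, footers cached from the old files were reused after the table had been overwritten.

```scala
// Hedged repro sketch for SPARK-6016; path and data are illustrative.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

sqlContext.setConf("spark.sql.parquet.cacheMetadata", "true")

val path = "/tmp/spark-6016-demo"
Seq(1, 2, 3).toDF("a").write.parquet(path)   // initial write
sqlContext.read.parquet(path).count()        // populates the footer cache

Seq(4, 5, 6, 7).toDF("a").write.mode("overwrite").parquet(path)
// Before this fix, the second read could fail: cached footers still
// pointed at the files deleted by the overwrite.
sqlContext.read.parquet(path).count()
```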

@yhuai (Contributor, Author) commented on Feb 25, 2015

@liancheng Can we remove FilteringParquetRowInputFormat now that task-side split handling is in Parquet? If you think it is a good idea, we can do it in another PR.
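For context, a hedged sketch of what "task-side split" refers to here: parquet-mr reads a Hadoop flag, `parquet.task.side.metadata`, and when it is true defers footer reading and row-group filtering to the tasks instead of doing it in getSplits on the driver, which is why a client-side input format like FilteringParquetRowInputFormat would become redundant. The key name is parquet-mr's; toggling it from Spark is illustrative only.

```scala
// Illustrative only: toggling parquet-mr's task-side metadata handling.
// When "parquet.task.side.metadata" is true, footers are read and row
// groups are filtered inside each task, so getSplits no longer needs to
// read footers on the client/driver side.
sc.hadoopConfiguration.setBoolean("parquet.task.side.metadata", true)
```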

@SparkQA commented on Feb 26, 2015

Test build #27966 has finished for PR 4775 at commit 1541554.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor)

@yhuai As long as we decide to completely deprecate client-side metadata reading, it's OK to remove FilteringParquetRowInputFormat.

@liancheng (Contributor)

@yhuai Actually, as we discussed offline, FilteringParquetRowInputFormat is still necessary: we have to do schema merging in getSplits to prevent the exception reported in SPARK-6010.
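To illustrate what "schema merging in getSplits" means here (a hedged sketch, not the PR's code): when different Parquet part-files carry different but compatible schemas, the reader has to reconcile them into one schema before planning, otherwise footers exposing only a subset of the columns trigger the SPARK-6010 exception. A deliberately simplified field-by-field merge over Spark SQL StructTypes:

```scala
// Hedged sketch of schema merging across Parquet part-files; the real
// implementation lives in Spark's Parquet support, and the merge rules
// here are simplified (no type widening, no nested structs).
import org.apache.spark.sql.types._

def mergeSchemas(left: StructType, right: StructType): StructType = {
  val leftNames = left.fieldNames.toSet
  // Fields appearing in both schemas must agree on the data type.
  for (f <- right if leftNames.contains(f.name)) {
    val existing = left(f.name)
    require(existing.dataType == f.dataType,
      s"Incompatible types for column ${f.name}: " +
      s"${existing.dataType} vs ${f.dataType}")
  }
  // Keep every field from `left`, then append fields only `right` has.
  StructType(left.fields ++ right.fields.filterNot(f => leftNames.contains(f.name)))
}

// Example: two part-files, one with an extra column.
val a = StructType(Seq(StructField("id", LongType)))
val b = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))
val merged = mergeSchemas(a, b)  // StructType(id: long, name: string)
```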

@liancheng (Contributor)

This LGTM. Please help rebase it, then I can merge it.

@SparkQA commented on Feb 26, 2015

Test build #28006 has finished for PR 4775 at commit 78787b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor)

Merging into master and branch-1.3, thanks!

@asfgit closed this in 192e42a on Feb 26, 2015.

asfgit pushed a commit that referenced this pull request on Feb 26, 2015:
… existing table when spark.sql.parquet.cacheMetadata=true

Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.

Author: Yin Huai <[email protected]>

Closes #4775 from yhuai/parquetFooterCache and squashes the following commits:

78787b1 [Yin Huai] Remove footerCache in FilteringParquetRowInputFormat.
dff6fba [Yin Huai] Failed unit test.

(cherry picked from commit 192e42a)
Signed-off-by: Cheng Lian <[email protected]>
@karthikgolagani

@liancheng
Hi Lian, if you are using a SparkContext (sc), you can set "parquet.enable.summary-metadata" to "false" like below:

sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

This fixed my issue instantly. I did it in my Spark Streaming application. The warning I was seeing:

WARN ParquetOutputCommitter: could not write summary file for hdfs://localhost/user/hive/warehouse
java.lang.NullPointerException
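A slightly fuller sketch of that workaround in context (the object name and setup are illustrative; the configuration key is the one parquet-mr's ParquetOutputCommitter checks before writing summary files):

```scala
// Hedged sketch: disabling Parquet summary-metadata files for a Spark app.
// "parquet.enable.summary-metadata" is read from the Hadoop configuration
// by parquet-mr's ParquetOutputCommitter; "false" skips writing the
// _metadata / _common_metadata summary files that triggered the warning.
import org.apache.spark.{SparkConf, SparkContext}

object NoParquetSummaryFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("no-summary-files"))
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
    // ... write Parquet output as usual; no summary files are produced.
    sc.stop()
  }
}
```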

