@liancheng
Contributor

This PR bumps parquet-mr to 1.8.1 to fix PARQUET-251.

One unexpected issue is that parquet-mr 1.8.1 doesn't allow instantiating an empty Parquet schema (PARQUET-278). This affects queries like SELECT COUNT(1) FROM t, since we have to leave at least one column in the Parquet requested schema. This PR works around the issue by selecting the "narrowest" column when no columns are requested. The issue has already been fixed in parquet-mr 1.8.2-SNAPSHOT (PARQUET-363).
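
For readers unfamiliar with the idea, here is a minimal sketch of the "narrowest column" fallback described above. It is not the code from this PR: the `ASSUMED_WIDTHS` table, the `(name, type)` schema representation, and both function names are hypothetical and chosen only for illustration.

```python
# Hypothetical per-type widths in bytes, used only to rank columns.
# These numbers are assumptions for illustration, not Spark's or Parquet's values.
ASSUMED_WIDTHS = {
    "boolean": 1,
    "int": 4,
    "float": 4,
    "long": 8,
    "double": 8,
    "string": 4096,
    "binary": 4096,
}


def narrowest_column(schema):
    """Return the name of the column with the smallest assumed width.

    `schema` is a non-empty list of (column_name, type_name) pairs.
    Unknown types are treated as infinitely wide so they are picked last.
    """
    if not schema:
        raise ValueError("schema must contain at least one column")
    name, _ = min(schema, key=lambda f: ASSUMED_WIDTHS.get(f[1], float("inf")))
    return name


def requested_columns(schema, requested):
    """Return the columns to read, never an empty list.

    If the query needs no columns (e.g. SELECT COUNT(1) FROM t),
    fall back to the single narrowest column so the requested
    schema is non-empty.
    """
    if requested:
        return list(requested)
    return [narrowest_column(schema)]
```

With a schema of `[("name", "string"), ("id", "long"), ("flag", "boolean")]` and no requested columns, this sketch would read only `flag`, the cheapest column under the assumed widths.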

UPDATE: We worked around the above issue by constructing a non-empty MessageType and then removing all its fields. See this commit.

@liancheng liancheng force-pushed the spark-9876.parquet-mr-1.8.1 branch from d00ff45 to 020daf1 Compare October 22, 2015 15:23
@SparkQA

SparkQA commented Oct 22, 2015

Test build #44153 has finished for PR 9225 at commit 020daf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2015

Test build #44151 timed out for PR 9225 at commit d00ff45 after a configured wait of 250m.

@SparkQA

SparkQA commented Oct 27, 2015

Test build #44428 has finished for PR 9225 at commit e48c83a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Oct 28, 2015

@liancheng Do we have a workaround for PARQUET-251? If so, we could skip 1.8.1, since it's not in a good state, and wait for 1.8.2 instead.

@liancheng
Contributor Author

@davies The current master and branch-1.5 have already worked around this issue by disabling Parquet filter push-down for binary and string columns. So either way, we face a performance regression of one kind or another.

@liancheng
Contributor Author

Closing this since we decided not to bump parquet-mr to 1.8.1 for 1.6.

@liancheng liancheng closed this Nov 5, 2015
ianlcsd added a commit to ianlcsd/spark that referenced this pull request Nov 19, 2015
- manual porting apache#9225
- helpful tests:
org.apache.spark.sql.execution.datasources.parquet.BinaryTest and
org.apache.spark.sql.execution.datasources.parquet.ParquetFilterSuite