@liancheng
Contributor

ORC writes an empty schema (struct<>) to ORC files containing zero rows. This is fine for Hive, since the table schema is managed by the metastore. However, it causes trouble when reading raw ORC files via Spark SQL, because we have to discover the schema from the files themselves.

Note that the ORC data source always avoids writing empty ORC files, but the empty schema is still a problem when reading Hive tables that contain empty part-files.
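To illustrate the idea, here is a minimal Python sketch of schema discovery that skips files reporting an empty schema. It is not the code in this patch: the function name `discover_schema`, the `(path, schema)` pair representation, and the sample file names are all hypothetical, and schemas are modeled as plain type strings rather than read from real ORC footers.

```python
from typing import Iterable, Optional, Tuple

# Schema string that ORC records for a file with zero rows.
EMPTY_SCHEMA = "struct<>"


def discover_schema(footers: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the schema of the first file whose schema is not empty.

    `footers` is a hypothetical stand-in for per-file metadata: each item
    is a (path, schema_string) pair. Returns None if every file is empty.
    """
    for _path, schema in footers:
        # Ignore whitespace so "struct< >" is also treated as empty.
        if schema.replace(" ", "") != EMPTY_SCHEMA:
            return schema
    return None


# Hypothetical part-files: the first one holds zero rows.
part_files = [
    ("part-00000.orc", "struct<>"),
    ("part-00001.orc", "struct<id:int,name:string>"),
]
result = discover_schema(part_files)
```

With the sample input, the empty part-file is skipped and the schema comes from the second file. If every file is empty, the sketch returns None, leaving the caller to decide on a fallback such as the metastore schema.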

@liancheng
Contributor Author

cc @yhuai @zhzhan

@SparkQA
SparkQA commented Jul 2, 2015

Test build #36437 has finished for PR 7199 at commit a290221.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Jul 3, 2015

Test build #36456 has finished for PR 7199 at commit ad5b0ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
SparkQA commented Jul 3, 2015

Test build #36459 has finished for PR 7199 at commit bb8cd95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

Merging to master. This PR is backported to branch-1.4 by #7200.

asfgit pushed a commit that referenced this pull request Jul 3, 2015
…rt to 1.4)

This PR backports #7199 to branch-1.4

Author: Cheng Lian

Closes #7200 from liancheng/spark-8501-for-1.4 and squashes the following commits:

725e9e3 [Cheng Lian] Addresses comments
0fa25af [Cheng Lian] Avoids reading schema from empty ORC files
@asfgit closed this in 20a4d7d on Jul 3, 2015
@liancheng deleted the spark-8501 branch on September 27, 2016 14:25