Commit 407175e

liancheng authored and rxin committed
[SPARK-9974] [BUILD] [SQL] Makes sure com.twitter:parquet-hadoop-bundle:1.6.0 is in SBT assembly jar
PR #7967 enables Spark SQL to persist Parquet tables in Hive-compatible format when possible. One consequence is that we have to set the input/output classes to `MapredParquetInputFormat`/`MapredParquetOutputFormat`, which rely on com.twitter:parquet-hadoop:1.6.0 bundled with Hive 1.2.1. When loading such a table in Spark SQL, `o.a.h.h.ql.metadata.Table` first loads these input/output format classes, and thus the classes in com.twitter:parquet-hadoop:1.6.0. However, the scope of this dependency is defined as "runtime", so it is not packaged into the Spark assembly jar. This results in a `ClassNotFoundException`.

This issue can be worked around by asking users to add parquet-hadoop 1.6.0 via the `--driver-class-path` option. However, since the Maven build is immune to this problem, that workaround can be confusing and inconvenient for users. So this PR fixes the issue by changing the scope of parquet-hadoop 1.6.0 to "compile".

Author: Cheng Lian <[email protected]>

Closes #8198 from liancheng/spark-9974/bundle-parquet-1.6.0.

(cherry picked from commit 52ae952)
Signed-off-by: Reynold Xin <[email protected]>
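For context, the workaround mentioned above amounts to putting the missing jar on the driver classpath by hand when launching an SBT-built assembly. A rough sketch (the jar path is hypothetical and depends on where the artifact lives locally; it is not taken from the commit):

```shell
# Hypothetical workaround before this fix: add parquet-hadoop 1.6.0
# to the driver classpath so MapredParquetInputFormat's dependencies resolve.
spark-submit \
  --driver-class-path /path/to/parquet-hadoop-1.6.0.jar \
  ...
```

Changing the dependency scope to "compile" makes this unnecessary, since the classes are then bundled into the assembly jar itself.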
1 parent 0f1417b commit 407175e

File tree: 1 file changed (+1, −1 lines)

pom.xml (1 addition, 1 deletion)

@@ -1598,7 +1598,7 @@
       <groupId>com.twitter</groupId>
       <artifactId>parquet-hadoop-bundle</artifactId>
       <version>${hive.parquet.version}</version>
-      <scope>runtime</scope>
+      <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.flume</groupId>
