7 changes: 4 additions & 3 deletions dev/deps/spark-deps-hadoop-2.6
@@ -2,7 +2,7 @@ JavaEWAH-0.3.2.jar
 RoaringBitmap-0.5.11.jar
 ST4-4.0.4.jar
 activation-1.1.1.jar
-aircompressor-0.8.jar
+aircompressor-0.10.jar
 antlr-2.7.7.jar
 antlr-runtime-3.4.jar
 antlr4-runtime-4.7.jar
@@ -157,8 +157,9 @@ objenesis-2.1.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.4.4-nohive.jar
-orc-mapreduce-1.4.4-nohive.jar
+orc-core-1.5.2-nohive.jar
+orc-mapreduce-1.5.2-nohive.jar
+orc-shims-1.5.2.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
7 changes: 4 additions & 3 deletions dev/deps/spark-deps-hadoop-2.7
@@ -2,7 +2,7 @@ JavaEWAH-0.3.2.jar
 RoaringBitmap-0.5.11.jar
 ST4-4.0.4.jar
 activation-1.1.1.jar
-aircompressor-0.8.jar
+aircompressor-0.10.jar
 antlr-2.7.7.jar
 antlr-runtime-3.4.jar
 antlr4-runtime-4.7.jar
@@ -158,8 +158,9 @@ objenesis-2.1.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.4.4-nohive.jar
-orc-mapreduce-1.4.4-nohive.jar
+orc-core-1.5.2-nohive.jar
+orc-mapreduce-1.5.2-nohive.jar
+orc-shims-1.5.2.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
7 changes: 4 additions & 3 deletions dev/deps/spark-deps-hadoop-3.1
@@ -4,7 +4,7 @@ RoaringBitmap-0.5.11.jar
 ST4-4.0.4.jar
 accessors-smart-1.2.jar
 activation-1.1.1.jar
-aircompressor-0.8.jar
+aircompressor-0.10.jar
 antlr-2.7.7.jar
 antlr-runtime-3.4.jar
 antlr4-runtime-4.7.jar
@@ -176,8 +176,9 @@ okhttp-2.7.5.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.4.4-nohive.jar
-orc-mapreduce-1.4.4-nohive.jar
+orc-core-1.5.2-nohive.jar
+orc-mapreduce-1.5.2-nohive.jar
+orc-shims-1.5.2.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
2 changes: 1 addition & 1 deletion pom.xml
@@ -130,7 +130,7 @@
     <hive.version.short>1.2.1</hive.version.short>
     <derby.version>10.12.1.1</derby.version>
     <parquet.version>1.10.0</parquet.version>
-    <orc.version>1.4.4</orc.version>
+    <orc.version>1.5.2</orc.version>
     <orc.classifier>nohive</orc.classifier>
     <hive.parquet.version>1.6.0</hive.parquet.version>
     <jetty.version>9.3.20.v20170531</jetty.version>
28 changes: 28 additions & 0 deletions sql/core/pom.xml
@@ -90,11 +90,39 @@
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-core</artifactId>
       <classifier>${orc.classifier}</classifier>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-hdfs</artifactId>
+        </exclusion>
+        <!--
+          orc-core:nohive doesn't have this dependency, but we add this to prevent
+          sbt from getting confused.
+        -->
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-storage-api</artifactId>
+        </exclusion>

Comment (Member Author): I added the above eight lines to be consistent for both mvn and sbt.

+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.orc</groupId>
       <artifactId>orc-mapreduce</artifactId>
       <classifier>${orc.classifier}</classifier>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-hdfs</artifactId>
+        </exclusion>
+        <!--
+          orc-core:nohive doesn't have this dependency, but we add this to prevent
+          sbt from getting confused.
+        -->
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-storage-api</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.parquet</groupId>
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -59,6 +59,19 @@ private[sql] object OrcFileFormat {
   def checkFieldNames(names: Seq[String]): Unit = {
     names.foreach(checkFieldName)
   }
+
+  def getQuotedSchemaString(dataType: DataType): String = dataType match {
+    case _: AtomicType => dataType.catalogString
+    case StructType(fields) =>
+      fields.map(f => s"`${f.name}`:${getQuotedSchemaString(f.dataType)}")
+        .mkString("struct<", ",", ">")
+    case ArrayType(elementType, _) =>
+      s"array<${getQuotedSchemaString(elementType)}>"
+    case MapType(keyType, valueType, _) =>
+      s"map<${getQuotedSchemaString(keyType)},${getQuotedSchemaString(valueType)}>"
+    case _ => // UDT and others
+      dataType.catalogString
+  }

Comment (Member): nit: Seems the first `_: AtomicType` case could be dropped, since this fallback case covers all the other cases?

Comment (Member): We don't need to recursively quote udt.sqlType?

Comment (Member Author): Thank you for the review, @maropu. Yes, that is not handled here because the goal is to support user column names like `col1.x`, which usually appear at the top level.

 }

 /**
@@ -95,7 +108,7 @@ class OrcFileFormat

     val conf = job.getConfiguration

-    conf.set(MAPRED_OUTPUT_SCHEMA.getAttribute, dataSchema.catalogString)
+    conf.set(MAPRED_OUTPUT_SCHEMA.getAttribute, OrcFileFormat.getQuotedSchemaString(dataSchema))

     conf.set(COMPRESS.getAttribute, orcOptions.compressionCodec)
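To see what the new quoting produces, here is a minimal sketch (not part of the PR; the schema is an illustrative assumption, and the call assumes code living under Spark's own `org.apache.spark.sql` packages, since `OrcFileFormat` is `private[sql]`):

```scala
import org.apache.spark.sql.types._
// private[sql]: reachable only from inside Spark's sql packages
import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat

// Illustrative schema: the top-level column name contains a dot, which is
// legal in Spark but trips ORC 1.5's stricter type-string parser when unquoted.
val schema = StructType(Seq(
  StructField("col1.x", IntegerType),
  StructField("values", ArrayType(StringType))))

// schema.catalogString (the old code path) produces:
//   struct<col1.x:int,values:array<string>>
// getQuotedSchemaString backquotes each struct field name instead:
//   struct<`col1.x`:int,`values`:array<string>>
println(OrcFileFormat.getQuotedSchemaString(schema))
```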
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcSerializer.scala
@@ -223,6 +223,6 @@ class OrcSerializer(dataSchema: StructType) {
   * Return an ORC value object for the given Spark schema.
   */
  private def createOrcValue(dataType: DataType) = {
-    OrcStruct.createValue(TypeDescription.fromString(dataType.catalogString))
+    OrcStruct.createValue(TypeDescription.fromString(OrcFileFormat.getQuotedSchemaString(dataType)))
  }
 }

Comment (Member): Why this change?

Comment (Member Author): Thank you for the review, @viirya. ORC 1.5 checks field-name syntax more strictly; for example, it rejects a field name containing a dot.

Comment (Member): @dongjoon-hyun Thanks for explaining it.
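A hedged sketch of the failure mode the reviewers are discussing (ORC's `TypeDescription.fromString` is a real API; the dotted field name and the parse failure follow the author's explanation above rather than a test from this PR):

```scala
import org.apache.orc.TypeDescription

// ORC 1.5 accepts a dotted field name when it is backquoted, which is
// exactly the form OrcFileFormat.getQuotedSchemaString now emits.
val ok = TypeDescription.fromString("struct<`col1.x`:int>")

// The unquoted form, which dataType.catalogString used to produce, fails
// under ORC 1.5's stricter parser (ORC 1.4 was more lenient), which is
// why createOrcValue now routes through getQuotedSchemaString:
// TypeDescription.fromString("struct<col1.x:int>")  // parse error at the dot
```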