@@ -358,9 +358,24 @@ private[parquet] class CatalystSchemaConverter(
 case DateType =>
   Types.primitive(INT32, repetition).as(DATE).named(field.name)
 
-// NOTE: !! This timestamp type is not specified in Parquet format spec !!
-// However, Impala and older versions of Spark SQL use INT96 to store timestamps with
-// nanosecond precision (not TIME_MILLIS or TIMESTAMP_MILLIS described in the spec).
+// NOTE: Spark SQL TimestampType is NOT a well defined type in Parquet format spec.
+//
+// As stated in PARQUET-323, Parquet `INT96` was originally introduced to represent nanosecond
+// timestamp in Impala for some historical reasons, it's not recommended to be used for any
+// other types and will probably be deprecated in future Parquet format spec. That's the
+// reason why Parquet format spec only defines `TIMESTAMP_MILLIS` and `TIMESTAMP_MICROS` which
+// are both logical types annotating `INT64`.
+//
+// Originally, Spark SQL uses the same nanosecond timestamp type as Impala and Hive. Starting
+// from Spark 1.5.0, we resort to a timestamp type with 100 ns precision so that we can store
+// a timestamp into a `Long`. This design decision is subject to change though, for example,
+// we may resort to microsecond precision in the future.
+//
+// For Parquet, we plan to write all `TimestampType` value as `TIMESTAMP_MICROS`, but it's
+// currently not implemented yet because parquet-mr 1.7.0 (the version we're currently using)
+// hasn't implemented `TIMESTAMP_MICROS` yet.
+//
+// TODO Implements `TIMESTAMP_MICROS` once parquet-mr has that.
 case TimestampType =>
   Types.primitive(INT96, repetition).named(field.name)
 
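
Note on the `INT96` case above: the comment says `INT96` carries a nanosecond timestamp, but the diff does not show how the 12 bytes are laid out. The sketch below illustrates the layout commonly attributed to Impala, which is an assumption here and not something this diff states: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The object `Int96TimestampSketch` and both of its methods are hypothetical names used only for illustration; they are not part of `CatalystSchemaConverter` or parquet-mr.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper for illustration only; not part of Spark or parquet-mr.
object Int96TimestampSketch {
  // Julian day number assigned to 1970-01-01 (assumed convention).
  val JulianDayOfEpoch: Long = 2440588L
  val MicrosPerDay: Long = 86400L * 1000L * 1000L

  // Encodes microseconds since the Unix epoch as 12 little-endian bytes:
  // 8 bytes of nanoseconds within the day, then 4 bytes of Julian day.
  def toInt96Bytes(epochMicros: Long): Array[Byte] = {
    // floorDiv/floorMod keep the time-of-day non-negative for pre-1970 values.
    val day = Math.floorDiv(epochMicros, MicrosPerDay)
    val microsOfDay = Math.floorMod(epochMicros, MicrosPerDay)
    ByteBuffer.allocate(12)
      .order(ByteOrder.LITTLE_ENDIAN)
      .putLong(microsOfDay * 1000L)
      .putInt((day + JulianDayOfEpoch).toInt)
      .array()
  }

  // Decodes the same 12-byte layout back to microseconds since the epoch,
  // truncating any sub-microsecond digits.
  def fromInt96Bytes(bytes: Array[Byte]): Long = {
    require(bytes.length == 12, "INT96 values are exactly 12 bytes")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong()
    val julianDay = buf.getInt().toLong
    (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / 1000L
  }
}
```

Under these assumptions a microsecond value round-trips through the 12-byte form unchanged, which suggests why a single `INT64` annotated with `TIMESTAMP_MICROS` would be enough once parquet-mr supports it.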