parquet doesn't preserve the time in DataType::Date64 #1275

@rjnanderson

Describe the bug
When a RecordBatch is written to a parquet file and then read back, the time portion of every DataType::Date64 value is set to 0.

To Reproduce
With this schema:
Field::new("item", DataType::Utf8, false),
Field::new("timestamp", DataType::Date64, false)

  1. Read the csv1 data below into batch1
  2. Write batch1 to csv1a
  3. Compare csv1 to csv1a — they match
  4. Write batch1 to a parquet file
  5. Read batch2 from the same parquet file
  6. Write batch2 to csv2
  7. Compare csv1 to csv2 — they don’t match because in csv2 the times are all 00:00:00.000000000

csv1:
item,timestamp
1,1998-10-28T19:10:30.056000000
2,1998-10-30T11:10:10.623000000
3,1999-01-23T17:10:31.006000000

csv2:
item,timestamp
1,1998-10-28T00:00:00.000000000
2,1998-10-30T00:00:00.000000000
3,1999-01-23T00:00:00.000000000
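
For illustration, the csv2 values are what you would get if each value were reduced to whole days somewhere on the write path. The Parquet DATE logical type stores days since the UNIX epoch as an int32. Whether the writer actually maps Date64 to it is an assumption here; I have not verified it against the parquet crate source. A minimal std-only sketch, where `split_date64` and `roundtrip_via_days` are hypothetical helpers and not arrow or parquet APIs:

```rust
/// Milliseconds in one day. Date64 counts milliseconds since the UNIX epoch.
const MS_PER_DAY: i64 = 86_400_000;

/// Splits a Date64 value into (whole days since the epoch, milliseconds into that day).
/// Euclidean division keeps the time-of-day part non-negative for pre-1970 values.
fn split_date64(ms: i64) -> (i32, i64) {
    (ms.div_euclid(MS_PER_DAY) as i32, ms.rem_euclid(MS_PER_DAY))
}

/// Hypothetical round trip through a days-only representation.
/// ASSUMPTION: this models the suspected behaviour, not the real writer code.
fn roundtrip_via_days(ms: i64) -> i64 {
    let (days, _time_of_day) = split_date64(ms);
    i64::from(days) * MS_PER_DAY
}

fn main() {
    // First row of csv1: 1998-10-28T19:10:30.056 UTC as epoch milliseconds.
    let original: i64 = 909_601_830_056;
    let (days, time_of_day) = split_date64(original);
    let restored = roundtrip_via_days(original);
    println!(
        "days={} time_of_day_ms={} restored={}",
        days, time_of_day, restored
    );
}
```

The restored value falls on midnight of the same day, which matches the first row of csv2.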

Expected behavior
The time portion of the DataType::Date64 value should be preserved in parquet just as it is in csv.

Additional context
Version 8.0.0

It looks like this unit test needs to include some non-zero times:

#[test]
fn date64_single_column() {
    // Date64 must be a multiple of 86400000, see ARROW-10925
    required_and_optional::<Date64Array, _>(
        (0..(SMALL_SIZE as i64 * 86400000)).step_by(86400000),
    );
}

According to ARROW-10925 a valid time is in the range 0..86400000 milliseconds.
Here DataType::Date64 is defined to be in milliseconds: https://arrow.apache.org/docs/cpp/api/datatype.html
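
As a rough sketch of what test input with non-zero times could look like, the generator below adds a varying time-of-day offset to each day boundary. `date64_values_with_times` and the offset constant are made up for illustration and are not part of the existing test helpers; the values would still need to be fed into something like `required_and_optional::<Date64Array, _>`.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical generator for Date64 test values: one value per day,
/// each with a different time-of-day offset (0 for the first value).
fn date64_values_with_times(count: usize) -> Vec<i64> {
    (0..count as i64)
        // 3_600_123 ms is 01:00:00.123; the modulo keeps the offset inside one day.
        .map(|i| i * MS_PER_DAY + (i * 3_600_123) % MS_PER_DAY)
        .collect()
}

fn main() {
    let values = date64_values_with_times(4);
    for ms in &values {
        // Print each value together with its time-of-day component.
        println!("{} (time of day: {} ms)", ms, ms % MS_PER_DAY);
    }
}
```

With input like this, a round trip that only keeps whole days would fail the equality check for every value after the first.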
