Handle Parquet Files With Inconsistent Timestamp Units #1459

@anliakho2

Describe the bug
If a Parquet file is written with timestamps in a time unit other than ns, reading that file produces incorrect dates, whereas pandas reads the dates correctly.
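The report does not show the incorrect dates themselves. As a rough illustration of how a unit mismatch can corrupt a date, the sketch below assumes the reader treats a stored millisecond count as nanoseconds; that cause is an assumption, not something confirmed in this issue. The helper `from_epoch` is hypothetical and uses only the Python standard library.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch(value: int, unit: str) -> datetime:
    """Interpret an integer offset from the Unix epoch in the given unit.

    Hypothetical helper for illustration only; not part of any library.
    """
    if unit == "ms":
        delta = timedelta(milliseconds=value)
    elif unit == "us":
        delta = timedelta(microseconds=value)
    elif unit == "ns":
        # datetime has microsecond resolution, so sub-microsecond digits are dropped.
        delta = timedelta(microseconds=value // 1_000)
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return EPOCH + delta


# 2020-01-01T00:00:00Z stored as milliseconds since the epoch.
stored = 1_577_836_800_000

correct = from_epoch(stored, "ms")  # unit honoured
misread = from_epoch(stored, "ns")  # assumed failure mode: ms value read as ns

print(correct.isoformat())  # 2020-01-01T00:00:00+00:00
print(misread.isoformat())  # 1970-01-01T00:26:17.836800+00:00
```

Under that assumption, every timestamp in the file would collapse to within the first hour of 1970, which is the kind of corruption a reader would notice immediately.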

To Reproduce
Generate a Parquet file as follows:
```python
import os

import numpy as np
import pandas as pd

np.random.seed(0)

# Create a range of 5 timestamps starting at '2020-01-01', one per hour
rng = pd.date_range('2020-01-01', periods=5, freq='H')
df = pd.DataFrame({'Date': rng, 'Val': np.random.randn(len(rng))})

# Make sure the output directory exists, then write the timestamps
# with millisecond (not nanosecond) resolution
os.makedirs('data', exist_ok=True)
df.to_parquet('data/myfile.parquet', coerce_timestamps='ms', allow_truncated_timestamps=True)
```

Expected behavior
Data is not corrupted and dates are read back correctly.

Additional context
_

Labels
enhancement (Any new improvement worthy of an entry in the changelog), parquet (Changes to the parquet crate)