Skip to content

Conversation

@seisman
Copy link
Member

@seisman seisman commented Nov 10, 2024

Description of proposed changes

Add tests for panda.Series with pyarrow date32/date64 dtypes.

In PR #2845, we added the support of date32[day][pyarrow] and date64[ms][pyarrow] and mapped them to np.datetime64. This PR improves it and maps them to "datetime64[D]" and datetim64[ms] respectively.

After approval, I plan to cherry-pick changes in e13b5d3 into a separate PR so that we can have a changelog entry.

@seisman seisman added maintenance Boring but important stuff for the core devs needs review This PR has higher priority and needs review. labels Nov 10, 2024
@seisman seisman added this to the 0.14.0 milestone Nov 10, 2024
@seisman seisman force-pushed the to_numpy/pandas_pyarrow_date branch from e13b5d3 to d504a04 Compare November 10, 2024 14:10
@weiji14 weiji14 changed the title clib.conversion._to_numpy: Add tests for panda.Series with pyarrow date32/date64 types clib.conversion._to_numpy: Add tests for pandas.Series with pyarrow date32/date64 types Nov 12, 2024
@michaelgrund michaelgrund added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Nov 12, 2024
@seisman
Copy link
Member Author

seisman commented Nov 14, 2024

Ping @weiji14 for final reviews on the date32/date64 behavior changes.

@weiji14
Copy link
Member

weiji14 commented Nov 14, 2024

Maybe update the test here too to check explicitly for "datetime64[D]"/"datetim64[ms]" instead of just np.datetime64? Or does it not matter I guess?

vectors = [
pd.Series(
data=[datetime.date(2020, 1, 1), datetime.date(2021, 12, 31)],
dtype="date32[day][pyarrow]",
),
pd.Series(
data=[datetime.date(2022, 1, 1), datetime.date(2023, 12, 31)],
dtype="date64[ms][pyarrow]",
),
]
arrays = vectors_to_arrays(vectors)
assert all(i.dtype.type == np.datetime64 for i in arrays)

@seisman
Copy link
Member Author

seisman commented Nov 14, 2024

Actually I feel the test test_vectors_to_arrays_pyarrow_datetime can be removed, since it's already covered by the newly added tests in this PR.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I feel the test test_vectors_to_arrays_pyarrow_datetime can be removed, since it's already covered by the newly added tests in this PR.

On second thought, maybe keep the test. we need to test that vectors_to_arrays can handle mixed dtypes (and that test checks both date32/date64) from a pandas.DataFrame or nested list, whereas _to_numpy is for single dtypes on a pandas.Series or single list.

@seisman seisman added skip-changelog Skip adding Pull Request to changelog and removed final review call This PR requires final review and approval from a second reviewer labels Nov 14, 2024
@seisman seisman merged commit 1475837 into main Nov 14, 2024
22 of 23 checks passed
@seisman seisman deleted the to_numpy/pandas_pyarrow_date branch November 14, 2024 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants