Skip to content

Conversation

@Kyrpel
Copy link

@Kyrpel Kyrpel commented Mar 29, 2022

@rhshadrach rhshadrach added IO CSV read_csv, to_csv Bug Categorical Categorical Data Type labels Apr 22, 2022
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Comment on lines +2284 to +2289
elif isinstance(values, (PeriodArray, IntervalArray)):
# GH46297
values = np.array(values, dtype="object")
mask = isna(values)
values[mask] = na_rep
return values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to the EA block below (L2321); can they be combined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kyrpel can you do this one

)
if arr.dtype.kind in "O" and is_interval_dtype(arr):
# # GH46297 Interval
return arr.take(indexer, allow_fill=allow_fill)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fill_value here comes from the na_rep argument of to_csv; is ignoring it the right behavior? Does that mean whatever the user specifies will be ignored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, this seems like a weird place to special-case

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label May 16, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone May 16, 2022
@simonjayhawkins
Copy link
Member

Thanks @Kyrpel for the PR. does this also fix the related issue #46812?

can you add a release note doc/source/whatsnew/v1.4.3.rst

Comment on lines +2284 to +2289
elif isinstance(values, (PeriodArray, IntervalArray)):
# GH46297
values = np.array(values, dtype="object")
mask = isna(values)
values[mask] = na_rep
return values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kyrpel can you do this one

return arr.take(
indexer, fill_value=fill_value, allow_fill=allow_fill, axis=axis
)
if arr.dtype.kind in "O" and is_interval_dtype(arr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif

@simonjayhawkins
Copy link
Member

Thanks @Kyrpel for the PR.

This fix is currently adding some more code where the regression was caused by a code addition in #44930.

if isinstance(values, Categorical):
# GH#40754 Convert categorical datetimes to datetime array
values = algos.take_nd(
values.categories._values,
ensure_platform_int(values._codes),
fill_value=na_rep,
)

whereas before #44930 the conversion was done in ...

elif isinstance(values, ExtensionArray):
mask = isna(values)
new_values = np.asarray(values.astype(object))
new_values[mask] = na_rep
return new_values

Now removing the code addition in #44930 also fixes this issue.

Now since the astype(object) for categorical containing Timestamps was also fixed in #18024, i'm not sure why the special casing for Categorical was added to to_native_types.

Going back to this issue. repr already still works and according to some code comments, to_csv/repr are supposed to be sharing the same to_native_types code.

we also have the related issue #46812, and I may need to update this observation once I have looked at that one. I suspect that both issues should probably be fixed togther.

@simonjayhawkins
Copy link
Member

Now since the astype(object) for categorical containing Timestamps was also fixed in #18024, i'm not sure why the special casing for Categorical was added to to_native_types.

should have been containing Timestamps was also fixed in #18024 -> containing Timestamps #18024 was also fixed in #44930

cc @jbrockmendel @phofl (participants in #44930)

@simonjayhawkins simonjayhawkins mentioned this pull request Jun 8, 2022
@simonjayhawkins
Copy link
Member

Thanks @Kyrpel for the PR. closing as superseded by #47347.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Categorical Categorical Data Type IO CSV read_csv, to_csv Regression Functionality that used to work in a prior pandas version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Error writing DataFrame with categorical type column and interval data to a CSV file

5 participants