
Commit fd47a93

Merge branch 'main' into np-eval-fix
2 parents: 4051d36 + c468028


58 files changed (+401, -459 lines)

doc/source/whatsnew/v2.2.2.rst (3 additions, 1 deletion)

@@ -15,15 +15,17 @@ Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - :meth:`DataFrame.__dataframe__` was producing incorrect data buffers when a column's type was a pandas nullable type with missing values (:issue:`56702`)
 - :meth:`DataFrame.__dataframe__` was producing incorrect data buffers when a column's type was a pyarrow nullable type with missing values (:issue:`57664`)
--
+- Fixed regression in precision of :func:`to_datetime` with string and ``unit`` input (:issue:`57051`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_222.bug_fixes:
 
 Bug fixes
 ~~~~~~~~~
+- :meth:`DataFrame.__dataframe__` was producing incorrect data buffers when the column's type was nullable boolean (:issue:`55332`)
 - :meth:`DataFrame.__dataframe__` was showing bytemask instead of bitmask for ``'string[pyarrow]'`` validity buffer (:issue:`57762`)
 - :meth:`DataFrame.__dataframe__` was showing non-null validity buffer (instead of ``None``) for ``'string[pyarrow]'`` without missing values (:issue:`57761`)
+- :meth:`DataFrame.to_sql` was failing to find the right table when using the schema argument (:issue:`57539`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_222.other:

doc/source/whatsnew/v3.0.0.rst (9 additions, 1 deletion)

@@ -206,18 +206,23 @@ Removal of prior version deprecations/changes
 - :meth:`SeriesGroupBy.agg` no longer pins the name of the group to the input passed to the provided ``func`` (:issue:`51703`)
 - All arguments except ``name`` in :meth:`Index.rename` are now keyword only (:issue:`56493`)
 - All arguments except the first ``path``-like argument in IO writers are now keyword only (:issue:`54229`)
+- Disallow passing a pandas type to :meth:`Index.view` (:issue:`55709`)
+- Removed "freq" keyword from :class:`PeriodArray` constructor, use "dtype" instead (:issue:`52462`)
+- Removed deprecated "method" and "limit" keywords from :meth:`Series.replace` and :meth:`DataFrame.replace` (:issue:`53492`)
 - Removed the "closed" and "normalize" keywords in :meth:`DatetimeIndex.__new__` (:issue:`52628`)
 - Removed the "closed" and "unit" keywords in :meth:`TimedeltaIndex.__new__` (:issue:`52628`, :issue:`55499`)
 - All arguments in :meth:`Index.sort_values` are now keyword only (:issue:`56493`)
 - All arguments in :meth:`Series.to_dict` are now keyword only (:issue:`56493`)
 - Changed the default value of ``observed`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby` to ``True`` (:issue:`51811`)
 - Enforce deprecation in :func:`testing.assert_series_equal` and :func:`testing.assert_frame_equal` with object dtype and mismatched null-like values, which are now considered not-equal (:issue:`18463`)
+- Enforced deprecation of ``all`` and ``any`` reductions with ``datetime64`` and :class:`DatetimeTZDtype` dtypes (:issue:`58029`)
 - Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes ``utc=True`` to :func:`to_datetime` (:issue:`57275`)
 - Enforced deprecation in :meth:`Series.value_counts` and :meth:`Index.value_counts` with object dtype performing dtype inference on the ``.index`` of the result (:issue:`56161`)
 - Enforced deprecation of :meth:`.DataFrameGroupBy.get_group` and :meth:`.SeriesGroupBy.get_group` allowing the ``name`` argument to be a non-tuple when grouping by a list of length 1 (:issue:`54155`)
 - Enforced deprecation of :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` for object-dtype (:issue:`57820`)
 - Enforced deprecation of :meth:`offsets.Tick.delta`, use ``pd.Timedelta(obj)`` instead (:issue:`55498`)
 - Enforced deprecation of ``axis=None`` acting the same as ``axis=0`` in the DataFrame reductions ``sum``, ``prod``, ``std``, ``var``, and ``sem``, passing ``axis=None`` will now reduce over both axes; this is particularly the case when doing e.g. ``numpy.sum(df)`` (:issue:`21597`)
+- Enforced deprecation of non-standard (``np.ndarray``, :class:`ExtensionArray`, :class:`Index`, or :class:`Series`) argument to :func:`api.extensions.take` (:issue:`52981`)
 - Enforced deprecation of parsing system timezone strings to ``tzlocal``, which depended on system timezone, pass the 'tz' keyword instead (:issue:`50791`)
 - Enforced deprecation of passing a dictionary to :meth:`SeriesGroupBy.agg` (:issue:`52268`)
 - Enforced deprecation of string ``AS`` denoting frequency in :class:`YearBegin` and strings ``AS-DEC``, ``AS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`57793`)

@@ -297,6 +302,7 @@ Performance improvements
 - Performance improvement in :meth:`DataFrameGroupBy.ffill`, :meth:`DataFrameGroupBy.bfill`, :meth:`SeriesGroupBy.ffill`, and :meth:`SeriesGroupBy.bfill` (:issue:`56902`)
 - Performance improvement in :meth:`Index.join` by propagating cached attributes in cases where the result matches one of the inputs (:issue:`57023`)
 - Performance improvement in :meth:`Index.take` when ``indices`` is a full range indexer from zero to length of index (:issue:`56806`)
+- Performance improvement in :meth:`Index.to_frame` returning a :class:`RangeIndex` for the columns instead of an :class:`Index` when possible (:issue:`58018`)
 - Performance improvement in :meth:`MultiIndex.equals` for equal length indexes (:issue:`56990`)
 - Performance improvement in :meth:`RangeIndex.__getitem__` with a boolean mask or integers returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57588`)
 - Performance improvement in :meth:`RangeIndex.append` when appending the same index (:issue:`57252`)

@@ -318,9 +324,11 @@ Bug fixes
 ~~~~~~~~~
 - Fixed bug in :class:`SparseDtype` for equal comparison with na fill value. (:issue:`54770`)
 - Fixed bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
-- Fixed bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using numpy via ``@`` notation. (:issue:`58041`)
+- Fixed bug in :meth:`DataFrame.cumsum` which was raising ``IndexError`` if dtype is ``timedelta64[ns]`` (:issue:`57956`)
+- Fixed bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using numpy via ``@`` notation. (:issue:`58041`)
 - Fixed bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
 - Fixed bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
+- Fixed bug in :meth:`DataFrame.transform` that was returning the wrong order unless the index was monotonically increasing. (:issue:`57069`)
 - Fixed bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
 - Fixed bug in :meth:`DataFrameGroupBy.apply` that was returning a completely empty DataFrame when all return values of ``func`` were ``None`` instead of returning an empty DataFrame with the original columns and dtypes. (:issue:`57775`)
 - Fixed bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
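One of the enforced changes above, the ``observed=True`` groupby default, can be illustrated with a small sketch (passing ``observed`` explicitly so the behavior is the same on older pandas versions too):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cat": pd.Categorical(["a", "a"], categories=["a", "b"]),
        "x": [1, 2],
    }
)

# With observed=True (the new default), the unobserved category "b"
# does not appear in the result.
out = df.groupby("cat", observed=True)["x"].sum()
print(out.index.tolist())  # → ['a']
```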

pandas/_libs/tslib.pyx (1 addition, 1 deletion)

@@ -275,7 +275,7 @@ def array_with_unit_to_datetime(
         bint is_raise = errors == "raise"
         ndarray[int64_t] iresult
         tzinfo tz = None
-        float fval
+        double fval
 
     assert is_coerce or is_raise
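The ``float`` → ``double`` change above is what fixes GH 57051: a 32-bit C float cannot hold a modern epoch timestamp with sub-second precision. A rough illustration of the affected path (the exact sub-second digits depend on float64 rounding, so only the coarse fields are asserted):

```python
import pandas as pd

# Epoch seconds as a *string* with a fractional part; this path parses the
# string into the C variable whose type the commit widens.
ts = pd.to_datetime("1704037230.123456", unit="s")
print(ts.year, ts.month, ts.day)  # → 2023 12 31
```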

pandas/conftest.py (0 additions, 1 deletion)

@@ -150,7 +150,6 @@ def pytest_collection_modifyitems(items, config) -> None:
         ("is_categorical_dtype", "is_categorical_dtype is deprecated"),
         ("is_sparse", "is_sparse is deprecated"),
         ("DataFrameGroupBy.fillna", "DataFrameGroupBy.fillna is deprecated"),
-        ("NDFrame.replace", "The 'method' keyword"),
         ("NDFrame.replace", "Series.replace without 'value'"),
         ("NDFrame.clip", "Downcasting behavior in Series and DataFrame methods"),
         ("Series.idxmin", "The behavior of Series.idxmin"),

pandas/core/algorithms.py (13 additions, 12 deletions)

@@ -43,7 +43,6 @@
     ensure_float64,
     ensure_object,
     ensure_platform_int,
-    is_array_like,
     is_bool_dtype,
     is_complex_dtype,
     is_dict_like,

@@ -1163,28 +1162,30 @@ def take(
     """
     if not isinstance(arr, (np.ndarray, ABCExtensionArray, ABCIndex, ABCSeries)):
         # GH#52981
-        warnings.warn(
-            "pd.api.extensions.take accepting non-standard inputs is deprecated "
-            "and will raise in a future version. Pass either a numpy.ndarray, "
-            "ExtensionArray, Index, or Series instead.",
-            FutureWarning,
-            stacklevel=find_stack_level(),
+        raise TypeError(
+            "pd.api.extensions.take requires a numpy.ndarray, "
+            f"ExtensionArray, Index, or Series, got {type(arr).__name__}."
         )
 
-    if not is_array_like(arr):
-        arr = np.asarray(arr)
-
     indices = ensure_platform_int(indices)
 
     if allow_fill:
         # Pandas style, -1 means NA
         validate_indices(indices, arr.shape[axis])
+        # error: Argument 1 to "take_nd" has incompatible type
+        # "ndarray[Any, Any] | ExtensionArray | Index | Series"; expected
+        # "ndarray[Any, Any]"
         result = take_nd(
-            arr, indices, axis=axis, allow_fill=True, fill_value=fill_value
+            arr,  # type: ignore[arg-type]
+            indices,
+            axis=axis,
+            allow_fill=True,
+            fill_value=fill_value,
         )
     else:
         # NumPy style
-        result = arr.take(indices, axis=axis)
+        # error: Unexpected keyword argument "axis" for "take" of "ExtensionArray"
+        result = arr.take(indices, axis=axis)  # type: ignore[call-arg,assignment]
     return result
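The enforced deprecation above means ``pd.api.extensions.take`` now accepts only array-backed inputs. A minimal sketch of the supported call (on versions before this commit a plain list merely warned; afterwards it raises ``TypeError``):

```python
import numpy as np
import pandas as pd

# Supported inputs: ndarray, ExtensionArray, Index, or Series.
out = pd.api.extensions.take(np.array([10, 20, 30]), [0, 2])
print(out)  # → [10 30]

# After this commit, a plain list raises instead of warning:
#   pd.api.extensions.take([10, 20, 30], [0])  # TypeError
```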

pandas/core/array_algos/datetimelike_accumulations.py (2 additions, 1 deletion)

@@ -49,7 +49,8 @@ def _cum_func(
     if not skipna:
         mask = np.maximum.accumulate(mask)
 
-    result = func(y)
+    # GH 57956
+    result = func(y, axis=0)
     result[mask] = iNaT
 
     if values.dtype.kind in "mM":
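The explicit ``axis=0`` above is what fixes the ``IndexError`` for ``timedelta64[ns]`` cumulative sums (GH 57956). A small sketch of the user-facing behavior the fix restores:

```python
import pandas as pd

s = pd.Series(pd.to_timedelta(["1 day", "2 days", "3 days"]))

# Cumulative sum over timedelta values; this code path previously could
# raise IndexError for timedelta64[ns] input.
out = s.cumsum()
print(out.iloc[-1])  # → 6 days 00:00:00
```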

pandas/core/arrays/datetimelike.py (4 additions, 12 deletions)

@@ -1661,16 +1661,8 @@ def _groupby_op(
         dtype = self.dtype
         if dtype.kind == "M":
             # Adding/multiplying datetimes is not valid
-            if how in ["sum", "prod", "cumsum", "cumprod", "var", "skew"]:
-                raise TypeError(f"datetime64 type does not support {how} operations")
-            if how in ["any", "all"]:
-                # GH#34479
-                warnings.warn(
-                    f"'{how}' with datetime64 dtypes is deprecated and will raise in a "
-                    f"future version. Use (obj != pd.Timestamp(0)).{how}() instead.",
-                    FutureWarning,
-                    stacklevel=find_stack_level(),
-                )
+            if how in ["any", "all", "sum", "prod", "cumsum", "cumprod", "var", "skew"]:
+                raise TypeError(f"datetime64 type does not support operation: '{how}'")
 
         elif isinstance(dtype, PeriodDtype):
             # Adding/multiplying Periods is not valid

@@ -2217,11 +2209,11 @@ def ceil(
     # Reductions
 
     def any(self, *, axis: AxisInt | None = None, skipna: bool = True) -> bool:
-        # GH#34479 the nanops call will issue a FutureWarning for non-td64 dtype
+        # GH#34479 the nanops call will raise a TypeError for non-td64 dtype
        return nanops.nanany(self._ndarray, axis=axis, skipna=skipna, mask=self.isna())
 
    def all(self, *, axis: AxisInt | None = None, skipna: bool = True) -> bool:
-        # GH#34479 the nanops call will issue a FutureWarning for non-td64 dtype
+        # GH#34479 the nanops call will raise a TypeError for non-td64 dtype
        return nanops.nanall(self._ndarray, axis=axis, skipna=skipna, mask=self.isna())
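With ``any``/``all`` now raising ``TypeError`` for datetime64 data, the replacement the old deprecation message suggested is an explicit comparison. A version-independent sketch, since the comparison itself is always allowed:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["1970-01-01", "2024-01-01"]))

# s.any() on datetime64 now raises TypeError; compare explicitly instead,
# e.g. against the epoch as the deprecation message suggested:
result = (s != pd.Timestamp(0)).any()
print(result)  # → True
```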

pandas/core/arrays/period.py (1 addition, 20 deletions)

@@ -54,7 +54,6 @@
     cache_readonly,
     doc,
 )
-from pandas.util._exceptions import find_stack_level
 
 from pandas.core.dtypes.common import (
     ensure_object,

@@ -135,11 +134,6 @@ class PeriodArray(dtl.DatelikeOps, libperiod.PeriodMixin):  # type: ignore[misc]
     dtype : PeriodDtype, optional
         A PeriodDtype instance from which to extract a `freq`. If both
         `freq` and `dtype` are specified, then the frequencies must match.
-    freq : str or DateOffset
-        The `freq` to use for the array. Mostly applicable when `values`
-        is an ndarray of integers, when `freq` is required. When `values`
-        is a PeriodArray (or box around), it's checked that ``values.freq``
-        matches `freq`.
     copy : bool, default False
         Whether to copy the ordinals before storing.
 
@@ -224,20 +218,7 @@ def _scalar_type(self) -> type[Period]:
     # --------------------------------------------------------------------
     # Constructors
 
-    def __init__(
-        self, values, dtype: Dtype | None = None, freq=None, copy: bool = False
-    ) -> None:
-        if freq is not None:
-            # GH#52462
-            warnings.warn(
-                "The 'freq' keyword in the PeriodArray constructor is deprecated "
-                "and will be removed in a future version. Pass 'dtype' instead",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-            freq = validate_dtype_freq(dtype, freq)
-            dtype = PeriodDtype(freq)
-
+    def __init__(self, values, dtype: Dtype | None = None, copy: bool = False) -> None:
         if dtype is not None:
             dtype = pandas_dtype(dtype)
             if not isinstance(dtype, PeriodDtype):
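With the ``freq`` keyword removed, the constructor is driven entirely by ``dtype``. A minimal sketch (for a monthly dtype, the integer ordinals are month counts since the epoch):

```python
import numpy as np
import pandas as pd
from pandas.arrays import PeriodArray

# Pass the frequency through a PeriodDtype instead of the removed
# 'freq' keyword.
arr = PeriodArray(np.array([0, 1, 2], dtype="int64"), dtype=pd.PeriodDtype("M"))
print(arr[0], arr[-1])  # → 1970-01 1970-03
```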

pandas/core/generic.py (10 additions, 55 deletions)

@@ -6727,12 +6727,10 @@ def _pad_or_backfill(
         axis = self._get_axis_number(axis)
         method = clean_fill_method(method)
 
-        if not self._mgr.is_single_block and axis == 1:
-            # e.g. test_align_fill_method
-            # TODO(3.0): once downcast is removed, we can do the .T
-            # in all axis=1 cases, and remove axis kward from mgr.pad_or_backfill.
-            if inplace:
+        if axis == 1:
+            if not self._mgr.is_single_block and inplace:
                 raise NotImplementedError()
+            # e.g. test_align_fill_method
             result = self.T._pad_or_backfill(
                 method=method, limit=limit, limit_area=limit_area
             ).T

@@ -6741,7 +6739,6 @@ def _pad_or_backfill(
 
         new_mgr = self._mgr.pad_or_backfill(
             method=method,
-            axis=self._get_block_manager_axis(axis),
             limit=limit,
             limit_area=limit_area,
             inplace=inplace,

@@ -7285,9 +7282,7 @@ def replace(
         value=...,
         *,
         inplace: Literal[False] = ...,
-        limit: int | None = ...,
         regex: bool = ...,
-        method: Literal["pad", "ffill", "bfill"] | lib.NoDefault = ...,
     ) -> Self: ...
 
     @overload

@@ -7297,9 +7292,7 @@ def replace(
         value=...,
         *,
         inplace: Literal[True],
-        limit: int | None = ...,
         regex: bool = ...,
-        method: Literal["pad", "ffill", "bfill"] | lib.NoDefault = ...,
     ) -> None: ...
 
     @overload

@@ -7309,9 +7302,7 @@ def replace(
         value=...,
         *,
         inplace: bool = ...,
-        limit: int | None = ...,
         regex: bool = ...,
-        method: Literal["pad", "ffill", "bfill"] | lib.NoDefault = ...,
     ) -> Self | None: ...
 
     @final

@@ -7326,32 +7317,9 @@ def replace(
         value=lib.no_default,
         *,
         inplace: bool = False,
-        limit: int | None = None,
         regex: bool = False,
-        method: Literal["pad", "ffill", "bfill"] | lib.NoDefault = lib.no_default,
     ) -> Self | None:
-        if method is not lib.no_default:
-            warnings.warn(
-                # GH#33302
-                f"The 'method' keyword in {type(self).__name__}.replace is "
-                "deprecated and will be removed in a future version.",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-        elif limit is not None:
-            warnings.warn(
-                # GH#33302
-                f"The 'limit' keyword in {type(self).__name__}.replace is "
-                "deprecated and will be removed in a future version.",
-                FutureWarning,
-                stacklevel=find_stack_level(),
-            )
-        if (
-            value is lib.no_default
-            and method is lib.no_default
-            and not is_dict_like(to_replace)
-            and regex is False
-        ):
+        if value is lib.no_default and not is_dict_like(to_replace) and regex is False:
             # case that goes through _replace_single and defaults to method="pad"
             warnings.warn(
                 # GH#33302

@@ -7387,14 +7355,11 @@ def replace(
         if not is_bool(regex) and to_replace is not None:
             raise ValueError("'to_replace' must be 'None' if 'regex' is not a bool")
 
-        if value is lib.no_default or method is not lib.no_default:
+        if value is lib.no_default:
             # GH#36984 if the user explicitly passes value=None we want to
             # respect that. We have the corner case where the user explicitly
             # passes value=None *and* a method, which we interpret as meaning
             # they want the (documented) default behavior.
-            if method is lib.no_default:
-                # TODO: get this to show up as the default in the docs?
-                method = "pad"
 
             # passing a single value that is scalar like
             # when value is None (GH5319), for compat

@@ -7408,12 +7373,12 @@ def replace(
 
                 result = self.apply(
                     Series._replace_single,
-                    args=(to_replace, method, inplace, limit),
+                    args=(to_replace, inplace),
                 )
                 if inplace:
                     return None
                 return result
-            return self._replace_single(to_replace, method, inplace, limit)
+            return self._replace_single(to_replace, inplace)
 
         if not is_dict_like(to_replace):
             if not is_dict_like(regex):

@@ -7458,9 +7423,7 @@ def replace(
             else:
                 to_replace, value = keys, values
 
-            return self.replace(
-                to_replace, value, inplace=inplace, limit=limit, regex=regex
-            )
+            return self.replace(to_replace, value, inplace=inplace, regex=regex)
         else:
             # need a non-zero len on all axes
             if not self.size:

@@ -7524,9 +7487,7 @@ def replace(
                     f"or a list or dict of strings or regular expressions, "
                     f"you passed a {type(regex).__name__!r}"
                 )
-            return self.replace(
-                regex, value, inplace=inplace, limit=limit, regex=True
-            )
+            return self.replace(regex, value, inplace=inplace, regex=True)
         else:
             # dest iterable dict-like
             if is_dict_like(value):  # NA -> {'A' : 0, 'B' : -1}

@@ -9660,13 +9621,7 @@ def _where(
 
         # make sure we are boolean
         fill_value = bool(inplace)
-        with warnings.catch_warnings():
-            warnings.filterwarnings(
-                "ignore",
-                "Downcasting object dtype arrays",
-                category=FutureWarning,
-            )
-            cond = cond.fillna(fill_value)
+        cond = cond.fillna(fill_value)
         cond = cond.infer_objects()
 
         msg = "Boolean array expected for the condition, not {dtype}"
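With the ``method`` and ``limit`` keywords stripped out of ``replace`` above, the remaining surface is plain value replacement (optionally with ``regex``). A sketch that is valid both before and after the removal:

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"])

# Value-for-value replacement; 'method'/'limit' are no longer accepted.
out = s.replace("cat", "lion")
print(out.tolist())  # → ['lion', 'dog', 'lion']
```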

pandas/core/groupby/groupby.py (2 additions, 1 deletion)

@@ -1439,6 +1439,7 @@ def _transform_with_numba(self, func, *args, engine_kwargs=None, **kwargs):
         data and indices into a Numba jitted function.
         """
         data = self._obj_with_exclusions
+        index_sorting = self._grouper.result_ilocs
         df = data if data.ndim == 2 else data.to_frame()
 
         starts, ends, sorted_index, sorted_data = self._numba_prep(df)

@@ -1456,7 +1457,7 @@ def _transform_with_numba(self, func, *args, engine_kwargs=None, **kwargs):
         )
         # result values needs to be resorted to their original positions since we
         # evaluated the data sorted by group
-        result = result.take(np.argsort(sorted_index), axis=0)
+        result = result.take(np.argsort(index_sorting), axis=0)
         index = data.index
         if data.ndim == 1:
             result_kwargs = {"name": data.name}
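Resorting through ``result_ilocs`` restores the invariant that ``transform`` output lines up with the original row order even when the index is not monotonically increasing (GH 57069). The invariant itself, sketched with the default engine (the commit's change is in the numba path, which must behave the same way):

```python
import pandas as pd

# Non-monotonic index: transform results must still align row-by-row
# with the input, not with the group-sorted order.
df = pd.DataFrame({"g": ["b", "a", "b"], "x": [1, 2, 3]}, index=[2, 0, 1])
out = df.groupby("g")["x"].transform("sum")
print(out.tolist())  # → [4, 2, 4]
```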
