Commit 5bad13f

Merge pull request #157 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 91209d8 + d7b04d1

File tree

27 files changed: +1099 / -499 lines

9.56 KB: Binary file not shown.
9.14 KB: Binary file not shown.

doc/source/ecosystem.rst

Lines changed: 2 additions & 1 deletion

@@ -98,7 +98,8 @@ which can be used for a wide variety of time series data mining tasks.
 Visualization
 -------------
 
-While :ref:`pandas has built-in support for data visualization with matplotlib <visualization>`,
+`Pandas has its own Styler class for table visualization <user_guide/style.ipynb>`_, and while
+:ref:`pandas also has built-in support for data visualization through charts with matplotlib <visualization>`,
 there are a number of other pandas-compatible libraries.
 
 `Altair <https://altair-viz.github.io/>`__

doc/source/user_guide/index.rst

Lines changed: 1 addition & 1 deletion

@@ -38,12 +38,12 @@ Further information on any specific method can be obtained in the
    integer_na
    boolean
    visualization
+   style
    computation
    groupby
    window
    timeseries
    timedeltas
-   style
    options
    enhancingperf
    scale

doc/source/user_guide/style.ipynb

Lines changed: 785 additions & 407 deletions
Large diffs are not rendered by default.

doc/source/user_guide/visualization.rst

Lines changed: 6 additions & 3 deletions

@@ -2,9 +2,12 @@
 
 {{ header }}
 
-*************
-Visualization
-*************
+*******************
+Chart Visualization
+*******************
+
+This section demonstrates visualization through charting. For information on
+visualization of tabular data please see the section on `Table Visualization <style.ipynb>`_.
 
 We use the standard convention for referencing the matplotlib API:
 
doc/source/user_guide/window.rst

Lines changed: 1 addition & 1 deletion

@@ -101,7 +101,7 @@ be calculated with :meth:`~Rolling.apply` by specifying a separate column of wei
 
 All windowing operations support a ``min_periods`` argument that dictates the minimum amount of
 non-``np.nan`` values a window must have; otherwise, the resulting value is ``np.nan``.
-``min_peridos`` defaults to 1 for time-based windows and ``window`` for fixed windows
+``min_periods`` defaults to 1 for time-based windows and ``window`` for fixed windows
 
 .. ipython:: python
 
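The corrected sentence describes the ``min_periods`` rule: a window produces a result only once it holds at least ``min_periods`` non-NaN values, and for fixed windows the default equals ``window``. A minimal pure-Python sketch of that rule (a hypothetical helper, not the pandas implementation):

```python
import math
from collections import deque

def rolling_sum(values, window, min_periods):
    """Fixed-size rolling sum: emit NaN until the current window
    holds at least `min_periods` non-NaN values (sketch only)."""
    out = []
    buf = deque(maxlen=window)  # keeps only the last `window` values
    for v in values:
        buf.append(v)
        valid = [x for x in buf if not math.isnan(x)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out
```

With ``min_periods`` equal to the window size this matches the fixed-window default; lowering it lets partial windows emit results, as the corrected docs describe for time-based windows.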
doc/source/whatsnew/v1.3.0.rst

Lines changed: 33 additions & 0 deletions

@@ -302,6 +302,38 @@ cast to ``dtype=object`` (:issue:`38709`)
     ser2
 
 
+.. _whatsnew_130.notable_bug_fixes.rolling_groupby_column:
+
+GroupBy.rolling no longer returns grouped-by column in values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The group-by column will now be dropped from the result of a
+``groupby.rolling`` operation (:issue:`32262`)
+
+.. ipython:: python
+
+    df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
+    df
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+    In [1]: df.groupby("A").rolling(2).sum()
+    Out[1]:
+             A    B
+    A
+    1 0    NaN  NaN
+      1    2.0  1.0
+    2 2    NaN  NaN
+    3 3    NaN  NaN
+
+*New behavior*:
+
+.. ipython:: python
+
+    df.groupby("A").rolling(2).sum()
+
 .. _whatsnew_130.notable_bug_fixes.rolling_var_precision:
 
 Removed artificial truncation in rolling variance and standard deviation

@@ -501,6 +533,7 @@ Numeric
 - Bug in :meth:`DataFrame.mode` and :meth:`Series.mode` not keeping consistent integer :class:`Index` for empty input (:issue:`33321`)
 - Bug in :meth:`DataFrame.rank` with ``np.inf`` and mixture of ``np.nan`` and ``np.inf`` (:issue:`32593`)
 - Bug in :meth:`DataFrame.rank` with ``axis=0`` and columns holding incomparable types raising ``IndexError`` (:issue:`38932`)
+- Bug in ``rank`` method for :class:`Series`, :class:`DataFrame`, :class:`DataFrameGroupBy`, and :class:`SeriesGroupBy` treating the most negative ``int64`` value as missing (:issue:`32859`)
 - Bug in :func:`select_dtypes` different behavior between Windows and Linux with ``include="int"`` (:issue:`36569`)
 - Bug in :meth:`DataFrame.apply` and :meth:`DataFrame.agg` when passed argument ``func="size"`` would operate on the entire ``DataFrame`` instead of rows or columns (:issue:`39934`)
 - Bug in :meth:`DataFrame.transform` would raise ``SpecificationError`` when passed a dictionary and columns were missing; will now raise a ``KeyError`` instead (:issue:`40004`)

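The :issue:`32859` entry stems from a sentinel collision: NumPy's ``NaT`` is stored as the most negative ``int64``, so masking every ``int64`` column against that sentinel treated a legitimate minimum value as missing. A schematic illustration of the before/after mask logic (hypothetical helper names, mirroring the change made in ``pandas/_libs/algos.pyx``):

```python
# NumPy's NaT sentinel shares its bit pattern with the most negative int64.
INT64_MIN = -(2 ** 63)

def missing_mask_old(values):
    """Pre-fix behavior (schematic): every int64 column was compared
    against the NaT sentinel, so a genuine INT64_MIN data point was
    wrongly flagged as missing."""
    return [v == INT64_MIN for v in values]

def missing_mask_new(values, is_datetimelike=False):
    """Post-fix behavior (schematic): only datetime-like int64 values
    are masked against the sentinel; plain integers have no NA slot."""
    if not is_datetimelike:
        return [False] * len(values)
    return [v == INT64_MIN for v in values]
```

A plain integer column containing ``INT64_MIN`` now ranks normally, while datetime-like data still recognizes ``NaT``.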
pandas/_libs/algos.pyx

Lines changed: 19 additions & 7 deletions

@@ -962,6 +962,7 @@ ctypedef fused rank_t:
 def rank_1d(
     ndarray[rank_t, ndim=1] values,
     const intp_t[:] labels,
+    bint is_datetimelike=False,
     ties_method="average",
     bint ascending=True,
     bint pct=False,

@@ -977,6 +978,8 @@
         Array containing unique label for each group, with its ordering
         matching up to the corresponding record in `values`. If not called
         from a groupby operation, will be an array of 0's
+    is_datetimelike : bool, default False
+        True if `values` contains datetime-like entries.
     ties_method : {'average', 'min', 'max', 'first', 'dense'}, default
         'average'
         * average: average rank of group

@@ -1032,7 +1035,7 @@
 
     if rank_t is object:
         mask = missing.isnaobj(masked_vals)
-    elif rank_t is int64_t:
+    elif rank_t is int64_t and is_datetimelike:
         mask = (masked_vals == NPY_NAT).astype(np.uint8)
     elif rank_t is float64_t:
         mask = np.isnan(masked_vals).astype(np.uint8)

@@ -1059,7 +1062,7 @@
         if rank_t is object:
             nan_fill_val = NegInfinity()
         elif rank_t is int64_t:
-            nan_fill_val = np.iinfo(np.int64).min
+            nan_fill_val = NPY_NAT
         elif rank_t is uint64_t:
             nan_fill_val = 0
         else:

@@ -1275,6 +1278,7 @@ def rank_1d(
 def rank_2d(
     ndarray[rank_t, ndim=2] in_arr,
     int axis=0,
+    bint is_datetimelike=False,
     ties_method="average",
     bint ascending=True,
     na_option="keep",

@@ -1299,7 +1303,9 @@ def rank_2d(
     tiebreak = tiebreakers[ties_method]
 
     keep_na = na_option == 'keep'
-    check_mask = rank_t is not uint64_t
+
+    # For cases where a mask is not possible, we can avoid mask checks
+    check_mask = not (rank_t is uint64_t or (rank_t is int64_t and not is_datetimelike))
 
     if axis == 0:
         values = np.asarray(in_arr).T.copy()

@@ -1310,28 +1316,34 @@
     if values.dtype != np.object_:
         values = values.astype('O')
 
-    if rank_t is not uint64_t:
+    if check_mask:
         if ascending ^ (na_option == 'top'):
             if rank_t is object:
                 nan_value = Infinity()
             elif rank_t is float64_t:
                 nan_value = np.inf
-            elif rank_t is int64_t:
+
+            # int64 and datetimelike
+            else:
                 nan_value = np.iinfo(np.int64).max
 
         else:
             if rank_t is object:
                 nan_value = NegInfinity()
             elif rank_t is float64_t:
                 nan_value = -np.inf
-            elif rank_t is int64_t:
+
+            # int64 and datetimelike
+            else:
                 nan_value = NPY_NAT
 
         if rank_t is object:
             mask = missing.isnaobj2d(values)
         elif rank_t is float64_t:
             mask = np.isnan(values)
-        elif rank_t is int64_t:
+
+        # int64 and datetimelike
+        else:
             mask = values == NPY_NAT
 
         np.putmask(values, mask, nan_value)

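The overall flow in ``rank_1d`` (build a missing-value mask, fill masked slots with a sentinel so they sort to one end, rank with tie handling, then restore NaN for the masked slots) can be sketched in pure Python. This is a simplified model of ``na_option='keep'`` with average ties, not the Cython implementation:

```python
import math

def rank_keep_na(values, ascending=True):
    """Sketch of the masked-rank idea: missing entries are masked out,
    the remaining values get average ranks, and masked slots stay NaN
    (hypothetical helper, not the pandas Cython kernel)."""
    mask = [isinstance(v, float) and math.isnan(v) for v in values]
    # Pair each non-missing value with its original position and sort.
    valid = [(v, i) for i, (v, m) in enumerate(zip(values, mask)) if not m]
    valid.sort(key=lambda p: p[0], reverse=not ascending)
    ranks = [math.nan] * len(values)
    j = 0
    while j < len(valid):
        # Extend k to cover the whole tie group starting at j.
        k = j
        while k + 1 < len(valid) and valid[k + 1][0] == valid[j][0]:
            k += 1
        avg = (j + k) / 2 + 1  # average 1-based rank of the tie group
        for t in range(j, k + 1):
            ranks[valid[t][1]] = avg
        j = k + 1
    return ranks
```

For ``[3.0, nan, 1.0, 3.0]`` this yields ``[2.5, nan, 1.0, 2.5]``, the same shape of result the real kernel produces for a float column with default options.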
pandas/_libs/groupby.pyx

Lines changed: 9 additions & 14 deletions

@@ -681,18 +681,17 @@ group_mean_float64 = _group_mean['double']
 
 @cython.wraparound(False)
 @cython.boundscheck(False)
-def _group_ohlc(floating[:, ::1] out,
-                int64_t[::1] counts,
-                ndarray[floating, ndim=2] values,
-                const intp_t[:] labels,
-                Py_ssize_t min_count=-1):
+def group_ohlc(floating[:, ::1] out,
+               int64_t[::1] counts,
+               ndarray[floating, ndim=2] values,
+               const intp_t[:] labels,
+               Py_ssize_t min_count=-1):
     """
     Only aggregates on axis=0
     """
     cdef:
         Py_ssize_t i, j, N, K, lab
-        floating val, count
-        Py_ssize_t ngroups = len(counts)
+        floating val
 
     assert min_count == -1, "'min_count' only used in add and prod"
 

@@ -727,10 +726,6 @@ def _group_ohlc(floating[:, ::1] out,
         out[lab, 3] = val
 
 
-group_ohlc_float32 = _group_ohlc['float']
-group_ohlc_float64 = _group_ohlc['double']
-
-
 @cython.boundscheck(False)
 @cython.wraparound(False)
 def group_quantile(ndarray[float64_t] out,

@@ -1079,9 +1074,8 @@ def group_rank(float64_t[:, ::1] out,
     ngroups : int
         This parameter is not used, is needed to match signatures of other
         groupby functions.
-    is_datetimelike : bool, default False
-        unused in this method but provided for call compatibility with other
-        Cython transformations
+    is_datetimelike : bool
+        True if `values` contains datetime-like entries.
     ties_method : {'average', 'min', 'max', 'first', 'dense'}, default
         'average'
         * average: average rank of group

@@ -1109,6 +1103,7 @@ def group_rank(float64_t[:, ::1] out,
     result = rank_1d(
         values=values[:, 0],
         labels=labels,
+        is_datetimelike=is_datetimelike,
         ties_method=ties_method,
         ascending=ascending,
         pct=pct,

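The renamed ``group_ohlc`` kernel aggregates each group's values into open/high/low/close: for every label it records the first value, running max, running min, and last value. A pure-Python sketch of that per-label pass (a hypothetical helper, not the fused-type Cython kernel, which writes into a preallocated float array):

```python
def group_ohlc(values, labels, ngroups):
    """Per-group OHLC sketch: open (first), high (max), low (min),
    close (last) for each label; untouched groups stay None."""
    out = [[None, None, None, None] for _ in range(ngroups)]
    for val, lab in zip(values, labels):
        if lab < 0:
            continue  # -1 labels mark rows that belong to no group
        o = out[lab]
        if o[0] is None:
            o[0] = o[1] = o[2] = o[3] = val  # first observation seeds all four
        else:
            o[1] = max(o[1], val)  # high
            o[2] = min(o[2], val)  # low
            o[3] = val             # close tracks the latest value
    return out
```

For values ``[1, 3, 2, 5, 4]`` with labels ``[0, 0, 0, 1, 1]`` this gives ``[[1, 3, 1, 2], [5, 5, 4, 4]]``.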