Merged
Changes from all commits (19 commits)
8de43a1
CI: lint failure on master (#35007)
simonjayhawkins Jun 26, 2020
66fb100
CLN: remove redundant code in IndexOpsMixin.item (#35008)
simonjayhawkins Jun 26, 2020
f5b2e5a
Fix issue #35010: Double requirement given for fsspec (#35012)
SanthoshBala18 Jun 26, 2020
b8d4892
CLN: remove libreduction.Reducer (#35001)
jbrockmendel Jun 26, 2020
deec940
ENH: specificy missing labels in loc calls GH34272 (#34912)
timhunderwood Jun 26, 2020
9c77845
ENH: add ignore_index option in DataFrame.explode (#34933)
erfannariman Jun 26, 2020
a347e76
ERR: Fix to_timedelta error message (#34981)
dsaxton Jun 26, 2020
b88faa0
TST: rename fixtures named 'indices' to 'index' (#35024)
topper-123 Jun 26, 2020
dbfbef7
BUG: item_cache not cleared on DataFrame.values (#34999)
jbrockmendel Jun 26, 2020
0159cba
CLN: dont consolidate in indexing (#34679)
jbrockmendel Jun 26, 2020
1684d8d
CI: troubleshoot (#35044)
simonjayhawkins Jun 29, 2020
f6fb257
Fix issue #29837: added test case for aggregation with isnan (#35039)
biddwan09 Jun 29, 2020
1706d83
PERF: avoid duplicate is_single_block check (#35034)
jbrockmendel Jun 29, 2020
fc3b43a
CLN: move categorical tests from test_aggregate to test_categorical (…
simonjayhawkins Jun 29, 2020
84aa56b
DOC: correction to "size" in the plotting.bootstrap_plot docstring (#…
ericgrosz Jun 29, 2020
6598e39
BUG: reading line-format JSON from file url #27135 (#34811)
fangchenli Jun 29, 2020
549b0ff
CLN: assorted tslibs cleanups, annotations (#35045)
jbrockmendel Jun 29, 2020
bce11e2
DOC: Add example of NonFixedVariableWindowIndexer usage (#34994)
mroeschke Jun 29, 2020
02ab42f
CLN: make Info and DataFrameInfo subclasses (#34743)
MarcoGorelli Jun 29, 2020
12 changes: 12 additions & 0 deletions doc/source/user_guide/computation.rst
@@ -597,6 +597,18 @@ You can view other examples of ``BaseIndexer`` subclasses `here <https://github.

.. versionadded:: 1.1

One subclass of note among those examples is ``NonFixedVariableWindowIndexer``, which allows
rolling operations over a non-fixed offset such as a ``BusinessDay``.

.. ipython:: python

    from pandas.api.indexers import NonFixedVariableWindowIndexer
    df = pd.DataFrame(range(10), index=pd.date_range('2020', periods=10))
    offset = pd.offsets.BDay(1)
    indexer = NonFixedVariableWindowIndexer(index=df.index, offset=offset)
    df
    df.rolling(indexer).sum()

For some problems knowledge of the future is available for analysis. For example, this occurs when
each data point is a full time series read from an experiment, and the task is to extract underlying
conditions. In these cases it can be useful to perform forward-looking rolling window computations.
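
A minimal sketch of such a forward-looking computation, using the
:class:`~pandas.api.indexers.FixedForwardWindowIndexer` mentioned in this
release's notes (the data here is chosen purely for illustration):

.. ipython:: python

    from pandas.api.indexers import FixedForwardWindowIndexer
    df = pd.DataFrame({"B": [0, 1, 2, 3, 4]})
    indexer = FixedForwardWindowIndexer(window_size=2)
    df.rolling(indexer, min_periods=1).sum()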
12 changes: 12 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -13,6 +13,15 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

.. _whatsnew_110.specify_missing_labels:

KeyErrors raised by loc specify missing labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, if any labels were missing for a ``.loc`` call, a ``KeyError`` was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (limited to 10 items, display width 80 characters). See :issue:`34272`.
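
A minimal sketch of the new behavior (the exact message wording may differ):

.. ipython:: python
    :okexcept:

    s = pd.Series([1, 2, 3])
    s.loc[[0, 4, 5]]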


.. _whatsnew_110.astype_string:

All dtypes can now be converted to ``StringDtype``
@@ -318,6 +327,7 @@ Other enhancements
- :meth:`DataFrame.cov` and :meth:`Series.cov` now support a new parameter ``ddof`` for delta degrees of freedom, as in the corresponding numpy methods (:issue:`34611`).
- :meth:`DataFrame.to_html` and :meth:`DataFrame.to_string`'s ``col_space`` parameter now accepts a list or dict to change only some specific columns' width (:issue:`28917`).
- :meth:`DataFrame.to_excel` can now also write OpenOffice spreadsheet (.ods) files (:issue:`27222`)
- :meth:`~Series.explode` now accepts ``ignore_index`` to reset the index, similarly to :func:`concat` or :meth:`DataFrame.sort_values` (:issue:`34932`); see the example below.
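
A minimal sketch of the new ``ignore_index`` option (the column name is chosen purely for illustration):

.. ipython:: python

    df = pd.DataFrame({"a": [[1, 2], [3, 4]]})
    df.explode("a", ignore_index=True)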

.. ---------------------------------------------------------------------------

@@ -663,6 +673,7 @@ Other API changes
- ``loc`` lookups with an object-dtype :class:`Index` and an integer key will now raise ``KeyError`` instead of ``TypeError`` when key is missing (:issue:`31905`)
- Using a :func:`pandas.api.indexers.BaseIndexer` with ``count``, ``min``, ``max``, ``median``, ``skew``, ``cov``, ``corr`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
- Added a :func:`pandas.api.indexers.FixedForwardWindowIndexer` class to support forward-looking windows during ``rolling`` operations.
- Added a :func:`pandas.api.indexers.NonFixedVariableWindowIndexer` class to support ``rolling`` operations with non-fixed offsets (:issue:`34994`)
- Added :class:`pandas.errors.InvalidIndexError` (:issue:`34570`).
- :meth:`DataFrame.swaplevel` now raises a ``TypeError`` if the axis is not a :class:`MultiIndex`.
Previously an ``AttributeError`` was raised (:issue:`31126`)
@@ -1030,6 +1041,7 @@ I/O
- Bug in :meth:`read_excel` for ODS files removes 0.0 values (:issue:`27222`)
- Bug in :meth:`ujson.encode` was raising an ``OverflowError`` with numbers larger than ``sys.maxsize`` (:issue:`34395`)
- Bug in :meth:`HDFStore.append_to_multiple` was raising a ``ValueError`` when the ``min_itemsize`` parameter is set (:issue:`11238`)
- :meth:`read_json` can now read a line-delimited JSON file from a file URL when ``lines`` and ``chunksize`` are set (:issue:`27135`)
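
An indicative sketch of the fixed behavior (the file URL below is hypothetical):

.. ipython:: python
    :verbatim:

    reader = pd.read_json("file:///tmp/records.json", lines=True, chunksize=1)
    for chunk in reader:
        print(chunk)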

Plotting
^^^^^^^^
1 change: 0 additions & 1 deletion environment.yml
@@ -37,7 +37,6 @@ dependencies:
# Dask and its dependencies (that don't install with dask)
- dask-core
- toolz>=0.7.3
- fsspec>=0.5.1
- partd>=0.3.10
- cloudpickle>=0.2.1

174 changes: 1 addition & 173 deletions pandas/_libs/reduction.pyx
@@ -1,17 +1,12 @@
from copy import copy

from cython import Py_ssize_t
from cpython.ref cimport Py_INCREF

from libc.stdlib cimport malloc, free

import numpy as np
cimport numpy as cnp
from numpy cimport (ndarray,
int64_t,
PyArray_SETITEM,
PyArray_ITER_NEXT, PyArray_ITER_DATA, PyArray_IterNew,
flatiter)
from numpy cimport ndarray, int64_t
cnp.import_array()

from pandas._libs cimport util
@@ -26,146 +21,6 @@ cdef _check_result_array(object obj, Py_ssize_t cnt):
raise ValueError('Function does not reduce')


cdef class Reducer:
"""
Performs generic reduction operation on a C or Fortran-contiguous ndarray
while avoiding ndarray construction overhead
"""
cdef:
Py_ssize_t increment, chunksize, nresults
object dummy, f, labels, typ, ityp, index
ndarray arr

def __init__(
self, ndarray arr, object f, int axis=1, object dummy=None, object labels=None
):
cdef:
Py_ssize_t n, k

n, k = (<object>arr).shape

if axis == 0:
if not arr.flags.f_contiguous:
arr = arr.copy('F')

self.nresults = k
self.chunksize = n
self.increment = n * arr.dtype.itemsize
else:
if not arr.flags.c_contiguous:
arr = arr.copy('C')

self.nresults = n
self.chunksize = k
self.increment = k * arr.dtype.itemsize

self.f = f
self.arr = arr
self.labels = labels
self.dummy, self.typ, self.index, self.ityp = self._check_dummy(
dummy=dummy)

cdef _check_dummy(self, object dummy=None):
cdef:
object index = None, typ = None, ityp = None

if dummy is None:
dummy = np.empty(self.chunksize, dtype=self.arr.dtype)

# our ref is stolen later since we are creating this array
# in cython, so increment first
Py_INCREF(dummy)

else:

# we passed a Series
typ = type(dummy)
index = dummy.index
dummy = dummy.values

if dummy.dtype != self.arr.dtype:
raise ValueError('Dummy array must be same dtype')
if len(dummy) != self.chunksize:
raise ValueError(f'Dummy array must be length {self.chunksize}')

return dummy, typ, index, ityp

def get_result(self):
cdef:
char* dummy_buf
ndarray arr, result, chunk
Py_ssize_t i
flatiter it
object res, name, labels
object cached_typ = None

arr = self.arr
chunk = self.dummy
dummy_buf = chunk.data
chunk.data = arr.data
labels = self.labels

result = np.empty(self.nresults, dtype='O')
it = <flatiter>PyArray_IterNew(result)
reduction_success = True

try:
for i in range(self.nresults):

# create the cached type
# each time just reassign the data
if i == 0:

if self.typ is not None:
# In this case, we also have self.index
name = labels[i]
cached_typ = self.typ(
chunk, index=self.index, name=name, dtype=arr.dtype)

# use the cached_typ if possible
if cached_typ is not None:
# In this case, we also have non-None labels
name = labels[i]

object.__setattr__(
cached_typ._mgr._block, 'values', chunk)
object.__setattr__(cached_typ, 'name', name)
res = self.f(cached_typ)
else:
res = self.f(chunk)

# TODO: reason for not squeezing here?
extracted_res = _extract_result(res, squeeze=False)
if i == 0:
# On the first pass, we check the output shape to see
# if this looks like a reduction.
# If it does not, return the computed value to be used by the
# pure python implementation,
# so the function won't be called twice on the same object,
# and side effects would occur twice
try:
_check_result_array(extracted_res, len(self.dummy))
except ValueError as err:
if "Function does not reduce" not in str(err):
# catch only the specific exception
raise

reduction_success = False
PyArray_SETITEM(result, PyArray_ITER_DATA(it), copy(res))
break

PyArray_SETITEM(result, PyArray_ITER_DATA(it), extracted_res)
chunk.data = chunk.data + self.increment
PyArray_ITER_NEXT(it)

finally:
# so we don't free the wrong memory
chunk.data = dummy_buf

result = maybe_convert_objects(result)
return result, reduction_success


cdef class _BaseGrouper:
cdef _check_dummy(self, object dummy):
# both values and index must be an ndarray!
@@ -610,30 +465,3 @@ cdef class BlockSlider:
# axis=1 is the frame's axis=0
arr.data = self.base_ptrs[i]
arr.shape[1] = 0


def compute_reduction(arr: ndarray, f, axis: int = 0, dummy=None, labels=None):
"""

Parameters
-----------
arr : np.ndarray
f : function
axis : integer axis
dummy : type of reduced output (series)
labels : Index or None
"""

# We either have both dummy and labels, or neither of them
if (labels is None) ^ (dummy is None):
raise ValueError("Must pass either dummy and labels, or neither")

if labels is not None:
# Caller is responsible for ensuring we don't have MultiIndex
assert labels.nlevels == 1

# pass as an ndarray/ExtensionArray
labels = labels._values

reducer = Reducer(arr, f, axis=axis, dummy=dummy, labels=labels)
return reducer.get_result()
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/conversion.pyx
@@ -77,7 +77,7 @@ cdef inline int64_t cast_from_unit(object ts, str unit) except? -1:
return <int64_t>(base * m) + <int64_t>(frac * m)


cpdef inline object precision_from_unit(str unit):
cpdef inline (int64_t, int) precision_from_unit(str unit):
"""
Return a casting of the unit represented to nanoseconds + the precision
to round the fractional part.
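
An indicative sketch of the new tuple return (this is a private pandas API;
the expected values assume the usual nanosecond multipliers, e.g. one
millisecond is 10**6 ns):

    from pandas._libs.tslibs.conversion import precision_from_unit

    m, p = precision_from_unit("ms")  # multiplier to nanoseconds, rounding precision
    # expected: m == 1_000_000 and p == 6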
8 changes: 4 additions & 4 deletions pandas/_libs/tslibs/fields.pyx
@@ -91,7 +91,7 @@ def build_field_sarray(const int64_t[:] dtindex):

@cython.wraparound(False)
@cython.boundscheck(False)
def get_date_name_field(const int64_t[:] dtindex, object field, object locale=None):
def get_date_name_field(const int64_t[:] dtindex, str field, object locale=None):
"""
Given an int64-based datetime index, return array of strings of date
name based on requested field (e.g. day_name)
@@ -141,7 +141,7 @@ def get_date_name_field(const int64_t[:] dtindex, object field, object locale=None):

@cython.wraparound(False)
@cython.boundscheck(False)
def get_start_end_field(const int64_t[:] dtindex, object field,
def get_start_end_field(const int64_t[:] dtindex, str field,
object freqstr=None, int month_kw=12):
"""
Given an int64-based datetime index return array of indicators
@@ -386,7 +386,7 @@ def get_start_end_field(const int64_t[:] dtindex, object field,

@cython.wraparound(False)
@cython.boundscheck(False)
def get_date_field(const int64_t[:] dtindex, object field):
def get_date_field(const int64_t[:] dtindex, str field):
"""
Given an int64-based datetime index, extract the year, month, etc.,
field and return an array of these values.
@@ -548,7 +548,7 @@ def get_date_field(const int64_t[:] dtindex, object field):

@cython.wraparound(False)
@cython.boundscheck(False)
def get_timedelta_field(const int64_t[:] tdindex, object field):
def get_timedelta_field(const int64_t[:] tdindex, str field):
"""
Given an int64-based timedelta index, extract the days, hrs, sec.,
field and return an array of these values.
20 changes: 10 additions & 10 deletions pandas/_libs/tslibs/nattype.pyx
@@ -50,23 +50,23 @@ _nat_scalar_rules[Py_GE] = False
# ----------------------------------------------------------------------


def _make_nan_func(func_name, doc):
def _make_nan_func(func_name: str, doc: str):
def f(*args, **kwargs):
return np.nan
f.__name__ = func_name
f.__doc__ = doc
return f


def _make_nat_func(func_name, doc):
def _make_nat_func(func_name: str, doc: str):
def f(*args, **kwargs):
return c_NaT
f.__name__ = func_name
f.__doc__ = doc
return f


def _make_error_func(func_name, cls):
def _make_error_func(func_name: str, cls):
def f(*args, **kwargs):
raise ValueError(f"NaTType does not support {func_name}")

@@ -282,31 +282,31 @@ cdef class _NaT(datetime):
return NPY_NAT

@property
def is_leap_year(self):
def is_leap_year(self) -> bool:
return False

@property
def is_month_start(self):
def is_month_start(self) -> bool:
return False

@property
def is_quarter_start(self):
def is_quarter_start(self) -> bool:
return False

@property
def is_year_start(self):
def is_year_start(self) -> bool:
return False

@property
def is_month_end(self):
def is_month_end(self) -> bool:
return False

@property
def is_quarter_end(self):
def is_quarter_end(self) -> bool:
return False

@property
def is_year_end(self):
def is_year_end(self) -> bool:
return False


3 changes: 2 additions & 1 deletion pandas/_libs/tslibs/period.pyx
@@ -14,6 +14,7 @@ import cython

from cpython.datetime cimport (
datetime,
tzinfo,
PyDate_Check,
PyDateTime_Check,
PyDateTime_IMPORT,
@@ -1417,7 +1418,7 @@ def extract_freq(ndarray[object] values):

@cython.wraparound(False)
@cython.boundscheck(False)
def dt64arr_to_periodarr(const int64_t[:] stamps, int freq, object tz):
def dt64arr_to_periodarr(const int64_t[:] stamps, int freq, tzinfo tz):
cdef:
Py_ssize_t n = len(stamps)
int64_t[:] result = np.empty(n, dtype=np.int64)