Commit bab9fe6

Merge branch 'master' of https://github.com/pandas-dev/pandas into fix_33956
2 parents 87d45d3 + 4a267c6 commit bab9fe6

146 files changed

+3431
-1710
lines changed

.travis.yml

Lines changed: 11 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -27,6 +27,11 @@ matrix:
2727
fast_finish: true
2828

2929
include:
30+
# In allowed failures
31+
- dist: bionic
32+
python: 3.9-dev
33+
env:
34+
- JOB="3.9-dev" PATTERN="(not slow and not network and not clipboard)"
3035
- env:
3136
- JOB="3.8" ENV_FILE="ci/deps/travis-38.yaml" PATTERN="(not slow and not network and not clipboard)"
3237

@@ -53,6 +58,11 @@ matrix:
5358
services:
5459
- mysql
5560
- postgresql
61+
allow_failures:
62+
- dist: bionic
63+
python: 3.9-dev
64+
env:
65+
- JOB="3.9-dev" PATTERN="(not slow and not network and not clipboard)"
5666

5767
before_install:
5868
- echo "before_install"
@@ -83,7 +93,7 @@ install:
8393
script:
8494
- echo "script start"
8595
- echo "$JOB"
86-
- source activate pandas-dev
96+
- if [ "$JOB" != "3.9-dev" ]; then source activate pandas-dev; fi
8797
- ci/run_tests.sh
8898

8999
after_script:

asv_bench/benchmarks/algorithms.py

Lines changed: 14 additions & 3 deletions
@@ -34,7 +34,16 @@ class Factorize:
3434
params = [
3535
[True, False],
3636
[True, False],
37-
["int", "uint", "float", "string", "datetime64[ns]", "datetime64[ns, tz]"],
37+
[
38+
"int",
39+
"uint",
40+
"float",
41+
"string",
42+
"datetime64[ns]",
43+
"datetime64[ns, tz]",
44+
"Int64",
45+
"boolean",
46+
],
3847
]
3948
param_names = ["unique", "sort", "dtype"]
4049

@@ -49,13 +58,15 @@ def setup(self, unique, sort, dtype):
4958
"datetime64[ns, tz]": pd.date_range(
5059
"2011-01-01", freq="H", periods=N, tz="Asia/Tokyo"
5160
),
61+
"Int64": pd.array(np.arange(N), dtype="Int64"),
62+
"boolean": pd.array(np.random.randint(0, 2, N), dtype="boolean"),
5263
}[dtype]
5364
if not unique:
5465
data = data.repeat(5)
55-
self.idx = data
66+
self.data = data
5667

5768
def time_factorize(self, unique, sort, dtype):
58-
self.idx.factorize(sort=sort)
69+
pd.factorize(self.data, sort=sort)
5970

6071

6172
class Duplicated:

ci/build39.sh

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
#!/bin/bash -e
2+
# Special build for python3.9 until numpy puts its own wheels up
3+
4+
sudo apt-get install build-essential gcc xvfb
5+
pip install --no-deps -U pip wheel setuptools
6+
pip install python-dateutil pytz pytest pytest-xdist hypothesis
7+
pip install cython --pre # https://github.com/cython/cython/issues/3395
8+
9+
git clone https://github.com/numpy/numpy
10+
cd numpy
11+
python setup.py build_ext --inplace
12+
python setup.py install
13+
cd ..
14+
rm -rf numpy
15+
16+
python setup.py build_ext --inplace
17+
python -m pip install --no-build-isolation -e .
18+
19+
python -c "import sys; print(sys.version_info)"
20+
python -c "import pandas as pd"
21+
python -c "import hypothesis"

ci/setup_env.sh

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
11
#!/bin/bash -e
22

3+
if [ "$JOB" == "3.9-dev" ]; then
4+
/bin/bash ci/build39.sh
5+
exit 0
6+
fi
7+
38
# edit the locale file if needed
49
if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
510
echo "Adding locale to the first line of pandas/__init__.py"

doc/source/reference/extensions.rst

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ objects.
4545
api.extensions.ExtensionArray.copy
4646
api.extensions.ExtensionArray.view
4747
api.extensions.ExtensionArray.dropna
48+
api.extensions.ExtensionArray.equals
4849
api.extensions.ExtensionArray.factorize
4950
api.extensions.ExtensionArray.fillna
5051
api.extensions.ExtensionArray.isna

doc/source/user_guide/groupby.rst

Lines changed: 27 additions & 0 deletions
@@ -199,6 +199,33 @@ For example, the groups created by ``groupby()`` below are in the order they app
199199
df3.groupby(['X']).get_group('B')
200200
201201
202+
.. _groupby.dropna:
203+
204+
.. versionadded:: 1.1.0
205+
206+
GroupBy dropna
207+
^^^^^^^^^^^^^^
208+
209+
By default, ``NA`` values are excluded from group keys during the ``groupby`` operation. To include
210+
``NA`` values in the group keys, pass ``dropna=False``.
211+
212+
.. ipython:: python
213+
214+
df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
215+
df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])
216+
217+
df_dropna
218+
219+
.. ipython:: python
220+
221+
# Default `dropna` is set to True, which will exclude NaNs in keys
222+
df_dropna.groupby(by=["b"], dropna=True).sum()
223+
224+
# In order to allow NaN in keys, set `dropna` to False
225+
df_dropna.groupby(by=["b"], dropna=False).sum()
226+
227+
The default setting of the ``dropna`` argument is ``True``, which means ``NA`` values are not included in group keys.
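
As a minimal, self-contained sketch of the behaviour described above (assuming pandas 1.1.0 or later, where ``dropna`` is available; the variable names ``dropped`` and ``kept`` are illustrative only), the two settings can be compared directly:

```python
import pandas as pd

# Same data as the documentation example: one row has a missing key in "b".
df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

# Default (dropna=True): the row whose key is NaN is left out of the result.
dropped = df_dropna.groupby(by=["b"]).sum()

# dropna=False: NaN becomes a group key of its own.
kept = df_dropna.groupby(by=["b"], dropna=False).sum()

print(dropped)
print(kept)
```

With this data, the only difference between the two results should be one extra group keyed by ``NaN``, which holds the single row that has no value in ``b``.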
228+
202229

203230
.. _groupby.attributes:
204231

doc/source/user_guide/timeseries.rst

Lines changed: 73 additions & 6 deletions
@@ -1572,19 +1572,16 @@ end of the interval is closed:
15721572
15731573
ts.resample('5Min', closed='left').mean()
15741574
1575-
Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
1576-
labels. ``label`` specifies whether the result is labeled with the beginning or
1577-
the end of the interval. ``loffset`` performs a time adjustment on the output
1578-
labels.
1575+
Parameters like ``label`` are used to manipulate the resulting labels.
1576+
``label`` specifies whether the result is labeled with the beginning or
1577+
the end of the interval.
15791578

15801579
.. ipython:: python
15811580
15821581
ts.resample('5Min').mean() # by default label='left'
15831582
15841583
ts.resample('5Min', label='left').mean()
15851584
1586-
ts.resample('5Min', label='left', loffset='1s').mean()
1587-
15881585
.. warning::
15891586

15901587
The default values for ``label`` and ``closed`` are '**left**' for all
@@ -1789,6 +1786,58 @@ natural and functions similarly to :py:func:`itertools.groupby`:
17891786
17901787
See :ref:`groupby.iterating-label` or :class:`Resampler.__iter__` for more.
17911788

1789+
.. _timeseries.adjust-the-start-of-the-bins:
1790+
1791+
Use `origin` or `offset` to adjust the start of the bins
1792+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1793+
1794+
.. versionadded:: 1.1.0
1795+
1796+
By default, the bins of the grouping are anchored to the beginning of the day of the first timestamp in the time series. This works well with frequencies that are multiples of a day (like ``30D``) or that divide a day evenly (like ``90s`` or ``1min``). With frequencies that do not meet these criteria, it can create inconsistencies. To change this behavior, you can specify a fixed Timestamp with the argument ``origin``.
1797+
1798+
For example:
1799+
1800+
.. ipython:: python
1801+
1802+
start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
1803+
middle = '2000-10-02 00:00:00'
1804+
rng = pd.date_range(start, end, freq='7min')
1805+
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
1806+
ts
1807+
1808+
Here we can see that, when using ``origin`` with its default value (``'start_day'``), the results after ``'2000-10-02 00:00:00'`` differ depending on where the time series starts:
1809+
1810+
.. ipython:: python
1811+
1812+
ts.resample('17min', origin='start_day').sum()
1813+
ts[middle:end].resample('17min', origin='start_day').sum()
1814+
1815+
1816+
Here we can see that, when setting ``origin`` to ``'epoch'``, the results after ``'2000-10-02 00:00:00'`` are identical regardless of where the time series starts:
1817+
1818+
.. ipython:: python
1819+
1820+
ts.resample('17min', origin='epoch').sum()
1821+
ts[middle:end].resample('17min', origin='epoch').sum()
1822+
1823+
1824+
If needed, you can use a custom timestamp for ``origin``:
1825+
1826+
.. ipython:: python
1827+
1828+
ts.resample('17min', origin='2001-01-01').sum()
1829+
ts[middle:end].resample('17min', origin=pd.Timestamp('2001-01-01')).sum()
1830+
1831+
If needed, you can just adjust the bins with an ``offset`` Timedelta that is added to the default ``origin``.
1832+
These two examples are equivalent for this time series:
1833+
1834+
.. ipython:: python
1835+
1836+
ts.resample('17min', origin='start').sum()
1837+
ts.resample('17min', offset='23h30min').sum()
1838+
1839+
1840+
Note the use of ``'start'`` for ``origin`` in the last example. In that case, ``origin`` will be set to the first value of the time series.
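
The bin alignment described in this section can also be checked programmatically. The following is a sketch rather than part of the pandas documentation (it assumes pandas 1.1.0 or later, and the names ``step``, ``epoch`` and ``epoch_aligned`` are illustrative): with ``origin='epoch'`` every bin label should sit a whole number of 17-minute steps after 1970-01-01, and ``origin='start'`` should give the same result as ``offset='23h30min'`` for this series.

```python
import numpy as np
import pandas as pd

# Same series as in the examples above.
start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00"
rng = pd.date_range(start, end, freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

step = pd.Timedelta("17min")
epoch = pd.Timestamp("1970-01-01")

# origin='epoch': bin edges are epoch + k * step for integer k.
by_epoch = ts.resample("17min", origin="epoch").sum()
epoch_aligned = all(
    (label - epoch) % step == pd.Timedelta(0) for label in by_epoch.index
)

# origin='start' anchors the bins at the first timestamp (23:30), which is
# also where the default origin plus a 23h30min offset lands for this series.
by_start = ts.resample("17min", origin="start").sum()
by_offset = ts.resample("17min", offset="23h30min").sum()

print(epoch_aligned)
print(by_start.equals(by_offset))
```

Because the ``'epoch'`` bins do not depend on the data, slicing the series (for example ``ts[middle:end]``) leaves the bin edges where they were, which is why the results after midnight match in the ``'epoch'`` example above.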
17921841

17931842
.. _timeseries.periods:
17941843

@@ -2265,6 +2314,24 @@ you can use the ``tz_convert`` method.
22652314
Instead, the datetime needs to be localized using the ``localize`` method
22662315
on the ``pytz`` time zone object.
22672316

2317+
.. warning::
2318+
2319+
If you are using dates beyond 2038-01-18, due to current deficiencies
2320+
in the underlying libraries caused by the year 2038 problem, daylight saving time (DST) adjustments
2321+
to timezone-aware dates will not be applied. If and when the underlying libraries are fixed,
2322+
the DST transitions will be applied. Note, though, that time zone data for far-future dates
2323+
is likely to be inaccurate, as it is a simple extrapolation of the current set of (regularly revised) rules.
2324+
2325+
For example, for two dates that are in British Summer Time (and so would normally be GMT+1), both the following asserts evaluate as true:
2326+
2327+
.. ipython:: python
2328+
2329+
d_2037 = '2037-03-31T010101'
2330+
d_2038 = '2038-03-31T010101'
2331+
DST = 'Europe/London'
2332+
assert pd.Timestamp(d_2037, tz=DST) != pd.Timestamp(d_2037, tz='GMT')
2333+
assert pd.Timestamp(d_2038, tz=DST) == pd.Timestamp(d_2038, tz='GMT')
2334+
22682335
Under the hood, all timestamps are stored in UTC. Values from a time zone aware
22692336
:class:`DatetimeIndex` or :class:`Timestamp` will have their fields (day, hour, minute, etc.)
22702337
localized to the time zone. However, timestamps with the same UTC value are
