BUG: pivot_table with nested elements and numpy 1.24 #50682

mroeschke · 2023-01-11T22:43:42Z

closes BUG: numpy 1.24.0 causes ValueError with pivot_table and ragged nested sequences #50342
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

I think numpy is still pinned on our 1.5.x branch, but I confirmed that this works locally with numpy 1.24.1:

% conda list numpy
# packages in environment at /opt/miniconda3/envs/pandas-dev:
#
# Name                    Version                   Build  Channel
numpy                     1.24.1           py38hc2f29e8_0    conda-forge
numpydoc                  1.5.0              pyhd8ed1ab_0    conda-forge

% pytest pandas/tests/reshape/test_pivot.py -k test_pivot_table_with_mixed_nested_tuples
================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
rootdir: , configfile: pyproject.toml
plugins: asyncio-0.20.2, cython-0.2.0, hypothesis-6.58.1, xdist-3.0.2, cov-4.0.0, anyio-3.6.2
asyncio: mode=strict
collected 539 items / 538 deselected / 1 selected

pandas/tests/reshape/test_pivot.py .

------------------------------------------------------- generated xml file: /test-data.xml --------------------------------------------------------
================================================================================== slowest 30 durations ==================================================================================
0.01s call     pandas/tests/reshape/test_pivot.py::TestPivotTable::test_pivot_table_with_mixed_nested_tuples

(2 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================================================================== 1 passed, 538 deselected in 0.48s ============================================================================

EDIT: NumPy is no longer pinned on main, so this should run in the CI.

mroeschke · 2023-01-12T01:16:28Z

Decided not to check the numpy version explicitly, since on prior numpy versions the same call was already raising a DeprecationWarning internally.
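To make the version difference concrete, here is a minimal sketch that reports how the installed NumPy treats a ragged nested sequence. The helper name ragged_behaviour is made up for illustration and is not part of pandas or NumPy; going by the discussion in this PR, I would expect "error" on numpy >= 1.24 and "warning" on the releases just before it.

```python
import warnings

import numpy as np


def ragged_behaviour(values):
    """Report how np.asarray reacts to `values` on the installed NumPy.

    Hypothetical helper for illustration only.
    Returns "error" if a ValueError is raised, "warning" if any warning
    is emitted, and "silent" if the conversion succeeds quietly.
    """
    try:
        with warnings.catch_warnings(record=True) as caught:
            # Record every warning instead of printing or deduplicating it.
            warnings.simplefilter("always")
            np.asarray(values)
    except ValueError:
        return "error"
    return "warning" if caught else "silent"


if __name__ == "__main__":
    print(np.__version__)
    print("rectangular:", ragged_behaviour([[1, 2], [3, 4]]))
    print("ragged:", ragged_behaviour([1, (2, 3)]))
```

Because both outcomes come from the same np.asarray call, handling the warning and the exception around that call covers every numpy version without a version check.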

phofl · 2023-01-14T22:56:11Z

pandas/core/common.py


-    if isinstance(values, list) and dtype in [np.object_, object]:
+    if isinstance(values, list) and (
+        dtype in [np.object_, object] or any(is_list_like(val) for val in values)


Any idea about performance impact here?

Yeah, in the worst case, where the any() check can't short-circuit, there's a perf hit:

In [1]: values = list(range(1000))

In [2]: %timeit pd.core.common.asarray_tuplesafe(values)  # PR
279 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [2]: %timeit pd.core.common.asarray_tuplesafe(values)  # main
41.8 µs ± 623 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Is this a common use-case?

Would try-except be significantly faster?

Good idea. With the try-except I get much closer to main performance (~44 µs).
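For readers following the perf discussion, here is a rough sketch of the two strategies side by side. It is not the pandas implementation: _is_list_like is a simplified stand-in for pandas' is_list_like, _object_array_1d is a hypothetical helper for the object-array fallback, and the real asarray_tuplesafe does further processing that is left out here.

```python
import timeit
import warnings

import numpy as np

try:  # NumPy >= 1.25 keeps the warning class in numpy.exceptions
    from numpy.exceptions import VisibleDeprecationWarning
except ImportError:  # older NumPy exposes it at the top level
    VisibleDeprecationWarning = np.VisibleDeprecationWarning


def _is_list_like(obj):
    # Assumption: simplified stand-in for pandas' is_list_like.
    return isinstance(obj, (list, tuple, set, np.ndarray))


def _object_array_1d(values):
    # Hypothetical fallback: fill slot by slot so that nested sequences
    # stay single elements instead of becoming extra dimensions.
    result = np.empty(len(values), dtype=object)
    for i, value in enumerate(values):
        result[i] = value
    return result


def asarray_prescan(values, dtype=None):
    # First revision: inspect every element before converting.
    # For a flat list, any() has to walk the whole list.
    if isinstance(values, list) and (
        dtype in [np.object_, object] or any(_is_list_like(v) for v in values)
    ):
        return _object_array_1d(values)
    return np.asarray(values, dtype=dtype)


def asarray_try_except(values, dtype=None):
    # Revision after review: convert first, fall back only on failure.
    if isinstance(values, list) and dtype in [np.object_, object]:
        return _object_array_1d(values)
    try:
        with warnings.catch_warnings():
            # Older NumPy warns here instead of raising.
            warnings.simplefilter("ignore", VisibleDeprecationWarning)
            return np.asarray(values, dtype=dtype)
    except ValueError:
        return _object_array_1d(values)


if __name__ == "__main__":
    flat = list(range(1000))
    for func in (asarray_prescan, asarray_try_except):
        seconds = timeit.timeit(lambda: func(flat), number=1000)
        print(f"{func.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```

The try-except variant only pays for the fallback when the conversion actually fails, so the common flat-list case costs about the same as a bare np.asarray call. Absolute timings will differ by machine.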

datapythonista

LGTM, added two minor suggestions.

datapythonista · 2023-01-17T08:55:07Z

pandas/core/common.py

+            # Can remove warning filter once NumPy 1.24 is min version
+            warnings.simplefilter("ignore", np.VisibleDeprecationWarning)
+            result = np.asarray(values, dtype=dtype)
+    except ValueError:


In general I think it's better to wrap only the line that can raise in the try block. Is that a problem with the warning catching in this case? No big deal, I guess the warnings code won't raise a ValueError, but if it does, the behavior won't be as expected.

Hm, in this case I think this is better as is. If we moved the try-except inside the catch_warnings block, we would also suppress warnings raised in the except block, which isn't what we want here.

Yeah, result = np.asarray(values, dtype=dtype) will raise a warning (due to our usage) with numpy < 1.24 and raise an exception with numpy >= 1.24. As mentioned, I don't want to accidentally mask a warning within the except block.
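The scoping point can be shown with the standard library alone. In this sketch, convert, fallback and RaggedWarning are hypothetical stand-ins for the np.asarray call, the object-array fallback and np.VisibleDeprecationWarning. The fallback emits a warning of the filtered category purely so that the difference between the two layouts becomes observable.

```python
import warnings


class RaggedWarning(UserWarning):
    """Stand-in for the warning category being filtered."""


def convert(values):
    # Stand-in for the conversion call: always fails, like ragged input
    # on a NumPy version that raises.
    raise ValueError("inhomogeneous shape")


def fallback(values):
    # Stand-in for the fallback path; warns so filtering can be observed.
    warnings.warn("fallback used", RaggedWarning)
    return list(values)


def try_outside(values):
    # Layout kept in the PR: the filter is active only around the conversion.
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", RaggedWarning)
            return convert(values)
    except ValueError:
        # The with-block has already exited, so filters are restored here.
        return fallback(values)


def try_inside(values):
    # Alternative layout: the except block runs while the filter is active.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RaggedWarning)
        try:
            return convert(values)
        except ValueError:
            return fallback(values)


def warnings_seen(func):
    # Collect the messages of all warnings that escape `func`.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func([1, (2, 3)])
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    print("try outside:", warnings_seen(try_outside))
    print("try inside: ", warnings_seen(try_inside))
```

With the try on the outside, the warning from the fallback is still reported. With the try nested inside catch_warnings, it is silently dropped, which is the masking the authors wanted to avoid.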

…ents and numpy 1.24) (#50792) Backport PR #50682: BUG: pivot_table with nested elements and numpy 1.24 Co-authored-by: Matthew Roeschke <[email protected]>

mroeschke added 2 commits January 11, 2023 14:17

Fix asarray_tuplesafe for numpy 1.24.1 deprecation

b43303f

BUG: pivot_table with nested elements and numpy 1.24

750ac7d

mroeschke added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 11, 2023

mroeschke added this to the 1.5.3 milestone Jan 11, 2023

mroeschke added 3 commits January 11, 2023 17:10

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

1c25540

For all numpy versions

5c46695

Undo unneeded variable

4b88e87

Dr-Irv mentioned this pull request Jan 12, 2023

allow np.uint64 to be used in indexing. Support numpy 1.24.1 pandas-dev/pandas-stubs#510

Merged

2 tasks

mroeschke added 4 commits January 12, 2023 12:11

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

c51966b

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

5c482eb

fix for arraymanager

c56e38a

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

ec60be0

mroeschke mentioned this pull request Jan 13, 2023

RLS: pandas 1.5.3 #49857

Closed

2 tasks

mroeschke added 2 commits January 13, 2023 15:16

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

016cf09

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

01649f1

phofl reviewed Jan 14, 2023

mroeschke added 3 commits January 15, 2023 14:26

use try except

419f857

Merge remote-tracking branch 'upstream/main' into compat/np/pivot

dba09f4

typing

b8f453b

datapythonista approved these changes Jan 17, 2023

datapythonista mentioned this pull request Jan 17, 2023

BUG: Change FutureWarning to DeprecationWarning for inplace setitem with DataFrame.(i)loc #50044

Merged

5 tasks

mroeschke and others added 2 commits January 17, 2023 09:27

Update pandas/core/common.py

b68e727

Co-authored-by: Marc Garcia <[email protected]>

line length

cb084a6

mroeschke merged commit affcdf9 into pandas-dev:main Jan 17, 2023

mroeschke deleted the compat/np/pivot branch January 17, 2023 17:35

meeseeksmachine mentioned this pull request Jan 17, 2023

Backport PR #50682 on branch 1.5.x (BUG: pivot_table with nested elements and numpy 1.24) #50792

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 17, 2023

Backport PR pandas-dev#50682: BUG: pivot_table with nested elements a…

30beab2

…nd numpy 1.24
