Skip to content

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Jan 11, 2023

I think we have still pinned numpy in our 1.5.x branch but confirmed this still works locally

% conda list numpy
# packages in environment at /opt/miniconda3/envs/pandas-dev:
#
# Name                    Version                   Build  Channel
numpy                     1.24.1           py38hc2f29e8_0    conda-forge
numpydoc                  1.5.0              pyhd8ed1ab_0    conda-forge

% pytest pandas/tests/reshape/test_pivot.py -k test_pivot_table_with_mixed_nested_tuples
================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
rootdir: , configfile: pyproject.toml
plugins: asyncio-0.20.2, cython-0.2.0, hypothesis-6.58.1, xdist-3.0.2, cov-4.0.0, anyio-3.6.2
asyncio: mode=strict
collected 539 items / 538 deselected / 1 selected

pandas/tests/reshape/test_pivot.py .

------------------------------------------------------- generated xml file: /test-data.xml --------------------------------------------------------
================================================================================== slowest 30 durations ==================================================================================
0.01s call     pandas/tests/reshape/test_pivot.py::TestPivotTable::test_pivot_table_with_mixed_nested_tuples

(2 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================================================================== 1 passed, 538 deselected in 0.48s ============================================================================

EDIT: Numpy is no longer pinned on main so this should run in the CI

@mroeschke mroeschke added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 11, 2023
@mroeschke mroeschke added this to the 1.5.3 milestone Jan 11, 2023
@mroeschke
Copy link
Member Author

Decided not to check the numpy version per se since in prior numpy versions this was raising a DeprecationWarning internally

@mroeschke mroeschke mentioned this pull request Jan 13, 2023
2 tasks

if isinstance(values, list) and dtype in [np.object_, object]:
if isinstance(values, list) and (
dtype in [np.object_, object] or any(is_list_like(val) for val in values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea about performance impact here?

Copy link
Member Author

@mroeschke mroeschke Jan 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in the worst case where this can't short circuit there's a perf hit

In [1]: values = list(range(1000))

In [2]: %timeit pd.core.common.asarray_tuplesafe(values) # PR
279 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [2]: %timeit pd.core.common.asarray_tuplesafe(values) # main
41.8 µs ± 623 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a common use-case?

Would try-except be significantly faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. With the try except I get closer to main performance (~44)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, added two minor suggestions

# Can remove warning filter once NumPy 1.24 is min version
warnings.simplefilter("ignore", np.VisibleDeprecationWarning)
result = np.asarray(values, dtype=dtype)
except ValueError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think it's better to just wrap the line that can raise in the try block. Is it a problem with the warning catching in this case? No big deal, I guess the warnings stuff won't raise a ValueError, but if it does, the behavior won't be as expected.

Copy link
Member

@phofl phofl Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm in this case I think this is better as is, if we would wrap try-except into the catch_warnings statement, then we would also catch warnings in the except block, which isn't what we want here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah result = np.asarray(values, dtype=dtype) will raise a warning (due to our usage) w/ numpy < 1.24 and raise an exception with numpy >= 1.24. As mentioned, I don't want to accidentally mask a warning within the except block.

@mroeschke mroeschke merged commit affcdf9 into pandas-dev:main Jan 17, 2023
@mroeschke mroeschke deleted the compat/np/pivot branch January 17, 2023 17:35
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 17, 2023
phofl pushed a commit that referenced this pull request Jan 17, 2023
…ents and numpy 1.24) (#50792)

Backport PR #50682: BUG: pivot_table with nested elements and numpy 1.24

Co-authored-by: Matthew Roeschke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: numpy 1.24.0 causes ValueError with pivot_table and ragged nested sequences
3 participants