Skip to content

Conversation

@lukemanley
Copy link
Member

@lukemanley lukemanley commented Mar 4, 2022

Avoid unnecessary (and potentially expensive) call to MultiIndex._values. This only matters when MultiIndex._values has not been cached. Existing asv has been updated to test cached and non-cached cases.

import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_arrays(
  [np.arange(10**6)] * 2
)
mi2 = pd.MultiIndex.from_arrays(
  [np.arange(10**7)] * 2
)

df = pd.DataFrame({"A": 1.0}, index=mi)

%timeit x = df.reindex(mi2.copy())
1.84 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
685 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- PR

@lukemanley lukemanley added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Performance Memory or execution speed performance labels Mar 4, 2022
elif method == "nearest":
indexer = self._get_nearest_indexer(target, limit, tolerance)
else:
tgt_values = target._get_engine_target()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@jreback jreback added this to the 1.5 milestone Mar 5, 2022
@jreback jreback merged commit cefafd4 into pandas-dev:main Mar 5, 2022
@jreback
Copy link
Contributor

jreback commented Mar 5, 2022

very nice @lukemanley

@lukemanley lukemanley deleted the reindex-with-multiindex branch March 5, 2022 12:59
@twoertwein
Copy link
Member

twoertwein commented Mar 5, 2022

I think this PR made pyright unhappy, but I honestly don't see how this PR affected _mixins.py:

pandas/core/arrays/_mixins.py:185:26 - warning: Instance methods should take a "self" parameter (reportSelfClsParameterName)

If the solution is to ignore this warning, you can add
# pyright: reportSelfClsParameterName = false
at the top of the file.

edit:
or rename cls to self:

def _from_factorized(cls, values, original):

@jreback
Copy link
Contributor

jreback commented Mar 5, 2022

i think was a prior PR

@twoertwein
Copy link
Member

i think was a prior PR

Yes, you are right: #46214 If no one beats me, I will create a PR to rename cls to self

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants