Closed
Labels
Indexing (related to indexing on series/frames, not to indexes themselves), Performance (memory or execution speed performance)
Description
I need to append several big Series to a big categorical Series.
While trying to update the categories quickly, I found that Index.difference uses Python's set, which is slow to build for large inputs (I have up to 500k categories and 1.3M values).
numpy's setdiff1d is more than an order of magnitude faster (in my case, a datetime64 Categorical):
import numpy as np
tmp_unique = tmp.unique()
new_cats = pd.Index(np.setdiff1d(tmp_unique[~pd.isnull(tmp_unique)], to.cat.categories))
The slower version, using Index.difference:
new_cats = pd.Index(tmp_unique[~pd.isnull(tmp_unique)]).difference(to.cat.categories)
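For anyone who wants to try both variants side by side, here is a minimal, self-contained sketch. It assumes small integer data instead of the datetime64 values from the report, calls numpy directly (recent pandas releases no longer expose `pd.np`), and uses hypothetical helper names (`new_cats_setdiff`, `new_cats_difference`) that are not part of pandas. It only checks that the two approaches agree and shows one possible way to append afterwards; it does not reproduce the timing claim.

```python
import numpy as np
import pandas as pd


def new_cats_setdiff(values: pd.Series, existing: pd.Index) -> pd.Index:
    """Categories in `values` that `existing` lacks, via numpy.setdiff1d."""
    uniq = np.asarray(values.unique())
    uniq = uniq[~pd.isnull(uniq)]  # drop NaN/NaT before comparing
    # setdiff1d returns the sorted unique values of uniq not in existing
    return pd.Index(np.setdiff1d(uniq, existing.to_numpy()))


def new_cats_difference(values: pd.Series, existing: pd.Index) -> pd.Index:
    """Same result via Index.difference (the slower path in the report)."""
    uniq = np.asarray(values.unique())
    uniq = uniq[~pd.isnull(uniq)]
    return pd.Index(uniq).difference(existing)


# Illustrative data (assumption): `to` is the categorical target, `tmp` is appended.
to = pd.Series([1, 2, 3, 1], dtype="category")
tmp = pd.Series([2, 5, 8, 5])

fast = new_cats_setdiff(tmp, to.cat.categories)
slow = new_cats_difference(tmp, to.cat.categories)

# Extend the categories once, then cast and concatenate with a shared dtype
# so the result stays categorical.
to = to.cat.add_categories(fast)
combined = pd.concat([to, tmp.astype(to.dtype)], ignore_index=True)
```

To compare speed on realistic sizes, both helpers can be wrapped in `timeit` with data shaped like the original case (hundreds of thousands of categories); results will depend on dtype and pandas version.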